Gender Prediction – Matin Zarei
Matin Zarei

Data Scientist

Data Analyst

Data Engineer

Data Specialist

ML Engineer

Gender Prediction

  • Year: 2022
  • Category: NLP

This project was an experiment comparing the performance of four machine learning models on a corpus processed with four natural language processing methods. The machine learning models we used were random forest, XGBoost, logistic regression, and support vector machine (SVM), while the NLP methods we employed were word2vec, GloVe, tf–idf, and text bleaching.
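As an illustration of one of the representations named above, here is a minimal tf–idf sketch in plain Python. It is not the project's code: the function name `tfidf`, the toy corpus, and the exact weighting (relative term frequency times the natural log of inverse document frequency, with no smoothing) are assumptions chosen for clarity. Library implementations typically add smoothing and normalization.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: weight = (count / doc length) * ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy corpus (made up for this example, not the project's data).
corpus = [
    "i love this movie".split(),
    "i love this song".split(),
    "this game was boring".split(),
]
weights = tfidf(corpus)

# "this" occurs in every document, so its weight is 0.0;
# "movie" occurs in only one, so it outweighs "love".
print(weights[0])
```

Each resulting dictionary can be turned into a fixed-length feature vector (one column per vocabulary term) before being passed to a classifier such as the four listed above.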

Our results showed that the random forest model delivered the highest accuracy at 75%, followed by XGBoost at 73%. Among the NLP methods, word2vec performed best with an accuracy of 72%, followed by GloVe at 68%.

However, we also found that the accuracy of the models changed when we changed the context of the corpus. Specifically, when we used the text bleaching method to remove context-specific information from the corpus, accuracy dropped less than it did with word2vec: the drop was only 5% with text bleaching, compared to 10% with word2vec.
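To make the idea of text bleaching concrete, the sketch below replaces each token with abstract features (character shape, length, and consonant/vowel pattern) so that topic-specific vocabulary disappears while stylistic cues remain. The write-up does not state which bleaching features were used, so the feature set, the function names (`shape`, `vowel_pattern`, `bleach`, `bleach_text`), and the output format are all assumptions made for illustration.

```python
VOWELS = set("aeiouAEIOU")


def shape(token):
    """Map characters to classes (U upper, L lower, D digit, X other), merging runs."""
    out = []
    for ch in token:
        if ch.isupper():
            cls = "U"
        elif ch.islower():
            cls = "L"
        elif ch.isdigit():
            cls = "D"
        else:
            cls = "X"
        # Only record a class when it differs from the previous one.
        if not out or out[-1] != cls:
            out.append(cls)
    return "".join(out)


def vowel_pattern(token):
    """V for vowels, C for other letters, O for non-letters."""
    return "".join(
        "V" if ch in VOWELS else "C" if ch.isalpha() else "O"
        for ch in token
    )


def bleach(token):
    """Hypothetical bleached form: shape, zero-padded length, vowel pattern."""
    return f"{shape(token)}_{len(token):02d}_{vowel_pattern(token)}"


def bleach_text(text):
    """Bleach every whitespace-separated token in a string."""
    return " ".join(bleach(tok) for tok in text.split())


print(bleach("Hello"))             # UL_05_CVCCV
print(bleach("Tokyo"))             # UL_05_CVCCV
print(bleach_text("Hello 2022!"))  # UL_05_CVCCV DX_05_OOOOO
```

Because "Hello" and "Tokyo" collapse to the same bleached token, a classifier trained on these features cannot rely on the words themselves, which is one plausible reason such a representation would lose less accuracy when the context of the corpus changes.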