Inferring Age and Gender of Facebook Users Based on their Status Updates
- 1. Inferring Age and Gender of Facebook Users Based on their
Status Updates Angel Oswaldo Vázquez-Patiño Master of Science in
Artificial Intelligence February, 2015, Leuven, Belgium
- 2. Outline Introduction Methodology Results Conclusions
- 3. Introduction: Social Media Users Kemp, Simon, 2014. Global
Social Media Users Pass 2 Billion. We Are Social.
- 4. Introduction: Facebook Penetration Kemp, Simon, 2014. Global
Social Media Users Pass 2 Billion. We Are Social.
- 5. Introduction: Computational social science
- 6. Introduction: Importance Social Media Marketing Important
Attributes: Age and Gender Attribute Disclosure
- 7. Goal of the study Age and gender inference models Reduce the
feature dimension Second Order Representation (SOR)
- 8. Literature review Study of Kosinski et al., 2013 relying on
Facebook likes The Open Vocabulary Approach (Schwartz et al., 2013)
General approach Extraction of features User representation
Classification model
- 9. Methodology
- 10. Methodology Pre-processing 4-folds Vocabulary generation
Feature selection Document representation 31,169
- 11. The Open Vocabulary Approach Linguistic Feature Extraction
n-grams of 1 to 3 words PMI greater than 2*length Terms used by 1%
of users Feature Dimension Reduction PCA Representation BOT
31,169
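
The linguistic feature extraction on this slide can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: it assumes PMI is the natural log of the n-gram probability over the product of its word probabilities, and that "length" is the number of words in the n-gram. The function and parameter names (`select_terms`, `pmi_factor`, `min_user_frac`) are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def select_terms(user_tokens, max_n=3, pmi_factor=2.0, min_user_frac=0.01):
    """Keep 1- to max_n-grams used by enough users; multi-word n-grams
    must also have PMI greater than pmi_factor * length.

    user_tokens: one token list per user (all status updates joined).
    Assumption: PMI uses the natural logarithm.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    users_with_term = Counter()
    for tokens in user_tokens:
        seen = set()
        for n in range(1, max_n + 1):
            grams = ngrams(tokens, n)
            counts[n].update(grams)
            seen.update(grams)
        users_with_term.update(seen)  # count each user once per term

    totals = {n: sum(c.values()) for n, c in counts.items()}
    min_users = min_user_frac * len(user_tokens)

    selected = []
    for n, counter in counts.items():
        for gram, count in counter.items():
            if users_with_term[gram] < min_users:
                continue
            if n > 1:
                p_gram = count / totals[n]
                p_independent = 1.0
                for word in gram:
                    p_independent *= counts[1][(word,)] / totals[1]
                if math.log(p_gram / p_independent) <= pmi_factor * n:
                    continue
            selected.append(gram)
    return sorted(selected)
```

On a real corpus the defaults mirror the thresholds named on the slide; on toy data the PMI threshold usually has to be lowered for any multi-word n-gram to survive.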
- 12. The Second Order Representation 1. Building term vectors 2.
Building document vectors
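
The two steps named on this slide might look like the sketch below. The slides do not give the exact weighting, so this assumes a common form of second order attributes: each term is described by its normalized association with each target profile (for example the gender classes), and a document is the average of the vectors of its known terms. All names here are hypothetical.

```python
from collections import Counter, defaultdict


def build_term_vectors(docs, labels, profiles):
    """Step 1: one vector per term, with one entry per profile.

    Each entry accumulates the term's relative frequency in the training
    documents of that profile; the vector is then normalized to sum to 1.
    """
    index = {profile: i for i, profile in enumerate(profiles)}
    raw = defaultdict(lambda: [0.0] * len(profiles))
    for tokens, label in zip(docs, labels):
        for term, tf in Counter(tokens).items():
            raw[term][index[label]] += tf / len(tokens)
    term_vectors = {}
    for term, vec in raw.items():
        total = sum(vec)
        term_vectors[term] = [v / total for v in vec]
    return term_vectors


def build_document_vector(tokens, term_vectors, n_profiles):
    """Step 2: average the term vectors of the terms seen in training."""
    doc_vec = [0.0] * n_profiles
    known = [t for t in tokens if t in term_vectors]
    if not known:
        return doc_vec  # no known term: all-zero vector
    for term in known:
        for i, value in enumerate(term_vectors[term]):
            doc_vec[i] += value / len(known)
    return doc_vec
```

Under these assumptions a document vector has as many dimensions as there are profiles, far fewer than a bag-of-terms vector over the whole vocabulary.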
- 13. Methodology Gender prediction SVMs: Linear and RBF kernels
Age prediction Ridge regression Lasso regression
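
To illustrate how the two age regressors differ, here is a toy sketch for a single feature with no intercept, where both penalties have closed-form solutions. It is only an illustration of the penalties, not the models used in the study; the objective scaling noted in the comments is an assumption, and the function names are hypothetical.

```python
def ridge_weight(x, y, lam):
    """Minimize sum((y - w*x)^2) + lam * w^2 for a single weight w.

    The L2 penalty shrinks w toward zero but never makes it exactly zero
    (as long as x and y are correlated).
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def lasso_weight(x, y, lam):
    """Minimize 0.5 * sum((y - w*x)^2) + lam * |w| for a single weight w.

    The L1 penalty soft-thresholds the correlation, so a large enough lam
    sets w exactly to zero (the feature is dropped).
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    shrunk = max(abs(sxy) - lam, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx
```

With many features the same effect holds per coefficient: ridge keeps all terms with smaller weights, while lasso discards some terms entirely.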
- 14. Results 1. OVA-PCA-DR 2. OVA-No-DR 3. OVA-CHI2-DR 4. SOR 5.
SOR-CHI2-DR Classification Accuracy F1-score Regression R MAE MSE
EVS
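
For reference, the evaluation measures listed on this slide can be computed as in the sketch below. It assumes F1 is reported for a single positive class and that EVS stands for explained variance score; the slides do not state the averaging scheme, so treat these definitions as assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def _variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true); undefined if y_true is constant."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - _variance(residuals) / _variance(y_true)
```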
- 15. Results: OVA-x-DR Gender OVA-No-DR 0.905 OVA-χ²-DR 10k
0.908 OVA-χ²-DR 15k 0.908 OVA-No-DR 0.905 OVA-χ²-DR 10k 0.908
OVA-χ²-DR 15k 0.907
- 16. Results: OVA-x-DR Age
- 17. Results: SOR Gender OVA-PCA-DR 0.886 SOR-No-DR 0.815
OVA-PCA-DR 0.885 SOR-No-DR 0.813
- 18. Results: SOR Age
- 19. General comparison of models
- 20. Comparison of running time
- 21. Conclusions and future work Age and gender inference models
Reduce the feature dimension χ² 15K terms Second Order
Representation (SOR) Reduce running time dramatically, age PAN 2015
workshop and competition Author Profiling
- 22. Thank you!