Inferring Age and Gender of Facebook Users Based on their Status Updates

  1. Inferring Age and Gender of Facebook Users Based on their Status Updates. Angel Oswaldo Vázquez-Patiño. Master of Science in Artificial Intelligence. February 2015, Leuven, Belgium.
  2. Outline: Introduction; Methodology; Results; Conclusions
  3. Introduction: Social Media Users. Source: Kemp, Simon, 2014. Global Social Media Users Pass 2 Billion. We Are Social.
  4. Introduction: Facebook Penetration. Source: Kemp, Simon, 2014. Global Social Media Users Pass 2 Billion. We Are Social.
  5. Introduction: Computational social science
  6. Introduction: Importance. Social media marketing; important attributes: age and gender; attribute disclosure
  7. Goal of the study: age and gender inference models; reduce the feature dimension; Second Order Representation (SOR)
  8. Literature review: study of Kosinski et al., 2013, relying on Facebook likes; the Open Vocabulary Approach (Schwartz et al., 2013). General approach: extraction of features; user representation; classification model
  9. Methodology
  10. Methodology: pre-processing; 4 folds; vocabulary generation; feature selection; document representation; 31,169
  11. The Open Vocabulary Approach. Linguistic feature extraction: n-grams of 1 to 3 words; PMI greater than 2*length; terms used by at least 1% of users. Feature dimension reduction: PCA. Representation: BOT; 31,169
  12. The Second Order Representation: (1) building term vectors; (2) building document vectors
  13. Methodology. Gender prediction: SVMs with linear and RBF kernels. Age prediction: ridge regression and lasso regression
  14. Results. Models: (1) OVA-PCA-DR; (2) OVA-No-DR; (3) OVA-CHI2-DR; (4) SOR; (5) SOR-CHI2-DR. Classification metrics: accuracy, F1-score. Regression metrics: R, MAE, MSE, EVS
  15. Results: OVA-x-DR, gender. First set: OVA-No-DR 0.905; OVA-χ²-DR 10k 0.908; OVA-χ²-DR 15k 0.908. Second set: OVA-No-DR 0.905; OVA-χ²-DR 10k 0.908; OVA-χ²-DR 15k 0.907
  16. Results: OVA-x-DR, age
  17. Results: SOR, gender. First set: OVA-PCA-DR 0.886; SOR-No-DR 0.815. Second set: OVA-PCA-DR 0.885; SOR-No-DR 0.813
  18. Results: SOR, age
  19. General comparison of models
  20. Comparison of running time
  21. Conclusions and future work. Age and gender inference models. Reduce the feature dimension: χ², 15K terms. Second Order Representation (SOR): reduce running time dramatically, age. PAN 2015 workshop and competition: Author Profiling
  22. Thank you!
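
The term filter on slide 11 (n-grams of 1 to 3 words, a PMI threshold of 2*length, and a minimum share of users) can be sketched roughly as below. The slides give only the thresholds, so several details are assumptions: PMI is taken as the natural log of the n-gram probability over the product of its unigram probabilities, the PMI test is applied only to n-grams of two or more words, and the user threshold is read as "at least 1%". The names `ngrams` and `select_terms` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def select_terms(user_docs, max_n=3, pmi_factor=2.0, min_user_frac=0.01):
    """Keep terms used by enough users; multi-word terms must also pass PMI.

    user_docs: one token list per user (assumed input format).
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    users_with = Counter()  # number of users who use each term at least once
    for tokens in user_docs:
        seen = set()
        for n in range(1, max_n + 1):
            grams = ngrams(tokens, n)
            counts[n].update(grams)
            seen.update(grams)
        users_with.update(seen)

    totals = {n: sum(c.values()) for n, c in counts.items()}
    n_users = len(user_docs)
    kept = set()
    for n, counter in counts.items():
        for gram, count in counter.items():
            if users_with[gram] / n_users < min_user_frac:
                continue
            if n > 1:  # assumption: unigrams are exempt from the PMI test
                p_gram = count / totals[n]
                p_indep = math.prod(counts[1][(w,)] / totals[1] for w in gram)
                if math.log(p_gram / p_indep) <= pmi_factor * n:
                    continue
            kept.add(gram)
    return kept
```

On a toy corpus PMI values are small, so the slide's factor of 2 removes nearly every phrase; the threshold only becomes meaningful with a large number of users.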
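
The two SOR steps named on slide 12 could look roughly like the following. The slides only name the steps, so the weighting here (log-scaled relative term frequency accumulated per profile, then normalised per term) is an assumption, as are the function names. "Profile" is used for a target class, for example one of the two genders.

```python
import math
from collections import Counter, defaultdict


def build_term_vectors(docs, labels, n_profiles):
    """Step 1: one vector per term, with one entry per profile (class)."""
    raw = defaultdict(lambda: [0.0] * n_profiles)
    for tokens, label in zip(docs, labels):
        if not tokens:
            continue
        for term, tf in Counter(tokens).items():
            # Assumed weighting: log-scaled relative frequency in the document.
            raw[term][label] += math.log2(1.0 + tf / len(tokens))
    vectors = {}
    for term, vec in raw.items():
        total = sum(vec)  # always positive: every stored term occurred once
        vectors[term] = [v / total for v in vec]
    return vectors


def build_document_vector(tokens, term_vectors, n_profiles):
    """Step 2: weighted sum of the term vectors of the document's terms."""
    doc_vec = [0.0] * n_profiles
    if not tokens:
        return doc_vec
    for term, tf in Counter(tokens).items():
        if term in term_vectors:  # terms unseen in training are skipped
            weight = tf / len(tokens)
            for p in range(n_profiles):
                doc_vec[p] += weight * term_vectors[term][p]
    return doc_vec
```

Under these assumptions each document ends up with one value per profile, however large the vocabulary is, which is consistent with the reduced running time reported in the conclusions.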

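The χ² feature selection behind the OVA-CHI2-DR models (10k and 15k terms) can be illustrated with the classical 2x2 contingency form of the statistic. This is a minimal sketch for a binary target such as gender; the slides do not say which χ² variant or library was used, and the names `chi_square` and `top_k_terms` are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 table.

    a: class-1 users who use the term    b: class-0 users who use the term
    c: class-1 users who do not          d: class-0 users who do not
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # term used by everyone or no one: no information
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_k_terms(user_terms, labels, k):
    """Rank terms by chi-square and keep the k highest.

    user_terms: one set of terms per user; labels: 0 or 1 per user.
    """
    n1 = sum(1 for y in labels if y == 1)
    n0 = len(labels) - n1
    vocab = set().union(*user_terms)
    scores = {}
    for term in vocab:
        a = sum(1 for terms, y in zip(user_terms, labels)
                if y == 1 and term in terms)
        b = sum(1 for terms, y in zip(user_terms, labels)
                if y == 0 and term in terms)
        scores[term] = chi_square(a, b, n1 - a, n0 - b)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]
```

A term that every user writes scores zero and is dropped first, while terms concentrated in one class rank highest.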