Predicting Click Through Rate for Job Listings
Manish Gupta
Yahoo! HotJobs
Jan 22, 2009
CTR and its applications
• CTR = ratio of clicks (to get the full description of an entity) to views of a reduced version of it
• Rank results
• Impacts publisher revenue in pay-for-performance models
• Bidding in ad exchanges
• Trends can help detect click fraud
CTR for new job listings
• Avg CTR = 2.29%
• MLE would have high variance
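To illustrate why the MLE (clicks / views) is noisy for a new listing, here is a minimal sketch. It assumes each view is an independent click/no-click trial (a binomial model, which the slides do not state); the function names are hypothetical.

```python
import math

AVG_CTR = 0.0229  # average CTR reported on the slide


def ctr_mle(clicks: int, views: int) -> float:
    """Maximum-likelihood CTR estimate: clicks divided by views."""
    if views <= 0:
        raise ValueError("views must be positive")
    return clicks / views


def mle_std_error(true_ctr: float, views: int) -> float:
    """Standard deviation of the MLE, assuming binomial clicks."""
    return math.sqrt(true_ctr * (1.0 - true_ctr) / views)


# A new job has few views, so the estimate's spread dwarfs the CTR itself.
for views in (10, 100, 10_000):
    se = mle_std_error(AVG_CTR, views)
    print(f"views={views:>6}  std error={se:.4f}  relative={se / AVG_CTR:.2f}")
```

Under this assumption, the standard error at 10 views is larger than the average CTR itself, which is the motivation for predicting CTR from job features instead.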
CTR for job listings
Related work
• Regelson and Fain – Estimate CTR using topic clusters (job categories)
• Richardson et al. – Describe features for predicting CTR for ads
• Our baseline: avg CTR for a test job (2.29%)
Refined Problem definition
• Ideal: Predict CTR(job j, position p, user cluster u, context c)
– Data sparsity
– Huge feature vector
• Predict CTR(job)
– Use CTR versus position curve
• Predict CTR(job, position)
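The slides do not say how the CTR-versus-position curve is applied. One common way to combine a per-job estimate with such a curve is "clicks over expected clicks"; the sketch below uses that technique as an assumption, with a made-up curve and hypothetical function names.

```python
from typing import Dict

# Hypothetical global CTR-versus-position curve (position -> average CTR).
# The values are illustrative only, not taken from the HotJobs data.
POSITION_CURVE: Dict[int, float] = {1: 0.050, 2: 0.035, 3: 0.025, 4: 0.020, 5: 0.015}


def job_factor(clicks: Dict[int, int], views: Dict[int, int]) -> float:
    """Observed clicks divided by the clicks the curve alone would predict."""
    observed = sum(clicks.get(pos, 0) for pos in views)
    expected = sum(n * POSITION_CURVE[pos] for pos, n in views.items())
    if expected == 0:
        raise ValueError("job has no views at known positions")
    return observed / expected


def predict_ctr(factor: float, position: int) -> float:
    """CTR(job, position) = position-independent job factor * curve value."""
    return factor * POSITION_CURVE[position]


# Example job: viewed at positions 1 and 3, clicked twice as often as the curve expects.
views = {1: 100, 3: 200}
clicks = {1: 10, 3: 10}
factor = job_factor(clicks, views)
print(factor, predict_ctr(factor, 2))
```

The point of the design is that the job factor removes position bias, so a job seen only at low positions is not penalized for it.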
Data set
• Used HotJobs data from Aug 11, 2008 to Aug 31, 2008 to predict CTR of jobs on Sep 1, 2008
• 40K jobs from 7K+ companies
• 32K as train set and 8K as test set
• Jobs have location, company name, category, creation date, posting date, optional position-wise click history, job source, title, snippet and job description
Different models
• Weka: Linear Regression and SMOReg
• Treenet: Gradient Boosted Decision Trees
• Feature selection:
– Weka: wrapper with evaluator=linear regression and search=GreedyStepwise
– Treenet: variable importance metrics
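The wrapper idea can be sketched in plain Python: greedily add the feature that most improves a linear regression's error, and stop when nothing helps. This is a simplified illustration, not Weka's implementation; it scores subsets on a single held-out set, and all names and the toy data are assumptions.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            ratio = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= ratio * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_predict(train_x, train_y, test_x, cols: List[int]) -> List[float]:
    """Least-squares fit on the chosen columns plus an intercept; predict test rows."""
    def design(rows):
        return [[1.0] + [row[c] for c in cols] for row in rows]

    xs = design(train_x)
    k = len(cols) + 1
    # Normal equations, with a tiny ridge term to keep the system solvable.
    xtx = [[sum(r[i] * r[j] for r in xs) + (1e-9 if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(xs, train_y)) for i in range(k)]
    w = solve(xtx, xty)
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in design(test_x)]


def rmse(pred, actual) -> float:
    return (sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)) ** 0.5


def greedy_forward_selection(train_x, train_y, val_x, val_y, min_gain=1e-6) -> List[int]:
    """Repeatedly add the feature that most lowers validation RMSE."""
    selected: List[int] = []
    remaining = list(range(len(train_x[0])))
    best = rmse(fit_predict(train_x, train_y, val_x, selected), val_y)
    while remaining:
        scores = {c: rmse(fit_predict(train_x, train_y, val_x, selected + [c]), val_y)
                  for c in remaining}
        candidate = min(scores, key=scores.get)
        if best - scores[candidate] < min_gain:
            break  # no remaining feature improves the model enough
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected


# Toy data: the target depends only on feature 0; feature 1 is irrelevant.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
ys = [2.0 * r[0] + 1.0 for r in rows]
selected = greedy_forward_selection(rows[:15], ys[:15], rows[15:], ys[15:])
print(selected)
```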
Features
• Features from Similar Jobs (60)
– CTR of jobs with the same title/company/state/city+state/category, and their cardinalities, for jobs posted in the past one/two weeks or all jobs, based on the click history of the past one/two/three weeks
• Features from Related Jobs (288)
– CTR_mn of related jobs with m = |A-B| and n = |B-A|, and their cardinalities (0 ≤ m, n ≤ 5), for jobs posted in the past one/two weeks or all jobs, based on the click history of the past one/two/three weeks
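The slides do not define A and B. The sketch below assumes A is the set of title words of the job being scored and B the set of title words of another job, and pools clicks and views per (m, n) bucket; both choices and all names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def title_words(title: str) -> Set[str]:
    """Lower-cased set of whitespace-separated title words."""
    return set(title.lower().split())


def related_job_features(
    target_title: str,
    history: Iterable[Tuple[str, int, int]],  # (title, clicks, views) per past job
    max_diff: int = 5,
) -> Dict[Tuple[int, int], Tuple[float, int]]:
    """Map (m, n) -> (pooled CTR, number of related jobs) with m=|A-B|, n=|B-A|."""
    a = title_words(target_title)
    clicks: Dict[Tuple[int, int], int] = defaultdict(int)
    views: Dict[Tuple[int, int], int] = defaultdict(int)
    count: Dict[Tuple[int, int], int] = defaultdict(int)
    for title, job_clicks, job_views in history:
        b = title_words(title)
        m, n = len(a - b), len(b - a)
        if m > max_diff or n > max_diff:
            continue  # too different to count as related
        clicks[(m, n)] += job_clicks
        views[(m, n)] += job_views
        count[(m, n)] += 1
    return {
        key: (clicks[key] / views[key] if views[key] else 0.0, count[key])
        for key in count
    }


history = [
    ("Java Developer", 3, 100),        # m=1 (senior), n=0
    ("Senior Java Engineer", 5, 100),  # m=1 (developer), n=1 (engineer)
    ("Lead Java Engineer", 5, 200),    # m=2, n=2
]
features = related_job_features("Senior Java Developer", history)
print(features)
```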
Features
• Job Title Features (11)
– #words, #capitalized words, isAllCaps, hasHighPunct, hasLongWords, hasNumbers, vocabulary features
• Daily CTR Features for past 3 weeks (21)
• Other Features (10)
– Job category, age, location specificity, job source, and job description page features
• Other potential features
– High-marketing-pitch words, brand value of company, spam feedback, seasonal variations
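A few of the job title features could be computed roughly as follows. The thresholds for "long" words and "high" punctuation are not given in the slides, so the constants here are hypothetical, and the vocabulary features are left out.

```python
import string
from typing import Dict

LONG_WORD_LEN = 12      # hypothetical cutoff for hasLongWords
HIGH_PUNCT_RATIO = 0.1  # hypothetical cutoff for hasHighPunct


def title_features(title: str) -> Dict[str, int]:
    """Compute simple surface features of a job title."""
    words = title.split()
    letters = [ch for ch in title if ch.isalpha()]
    punct = sum(ch in string.punctuation for ch in title)
    return {
        "num_words": len(words),
        "num_capitalized_words": sum(w[0].isupper() for w in words),
        "is_all_caps": int(bool(letters) and all(ch.isupper() for ch in letters)),
        "has_high_punct": int(len(title) > 0 and punct / len(title) > HIGH_PUNCT_RATIO),
        "has_long_words": int(any(len(w) >= LONG_WORD_LEN for w in words)),
        "has_numbers": int(any(ch.isdigit() for ch in title)),
    }


example = title_features("URGENT!!! Sales Rep - $5000/week")
print(example)
```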
Experiments and results
• Baseline: Predict avg CTR for a test job (2.29%)
• Predicting the category-wise avg CTR (A)
• Linear Regression over 390 features (B) – uses only 142 regressors
• GBDT using Treenet over 390 features (C) – uses 300 regressors (at 256_600_0.01_100)
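The baseline and model (A) are simple enough to sketch. The toy data below is invented, and RMSE and MAE are assumed metrics, since the slides do not name the ones used.

```python
from collections import defaultdict
from statistics import mean

# Invented (category, CTR) pairs; not HotJobs data.
train = [("sales", 0.030), ("sales", 0.034), ("it", 0.015), ("it", 0.017), ("health", 0.020)]
test = [("sales", 0.031), ("it", 0.018), ("legal", 0.025)]

global_avg = mean(ctr for _, ctr in train)

by_category = defaultdict(list)
for category, ctr in train:
    by_category[category].append(ctr)
category_avg = {category: mean(values) for category, values in by_category.items()}


def predict_baseline(category: str) -> float:
    """Baseline: the same average CTR for every test job."""
    return global_avg


def predict_category(category: str) -> float:
    """Model (A): average CTR of the job's category, falling back to the global average."""
    return category_avg.get(category, global_avg)


def rmse(pred, actual) -> float:
    return (sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)) ** 0.5


def mae(pred, actual) -> float:
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


actual = [ctr for _, ctr in test]
baseline_pred = [predict_baseline(c) for c, _ in test]
category_pred = [predict_category(c) for c, _ in test]
print("baseline RMSE/MAE:", rmse(baseline_pred, actual), mae(baseline_pred, actual))
print("category RMSE/MAE:", rmse(category_pred, actual), mae(category_pred, actual))
```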
Analysis of regressor distribution
Important features
• Similar Jobs features
– Same company, title, city+state using 1 week click history
• Other features
– Creation date, job description page size, date of update, posting date, job category
• Related Jobs features
– Related_11, related_12 jobs posted in past 1/3 weeks over 1/3 week click history
Pruning the feature set
Pruning the feature set
• Wrapper-based feature selection with linear regression and with Treenet’s variable importance (E) – 11 features
In absence of click history …
• Linear regression with 369 features (F) – uses 187 regressors.
• Treenet uses 282 regressors at 256_600_0.01_20 (G)
Analysis of regressor distribution
None of the sets alone helps!
Pruning the feature set
Variable importance curves
Conclusion and future work
• We built a machine learning system to predict CTR for job listings and presented our results using various regression metrics.
• More features
• Dyadic models to predict user-personalized CTR with (job feature vector, user feature vector) dyads
• Auto model updates to correct model drift
Thanks for your time