[Figure: system architecture. Offline Processing: Social Media, Topic-Signal Creator, Topic-Signal DB. Online Operation: User Query (example keywords "job, unemployed, hiring" with a signal over 2012-2014), Topic-Signal DB, Scoring and Ranking (lists of relevant topics), Aggregation, Web UI.]
Raccoon: A Query System for Social Media Signals
Architecture
Dolan Antenucci, Michael R. Anderson, Michael Cafarella
University of Michigan
Real-world phenomena such as unemployment and disease behavior have been estimated from social media, a process known as nowcasting [1].
These estimates are cheaper and faster than traditional data collection methods like phone surveys.
Faster, cheaper estimates save money and can support better-informed policy.
Motivation: Forecasting the Present via Social Media
Example User Query (Target: Unemployment Behavior)
Problem: Nowcasting Requires Ground Truth Data
Scalability with the Cloud
Current Prototype System:
• 40 billion tweets (collected 2011-2015)
• 150 million topic-signal pairs (after threshold filtering)
• Query processing runtime: ~20s (on 10 EC2 c3.8xlarge servers)
• Topic-signal pairs support daily updating
[email protected] w http://web.eecs.umich.edu/~dol
[1] S. L. Scott and H. Varian. Bayesian variable selection for nowcasting economic time series. In Economics of Digitization. University of Chicago Press, 2014.
[2] A. Greenspan. The Age of Turbulence. Penguin Press, 2007.
[3] D. Antenucci, M. Cafarella, M. C. Levenstein, C. Ré, and M. D. Shapiro. Using social media to measure labor market flows. Working Paper 20010, National Bureau of Economic Research, March 2014.
Related Work
10/17 i need a job 491
12/15 i love you 5092
1/28 justin bieber 940,291
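The rows above pair a date and a topic with a count. A minimal sketch of how such topic-signal pairs might be built from raw tweets follows. It assumes a topic is a word n-gram and a signal is a per-day count of tweets containing it; the poster does not state this, and the function names, the sample tweets, and the year in each date are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def build_topic_signals(tweets, max_n=4):
    """Map each n-gram topic to a {day: tweet count} signal.

    `tweets` is an iterable of (day, text) pairs. Treating topics as
    word n-grams is an assumption made for this sketch.
    """
    signals = defaultdict(Counter)
    for day, text in tweets:
        tokens = text.lower().split()
        topics_in_tweet = set()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                topics_in_tweet.add(" ".join(tokens[i:i + n]))
        # Count each topic at most once per tweet.
        for topic in topics_in_tweet:
            signals[topic][day] += 1
    return signals


def filter_by_threshold(signals, min_total):
    """Keep topics whose total count reaches `min_total` (hypothetical rule)."""
    return {t: s for t, s in signals.items() if sum(s.values()) >= min_total}


# Hypothetical sample data; the years are made up for illustration.
tweets = [
    (date(2012, 10, 17), "i need a job"),
    (date(2012, 10, 17), "I need a job so badly"),
    (date(2012, 12, 15), "i love you"),
]
signals = build_topic_signals(tweets)
frequent = filter_by_threshold(signals, min_total=2)
```

Here `frequent` retains "i need a job" (seen in two tweets) and drops "i love you" (seen once), which mirrors the threshold filtering mentioned for the prototype only in spirit; the actual threshold rule is not given in the poster.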
[Figure: nowcasting with ground truth data. Labels: topic signals for "I need a job"; "Ground Truth Data"; unemployment by month (Apr, May, Jun).]
[Figure: example user query. Keywords: job, unemployment. Drawn signal: relative change (0 to 1) over 2012-2015.]
Users Given Trade-Off Between Query Components
[Figure: example nowcasting result. Resulting target estimate plotted as relative change (0 to 1) over 2012-2015; correlation with query signal: 0.897.]
Topics Used:
• job new year
• job interview tomorrow
• first job interview today
• working two jobs
• job interview
• #hiring #jobs
• ...
Topic-Signal Pairs Nowcasting Result
Like a search engine, users iteratively submit queries:
[Figure: user interaction loop. The Raccoon user sends a User Query (q, r) to the Raccoon Nowcasting System, which returns a Nowcasting Result (T, s, β). Example query: keywords "job, unemployment" with a drawn signal over 2012-2015. Example result: target estimate as relative change (0 to 1), correlation with query signal 0.897, and the list of topics used.]
User Interaction Loop
Unemployment-related keywords
Seasonal trends of unemployment (drawn by user)
Yet, economists [3] still want estimates for targets lacking ground truth data:
Solution: Substitute Domain Knowledge for Ground Truth
Our target users have some semantic and signal domain knowledge about their target phenomena:
Semantic Knowledge (keywords, etc.)
Signal Knowledge (seasonal trends, etc.)
[Figure: the user's domain knowledge takes the place of "Ground Truth Data". Labels: topic signals for "I need a job"; User's Domain Knowledge; Nowcasting Result (unemployment by month: Apr, May, Jun).]
• 10-day auto sales (once used, but no longer available [2])
• Physical movement (e.g., out of parents' house, for job)
• etc.
Topic-Signal Pairs
Prefer Signal Query Component
Users can explore a set of query results in real-time, produced by varying the relative influence of the semantic and signal query components:
Prefer Semantic Query Component
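One way to picture the trade-off described above is a single weight that blends a semantic score with a signal score. The sketch below is illustrative only: the poster does not specify either scorer, so keyword overlap (Jaccard) and Pearson correlation are stand-ins, and `alpha`, the function names, and the sample data are all hypothetical.

```python
import numpy as np


def semantic_score(keywords, topic):
    """Jaccard overlap between query keywords and topic words (stand-in scorer)."""
    q, t = set(keywords), set(topic.split())
    return len(q & t) / len(q | t) if (q | t) else 0.0


def signal_score(query_signal, topic_signal):
    """Pearson correlation between the drawn signal and a topic's signal."""
    r = np.corrcoef(query_signal, topic_signal)[0, 1]
    return 0.0 if np.isnan(r) else float(r)


def rank_topics(keywords, query_signal, topic_signals, alpha, k=1):
    """Rank topics by alpha * semantic + (1 - alpha) * signal.

    alpha = 1.0 prefers the semantic component; alpha = 0.0 prefers the signal.
    """
    scored = []
    for topic, sig in topic_signals.items():
        score = (alpha * semantic_score(keywords, topic)
                 + (1 - alpha) * signal_score(query_signal, sig))
        scored.append((score, topic))
    scored.sort(reverse=True)
    return [topic for _, topic in scored[:k]]


# Hypothetical query and candidate topics.
keywords = ["job", "unemployment"]
query_signal = np.array([1.0, 0.5, 0.2, 0.6])
topic_signals = {
    "job fair": np.array([4.0, 5.0, 5.0, 4.0]),        # shares a keyword, poor signal match
    "laid off today": np.array([9.0, 5.0, 2.0, 6.0]),  # no shared keyword, close signal match
}

prefer_semantic = rank_topics(keywords, query_signal, topic_signals, alpha=1.0)
prefer_signal = rank_topics(keywords, query_signal, topic_signals, alpha=0.0)
```

Sweeping `alpha` between 0 and 1 yields a family of rankings, which is one plausible reading of how a set of results could be produced for the user to explore.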
[Figure: scoring, ranking, and aggregation pipeline. Components: User Query; Topic-Signal DB; Signal Scorer; Semantic Scorer; Trade-off Generator (ranked topic lists: Topic 1, Topic 2, …); Feature Reduction via PCA; Final Aggregation via Linear Regression.]
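The last two stages named above, feature reduction via PCA followed by linear regression, can be sketched with NumPy alone. This is a rough illustration under stated assumptions: the poster does not say what the regression is fit against, so the sketch assumes the target is the user's drawn signal, and the function name `aggregate`, the component count, and the sample data are hypothetical.

```python
import numpy as np


def aggregate(topic_matrix, query_signal, n_components=2):
    """Reduce topic signals with PCA, then fit a linear regression.

    topic_matrix: array of shape (time steps, topics).
    query_signal: array of shape (time steps,), assumed regression target.
    Returns (estimate, beta): the fitted series and the regression weights
    (intercept first, then one weight per principal component).
    """
    # PCA via SVD of the column-centered matrix.
    centered = topic_matrix - topic_matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = centered @ vt[:n_components].T

    # Ordinary least squares with an intercept column.
    design = np.column_stack([np.ones(len(components)), components])
    beta, *_ = np.linalg.lstsq(design, query_signal, rcond=None)
    return design @ beta, beta


# Hypothetical data: six time steps, three selected topics.
query_signal = np.array([1.0, 0.5, 0.2, 0.6, 0.9, 0.4])
topic_matrix = np.column_stack([
    10 * query_signal,                        # topic that tracks the query signal
    4 * query_signal + 1,                     # same shape, different scale and offset
    np.array([3.0, 1.0, 2.0, 3.0, 1.0, 2.0]),  # unrelated topic
])

estimate, beta = aggregate(topic_matrix, query_signal)
correlation = float(np.corrcoef(estimate, query_signal)[0, 1])
```

Reducing the topic signals first keeps the regression small when many selected topics are highly correlated with each other, which is the usual reason to place PCA before a least-squares fit.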
Scoring and Ranking Aggregation