OPEN SOURCE INDICATORS (OSI) Intelligence ARPA
Jason Matheny
INTELLIGENCE ADVANCED RESEARCH PROJECTS ACTIVITY (IARPA)
Program Goal
• Develop and test methods for continuous, automated analysis of publicly available data in order to anticipate and/or detect significant societal events:
– Civil unrest, political elections, economic crises, disease outbreaks
• “Beat the news” by fusing early indicators of events from diverse data.
Approach
• Identify changes in population-level behavior that precede events of interest
• Analyze publicly available data indicative of those changes:
– Blogs, microblogs, news, internet traffic, web search queries, public webcams, financial markets, Wikipedia edits, online reservation systems, others
– With big data volume and variety, train models to detect patterns that have historically preceded societal events
• Evaluate teams on the accuracy and timeliness of forecasts they deliver about real-world events in Latin America.
Event Typology
01 – Civil Unrest
• 011 – Employment & Wages
– 0111 – Non-violent Civil Unrest
– 0112 – Violent Civil Unrest
• 012 – Housing
– 0121 – Non-violent Civil Unrest
– 0122 – Violent Civil Unrest
• 013 – Energy & Resources
– 0131 – Non-violent Civil Unrest
– 0132 – Violent Civil Unrest
• 014 – Other Economic Policies
– 0141 – Non-violent Civil Unrest
– 0142 – Violent Civil Unrest
• 015 – Other Government Policies
– 0151 – Non-violent Civil Unrest
– 0152 – Violent Civil Unrest
• 016 – Other
– 0161 – Non-violent Civil Unrest
– 0162 – Violent Civil Unrest
• 017 – Unspecified
– 0171 – Non-violent Civil Unrest
– 0172 – Violent Civil Unrest
02 – Vote
• 021 – Election
– 0211 – President/Prime Minister
– 0212 – Governor
– 0213 – Mayor
• 022 – Referendum
– 0221 – “Yes”
– 0222 – “No”
03 – Infectious Human Illness
• 031 – Rare Diseases
– 0311 – Bolivian Hemorrhagic Fever (Machupo)
– 0312 – Cholera
– 0313 – Hantavirus
– 0314 – Yellow Fever
• 032 – Pandemic
• 033 – Influenza Like Illness (ILI)
04 – Economy
• 041 – Stock Index
– 0411 – Stock Index Increases
– 0412 – Stock Index Decreases
• 042 – Currency Exchange
– 0421 – Currency Exchange Increases
– 0422 – Currency Exchange Decreases
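The numbering in the typology suggests that each code extends its parent's code by one digit (01, then 011, then 0111). The slides do not say how codes are processed, so the following is only a minimal sketch under that assumption. The `TYPOLOGY` table is a small hand-copied subset of the labels, and `expand_code` is a hypothetical helper name.

```python
# Hypothetical subset of the typology; labels are copied from the slide.
TYPOLOGY = {
    "01": "Civil Unrest",
    "011": "Employment & Wages",
    "0111": "Non-violent Civil Unrest",
    "0112": "Violent Civil Unrest",
    "02": "Vote",
    "021": "Election",
    "0213": "Mayor",
    "03": "Infectious Human Illness",
    "031": "Rare Diseases",
    "0312": "Cholera",
}


def expand_code(code: str) -> list[str]:
    """Return the labels from the top-level category down to `code`.

    Assumes (not stated in the slides) that every prefix of length >= 2
    is itself a code one level up in the hierarchy.
    """
    if code not in TYPOLOGY:
        raise KeyError(f"unknown event code: {code}")
    return [TYPOLOGY[code[:n]] for n in range(2, len(code) + 1)]


path = expand_code("0112")
print(" > ".join(path))
```

Under this reading, a forecast tagged 0112 can be rolled up to 011 or 01 when scoring at a coarser level.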
Audit Trail
Results
• Flu incidence
– 14-day lead time, 60% accuracy
• Rare diseases
– 6-day lead time, 75% accuracy, 80% recall, 40% precision
• Civil unrest
– 7-day lead time, 75% accuracy, 85% recall, 70% precision
• Geolocation
– ~90% of daily Twitter volume geolocated to the city level
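The slides report lead time, recall, and precision but do not describe the scoring procedure. As a rough illustration only, the sketch below uses the standard definitions: precision is the share of warnings that match a real event, recall is the share of real events that were warned about, and lead time is the event date minus the warning date. The data, the `score` function, and the matching of warnings to events by a shared identifier are all hypothetical. A real evaluation would need to match on date, location, and event type.

```python
from datetime import date

# Hypothetical warnings: (event id, date the warning was issued).
warnings = [
    ("protest-A", date(2013, 3, 1)),
    ("protest-B", date(2013, 3, 4)),
    ("protest-X", date(2013, 3, 5)),  # never happened: a false alarm
]
# Hypothetical ground truth: event id -> date the event was reported.
events = {
    "protest-A": date(2013, 3, 8),
    "protest-B": date(2013, 3, 10),
    "protest-C": date(2013, 3, 12),  # happened but was never warned about
}


def score(warnings, events):
    """Return (precision, recall, mean lead time in days)."""
    matched = [(eid, d) for eid, d in warnings if eid in events]
    precision = len(matched) / len(warnings) if warnings else 0.0
    recall = len({eid for eid, _ in matched}) / len(events) if events else 0.0
    leads = [(events[eid] - d).days for eid, d in matched]
    mean_lead = sum(leads) / len(leads) if leads else 0.0
    return precision, recall, mean_lead


precision, recall, mean_lead = score(warnings, events)
print(f"precision={precision:.2f} recall={recall:.2f} lead={mean_lead:.1f} days")
```

A system that issues many speculative warnings can raise recall at the cost of precision, so the two figures are best read together.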
Foresight and Understanding from Scientific Exposition (FUSE)
Goal: Validated, early detection of technical emergence
• Reduce “technical surprise” via reliable, early detection of emerging scientific and technical capabilities across disciplines and languages found within the full-text content of scientific, technical, and patent literature
• Special focus from the outset on multiple languages; Phase 2 focuses on English and Chinese
Novelty: Discover patterns of emergence and connections between technical concepts at a speed, scale, and comprehensiveness that exceed human capacity
Usage: Alert analysts to emerging technical areas with sufficient explanatory evidence to support further exploration
Scientific and Patent Literature
Growth estimated at ~35k unique docs/month for FUSE; worldwide ~800k docs/month
FUSE: Example Nominations
• Three-year forecast for the most prominent filers in the Chinese Patent Office
• Three-year forecast for the most prominent patent term in the Chinese Patent Office
• Two-year forecast for the most prominent terms in English-language scientific papers (Web of Science)
Storyboard Example
[Figure: number of granted US patents by five-year period (1981-1985 through 2006-2010), plotted as Total, Company, Academic/Govt/Non-Profit, Individual, and Unclassified]
FUSE Research Thrusts
• Document Features: Patents, S&T Lit
• Evidence Representation
• Indicator Development: Leading indicators
• System Engineering
• Nomination Quality: Forecast formulation; prominence of surrogate entities of emergence
• Theory & Hypothesis Development: Supports indicator development and explanation; a robust theory is unlikely
Forecasting Science & Technology (ForeST)
ForeST Program Goal
• Develop and test methods for generating accurate and timely forecasts for significant S&T milestones, by efficiently aggregating the judgments of many experts
• Technical approach:
– Generate S&T forecasting questions from indicators within the scientific and patent literatures
– Elicit and aggregate forecasts from ~10,000 scientists and engineers, globally, using a combinatorial prediction market (SciCast.org)
• The ForeST Program directly leverages the programmatic and technical achievements of IARPA’s ACE and FUSE Programs
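The slides do not specify how the prediction market prices forecasts. One widely used automated market maker is the logarithmic market scoring rule (LMSR), sketched below for a single yes/no question as an illustration of how trades can aggregate individual judgments into one probability. This is a flat, single-question market, not the combinatorial market named above. The liquidity parameter `b`, the function names, and the trade size are hypothetical.

```python
import math


def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b))."""
    m = max(q / b for q in quantities)  # subtract the max for numerical stability
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))


def lmsr_prices(quantities, b):
    """Current prices, interpretable as the market's outcome probabilities."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def trade_cost(quantities, outcome, shares, b):
    """Amount a trader pays to buy `shares` of `outcome` at the current state."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)


b = 100.0        # liquidity parameter (hypothetical value)
q = [0.0, 0.0]   # shares outstanding for ["yes", "no"]
before = lmsr_prices(q, b)
cost = trade_cost(q, 0, 50.0, b)   # a trader who believes "yes" buys 50 shares
q[0] += 50.0
after = lmsr_prices(q, b)
print(before, round(cost, 2), [round(p, 3) for p in after])
```

Each purchase moves the price toward the buyer's belief, and a larger `b` makes the price less sensitive to any single trade.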
Evaluation
• Develop and test methods in parallel. As with all IARPA programs, the program is evaluated every six months.
• Run the world’s largest S&T forecasting tournament.
• Evaluate using real events (1,000 per year). E.g., “Will $100 whole human genome sequencing costs be achieved in 2015?”
• Evaluate against best known approaches: econometric models, trend extrapolation, deliberating experts, flat prediction markets.
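The slides do not name the scoring rule used to compare methods. A common choice for probabilistic forecasts of yes/no questions is the Brier score (mean squared error against the 0/1 outcome, lower is better), used here purely as an assumption. The method names echo the baselines listed above, and all probabilities and outcomes are made up for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


outcomes = [1, 0, 1, 0]  # hypothetical question resolutions (1 = happened)
methods = {
    "trend extrapolation": [0.6, 0.5, 0.5, 0.4],  # hypothetical forecasts
    "prediction market": [0.8, 0.2, 0.7, 0.1],    # hypothetical forecasts
}

scores = {name: brier_score(p, outcomes) for name, p in methods.items()}
best = min(scores, key=scores.get)
for name, s in scores.items():
    print(f"{name}: {s:.3f}")
```

Always answering 0.5 yields a score of 0.25, which gives a simple reference point for any method.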