OPEN SOURCE INDICATORS (OSI) Intelligence ARPA Jason Matheny



INTELLIGENCE ADVANCED RESEARCH PROJECTS ACTIVITY (IARPA)

Program Goal
• Develop and test methods for continuous, automated analysis of publicly available data in order to anticipate and/or detect significant societal events:
  – Civil unrest, political elections, economic crises, disease outbreaks
• “Beat the news” by fusing early indicators of events from diverse data.


Approach
• Identify changes in population-level behavior that precede events of interest
• Analyze publicly available data indicative of those changes:
  – Blogs, microblogs, news, internet traffic, web search queries, public webcams, financial markets, Wikipedia edits, online reservation systems, others
  – With big data volume and variety, train models to detect patterns that have historically preceded societal events
• Evaluate teams on the accuracy and timeliness of forecasts they deliver about real-world events in Latin America.


Event Typology

01 – Civil Unrest
  011 – Employment & Wages
    0111 – Non-violent Civil Unrest
    0112 – Violent Civil Unrest
  012 – Housing
    0121 – Non-violent Civil Unrest
    0122 – Violent Civil Unrest
  013 – Energy & Resources
    0131 – Non-violent Civil Unrest
    0132 – Violent Civil Unrest
  014 – Other Economic Policies
    0141 – Non-violent Civil Unrest
    0142 – Violent Civil Unrest
  015 – Other Government Policies
    0151 – Non-violent Civil Unrest
    0152 – Violent Civil Unrest
  016 – Other
    0161 – Non-violent Civil Unrest
    0162 – Violent Civil Unrest
  017 – Unspecified
    0171 – Non-violent Civil Unrest
    0172 – Violent Civil Unrest

02 – Vote
  021 – Election
    0211 – President/Prime Minister
    0212 – Governor
    0213 – Mayor
  022 – Referendum
    0221 – “Yes”
    0222 – “No”

03 – Infectious Human Illness
  031 – Rare Diseases
    0311 – Bolivian Hemorrhagic Fever (Machupo)
    0312 – Cholera
    0313 – Hantavirus
    0314 – Yellow Fever
  032 – Pandemic
  033 – Influenza-Like Illness (ILI)

04 – Economy
  041 – Stock Index
    0411 – Stock Index Increases
    0412 – Stock Index Decreases
  042 – Currency Exchange
    0421 – Currency Exchange Increases
    0422 – Currency Exchange Decreases
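The typology codes are hierarchical: each extra digit narrows the category, so a four-digit code can be decoded by looking up its two-, three-, and four-digit prefixes. The sketch below illustrates that reading. The table `EVENT_TYPES` is a hand-copied excerpt of the typology, and the function name `decode_event_code` is hypothetical; neither comes from the OSI program itself.

```python
# Hypothetical lookup table: an excerpt of the event typology, keyed by code prefix.
EVENT_TYPES = {
    "01": "Civil Unrest",
    "011": "Employment & Wages",
    "0111": "Non-violent Civil Unrest",
    "0112": "Violent Civil Unrest",
    "02": "Vote",
    "021": "Election",
    "0211": "President/Prime Minister",
    "03": "Infectious Human Illness",
    "031": "Rare Diseases",
    "0312": "Cholera",
    "04": "Economy",
    "041": "Stock Index",
    "0412": "Stock Index Decreases",
}


def decode_event_code(code: str) -> list[str]:
    """Return the labels from the top-level category down to the given code."""
    if not (2 <= len(code) <= 4 and code.isdigit()):
        raise ValueError(f"expected a 2- to 4-digit code, got {code!r}")
    path = []
    # Walk the prefixes of length 2, 3, ... up to the full code.
    for length in range(2, len(code) + 1):
        prefix = code[:length]
        if prefix not in EVENT_TYPES:
            raise KeyError(f"unknown code prefix {prefix!r}")
        path.append(EVENT_TYPES[prefix])
    return path


print(" > ".join(decode_event_code("0112")))
# prints: Civil Unrest > Employment & Wages > Violent Civil Unrest
```

Keying the table by prefix keeps each level addressable on its own, so a forecast can be filed at whatever level of specificity is known (for example `03` alone).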


Audit Trail


Results
• Flu incidence
  – 14-day lead-time, 60% accuracy
• Rare diseases
  – 6-day lead-time, 75% accuracy, 80% recall, 40% precision
• Civil unrest
  – 7-day lead-time, 75% accuracy, 85% recall, 70% precision
• Geolocation
  – ~90% of daily Twitter volume geolocated to the city level
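To make the reported figures concrete, the sketch below computes precision, recall, and mean lead time for a toy set of forecasts. It assumes the standard definitions (precision is the share of forecasts that match a real event, recall is the share of real events that were forecast, lead time is the event date minus the forecast date) and assumes the forecast-to-event matching has already been done. The program's own scoring rules, including how it defines "accuracy", are not given in these slides and may differ. All names and dates are illustrative.

```python
from datetime import date


def score_forecasts(forecasts, events):
    """Score forecasts against ground-truth events.

    forecasts: list of (forecast_id, issued_on, matched_event_id or None)
    events:    dict mapping event_id -> date the event occurred
    Returns (precision, recall, mean_lead_days).
    """
    matched = [(issued, events[eid]) for _, issued, eid in forecasts if eid is not None]
    matched_event_ids = {eid for _, _, eid in forecasts if eid is not None}

    precision = len(matched) / len(forecasts) if forecasts else 0.0
    recall = len(matched_event_ids) / len(events) if events else 0.0

    # Lead time: how many days before the event the forecast was issued.
    lead_days = [(occurred - issued).days for issued, occurred in matched]
    mean_lead = sum(lead_days) / len(lead_days) if lead_days else 0.0
    return precision, recall, mean_lead


# Toy data, invented for illustration only.
events = {
    "e1": date(2013, 3, 10),
    "e2": date(2013, 3, 20),
    "e3": date(2013, 3, 25),
}
forecasts = [
    ("f1", date(2013, 3, 3), "e1"),   # issued 7 days ahead
    ("f2", date(2013, 3, 15), "e2"),  # issued 5 days ahead
    ("f3", date(2013, 3, 16), None),  # no matching event
    ("f4", date(2013, 3, 18), None),  # no matching event
]

precision, recall, mean_lead = score_forecasts(forecasts, events)
print(f"precision={precision:.2f} recall={recall:.2f} mean lead={mean_lead:.1f} days")
# prints: precision=0.50 recall=0.67 mean lead=6.0 days
```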


Foresight and Understanding from Scientific Exposition (FUSE)


Goal: Validated, early detection of technical emergence

Reduce “technical surprise” via reliable, early detection of emerging scientific and technical capabilities across disciplines and languages found within the full-text content of scientific, technical, and patent literature

Special focus from the outset on multiple languages; Phase 2 focus on English and Chinese

Novelty: Discover patterns of emergence and connections between technical concepts at a speed, scale, and comprehensiveness that exceed human capacity

Usage: Alert analysts to emerging technical areas with sufficient explanatory evidence to support further exploration


Scientific and Patent Literature

Growth estimated at ~35k unique docs/month for FUSE; worldwide ~800k docs/month


FUSE: Example Nominations

• Three-year forecast for the most prominent filers in the Chinese Patent Office
• Three-year forecast for the most prominent patent term in the Chinese Patent Office
• Two-year forecast for the most prominent terms in English-language scientific papers (Web of Science)


Storyboard Example


[Chart: number of granted US patents (y-axis, 0 to 3000) by five-year period, 1981-1985 through 2006-2010; series: Total, Company, Academic/Govt/Non-Profit, Individual, Unclassified]

FUSE Research Thrusts
• Document Features: Patents, S&T Lit
• Evidence Representation
• Indicator Development: Leading indicators
• System Engineering
• Nomination Quality: Forecast formulation
• Prominence of surrogate entities of emergence
• Theory & Hypothesis Development: Supports indicator development and explanation; a robust theory is unlikely


Forecasting Science & Technology (ForeST)


ForeST Program Goal
• Develop and test methods for generating accurate and timely forecasts for significant S&T milestones, by efficiently aggregating the judgments of many experts
• Technical approach:
  – Generate S&T forecasting questions from indicators within the scientific and patent literatures
  – Elicit and aggregate forecasts from ~10,000 scientists and engineers, globally, using a combinatorial prediction market (SciCast.org)
• The ForeST Program directly leverages the programmatic and technical achievements of IARPA’s ACE and FUSE Programs
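As a rough illustration of how a prediction market turns many traders' judgments into one probability, the sketch below implements a logarithmic market scoring rule (LMSR) market maker for a single question. This is an assumption made for illustration: the slides do not say which market-maker rule SciCast uses, and this is a flat, one-question market, not the combinatorial market the program describes, which also links related questions. The function names and the liquidity parameter `b` are hypothetical.

```python
import math


def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b)), computed stably."""
    m = max(q / b for q in quantities)
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))


def lmsr_prices(quantities, b):
    """Current prices (the market's implied probabilities); they sum to 1."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def buy(quantities, outcome, shares, b):
    """Buy `shares` of `outcome`; return (cost paid, new quantities)."""
    new_quantities = list(quantities)
    new_quantities[outcome] += shares
    cost = lmsr_cost(new_quantities, b) - lmsr_cost(quantities, b)
    return cost, new_quantities


# A binary question ("yes" = index 0, "no" = index 1) with no trades yet.
b = 100.0  # larger b means prices move less per share traded
state = [0.0, 0.0]
print(lmsr_prices(state, b))  # both outcomes start at probability 0.5

# A trader who believes "yes" is underpriced buys 50 "yes" shares.
cost, state = buy(state, outcome=0, shares=50.0, b=b)
print(round(cost, 2), [round(p, 3) for p in lmsr_prices(state, b)])
```

Each trade moves the price toward the trader's belief, so the current price acts as the aggregate forecast.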


Evaluation

• Develop and test methods in parallel. As with all IARPA programs, the program is evaluated every six months.

• Run the world’s largest S&T forecasting tournament.

• Evaluate using real events (1,000 per year). E.g., “Will $100 whole human genome sequencing costs be achieved in 2015?”

• Evaluate against best known approaches: econometric models, trend extrapolation, deliberating experts, flat prediction markets.
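Comparing a method against baseline approaches on resolved questions needs a common scoring rule. The slides do not name one; the sketch below assumes the Brier score (mean squared error between forecast probability and the 0/1 outcome, lower is better), a common choice for binary questions. The forecast values are invented for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Three resolved yes/no questions (1 = happened, 0 = did not); toy data.
outcomes = [1, 0, 1]
method_forecasts = [0.9, 0.2, 0.7]    # hypothetical method under test
baseline_forecasts = [0.5, 0.5, 0.5]  # uninformative baseline

method = brier_score(method_forecasts, outcomes)
baseline = brier_score(baseline_forecasts, outcomes)
print(f"method={method:.3f} baseline={baseline:.3f}")
# prints: method=0.047 baseline=0.250
```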