Forecasting Country Stability in North Africa

V.S. Subrahmanian (UMD + Sentimetrix) [email protected]

Joint work with

Steven Banaszak (Sentimetrix)

Liz Bowman (US Army)

John Dickerson (CMU + Sentimetrix)

Sentimetrix - JISIC, The Hague, Sep 2014

Research Goal

Can we accurately predict stability-related events in North Africa (Egypt, Morocco, Sudan) by mining sentiment about key players in open-source data, even with a relative paucity of longitudinal data (36 time points)?

Research Goal

• Provide a single dashboard that will enable an analyst to:

– See the international stability situation at a glance

– Focus on countries of interest to him

– Look at forecasts in countries of interest to him

– Understand the rationale for the forecasts

– Understand the “why” around those forecasts

– Understand relationship between sentiment on different entities and stability events in countries

• This study focuses on Egypt, Morocco, and Sudan.

SentiBility Architecture

Dependent Variables

• The current SentiBility (SB) system has 5 DVs:

– Battles (did the government engage in battles?)

– WonBattles (did the government win territory?)

– LostBattles (did the government lose battles?)

– Riots/Protests (were there riots and/or protests?)

– Violence Against Civilians

• These are the phenomena we are trying to predict.

• Historical data on the DVs were collected from the ACLED data set (University of Sussex) for 36 months.
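The DVs above are phrased as yes/no questions over 36 monthly time points, so one plausible encoding is a binary label per month and event type. The following is a minimal sketch under that assumption; the record layout, the DV key names, and the sample events are hypothetical and do not reflect the actual ACLED schema or the SentiBility code.

```python
from collections import defaultdict

# Illustrative keys for the five DVs named on the slide (naming is an assumption).
DVS = ["Battles", "WonBattles", "LostBattles",
       "RiotsProtests", "ViolenceAgainstCivilians"]

def monthly_dv_labels(events, months):
    """Return {month: {dv: 0 or 1}}; 1 means at least one such event that month."""
    seen = defaultdict(set)
    for month, dv in events:
        seen[month].add(dv)
    return {m: {dv: int(dv in seen[m]) for dv in DVS} for m in months}

# Hypothetical (month, event type) records, not real data.
events = [
    ("2011-01", "RiotsProtests"),
    ("2011-01", "Battles"),
    ("2011-03", "ViolenceAgainstCivilians"),
]
months = ["2011-01", "2011-02", "2011-03"]
labels = monthly_dv_labels(events, months)
```

Months with no recorded events get a 0 for every DV, which keeps the label series aligned with the full list of time points.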

Independent Variables: Data

• Political Entity Dataset (PED): set of key political leaders, parties, opposition groups, for each country.

• Hybrid Article Dataset (HAD): For each entity in PED, identified a set of articles (blog posts, news, tweets, forums) that reference that entity.

• Open-Source Sentiment DB: Assigns a score in [-1,+1] to each article-entity pair, specifying the sentiment expressed toward the entity in the article.

– -1 denotes a maximally negative score

– +1 denotes a maximally positive score

– 0 is completely neutral

• IV data were collected for 2-3 years, varying by country, within the 2008-2011 time frame.
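To connect article-level scores to monthly time points, the scores have to be aggregated in some way. The slides do not say how this was done; the sketch below assumes a simple mean per (month, entity) and a neutral fill value for entities with no articles in a month. All names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def monthly_entity_sentiment(scored_pairs):
    """Average article-level scores into one value per (month, entity)."""
    buckets = defaultdict(list)
    for month, entity, score in scored_pairs:
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score {score} outside [-1, +1]")
        buckets[(month, entity)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}

def feature_row(monthly, month, entities, missing=0.0):
    """Build one IV vector for a month; unseen entities get the neutral score."""
    return [monthly.get((month, e), missing) for e in entities]

# Hypothetical article-entity scores as (month, entity, score) triples.
pairs = [
    ("2011-01", "Entity A", 0.5),
    ("2011-01", "Entity A", -0.25),
    ("2011-01", "Entity B", -1.0),
    ("2011-02", "Entity A", 0.75),
]
monthly = monthly_entity_sentiment(pairs)
row = feature_row(monthly, "2011-02", ["Entity A", "Entity B"])
```

Filling missing months with 0 treats "no coverage" as "neutral coverage", which is a modeling choice rather than something the slides specify.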

Sentiment Scoring

Used Sentimetrix’s Sentiment Scoring Engine, which builds on our prior work:

• Subrahmanian, V. S., et al. "System and method for analysis of an opinion expressed in documents with regard to a particular topic." US Patent US 8296168 B2, priority date Sep 13, 2006.

• Subrahmanian, V. S., and Diego Reforgiato. "AVA: Adjective-verb-adverb combinations for sentiment analysis." IEEE Intelligent Systems, 23.4 (2008): 43-50.

• Cesarano, Carmine, et al. "Opinion Analysis in Document Databases." AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs. 2006.

Learning Classifiers

• Training/validation set was an approximately 70/30 split.

• Applied an ensemble of 5 classifier families:

– Gaussian NB

– Support Vector Machines

– Random forests

– AdaBoost

– GradientBoost

• Each classifier family was optimized using leave-one-out cross validation via a hyper-parameter grid search to find best parameters.

• Different classifier families were best for different predictions (country-event).
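The tuning procedure described above (leave-one-out cross validation inside a hyper-parameter grid search) can be sketched in plain Python. To stay dependency-free, this sketch swaps in a tiny k-nearest-neighbour classifier, which is not one of the five families on the slide, and tunes only k; the data are hypothetical. In practice a library such as scikit-learn offers the same pattern through `GridSearchCV` combined with `LeaveOneOut`, though the slides do not state which tooling was used.

```python
def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = [train_y[i] for i in order[:k]]
    return int(sum(votes) * 2 > len(votes))

def loo_accuracy(X, y, k):
    """Hold out each point once, fit on the rest, and score the prediction."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

def grid_search(X, y, grid):
    """Return (best_k, best_score); ties go to the earlier grid value."""
    scores = {k: loo_accuracy(X, y, k) for k in grid}
    best = max(grid, key=lambda k: scores[k])
    return best, scores[best]

# Hypothetical one-feature data: monthly sentiment vs. a binary event label.
X = [[-0.8], [-0.6], [-0.5], [-0.1], [0.2], [0.4], [0.7], [0.9]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
best_k, best_score = grid_search(X, y, grid=[1, 3, 5])
```

Leave-one-out is a natural fit when only a few dozen time points are available, since every fold trains on all but one observation. Repeating the search per classifier family and per country-event pair would match the slide's remark that different families won for different predictions.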

Egypt Case Study

• PED contained 55 Egyptian entities.

Egypt Case Study

• High sentiment on Adel Labib inversely correlated with violence against civilians.

• High sentiment on Ahmed Ghanem inversely correlated with riots/protests.

• When sentiment on Morsi, El-Baradei, and Tantawi was high for all three, there were few battles.
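An inverse relationship of this kind can be checked by correlating an entity's monthly sentiment with the monthly event indicator. The slides do not say which measure was used; the sketch below assumes a plain Pearson coefficient, and the two series are hypothetical values rather than study data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

# Hypothetical monthly series: sentiment on one entity, and whether events occurred.
sentiment = [0.6, 0.4, -0.2, -0.5, 0.3, -0.7]
events = [0, 0, 1, 1, 0, 1]
r = pearson(sentiment, events)  # negative when high sentiment goes with fewer events
```

A negative coefficient is consistent with an inverse correlation, but with only a few dozen monthly points it should be read as suggestive rather than conclusive.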

Egypt Case Study: Multivariate Forecast Accuracy

Dependent Variable            Forecast Accuracy
Battle                        72%
Violence against Civilians    90%
Riots and Protests            90%

Sudan Case Study

• PED contained 88 Sudanese entities.

Sudan Case Study: Multivariate Forecast Accuracy

Dependent Variable    Forecast Accuracy
Won-Battle            69%
Riots and Protests    88%

System Screenshots

Contact Information

V.S. Subrahmanian

Founder

Sentimetrix, Inc.

[email protected]

@vssubrah

www.sentimetrix.com
