Kalev Leetaru, Eric Shook, and Shaowen Wang


A CyberGIS Approach to Digital Humanities and Social Sciences: The World of Textual Geography and a Case Study of Wikipedia’s History of the World. Kalev Leetaru, Eric Shook, and Shaowen Wang. CyberInfrastructure and Geospatial Information Laboratory (CIGI)


Kalev Leetaru, Eric Shook, and Shaowen Wang

CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Department of Geography and Geographic Information Science

School of Earth, Society, and Environment

National Center for Supercomputing Applications (NCSA)

University of Illinois at Urbana-Champaign

CyberGIS ’12, Urbana IL, August 8, 2012

A CyberGIS Approach to Digital Humanities and Social Sciences: The World of Textual Geography and a Case Study of Wikipedia’s History of the World


http://www.sgi.com/go/wikipedia


Workflow

[Diagram labels: Fulltext Geocoding; Sentiment Mining; CyberGIS; Emotional Heatmap]

Inside the CyberGIS “black box”

[Diagram labels: GISolve Middleware; Security; Domain Decomposition; Data & Viz; Resource Selection; Task Scheduling; Workflow Management Services; Open Service API; CI; XSEDE; OSG; Clouds]

Data Input for a Topic

A set of locations (latitude, longitude points), each with 3 attributes:

1. Number of articles mentioning this location
2. Number of articles mentioning both this location and the topic
3. Average tone of articles mentioning both this location and the topic
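One way to picture a single input record is as a small typed structure. The sketch below is illustrative only: the class name `TopicLocation`, its field names, the `validate` helper, and the sample values are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicLocation:
    """One input record: a point location plus its three attributes (hypothetical names)."""
    lat: float             # latitude of the point location
    lon: float             # longitude of the point location
    n_articles: int        # articles mentioning this location
    n_topic_articles: int  # articles mentioning both this location and the topic
    avg_tone: float        # average tone of the articles counted in n_topic_articles


def validate(rec: TopicLocation) -> None:
    """Raise ValueError when a record is internally inconsistent."""
    if not (-90.0 <= rec.lat <= 90.0 and -180.0 <= rec.lon <= 180.0):
        raise ValueError("coordinates out of range")
    if not (0 <= rec.n_topic_articles <= rec.n_articles):
        raise ValueError("topic count must lie between 0 and the article count")


# Made-up sample values, for illustration only.
example = TopicLocation(lat=33.3, lon=44.4, n_articles=120,
                        n_topic_articles=45, avg_tone=-3.2)
validate(example)
```

The consistency check reflects only what the attribute definitions imply: articles mentioning both the location and the topic are a subset of articles mentioning the location.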


Spatializing Emotion

3 important elements

1. Importance of location
2. Prevalence of topic
3. Emotion toward topic

Goal: Capture 3 elements on a single map

1) Importance of Location

Every mention of a location increases its importance

Generate a density map of the number of times a location is mentioned in text using Kernel Density Estimation (KDE) based on k-nearest-neighbor search
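The slides do not say how the k-nearest-neighbor search enters the KDE. One common reading is an adaptive bandwidth, where each point's kernel width is the distance to its k-th nearest neighbor. The sketch below assumes that reading, a Gaussian kernel, planar distances on the raw coordinates, and a brute-force neighbor search; the function names and sample data are hypothetical.

```python
import math


def knn_bandwidth(points, i, k):
    """Distance from points[i] to its k-th nearest other point (brute force).

    Requires len(points) > k.
    """
    xi, yi = points[i]
    dists = sorted(math.hypot(xi - x, yi - y)
                   for j, (x, y) in enumerate(points) if j != i)
    return dists[k - 1]


def adaptive_kde(points, weights, query, k=2, min_bw=1e-6):
    """Weighted Gaussian KDE at `query`, with one bandwidth per data point.

    `weights` would be the number of mentions at each location, so every
    mention adds to the density around that location.
    """
    qx, qy = query
    total = 0.0
    for i, ((x, y), w) in enumerate(zip(points, weights)):
        h = max(knn_bandwidth(points, i, k), min_bw)  # avoid a zero bandwidth
        d2 = (qx - x) ** 2 + (qy - y) ** 2
        total += w * math.exp(-d2 / (2 * h * h)) / (2 * math.pi * h * h)
    return total / sum(weights)


# Made-up data: three clustered, frequently mentioned points and one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
mentions = [10, 5, 5, 1]
near_cluster = adaptive_kde(points, mentions, (0.2, 0.2))
between = adaptive_kde(points, mentions, (3.0, 3.0))
```

With this choice, dense clusters get narrow kernels and isolated points get wide ones. A production version would use a spatial index for the neighbor search and great-circle distances rather than raw coordinate differences.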


2) Prevalence of Topic

We use the term topic intensity for the prevalence of a topic relative to other topics, and estimate it with a method commonly used in epidemiological studies

Relative risk is the ratio of the KDE of disease infection locations to the KDE of case control locations

Topic Intensity

Topic Intensity = KDE(articles that mention a topic) / KDE(articles that do not mention the topic)

Relative Risk = KDE(points with disease) / KDE(points without disease)
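The ratio can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a fixed-bandwidth Gaussian kernel, un-normalized densities (so the result reads as local odds of the topic), and a small `eps` to guard against division by zero. All names and sample counts are hypothetical.

```python
import math


def gaussian_kde(points, weights, query, bandwidth):
    """Un-normalized weighted Gaussian kernel sum at `query`."""
    qx, qy = query
    h2 = bandwidth * bandwidth
    total = sum(w * math.exp(-((qx - x) ** 2 + (qy - y) ** 2) / (2 * h2))
                for (x, y), w in zip(points, weights))
    return total / (2 * math.pi * h2)


def topic_intensity(points, n_articles, n_topic_articles, query,
                    bandwidth=1.0, eps=1e-12):
    """KDE(articles with topic) / KDE(articles without topic) at `query`."""
    without_topic = [a - t for a, t in zip(n_articles, n_topic_articles)]
    numerator = gaussian_kde(points, n_topic_articles, query, bandwidth)
    denominator = gaussian_kde(points, without_topic, query, bandwidth)
    return numerator / (denominator + eps)


# Made-up counts: the topic dominates at the first location only.
points = [(0.0, 0.0), (10.0, 10.0)]
n_articles = [100, 100]
n_topic_articles = [80, 10]
at_first = topic_intensity(points, n_articles, n_topic_articles, (0.0, 0.0))
at_second = topic_intensity(points, n_articles, n_topic_articles, (10.0, 10.0))
```

In this example the intensity is about 4 at the first location (80 topic articles against 20 others) and about 0.11 at the second. Any rescaling to a fixed value range would be a separate step.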

3) Emotion Toward a Topic

Challenging question: Is the emotional measure, tone, discrete or continuous?
– Is tone "countable" like trees or does it exist as a continuum like air temperature?

Tone is a continuum:
– Cannot have "number of tones"

A different method is used because tone is continuous, not discrete

Inverse distance weighted (IDW) interpolation is used to estimate tone across space creating a tone map

Tone map captures positive and negative tone toward a particular topic across space
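Inverse distance weighting estimates the value at an unsampled point as a weighted average of the known values, with weights that fall off with distance. A minimal sketch follows, assuming planar distances and a power of 2 (the slides give neither the power nor any search radius); the function name and sample values are hypothetical.

```python
import math


def idw_tone(samples, query, power=2.0):
    """Estimate tone at `query` from ((x, y), tone) samples by IDW."""
    qx, qy = query
    num = 0.0
    den = 0.0
    for (x, y), tone in samples:
        d = math.hypot(qx - x, qy - y)
        if d == 0.0:
            return tone  # query coincides with a sample: use its value directly
        w = 1.0 / d ** power
        num += w * tone
        den += w
    if den == 0.0:
        raise ValueError("at least one sample is required")
    return num / den


# Made-up samples: negative tone at one location, positive tone at another.
samples = [((0.0, 0.0), -10.0), ((2.0, 0.0), 10.0)]
midpoint = idw_tone(samples, (1.0, 0.0))      # equal distances cancel out
near_first = idw_tone(samples, (0.5, 0.0))    # pulled toward the closer sample
```

Evaluating `idw_tone` at every cell of a grid yields a tone map of the kind described above. Because IDW is an average, the estimate always stays within the range of the sampled tones.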


Overview – 3 layers

1) Article density - Proxy: Importance of location

2) Topic intensity - Proxy: Prevalence of topic relative to other topics

3) Tone - Proxy: Emotion toward a topic


First two layers represent scaling factors for tone

Value range: 0 - 1

Value range: 0 - 100

Value range: -100 - 100

Emotional Heatmap

Emotional Heatmap = Article Density * Topic Intensity * Tone
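The combination is a cell-by-cell product of three rasters on the same grid. A minimal sketch, assuming each layer is a list of rows of equal shape; the function name and the sample grid values are hypothetical.

```python
def emotional_heatmap(density, intensity, tone):
    """Multiply three equally shaped grids (lists of rows) cell by cell."""
    if not (len(density) == len(intensity) == len(tone)):
        raise ValueError("layers must have the same number of rows")
    result = []
    for d_row, i_row, t_row in zip(density, intensity, tone):
        if not (len(d_row) == len(i_row) == len(t_row)):
            raise ValueError("layers must have the same number of columns")
        result.append([d * i * t for d, i, t in zip(d_row, i_row, t_row)])
    return result


# Made-up 1 x 2 grid: density and intensity scale the tone in each cell.
density = [[1.0, 0.5]]
intensity = [[100.0, 50.0]]
tone = [[-10.0, 20.0]]
heatmap = emotional_heatmap(density, intensity, tone)
```

Because the first two layers are non-negative scaling factors, the sign of each heatmap cell comes from the tone layer alone, while its magnitude grows with how often the location is mentioned and how prevalent the topic is there.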

Emotional Heatmap of Armed Conflict in 2003 (Wikipedia)

Summary

First steps, but started the dialogue

Balance
– Managing the complexity of cyberinfrastructure access
– Simplifying the workflow of chaining spatial analytics
– Making sense of what’s involved

Scientific rigor

Ongoing Work

Translate spatial knowledge to domain knowledge by answering a basic question: why is this here and not there?

Tackle spatial aggregation issues
– Represent locations as areas, not points
– Areal interpolation


Acknowledgments

Guofeng Cao, Anand Padmanabhan

National Science Foundation
– BCS-0846655
– OCI-1047916
– Open Science Grid
– XSEDE SES070004N


Thanks!
