
The Next Big [Social Science] Thing

Gary King (GaryKing.org)

Institute for Quantitative Social Science, Harvard University

National Academy of Sciences, 3/4/2016

Big Data is Not About the Data!

• The most visible changes: new forms of big data (telescopes, microscopes, scanners, etc.)

• The most important changes: the methods that make the data actionable.

• Improving scientific inference: using facts we know to learn about facts we don’t know

• The revolution is about the analytics, the methods of inference
    • The methods make old data and new instruments actionable
    • More data alone only makes research harder

• Look for: new methods to unlock secrets in new types of data:
    1. The changing evidence base
    2. Data from outside organizations
    3. Evidence-based public policy
    4. Meta-science
    5. Merging science, technology, and social science

. . . with examples from my research

1. The Evidence Base of Social Science Research

The Last 50 Years:

• Survey research
• Aggregate government statistics
• One-off studies of individual people, places, or events

The Next 50 Years: Additional data sources due to . . .

• Advances in social science statistics
• Shrinking computers & the growing Internet: data everywhere
• The replication movement: data sharing (e.g., Dataverse)
• Governments encouraging data collection & experimentation
• The march of quantification: through academia, the professions, government, & commerce
• Real world impact of social science: transformed most Fortune 500 firms; established new industries; altered friendship networks, political campaigns, public health, legal analysis; impacted crime and policing, economics, even sports; set standards for evaluating public policy; & many others
• Look for: impact data methods insights data. . .

Page 23: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

1. The Evidence Base of Social Science ResearchThe Last 50 Years:

• Survey research• Aggregate government statistics• One-off studies of individual people, places, or events

The Next 50 Years: Additional data sources due to. . .• Advances in social science statistics• Shrinking computers & the growing Internet: data everywhere• The replication movement: data sharing (e.g., Dataverse)• Governments encouraging data collection & experimentation• The march of quantification: through academia, the

professions, government, & commerce• Real world impact of social science:

transformed mostFortune 500 firms; established new industries; alteredfriendship networks, political campaigns, public health, legalanalysis; impacted crime and policing, economics, even sports;set standards for evaluating public policy; & many others

• Look for:

impact data methods insights data. . .

3/18

Page 24: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

1. The Evidence Base of Social Science ResearchThe Last 50 Years:

• Survey research

• Aggregate government statistics• One-off studies of individual people, places, or events

The Next 50 Years: Additional data sources due to. . .• Advances in social science statistics• Shrinking computers & the growing Internet: data everywhere• The replication movement: data sharing (e.g., Dataverse)• Governments encouraging data collection & experimentation• The march of quantification: through academia, the

professions, government, & commerce• Real world impact of social science:

transformed mostFortune 500 firms; established new industries; alteredfriendship networks, political campaigns, public health, legalanalysis; impacted crime and policing, economics, even sports;set standards for evaluating public policy; & many others

• Look for:

impact data methods insights data. . .

3/18

Page 25: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

1. The Evidence Base of Social Science ResearchThe Last 50 Years:

• Survey research• Aggregate government statistics

• One-off studies of individual people, places, or events

The Next 50 Years: Additional data sources due to. . .• Advances in social science statistics• Shrinking computers & the growing Internet: data everywhere• The replication movement: data sharing (e.g., Dataverse)• Governments encouraging data collection & experimentation• The march of quantification: through academia, the

professions, government, & commerce• Real world impact of social science:

transformed mostFortune 500 firms; established new industries; alteredfriendship networks, political campaigns, public health, legalanalysis; impacted crime and policing, economics, even sports;set standards for evaluating public policy; & many others

• Look for:

impact data methods insights data. . .

3/18

Page 26: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

1. The Evidence Base of Social Science ResearchThe Last 50 Years:

• Survey research• Aggregate government statistics• One-off studies of individual people, places, or events

The Next 50 Years: Additional data sources due to. . .• Advances in social science statistics• Shrinking computers & the growing Internet: data everywhere• The replication movement: data sharing (e.g., Dataverse)• Governments encouraging data collection & experimentation• The march of quantification: through academia, the

professions, government, & commerce• Real world impact of social science:

transformed mostFortune 500 firms; established new industries; alteredfriendship networks, political campaigns, public health, legalanalysis; impacted crime and policing, economics, even sports;set standards for evaluating public policy; & many others

• Look for:

impact data methods insights data. . .

3/18

Page 27: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

1. The Evidence Base of Social Science ResearchThe Last 50 Years:

• Survey research• Aggregate government statistics• One-off studies of individual people, places, or events

The Next 50 Years: Additional data sources due to. . .

• Advances in social science statistics• Shrinking computers & the growing Internet: data everywhere• The replication movement: data sharing (e.g., Dataverse)• Governments encouraging data collection & experimentation• The march of quantification: through academia, the

professions, government, & commerce• Real world impact of social science:

transformed mostFortune 500 firms; established new industries; alteredfriendship networks, political campaigns, public health, legalanalysis; impacted crime and policing, economics, even sports;set standards for evaluating public policy; & many others

• Look for:

impact data methods insights data. . .

3/18

Page 28: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

1. The Evidence Base of Social Science ResearchThe Last 50 Years:

• Survey research• Aggregate government statistics• One-off studies of individual people, places, or events

The Next 50 Years: Additional data sources due to. . .• Advances in social science statistics

• Shrinking computers & the growing Internet: data everywhere• The replication movement: data sharing (e.g., Dataverse)• Governments encouraging data collection & experimentation• The march of quantification: through academia, the

professions, government, & commerce• Real world impact of social science:

transformed mostFortune 500 firms; established new industries; alteredfriendship networks, political campaigns, public health, legalanalysis; impacted crime and policing, economics, even sports;set standards for evaluating public policy; & many others

• Look for:

impact data methods insights data. . .

3/18

Page 29: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

1. The Evidence Base of Social Science ResearchThe Last 50 Years:

• Survey research• Aggregate government statistics• One-off studies of individual people, places, or events

The Next 50 Years: Additional data sources due to. . .• Advances in social science statistics• Shrinking computers & the growing Internet: data everywhere

• The replication movement: data sharing (e.g., Dataverse)• Governments encouraging data collection & experimentation• The march of quantification: through academia, the

professions, government, & commerce• Real world impact of social science:

transformed mostFortune 500 firms; established new industries; alteredfriendship networks, political campaigns, public health, legalanalysis; impacted crime and policing, economics, even sports;set standards for evaluating public policy; & many others

• Look for:

impact data methods insights data. . .

3/18

Page 30: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

1. The Evidence Base of Social Science ResearchThe Last 50 Years:

• Survey research• Aggregate government statistics• One-off studies of individual people, places, or events

The Next 50 Years: Additional data sources due to. . .• Advances in social science statistics• Shrinking computers & the growing Internet: data everywhere• The replication movement: data sharing (e.g., Dataverse)

• Governments encouraging data collection & experimentation• The march of quantification: through academia, the

professions, government, & commerce• Real world impact of social science:

transformed mostFortune 500 firms; established new industries; alteredfriendship networks, political campaigns, public health, legalanalysis; impacted crime and policing, economics, even sports;set standards for evaluating public policy; & many others

• Look for:

impact data methods insights data. . .

3/18

Page 31: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

1. The Evidence Base of Social Science ResearchThe Last 50 Years:

• Survey research• Aggregate government statistics• One-off studies of individual people, places, or events

The Next 50 Years: Additional data sources due to. . .• Advances in social science statistics• Shrinking computers & the growing Internet: data everywhere• The replication movement: data sharing (e.g., Dataverse)• Governments encouraging data collection & experimentation

• The march of quantification: through academia, theprofessions, government, & commerce

• Real world impact of social science:

transformed mostFortune 500 firms; established new industries; alteredfriendship networks, political campaigns, public health, legalanalysis; impacted crime and policing, economics, even sports;set standards for evaluating public policy; & many others

• Look for:

impact data methods insights data. . .

3/18

Page 32: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

1. The Evidence Base of Social Science ResearchThe Last 50 Years:

• Survey research• Aggregate government statistics• One-off studies of individual people, places, or events

The Next 50 Years: Additional data sources due to. . .• Advances in social science statistics• Shrinking computers & the growing Internet: data everywhere• The replication movement: data sharing (e.g., Dataverse)• Governments encouraging data collection & experimentation• The march of quantification: through academia, the

professions, government, & commerce

• Real world impact of social science:

transformed mostFortune 500 firms; established new industries; alteredfriendship networks, political campaigns, public health, legalanalysis; impacted crime and policing, economics, even sports;set standards for evaluating public policy; & many others

• Look for:

impact data methods insights data. . .

3/18


2. Corporate, Nonprofit, Government Data

• At one time, almost all research data was inside the university
• Now, most data is outside, in corporations, governments, etc., controlled by our research subjects!
• We arrange access → data becomes more informative → the public becomes more concerned about privacy → firms worry about research hurting business → we obtain new access under new rules → ...
• Disruption will not stop. To upend the software-based tech industry requires only a few people, some creativity, and luck: Facebook, Unix, Gmail, Snapchat, Google, Twitter, etc.
• Look for: unique opportunities from fast-changing research partnerships with groups outside the university

4/18


Page 68: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

2. Corporate, Nonprofit, Government Data

• At one time, almost all research data was inside the university

• Now, most data is outside, in corporations, governments, etc.,controlled by our research subjects!

• We arrange access data becomes more informative thepublic becomes more concerned about privacy firms worryabout research hurting business we obtain new accessunder new rules · · ·

• Disruption will not stop. To upend the software-based techindustry requires only a few people, some creativity, and luck:Facebook, Unix, gmail, snapchat, Google, Twitter, etc.

• Look for:

unique opportunities from fast changing researchpartnerships with groups outside the university

4/18

Page 69: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

2. Corporate, Nonprofit, Government Data

• At one time, almost all research data was inside the university

• Now, most data is outside, in corporations, governments, etc.,controlled by our research subjects!

• We arrange access data becomes more informative thepublic becomes more concerned about privacy firms worryabout research hurting business we obtain new accessunder new rules · · ·

• Disruption will not stop. To upend the software-based techindustry requires only a few people, some creativity, and luck:Facebook, Unix, gmail, snapchat, Google, Twitter, etc.

• Look for: unique opportunities from fast changing researchpartnerships with groups outside the university

4/18

Page 70: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: How to Read a Trillion Social Media Posts & Classify Deaths without Physicians

• Examples of Bad Analytics:
  • Physicians’ “Verbal Autopsy” analysis (from WHO)
  • Sentiment analysis via word counts (social media companies)

• Unrelated substantive problems, same analytics solution:
  • Key to both methods: classifying (deaths, social media posts)
  • Key to both goals: estimating %’s

• Modern Data Analytics: New method led to:
  1.
  2. Worldwide cause-of-death estimates for

5/18
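
The gap between classifying individual items and estimating percentages can be made concrete with a small sketch. This is not the method referred to on the slide: it swaps in a simpler, well-known stand-in (the "adjusted count" correction for a binary classifier), and the function names, the post counts, and the error rates are all hypothetical values chosen for illustration. It suggests why counting a classifier's labels can give a biased percentage even when the classifier is fairly accurate, and how the percentage might be estimated directly once the error rates are known.

```python
def classify_and_count(predicted_labels):
    """Naive estimate: share of items the classifier labels positive (1)."""
    return sum(predicted_labels) / len(predicted_labels)


def adjusted_count(observed_rate, true_positive_rate, false_positive_rate):
    """Correct the naive share using the classifier's (assumed known) error rates.

    Solves observed = tpr * p + fpr * (1 - p) for the true proportion p.
    """
    if true_positive_rate == false_positive_rate:
        raise ValueError("classifier carries no information about the category")
    p = (observed_rate - false_positive_rate) / (
        true_positive_rate - false_positive_rate
    )
    return min(1.0, max(0.0, p))  # clip to a valid proportion


# Hypothetical setup: 1,000 posts, of which 100 (10%) are truly positive.
# Assumed classifier: labels 80% of true positives as positive and
# mislabels 10% of true negatives as positive.
predicted = [1] * 80 + [0] * 20 + [1] * 90 + [0] * 810

naive = classify_and_count(predicted)          # share labeled positive
corrected = adjusted_count(naive, 0.80, 0.10)  # estimate of the true share

print(f"classify-and-count: {naive:.2f}, adjusted: {corrected:.2f}")
```

In this toy setup the naive share overstates the true 10% because false positives from the much larger negative group outnumber the missed positives; the correction targets the percentage itself rather than any individual label.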


3. Evidence-Based Public Policy

• Definition of “government policies”: Giant experiments without control groups

• New movement to allow real experiments in government (e.g., Seguro Popular)

• Most large-scale public policy evaluations fail: New “politically robust” methods

• White House executive orders requiring openness

• Local governments competing to share data

• Rise of systematic observational analyses (e.g., Social Security evaluation)

• Look for: new research (and new methods) from partnerships between researchers and governments

6/18


E.g.: Bias in Social Security Administration Forecasts

• Social Security: single largest government program; lifted a whole generation out of poverty; extremely popular

• Forecasts: used for programs comprising > 50% of US expenditures; e.g., if retirees draw benefits longer than expected, the Trust Fund runs out

• First evaluation of SSA forecasts in 85 years:
  • Methods: little changed; mostly qualitative; a time when we’ve learned more about forecasting than at any time in history
  • Results: unbiased until 2000; systematically biased after
  • Actuaries hunkered down, insulated themselves, refused to budge when Democrats & Republicans pushed hard for changes
  • In the process, they also insulated themselves from the facts: Especially since 2000, Americans started living unexpectedly longer lives (due to statins, early cancer detection, etc.)

• New customized analytics we developed:
  • Logical consistency (e.g., older people have higher mortality)
  • Far more accurate forecasts
  • Trust fund needs > $800 million more than SSA thought
  • Many other applications to different types of forecasts

7/18
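
Two ideas from this slide, forecast bias and logical consistency, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the numbers are made up and are not SSA data, and the checks are generic ones rather than the customized analytics the slide refers to. Bias is measured here as the average of forecast minus outcome, and the consistency check flags adjacent age groups where an older group is forecast to have lower mortality than a younger one.

```python
def mean_forecast_error(forecasts, actuals):
    """Average of (forecast - actual); a value near zero suggests no bias."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return sum(errors) / len(errors)


def violates_age_ordering(mortality_by_age):
    """Return adjacent (younger, older) age pairs where the older group
    has a lower forecast mortality rate than the younger group."""
    ages = sorted(mortality_by_age)
    return [
        (young, old)
        for young, old in zip(ages, ages[1:])
        if mortality_by_age[old] < mortality_by_age[young]
    ]


# Made-up life-expectancy forecasts and outcomes (in years), NOT SSA data.
early_forecast, early_actual = [76.0, 76.4, 76.9], [76.1, 76.3, 76.9]
late_forecast, late_actual = [77.2, 77.5, 77.8], [77.6, 78.0, 78.4]

early_bias = mean_forecast_error(early_forecast, early_actual)
late_bias = mean_forecast_error(late_forecast, late_actual)

# Made-up forecast mortality rates by age; the 75 group breaks the ordering.
forecast_rates = {65: 0.015, 70: 0.024, 75: 0.021, 80: 0.060}
problems = violates_age_ordering(forecast_rates)

print(f"early bias: {early_bias:+.2f}, late bias: {late_bias:+.2f}")
print("age-ordering violations:", problems)
```

In the toy numbers, the early errors cancel out while the later forecasts fall consistently below the outcomes, which is the pattern a systematic underestimate of longevity would produce.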

Page 92: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Bias in Social Security Administration Forecasts• Social Security: single largest government program; lifted a

whole generation out of poverty; extremely popular

• Forecasts: used for programs comprising > 50% of the USexpenditures;

e.g., if retirees draw benefits longer thanexpected, the Trust Fund runs out

• First evaluation of SSA forecasts in 85 years:

• Methods:

little changed; mostly qualitative; a time when we’velearned more about forecasting than at any time in history

• Results:

unbiased until 2000; systematically biased after

• Actuaries hunkered down, insulated themselves, refused tobudge when Democrats & Republicans pushed hard for changes

• In the process, they also insulated themselves from the facts:

Especially since 2000, Americans started living unexpectedlylonger lives (due to statins, early cancer detection, etc.)

• New customized analytics we developed:

• Logical consistency (e.g., older people have higher mortality)• Far more accurate forecasts• Trust fund needs > $800 million more than SSA thought• Many other applications to different types of forecasts

7/18

Page 93: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Bias in Social Security Administration Forecasts• Social Security: single largest government program; lifted a

whole generation out of poverty; extremely popular• Forecasts: used for programs comprising > 50% of the US

expenditures;

e.g., if retirees draw benefits longer thanexpected, the Trust Fund runs out

• First evaluation of SSA forecasts in 85 years:

• Methods:

little changed; mostly qualitative; a time when we’velearned more about forecasting than at any time in history

• Results:

unbiased until 2000; systematically biased after

• Actuaries hunkered down, insulated themselves, refused tobudge when Democrats & Republicans pushed hard for changes

• In the process, they also insulated themselves from the facts:

Especially since 2000, Americans started living unexpectedlylonger lives (due to statins, early cancer detection, etc.)

• New customized analytics we developed:

• Logical consistency (e.g., older people have higher mortality)• Far more accurate forecasts• Trust fund needs > $800 million more than SSA thought• Many other applications to different types of forecasts

7/18

Page 94: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Bias in Social Security Administration Forecasts• Social Security: single largest government program; lifted a

whole generation out of poverty; extremely popular• Forecasts: used for programs comprising > 50% of the US

expenditures; e.g., if retirees draw benefits longer thanexpected, the Trust Fund runs out

• First evaluation of SSA forecasts in 85 years:

• Methods:

little changed; mostly qualitative; a time when we’velearned more about forecasting than at any time in history

• Results:

unbiased until 2000; systematically biased after

• Actuaries hunkered down, insulated themselves, refused tobudge when Democrats & Republicans pushed hard for changes

• In the process, they also insulated themselves from the facts:

Especially since 2000, Americans started living unexpectedlylonger lives (due to statins, early cancer detection, etc.)

• New customized analytics we developed:

• Logical consistency (e.g., older people have higher mortality)• Far more accurate forecasts• Trust fund needs > $800 million more than SSA thought• Many other applications to different types of forecasts

7/18

Page 95: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Bias in Social Security Administration Forecasts• Social Security: single largest government program; lifted a

whole generation out of poverty; extremely popular• Forecasts: used for programs comprising > 50% of the US

expenditures; e.g., if retirees draw benefits longer thanexpected, the Trust Fund runs out

• First evaluation of SSA forecasts in 85 years:

• Methods:

little changed; mostly qualitative; a time when we’velearned more about forecasting than at any time in history

• Results:

unbiased until 2000; systematically biased after

• Actuaries hunkered down, insulated themselves, refused tobudge when Democrats & Republicans pushed hard for changes

• In the process, they also insulated themselves from the facts:

Especially since 2000, Americans started living unexpectedlylonger lives (due to statins, early cancer detection, etc.)

• New customized analytics we developed:

• Logical consistency (e.g., older people have higher mortality)• Far more accurate forecasts• Trust fund needs > $800 million more than SSA thought• Many other applications to different types of forecasts

7/18

Page 96: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Bias in Social Security Administration Forecasts• Social Security: single largest government program; lifted a

whole generation out of poverty; extremely popular• Forecasts: used for programs comprising > 50% of the US

expenditures; e.g., if retirees draw benefits longer thanexpected, the Trust Fund runs out

• First evaluation of SSA forecasts in 85 years:• Methods:

little changed; mostly qualitative; a time when we’velearned more about forecasting than at any time in history

• Results:

unbiased until 2000; systematically biased after

• Actuaries hunkered down, insulated themselves, refused tobudge when Democrats & Republicans pushed hard for changes

• In the process, they also insulated themselves from the facts:

Especially since 2000, Americans started living unexpectedlylonger lives (due to statins, early cancer detection, etc.)

• New customized analytics we developed:

• Logical consistency (e.g., older people have higher mortality)• Far more accurate forecasts• Trust fund needs > $800 million more than SSA thought• Many other applications to different types of forecasts

7/18

Page 97: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Bias in Social Security Administration Forecasts• Social Security: single largest government program; lifted a

whole generation out of poverty; extremely popular• Forecasts: used for programs comprising > 50% of the US

expenditures; e.g., if retirees draw benefits longer thanexpected, the Trust Fund runs out

• First evaluation of SSA forecasts in 85 years:• Methods: little changed;

mostly qualitative; a time when we’velearned more about forecasting than at any time in history

• Results:

unbiased until 2000; systematically biased after

• Actuaries hunkered down, insulated themselves, refused tobudge when Democrats & Republicans pushed hard for changes

• In the process, they also insulated themselves from the facts:

Especially since 2000, Americans started living unexpectedlylonger lives (due to statins, early cancer detection, etc.)

• New customized analytics we developed:

• Logical consistency (e.g., older people have higher mortality)• Far more accurate forecasts• Trust fund needs > $800 million more than SSA thought• Many other applications to different types of forecasts

7/18

Page 98: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Bias in Social Security Administration Forecasts• Social Security: single largest government program; lifted a

whole generation out of poverty; extremely popular• Forecasts: used for programs comprising > 50% of the US

expenditures; e.g., if retirees draw benefits longer thanexpected, the Trust Fund runs out

• First evaluation of SSA forecasts in 85 years:• Methods: little changed; mostly qualitative;

a time when we’velearned more about forecasting than at any time in history

• Results:

unbiased until 2000; systematically biased after

• Actuaries hunkered down, insulated themselves, refused tobudge when Democrats & Republicans pushed hard for changes

• In the process, they also insulated themselves from the facts:

Especially since 2000, Americans started living unexpectedlylonger lives (due to statins, early cancer detection, etc.)

• New customized analytics we developed:

• Logical consistency (e.g., older people have higher mortality)• Far more accurate forecasts• Trust fund needs > $800 million more than SSA thought• Many other applications to different types of forecasts

7/18

Page 99: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Bias in Social Security Administration Forecasts• Social Security: single largest government program; lifted a

whole generation out of poverty; extremely popular• Forecasts: used for programs comprising > 50% of the US

expenditures; e.g., if retirees draw benefits longer thanexpected, the Trust Fund runs out

• First evaluation of SSA forecasts in 85 years:• Methods: little changed; mostly qualitative; a time when we’ve

learned more about forecasting than at any time in history

• Results:

unbiased until 2000; systematically biased after

• Actuaries hunkered down, insulated themselves, refused tobudge when Democrats & Republicans pushed hard for changes

• In the process, they also insulated themselves from the facts:

Especially since 2000, Americans started living unexpectedlylonger lives (due to statins, early cancer detection, etc.)

• New customized analytics we developed:

• Logical consistency (e.g., older people have higher mortality)• Far more accurate forecasts• Trust fund needs > $800 million more than SSA thought• Many other applications to different types of forecasts

7/18

Page 100: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Bias in Social Security Administration Forecasts• Social Security: single largest government program; lifted a

whole generation out of poverty; extremely popular• Forecasts: used for programs comprising > 50% of the US

expenditures; e.g., if retirees draw benefits longer thanexpected, the Trust Fund runs out

• First evaluation of SSA forecasts in 85 years:• Methods: little changed; mostly qualitative; a time when we’ve

learned more about forecasting than at any time in history• Results:

unbiased until 2000; systematically biased after• Actuaries hunkered down, insulated themselves, refused to

budge when Democrats & Republicans pushed hard for changes• In the process, they also insulated themselves from the facts:

Especially since 2000, Americans started living unexpectedlylonger lives (due to statins, early cancer detection, etc.)

• New customized analytics we developed:

• Logical consistency (e.g., older people have higher mortality)• Far more accurate forecasts• Trust fund needs > $800 million more than SSA thought• Many other applications to different types of forecasts

7/18

The End of The Quantitative-Qualitative Divide

• The Quant-Qual divide exists in every field.

• Qualitative researchers: overwhelmed by information

• Quantitative researchers: recognize the huge amounts of information in qualitative analyses; new methods make unstructured text, video, audio actionable

• Expert-vs-analytics contests: Whenever enough information is quantified, a right answer exists, and good analytics are applied: analytics wins.

• But: There’s always qualitative information that hasn’t been quantified

• Look for: new methodological solutions to the quant-qual divide.

8/18

4. Meta-Science

• Meta-science: must follow the rules of science
  • For example: [under embargo at Science]

• Meta-science: a subfield of the social sciences. E.g.:
  • Dataverse: largest collection of social science research data
  • Solve political problems technologically: breaks the choice between individual credit and permanent archiving
  • Dataverse automates the job of the archivist
  • A complete archive on your site, with no installations
  • Huge array of analytics and behavioral incentives feed back and improve science

• Look for: huge potential in applying social science insights to improve science

9/18

Author Dataverse

Your dataverse branded as your web site but served by the Dataverse Network, therefore requiring no local installation and providing an enormous array of services

[Figure: "Your web site", powered by the Dataverse Network™ Project]

10/18

Journal Dataverse

[Figure: journal dataverse, powered by the Dataverse Network™ Project]

11/18

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges, through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’t solve the problem:
  • Solved biological problems: smoking, obesity, medical advice noncompliance
  • Solved physical science problems: human generated global warming, biodiversity, teaching evolution
  • Solved problems in the science of science: replication and data sharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing huge numbers of person-level variables

• Look for: new science–social science partnerships, and merging of methods

14/18

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems:

smoking, obesity, medical advicenoncompliance

• Solved physical science problems:

human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science:

replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 140: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems:

smoking, obesity, medical advicenoncompliance

• Solved physical science problems:

human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science:

replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 141: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking,

obesity, medical advicenoncompliance

• Solved physical science problems:

human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science:

replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 142: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity,

medical advicenoncompliance

• Solved physical science problems:

human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science:

replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 143: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity, medical advicenoncompliance

• Solved physical science problems:

human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science:

replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 144: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity, medical advicenoncompliance

• Solved physical science problems:

human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science:

replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 145: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity, medical advicenoncompliance

• Solved physical science problems: human generated globalwarming,

biodiversity, teaching evolution• Solved problems in the science of science:

replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 146: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity, medical advicenoncompliance

• Solved physical science problems: human generated globalwarming, biodiversity,

teaching evolution• Solved problems in the science of science:

replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 147: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity, medical advicenoncompliance

• Solved physical science problems: human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science:

replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 148: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity, medical advicenoncompliance

• Solved physical science problems: human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science:

replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 149: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity, medical advicenoncompliance

• Solved physical science problems: human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science: replication and datasharing,

science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 150: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity, medical advicenoncompliance

• Solved physical science problems: human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science: replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 151: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity, medical advicenoncompliance

• Solved physical science problems: human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science: replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 152: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity, medical advicenoncompliance

• Solved physical science problems: human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science: replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for:

new science–social science partnerships, andmerging of methods

14/18

Page 153: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

5. Merging Science, Technology, and Social Science

• Social Science: understanding or solving society’s challenges,through person or group-level analyses

• Discovering the underlying scientific cause sometimes doesn’tsolve the problem:

• Solved biological problems: smoking, obesity, medical advicenoncompliance

• Solved physical science problems: human generated globalwarming, biodiversity, teaching evolution

• Solved problems in the science of science: replication and datasharing, science funding

• Genomics, proteomics, brain scanning, etc., all producing hugenumbers of person-level variables

• Look for: new science–social science partnerships, andmerging of methods

14/18

E.g.: Humans are Horrible at Thinking of Keywords

• An experiment: “We have 10,000 twitter posts, each containing the word ‘healthcare’, from the time period surrounding the Supreme Court decision on Obamacare. Please list any keywords which come to mind that will select posts in this set related to Obamacare and will not select posts unrelated to Obama care.”

• Examples: unconstitutional, coverage, obama, ACA, ...

• Median keywords recalled: 8

• Unique keywords recalled by 43 undergrads: 149

• Keywords 42 of 43 failed to recall: 98 (66%)

• Humans recognize keywords well, recall them poorly

• Thresher: New technology to discover the right keywords
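The three summary numbers on this slide (median recalled, unique keywords, and keywords named by only one participant) can be computed from per-participant keyword lists. The sketch below is only an illustration: the function name `recall_summary`, the normalization (lowercasing, trimming), and the toy lists are assumptions, not the study's data or procedure.

```python
from collections import Counter
from statistics import median


def recall_summary(lists):
    """Summarize keyword recall; `lists` holds one keyword list per participant."""
    # Assumed normalization: trim, lowercase, de-duplicate within each participant.
    per_person = [{kw.strip().lower() for kw in kws if kw.strip()} for kws in lists]
    # Number of participants who recalled each keyword.
    counts = Counter(kw for kws in per_person for kw in kws)
    # Keywords that all but one participant failed to recall.
    singletons = sorted(kw for kw, n in counts.items() if n == 1)
    unique = len(counts)
    return {
        "median_recalled": median(len(kws) for kws in per_person),
        "unique_keywords": unique,
        "recalled_by_one": singletons,
        "share_recalled_by_one": len(singletons) / unique if unique else 0.0,
    }


# Hypothetical toy data, not the responses from the experiment.
toy = [
    ["obama", "ACA", "coverage"],
    ["obama", "unconstitutional"],
    ["ACA", "mandate", "obama", "ruling"],
]
summary = recall_summary(toy)

# The slide's percentage follows from its own counts: 98 of 149 keywords.
slide_share = round(100 * 98 / 149)
```

With the slide's counts, 98 of 149 unique keywords is about 66%, which matches the figure reported above.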


E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”

目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation; Thresher can.

The same task: (1) Long tail search, (2) Government and industry analyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alike modeling, (6) Starting point for other automated text methods, (7) Infinitely improvable classification, eDiscovery
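A small sketch of why a fixed keyword loses the conversation once a substitution appears. This is not Thresher, which is described here as discovering keywords; the hand-written `VARIANTS` table, the `matches` helper, and the sample posts are all hypothetical and only show the gap between literal and variant-aware matching.

```python
# Hypothetical variant table built from the two substitutions on the slide.
VARIANTS = {
    "自由": {"目田"},  # homograph: looks similar
    "和谐": {"河蟹"},  # homophone: sounds like "hexie"
}


def matches(post, keyword, variants=VARIANTS):
    """Return True if the post contains the keyword or any known variant of it."""
    terms = {keyword} | variants.get(keyword, set())
    return any(term in post for term in terms)


# Made-up sample posts for illustration only.
posts = ["我们要自由", "我们要目田", "今天吃河蟹"]

# Literal search finds only the post with the original word.
literal = [p for p in posts if "自由" in p]

# Variant-aware search also finds the post using the substitute.
expanded = [p for p in posts if matches(p, "自由")]
```

A table like this must be maintained by hand and falls behind as soon as a new substitute appears, which is the recall problem the previous slide describes.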


16/18

Page 177: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1:

Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2:

Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 178: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2:

Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 179: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2:

Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 180: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2:

Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 181: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2:

Homophone (sound like “hexie”)

和谐

“Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 182: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2:

Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 183: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2:

Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 184: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2:

Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹

“River crab” (irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 185: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2:

Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab”

(irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 186: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2:

Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 187: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 188: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation;

Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 189: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation; Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 190: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation; Thresher can.

The same task:

(1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 191: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation; Thresher can.

The same task: (1) Long tail search,

(2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 192: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation; Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job,

(3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 193: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation; Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong),

(4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 194: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation; Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers,

(5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 195: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation; Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling,

(6) Starting point for other automated text methods,(7) Infinitely improvable classification, eDiscovery

16/18

Page 196: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation; Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,

(7) Infinitely improvable classification, eDiscovery

16/18

Page 197: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation; Thresher can.

The same task: (1) Long tail search, (2) Government and industryanalyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alikemodeling, (6) Starting point for other automated text methods,(7) Infinitely improvable classification,

eDiscovery

16/18

Page 198: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g.: Following Conversations that Hide in Plain Sight

Example Substitution 1: Homograph

自由 “Freedom”

目田 “Eye field” (nonsensical)

Example Substitution 2: Homophone (sound like “hexie”)

和谐 “Harmonious [Society]” (official slogan)

河蟹 “River crab” (irrelevant)

They can’t follow the conversation; Thresher can.

The same task: (1) Long tail search, (2) Government and industry analyst’s job, (3) language drift (#BostonBombings #BostonStrong), (4) Child pornographers, (5) Look-alike modeling, (6) Starting point for other automated text methods, (7) Infinitely improvable classification, eDiscovery
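The substitution examples above can be made concrete with a minimal sketch. This is not Thresher's algorithm, which these slides do not describe; it only shows why a fixed keyword list loses the conversation once writers swap in a homophone. The pinyin table, the sample posts, and the function names are all hypothetical, and a real system would need a full pronunciation dictionary.

```python
# Toy illustration only: PINYIN and POSTS are hand-made assumptions,
# not data or methods from the talk.
PINYIN = {
    "和": "he", "谐": "xie",   # "harmonious"
    "河": "he", "蟹": "xie",   # "river crab"
    "自": "zi", "由": "you",   # "freedom"
    "目": "mu", "田": "tian",  # "eye field"
}

POSTS = ["建设和谐社会", "小心河蟹", "追求目田"]


def toneless_pinyin(text):
    """Map each character to toneless pinyin; unknown characters pass through."""
    return "".join(PINYIN.get(ch, ch) for ch in text)


def exact_match(posts, keyword):
    """Plain keyword search: finds only the literal characters."""
    return [p for p in posts if keyword in p]


def homophone_match(posts, keyword):
    """Find posts containing any same-length span that sounds like the keyword."""
    target = toneless_pinyin(keyword)
    n = len(keyword)
    return [
        p for p in posts
        if any(toneless_pinyin(p[i:i + n]) == target
               for i in range(len(p) - n + 1))
    ]


if __name__ == "__main__":
    print(exact_match(POSTS, "和谐"))      # only the literal slogan
    print(homophone_match(POSTS, "和谐"))  # also the "river crab" post
    print(homophone_match(POSTS, "自由"))  # homograph 目田 is not caught by sound
```

Sound-based matching recovers the homophone but not the homograph, since 目田 resembles 自由 visually rather than phonetically; that substitution would need a different signal, such as character-shape similarity.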

16/18

E.g. 2: Reverse Engineering Censorship in China

• Previous approach: watch a few posts; see what’s removed

• Data: We get posts before the Chinese censor them

• ≈ 13% censored overall

• Everyone knows the Goal:Stop criticism and protest about the state,its leaders, and their policies Wrong

• What Could be the Goal?

1. Stop criticism of the state Wrong2. Stop collective action Right

• Implications: Social Media is Actionable!• Chinese leaders:

• measure criticism: to judge local officials

• censor: to stop events with collective action potential

• Thus, we can use criticism & censorship to predict:

• Officials in trouble, likely to be replaced• Dissident arrests;

new peace treaties; emerging scandals

• Disagreements between central and local leaders

17/18

Page 214: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g. 2: Reverse Engineering Censorship in China

• Previous approach: watch a few posts; see what’s removed

• Data: We get posts before the Chinese censor them

• ≈ 13% censored overall

• Everyone knows the Goal:Stop criticism and protest about the state,its leaders, and their policies Wrong

• What Could be the Goal?

1. Stop criticism of the state Wrong2. Stop collective action Right

• Implications: Social Media is Actionable!• Chinese leaders:

• measure criticism: to judge local officials• censor: to stop events with collective action potential

• Thus, we can use criticism & censorship to predict:

• Officials in trouble, likely to be replaced• Dissident arrests;

new peace treaties; emerging scandals

• Disagreements between central and local leaders

17/18

Page 215: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g. 2: Reverse Engineering Censorship in China

• Previous approach: watch a few posts; see what’s removed

• Data: We get posts before the Chinese censor them

• ≈ 13% censored overall

• Everyone knows the Goal:Stop criticism and protest about the state,its leaders, and their policies Wrong

• What Could be the Goal?

1. Stop criticism of the state Wrong2. Stop collective action Right

• Implications: Social Media is Actionable!• Chinese leaders:

• measure criticism: to judge local officials• censor: to stop events with collective action potential

• Thus, we can use criticism & censorship to predict:

• Officials in trouble, likely to be replaced• Dissident arrests;

new peace treaties; emerging scandals

• Disagreements between central and local leaders

17/18

Page 216: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g. 2: Reverse Engineering Censorship in China

• Previous approach: watch a few posts; see what’s removed

• Data: We get posts before the Chinese censor them

• ≈ 13% censored overall

• Everyone knows the Goal:Stop criticism and protest about the state,its leaders, and their policies Wrong

• What Could be the Goal?

1. Stop criticism of the state Wrong2. Stop collective action Right

• Implications: Social Media is Actionable!• Chinese leaders:

• measure criticism: to judge local officials• censor: to stop events with collective action potential

• Thus, we can use criticism & censorship to predict:• Officials in trouble, likely to be replaced

• Dissident arrests;

new peace treaties; emerging scandals

• Disagreements between central and local leaders

17/18

Page 217: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g. 2: Reverse Engineering Censorship in China

• Previous approach: watch a few posts; see what’s removed

• Data: We get posts before the Chinese censor them

• ≈ 13% censored overall

• Everyone knows the Goal:Stop criticism and protest about the state,its leaders, and their policies Wrong

• What Could be the Goal?

1. Stop criticism of the state Wrong2. Stop collective action Right

• Implications: Social Media is Actionable!• Chinese leaders:

• measure criticism: to judge local officials• censor: to stop events with collective action potential

• Thus, we can use criticism & censorship to predict:• Officials in trouble, likely to be replaced• Dissident arrests;

new peace treaties; emerging scandals• Disagreements between central and local leaders

17/18

Page 218: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g. 2: Reverse Engineering Censorship in China

• Previous approach: watch a few posts; see what’s removed

• Data: We get posts before the Chinese censor them

• ≈ 13% censored overall

• Everyone knows the Goal:Stop criticism and protest about the state,its leaders, and their policies Wrong

• What Could be the Goal?

1. Stop criticism of the state Wrong2. Stop collective action Right

• Implications: Social Media is Actionable!• Chinese leaders:

• measure criticism: to judge local officials• censor: to stop events with collective action potential

• Thus, we can use criticism & censorship to predict:• Officials in trouble, likely to be replaced• Dissident arrests; new peace treaties;

emerging scandals• Disagreements between central and local leaders

17/18

Page 219: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g. 2: Reverse Engineering Censorship in China

• Previous approach: watch a few posts; see what’s removed

• Data: We get posts before the Chinese censor them

• ≈ 13% censored overall

• Everyone knows the Goal:Stop criticism and protest about the state,its leaders, and their policies Wrong

• What Could be the Goal?

1. Stop criticism of the state Wrong2. Stop collective action Right

• Implications: Social Media is Actionable!• Chinese leaders:

• measure criticism: to judge local officials• censor: to stop events with collective action potential

• Thus, we can use criticism & censorship to predict:• Officials in trouble, likely to be replaced• Dissident arrests; new peace treaties; emerging scandals

• Disagreements between central and local leaders

17/18

Page 220: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

E.g. 2: Reverse Engineering Censorship in China

• Previous approach: watch a few posts; see what’s removed

• Data: We get posts before the Chinese censor them

• ≈ 13% censored overall

• Everyone knows the Goal:Stop criticism and protest about the state,its leaders, and their policies Wrong

• What Could be the Goal?

1. Stop criticism of the state Wrong2. Stop collective action Right

• Implications: Social Media is Actionable!• Chinese leaders:

• measure criticism: to judge local officials• censor: to stop events with collective action potential

• Thus, we can use criticism & censorship to predict:• Officials in trouble, likely to be replaced• Dissident arrests; new peace treaties; emerging scandals• Disagreements between central and local leaders 17/18

Page 221: gking.harvard.edugking.harvard.edu/files/gking/files/evbase-nas2.pdfBig Data The mostvisible changes: new forms ofbig data(telescopes, microscopes, scanners, etc.) The mostimportant

For more information

GaryKing.org

Institute for Quantitative Social Science
Harvard University

18/18