Detecting the Missing Information in Misinformation

Emre Kıcıman

emrek@microsoft.com · @emrek

Microsoft Research

Collaborators

Rohail Syed, U. Michigan

Michael Golebiewski, Microsoft

Sudha Rao, Microsoft

Bruno Abrahao, NYU

Bhaskar Mitra, Microsoft

Misinformation, disinformation, and fake news

• Misinformation: information that is false or misleading

• Disinformation: misinformation created with intent to harm

• Related: fake news, satire, mal-information, propaganda, clickbait, …, credibility, reliability

Effects of misinformation

• Individual decisions
• Polarization
• Loss of trust

Misinformation: What’s being done?

• Fact checking, automated detection
  • Human; diffusion-based, network-based, reputation-based, content-based

• Mitigating creation and spread
  • Legal and platform efforts

• Education
  • General population, media organizations

• Research into effects

Web Search Perspective

• Search engines: Trusted representation of the web

• Conflicting principles?
  • Enable information access
  • Don’t mislead people

• Key difference with other platforms: People have a query

• If a person knows something exists but the search engine doesn’t show it, this can cause distrust and feed a conspiratorial mindset

• One approach:
  • Show misinformation if someone searches for it directly
  • Perhaps label it clearly, and/or interleave it with responses and checks
  • Avoid showing it otherwise (i.e., don’t aid discovery and spread)

Missing information

Missing information in misinformation

• Fake news is often missing the details needed to substantiate the story
  • Often low quality and/or short

• Focus on emotion and reaction

• A similar phenomenon is found in fake reviews, which also lack detail [Ott, Choi, Cardie, Hancock, 2011]

Fake news example

Judge Mahal al Alallaha-Smith of the 22nd District Federal Court of Appeals ruled this morning that two “critical issues for Muslims” in Sharia Law had to be abided by in the United States court system because of the systematic infusion clause and because the 14th Amendment guarantees them the rights guaranteed by other states.

“[…] With that as precedent, understanding that a higher court may reverse it, my decision is that items one and two on the docket are allowable between family members as prescribed by Sharia Law.”

https://www.snopes.com/fact-check/muslim-federal-judge-sharia/

Questions the article leaves unanswered:

• Where is the “22nd District Federal Court of Appeals”?
• What is “the systematic infusion clause”?
• What are the “other states”?
• What is the “higher court”?
• What is the docket number? Who is suing whom?

How can we detect missing information?

Adapt models built for Question Answering tasks

E.g. SQuAD models https://rajpurkar.github.io/SQuAD-explorer/

Given a question and a passage of text, these models find the answer in text, or say it is missing.
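A minimal sketch of this step, assuming the Hugging Face transformers library and the deepset/roberta-base-squad2 checkpoint (a SQuAD 2.0-style model that can abstain); the model choice and score threshold are illustrative, not part of the original work.

```python
# Hypothetical sketch: an off-the-shelf SQuAD 2.0-style QA model that can
# either extract an answer span or abstain ("no answer in this passage").
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

article = (
    "Judge Mahal al Alallaha-Smith of the 22nd District Federal Court of "
    "Appeals ruled this morning that two critical issues in Sharia Law had "
    "to be abided by in the United States court system..."
)
question = "Where is the 22nd District Federal Court of Appeals?"

# handle_impossible_answer lets the pipeline return an empty span when the
# passage does not contain an answer.
result = qa(question=question, context=article, handle_impossible_answer=True)

if not result["answer"].strip() or result["score"] < 0.3:  # illustrative threshold
    print("The article does not appear to answer this question.")
else:
    print(f"Answer: {result['answer']} (score={result['score']:.2f})")
```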

Approach

Usually-Answerable Questions (UAQ): For a class of articles, a set of template questions that are usually answered in reliable news articles

Article Evaluation: How many UAQs are answerable by the article?

Ground-truth human evaluation

Scalable QA model evaluation

Qualitative Outcome: Explain score by showing UAQs themselves
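As a sketch of the scalable QA-model evaluation, an article can be scored by the fraction of UAQs the model can answer. The helper below reuses the `qa` pipeline from the previous sketch; the function name and threshold are illustrative assumptions, not the authors' implementation.

```python
def uaq_score(article_text, uaq_questions, qa_pipeline, threshold=0.3):
    """Fraction of Usually-Answerable Questions answerable from the article (sketch)."""
    answered = 0
    for question in uaq_questions:
        result = qa_pipeline(question=question, context=article_text,
                             handle_impossible_answer=True)
        # Count the question as answered if the model returns a non-empty span
        # with sufficient confidence (threshold is an illustrative choice).
        if result["answer"].strip() and result["score"] >= threshold:
            answered += 1
    return answered / len(uaq_questions) if uaq_questions else 0.0
```

A low score flags an article as leaving many usually-answered questions open, and the unanswered UAQs themselves can be shown as the explanation.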

Preliminary Experiments and Results

Simple NER-driven template questions (sketch below):
• X:person → Where does <X> work?
• Y:profession → What is the <Y>’s name?
• …
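A minimal sketch of the NER-driven template step, assuming spaCy's small English model; the entity-to-template mapping below is illustrative (spaCy's stock NER does not tag professions, for example) and is not the exact template set used in these experiments.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Illustrative templates keyed by spaCy entity labels.
TEMPLATES = {
    "PERSON": ["Where does {} work?", "Who is {}?"],
    "ORG":    ["Where is {} located?"],
    "EVENT":  ["When did {} happen?", "Who was in {}?"],
    "GPE":    ["What happened in {}?"],
}

def generate_uaqs(article_text):
    """Instantiate template questions for each named entity found in the article."""
    doc = nlp(article_text)
    questions = []
    for ent in doc.ents:
        for template in TEMPLATES.get(ent.label_, []):
            questions.append(template.format(ent.text))
    return questions
```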

Gather 5000+ fake and real news articles, randomly subsample

Ground-truth human evaluation of question-answering

Scalable QA model evaluation

Preliminary Evaluation: Crowd-workers

Is the question answerable from the article?

Crowdsourced evaluation
• 3 judgments

Result: Real news answers more UAQs than fake.

Varies by question

Question              Avg. Fake   Avg. Real   p-val
Overall                  24%         39%       3E-7
Where is the X?          12%         52%       0.001
Where is X?              12%         42%       0.008
What happened in X?      29%         63%       0.009
When did X happen?       12%         40%       0.011
Who was in X?            28%         57%       0.024
Where was X?             11%         33%       0.027
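The slides do not state which statistical test produced these p-values; as an illustration, per-article answerability rates for the fake and real samples could be compared with a nonparametric test such as Mann-Whitney U (placeholder data below).

```python
from scipy.stats import mannwhitneyu

# Placeholder per-article UAQ answerability fractions (e.g., from uaq_score above).
fake_scores = [0.10, 0.25, 0.30, 0.20]
real_scores = [0.40, 0.35, 0.50, 0.45]

stat, p_value = mannwhitneyu(fake_scores, real_scores, alternative="two-sided")
print(f"fake mean={sum(fake_scores)/len(fake_scores):.2f}  "
      f"real mean={sum(real_scores)/len(real_scores):.2f}  p={p_value:.3g}")
```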

Preliminary Evaluation: BERT QA Model

QA Model eval

• Basic BERT model

Result: Unreliable news is less answerable than reliable news

Summary of Preliminary Experiments

• There is missing information in fake news

• Some questions go unanswered more often than others

• We can automate the evaluation with QA models

Open/on-going work

1. Are we choosing the right Usually-Answerable-Questions?

2. Why are questions unanswerable in an article?
• Common knowledge question

• Nonsensical question

• Missing information

3. Improving the QA model

4. UAQ as input to learned fake news classification
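For item 4, a hypothetical sketch of how per-question UAQ answerability could feed a learned fake-news classifier; the feature values, labels, and choice of logistic regression are placeholders, not the project's design.

```python
from sklearn.linear_model import LogisticRegression

# Each row: answerability scores for a fixed set of UAQ templates in one article.
X = [
    [0.10, 0.00, 0.20],  # article 1 (fake)
    [0.60, 0.50, 0.70],  # article 2 (real)
    [0.20, 0.10, 0.00],  # article 3 (fake)
    [0.50, 0.70, 0.60],  # article 4 (real)
]
y = [1, 0, 1, 0]  # 1 = fake, 0 = real

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[0.15, 0.05, 0.10]]))  # P(real), P(fake) for a new article
```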

What if this works?

Bigger implications: Fake news as arms-race

• If missing information is an identifiable sign of misinformation, then authors will start adding fabricated details to their fake news

• Hypothesis: The more facts are included, the easier fact-checking is

Missing information detection → Fact checking

Bigger implications: Information literacy

• Method not only generates a quantitative score, but also can explain its reasoning by example

• Will this promote more critical thinking and reading by teaching people what to look for? 😀

• Or will people use it as a crutch to avoid thinking themselves? 😟

Broader challenges

Broader challenges: Fact-checking synthetic media / deepfakes

• Misinformation, missing information, and fact-checking not limited to text

• How to fact check the rhetoric of a video? When is manipulation ok, when is it a lie?

• Fact checking mixed media (e.g., misleading captions)

• How to identify deepfakes?

Broader challenge: Beyond “broadcast” misinformation

• What happens when misinformation is targeted?
  • E.g., in a phishing attack

• With the growth of AI, targeted attacks will become more scalable and automatable
  • Phone calls, emails, text messages

• Today’s “broadcast” fakes can be caught as they get wide distribution.

• How to deploy and scale detection and checking for individualized attacks

Helping individuals

Noelle Martin

• Victim of deepfake attack

• Now, a law reform activist

• Combating image-based abuse

Summary

Missing information in misinformation

• Can we identify details that are missing from articles?

• Will this force authors to add more details?

• Will it help readers be more critical thinkers?

Fact-checking more broadly:

• Rapidly changing landscape, many threat models and many points of leverage

→ Many research and impact opportunities

Thanks! Questions?

Emre Kıcıman

emrek@microsoft.com

• @emrek
