Detecting the Missing Information in Misinformation

Emre Kıcıman

emrek@microsoft.com · @emrek

Microsoft Research

Collaborators

Rohail Syed, U. Michigan

Michael Golebiewski, Microsoft

Sudha Rao, Microsoft

Bruno Abrahao, NYU

Bhaskar Mitra, Microsoft

Misinformation, disinformation, and fake news

• Misinformation: information that is false or misleading

• Disinformation: misinformation created with intent to harm

• Related: fake news, satire, mal-information, propaganda, clickbait, …, credibility, reliability

Effects of misinformation

• Individual decisions
• Polarization
• Loss of trust

Misinformation: What’s being done?

• Fact checking, automated detection
  • Human; diffusion-based, network-based, reputation-based, content-based

• Mitigating creation and spread
  • Legal and platform efforts

• Education
  • General population, media organizations

• Research into effects

Web Search Perspective

• Search engines: Trusted representation of the web

• Conflicting principles?
  • Enable information access
  • Don’t mislead people

• Key difference with other platforms: People have a query

• If a person knows something exists but the search engine doesn’t show it, this can cause distrust and feed a conspiratorial mindset

• One approach:
  • Show misinformation if someone searches for it directly
  • Perhaps label it clearly, and/or interleave it with responses and checks
  • Avoid showing it otherwise (i.e., don’t aid discovery and spread)

Missing information

Missing information in misinformation

• Fake news is often missing the details needed to substantiate the story
  • Often low quality and/or short

• Focus on emotion and reaction

• A similar phenomenon is found in fake reviews, which also lack detail [Ott, Choi, Cardie, Hancock, 2011]

Fake news example

Judge Mahal al Alallaha-Smith of the 22nd District Federal Court of Appeals ruled this morning that two “critical issues for Muslims” in Sharia Law had to be abided by in the United States court system because of the systematic infusion clause and because the 14th Amendment guarantees them the rights guaranteed by other states.

“[…] With that as precedent, understanding that a higher court may reverse it, my decision is that items one and two on the docket are allowable between family members as prescribed by Sharia Law.”

https://www.snopes.com/fact-check/muslim-federal-judge-sharia/

Questions the article leaves unanswered:

• Where is the “22nd District Federal Court of Appeals”?
• What is “the systematic infusion clause”?
• What are the “other states”?
• What is the “higher court”?
• What is the docket number? Who is suing whom?

How can we detect missing information?

Adapt models built for Question Answering tasks

E.g. SQuAD models https://rajpurkar.github.io/SQuAD-explorer/

Given a question and a passage of text, these models find the answer in text, or say it is missing.
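A minimal sketch of this step, assuming the Hugging Face transformers library and the deepset/roberta-base-squad2 checkpoint (a SQuAD 2.0-style model that can abstain); the model choice and score threshold are illustrative, not part of the original work.

```python
# Hypothetical sketch: an off-the-shelf SQuAD 2.0-style QA model that can
# either extract an answer span or abstain ("no answer in this passage").
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

article = (
    "Judge Mahal al Alallaha-Smith of the 22nd District Federal Court of "
    "Appeals ruled this morning that two critical issues in Sharia Law had "
    "to be abided by in the United States court system..."
)
question = "Where is the 22nd District Federal Court of Appeals?"

# handle_impossible_answer lets the pipeline return an empty span when the
# passage does not contain an answer.
result = qa(question=question, context=article, handle_impossible_answer=True)

if not result["answer"].strip() or result["score"] < 0.3:  # illustrative threshold
    print("The article does not appear to answer this question.")
else:
    print(f"Answer: {result['answer']} (score={result['score']:.2f})")
```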

Approach

Usually-Answerable Questions (UAQ): For a class of articles, a set of template questions that are usually answered in reliable news articles

Article Evaluation: How many UAQs are answerable by the article?

Ground-truth human evaluation

Scalable QA model evaluation

Qualitative Outcome: Explain score by showing UAQs themselves
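As a sketch of the scalable QA-model evaluation, an article can be scored by the fraction of UAQs the model can answer. The helper below reuses the `qa` pipeline from the previous sketch; the function name and threshold are illustrative assumptions, not the authors' implementation.

```python
def uaq_score(article_text, uaq_questions, qa_pipeline, threshold=0.3):
    """Fraction of Usually-Answerable Questions answerable from the article (sketch)."""
    answered = 0
    for question in uaq_questions:
        result = qa_pipeline(question=question, context=article_text,
                             handle_impossible_answer=True)
        # Count the question as answered if the model returns a non-empty span
        # with sufficient confidence (threshold is an illustrative choice).
        if result["answer"].strip() and result["score"] >= threshold:
            answered += 1
    return answered / len(uaq_questions) if uaq_questions else 0.0
```

A low score flags an article as leaving many usually-answered questions open, and the unanswered UAQs themselves can be shown as the explanation.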

Preliminary Experiments and Results

Simple NER-driven template questions (sketch below):
• X:person → Where does <X> work?
• Y:profession → What is the <Y>’s name?
• …
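A minimal sketch of the NER-driven template step, assuming spaCy's small English model; the entity-to-template mapping below is illustrative (spaCy's stock NER does not tag professions, for example) and is not the exact template set used in these experiments.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Illustrative templates keyed by spaCy entity labels.
TEMPLATES = {
    "PERSON": ["Where does {} work?", "Who is {}?"],
    "ORG":    ["Where is {} located?"],
    "EVENT":  ["When did {} happen?", "Who was in {}?"],
    "GPE":    ["What happened in {}?"],
}

def generate_uaqs(article_text):
    """Instantiate template questions for each named entity found in the article."""
    doc = nlp(article_text)
    questions = []
    for ent in doc.ents:
        for template in TEMPLATES.get(ent.label_, []):
            questions.append(template.format(ent.text))
    return questions
```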

Gather 5000+ fake and real news articles, randomly subsample

Ground-truth human evaluation of question-answering

Scalable QA model evaluation

Preliminary Evaluation: Crowd-workers

Is the question answerable from the article?

Crowdsourced evaluation
• 3 judgments

Result: Real news answers more UAQs than fake.

Varies by question

Question              Avg. Fake   Avg. Real   p-val
Overall                  24%         39%       3E-7
Where is the X?          12%         52%       0.001
Where is X?              12%         42%       0.008
What happened in X?      29%         63%       0.009
When did X happen?       12%         40%       0.011
Who was in X?            28%         57%       0.024
Where was X?             11%         33%       0.027
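The slides do not state which statistical test produced these p-values; as an illustration, per-article answerability rates for the fake and real samples could be compared with a nonparametric test such as Mann-Whitney U (placeholder data below).

```python
from scipy.stats import mannwhitneyu

# Placeholder per-article UAQ answerability fractions (e.g., from uaq_score above).
fake_scores = [0.10, 0.25, 0.30, 0.20]
real_scores = [0.40, 0.35, 0.50, 0.45]

stat, p_value = mannwhitneyu(fake_scores, real_scores, alternative="two-sided")
print(f"fake mean={sum(fake_scores)/len(fake_scores):.2f}  "
      f"real mean={sum(real_scores)/len(real_scores):.2f}  p={p_value:.3g}")
```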

Preliminary Evaluation: BERT QA Model

QA Model eval

• Basic BERT model

Result: Unreliable news is less answerable than reliable news

Summary of Preliminary Experiments

• There is missing information in fake news

• Some questions go unanswered more often than others

• We can automate the evaluation with QA models

Open/on-going work

1. Are we choosing the right Usually-Answerable-Questions?

2. Why are questions unanswerable in an article?
• Common knowledge question

• Nonsensical question

• Missing information

3. Improving the QA model

4. UAQ as input to learned fake news classification
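For item 4, a hypothetical sketch of how per-question UAQ answerability could feed a learned fake-news classifier; the feature values, labels, and choice of logistic regression are placeholders, not the project's design.

```python
from sklearn.linear_model import LogisticRegression

# Each row: answerability scores for a fixed set of UAQ templates in one article.
X = [
    [0.10, 0.00, 0.20],  # article 1 (fake)
    [0.60, 0.50, 0.70],  # article 2 (real)
    [0.20, 0.10, 0.00],  # article 3 (fake)
    [0.50, 0.70, 0.60],  # article 4 (real)
]
y = [1, 0, 1, 0]  # 1 = fake, 0 = real

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[0.15, 0.05, 0.10]]))  # P(real), P(fake) for a new article
```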

What if this works?

Bigger implications: Fake news as arms-race

• If missing information is an identifiable sign of misinformation, then authors will start adding fabricated details to their fake news

• Hypothesis: The more facts are included, the easier fact-checking is

Missing information detection → Fact checking

Bigger implications: Information literacy

• Method not only generates a quantitative score, but also can explain its reasoning by example

• Will this promote more critical thinking and reading by teaching people what to look for? 😀

• Or will people use it as a crutch to avoid thinking themselves? 😟

Broader challenges

Broader challenges: Fact-checking synthetic media / deepfakes

• Misinformation, missing information, and fact-checking not limited to text

• How to fact check the rhetoric of a video? When is manipulation ok, when is it a lie?

• Fact checking mixed media (e.g., misleading captions)

• How to identify deepfakes?

Broader challenge: Beyond “broadcast” misinformation

• What happens when misinformation is targeted?
  • E.g., in a phishing attack

• With the growth of AI, targeted attacks will become more scalable and automatable
  • Phone calls, emails, text messages

• Today’s “broadcast” fakes can be caught as they get wide distribution.

• How to deploy and scale detection and checking for individualized attacks

Helping individuals

Noelle Martin

• Victim of deepfake attack

• Now, a law reform activist

• Combating image-based abuse

Summary

Missing information in misinformation

• Can we identify details that are missing from articles?

• Will this force authors to add more details?

• Will it help readers be more critical thinkers?

Fact-checking more broadly:

• Rapidly changing landscape, many threat models and many points of leverage

→ Many research and impact opportunities

Thanks! Questions?

Emre Kıcıman

emrek@microsoft.com

• @emrek
