A Test Collection for Research on Depression and Language Use

David E. Losada

Fabio Crestani

CLEF 2016, Évora (Portugal)

350 million people suffer from depression

early intervention is fundamental

human expert + technology

current technology doesn't support early alerts

reactive

works with very explicit signals

too often, too late!

instigate research on the onset of depression

proactive technologies

track temporal evolution

early alerts

Text analytics

natural language can be indicative of personality, social status, emotions, mental health, disorders, ...

linguistic markers

use of personal pronouns

statistical properties of text

topic models

psychometrics

content vs style

social words

verb tense

positive/negative emotions

psychological processes

cognitive processes
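As a rough illustration of one such marker (use of personal pronouns), the sketch below computes the rate of first-person singular pronouns in a text. The pronoun list, the tokenisation and the function name `first_person_rate` are assumptions made for this example; they are not taken from the collection or from any particular psychometric tool.

```python
import re

# Hypothetical, deliberately small list of first-person singular pronouns.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def first_person_rate(text: str) -> float:
    """Fraction of word tokens that are first-person singular pronouns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in FIRST_PERSON)
    return hits / len(tokens)
```

A single rate like this is only one possible feature; contractions such as "I'm" are not counted by this simple tokenisation.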

Lack of data on depression & language

few collections available

focus on 2-class categorisation

no temporal dimension, no early risk analysis

little context about the tweet writer

difficult to assess whether a mention of depression is genuine

no way to extract a long history of tweets (e.g. several years)

A Thin Line

no way to extract any history

short messages, little context

large history for each redditor (several years)

many subreddits (communities) about different medical conditions (e.g. depression or anorexia)

long messages

terms & conditions allow use for research purposes

depression group vs control group


“I am depressed” “I think I have depression”

Adopted extraction method from Coppersmith et al. 2014:

pattern matching search

search for explicit mentions of diagnosis (e.g. “I was diagnosed with depression”)

manual inspection of the results


(e.g. “My wife has depression”, “I am a student interested in depression”)
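The pattern matching search can be sketched as a small regular-expression filter. The patterns below are hypothetical (the actual queries used to build the collection are not listed in the slides), and the function names are introduced only for this example. Matches are candidates only: manual inspection is still needed afterwards.

```python
import re

# Hypothetical patterns for explicit self-reported diagnoses; the real
# queries used for the collection are not given in the slides.
DIAGNOSIS_PATTERNS = [
    re.compile(r"\bI (?:was|am|have been|got) diagnosed with (?:\w+ )?depression\b",
               re.IGNORECASE),
    re.compile(r"\bI've been diagnosed with (?:\w+ )?depression\b", re.IGNORECASE),
]


def mentions_diagnosis(text: str) -> bool:
    """Return True if the text contains an explicit first-person diagnosis."""
    return any(p.search(text) for p in DIAGNOSIS_PATTERNS)


def candidate_posts(posts):
    """Keep only matching posts; these still require manual inspection."""
    return [p for p in posts if mentions_diagnosis(p)]
```

With these assumed patterns, "I was diagnosed with depression" is kept, while "My wife has depression" and "I am depressed" are not.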

large set of random redditors

from a wide range of subreddits (news, media, ...)

also included some false positives from the depression subreddit

retrieved all history from any subreddit: his/her posts + his/her comments to other posts

often several years of text

removed the post/comment with the explicit mention of the diagnosis (depression group)

redditor profile

pre- & post-diagnosis text

organised the writings in chronological order

XML archives
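A minimal sketch of reading one redditor profile from an XML archive and putting the writings in chronological order. The slides only say "XML archives", so the element names (`INDIVIDUAL`, `ID`, `WRITING`, `DATE`, `TEXT`), the date format and the sample content are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical archive layout; element names and date format are assumed.
SAMPLE = """
<INDIVIDUAL>
  <ID>subject42</ID>
  <WRITING><DATE>2013-02-15 10:00:00</DATE><TEXT>second text</TEXT></WRITING>
  <WRITING><DATE>2013-02-13 09:30:00</DATE><TEXT>first text</TEXT></WRITING>
</INDIVIDUAL>
"""


def load_writings(xml_string):
    """Return (subject_id, [(date, text), ...]) sorted chronologically."""
    root = ET.fromstring(xml_string)
    subject_id = root.findtext("ID")
    writings = []
    for w in root.iter("WRITING"):
        date = datetime.strptime(w.findtext("DATE").strip(), "%Y-%m-%d %H:%M:%S")
        writings.append((date, w.findtext("TEXT", default="").strip()))
    writings.sort(key=lambda pair: pair[0])  # oldest writing first
    return subject_id, writings
```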

collection: main statistics

early prediction task

detect early traces of depression

for each subject, sequentially process pieces of evidence...




John Doe's writings (posts or comments), in chronological order: 2/13/13, 2/15/13, 3/1/13, ..., 12/9/16

tradeoff: early decision vs more informed decision

when should I fire an alarm?

early prediction task: performance metric

After seeing k texts a system makes a binary decision d about John Doe:

d=1 => possible risk of depression

d=0 => non-risk case

Early Risk Detection Error:

\[
ERDE_o(d,k) =
\begin{cases}
c_{fp} & \text{(false positive)} \\
c_{fn} & \text{(false negative)} \\
c_{tp} \cdot lc_o(k) & \text{(true positive)} \\
0 & \text{(true negative)}
\end{cases}
\]

Usually, $c_{fn} \gg c_{fp}$

$c_{fn} \leftarrow 1$, $c_{fp} \leftarrow$ expected proportion of positive cases (e.g. 0.01)

True Positive cost: $c_{tp} \cdot lc_o(k)$

$c_{tp} \leftarrow c_{fn}$ (late detection ≈ no detection)

Latency cost function
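The error measure above can be sketched in a few lines of Python. The slides name the latency cost function but do not spell it out here; the sketch assumes the sigmoid form $lc_o(k) = 1 - \frac{1}{1 + e^{k-o}}$, which grows from 0 to 1 around the point k = o. The function names and the default costs (taken from the example settings above) are choices made for this illustration.

```python
import math


def latency_cost(k: int, o: int) -> float:
    """Assumed sigmoid latency cost: 1 - 1/(1 + e^(k - o)).

    Written in a numerically stable way so large |k - o| does not overflow.
    """
    x = k - o
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def erde(decision: int, truth: int, k: int, o: int,
         c_fp: float = 0.01, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Early Risk Detection Error for one subject.

    decision, truth: 1 = risk case, 0 = non-risk case.
    k: number of writings seen before deciding; o: latency parameter.
    """
    if decision == 1 and truth == 0:
        return c_fp                        # false positive
    if decision == 0 and truth == 1:
        return c_fn                        # false negative
    if decision == 1 and truth == 1:
        return c_tp * latency_cost(k, o)   # true positive, penalised by delay
    return 0.0                             # true negative
```

Under this assumption a correct alarm raised well before o writings costs almost nothing, while a correct alarm raised long after o costs nearly as much as a miss.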

experiments

Training: 403 control subjects, 83 depression subjects

Test: 352 control subjects, 54 depression subjects

Training

single doc representations: one per subject (e.g. John Doe, writings from 2/13/13 to 12/9/16; Jane Doe, writings from 3/23/13 to 2/19/15)

feature-based representations (tfidf weights), one labelled vector per subject:

1:0.4 2:0.5 ... +1
1:0.3 3:0.7 ... -1

depression language classifier: logistic regression (L1 regularisation)
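To make the representation step concrete, here is a minimal sketch of building one document per subject and weighting its terms with tf-idf. The exact tf-idf variant is not given in the slides; raw term frequency times log(N/df) and the simple tokenisation are assumptions, and the function names are hypothetical. The L1-regularised logistic regression itself is not implemented here; vectors like these would be its input.

```python
import math
import re
from collections import Counter


def single_doc(writings):
    """Concatenate all writings of one subject into a single document."""
    return " ".join(writings)


def tfidf(docs):
    """Return one {term: weight} dict per document (raw tf * log(N / df))."""
    tokenised = [re.findall(r"[a-z']+", d.lower()) for d in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenised for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors
```

With this variant, a term that appears in every subject's document gets weight 0, so only terms that discriminate between subjects carry weight.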

Test (352 control subjects, 54 depression subjects)

random (after 1st message)

decision = rand({0,1})

minority class (after 1st message)

decision = 1 (risk case)

first n

decision after seeing the first n writings (1, 2, ..., n)

dynamic

after each new writing (1, 2, ..., n, ...): confident about risk?

yes: we finish and predict 1 (risk case)

no: we wait and see more evidence...
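The four test-time strategies can be sketched as follows. Here `score` is a hypothetical callable that maps the writings seen so far to an estimated probability of risk (for instance, the output of the depression language classifier). The thresholds, and the choice to predict 0 when the dynamic strategy runs out of writings without becoming confident, are assumptions made for this illustration. Each function returns the decision d together with k, the number of writings seen.

```python
import random


def random_strategy(writings, rng=random):
    """Random decision emitted right after the first writing."""
    return rng.choice([0, 1]), 1


def minority_strategy(writings):
    """Always flag the minority class (risk case) after the first writing."""
    return 1, 1


def first_n_strategy(writings, n, score, threshold=0.5):
    """Classify once, using only the first n writings."""
    seen = writings[:n]
    return int(score(seen) >= threshold), len(seen)


def dynamic_strategy(writings, score, threshold=0.9):
    """Read writings one by one; stop as soon as the risk score is confident."""
    for k in range(1, len(writings) + 1):
        if score(writings[:k]) >= threshold:
            return 1, k            # confident about risk: finish and predict 1
    return 0, len(writings)        # assumed fallback: never confident, predict 0
```

The returned pair (d, k) is what an early-detection error measure needs: the decision and how long the system waited before making it.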

results

random/minority: poor F1 & ERDE

first n: good F1 but slow at detecting risk cases

dynamic: best balance between correctness & time

conclusions

new collection on depression & language

methodology for benchmark construction

temporal dimension

early risk detection algorithms (preliminary baselines)

Acknowledgements:

This research was funded by the Swiss National Science Foundation (project “Early risk prediction on the Internet: an evaluation corpus”, 2015)

We also thank the “Ministerio de Economía y Competitividad” of the Government of Spain & FEDER Funds (ref. TIN2015-64282-R)
