Dan Jurafsky, Lecture 8: Medical Applications: Intoxication, Depression, Trauma, Alzheimer's, General Medical Health. CS 424P / LINGUIST 287: Extracting Social Meaning and Sentiment

  • Slide 1
  • Slide 2
  • Dan Jurafsky, Lecture 8: Medical Applications: Intoxication, Depression, Trauma, Alzheimer's, General Medical Health. CS 424P / LINGUIST 287: Extracting Social Meaning and Sentiment
  • Slide 3
  • Topic 1: Intoxication
  • Slide 4
  • Hollien et al. 2001, Methods: 35 young adults (19 males, 16 females) were given a series of doses of alcohol, and speech was collected at 4 BAC stages: the Rainbow Passage, difficult words (buttercup, shapupie), and extemporaneous speech ("Tell us about your favorite TV program"), recorded with head-mounted mics. Investigated: F0 mean and variance; duration/rate of speech; intensity; disfluencies.
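As a rough illustration of two of the measures listed above, the sketch below computes F0 mean and variance over voiced frames and a words-per-second speaking rate. It assumes an F0 track from some pitch tracker in which unvoiced frames are coded as 0 Hz (a common convention, but an assumption here); the helper names are hypothetical, and this is not Hollien et al.'s measurement procedure.

```python
from statistics import mean, pvariance


def f0_stats(f0_track_hz):
    """Mean and (population) variance of F0 over voiced frames.

    Assumes unvoiced frames are coded as 0.0 Hz in the track.
    """
    voiced = [f for f in f0_track_hz if f > 0]
    return mean(voiced), pvariance(voiced)


def speaking_rate(n_words, duration_s):
    """Words per second for one recording (hypothetical helper)."""
    return n_words / duration_s


# Toy F0 track, not data from the study.
track = [0.0, 120.0, 125.0, 0.0, 130.0, 125.0]
print(f0_stats(track))          # (125.0, 12.5)
print(speaking_rate(30, 12.0))  # 2.5
```

Computing these per speaker at each BAC stage gives the kind of within-speaker comparison the results slides summarize.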
  • Slide 5
  • Hollien et al 2001 Results: F0
  • Slide 6
  • Hollien et al 2001 Results: Duration
  • Slide 7
  • Hollien et al 2001 Results: Disfluencies
  • Slide 8
  • Hollien et al 2001 Results: Magnitudes
  • Slide 9
  • Hollien et al 2001 Results: Speaker-Specific Effects. What did they find?
  • Slide 10
  • A famous case study Johnson, K., Pisoni, D. & Bernacki, R. (1990) Do voice recordings reveal whether a person is intoxicated?: A case study. Phonetica. 47: 215-237.
  • Slide 11
  • Exxon Valdez
  • Slide 12
  • Was Captain Hazelwood drunk? It is not clear this is relevant, since he was asleep below deck: the third mate was in charge of the wheelhouse, and the ship's radar was broken. But it is a well-studied case.
  • Slide 13
  • Johnson et al. examined 3 kinds of cues: segmental effects, disfluencies, and suprasegmental effects.
  • Slide 14
  • Keith Johnson's /s/ and /ʃ/
  • Slide 15
  • /ʃ/: Captain Hazelwood
  • Slide 16
  • Slide 17
  • Slide 18
  • Duration
  • Slide 19
  • F0
  • Slide 20
  • Summary
  • Slide 21
  • Questions: Johnson et al. examined various possible causes. What other kinds of speaker state could cause a drop in F0, slower speech, and disfluencies?
  • Slide 22
  • New Corpus! Alcohol Language Corpus (ALC), Florian Schiel et al. 2009, 2010 (http://www.bas.uni-muenchen.de/forschung/Bas/BasALCeng.html). 124 speakers, 11,160 recordings, recorded in a car (sometimes with the engine running): tongue twisters, command-and-control speech ("turn off the radio"), and spontaneous dialogue and monologue.
  • Slide 23
  • Automatic Classification: "Use of prosodic speech characteristics for automated detection of alcohol intoxication," Michael Levit, Richard Huber, Anton Batliner, Elmar Noeth. Utterances are broken into phrases automatically, based on fundamental frequency (where possible), zero-crossing rate, and energy.
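Frame-level zero-crossing rate and energy, the cues named above alongside F0, can be computed as in this sketch. The grouping of high-energy frames into runs is a hypothetical stand-in for phrase segmentation, not Levit et al.'s algorithm; frame length, hop, and threshold are illustrative values.

```python
def frame_signal(samples, frame_len, hop):
    """Split a list of samples into fixed-length frames, hop samples apart."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (0 counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def speech_runs(frames, energy_threshold):
    """Hypothetical heuristic: group consecutive high-energy frames into
    (start, end) frame ranges; low-energy frames act as pauses."""
    runs, start = [], None
    for i, frame in enumerate(frames):
        active = short_time_energy(frame) >= energy_threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(frames)))
    return runs


# Toy signal: a loud frame, a silent frame, a quieter frame.
signal = [1, -1, 1, -1, 0, 0, 0, 0, 0.5, -0.5, 0.5, -0.5]
frames = frame_signal(signal, frame_len=4, hop=4)
print([zero_crossing_rate(f) for f in frames])    # [1.0, 0.0, 1.0]
print(speech_runs(frames, energy_threshold=0.1))  # [(0, 1), (2, 3)]
```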
  • Slide 24
  • Then 4 classes of features are used: prosodic features (F0 max, F0 min, energy max, energy min, pause length, duration of voiced regions, unvoiced regions, etc.); jitter and shimmer; average cepstrum; and cepstral slope.
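Jitter and shimmer measure cycle-to-cycle instability in pitch period and in amplitude. The sketch uses the common "local" definitions (mean absolute difference between consecutive periods or peak amplitudes, divided by the mean), as in tools like Praat; whether Levit et al. used exactly these variants is an assumption, and the period and amplitude sequences are assumed to come from a pitch-mark extractor not shown here.

```python
from statistics import mean


def local_jitter(periods):
    """Mean absolute difference between consecutive pitch periods,
    divided by the mean period (dimensionless)."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)


def local_shimmer(amplitudes):
    """The same measure applied to the peak amplitudes of consecutive periods."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(diffs) / mean(amplitudes)


# Toy values, not data from Levit et al.; periods in milliseconds.
print(round(local_jitter([10, 11, 10, 11]), 4))       # 0.0952
print(round(local_shimmer([1.0, 0.8, 1.0, 0.8]), 4))  # 0.2222
```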
  • Slide 25
  • Methods: Alcoholized speech samples were collected at the Police Academy of Hessen, Germany: 120 readings (87 minutes) of a fable by 33 male speakers, with BAC between 0 and .24/mille. Binary task: above or below 0.8/mille. Leave-one-out cross-validation; neural net classifier.
  • Slide 26
  • Results of Levit et al.: A dev set was used to find the best classifier, which used two feature classes: prosodic features and jitter/shimmer. With this classifier: 62% accuracy per phrase, and 69% for the whole speech sample, by voting over its phrases.
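The step from 62% per phrase to 69% per sample can be sketched as a majority vote over phrase-level decisions, together with the 0.8 per mille labeling rule of the binary task. The phrase classifier itself (a neural net in Levit et al.) is not shown; the tie-breaking rule and treating a BAC of exactly 0.8 as "sober" are assumptions.

```python
from collections import Counter

THRESHOLD_PER_MILLE = 0.8  # decision boundary of the binary task on the slide


def bac_label(bac_per_mille):
    """Gold label for one recording; treating exactly 0.8 as 'sober' is an assumption."""
    return "intoxicated" if bac_per_mille > THRESHOLD_PER_MILLE else "sober"


def vote(phrase_labels, tie_label="intoxicated"):
    """Majority vote over phrase-level decisions; the tie-breaking rule is hypothetical."""
    counts = Counter(phrase_labels)
    top = max(counts.values())
    winners = [label for label, c in counts.items() if c == top]
    return winners[0] if len(winners) == 1 else tie_label


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


print(vote(["intoxicated", "sober", "intoxicated"]))  # intoxicated
print(bac_label(1.1), bac_label(0.5))                 # intoxicated sober
```

Voting helps when phrase-level errors are roughly independent: a sample is misclassified only if most of its phrases are.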
  • Slide 27
  • Automatic detection features in the Bavarian corpus. Humans: 62%-75%. Machine features used to date: F0, duration, rhythm (correlated with duration but doesn't require word transcripts), formants (F1 mean and F4 variance). Future work! Disfluencies; other segmental features, such as s versus sh. But Schiel's finding: more hyperarticulation in vowels in women in their corpus.
  • Slide 28
  • Topic 2: Depression
  • Slide 29
  • Stirman and Pennebaker: Suicidal poets. 300 poems from the early, middle, and late periods of 9 suicidal poets and 9 non-suicidal poets.
  • Slide 30
  • Stirman and Pennebaker: 2 models. Durkheim disengagement model: the suicidal individual has failed to integrate sufficiently into society and is detached from social life; such individuals detach from the source of their pain, withdraw from social relationships, and become more self-oriented. Prediction: more self-references, fewer group references. Hopelessness model: suicide takes place during extended periods of sadness and desperation, with pervasive feelings of helplessness and thoughts of death. Prediction: more negative emotion words, fewer positive emotion words, more references to death.
  • Slide 31
  • Methods: 156 poems from 9 poets who committed suicide; the poets were published, well known in English, and had written within 1 year of committing suicide. Control poets were matched for nationality, education, sex, and era.
  • Slide 32
  • The poets
  • Slide 33
  • Stirman and Pennebaker: Results
  • Slide 34
  • Significant factors. Disengagement theory: I, me, mine; we, our, ours. Hopelessness theory: death, grave. Other: sexual words (lust, breast).
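LIWC-style analysis comes down to counting what percentage of a text's words fall into each category. This sketch uses only the example words on this slide as stand-in categories; the actual LIWC dictionaries are far larger, commercially licensed, and also match word stems, and the regex tokenizer is a simplifying assumption.

```python
import re

# Only the example words shown on this slide, not the real LIWC categories.
CATEGORIES = {
    "i_me_mine": {"i", "me", "mine"},
    "we_our_ours": {"we", "our", "ours"},
    "death": {"death", "grave"},
}


def category_percentages(text):
    """Percentage of word tokens falling into each category, LIWC-style."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {cat: 0.0 for cat in CATEGORIES}
    return {cat: 100 * sum(t in words for t in tokens) / len(tokens)
            for cat, words in CATEGORIES.items()}


print(category_percentages("I lie by the grave and we wait for death"))
# {'i_me_mine': 10.0, 'we_our_ours': 10.0, 'death': 20.0}
```

Percentages rather than raw counts let poems of different lengths be compared directly.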
  • Slide 35
  • Rude et al.: "Language use of depressed and depression-vulnerable college students." Beck (1967), cognitive theory of depression: depression-prone individuals see the world and themselves in pervasively negative terms. Pyszczynski and Greenberg (1987): after the loss of a central source of self-worth, people think about themselves and are unable to exit a self-regulatory cycle concerned with efforts to regain what was lost, which results in self-focus and self-blame. Durkheim, social integration/disengagement: perceiving oneself as not integrated into society is key to suicidality and possibly depression.
  • Slide 36
  • Methods: College freshmen: 31 currently depressed (by standard inventories), 26 formerly depressed, 67 never depressed. Session 1: take a depression inventory. Session 2: write an essay: "Please describe your deepest thoughts and feelings about being in college. Write continuously off the top of your head. Don't worry about grammar or spelling. Just write continuously."
  • Slide 37
  • Results: depressed students used more I/me than never-depressed students (this turned out to be only I), and they used more negative emotion words. There were not enough uses of we to test the Durkheim model. Formerly depressed participants used more I in the last third of the essay.
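The last-third finding implies a positional analysis: split each essay into thirds and compute the rate of I in each part. A minimal sketch, assuming a simple regex tokenizer and counting only the word I, as the slide reports; the function name is hypothetical.

```python
import re


def i_rate_by_third(text):
    """Percentage of tokens that are 'i' in each third of an essay."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    bounds = [0, n // 3, 2 * n // 3, n]
    rates = []
    for start, end in zip(bounds, bounds[1:]):
        chunk = tokens[start:end]
        rates.append(100 * chunk.count("i") / len(chunk) if chunk else 0.0)
    return rates


print(i_rate_by_third("College feels big, classes feel hard, but I miss home"))
# [0.0, 0.0, 25.0]
```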
  • Slide 38
  • Ramirez-Esparza et al.: Depression in English and Spanish. Study 1: LIWC counts on 320 posts from English and Spanish forums: 80 posts each from depression forums in English and Spanish, and 80 control posts each from breast cancer forums. LIWC categories used: I, we, negative emotion, positive emotion.
  • Slide 39
  • Results of Study 1
  • Slide 40
  • Conclusions?
  • Slide 41
  • Study 2: From depression forums, 404 English posts and 404 Spanish posts. Create a term-by-document matrix of content words (the 200 most frequent content words). Do a factor analysis (dimensionality reduction of the term-document matrix), using 5 factors.
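Building the term-by-document matrix of Study 2 can be sketched as follows: keep the N most frequent content words (200 on the slide) and count each one in each post. The stopword list is a hypothetical placeholder for a real content-word filter (Spanish would need its own), and the factor analysis that follows (5 factors on the slide) is not shown; an SVD-based reduction like the one later in this lecture is one way to do that step.

```python
import re
from collections import Counter

# Hypothetical miniature English stopword list standing in for a real filter.
STOPWORDS = {"a", "an", "and", "the", "i", "me", "my", "of", "to", "in", "is", "it", "so"}


def content_tokens(text):
    """Letter-only tokens (accented letters included), minus stopwords."""
    return [t for t in re.findall(r"[^\W\d_]+", text.lower()) if t not in STOPWORDS]


def term_document_matrix(docs, top_n=200):
    """Rows: the top_n most frequent content words; columns: documents."""
    per_doc = [Counter(content_tokens(d)) for d in docs]
    totals = sum(per_doc, Counter())
    vocab = [word for word, _ in totals.most_common(top_n)]
    matrix = [[counts[word] for counts in per_doc] for word in vocab]
    return vocab, matrix


posts = ["I feel sad and tired", "sad sad day", "tired of feeling sad"]
print(term_document_matrix(posts, top_n=2))
# (['sad', 'tired'], [[1, 2, 1], [1, 0, 1]])
```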
  • Slide 42
  • English Factors
  • Slide 43
  • Spanish Factors
  • Slide 44
  • Implications? Problems? New applications?
  • Slide 45
  • Topic 3: Trauma
  • Slide 46
  • Cohn, Mehl, Pennebaker: "Linguistic Markers of Psychological Change Surrounding September 11, 2001." 1,084 LiveJournal users; all blog entries for 2 months before and after 9/11. The prior two months were lumped into one baseline corpus, and changes after 9/11 were measured against that baseline using LIWC categories.
  • Slide 47
  • Variables examined. Emotional positivity: the difference between LIWC scores for positive emotion words (happy, good, nice) and negative emotion words (kill, ugly, guilty). Cognitive processing: think, question, because; words concerned with organizing and intellectually understanding issues. Social orientation: talk, share, friends, and personal pronouns other than I/me (essentially a count of references to other people).
  • Slide 48
  • Last factor: Psychological Distancing. A factor-analytically derived composite: + articles, + words longer than 6 letters, - I/me/mine, - would/should/could, - present-tense verbs. Low score: personal, experiential language, focused on the here and now. High score: abstract, impersonal, rational tone.
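Both composite scores can be computed from per-entry LIWC percentages. Emotional positivity follows the slide directly (positive minus negative). For psychological distancing the slide gives only the sign of each component; this sketch assumes each component is standardized (z-scored) across entries and the signed z-scores are summed, a common way to build such composites that may not match Cohn et al.'s exact weighting. The component keys and the percentages are hypothetical toy values.

```python
from statistics import mean, pstdev


def emotional_positivity(pos_pct, neg_pct):
    """LIWC positive-emotion percentage minus negative-emotion percentage."""
    return pos_pct - neg_pct


# Sign of each component in the distancing composite, from the slide.
DISTANCING_SIGNS = {
    "articles": +1,
    "long_words": +1,          # words longer than 6 letters
    "i_me_mine": -1,
    "would_should_could": -1,
    "present_tense": -1,
}


def zscores(values):
    """Standardize values; a constant component contributes 0."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]


def distancing_scores(entries):
    """Assumed composite: sum of signed z-scores, standardized across entries."""
    totals = [0.0] * len(entries)
    for component, sign in DISTANCING_SIGNS.items():
        for i, z in enumerate(zscores([e[component] for e in entries])):
            totals[i] += sign * z
    return totals


entries = [
    {"articles": 8, "long_words": 20, "i_me_mine": 2, "would_should_could": 0.5, "present_tense": 5},
    {"articles": 4, "long_words": 12, "i_me_mine": 8, "would_should_could": 1.5, "present_tense": 10},
]
print(distancing_scores(entries))      # [5.0, -5.0]
print(emotional_positivity(3.5, 1.5))  # 2.0
```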
  • Slide 49
  • Results
  • Slide 50
  • Implications? Methodological problems? Ideas for exciting new studies?
  • Slide 51
  • Topic 4: Alzheimer's
  • Slide 52
  • The Nun Study. "Linguistic Ability in Early Life and the Neuropathology of Alzheimer's Disease and Cerebrovascular Disease: Findings from the Nun Study," D.A. Snowdon, L.H. Greiner, and W.R. Markesbery. The Nun Study is a longitudinal study of aging and Alzheimer's disease: cognitive and physical function were assessed annually, and all participants agreed to brain donation at death. At the first exam, given between 1991 and 1993, the 678 participants were 75 to 102 years old. This study: a subset of 74 participants for whom we had handwritten autobiographies from early life, all of whom had died.
  • Slide 53
  • The data: In September 1930, the leader of the School Sisters of Notre Dame religious congregation requested that each sister write a short sketch of her own life: "This account should not contain more than two to three hundred words and should be written on a single sheet of paper... include the place of birth, parentage, interesting and edifying events of one's childhood, schools attended, influences that led to the convent, religious life, and its outstanding events." Handwritten diaries were found in two participating convents, in Baltimore and Milwaukee.
  • Slide 54
  • The linguistic analysis. Grammatical complexity: the Developmental Level metric (Cheung/Kemper), in which sentences are classified from 0 (simple one-clause sentences) to 7 (complex sentences with multiple embedding and subordination). Idea density: the average number of ideas expressed per 10 words. Ideas are elementary propositions, typically a verb, adjective, adverb, or prepositional phrase; complex propositions that stated or inferred causal, temporal, or other relationships between ideas were also counted. Prior studies suggest that idea density is associated with educational level, vocabulary, and general knowledge, while grammatical complexity is associated with working memory, performance on speeded tasks, and writing skill.
  • Slide 55
  • Idea density example: "I was born in Eau Claire, Wis., on May 24, 1913 and was baptized in St. James Church." Propositions: (1) I was born, (2) born in Eau Claire, Wis., (3) born on May 24, 1913, (4) I was baptized, (5) was baptized in church, (6) was baptized in St. James Church, (7) I was born...and was baptized. There are 18 words in the sentence, so its idea density is 7/18 × 10 ≈ 3.9 ideas per 10 words.
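The idea-density arithmetic is straightforward once propositions are counted. The counting was done by hand, so this sketch takes the proposition count as an input and only automates the word count and the ratio; whitespace tokenization is an assumption that happens to give the slide's 18 words for this sentence.

```python
def idea_density(n_propositions, n_words):
    """Ideas (propositions) per 10 words."""
    return n_propositions / n_words * 10


sentence = ("I was born in Eau Claire, Wis., on May 24, 1913 "
            "and was baptized in St. James Church.")
n_words = len(sentence.split())  # whitespace tokens
n_props = 7                      # counted by hand, as on the slide
print(n_words, round(idea_density(n_props, n_words), 1))  # 18 3.9
```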
  • Slide 56
  • Results: participants with neuropathologically defined Alzheimer's disease had lower idea density scores than non-Alzheimer's participants. Correlations between idea density scores and mean neurofibrillary tangle counts: 0.59 for the frontal lobe, 0.48 for the temporal lobe, 0.49 for the parietal lobe.
  • Slide 57
  • Explanations? Early studies found the same results in a college-educated subset of the population who were teachers, suggesting that education was not the key factor. The authors suggest: low linguistic ability in early life may reflect suboptimal neurological and cognitive development, which might increase susceptibility to the development of Alzheimer's disease pathology in late life.
  • Slide 58
  • Garrard et al. 2005: British writer Iris Murdoch; last novel published 1995; diagnosed with Alzheimer's 1997. Compared three novels: Under the Net (first), The Sea, The Sea (in her prime), and Jackson's Dilemma (final novel). All her books were written in longhand with little editing.
  • Slide 59
  • Type to token ratio in the 3 novels
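Type-token ratio (distinct words divided by total words) is simple to compute, but it falls as texts get longer, so comparisons across texts of different lengths are usually made on equal-sized chunks. This sketch shows both the plain ratio and a mean segmental TTR over fixed, non-overlapping windows; the windowing is a common length control, not necessarily what Garrard et al. did, and the regex tokenizer is an assumption. The sample sentences come from the Jackson's Dilemma excerpt quoted in this lecture.

```python
import re


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(toks):
    """Distinct word types divided by total tokens."""
    return len(set(toks)) / len(toks)


def mean_segmental_ttr(toks, window):
    """Average TTR over consecutive non-overlapping windows;
    a leftover partial window is dropped."""
    chunks = [toks[i:i + window] for i in range(0, len(toks) - window + 1, window)]
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)


t = tokens("He had seen her die. He had no other siblings.")
print(type_token_ratio(t), mean_segmental_ttr(t, window=5))  # 0.8 1.0
```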
  • Slide 60
  • Syntactic Complexity
  • Slide 61
  • Mean proportions of usages of the 10 most frequently occurring words in each book that appear twice within a series of short intervals, ranging from consecutive positions in the text to a separation of three intervening words. (Garrard P et al. Brain 2005;128:250-260. Brain Vol. 128 No. 2, © Guarantors of Brain 2004; all rights reserved.)
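One plausible reading of the measure in this caption: for each of the k most frequent words, the share of its occurrences that recur after 0, 1, 2, or 3 intervening words, averaged over the k words. This is a sketch of that reading, not Garrard et al.'s code; whitespace tokenization and the normalization (dividing by each word's total count) are assumptions.

```python
from collections import Counter


def repetition_profile(tokens, top_k=10, max_gap=3):
    """For gaps of 0..max_gap intervening words, the mean (over the top_k
    most frequent words) of the share of each word's occurrences that
    recur at that distance."""
    counts = Counter(tokens)
    top = [word for word, _ in counts.most_common(top_k)]
    profile = []
    for gap in range(max_gap + 1):
        distance = gap + 1  # 0 intervening words = adjacent positions
        shares = []
        for word in top:
            hits = sum(1 for i in range(len(tokens) - distance)
                       if tokens[i] == word and tokens[i + distance] == word)
            shares.append(hits / counts[word])
        profile.append(sum(shares) / len(shares))
    return profile


print(repetition_profile("the cat the the dog the".split(), top_k=1))
# [0.25, 0.5, 0.5, 0.0]
```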
  • Slide 62
  • Parts of speech
  • Slide 63
  • Comparative distributions of values of: (A) frequency and (B) word length in the three books. Garrard P et al. Brain 2005;128:250-260
  • Slide 64
  • From Under the Net, 1954: "So you may imagine how unhappy it makes me to have to cool my heels at Newhaven, waiting for the trains to run again, and with the smell of France still fresh in my nostrils. On this occasion, too, the bottles of cognac, which I always smuggle, had been taken from me by the Customs, so that when closing time came I was utterly abandoned to the torments of a morbid self-scrutiny." From Jackson's Dilemma, 1995: "His beautiful mother had died of cancer when he was 10. He had seen her die. When he heard his father's sobs he knew. When he was 18, his younger brother was drowned. He had no other siblings. He loved his mother and his brother passionately. He had not got on with his father. His father, who was rich and played at being an architect, wanted Edward to be an architect too. Edward did not want to be an architect."
  • Slide 65
  • Lancashire and Hirst: "Vocabulary Changes in Agatha Christie's Mysteries as an Indication of Dementia: A Case Study," Ian Lancashire and Graeme Hirst 2009
  • Slide 66
  • Slide 67
  • "Vocabulary Changes in Agatha Christie's Mysteries as an Indication of Dementia: A Case Study," Ian Lancashire and Graeme Hirst 2009. Examined all of Agatha Christie's novels. Features (see Nicholas, M., Obler, L. K., Albert, M. L., Helm-Estabrooks, N. (1985). Empty speech in Alzheimer's disease and fluent aphasia. Journal of Speech and Hearing Research, 28: 405-410): number of unique word types; number of different repeated n-grams up to 5; number of occurrences of thing, anything, and something.
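The three features listed above can be sketched as simple counts over a tokenized text. Two choices are assumptions rather than details from the slide: which n-gram sizes count (here 2 to 5) and "repeated" meaning "occurs more than once". The function name is hypothetical, and a real study would compute these on equal-sized samples from each novel.

```python
import re
from collections import Counter

VAGUE_WORDS = {"thing", "anything", "something"}


def vocabulary_markers(text, min_n=2, max_n=5):
    """Word types, distinct repeated n-grams (min_n..max_n), and vague-word count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    repeated = 0
    for n in range(min_n, max_n + 1):
        grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        repeated += sum(1 for c in grams.values() if c > 1)
    return {
        "types": len(set(tokens)),
        "repeated_ngrams": repeated,
        "vague_words": sum(t in VAGUE_WORDS for t in tokens),
    }


print(vocabulary_markers("the thing is the thing and something"))
# {'types': 5, 'repeated_ngrams': 1, 'vague_words': 3}
```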
  • Slide 68
  • Slide 69
  • Results
  • Slide 70
  • Topic 5: Writing and physical health. People asked to write about traumatic experiences subsequently exhibit better physical health than people asked to write about superficial topics. Intuition: people who write about emotional topics report that the experiment makes them think differently about their experience. Hypothesis: do changes in writing style correlate with improved health? Could we find these changes automatically?
  • Slide 71
  • Singular Value Decomposition: Singular Value Decomposition (SVD) is a form of factor analysis. Any $m \times n$ matrix $A$ (here with $m \ge n$) can be written using an SVD of the form $A = U D V^T$, where $U$ is an $m \times n$ matrix (a "hanger" matrix), $D$ is an $n \times n$ diagonal matrix (a "stretcher" matrix), and $V^T$ is an $n \times n$ matrix (an "aligner" matrix). (See http://www.uwlax.edu/faculty/will/svd/index.html)
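NumPy's `numpy.linalg.svd` computes this factorization; with `full_matrices=False` it returns the "thin" shapes given above (U is m × n, and the singular values come back as a 1-D array that forms the diagonal of D). The small matrix here is arbitrary; the sketch only checks the shapes and that the product reconstructs A.

```python
import numpy as np

# An arbitrary 3 x 2 matrix (m = 3, n = 2).
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Thin SVD: U is m x n ("hanger"), d holds the n singular values
# ("stretcher" diagonal), Vt is n x n ("aligner").
U, d, Vt = np.linalg.svd(A, full_matrices=False)
D = np.diag(d)

print(U.shape, D.shape, Vt.shape)  # (3, 2) (2, 2) (2, 2)
print(np.allclose(U @ D @ Vt, A))  # True
```

NumPy returns the singular values sorted in descending order, which is what makes "keep the first k dimensions" a meaningful truncation.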
  • Slide 72
  • Application of SVD to LSA: Assemble a large corpus of natural language. Parse the corpus into meaningful passages. Form a matrix with passages as rows and words as columns. Apply SVD to re-represent the words and passages as vectors in a high-dimensional semantic space.
  • Slide 73
  • SVD: an example (1). Titles of technical memos: c1: Human machine interface for ABC computer applications; c2: A survey of user opinion of computer system response time; c3: The EPS user interface management system; c4: System and human system engineering testing of EPS; c5: Relation of user perceived response time to error measurement; m1: The generation of random, binary, ordered trees; m2: The intersection graph of paths in trees; m3: Graph minors IV: Widths of trees and well-quasi-ordering; m4: Graph minors: A survey
  • Slide 74
  • LSA: This example is taken from Deerwester, S., Dumais, S.T., Landauer, T.K., Furnas, G.W. and Harshman, R.A. (1990). "Indexing by latent semantic analysis." Journal of the American Society for Information Science, 41(6), 391-407. Slides are from a presentation by Tom Landauer and Peter Foltz, adapted by Melanie Martin.
  • Slide 75
  • A Small Example: Technical Memo Titles (the same nine titles, c1-c5 and m1-m4)
  • Slide 76
  • A Small Example 2: r(human.user) = -.38, r(human.minors) = -.29
  • Slide 77
  • A Small Example 3: Singular Value Decomposition: $A = U S V^T$. Dimension reduction: $\tilde{A} \approx \tilde{U} \tilde{S} \tilde{V}^T$.
  • Slide 78
  • A Small Example 4: the matrix {U}
  • Slide 79
  • A Small Example 5: the matrix { S }
  • Slide 80
  • A Small Example 6: the matrix {V}
  • Slide 81
  • A Small Example 7: r(human.user) = .94, r(human.minors) = -.83
  • Slide 82
  • A Small Example 2 reprise: r(human.user) = -.38, r(human.minors) = -.29
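The memo example can be reproduced in a few lines: build the 12-term by 9-title count matrix, keep the top k = 2 singular dimensions, and compare term-term correlations before and after. This sketch uses NumPy and raw counts with no term weighting. It puts terms in rows and titles in columns (as in the original example), the transpose of the passages-as-rows layout described in the LSA application slide, which only swaps the roles of U and V. The raw-count correlations match the slide's -.38 and -.29, and the rank-2 values should come out close to the reported .94 and -.83.

```python
import re
import numpy as np

TITLES = [
    "Human machine interface for ABC computer applications",          # c1
    "A survey of user opinion of computer system response time",      # c2
    "The EPS user interface management system",                       # c3
    "System and human system engineering testing of EPS",             # c4
    "Relation of user perceived response time to error measurement",  # c5
    "The generation of random, binary, ordered trees",                # m1
    "The intersection graph of paths in trees",                       # m2
    "Graph minors IV: Widths of trees and well-quasi-ordering",       # m3
    "Graph minors: A survey",                                         # m4
]
# The 12 index terms used in the original example.
TERMS = ["human", "interface", "computer", "user", "system", "response",
         "time", "eps", "survey", "trees", "graph", "minors"]


def term_document_matrix(titles, terms):
    """Raw counts: one row per term, one column per title."""
    docs = [re.findall(r"[a-z]+", t.lower()) for t in titles]
    return np.array([[doc.count(term) for doc in docs] for term in terms], dtype=float)


def rank_k_approximation(A, k):
    """Keep only the k largest singular dimensions of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def r(M, a, b):
    """Pearson correlation between the rows for terms a and b."""
    return np.corrcoef(M[TERMS.index(a)], M[TERMS.index(b)])[0, 1]


A = term_document_matrix(TITLES, TERMS)
A2 = rank_k_approximation(A, k=2)
for a, b in [("human", "user"), ("human", "minors")]:
    print(a, b, round(r(A, a, b), 2), round(r(A2, a, b), 2))
```

"human" and "user" never co-occur in a title, so their raw correlation is negative; in the reduced space they correlate because they share contexts (computer, interface, system).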
  • Slide 83
  • Pennebaker results: Pronouns: I, my, it, you, me, she, he, her, we, they, your, him, his, them, our, myself, their, us, its
  • Slide 84
  • Implications?