Unstructured Data and the Role of Natural Language Processing Philip Resnik Department of Linguistics and Institute for Advanced Computer Studies University of Maryland Wolfram Data Summit September 7, 2012


Page 1: Unstructured Data and the Role of Natural Language Processing

Unstructured Data and the Role of Natural Language Processing

Philip Resnik
Department of Linguistics and Institute for Advanced Computer Studies
University of Maryland

Wolfram Data Summit
September 7, 2012

Page 2: Unstructured Data and the Role of Natural Language Processing

Hiya, guys. What did you think of Obama’s speech last night? I think I liked Michelle’s better.

Beep.

I didn’t watch it. I was playing a nice game of chess.

♬ Daisy, Daisy…

Page 4: Unstructured Data and the Role of Natural Language Processing

Sources: graph adapted from Church, K. (2003) “Speech and Language Processing: Where have we been and where are we going,” Eurospeech, Geneva, Switzerland. Green circle data have been added from figures in Cardie and Mooney (1999).

[Figure: The statistical revolution in NLP. Percentage of “statistical” papers at the Annual Meeting of the Association for Computational Linguistics, 1985 to 2005 (vertical axis 0% to 100%), with the “AI Winter” marked.]

Page 5: Unstructured Data and the Role of Natural Language Processing

NLP is no longer about getting machines to understand language like people do.

It’s about building machines that do things with language that people find useful.

Page 6: Unstructured Data and the Role of Natural Language Processing

Surface methods

Page 7: Unstructured Data and the Role of Natural Language Processing

Surface methods plus categories

Pennebaker, Linguistic Inquiry and Word Count, http://www.liwc.net/tryonlineresults.php
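
A minimal sketch of what "surface methods plus categories" can look like in practice: count how many words of a text fall into each category of a word list. The two-category lexicon below is invented purely for illustration; it is not the LIWC dictionary, and the function name is hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for this sketch, NOT the LIWC dictionary).
TOY_LEXICON = {
    "positive_emotion": {"nice", "liked", "better", "good"},
    "negative_emotion": {"weak", "tired", "terrible"},
}

def category_counts(text, lexicon=TOY_LEXICON):
    """Count how many words in `text` belong to each lexicon category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, members in lexicon.items():
            if word in members:
                counts[category] += 1
    return counts

example = "I think I liked Michelle's better. I was playing a nice game of chess."
print(category_counts(example))
```

Nothing here looks at word order or syntax, which is why such counts are cheap to compute and also why they can be fooled by ambiguity.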

Page 8: Unstructured Data and the Role of Natural Language Processing

Surface methods plus categories

Brendan O’Connor, Ramnath Balasubramanyan, Bryan R. Routledge, Noah A. Smith, From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series, Proceedings of the International AAAI Conference on Weblogs and Social Media, Washington, DC, May 2010.

Page 9: Unstructured Data and the Role of Natural Language Processing

Surface methods plus categories

*Note that Noah Smith did point out this ambiguity!

Page 10: Unstructured Data and the Role of Natural Language Processing

Surface methods plus hidden structure

http://www.statmt.org/moses/?n=Moses.Background

natuerlich hat john spass am spiel

of course john has fun with the game

Page 11: Unstructured Data and the Role of Natural Language Processing

One morning I shot an elephant in my pajamas.

How he got in my pajamas, I don’t know.
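
The joke depends on the sentence having two syntactic structures. As a rough sketch, the snippet below counts parses with a CKY-style chart over a toy grammar in Chomsky normal form. The grammar and lexicon are assumptions made for this illustration (and "One morning" is dropped to keep the grammar small); they are not taken from the talk.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("PP", "P", "NP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # "an elephant [in my pajamas]"
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # "[shot an elephant] [in my pajamas]"
]
# Toy lexicon: word -> possible categories.
LEXICON = {
    "i": ["NP"], "shot": ["V"], "an": ["Det"], "my": ["Det"],
    "elephant": ["N"], "pajamas": ["N"], "in": ["P"],
}

def count_parses(words, root="S"):
    """Return the number of distinct parse trees rooted in `root`."""
    n = len(words)
    # chart[i][j][X] = number of trees with root X spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[i][i + 1][tag] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][root]

print(count_parses("i shot an elephant in my pajamas".split()))
```

One parse attaches "in my pajamas" to the shooting, the other to the elephant; a purely surface-level method sees the same word string in both cases.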

Page 12: Unstructured Data and the Role of Natural Language Processing

HPI: Atrial fibrillation. This patient is a 56-year-old white gentleman who has had a history of atrial fib on and off since he had his bypass surgery. Patient was originally diagnosed with coronary artery disease as well as mitral valve problems approximately 3 years ago. Dr. Tirona used to take care of him at that time. He had a bypass surgery as well as mitral valve repair done at that time. Postop he had an episode of A-fib which then resolved spontaneously. He remembers somebody talking to him about cardioversion, but then the A-fib resolved spontaneously. So he was started on Coumadin. He would get some occasional episodes, but usually they are very brief, so he never bothered about them. Of late, over the last few months, he has been getting more frequent episodes and duration of these episodes is also prolonged for a few hours. So he saw Dr. Hagan who has referred him here for further evaluation and treatment. The patient states when he does get the A-fib, he feels very weak, tired, and short of breath. He denies any chest pain. Otherwise he is usually very active physically, he works fulltime as an electrician, and has not had any problems as far as doing his day-to-day work.

MEDICAL HISTORY: 1. Coronary artery disease as mentioned above. 2. Hypertension. 3. Hypercholesterolemia.

IMPRESSION: Paroxysmal atrial fibrillation in a patient with prior mitral valve disease, currently having more frequent breakthroughs symptoms.

Extracting structured information

Page 13: Unstructured Data and the Role of Natural Language Processing

Diagnosis/Problem     Anatomy          Modifier    Type
fibrillation          atrial           -           -
fibrillation          atrial           paroxysmal  -
disease               mitral_valve     -           history
fibrillation          atrial           -           history
problem               mitral_valve     -           -
weak                  -                -           -
tired                 -                -           -
short_of_breath       -                -           -
disease               coronary_artery  -           history
hypertension          -                -           history
hypercholesterolemia  -                -           -

Extracting structured information

Page 14: Unstructured Data and the Role of Natural Language Processing

ICD codes: 427.31, 427.31, 427.31, 780.79, 780.79, 786.05, 272.0, 401.9, 394.9, 414.01

427.31  Atrial fibrillation
394.9   Other and unspecified mitral valve diseases

Extracting structured information
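
Once findings have been extracted as structured rows, assigning codes can be sketched as a table lookup. The snippet below is a deliberately simplified illustration: only the two code descriptions shown on the slide are included, the (diagnosis, anatomy) keys are an assumed representation, and a real system would need the full code set plus handling of modifiers and history.

```python
# Code descriptions as they appear on the slide (only these two are given there).
ICD_DESCRIPTIONS = {
    "427.31": "Atrial fibrillation",
    "394.9": "Other and unspecified mitral valve diseases",
}

# Hypothetical lookup from (diagnosis, anatomy) pairs to codes; the key format
# is an assumption made for this sketch.
CONCEPT_TO_CODE = {
    ("fibrillation", "atrial"): "427.31",
    ("disease", "mitral_valve"): "394.9",
}

def assign_codes(findings):
    """Attach an ICD code (or None when unknown) to each (diagnosis, anatomy) pair."""
    return [(diag, anat, CONCEPT_TO_CODE.get((diag, anat))) for diag, anat in findings]

findings = [("fibrillation", "atrial"), ("disease", "mitral_valve"), ("weak", "-")]
for diag, anat, code in assign_codes(findings):
    print(diag, anat, code, ICD_DESCRIPTIONS.get(code, "(no description in this sketch)"))
```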

Page 15: Unstructured Data and the Role of Natural Language Processing

To react to this talk on your phone or laptop while you listen: visit go.reactlabs.org and select #EHR.

Page 16: Unstructured Data and the Role of Natural Language Processing

Medical coders without NLP Medical coders with NLP

Page 17: Unstructured Data and the Role of Natural Language Processing

Dr. Martin S. Kohn | Clinical Decision Support: DeepQA, http://www.im.org/Meetings/Past/2012/2012APMWinterMeeting/Documents/Winter%20Meeting%20Presentations/PS%20II_Kohn.pdf

Page 18: Unstructured Data and the Role of Natural Language Processing

Discovering structure

05_03_02.txt.0002 BEGALA Good evening. Welcome to CROSSFIRE, coming to you live from the George Washington University in beautiful downtown Washington, D.C. Tonight in the CROSSFIRE, the case of the Reverend Paul Shanley, the Roman Catholic priest facing child rape charges in Massachusetts. Should his superiors be held responsible? Also, Matt Drudge, founder of the Internet "Drudge Report." Is he a right-wing muckraker, an Internet gossip or a legitimate journalist? We'll ask Drudge himself when we get him in the CROSSFIRE. First, flying the not-so-friendly skies, would you feel safer if pilots were armed? One outspoken congressional critic is against having guns in the cockpit. We're going to introduce her now. Please welcome, Eleanor Holmes Norton, the Democratic delegate from the District of Columbia. Ms. Norton, thank you. Welcome back.

05_03_02.txt.0003 CARLSON Now, Ms. Norton, the majority, the vast majority of commercial airline pilots are strongly in favor of carrying guns in the cockpit on commercial airliners. You're against it. What do you as a delegate know about operating a commercial airliner that the majority of commercial airline pilots don't know?

05_03_02.txt.0004 DELEGATE Well, I know what Transportation Secretary Norm Mineta tells me, and I know what Homeland Security Adviser Tom Ridge tells me, and they are against it. And I think the reason they are against it is you don't want the guy who's flying one of these big busters up there also with a gun in his hand trying to protect his plane. You want air marshals to do that. You want flight attendants to understand how to protect the cockpit. And you want the redundancies that we have built in, redundancy after redundancy, working for you. We are panicking the American people. They say, oh my God, I thought they had the hearings, I thought they did that. Here come the pilots saying, oh no, they haven't. We've got to have guns.

Looking at just word counts often gives you a mish-mash.
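
To see the mish-mash concretely, here is a small sketch that counts words in a short excerpt of the delegate's answer. The tokenizer (a lowercase regular expression) and the function name are assumptions chosen for brevity, not the method used in the talk.

```python
import re
from collections import Counter

def top_words(text, n=5):
    """Return the n most frequent lowercase word tokens in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

excerpt = (
    "You want air marshals to do that. You want flight attendants to "
    "understand how to protect the cockpit. And you want the redundancies "
    "that we have built in, redundancy after redundancy, working for you."
)
print(top_words(excerpt))
```

The most frequent items are function words such as "you" and "to"; the words that signal what the passage is about sit much lower in the list.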

Page 19: Unstructured Data and the Role of Natural Language Processing

Discovering structure

Bayesian topic models* discover the distinct topics interwoven in documents.

*Wikipedia: Topic Model; Blei et al. 2003
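
For readers who want to see the idea in code, the following is a compact, unoptimized sketch of collapsed Gibbs sampling for latent Dirichlet allocation, one common way to fit the kind of topic model cited above. It is not the implementation behind these slides; the toy documents, hyperparameters (`alpha`, `beta`), and function name are all assumptions for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Toy documents built from words in the transcript above (illustrative only).
toy_docs = [
    "pilots guns cockpit pilots guns".split(),
    "guns cockpit pilots airliner".split(),
    "priest charges superiors priest".split(),
    "charges priest superiors responsible".split(),
]
doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(counts, key=counts.get, reverse=True)[:3]
    print("topic", k, top)
```

On a corpus this small the result is only suggestive; the point is that topic assignments are hidden variables inferred from word co-occurrence rather than read off from raw counts.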

Page 20: Unstructured Data and the Role of Natural Language Processing

Model: detecting topic shifts

Page 21: Unstructured Data and the Role of Natural Language Processing

Model: detecting topic shifts

Page 22: Unstructured Data and the Role of Natural Language Processing

Model: topic shift tendency

Page 23: Unstructured Data and the Role of Natural Language Processing

Ifill, moderator: Terrible. Yes, she was constrained by the agreed debate rules. But she gave not the slightest sign of chafing against them or looking for ways to follow up the many unanswered questions or self-contradictory answers. This was the big news of the evening. Katie Couric, and for that matter Jim Lehrer, have never looked so good.

Page 24: Unstructured Data and the Role of Natural Language Processing

Model: topic shift tendency

Page 25: Unstructured Data and the Role of Natural Language Processing

Take-aways

• The role of NLP is not “understanding”. It’s helping people do useful things with language.

• Shallow methods work extremely well… except when they don’t. Language is replete with underlying structure.

• The deep value to look for in NLP is in bringing that structure to the surface and making it accessible to human insight.

Page 26: Unstructured Data and the Role of Natural Language Processing

Thanks!

Page 27: Unstructured Data and the Role of Natural Language Processing

reactlabs.org

Page 30: Unstructured Data and the Role of Natural Language Processing

Four years ago, I know that many Americans felt a fresh excitement about the possibilities of a new president. That choice was not the choice of our party, but Americans always come together after elections. We're a good and generous people, and we are united by so much more than what divides us. When that election was over, when the yard signs came down and the television commercials finally came off the air, Americans were eager to go back to work, to live our lives the way Americans always have, optimistic and positive and confident in the future. That very optimism is uniquely American. It's what brought us to America. We're a nation of immigrants, we're the children and grandchildren and great-grandchildren of the ones who wanted a better life. The driven ones. The ones who woke up at night, hearing that voice telling them that life in a place called America could be better. They came, not just in pursuit of the riches of this world, but for the richness of this life.

Page 35: Unstructured Data and the Role of Natural Language Processing

"In this 10-year time frame, . . . we will have perfected speech recognition and speech output well enough that those will become a standard part of the interface."

Bill Gates, 1997