68
CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang College of Computer and Information Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang

CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

CS6120/CS4120:NaturalLanguageProcessing

Instructor:Prof.LuWangCollegeofComputerandInformationScience

NortheasternUniversityWebpage:www.ccs.neu.edu/home/luwang

Page 2: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

TimeandLocation

• Time:TuesdaysandFridays,3:25pm- 5:05pm

• Location:WestVillageH108

Page 3: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

CourseWebpage

• http://www.ccs.neu.edu/home/luwang/courses/cs6120_sp2018/cs6120_sp2018.html

Page 4: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Prerequisites• Programming• Beingabletowritecodeinsomeprogramminglanguages(e.g.Python,Java,C/C++,Matlab)proficiently

• Courses• Algorithms• Somecalculus• Probabilityandstatistics• Linearalgebra(optionalbuthighlyrecommended)

Page 5: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Prerequisites• Aquiz:• ThisFriday,inclass• 22simplequestions,20ofthemasTrueorFalsequestions(relevanttoprobability,statistics,andlinearalgebra)• Thepurposeofthisquizistoindicatetheexpectedbackgroundofstudents.• 80%ofthequestionsshouldbeeasytoanswer.• Notcountedinyourfinalscore!

Page 6: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

TextbookandReferences• Maintextbook(andsomeslides)• DanJurafsky andJamesH.Martin,"SpeechandLanguageProcessing,2nd Edition",PrenticeHall,2009.

• Wewillusesomematerialfrom3rd editionwhenitisavailable.• http://web.stanford.edu/~jurafsky/slp3/

• Youtube video:https://www.youtube.com/watch?v=s3kKlUBa3b0

• Otherreference• ChrisManningandHinrich Schutze,"FoundationsofStatisticalNaturalLanguageProcessing",MITPress,1999

• Machinelearningtextbooks:• ChristopherM.Bishop,"PatternRecognitionandMachineLearning",Springer,2006.

• TomMitchell,"MachineLearning",McGrawHill,1997.

Page 7: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

TopicsoftheCourse(tentatively)• LanguageModeling

• Part-of-SpeechTagging

• TextCategorization:WordSenseDisambiguation,NamedEntityRecognition

• Syntax:FormalGrammarsofEnglish,SyntacticParsing,StatisticalParsing,DependencyParsing

• Semantics:Vector-Space,LexicalSemantics,SemanticswithDenseVectors

• InformationExtraction

• QuestionAnswering

• MachineTranslation

• Summarization

• SentimentAnalysis,OpinionMining

• NLPandSocialMedia• DialogSystemsandChatbots

Page 8: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

TheGoal

• StudyfundamentaltasksinNLP

• Learnsomeclassicandstate-of-the-arttechniques

• Acquirehands-onskillsforsolvingNLPproblems• Evensomeresearchexperience!

Page 9: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Grading• Assignment(30%)• 2 assignments,15%foreach

• Quiz(5%)• 8 in-classtests,1%foreach(threelowestscoresaredropped)• Tuesdays,andstartingnextweek

• FinalExam(35%)• Project(25%)• Participation(5%)• Classes:askandanswerquestions,participateindiscussions…• Piazza:helpyourpeers,addressquestions…

Page 10: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Exam

• Openbook• Timeandplace,TBD(stillschedulingwiththecollege)• Pleasedon’tmaketravelarrangementsforexamweeks.

Page 11: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

CourseProject

• AnNLP-relatedresearchproject

• 2-3studentsasateam

Page 12: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

CourseProjectGrading

• Wewanttoseenovelprojects!• Theproblemneedstobewell-defined,novel,useful,andpractical.

• Reasonableresultsandobservations.

• Weencourageyoutotacklearesearch-drivenproblem.

• Option1:aresearchprojectdiscussedwiththeinstructor• Abettersolutionforanexistingproblem• Oranovelproblem

• Option2:Thefakenewschallenge

Page 13: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

SampleProjectsfromPreviousOffering

• Projectreportsarelistedhere:http://www.ccs.neu.edu/home/luwang/courses/cs6120_fa2017.html• NeuralSemanticParsingNaturalLanguageintoSQL• ShortPassagesReadingComprehensionandQuestionAnswering• PoliticalPromiseEvaluation(PPE)• PredictingPersonalityTraitsusingTweets• STORYNEXT2.0:ATEXTINSIGHTS/VISUALIZATIONTOOL• AndroidApplicationforVisualQA• NovelSummarizerandKeywordIdentifierUsingTextRankwithSentenceFarmDetection• ParaphraseGeneration• HashtagSimilaritybasedonTweetText• StanceDetectionfortheFakeNewsChallenge• MachineComprehensionUsingmatch-LSTMandAnswer-Pointer• OnlineAbuseDetection• PlagiarismDetectionUsingFP-GrowthAlgorithm• AnExaminationofInfluentialFramingofControversialTopicsonTwitter

Page 14: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

NeuralSemanticParsingNaturalLanguageintoSQL

Page 15: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

ShortPassagesReadingComprehensionandQuestionAnswering

Page 16: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some
Page 17: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

OnlineAbuseDetection

Automating the process of identifying abuse comments wouldnot only save time for the Social Media platforms but alsowould increase user safety and improve discussions online.

Build a classifier that classifies the test data as either an“Abuse” (aggression, personal attack or a toxic statement) or“Not-Abuse” statement using multiple techniques.

Page 18: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

StoryNext 2:SentimentAnalysisforDocuments

Page 19: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Option2:TheFakeNewsChallenge

Page 20: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

TheFakeNewsChallenge

• Website:http://www.fakenewschallenge.org/• Goal:“Thegoalofthe FakeNewsChallenge istoexplorehowartificialintelligencetechnologies,particularlymachinelearningandnaturallanguageprocessing,mightbeleveragedtocombatthefakenewsproblem.WebelievethattheseAItechnologiesholdpromiseforsignificantlyautomatingpartsoftheprocedurehumanfactcheckersusetodaytodetermineifastoryisrealorahoax.”

Page 21: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

TheFakeNewsChallenge

• Stage1:StanceDetection

Page 22: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

TheFakeNewsChallenge

• Data:https://github.com/FakeNewsChallenge/fnc-1

Page 23: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Headline:“RobertPlantRippedup$800MLedZeppelinReunionContract”

Page 24: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

CourseProjectGrading

• Wewanttoseenovelprojects!• Theproblemneedstobewell-defined,novel,useful,andpractical.

• Reasonableresultsandobservations.

• Weencourageyoutotacklearesearch-drivenproblem.• Abettersolutionforanexistingproblem• Oranovelproblem

• FeelfreetotalktotheinstructororTAsonprojecttopicsduringofficehours.

Page 25: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

CourseProjectGrading

• Threereports• Proposal(3%),duebytheendofJanuary• Progress,withcode(7%)• Final,withcode(10%)

• Onepresentation• Inclass(5%)

Page 26: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

AudienceAward

• Bonuspoints!• Allteamsvotefortheirfavoriteproject(s).• Bestprojectgets1%asbonus(onebestprojectineach

batch,ifweneedtohavemorethanonebatch/lectureforpresentation)

Page 27: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

SubmissionandLatePolicy• Eachassignmentorreportisdueatthebeginningofclassonthecorrespondingduedate.

• Programminglanguage• Python(encouraged),Java,C/C++

• Electronicversion• Onblackboard

Page 28: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

SubmissionandLatePolicy

• Assignmentorreportturnedinlatewillbecharged20points(outof100points)offforeachlateday(i.e.24hours).

• Eachstudenthasabudgetof5days throughoutthesemesterbeforealatepenaltyisapplied.

• Latedaysarenotapplicabletofinalpresentation.

• Eachgroupmemberischargedwiththesamenumberoflatedays,ifany,fortheirsubmission.

Page 29: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Howtofindus?• Coursewebpage:• http://www.ccs.neu.edu/home/luwang/courses/cs6120_sp2018/cs6120_sp2018.html

• Officehours• Prof.LuWang:Tuesdays,from5:15pmto6:15pm,orbyappointment,258WVH• TALiwen Hou• TATirthraj Maheshkumar Parmar• TAManthan Thakar

• Piazza• http://piazza.com/northeastern/sp2018/cs6120/home• Allcourserelevantquestionsshouldgohere– alsoisthebestwaytoreachtheinstructorandTAs!

Page 30: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

WhatisNaturalLanguageProcessing?

• Allowingmachinestocommunicatewithhuman

• Naturallanguageunderstanding+naturallanguagegeneration

Page 31: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Whatdoesitmeantounderstandalanguage?

Page 32: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Whatdoesitmeantounderstandalanguage?Phonology

Morphology

Lexemes

Syntax

Semantics

Pragmatics

Discourse

Soundwaves

Words

Parsetrees

Meanings

Page 33: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Whatdoesitmeantounderstandalanguage?Phonology

Morphology

Lexemes

Syntax

Semantics

Pragmatics

Discourse

ShallowerAnalysis

DeeperAnalysis

Page 34: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Syntax,Semantic,Pragmatics• Syntaxconcernstheproperorderingofwordsanditsaffectonmeaning.

• Thedogbittheboy.• Theboybitthedog.• Bitboydogthethe.

• Semanticsconcernsthe(literal)meaningofwords,phrases,andsentences.• “plant”asaphotosyntheticorganism• “plant”asamanufacturingfacility• “plant”astheactofsowing

• Pragmaticsconcernstheoverallcommunicativeandsocialcontextanditseffectoninterpretation.• Thehamsandwichwantsanotherbeer.• Johnthinksvanilla.

[ModifiedfromRayMooney’sSlides]

Page 35: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

AmbiguityisUbiquitous• SpeechRecognition

• “recognizespeech”vs.“wreckanicebeach”• “youthinAsia”vs.“euthanasia”

Page 36: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

AmbiguityisUbiquitous• SpeechRecognition

• “recognizespeech”vs.“wreckanicebeach”• “youthinAsia”vs.“euthanasia”

• SyntacticAnalysis• “Iatespaghettiwith chopsticks”vs.“Iatespaghettiwith meatballs.”

Page 37: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

AmbiguityisUbiquitous• SpeechRecognition

• “recognizespeech”vs.“wreckanicebeach”• “youthinAsia”vs.“euthanasia”

• SyntacticAnalysis• “Iatespaghettiwith chopsticks”vs.“Iatespaghettiwith meatballs.”

• SemanticAnalysis• “Thedogisinthepen.”vs.“Theinkisinthepen.”• “Iputtheplant inthewindow”vs.“Fordputtheplant inMexico”

Page 38: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

AmbiguityisUbiquitous• SpeechRecognition

• “recognizespeech”vs.“wreckanicebeach”• “youthinAsia”vs.“euthanasia”

• SyntacticAnalysis• “Iatespaghettiwith chopsticks”vs.“Iatespaghettiwith meatballs.”

• SemanticAnalysis• “Thedogisinthepen.”vs.“Theinkisinthepen.”• “Iputtheplant inthewindow”vs.“Fordputtheplant inMexico”

• PragmaticAnalysis• From“ThePinkPantherStrikesAgain”:• Clouseau:Doesyourdogbite?HotelClerk:No.Clouseau:[bowingdowntopetthedog]Nicedoggie.[DogbarksandbitesClouseau inthehand]Clouseau:Ithoughtyousaidyourdogdidnotbite!HotelClerk:Thatisnotmydog.

Page 39: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

AmbiguityisExplosive• Ambiguitiescompoundtogenerateenormousnumbersofpossibleinterpretations.• InEnglish,asentenceendinginn prepositionalphraseshasover 2nsyntacticinterpretations(cf.Catalannumbers).• “Isawthemanwiththetelescope”:2parses• “Isawthemanonthehillwiththetelescope.”:5parses• “IsawthemanonthehillinTexaswiththetelescope”:14parses• “IsawthemanonthehillinTexaswiththetelescopeatnoon.”:42parses• “IsawthemanonthehillinTexaswiththetelescopeatnoononMonday”:132parses

Page 40: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

HumorandAmbiguity

• Manyjokesrelyontheambiguityoflanguage:• Policemantolittleboy:“Wearelookingforathiefwithabicycle.”Littleboy:“Wouldn’tyoubebetterusingyoureyes.”• Whyistheteacherwearingsun-glasses.Becausetheclassissobright.• GrouchoMarx:OnemorningIshotanelephantinmypajamas.Howhegotintomypajamas,I’llneverknow.• Shecriticizedmyapartment,soIknockedherflat.• Noahtookalloftheanimalsonthearkinpairs.Excepttheworms,theycameinapples.

Page 41: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

WhyisLanguageAmbiguous?

Page 42: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

WhyisLanguageAmbiguous?

• Havingauniquelinguisticexpressionforeverypossibleconceptualizationthatcouldbeconveyedwouldmakelanguageoverlycomplexandlinguisticexpressionsunnecessarilylong.• Allowingresolvableambiguitypermitsshorterlinguisticexpressions,i.e.datacompression.• Languagereliesonpeople’sabilitytousetheirknowledgeandinferenceabilitiestoproperlyresolveambiguities.• Infrequently,disambiguationfails,i.e.thecompressionislossy.

Page 43: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

SomeNLPTasks

Page 44: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

SyntacticTasks

Page 45: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

WordSegmentation

• Breakingastringofcharactersintoasequenceofwords.• Insomewrittenlanguages(e.g.Chinese)wordsarenotseparatedbyspaces.• EveninEnglish,charactersotherthanwhite-spacecanbeusedtoseparatewords[e.g.,;.- :() ]• ExamplesfromEnglishURLs:• jumptheshark.comÞ jumptheshark.com• myspace.com/pluckerswingbarÞmyspace .compluckers wingbarÞmyspace .complucker swingbar

Page 46: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

MorphologicalAnalysis

• Morphology isthefieldoflinguisticsthatstudiestheinternalstructureofwords.(Wikipedia)• Amorpheme isthesmallestlinguisticunitthathassemanticmeaning(Wikipedia)

• e.g.“carry”,“pre”,“ed”,“ly”,“s”

• Morphologicalanalysisisthetaskofsegmentingawordintoitsmorphemes:• carriedÞ carry+ed (pasttense)• independentlyÞ in+(depend+ent)+ly• GooglersÞ (Google+er)+s(plural)• unlockableÞ un+(lock+able)?

Þ (un+lock)+able?

Page 47: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

PartOfSpeech(POS)Tagging

• Annotateeachwordinasentencewithapart-of-speech.

• Usefulforsubsequentsyntacticparsingandwordsensedisambiguation.

I ate the spaghetti with meatballs. Pro V Det N Prep N

John saw the saw and decided to take it to the table.PN V Det N Con V Part V Pro Prep Det N

Page 48: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

PhraseChunking

• Findallnon-recursivenounphrases(NPs)andverbphrases(VPs)inasentence.• [NPI][VPate][NPthespaghetti][PPwith][NPmeatballs].• [NP He][VP reckons ][NP thecurrentaccountdeficit][VP willnarrow ][PPto][NP only#1.8billion][PP in][NP September]

Page 49: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

SyntacticParsing

• Producethecorrectsyntacticparsetreeforasentence.

Page 50: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

SemanticTasks

Page 51: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

WordSenseDisambiguation(WSD)

• Wordsinnaturallanguageusuallyhaveafairnumberofdifferentpossiblemeanings.• Ellenhasastronginterest incomputationallinguistics.• Ellenpaysalargeamountofinterest onhercreditcard.

• Formanytasks(questionanswering,translation),thepropersenseofeachambiguouswordinasentencemustbedetermined.

Page 52: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

SemanticRoleLabeling(SRL)

• Foreachclause,determinethesemanticroleplayedbyeachnounphrasethatisanargumenttotheverb.agent patient source destination instrument• John droveMary fromAustin toDallas inhisToyotaPrius.• Thehammer brokethewindow.

• Alsoreferredtoa“caseroleanalysis,”“thematicanalysis,”and“shallowsemanticparsing”

Page 53: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

SemanticParsing

• Asemanticparsermapsanatural-languagesentencetoacomplete,detailedsemanticrepresentation(logicalform).• Formanyapplications,thedesiredoutputisimmediatelyexecutablebyanotherprogram.• Example:MappinganEnglishdatabasequerytoProlog:

HowmanycitiesarethereintheUS?answer(A,count(B,(city(B),loc(B,C),

const(C,countryid(USA))),A))

Page 54: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

TextualEntailment

• Determinewhetheronenaturallanguagesentenceentails(implies)anotherunderanordinaryinterpretation.

• E.g.,“Asoccergamewithmultiplemalesplaying.->Somemenareplayingasport.”

Page 55: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

Pragmatics/DiscourseTasks

Page 56: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

AnaphoraResolution/Co-Reference

• Determinewhichphrasesinadocumentrefertothesameunderlyingentity.• Johnputthecarrotontheplateandateit.

• BushstartedthewarinIraq.ButthepresidentneededtheconsentofCongress.

• Somecasesrequiredifficultreasoning.• TodaywasJack'sbirthday.PennyandJanetwenttothestore.Theyweregoingtogetpresents.Janetdecidedtogetakite."Don'tdothat,"saidPenny."Jackhasakite.Hewillmakeyoutakeit back."

Page 57: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

MoreApplication-drivenTasks

Page 58: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

InformationExtraction(IE)

• Identifyphrasesinlanguagethatrefertospecifictypesofentitiesandrelationsintext.• Namedentityrecognitionistaskofidentifyingnamesofpeople,places,organizations,etc.intext.people organizations places• MichaelDell istheCEOofDellComputerCorporation andlivesinAustinTexas.

• Relationextractionidentifiesspecificrelationsbetweenentities.• MichaelDell istheCEOof DellComputerCorporation andlivesinAustinTexas.• MichaelDell istheCEOofDellComputerCorporationandlivesin AustinTexas.

Page 59: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

QuestionAnswering• Directlyanswernaturallanguagequestionsbasedoninformationpresentedinacorporaoftextualdocuments(e.g.theweb).• WhoisthepresidentofUnitedStates?

• DonaldTrump

• WhatisthepopularofMassachusetts?• 6.8million

Page 60: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

TextSummarization

• Produceashortsummaryofoneormanylongerdocument(s).• Article: Aninternationalteamofscientistsstudieddietandmortalityin135,335peoplebetween35and70yearsoldin18countries,followingthemforanaverageofmorethansevenyears.Dietinformationdependedonself-reports,andthescientistscontrolledforfactorsincludingage,sex,smoking,physicalactivityandbodymassindex. ThestudyisinTheLancet.Comparedwithpeoplewhoatethelowest20percentofcarbohydrates,thosewhoatethehighest20percenthada28percentincreasedriskofdeath.Buthighcarbohydrateintakewasnotassociatedwithcardiovasculardeath.…

• Summary: Researchersfoundthatpeoplewhoatehigheramountsofcarbohydrateshadahigherriskofdyingthanthosewhoatemorefats.

Page 61: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

SpokenDialogueSystems-- Chatbots

• Q:Isitgoingtoraintoday?• A:Itwillbemostlysunny.Norainisexpected.

Page 62: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

MachineTranslation

• Translateasentencefromonenaturallanguagetoanother.• 我喜欢汉堡à Ilikeburgers.

Page 63: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

AmbiguityResolutionisRequiredforTranslation• Syntacticandsemanticambiguitiesmustbeproperlyresolvedforcorrecttranslation:• “Johnplays theguitar.”→“John弹吉他”• “Johnplays soccer.”→“John踢足球”

Page 64: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

AmbiguityResolutionisRequiredforTranslation• Syntacticandsemanticambiguitiesmustbeproperlyresolvedforcorrecttranslation:• “Johnplays theguitar.”→“John弹吉他”• “Johnplays soccer.”→“John踢足球”

• AnapocryphalstoryisthatanearlyMTsystemgavethefollowingresultswhentranslatingfromEnglishtoRussianandthenbacktoEnglish:• “Thespiritiswillingbutthefleshisweak.”à “Theliquorisgoodbutthemeatisspoiled.”• “Outofsight,outofmind.”à “Invisibleidiot.”

Page 65: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

ResolvingAmbiguity• Choosingthecorrectinterpretationoflinguisticutterancesrequires(commonsense)knowledgeof:• Syntax

• Anagentistypicallythesubjectoftheverb• Semantics

• MichaelandEllenarenamesofpeople• Austinisthenameofacity(andofaperson)• ToyotaisacarcompanyandPriusisabrandofcar

• Pragmatics• Somesocialnorm,communicativegoals• Askingaquestion,expectingananswer

• Worldknowledge• Creditcardsrequireuserstopayfinancialinterest• Agentsmustbeanimateandahammerisnotanimate

Page 66: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

State-of-the-Arts

• Learningfromlargeamountsoftextdata(cf.rule-basedmethods)• Supervisedlearningorunsupervisedlearning

• Statisticalmachinelearning-basedmethods• Theprobabilisticknowledgeacquiredallowsrobustprocessingthathandleslinguisticregularitiesaswellasexceptions.

• Nowwithneuralnetwork-basedmethodsmostly

Page 67: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

RelatedFields

• ArtificialIntelligence• MachineLearning• Linguistics• Cognitivescience• Logic• Datascience• Politicalscience• Education• …manymore

Page 68: CS 6120/CS4120: Natural Language Processingwangluxy/archive/neu_courses/... · 2020. 7. 12. · •Not counted in your final score! Textbook and References •Main textbook (and some

RelevantScientificConferencesandJournals

• AssociationforComputationalLinguistics(ACL)• NorthAmericanAssociationforComputationalLinguistics(NAACL)• EmpiricalMethodsinNaturalLanguageProcessing(EMNLP)• InternationalConferenceonComputationalLinguistics(COLING)• ConferenceonComputationalNaturalLanguageLearning(CoNLL)• TransactionsoftheAssociationforComputationalLinguistics(TACL)• JournalofComputationalLinguistics(CL)