©2017 Talend Inc
CWIN17 – Natural Language Processing
Armin Wallrab | Director Pre-Sales Central & Northern Europe | awallrab@talend.com
• What is natural language processing?
• Text tokenization
• Sentence splitting
• Part-of-Speech tagging
http://www.clips.ua.ac.be/pages/mbsp-tags
• Syntactic parsing
• Shallow parsing (aka chunking)
• Named Entity Recognition
• Co-reference resolution
• Dependency parsing
• Sentiment analysis
Play at http://nlp.stanford.edu:8080/corenlp/process
Natural Language Processing
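The first two steps in the list above, sentence splitting and text tokenization, can be illustrated with a minimal sketch. It uses only regular expressions and is not how Stanford CoreNLP or Talend implement these steps; the function names and the sample text are assumptions made for illustration.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

# Hypothetical sample text.
text = "Talend supports NLP. Does it run on Spark? Yes!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, sentence_tokens in zip(sentences, tokens):
    print(sentence, "->", sentence_tokens)
```

A rule this simple breaks on abbreviations such as "e.g.", which is one reason dedicated NLP libraries are used in practice.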
• Extract useful information from textual resources (such as forums, notes in Salesforce, etc.)
  • Names of persons
  • Names of companies (competitors...)
  • Names of tools (competing tools...)
• Classify discussions by topics
  • Group discussions together
  • Find discussions where people are mentioned but don't participate in the discussion.
• Entity linking
  • Links between profiles and mentions in the text
  • Links between persons and organizations
  • Links between persons and any other information that may be used for re-identification
Where can this be useful?
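One of the use cases above, finding discussions where people are mentioned but do not participate, can be sketched as follows. This sketch assumes a known list of names and plain string matching instead of a trained named entity recognizer; the sample discussions and all identifiers are hypothetical.

```python
import re

# Hypothetical sample data: each discussion has participants and free text.
discussions = [
    {"id": 1, "participants": {"Alice", "Bob"},
     "text": "Alice asked whether Carol had tested the new connector."},
    {"id": 2, "participants": {"Carol"},
     "text": "Carol confirmed the fix works."},
]
known_people = {"Alice", "Bob", "Carol"}

def mentioned_people(text, people):
    """Return the known names that occur as whole words in the text."""
    return {p for p in people if re.search(rf"\b{re.escape(p)}\b", text)}

def mentioned_but_absent(discussion, people):
    """Names mentioned in the text that are not among the participants."""
    return mentioned_people(discussion["text"], people) - discussion["participants"]

absent = {d["id"]: mentioned_but_absent(d, known_people) for d in discussions}
print(absent)
```

With a real NER model, `mentioned_people` would be replaced by the model's person entities, and the set difference would stay the same.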
• Use textual data to get more information about your structured data
• Analyze CRM notes
  • Extract contact names
  • Get information about their status (left the company, new phone number, got married and changed name…)
• Compare them with the current values in your structured data
  • Contact information up-to-date?
  • Name changed?
  • Phone changed?
  • Address changed?
  • …
http://ualr.edu/informationquality/iciq-proceedings/iciq-2015/
Self-healing customer data quality issues through interpretation of unstructured data (Chandrasekaran. K, Clement. D)
Relationship with data quality?
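The comparison step described above, checking values extracted from CRM notes against the current structured record, can be sketched like this. The record, the extracted values, and the function name are hypothetical; how the values are extracted from the note is outside the scope of this sketch.

```python
# Hypothetical structured CRM record and values extracted from a free-text note.
crm_record = {"name": "Jane Doe", "phone": "+49 30 1234567", "company": "Acme GmbH"}
extracted = {"name": "Jane Smith", "phone": "+49 30 1234567"}

def find_discrepancies(record, extracted_values):
    """Return {field: (current, extracted)} for fields whose values differ."""
    diffs = {}
    for field, new_value in extracted_values.items():
        current = record.get(field)
        if current != new_value:
            diffs[field] = (current, new_value)
    return diffs

discrepancies = find_discrepancies(crm_record, extracted)
print(discrepancies)
```

Fields that were not found in the note (here `company`) are left alone, so a missing mention is never treated as a change.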
• Prepare text sample
  • Remove clutter (e.g. HTML tags)
  • Tokenize & normalize
• Train a Model
  • Design the features
  • Label entities
  • Validate the model (e.g. K-Fold Cross Validation)
• Use the Model
  • Apply on full text
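The text preparation step above (remove clutter, tokenize, normalize) might look like the following in plain Python. This is a sketch under assumptions, not the Talend component logic: lowercasing is just one possible normalization, and the function names and sample input are made up for illustration.

```python
import html
import re

def remove_clutter(raw):
    """Strip HTML tags and decode entities such as &amp;."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return html.unescape(no_tags)

def tokenize_and_normalize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

# Hypothetical raw sample, e.g. a note copied from a web form.
raw = "<p>Met <b>John Smith</b> at Acme &amp; Co.</p>"
tokens = tokenize_and_normalize(remove_clutter(raw))
print(tokens)
```

Lowercasing discards capitalization, which is often a useful feature for recognizing names, so a real pipeline may keep the original casing as a separate feature.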
Use Spark Batch
Great! How does it work in Talend?
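The validation step named on this slide, K-Fold Cross Validation, splits the labeled sample into k parts and uses each part once as the test set while training on the rest. A minimal sketch of the index bookkeeping, with a hypothetical function name and without shuffling:

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

splits = list(k_fold_splits(n_samples=10, k=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

In practice the samples are usually shuffled before splitting, and the model is trained and scored once per fold, with the scores averaged.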
Component workflow
Text transformations
Convert to CoNLL-2003 format, add optional features, and label tokens
Extract named entities with <PER> labels
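The two transformations above can be sketched in plain Python. The full CoNLL-2003 format has four columns per token (word, part-of-speech tag, chunk tag, named-entity tag); this sketch writes a simplified two-column variant. The `<PER>...</PER>` wrapping with a closing tag is an assumption about how labeled text is marked up, and the function names are hypothetical.

```python
import re

def to_conll(tagged_sentences):
    """Write one 'token label' pair per line, with a blank line between sentences."""
    blocks = ["\n".join(f"{tok} {label}" for tok, label in sent)
              for sent in tagged_sentences]
    return "\n\n".join(blocks)

def extract_per(annotated_text):
    """Return the spans wrapped in <PER>...</PER> markers (assumed markup)."""
    return re.findall(r"<PER>(.*?)</PER>", annotated_text)

# Hypothetical labeled sentence: 'O' marks tokens outside any entity.
sentences = [[("John", "PER"), ("Smith", "PER"), ("joined", "O"), ("Acme", "O")]]
conll = to_conll(sentences)
people = extract_per("<PER>John Smith</PER> joined Acme after <PER>Ann Lee</PER> left.")

print(conll)
print(people)
```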
The Stanford CoreNLP Library
Semantic Analysis
http://nlp.stanford.edu:8080/corenlp/
Meaning of the tags
https://www.clips.uantwerpen.be/pages/mbsp-tags
Sentiment Analysis & Sentiment Tree
http://corenlp.run/
http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
Let’s do some NLP with Talend!
Capturing Twitter Messages
Analysis of text messages with Talend
• Natural Language Processing (NLP) components are available in Spark Batch and Streaming
• What can it be used for?
  • Extract useful information from textual resources (people names, companies, tools…)
  • Classify discussions by topics (group discussions together, find discussions where people are mentioned)
  • Entity linking (e.g. persons and organizations linking, links between persons and any other information that may be used for re-identification)
• What are the typical industry use cases?
  • Intelligent Search
  • Sentiment Analysis
  • Marketing Personalization
  • GDPR
  • …
• Talend comes with Support for NLP
  • Model Preparation
  • Model Training
  • Model Evaluation
Summary