CWIN17 Frankfurt / talend_nlp

© 2017 Talend Inc

CWIN17 – Natural Language Processing

Armin Wallrab | Director Pre-Sales Central & Northern Europe | awallrab@talend.com

Natural Language Processing

• What is natural language processing?
• Text tokenization
• Sentence splitting
• Part-of-Speech tagging
  http://www.clips.ua.ac.be/pages/mbsp-tags
• Syntactic parsing
• Shallow parsing (aka chunking)
• Named Entity Recognition
• Co-reference resolution
• Dependency parsing
• Sentiment analysis

Play at http://nlp.stanford.edu:8080/corenlp/process
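To make the first two steps on this slide (sentence splitting and text tokenization) concrete, here is a minimal sketch in Python. It uses plain regular expressions, which is an assumption made for illustration only: it is not how Stanford CoreNLP or the Talend components do it, and the function names `split_sentences` and `tokenize` are hypothetical.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive tokenizer: words (keeping inner hyphens/apostrophes) or single punctuation marks."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)

text = "Talend ships NLP components. They run on Spark!"
for sentence in split_sentences(text):
    print(tokenize(sentence))
# ['Talend', 'ships', 'NLP', 'components', '.']
# ['They', 'run', 'on', 'Spark', '!']
```

A rule like this breaks on abbreviations such as "e.g." or "Inc.", which is one reason real pipelines rely on trained or rule-rich splitters.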

Where can this be useful?

• Extract useful information from textual resources (such as forums, notes in Salesforce, etc.)
  • Names of persons
  • Names of companies (competitors...)
  • Names of tools (competing tools...)
• Classify discussions by topics
  • Group discussions together
  • Find discussions where people are mentioned but don't participate in the discussion
• Entity linking
  • Links between profiles and mentions in the text
  • Links between persons and organizations
  • Links between persons and any other information that may be used for re-identification
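One of the use cases above, finding discussions where people are mentioned but do not participate, can be sketched in a few lines of Python. This sketch assumes a fixed list of known names (a simple lookup list) instead of a trained named-entity model, and the data layout (`author` and `text` keys) and the function name are hypothetical.

```python
import re

def mentioned_non_participants(discussion, known_people):
    """Return people named in the posts who did not write any post themselves."""
    participants = {post["author"] for post in discussion}
    text = " ".join(post["text"] for post in discussion)
    mentioned = {
        name
        for name in known_people
        if re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE)
    }
    return sorted(mentioned - participants)

discussion = [
    {"author": "Alice", "text": "Bob should review this."},
    {"author": "Carol", "text": "I agree with Alice."},
]
print(mentioned_non_participants(discussion, ["Alice", "Bob", "Carol", "Dave"]))
# ['Bob']
```

With a trained entity recognizer, the `known_people` lookup would be replaced by the names the model extracts from each post.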

Relationship with data quality?

• Use textual data to get more information about your structured data
• Analyze CRM notes
  • Extract contact names
  • Get information about their status (left the company, new phone number, got married and changed name…)
• Compare them with the current values in your structured data
  • Contact information up-to-date?
  • Name changed?
  • Phone changed?
  • Address changed?
  • …

Self-healing customer data quality issues through interpretation of unstructured data (Chandrasekaran K., Clement D.)
http://ualr.edu/informationquality/iciq-proceedings/iciq-2015/
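The "compare with the current values" step can be illustrated with a small Python sketch that checks whether a free-text CRM note contains a phone number different from the one stored in the structured record. The record layout, the phone pattern `PHONE_RE`, and the function names are assumptions for illustration; a real setup would extract the values with a trained model instead of a regular expression.

```python
import re

# Assumed pattern: optional '+', then digits mixed with spaces, brackets, '/' or '-'.
PHONE_RE = re.compile(r"\+?\d[\d\s()/-]{6,}\d")

def normalize_phone(raw):
    """Keep digits only so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", raw)

def phone_mismatches(record, note):
    """Return phone numbers found in the note that differ from the stored one."""
    current = normalize_phone(record["phone"])
    found = {normalize_phone(m) for m in PHONE_RE.findall(note)}
    return sorted(p for p in found if p != current)

record = {"name": "Jane Doe", "phone": "+49 69 1234567"}
note = "Jane has a new phone number: +49 69 7654321."
print(phone_mismatches(record, note))
# ['49697654321']
```

A non-empty result flags the contact for review; it does not by itself prove the stored value is outdated.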

Great! How does it work in Talend?

• Prepare text sample
  • Remove clutter (e.g. HTML tags)
  • Tokenize & normalize
• Train a Model
  • Design the features
  • Label entities
  • Validate the model (e.g. K-Fold Cross Validation)
• Use the Model
  • Apply on full text

Use Spark Batch
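Two of the steps above, cleaning and tokenizing the text sample and splitting labeled data for K-Fold Cross Validation, can be sketched in plain Python. This is not the Talend or Spark implementation; the function names are hypothetical, and the choice of Unicode NFKC as the normalization step is an assumption.

```python
import re
import unicodedata
from html import unescape

def clean_text(raw_html):
    """Remove HTML tags, decode entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw_html)
    return re.sub(r"\s+", " ", unescape(no_tags)).strip()

def tokenize_normalized(text):
    """Apply Unicode NFKC normalization, then split into words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", unicodedata.normalize("NFKC", text))

def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; each sample is in the test set exactly once."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

print(tokenize_normalized(clean_text("<p>R&amp;D notes</p>")))
# ['R', '&', 'D', 'notes']
for train, test in k_fold_indices(5, 2):
    print(train, test)
# [3, 4] [0, 1, 2]
# [0, 1, 2] [3, 4]
```

In each fold, the model would be trained on the `train` samples and scored on the `test` samples, and the scores averaged over all folds.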

Component workflow

Text transformations

Convert to CoNLL-2003 format, add optional features, and label tokens

Extract named entities with <PER> labels
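The two transformations on this slide can be sketched in Python as follows. This is a simplified illustration, not the output of the Talend components: the original CoNLL-2003 data has four columns per token (word, part-of-speech tag, chunk tag, named-entity tag), while this sketch writes only the token and its label, and it assumes plain labels such as `PER`, `ORG` and `O`. The function names are hypothetical.

```python
def to_conll(tokens, labels):
    """Render one sentence in a CoNLL-style layout: one 'token label' pair per line."""
    return "\n".join(f"{tok} {lab}" for tok, lab in zip(tokens, labels))

def extract_entities(tokens, labels, wanted="PER"):
    """Group consecutive tokens carrying the wanted label into entity strings."""
    entities, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == wanted:
            current.append(tok)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:  # flush an entity that ends the sentence
        entities.append(" ".join(current))
    return entities

tokens = ["Jane", "Doe", "works", "at", "Talend", "."]
labels = ["PER", "PER", "O", "O", "ORG", "O"]
print(to_conll(tokens, labels))
print(extract_entities(tokens, labels))
# ['Jane Doe']
```

Without begin/inside prefixes on the labels, two person names that directly follow each other would be merged into one entity.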

The Stanford CoreNLP Library

Semantic Analysis

http://nlp.stanford.edu:8080/corenlp/

Meaning of the tags

https://www.clips.uantwerpen.be/pages/mbsp-tags

Sentiment Analysis & Sentiment Tree

http://corenlp.run/

http://nlp.stanford.edu:8080/sentiment/rntnDemo.html

Let’s do some NLP with Talend!

Capturing Twitter Messages

Analysis of text messages with Talend

Summary

• Natural Language Processing (NLP) components are available in Spark Batch and Streaming
• What can it be used for?
  • Extract useful information from textual resources (people names, companies, tools…)
  • Classify discussions by topics (group discussions together, find discussions where people are mentioned)
  • Entity linking (e.g. persons and organizations linking, links between persons and any other information that may be used for re-identification)
• What are the typical industry use cases?
  • Intelligent Search
  • Sentiment Analysis
  • Marketing Personalization
  • GDPR
  • …
• Talend comes with support for NLP
  • Model Preparation
  • Model Training
  • Model Evaluation
