IBM Watson Application Developer Workshop
January 2017
Duration: 60 minutes
Prepared by Víctor L. Fandiño | IBM Global Business Partners
Watson Knowledge Studio: Building a Machine-learning Annotator with Watson Knowledge Studio
Lab 02
Overview
You can use Watson Knowledge Studio (WKS) to create a machine-learning model that understands the linguistic nuances, meaning, and relationships specific to your industry, or to create a rule-based model that finds entities in documents based on rules that you define.
To become a subject matter expert in a given industry or domain, Watson must be trained. You can facilitate the task of training Watson with Watson Knowledge Studio. With Watson Knowledge Studio you can deliver meaningful insights to users by deploying a trained model in other Watson cloud-based offerings and cognitive solutions, including AlchemyLanguage, Watson Discovery service, and Watson Explorer.
Watson Knowledge Studio provides easy-to-use tools for annotating unstructured domain literature, and uses those annotations to create a custom machine-learning model that understands the language of the domain. The accuracy of the model improves through iterative testing, ultimately resulting in an algorithm that can learn from the patterns that it sees and recognize those patterns in large collections of new documents.
The following diagram illustrates how it works:
• Based on a set of domain-specific source documents, the team creates a type system that defines entity types and relation types for the information of interest to the application that will use the model.
• A group of two or more human annotators annotates a small set of source documents to label words that represent entity types, words that represent relation types between entity mentions, and to identify coreferences of entity types. Any inconsistencies in annotation are resolved, and one set of optimally annotated documents is built, which forms the ground truth.
• The ground truth is used to train a model.
• The trained model is used to find entities, relations, and coreferences in new, never-seen-before documents.
Additionally, you can build a rule-based model with Watson Knowledge Studio. Watson Knowledge Studio provides a rules editor that simplifies the process of finding and capturing common patterns in your documents as rules. You can then create a model that recognizes the rule patterns, and deploy it for use in other services.
Objectives
The objective of this lab is to provide an overview of how to build a machine-learning annotator in WKS, covering the following tasks:
• Create projects
• Create type systems
• Create document sets
• Add dictionaries
• Create tasks for human annotators
• Create dictionary-based annotators and machine-learning annotators
• Deploy the machine-learning annotator to AlchemyLanguage and compare the results
Prerequisites
In the Labs Preparation Guide: Getting Started with IBM Watson APIs & SDKs you have instructions to get an IBM Bluemix and Watson Knowledge Studio account. Also, you will need Postman for testing the deployed annotator. For this lab, use the latest version of Chrome or Firefox web browsers. For the best performance, use a screen resolution of at least 1024x1280.
Note: In this lab you will be working in your own Watson Knowledge Studio instance with the administrator role (ADMIN). That means that you are the only member of the annotator component team. A real project always requires multiple human annotators in addition to an administrator or project manager. In this case you will be performing all the tasks. Because you are the only human annotator, you will not be able to analyze inter-annotator agreement and adjudicate conflicts in annotated documents, which is always an important part of the annotator development workflow. For information about team members in Watson Knowledge Studio and how to create users, see the section Assembling the team in the WKS Knowledge Center.
Creating a project
A project defines all of the resources that are required to create a machine-learning annotator, including training documents, the type system, dictionaries, and annotations that are added by human annotators. For more information about project creation, see Creating a project.
1. Log in to Watson Knowledge Studio with your administrator ID.
2. In the WKS main page, click Create Project.
3. Give the project a name. You cannot change the project name later, so choose a short name that reflects your domain content or the purpose of the annotator component. You can specify a longer description, which can be changed later. In this lab, we will name the project “wadwWKS”.
4. Identify the language of the documents in your project. The documents that you add to the project, and the dictionaries that you create or import, must be in the language that you specify. In this example, documents will be in English. The selected language cannot be changed later.
5. In the Component Configuration, leave the Default Tokenizer. The default tokenizer is more advanced than the dictionary-based tokenizer; it uses machine learning to identify the tokens in the source documents based on the statistical learning it has done in the language of the source documents. It identifies tokens with more precision because it understands the more natural and nuanced patterns of language. The dictionary-based tokenizer identifies tokens based on language rules. The selected tokenizer cannot be changed later. See Tokenizers for more details.
6. In the Project Manager Selection, you have the option to add project managers to the project (the administrator can add or remove project managers later by editing the project). Only the names of people that you assigned to the project manager role from the User Account Management page for the instance are displayed. Since you have access only to a single user ID, no names will be available, so this entry will be empty.
7. When you are ready, click Create. The project will be created and you will be directed to the project Type System configuration. To change the project description or add or remove project managers later, an administrator can edit the project.
8. Sample Screenshot
Creating a type system
You will now learn how to import and modify a type system within Watson Knowledge Studio. You must create or import a type system before you begin any annotation tasks. See Type systems for more information about this topic.
9. Download the en-klue2.zip file to your computer. This file contains an example KLUE type system for English documents.
10. Within your project, click Type System in the banner or the navigation menu. On the Type System page, click Import.
11. Select the en-klue2.zip file from your computer and click Import. The imported type system is displayed in the table.
12. 52 entity types and 2,177 relation types should be imported. You can browse the type system. You can also edit an entity type. For instance, locate the MONEY entity type. In the Action section click Edit, and in the Roles column delete the role AWARD. Click Save.
Adding documents for annotations
After you finish making changes to the type system, you can begin adding documents to your project. You will now learn how to add documents to a project in Watson Knowledge Studio that can be annotated by human annotators. See Adding documents to a project for more information about adding documents.
13. Download the documents-new.csv file to your computer. This file contains example documents suitable for importing.
14. Within your project, click Documents in the banner or the navigation menu. On the Documents page, click Import Document Set.
15. Select the documents-new.csv file from your computer and click Import. The imported file is displayed in the table. The imported document set should contain 14 documents. You can click the document set in the table to browse the content of each document in the set. The documents contain news about computing technologies and companies.
At this point, as a Project Manager, you are ready to divide the corpus into multiple document sets and assign the document sets to different human annotators. Since you are the only user in the instance, you will create a single annotation set.
Creating annotation sets
An annotation set is a subset of documents from an imported document set that you assign to a human annotator. The human annotator annotates the documents in the annotation set. To later use inter-annotator scores to compare the annotations that are added by each human annotator, you must assign at least two human annotators to different annotation sets. You must also specify that some percentage of documents overlap between the sets.
Note: In a real project, you would create as many annotation sets as needed, based on the number of human annotators working in the project. In this lab, you will create just one annotation set that you will also annotate.
For more information about annotation sets, see Creating and assigning annotation sets.
16. Within your project, click Documents in the banner or the navigation menu.
17. Click Create Annotation Sets. The Create Annotation Sets window opens. By default, this window shows the base set (containing all documents), as well as fields where you can specify the information for a new annotation set.
18. Select your name in the Annotator list and provide a name for the set. Notice that you could add more sets (and a human annotator for each one), which is a more realistic situation in a business environment. In the case of more than one set, the Overlap field specifies the percentage of documents in the base set to be included in all of the new sets, so they can be annotated by all annotators and you can compare the results. Since you only have one set, the overlap has no effect.
19. Click Generate.
The new annotation set is created and now appears in the Annotation Sets tab of the Documents page.
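The effect of the Overlap field can be illustrated with a short Python sketch. This is not how Watson Knowledge Studio generates sets internally; the helper name `make_annotation_sets`, the round-robin distribution, and the rounding rule are assumptions made only for illustration.

```python
import random

def make_annotation_sets(doc_ids, annotators, overlap_pct, seed=0):
    """Hypothetical helper: build one annotation set per annotator.

    overlap_pct percent of the base set is copied into every set, so that
    all annotators label those documents; the rest is dealt out round-robin.
    """
    docs = list(doc_ids)
    random.Random(seed).shuffle(docs)
    n_shared = round(len(docs) * overlap_pct / 100)
    shared, rest = docs[:n_shared], docs[n_shared:]
    sets = {name: list(shared) for name in annotators}
    for i, doc in enumerate(rest):
        sets[annotators[i % len(annotators)]].append(doc)
    return sets, shared

# 14 documents, as in the document set imported in this lab.
base_set = [f"doc{i:02d}" for i in range(1, 15)]
sets, shared = make_annotation_sets(
    base_set, ["annotator_a", "annotator_b"], overlap_pct=50
)
for name, docs in sets.items():
    print(name, len(docs), "documents,", len(shared), "shared")
```

With a single annotator, as in this lab, every document ends up in the one set regardless of the overlap value, which matches the remark that the overlap has no effect here.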
Adding a dictionary
Dictionaries are used in WKS for pre-annotating text when creating a machine-learning annotator. You will now learn how to add a dictionary to a project in Watson Knowledge Studio. For more information about dictionaries, see Adding dictionaries to a project.
20. Download the dictionary-items-organization.csv file to your computer. This file contains dictionary terms in CSV format, suitable for importing into a Watson Knowledge Studio dictionary.
21. Within your project, click Dictionaries in the banner or the navigation menu.
22. Click the Add icon to add a dictionary.
Note: Do not click the Import icon, which is used to import a dictionary that you want to use as-is. For the lab, you will create a new editable dictionary and then import terms into it.
23. In the Name field, type “Test dictionary”. Click Save to create the (empty) dictionary. The new dictionary is created and automatically opened for editing.
24. In the dictionary pane, click Import. In the Import Dictionary Entries window, select the dictionary-items-organization.csv file from your computer and then click Import. The 24 terms in the file are imported into the dictionary. Each term represents an organization.
25. Click Add Entry to create a new term. An editable row is added at the top.
26. In the Surface Forms column, type “IBM” and “International Business Machines Corporation” on separate lines (when you begin to type a new surface form, a space is added below for an additional surface form). Leave the radio button next to IBM selected, indicating that this surface form is the lemma. In the Part of Speech column, select Noun. Click Save.
After you create a dictionary, you can use it to speed up human annotation tasks by pre-annotating the documents.
Pre-annotating with a dictionary-based annotator
You will now learn how to use a dictionary-based annotator to pre-annotate documents in Watson Knowledge Studio. For more information about pre-annotation with dictionary-based annotators, see Pre-annotating documents with the Dictionary pre-annotator.
27. Within your project, click Annotator Component in the banner or the navigation menu. You can see different ways to pre-annotate documents.
28. Under the description of the Dictionary Pre-annotator type, click Create this type of pre-annotator. The Dictionary Mapping window opens.
29. The list of entity types you previously imported when creating the type system appears. You now have to associate each dictionary that you want the dictionary pre-annotator to use with the entity type that matches the type of the dictionary terms. You must map at least one dictionary before you can run the pre-annotator. Map the ORGANIZATION entity type to the “Test dictionary” dictionary you created previously: click Edit for the ORGANIZATION entity type name, then choose the dictionary from the list.
30. Click the plus sign beside the dictionary name to add the mapping, and then click Save.
31. Click Create and then select Create & Run from the drop-down menu.
32. On the Run Annotator page, click the check boxes to select the document set that you created earlier in the lab (not including the base set).
33. Click Run.
The documents in the selected set are pre-annotated using the dictionary annotator you created. The annotator component is added to the Annotator Component page; you could later use the same annotator to pre-annotate additional document sets by clicking Run.
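Conceptually, dictionary pre-annotation looks up each surface form in the text and labels every match with the mapped entity type. The Python sketch below illustrates that idea only; the data layout, the `pre_annotate` function, and the whole-word matching rule are hypothetical and are not taken from Watson Knowledge Studio.

```python
import re

# Hypothetical in-memory dictionary: lemma -> list of surface forms.
TEST_DICTIONARY = {
    "IBM": ["IBM", "International Business Machines Corporation"],
}
# Mapping chosen in the Dictionary Mapping window of this lab.
ENTITY_TYPE = "ORGANIZATION"

def pre_annotate(text, dictionary, entity_type):
    """Return (start, end, surface, lemma, entity_type) for each match."""
    mentions = []
    for lemma, surfaces in dictionary.items():
        for surface in surfaces:
            # Whole-word, case-sensitive match (an assumption of this sketch).
            pattern = r"\b" + re.escape(surface) + r"\b"
            for match in re.finditer(pattern, text):
                mentions.append(
                    (match.start(), match.end(), match.group(), lemma, entity_type)
                )
    return sorted(mentions)

sample = "NCR, which counts later IBM Thomas Watson as one of its early employees"
mentions = pre_annotate(sample, TEST_DICTIONARY, ENTITY_TYPE)
for start, end, surface, lemma, label in mentions:
    print(f"{surface!r} at {start}-{end} -> {label} (lemma {lemma})")
```

Only terms present in the dictionary are labeled, which is why "Thomas Watson" still has to be annotated by hand later in the lab.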
Creating an annotation task
In this section, you will learn how to use annotation tasks to track the work of human annotators in Watson Knowledge Studio. For more information about annotation tasks, see Creating an annotation task.
34. Within your project, click Human Annotation in the banner or the navigation menu. On the Human Annotation page, click Add Task.
35. Specify the details for the task: In the Title field, type “Test”. In the Deadline field, select a date in the future.
36. Click Create.
37. In the Add Annotation Sets to Task window, click the check boxes to select the document set you created previously. This specifies that the document set must be annotated by the assigned human annotators as part of this task. Remember that for this lab you only have one human annotator and the corresponding annotation set. In a real scenario, you will have multiple annotation sets assigned to different human annotators in your project.
38. Click Create Task.
39. Click the Test task to open it. You can use this view to track the progress of human annotation work, calculate the inter-annotator agreement scores, and view overlapping documents to adjudicate annotation conflicts.
Annotating documents
In this section, you will learn how to use the Ground Truth Editor to annotate documents in Watson Knowledge Studio. For more information about human annotation, see Annotation with the Ground Truth Editor.
40. In the Test task you created in the previous section, click Annotate next to the Annotation Set 1 annotation set. The Ground Truth Editor opens in a new browser tab, showing you a preview of each document in the document set.
41. Scroll to the “Technology-gmanews.tv” document and click to open it for annotation. Note that the term “IBM” has already been annotated with the ORGANIZATION entity type; this annotation was added by the previous dictionary pre-annotator process. This pre-annotation is correct, so it does not need to be modified.
42. You will now annotate a mention. Click the Mentions icon to begin annotating mentions. In the document body, select the text “Thomas Watson”.
43. In the list of entity types, click PERSON. The entity type PERSON is applied to the selected mention.
44. Click the Relations icon to begin annotating relations.
45. Select the “Thomas Watson” and “IBM” mentions (in that order). To select a mention, click the entity-type label above the text.
46. In the list of relation types, click founderOf. The two mentions are connected with a founderOf relationship.
47. Click the Completed option from the menu, then click the Save icon to confirm, and then click Close.
48. In the list of documents, click Submit All to submit the documents for approval. Once confirmed, you can see that the status of all documents is Completed.
Note: In a real project, you would create many more annotations and complete all of the documents in the set before submitting.
49. Close the Ground Truth Editor.
50. Back in the Human Annotation Tasks window, click the Refresh button. You can now see that Annotation Set 1 is in Submitted status.
51. Mark the check box next to Annotation Set 1; you will see that the Accept and Cancel buttons appear. Click Accept. You have now promoted the document set to ground truth.
Note: In a real situation you will have several annotation sets reviewed by different human annotators. You will have to compare their work to determine whether different human annotators are annotating overlapping documents consistently. In that situation, Watson Knowledge Studio calculates inter-annotator agreement (IAA) scores by examining all overlapping documents in all document sets in the task, regardless of the status of the document sets. The IAA scores show how different human annotators annotated mentions, relations, and coreference chains. It is a good idea to check IAA scores periodically and verify that human annotators are consistent with each other. For example, one human annotator could have defined the relation between IBM and Thomas Watson as founderOf and another one as employedBy. The IAA scores will reflect this situation, which you will have to analyze and discuss with the annotators to adjudicate conflicts. This is something you can do in the annotation task. For this simple example in the lab, you are the single annotator and only a minimal set of human annotations has been made, so no conflicts are present and the annotation set status is completed (the status would be In Conflict if any overlap were detected when you select and accept the document sets).
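The founderOf versus employedBy example can be made concrete with a small Python sketch that lists disagreements between two annotators on an overlapping document. It is only an illustration: the data layout and the functions `find_conflicts` and `agreement_ratio` are hypothetical, and the simple ratio used here is not the IAA formula that Watson Knowledge Studio applies.

```python
def find_conflicts(annotations_a, annotations_b):
    """List items that both annotators labeled, but with different labels."""
    conflicts = []
    for key in sorted(annotations_a.keys() & annotations_b.keys()):
        if annotations_a[key] != annotations_b[key]:
            conflicts.append((key, annotations_a[key], annotations_b[key]))
    return conflicts

def agreement_ratio(annotations_a, annotations_b):
    """Share of commonly labeled items that received the same label."""
    shared = annotations_a.keys() & annotations_b.keys()
    if not shared:
        return None  # nothing to compare
    agreed = sum(1 for key in shared if annotations_a[key] == annotations_b[key])
    return agreed / len(shared)

# Relations on one overlapping document: (first mention, second mention) -> type.
annotator_1 = {("Thomas Watson", "IBM"): "founderOf"}
annotator_2 = {("Thomas Watson", "IBM"): "employedBy"}

conflicts = find_conflicts(annotator_1, annotator_2)
ratio = agreement_ratio(annotator_1, annotator_2)
print(conflicts)
print(ratio)
```

Each reported conflict is a candidate for adjudication: the project manager decides which label goes into the ground truth.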
Creating and deploying a machine-learning annotator
After you have promoted the documents to ground truth, you can use them to train the machine-learning annotator. When you create a machine-learning annotator, you select the document sets that you want to use to train it. You also specify the percentage of documents that are to be used as training data, test data, and blind data. Only documents that became ground truth through approval or adjudication can be used to train the machine-learning annotator.
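As a rough illustration of dividing ground truth into training, test, and blind data by percentage, consider the Python sketch below. The function `split_ground_truth` and the 70/20/10 ratios are assumptions chosen for the example; they are not the defaults or the algorithm used by Watson Knowledge Studio.

```python
import random

def split_ground_truth(doc_ids, train_pct, test_pct, blind_pct, seed=0):
    """Hypothetical helper: shuffle documents and cut them into three parts."""
    if train_pct + test_pct + blind_pct != 100:
        raise ValueError("percentages must add up to 100")
    docs = list(doc_ids)
    random.Random(seed).shuffle(docs)
    n_train = round(len(docs) * train_pct / 100)
    n_test = round(len(docs) * test_pct / 100)
    train = docs[:n_train]
    test = docs[n_train:n_train + n_test]
    blind = docs[n_train + n_test:]  # whatever remains is held back
    return train, test, blind

ground_truth = [f"doc{i:02d}" for i in range(1, 15)]  # 14 documents
train, test, blind = split_ground_truth(ground_truth, 70, 20, 10)
print(len(train), len(test), len(blind))
```

The training documents fit the model, the test documents are used to evaluate it during development, and the blind documents are held back so that a later evaluation is not influenced by documents the team has already inspected.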
52. On the Annotator Component page, click Create Annotator. On the Machine Learning Annotator pane, click Create this type of annotator.
53. Select the document sets that you want to use for creating a machine-learning annotator. Click the check mark next to Annotation Set 1. Use the default values for creating your testing, training, and blind data. Then, click Next. In the next window, accept reusing the current dictionary mapping and click Train & Evaluate.
Note: Training might take more than ten minutes, or even hours, depending on the number of human annotations and the total number of words across documents.
54. You are now back in the Annotator Component window, where you can see the training progress of your annotator.
55. After the machine-learning annotator is trained, you can export it or you can view detailed information on its performance by clicking Details. On the Model Settings tab, you have access to the Train/Test/Blind sets, where you can view the documents that human annotators worked on. You can click View Decoding Results to see the annotations that the trained machine-learning annotator created on that same set of documents.
56. On the Statistics tab, you can view details about the precision, recall, and F1 scores for the machine-learning annotator. You can view these scores for mentions, relations, and coreference chains by using the radio buttons. You can analyze performance by viewing a summary of statistics for entity types, relation types, and coreference chains. You can also analyze statistics that are presented in a confusion matrix. The confusion matrix helps you compare the annotations that were added by the machine-learning annotator to the annotations in the ground truth.
Note: In this tutorial, you annotated documents with only a single dictionary for organizations. Therefore, the scores you see are 0 or N/A for most entity types. The numbers are low, but that is expected, because you did only minimal human annotation and correction.
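For readers who want to see how the precision, recall, and F1 scores on the Statistics tab relate to one another, here is a minimal Python sketch using the standard definitions. The example annotations are invented for illustration, and the exact matching rules that Watson Knowledge Studio applies when it scores mentions are not covered by this lab.

```python
def precision_recall_f1(predicted, ground_truth):
    """Standard precision, recall, and F1 over sets of annotations."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented example: (mention text, entity type) pairs.
truth = {
    ("IBM", "ORGANIZATION"),
    ("NCR", "ORGANIZATION"),
    ("Thomas Watson", "PERSON"),
}
model_output = {
    ("IBM", "ORGANIZATION"),
    ("Thomas Watson", "ORGANIZATION"),  # wrong type: counts as an error
}

precision, recall, f1 = precision_recall_f1(model_output, truth)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision penalizes annotations the model should not have made, recall penalizes annotations it missed, and F1 is their harmonic mean.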
57. On the Versions tab, you can take a snapshot of the annotator and the resources that were used to create it (except for dictionaries and annotation tasks). For example, you might want to take a snapshot before you change the annotator. If the statistics are poorer the next time you run it, you can promote the older version and delete the version that returned poorer results. Also, if you want to make your annotator available to other Watson applications, you must create at least one version of the annotator. This allows you to deploy one version while you continue to improve the current version. The option to deploy does not appear until you create at least one version.
58. On the Versions tab, click Take Snapshot. Provide a description of the snapshot.
59. Once the snapshot has been created, click Deploy in the Action section in the same row as the snapshot. Select AlchemyLanguage as the service to deploy the model to and click Next.
60. In the Deploy Model window, enter your AlchemyLanguage API key. This is the same one you used in the previous AlchemyLanguage lab (you can get it from your IBM Bluemix dashboard, where you should have your AlchemyLanguage service). Click Deploy.
61. A confirmation window appears, indicating that the deployment to the AlchemyLanguage service has started. In that window you will find your model ID, which you must use when invoking the AlchemyLanguage methods with your annotator. Copy this model ID for later use. Click OK.
62. You can check the status of the deployment in the Action section next to the snapshot. Depending on the model, the deployment can take some time to complete. Once you see that the status is available, your model is ready to be used by the AlchemyLanguage methods.
Using the annotator in AlchemyLanguage (optional)
After you train a machine-learning annotator, you can use it to pre-annotate new documents that you add to the corpus, and you can make it available to other Watson applications, like AlchemyLanguage, Watson Discovery service, or Watson Explorer.
See Using the machine-learning model to learn how to deploy your annotators to these IBM Watson applications.
Now that you have deployed your model, you can test it with AlchemyLanguage. Verify first that the deployment has finished. We are now going to extract named entities with the AlchemyLanguage GetRankedNamedEntities method.
63. Open a Postman session. Create a new POST HTTP request with the following parameters:

   Method          POST
   Endpoint        https://gateway-a.watsonplatform.net/calls/text/TextGetRankedNamedEntities
   apikey          <your apiKey>
   outputMode      xml
   emotion         1
   sentiment       1
   knowledgeGraph  1
   model           <your model Id>
   text            NCR, which counts later IBM Thomas Watson as one of its early employees, said its products and services account for more than $400 billion in annual commerce and 23 billion consumer self-service transactions

64. Verify that in the Body section of the request, the option x-www-form-urlencoded is selected.
65. Click Send to make the request. You should get an XML output containing the extracted entities.
66. IBM is recognized as an organization and Watson as a person, the entities you trained your annotator for.
67. If you try the same request again, this time removing the model parameter, you should notice that the extracted entities are much more generic.
68. You can try the same exercise now with the AlchemyLanguage GetTypeRelations method. You will see that Thomas Watson is identified as founderOf IBM. Try again, removing the model parameter, and compare the results.
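If you prefer a script to Postman, the same form-encoded POST request can be prepared with the Python standard library. This is a minimal sketch: the endpoint and parameter names are the ones listed in the steps above, while the `build_request` helper and the placeholder credentials are hypothetical. The sketch only builds the requests; sending them requires a valid API key, a deployed model ID, and a reachable service.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = (
    "https://gateway-a.watsonplatform.net/calls/text/TextGetRankedNamedEntities"
)
TEXT = (
    "NCR, which counts later IBM Thomas Watson as one of its early employees, "
    "said its products and services account for more than $400 billion in "
    "annual commerce and 23 billion consumer self-service transactions"
)

def build_request(api_key, text, model_id=None):
    """Hypothetical helper: build the x-www-form-urlencoded POST request."""
    params = {
        "apikey": api_key,
        "outputMode": "xml",
        "emotion": 1,
        "sentiment": 1,
        "knowledgeGraph": 1,
        "text": text,
    }
    if model_id is not None:
        params["model"] = model_id  # omit to get the generic entity model
    body = urlencode(params).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(ENDPOINT, data=body, headers=headers, method="POST")

# Placeholders: replace with your own key and the model ID copied at deployment.
custom = build_request("<your apiKey>", TEXT, model_id="<your model Id>")
generic = build_request("<your apiKey>", TEXT)

# To send a request (needs valid credentials):
#   from urllib.request import urlopen
#   print(urlopen(custom).read().decode("utf-8"))
```

Building one request with the model parameter and one without mirrors the comparison in the last steps: the first uses your custom annotator, the second falls back to the generic entity extraction.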
End of Lab