Customizing KIM3


  • 7/31/2019 Customizing KIM3

http://ontotext.com/kim

Customizing KIM

A step by step guide for integrating a new ontology in KIM, incorporating it in the default text analysis pipeline, and extracting new types of entities and facts.

Table of Contents

1. Some background and description of the task
2. Prerequisites
3. Procedure
  3.1 Importing DBpedia in the KIM semantic repository (OWLIM) and making its other data sources aware of this new ontology
    3.1.1 Importing DBpedia in OWLIM
    3.1.2 Mapping DBpedia to PROTON
    3.1.3 Marking the origin of the DBpedia instances
    3.1.4 Managing Labels
  3.2 Incorporating DBpedia in the default KIM IE pipeline
    3.2.1 Loading the gazetteer lists
    3.2.2 Setting up the gazetteer processing resource
    3.2.3 Adding the gazetteer to the IE pipeline and seeing it in action
  3.3 Creating annotations (Grammar Rules)
  3.4 The web interface
  3.5 Changing the visibility of resources

KIM can be customized in multiple ways to suit different semantic annotation and search needs. One way to do this is to change the text analysis pipeline to find new types of entities and facts, and use the conceptual models and instance bases relevant to a certain domain.

This guide describes the methods of adopting a third-party ontology (DBpedia) in KIM, incorporating it in the default IE pipeline, and making the pipeline aware of the knowledge base for this new mapped ontology.

1. Some background and description of the task

The resources in the KIM default IE pipeline depend on the KIM default ontology, the PROTON ontology. It is the formal structure of the KIM knowledge base. PROTON is a generic upper-level ontology, which consists of about 100 classes and 300 properties of general worldly notions. If you want to use the full functionality of KIM with a new ontology, the best approach is to map it to PROTON.

Due to the complexity of the IE process, adding a new ontology to KIM and making KIM aware of it is not a one-step process. If the PROTON ontology and the new one cover very similar domains, you only have to align them, which means mapping the terms that are used for the same notions. But if they are completely different, then you have to go through all the stages of integrating an ontology in KIM. In most cases, the task is a mixture of both: part of the new ontology may be usable by just aligning it to PROTON, and the other part by making the processing resources aware of it (adding it to lists, creating new grammar rules, etc.).

For additional information about how to extend the KIM Information Extraction capabilities, please see the respective section in the KIM administrator's guide:

http://www.ontotext.com/kim/getting-started/documentation

2. Prerequisites

In order to integrate the DBpedia ontology in KIM by incorporating it in the KIM default IE pipeline, you will need:

• A KIM installation, which can be downloaded from http://ontotext.com/kim/KIM-download.html. For simplicity, this guide refers to the KIM installation folder as KIM.
• A DBpedia extract, a small subset of the original DBpedia (http://dbpedia.org) ontology: dbpedia-ontology.zip

Figure 1. An extract of DBpedia ontology classes

3. Procedure

3.1 Importing DBpedia in the KIM semantic repository (OWLIM) and making its other data sources aware of this new ontology

3.1.1 Importing DBpedia in OWLIM

• Create a sub-folder dbpedia in the KIM context folder (KIM/context/default/kb/dbpedia/). It will be used as storage for all the RDF data in this task.
• Put the dbpedia_3.5.1.owl file, containing the DBpedia ontology, in the dbpedia folder.
• Put dbpedia_instances.nt, containing the actual object descriptions, in KIM/context/default/kb/dbpedia.
• Tell OWLIM to load these additional RDF data at start-up. Add the two files dbpedia_3.5.1.owl and dbpedia_instances.nt to the list of imports and defaultNS definitions in KIM/config/owlim.ttl:

    owlim:imports

    .....

    kb/dbpedia/dbpedia_3.5.1.owl;

    kb/dbpedia/dbpedia_instances.nt;";

    owlim:defaultNS

    .....

    http://dbpedia.org/resource/;

    http://dbpedia.org/resource/;";

Now you have a running KIM with the DBpedia ontology loaded, but the ontology is still isolated from the rest of the system and has little effect on the IE process.

3.1.2 Mapping DBpedia to PROTON

When mapping ontologies, you look for classes that are identical and can be directly mapped. Here, the DBpedia classes Place, Organisation and Person can be directly mapped to Location, Organization and Person in PROTON.

@prefix rdfs: .
@prefix protons: .
@prefix protont: .

Copyright 2011 Ontotext AD

@prefix dbpedia: .

dbpedia:Place rdfs:subClassOf protont:Location .
dbpedia:Organisation rdfs:subClassOf protont:Organization .
dbpedia:Person rdfs:subClassOf protont:Person .

The rest of the classes do not have a direct equivalent in PROTON. Therefore, you can sub-class them to a more general class of entities, in this case protons:Entity:

dbpedia:Activity rdfs:subClassOf protons:Entity .
dbpedia:AnatomicalStructure rdfs:subClassOf protons:Entity .

    ........

Create a new file (e.g. dbpedia_proton.nt) in the dbpedia folder. There you put the mapping statements that sub-class the classes of the new ontology to their PROTON equivalents.

Finally, add a record of this mapping file in the imports and defaultNS sections in KIM/config/owlim.ttl.
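
The mapping statements can be generated instead of typed by hand. The following is a minimal Python sketch under stated assumptions: the function name mapping_statements and the idea of passing in a plain list of class names are hypothetical, while the three direct mappings and the protons:Entity fallback come from the examples in this section. The output uses the same prefixed names as the statements shown in this section, so the matching @prefix declarations still have to be present in the target file.

```python
# Direct DBpedia -> PROTON class mappings, as given in this guide.
DIRECT_MAPPINGS = {
    "Place": "protont:Location",
    "Organisation": "protont:Organization",
    "Person": "protont:Person",
}


def mapping_statements(class_names, fallback="protons:Entity"):
    """Return one rdfs:subClassOf statement per DBpedia class name.

    Classes without a direct PROTON equivalent are sub-classed to `fallback`.
    """
    statements = []
    for name in class_names:
        target = DIRECT_MAPPINGS.get(name, fallback)
        statements.append(f"dbpedia:{name} rdfs:subClassOf {target} .")
    return statements


if __name__ == "__main__":
    # Hypothetical input: a few class names from the DBpedia extract.
    for line in mapping_statements(["Place", "Person", "Activity"]):
        print(line)
```

The resulting lines would then be written to the mapping file in the dbpedia folder.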

Now you have entities from both ontologies (PROTON and DBpedia) in the semantic repository (OWLIM). But in order to properly use them for recognizing their mentions in texts, you need a way to differentiate them.

3.1.3 Marking the origin of the DBpedia instances

You can specify the origin of the instances with the property protons:generatedBy. This way you mark every instance from DBpedia as being generated by http://dbpedia.org/page/DBpedia.

@prefix protons: .
@prefix dbpage: .

protons:generatedBy dbpage:DBpedia .
protons:generatedBy dbpage:DBpedia .

    ........

There are various approaches to retrieve the complete list of entities. In this case you can use regular expressions and bash to extract them from the dbpedia_instances.nt file. Create the RDF and store it in a file called dbpedia_generated_by.n3. Put the file in the dbpedia folder, and then add it (kb/dbpedia/dbpedia_generated_by.n3) to the list of imports and defaultNS definitions in KIM/config/owlim.ttl.
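
As an alternative to regular expressions in bash, the same extraction can be sketched in Python. This is an illustration, not the procedure used by the authors. The function name generated_by_statements is hypothetical, and the namespace used for the protons: prefix is an assumption that should be checked against the PROTON files shipped with your KIM installation. The generator URI and the http://dbpedia.org/resource/ prefix are the ones mentioned in this guide.

```python
import re

# Matches the subject URI at the start of an N-Triples line.
SUBJECT_RE = re.compile(r"^\s*<([^>]+)>\s")

# ASSUMPTION: namespace behind the protons: prefix; verify it locally.
PROTONS_NS = "http://proton.semanticweb.org/2006/05/protons#"
GENERATOR = "http://dbpedia.org/page/DBpedia"


def generated_by_statements(nt_lines, subject_prefix="http://dbpedia.org/resource/"):
    """Return one generatedBy triple per distinct DBpedia subject URI."""
    seen = set()
    statements = []
    for line in nt_lines:
        match = SUBJECT_RE.match(line)
        if not match:
            continue  # comments, blank lines, blank-node subjects
        subject = match.group(1)
        if not subject.startswith(subject_prefix) or subject in seen:
            continue
        seen.add(subject)
        statements.append(f"<{subject}> <{PROTONS_NS}generatedBy> <{GENERATOR}> .")
    return statements


if __name__ == "__main__":
    sample = [
        '<http://dbpedia.org/resource/ABBA> <http://xmlns.com/foaf/0.1/name> "ABBA" .',
        '<http://dbpedia.org/resource/Asia> <http://xmlns.com/foaf/0.1/name> "Asia" .',
    ]
    print("\n".join(generated_by_statements(sample)))
```

In practice the input would be the lines of dbpedia_instances.nt, and the output would be written to dbpedia_generated_by.n3.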

3.1.4 Managing Labels

Labels are very important for entity recognition in text. The KIM data model relies on the use of the properties rdfs:label and protons:mainLabel. Therefore, we advise you to set meaningful values for rdfs:label and protons:mainLabel for each instance.

But first, configure the property ENTITY_DESCR in KIM/config/install.properties.

The ENTITY_DESCR property determines the way the semantic repository stores the entity labels. It can be set to aliases or labels. Using labels is generally the preferred approach: it is simpler and more efficient. Aliases is used in more complex cases, where you also need to keep metadata for the specific labels. Therefore, make sure this property is set to Labels.

After that, look at dbpedia_instances.nt. You can see that all instance labels are defined with the predicate foaf:name. Create an rdfs:label statement for each foaf:name statement. You can do this in several ways.

• Use inference rules to create the new statements, if you see a pattern. The rule goes in the OWLIM inference rule definitions in KIM/context/default/kb/KIMRules.pie and looks like this:

    e

e <foaf:name> name
----------------------
e <rdfs:label> name

• Tell OWLIM that foaf:name and rdfs:label are the same:

foaf:name owl:sameAs rdfs:label .

This will cause OWLIM to create an rdfs:label statement for every foaf:name statement and vice versa.

• In general, it is recommended to use explicit statements. So create a file with explicit label definitions called dbpedia_labels.n3 and put it in the dbpedia folder. Update the definitions in KIM/config/owlim.ttl.

rdfs:label "Arsenal" .
rdfs:label "Arsenal Football Club" .
rdfs:label "The Gunners" .

    .......
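
One possible way to produce the explicit label statements is to copy every foaf:name triple in the instance data into an rdfs:label triple. The Python sketch below assumes that the input is in N-Triples form (one triple per line, full URIs in angle brackets). The function name labels_from_names and the sample subject URI are hypothetical.

```python
FOAF_NAME = "<http://xmlns.com/foaf/0.1/name>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def labels_from_names(nt_lines):
    """Return an rdfs:label triple for every foaf:name triple in the input."""
    statements = []
    for line in nt_lines:
        # Split into subject, predicate and the rest (object plus final dot).
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[1] == FOAF_NAME:
            statements.append(f"{parts[0]} {RDFS_LABEL} {parts[2]}")
    return statements


if __name__ == "__main__":
    # Hypothetical sample line in the style of dbpedia_instances.nt.
    sample = [
        '<http://dbpedia.org/resource/Example> '
        '<http://xmlns.com/foaf/0.1/name> "Arsenal Football Club" .',
    ]
    print("\n".join(labels_from_names(sample)))
```

Triples with any other predicate are skipped, so the function can be run over the whole instance file.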

Finally, consider protons:mainLabel. It serves as the primary representation of an entity in the graphical interface. protons:mainLabel is actually a subproperty of rdfs:label. Therefore, an rdfs:label statement will be added for every protons:mainLabel statement.

Define the protons:mainLabel for each instance in the ontology:

@prefix protons: .

protons:mainLabel "ABBA" .
protons:mainLabel "Asia" .

    ........

Put the definitions in the dbpedia folder in a file called dbpedia_main_labels.n3 and add the file to the list of definitions in KIM/config/owlim.ttl.

Now you come to the stage where you incorporate DBpedia in the default KIM IE pipeline, and make the pipeline aware of the knowledge base for this new mapped ontology.

3.2 Incorporating DBpedia in the default KIM IE pipeline

The default KIM IE pipeline is a customized GATE pipeline. The processing resource that actually finds the entity mentions in texts is the so-called Large Knowledge Base Gazetteer (LKB Gazetteer).

3.2.1 Loading the gazetteer lists

In the default configuration, KIM comes with a working gazetteer, which loads its dictionaries using a SPARQL query over the RDF data. In the general case, making the gazetteer use the new entities is just a matter of changing the query contained in KIM/config/query.txt. The DBpedia case, however, is a little more complex, due to the complexity of the ontology (rich vertical structure). Therefore, we suggest that you set up a new gazetteer for each major class of objects you want to recognize.

First, construct the query to load the gazetteer lists:

PREFIX rdfs:
PREFIX protont:
PREFIX protons:
SELECT ?la ?entity ?cl
WHERE {
  ?entity a ?cl ; rdfs:label ?la ;
    protons:generatedBy .
  ?cl rdfs:subClassOf protont:Person .
  OPTIONAL
  {
    ?sc rdfs:subClassOf ?cl .
    ?entity a ?sc .
    filter(?cl != ?sc)

  }
  filter(!bound(?sc) && isURI(?cl))
}

It will return a list of all instances of class protont:Person and its subclasses. dbpedia:Person is its subclass, so it will be included in the results. The only requirement for the query is to return the label, the instance URI and the class URI, in this order.

When you create the query, you can use some tools to see the actual results that will be loaded in the gazetteer dictionary. Such tools are JVisualVM with the JConsole extension, or a simple web service call like the one we describe in the documentation:

http://www.ontotext.com/kim/getting-started/documentation (WS API section)

• Create the folder KIM/context/default/resources/gazetteer/dbpedia-person and put the query in a file named query.txt there.
• Clear the caches. Whenever you make changes that concern the ontology, you should remove the OWLIM image by deleting KIM/context/default/populated. When you start KIM again, it will generate a fresh image.
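
Clearing the cache can be scripted. The sketch below is a hypothetical Python helper, not a tool that ships with KIM: the function name clear_kim_cache and its kim_home argument (the path of the KIM installation folder) are assumptions, while the context/default/populated location is the one named in this guide. Run it only while KIM is stopped.

```python
import shutil
from pathlib import Path


def clear_kim_cache(kim_home):
    """Delete <kim_home>/context/default/populated if it exists.

    Returns True if the folder was removed, False if there was nothing to remove.
    """
    populated = Path(kim_home) / "context" / "default" / "populated"
    if populated.is_dir():
        shutil.rmtree(populated)
        return True
    return False


if __name__ == "__main__":
    # Hypothetical installation path; replace it with your own KIM folder.
    removed = clear_kim_cache("KIM")
    print("cache removed" if removed else "no cache found")
```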

3.2.2 Setting up the gazetteer processing resource

Open the KIM GATE interface by running KIM/bin/kim gate

Create a Large KB Gazetteer resource with the following setup:

Figure 2. LKB Gazetteer setup

A brief description of the properties:

• annotationLimit - when the gazetteer has created the number of Lookups indicated in this property value, it stops
• caseSensitive - whether the matching is case sensitive or not

• dictFeederClass - set to com.ontotext.kim.model.KimDictionaryFeederImpl
• dictFeederParams - set the directory you created for Person: FeedSetupPath=$relpath$resources/gazetteer/dbpedia-person
• dynamicDictEnabled - set to false
• feedTransformerStages - additional transformations over the terms
• outputASName - the Lookup annotations are created in this set: Person
• relpath - set this to KIM/context/default/resources
• staticDictEnabled - set this to true
• staticDictSerializationPath - the cache is stored here

When the gazetteer initializes for the first time, it will look for a file named query.txt in the folder set in dictFeederParams. The gazetteer will read the query from there and initialize its dictionary. Both SPARQL and SeRQL can be used. When you design your query, it is important to keep the exact order and meaning of the query parameters: label, instance URI, direct class. The names are not important.

The outputASName is set to Person. This means that the gazetteer will create its Lookup annotations in this annotation set. This is how you will differentiate between the concepts recognized by this gazetteer and those recognized by other gazetteers.

Afterwards, save the application state to KIM/context/default/resources/IE.gapp.

If you want the gazetteer to create its dictionaries anew, you must remove the cache from the folder you have set in staticDictSerializationPath, in this case KIM/context/default/populated/gazetteer-person.

3.2.3 Adding the gazetteer to the IE pipeline and seeing it in action

Now the resource is loaded into memory, and you have to add it to the pipeline.

Figure 3. Adding the processing resource to the IE pipeline

Create a document from the Wikipedia article about Aristotle (http://en.wikipedia.org/wiki/Aristotle). Add it to the corpus and run the pipeline over it. You will be able to see your new gazetteer in action:

Figure 4. Lookup annotations in the GATE GUI

3.3 Creating annotations (Grammar Rules)

The IE pipeline creates annotations. The gazetteer creates Lookup annotations, but there are also other temporary annotations whose only role is to help other processing resources and rules create meaningful annotations at the end. KIM has a whitelist with all these meaningful annotation types in the property IE_ANN_TYPES in KIM/config/nerc.properties. At the end of the processing of a document, all annotations not in this list are removed.
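
The effect of the whitelist can be pictured with a small Python sketch. It is an illustration only and does not reproduce KIM code: the comma-separated format assumed for the IE_ANN_TYPES value, the dictionary representation of an annotation and both function names are hypothetical.

```python
def parse_ann_types(value):
    """Parse a whitelist value, ASSUMED here to be a comma-separated list."""
    return {name.strip() for name in value.split(",") if name.strip()}


def filter_annotations(annotations, allowed_types):
    """Keep only the annotations whose type is on the whitelist."""
    return [ann for ann in annotations if ann["type"] in allowed_types]


if __name__ == "__main__":
    allowed = parse_ann_types("Person, Location, Organization")
    annotations = [
        {"type": "Lookup", "start": 0, "end": 9},   # temporary, dropped
        {"type": "Person", "start": 0, "end": 9},   # whitelisted, kept
    ]
    print(filter_annotations(annotations, allowed))
```

This is why a new annotation type produced by a custom rule has to appear in the whitelist, or it will not survive the end of document processing.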

Now the task is to transform all Lookups from the Person annotation set into Person annotations in the default annotation set. The easiest way is with a JAPE rule. The JAPE rule looks like this:

Phase: dbpedia_person
Input: Person
Options: control = all

Rule: dbpedia_person
({Lookup}):match
-->
:match.Person = { class = :match.Lookup.class, inst = :match.Lookup.inst, rule = dbpedia_person }

The rule does not transform the existing Lookup annotations. It creates new Person annotations over the same phrases as the lookups.

Put this grammar in a file called dbpedia_person.jape and store it in KIM/context/default/resources/grammar/dbpedia. Then create a Jape transducer processing resource for this grammar in the GATE UI and place it after the new gazetteer in the pipeline. The important thing here is to set the inputASName parameter to the new annotation set (Person) and leave the outputASName parameter empty. This will make the rule match annotations from the Person annotation set and create annotations in the default annotation set.
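
To make the data flow concrete, here is a rough Python model of what the rule does with the two annotation sets. It is not JAPE and not GATE API code. Annotation sets are modelled as a plain dictionary from set name to a list of annotations, the default annotation set is represented by the empty string, and the function name apply_dbpedia_person_rule is hypothetical.

```python
def apply_dbpedia_person_rule(annotation_sets, input_set="Person", output_set=""):
    """Create a Person annotation for every Lookup found in `input_set`.

    The Lookup annotations stay untouched. The new annotations cover the same
    span, copy the class and inst features, and go into `output_set`.
    """
    created = []
    for ann in annotation_sets.get(input_set, []):
        if ann["type"] != "Lookup":
            continue
        created.append({
            "type": "Person",
            "start": ann["start"],
            "end": ann["end"],
            "features": {
                "class": ann["features"].get("class"),
                "inst": ann["features"].get("inst"),
                "rule": "dbpedia_person",
            },
        })
    annotation_sets.setdefault(output_set, []).extend(created)
    return created


if __name__ == "__main__":
    # Hypothetical Lookup produced by the gazetteer over the word "Aristotle".
    sets = {"Person": [{"type": "Lookup", "start": 0, "end": 9,
                        "features": {"class": "dbpedia:Person",
                                     "inst": "http://dbpedia.org/resource/Aristotle"}}]}
    apply_dbpedia_person_rule(sets)
    print(sets[""])
```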

Figure 5. The Jape transducer processing resource in the GATE UI

Run the pipeline over your test document and see how it creates new annotations of type Person in the default annotation set.

Store the changes you made in GATE, so that KIM will be able to use them later. You can do this by right-clicking on the pipeline and choosing Save Application State. Then save it over the application descriptor file KIM uses, which in this case is KIM/context/default/resources/IE.gapp. The exact path is set in the IE_APP property in KIM/config/nerc.properties.

3.4 The web interface

When you run KIM with the changed pipeline and annotate some documents, the new entities will appear in the web interface:

Figure 6. The new entities in the KIM web UI

3.5 Changing the visibility of resources

KIM uses a visibility mechanism to decide which classes and properties can be seen and used in the web interface. The new resources have to be defined as visible in KIM/context/default/kb/visibility.nt. Here is an excerpt:

    "".

    "".

    ........

Clear the cache of KIM (remove KIM/context/default/populated) and start it again. You should be able to see the additional classes in the ontology view:

Figure 7. The new ontology in the KIM Web UI

We have shown how to utilize only the Person class from the new ontology. The process is identical for the other classes.

Described here is only a single approach for adopting a new ontology into KIM. Every ontology is unique and may require a different setup. For example, you can use only a single gazetteer to create Lookup annotations for all the instances, and then JAPE rules to determine their real annotation type. This will require only changing the default gazetteer query in KIM/config/query.txt and writing some JAPE rules.