KafNafParserPy: A python library for parsing KAF/NAF

Ruben Izquierdo Bevia, Vrije University of Amsterdam

CLTL meeting, 19th Nov 2014

KafNafParserPy: a python library for parsing/creating KAF and NAF files



This is a presentation about how to install and use the KafNafParserPy library, which can be used to read and modify KAF or NAF files.





What is KAF / NAF?

• Annotation formats to represent linguistic information

o XML based

o Different types of information in different, interconnected layers

o Easy to use in NLP pipelines

• KAF

o https://github.com/opener-project/kaf/wiki/KAF-structure-overview

• NAF

o http://www.newsreader-project.eu/files/2013/01/techreport.pdf


What is KafNafParserPy?

• It is a Python module/library

• It allows you to parse a KAF or NAF file

o Read all the layers

o Provides access to the information through Python classes (methods and attributes)

• It allows you to generate new KAF/NAF files

o Create new layers

o Modify existing ones

• It allows you to convert between NAF and KAF


KafNafParserPy philosophy

• No validation against the DTD (the file only needs to be well-formed XML)

• A Python object for each XML element (header, text, token, terms…)

• The attributes are not “parsed/read” in advance

o The KAF/NAF attributes are not defined as attributes of a class

o Only a pointer to the XML element is stored

• It provides access to all the attributes in “real time”

• Modifications are made “on the fly”

o If you change the object in memory, you need to dump it to a new file to keep the results
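A minimal sketch of this “pointer, not copy” design, using Python's standard xml.etree.ElementTree instead of lxml. LazyTerm and the NAF fragment are hypothetical illustrations, not KafNafParserPy's own classes: the wrapper keeps only a reference to the element, reads attributes at call time, and writes changes into the in-memory tree, while the original source stays untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal NAF fragment, used only for illustration.
NAF_SOURCE = '<NAF><terms><term id="t1" lemma="war" pos="N"/></terms></NAF>'


class LazyTerm:
    """Hypothetical wrapper: keeps a pointer to the XML element, parses nothing up front."""

    def __init__(self, node):
        self.node = node  # the only state: a reference to the element

    def get_lemma(self):
        # Read from the element at call time ("real time" access).
        return self.node.get("lemma")

    def set_lemma(self, value):
        # Write straight into the in-memory tree ("on the fly").
        self.node.set("lemma", value)


root = ET.fromstring(NAF_SOURCE)
term = LazyTerm(root.find("terms/term"))
print(term.get_lemma())                      # war
term.set_lemma("battle")
print(term.get_lemma())                      # battle
print(root.find("terms/term").get("lemma"))  # battle: the tree itself changed
print("battle" in NAF_SOURCE)                # False: the original source is untouched
```

Because nothing is copied out of the element, there is no second copy of the data that could go stale; the cost is that every getter call goes back to the XML.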


KafNafParserPy philosophy

• Class Cterm (encapsulates a KAF/NAF term)

o Attributes:

• string lemma

• string pos

• string morphofeat

• Cspan span …

o Methods

• get_lemma(…) returns the lemma attribute

• get_pos(…) returns the pos attribute

• …


KafNafParserPy philosophy

• Class Cterm (encapsulates a KAF/NAF term)

o Attributes:

• string type (is it NAF or KAF?)

• Pointer to the XML element

o Methods

• get_lemma(…) returns xml_element.get('lemma')

• get_pos(…) returns xml_element.get('pos')

• get_id(…)

o xml_element.get('id') for NAF

o xml_element.get('tid') for KAF
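The Cterm design above (a type plus a pointer) could be sketched as follows. TermSketch is a hypothetical class built on the standard library's ElementTree, not the library's Cterm; it only shows how a single get_id() can serve both formats by checking the stored type.

```python
import xml.etree.ElementTree as ET


class TermSketch:
    """Hypothetical wrapper mirroring the design above:
    a format type plus a pointer to the XML element."""

    def __init__(self, node, doc_type):
        self.node = node
        self.type = doc_type  # "NAF" or "KAF"

    def get_id(self):
        # NAF terms use the 'id' attribute; KAF terms use 'tid'.
        if self.type == "NAF":
            return self.node.get("id")
        return self.node.get("tid")

    def get_lemma(self):
        return self.node.get("lemma")

    def get_pos(self):
        return self.node.get("pos")


naf_term = TermSketch(ET.fromstring('<term id="t7" lemma="war" pos="N"/>'), "NAF")
kaf_term = TermSketch(ET.fromstring('<term tid="t7" lemma="war" pos="N"/>'), "KAF")
print(naf_term.get_id(), kaf_term.get_id())  # t7 t7
```

Callers never need to know which attribute name the format uses; the format difference is handled once, inside the wrapper.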


Getting Started I

• https://github.com/cltl/KafNafParserPy

• Basic steps:

o Install the lxml library for Python

• pip install lxml

o Clone the repository

• git clone https://github.com/cltl/KafNafParserPy

o Make it available for Python

• Put it in the same folder as the scripts that will import it

• Add the folder that contains it to PYTHONPATH

• Create a symbolic link in your virtualenv

• …
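One way to check the steps above, assuming the package is imported under the name KafNafParserPy (as in the import statements later in this presentation), is to ask Python whether it can locate each module. is_importable is a hypothetical helper built on importlib.util.find_spec from the standard library.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the top-level module on the current path."""
    return importlib.util.find_spec(module_name) is not None


# After the installation steps, both should report "found"; "NOT found"
# means that step (pip install, or making the clone visible) needs attention.
# For a single script, an alternative to PYTHONPATH would be, e.g.:
#   sys.path.insert(0, "/path/to/folder/containing/KafNafParserPy")  # hypothetical path
for name in ("lxml", "KafNafParserPy"):
    print(name, "found" if is_importable(name) else "NOT found")
```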


Getting Started II

• Documentation:

o HTML: http://kyoto.let.vu.nl/~izquierdo/api/KafNafParserPy/

o PDF: http://kyoto.let.vu.nl/~izquierdo/api/KafNafParserPy/api.pdf

• The entry point is always:

o Module KafNafParserPy

o Class KafNafParser
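To illustrate the idea of a single entry point, here is a hypothetical MiniParser built on ElementTree (it is not KafNafParser itself): it parses the document once, from a filename or an open file, and records whether the root element is NAF or KAF. The direct children of the root correspond to the annotation layers.

```python
import io
import xml.etree.ElementTree as ET


class MiniParser:
    """Hypothetical single entry point, mimicking the role of KafNafParser:
    parse the document once and remember whether it is KAF or NAF."""

    def __init__(self, source):
        # ET.parse accepts a filename or an open file object.
        self.tree = ET.parse(source)
        self.root = self.tree.getroot()
        self.type = self.root.tag  # the root element is <NAF> or <KAF>

    def get_type(self):
        return self.type

    def get_layer_names(self):
        # Each direct child of the root is one annotation layer.
        return [child.tag for child in self.root]


doc = io.StringIO("<NAF><nafHeader/><text/><terms/></NAF>")
parser = MiniParser(doc)
print(parser.get_type())         # NAF
print(parser.get_layer_names())  # ['nafHeader', 'text', 'terms']
```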


Getting tokens

• How can I get them?

o All we have is a “KafNafParser” object

• Go to the API documentation and check the methods of the KafNafParser class


Getting tokens
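As a stand-in for the example on this slide, this stdlib sketch shows what iterating over tokens involves: walking the <wf> elements of the text layer. The sample document and iter_tokens are hypothetical; with the library you would use the KafNafParser methods listed in the API documentation instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical NAF text layer; real tokens usually also carry offset/length attributes.
NAF = """<NAF><text>
  <wf id="w1" sent="1">The</wf>
  <wf id="w2" sent="1">War</wf>
</text></NAF>"""


def iter_tokens(root):
    """Yield (id, text) pairs for every <wf> token, in document order."""
    for wf in root.iterfind("text/wf"):
        # NAF stores the identifier in 'id'; KAF would use 'wid'.
        yield wf.get("id"), wf.text


root = ET.fromstring(NAF)
for token_id, text in iter_tokens(root):
    print(token_id, text)
# w1 The
# w2 War
```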


Getting terms

• Use KafNafParser::get_terms(…)

• Use the methods of Cterm
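A stdlib sketch of what iterating over terms gives access to: each <term> element, its lemma and pos attributes, and the token ids in its span. iter_terms and the sample document are hypothetical and only mirror the NAF structure; they are not the Cterm API.

```python
import xml.etree.ElementTree as ET

# Hypothetical NAF terms layer: each term points to its tokens via <span>.
NAF = """<NAF><terms>
  <term id="t1" lemma="the" pos="D"><span><target id="w1"/></span></term>
  <term id="t2" lemma="war" pos="N"><span><target id="w2"/></span></term>
</terms></NAF>"""


def iter_terms(root):
    """Yield (id, lemma, pos, token_ids) for each <term>."""
    for term in root.iterfind("terms/term"):
        token_ids = [t.get("id") for t in term.iterfind("span/target")]
        yield term.get("id"), term.get("lemma"), term.get("pos"), token_ids


root = ET.fromstring(NAF)
for term_id, lemma, pos, span in iter_terms(root):
    print(term_id, lemma, pos, span)
# t1 the D ['w1']
# t2 war N ['w2']
```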


Modifying one token

• Change the text of token w7 from “War” to “Battle”
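A sketch of this change on a raw ElementTree, assuming a NAF text layer in which w7 holds “War”. set_token_text is a hypothetical helper playing the role of the library's set_text(…); here, too, the change lives only in memory.

```python
import xml.etree.ElementTree as ET

# Hypothetical NAF text layer containing the token w7 from the slide.
NAF = '<NAF><text><wf id="w6">The</wf><wf id="w7">War</wf></text></NAF>'


def set_token_text(root, token_id, new_text):
    """Set the text of the <wf> with the given id; return False if absent."""
    for wf in root.iterfind("text/wf"):
        if wf.get("id") == token_id:
            wf.text = new_text  # changes the in-memory tree only
            return True
    return False


root = ET.fromstring(NAF)
set_token_text(root, "w7", "Battle")
print(root.find("text/wf[@id='w7']").text)  # Battle
```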


Modifying one token

• The object “my_parser”, after calling set_text(…):

o Is updated with “Battle” in memory

o The original file “entities_example.naf” is not changed

• If we want to keep the changes:

o Closing the program clears the memory, so the changes are lost

o We need to dump the object to a new file

• The target can be a filename (string) or an open file
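Dumping can be sketched with ElementTree.write from the standard library, which, like the dump described above, accepts either a filename or an open file. The output filename is hypothetical; writing to a new file leaves the original input unchanged.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.fromstring('<NAF><text><wf id="w7">Battle</wf></text></NAF>')
tree = ET.ElementTree(root)

# 1) Dump to an open (binary) file-like object.
buffer = io.BytesIO()
tree.write(buffer, encoding="UTF-8", xml_declaration=True)
print(buffer.getvalue().decode("UTF-8"))

# 2) Dump to a new file given by name (hypothetical name, in a temporary folder).
out_path = os.path.join(tempfile.mkdtemp(), "entities_example_modified.naf")
tree.write(out_path, encoding="UTF-8", xml_declaration=True)
with open(out_path, encoding="UTF-8") as fh:
    print("Battle" in fh.read())  # True
```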


Read entities

• KafNafParser::get_entities() is an iterator over the entities

• Centity::get_external_references() is an iterator over the external references of an entity
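The two iterators can be mirrored with ElementTree, assuming the NAF layout in which each <entity> keeps its links under <externalReferences>/<externalRef>. Both helper functions and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical NAF entity layer with one external reference.
NAF = """<NAF><entities>
  <entity id="e1" type="LOCATION">
    <references><span><target id="t3"/></span></references>
    <externalReferences>
      <externalRef resource="dbpedia" reference="http://dbpedia.org/resource/Amsterdam" confidence="0.9"/>
    </externalReferences>
  </entity>
</entities></NAF>"""


def iter_entities(root):
    """Yield each <entity> element of the entities layer."""
    yield from root.iterfind("entities/entity")


def iter_external_references(entity):
    """Yield each <externalRef> attached to one entity."""
    yield from entity.iterfind("externalReferences/externalRef")


root = ET.fromstring(NAF)
for entity in iter_entities(root):
    for ext in iter_external_references(entity):
        print(entity.get("id"), entity.get("type"), ext.get("resource"), ext.get("reference"))
# e1 LOCATION dbpedia http://dbpedia.org/resource/Amsterdam
```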


Adding a new external reference

1. Create the new external reference object

o “from KafNafParserPy import KafNafParser”

o “from KafNafParserPy import *”

2. Set its attributes with the set_XYZ() methods

3. Add the new object to the layer/tree

o By adding it to the specific element (the entity, if we have it)

o By adding it through the general parser object, providing the identifier (sometimes not implemented)


Adding a new external reference

• Create the new external reference

• Find the element where we want to add it

• Use the “adding” method of the element
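The first option (add to the element) sketched on a raw ElementTree. make_external_reference and add_external_reference are hypothetical helpers standing in for the library's external reference object with its set_XYZ() methods and for the entity's “adding” method; the attribute names follow the NAF <externalRef> element.

```python
import xml.etree.ElementTree as ET

NAF = '<NAF><entities><entity id="e1" type="LOCATION"/></entities></NAF>'


def make_external_reference(resource, reference, confidence):
    """Create a detached <externalRef> element and set its attributes."""
    ext = ET.Element("externalRef")
    ext.set("resource", resource)
    ext.set("reference", reference)
    ext.set("confidence", str(confidence))
    return ext


def add_external_reference(entity, ext):
    """Attach ext to the entity, creating <externalReferences> if needed."""
    container = entity.find("externalReferences")
    if container is None:
        container = ET.SubElement(entity, "externalReferences")
    container.append(ext)


root = ET.fromstring(NAF)
# 1) Create the new external reference.
ext = make_external_reference("dbpedia", "http://dbpedia.org/resource/Amsterdam", 0.9)
# 2) Find the element where we want to add it.
entity = root.find("entities/entity")
# 3) Use the "adding" operation on that element.
add_external_reference(entity, ext)
print(ET.tostring(entity, encoding="unicode"))
```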


Adding a new external reference

• Create the new external reference

• Use the “adding” method of the parser, providing the id

• Not always implemented (but quite easy to add)
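The second option (go through the parser with an id) might look like the hypothetical helper below: it searches the entity layer for the id and reports whether it found a match. It is a sketch of the idea on a raw ElementTree; this helper is not part of KafNafParser.

```python
import xml.etree.ElementTree as ET

NAF = """<NAF><entities>
  <entity id="e1" type="LOCATION"/>
  <entity id="e2" type="PERSON"/>
</entities></NAF>"""


def add_external_reference_by_id(root, entity_id, ext):
    """Hypothetical parser-level helper: look up the entity by id and
    attach ext to it. Returns False when no entity has that id."""
    for entity in root.iterfind("entities/entity"):
        if entity.get("id") == entity_id:
            container = entity.find("externalReferences")
            if container is None:
                container = ET.SubElement(entity, "externalReferences")
            container.append(ext)
            return True
    return False


root = ET.fromstring(NAF)
ext = ET.Element("externalRef", resource="dbpedia", reference="http://dbpedia.org/resource/Amsterdam")
print(add_external_reference_by_id(root, "e1", ext))   # True
print(add_external_reference_by_id(root, "e99", ext))  # False
```

Returning a flag lets the caller notice a wrong or missing id instead of silently losing the reference.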


KafNafParserPy

Ruben Izquierdo Bevia


http://rubenizquierdobevia.com

GitHub: https://github.com/cltl/KafNafParserPy

API html: http://kyoto.let.vu.nl/~izquierdo/api/KafNafParserPy/

API pdf: http://kyoto.let.vu.nl/~izquierdo/api/KafNafParserPy/api.pdf