ciro-neto
Practical session outcomes
• Participants will learn to use the NIF API to annotate strings and documents using the following wrappers:
– OpenNLP
– Stanford Core NLP
– Snowball Stemmer
– DBpedia Spotlight
• Query your corpus using SPARQL
Snowball Stemmer Wrapper
• A stemming algorithm removes suffixes from words, so that related forms reduce to a common stem.
– CONNECT
• CONNECTED
• CONNECTION
• CONNECTING
• CONNECTIONS
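To make the idea concrete, here is a toy suffix stripper in Python. It is only an illustration and not the Snowball algorithm: the suffix list and the `toy_stem` name are assumptions chosen to cover the CONNECT example above.

```python
# Toy suffix stripping; NOT the Snowball algorithm, only an illustration.
# The suffix list is an assumption chosen to cover the CONNECT example.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")  # longer suffixes listed first


def toy_stem(word: str) -> str:
    """Strip the first matching suffix from a lower-cased word."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


words = ["CONNECT", "CONNECTED", "CONNECTION", "CONNECTING", "CONNECTIONS"]
stems = {toy_stem(w) for w in words}
print(stems)  # all five forms collapse to one stem
```

A real stemmer such as Snowball applies many more rules and conditions; the wrapper on the following slides should be used for actual annotation.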
Snowball Stemmer Wrapper
java -jar snowball.jar -f text -i 'I am connected.'
• -f is used to define the format
• -i is used to define the input
Annotating Strings: Step-by-step
• 1. Open the USB stick folder
• 2. Decompress “session-nif.zip”
• 3. Open the “NIF_DATATHON” folder and decompress “NIF_tutorial_hands_on_jars.zip”
• 4. Open the command prompt and, in the “jar” folder, use the commands from the next slide.
Available Wrappers
• To annotate documents, use the local wrappers (USB stick):
java -jar opennlp.jar -f text -i 'This is a test.' -modelFolder ../model/
java -jar stanford.jar -f text -i 'This is a test.'
java -jar snowball.jar -f text -i 'This is my favorite test.'
java -jar spotlight.jar -f text -i 'Welcome to Germany.' -confidence 0.2
• To annotate small strings, you can try the on-line services:
• http://spotlight.nlp2rdf.aksw.org/spotlight?f=text&i=Welcome+to+Germany.&t=direct&confidence=0.3&prefix=http://yourDomain.org/
• http://snowball.nlp2rdf.aksw.org/snowball?f=text&i=This+is+my+favorite+test.&t=direct&prefix=http://yourDomain.org/
• http://stanford.nlp2rdf.aksw.org/stanfordcorenlpn?f=text&i=This+is+a+test.&t=direct&prefix=http://yourDomain.org/
• http://opennlp.nlp2rdf.aksw.org/opennlp?f=text&i=This+is+a+test.&t=direct&modelFolder=model&prefix=http://yourDomain.org
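The on-line service URLs above all follow the same pattern of query parameters (`f`, `i`, `t`, `prefix`, plus wrapper-specific ones such as `confidence`). A minimal Python sketch of building such a URL is shown below; the helper name `service_url` is hypothetical, and the sketch only constructs the URL without checking that the service is reachable.

```python
from urllib.parse import urlencode


def service_url(endpoint: str, text: str, **extra: object) -> str:
    """Build a GET URL for one of the on-line wrappers listed above."""
    # f, i and t are the parameters shown in the slide examples.
    params = {"f": "text", "i": text, "t": "direct"}
    params.update(extra)
    # safe=":/" leaves the prefix URI readable, as in the slide examples;
    # spaces in the input text become "+".
    return endpoint + "?" + urlencode(params, safe=":/")


url = service_url(
    "http://spotlight.nlp2rdf.aksw.org/spotlight",
    "Welcome to Germany.",
    confidence=0.3,
    prefix="http://yourDomain.org/",
)
print(url)
```

The resulting string can be pasted into a browser or passed to any HTTP client.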
Reading and Writing Files
• Write the results to a file: “--outfile myAnnotatedFile.ttl”
• Read a document as input: “--intype file -i /path/myDoc”
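As a sketch of how the two options might be combined, the Python snippet below assembles a single command line that reads a document and writes the annotations to a file. Using both options in one call is an assumption (the slides show them separately), and `wrapper_command` is a hypothetical helper; the paths are the placeholders from the slide.

```python
import shlex


def wrapper_command(jar: str, input_path: str, output_path: str) -> list[str]:
    """Assemble the argument list for a local wrapper jar (not executed here)."""
    return [
        "java", "-jar", jar,
        "--intype", "file", "-i", input_path,  # read the input from a file
        "--outfile", output_path,              # write the annotations to a file
    ]


command = wrapper_command("snowball.jar", "/path/myDoc", "myAnnotatedFile.ttl")
# Print the command as it would be typed at the prompt.
print(shlex.join(command))
```

To actually run it from Python, the list could be passed to `subprocess.run(command, check=True)` from inside the “jar” folder.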
POS tagger for multiple languages
• The -modelFolder parameter sets the folder that contains the trained OpenNLP models for POS tagging and tokenization.
• Models for other languages can be found on the OpenNLP website: http://opennlp.sourceforge.net/models-1.5/
Querying your own NIF annotated corpus
1. Annotate your string using one of the wrappers
2. Save your annotated sentence to a file (using “--outfile”)
3. Open Twinkle
4. Query your corpus using Twinkle
• Query your annotated corpus:
– nif:Context
– nif:Sentence
– nif:anchorOf
– nif:oliaCategory
– nif:oliaLink
… or practice with the Brown Corpus!
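A possible starting query over these properties is sketched below in Python. The query text can be pasted into Twinkle as-is; the `run_query` helper is a hypothetical alternative that assumes the third-party rdflib package is installed and that the annotated file is Turtle (as the “.ttl” extension suggests).

```python
# NIF Core namespace used by the nif: prefix.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"

# List every annotated string together with its OLiA link.
QUERY = f"""
PREFIX nif: <{NIF}>
SELECT ?anchor ?link
WHERE {{
  ?unit nif:anchorOf ?anchor ;
        nif:oliaLink ?link .
}}
"""


def run_query(ttl_path: str):
    """Run QUERY over a Turtle file (assumption: rdflib is installed)."""
    from rdflib import Graph  # imported lazily so QUERY is usable without rdflib

    graph = Graph()
    graph.parse(ttl_path, format="turtle")
    return list(graph.query(QUERY))


print(QUERY)
```

Swapping `nif:oliaLink` for `nif:oliaCategory`, or matching `?unit a nif:Sentence`, gives variations on the same pattern.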