MinorThird
A collection of Java classes for:
  Storing text
  Annotating text
  Learning to extract entities and categorize text
What's Different About MinorThird
Differs from existing NLP and learning toolkits in several ways:
  Combines tools for annotating and visualizing text with state-of-the-art learning methods
  Contains methods to visualize both training data and the performance of classifiers
    Facilitates debugging
  Integrated with text manipulation tools
    Possible to track and visualize the transformation of text data into machine learning data
  Architected to support active learning and on-line learning
    Should facilitate integration of learning methods into agents
Components
TextBase
  A collection of documents
TextLabels
  Logical assertions about documents in a TextBase
  A type of stand-off annotation
    The annotations are completely independent of the text
  Assert a category or property for a word, a document, or a subsequence of words (a span)
  Can be produced by human labelers or by a learned program
  Can encode
    syntactic properties, such as shallow parses or POS tags
    semantic properties, such as the functional role that entities play in a sentence
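
To make the idea of stand-off annotation concrete, here is a minimal sketch in plain Java. The class names (StandOffSketch, SimpleTextBase, SimpleLabels, Span) and their methods are hypothetical and are not MinorThird's actual TextBase or TextLabels API. The sketch assumes whitespace tokenization and only shows that the labels refer to spans of a document without ever changing the stored text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of stand-off annotation.
// These classes are NOT MinorThird's TextBase/TextLabels API.
class StandOffSketch {

    // A span is a half-open range [start, end) of token positions in one document.
    static final class Span {
        final String docId;
        final int start;
        final int end;

        Span(String docId, int start, int end) {
            this.docId = docId;
            this.start = start;
            this.end = end;
        }
    }

    // Documents are stored as token arrays and are never modified by labeling.
    static final class SimpleTextBase {
        private final Map<String, String[]> docs = new LinkedHashMap<>();

        // Assumption: tokens are separated by whitespace.
        void load(String docId, String text) {
            docs.put(docId, text.trim().split("\\s+"));
        }

        String textOf(Span s) {
            String[] tokens = docs.get(s.docId);
            return String.join(" ", Arrays.copyOfRange(tokens, s.start, s.end));
        }
    }

    // Labels live in a separate structure that only refers to spans.
    static final class SimpleLabels {
        private final Map<String, List<Span>> spansByType = new LinkedHashMap<>();

        void addToType(Span s, String type) {
            spansByType.computeIfAbsent(type, k -> new ArrayList<>()).add(s);
        }

        List<Span> spansOfType(String type) {
            return spansByType.getOrDefault(type, Collections.<Span>emptyList());
        }
    }

    public static void main(String[] args) {
        SimpleTextBase base = new SimpleTextBase();
        base.load("d1", "William Cohen works at CMU");

        SimpleLabels labels = new SimpleLabels();
        labels.addToType(new Span("d1", 0, 2), "person"); // e.g. from a human labeler
        labels.addToType(new Span("d1", 4, 5), "org");    // e.g. from a learned program

        for (Span s : labels.spansOfType("person")) {
            System.out.println(base.textOf(s));
        }
    }
}
```

Because the labels hold only document ids and token offsets, several independent label sets (hand-made, learned, syntactic, semantic) could in principle be attached to the same collection of documents.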
Components
Repository
  Annotated TextBases are accessed in a single uniform way, although they may be stored in one of several schemes
  Can be configured to hold a set of TextLabels and their associated TextBases
Mixup (Minorthird Information eXtraction and Understanding Program)
  A special-purpose annotation language
  Moderately complex hand-coded annotation programs can be implemented with Mixup
  Based on the widely used notion of cascaded finite state transducers
  Includes some powerful features
    A GUI debugging environment
    Escape to Java
    A kind of subroutine call mechanism
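
The cascading idea behind Mixup can be sketched in plain Java. This is not Mixup syntax and not MinorThird's API; the class CascadeSketch and its methods are hypothetical, and the "capitalized run" rule is an illustrative assumption. The point is only that each stage is a small finite-state scan, and that the second stage reads the output of the first stage instead of the raw text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a two-stage cascade.
// This is plain Java, not Mixup syntax and not MinorThird's API.
class CascadeSketch {

    // Stage 1: map each token to a coarse tag, CAP (capitalized) or OTHER.
    static List<String> tagTokens(List<String> tokens) {
        List<String> tags = new ArrayList<>();
        for (String t : tokens) {
            boolean cap = !t.isEmpty() && Character.isUpperCase(t.charAt(0));
            tags.add(cap ? "CAP" : "OTHER");
        }
        return tags;
    }

    // Stage 2: a two-state machine (outside/inside a run) over the stage 1 tags.
    // It emits one [start, end) pair for every maximal run of CAP tags.
    static List<int[]> findCapRuns(List<String> tags) {
        List<int[]> runs = new ArrayList<>();
        int runStart = -1; // -1 means the machine is in the "outside" state
        for (int i = 0; i <= tags.size(); i++) {
            boolean cap = i < tags.size() && tags.get(i).equals("CAP");
            if (cap && runStart < 0) {
                runStart = i;                      // outside -> inside
            } else if (!cap && runStart >= 0) {
                runs.add(new int[] {runStart, i}); // inside -> outside, emit span
                runStart = -1;
            }
        }
        return runs;
    }

    // The cascade: stage 2 consumes stage 1's output, not the raw text.
    static List<String> extractNameCandidates(String text) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
        List<String> out = new ArrayList<>();
        for (int[] run : findCapRuns(tagTokens(tokens))) {
            out.add(String.join(" ", tokens.subList(run[0], run[1])));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extractNameCandidates(
            "yesterday William Cohen visited Carnegie Mellon University"));
        // prints [William Cohen, Carnegie Mellon University]
    }
}
```

Further stages could be stacked in the same way, each one reading the labels produced by the stages before it.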