HW7 Extracting Arguments for %


Ang Sun
asun@cs.nyu.edu
March 25, 2012

Outline
• File Format
• Training
  – Generating Training Examples
  – Extracting Features
  – Training of MaxEnt Models
• Decoding
• Scoring

File Format

• Statistics Canada said service-industry <ARG1> output </ARG1> in August <SUPPORT> rose </SUPPORT> 0.4 <PRED class="PARTITIVE-QUANT"> % </PRED> from July .
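The annotated line above mixes inline tags with whitespace-separated tokens. A minimal sketch of a parser for it (the function name and regex are mine, not part of the assignment):

```python
import re

# A sample line in the HW7 file format (taken from the slide above).
line = ('Statistics Canada said service-industry <ARG1> output </ARG1> in August '
        '<SUPPORT> rose </SUPPORT> 0.4 <PRED class="PARTITIVE-QUANT"> % </PRED> from July .')

def parse_annotations(line):
    """Return the plain token list plus the token indices of each annotated span."""
    tokens, spans, open_tag = [], {}, None
    for tok in line.split():
        m = re.match(r'<(ARG1|SUPPORT|PRED)', tok)
        if m:                       # opening tag such as <ARG1> or <PRED
            open_tag = m.group(1)
        elif tok.startswith('</'):  # closing tag
            open_tag = None
        elif tok.endswith('>'):     # attribute remnant like class="...">
            continue
        else:
            if open_tag:
                spans.setdefault(open_tag, []).append(len(tokens))
            tokens.append(tok)
    return tokens, spans
```

On the sample line this recovers `output` as ARG1, `rose` as SUPPORT, and `%` as PRED.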

• Generating Training Examples
  – Positive Example
    • Only one positive example per sentence
    • The one with the annotation ARG1

• Generating Training Examples
  – Negative Examples
    • Two methods!
    • Method 1: consider any token that has one of the following POSs:
      – NN 1150
      – NNS 905
      – NNP 205
      – JJ 25
      – PRP 24
      – CD 21
      – DT 16
      – NNPS 13
      – VBG 2
      – FW 1
      – IN 1
      – RB 1
      – VBZ 1
      – WDT 1
      – WP 1

Too many negative examples!

• Generating Training Examples
  – Negative Examples
    • Two methods!
    • Method 2: only consider head tokens
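Method 1 amounts to a POS filter over the sentence: every token whose POS appears in the list above is a candidate, the gold ARG1 token is the one positive example, and all other candidates become negatives. A sketch (the function name is illustrative; Method 2 would additionally restrict candidates to head tokens, which needs a parse and is not shown):

```python
# POS list for Method 1, taken from the slides.
CANDIDATE_POS = {'NN', 'NNS', 'NNP', 'JJ', 'PRP', 'CD', 'DT', 'NNPS',
                 'VBG', 'FW', 'IN', 'RB', 'VBZ', 'WDT', 'WP'}

def generate_examples(pos_tags, arg1_index):
    """Return (positive_index, negative_indices) for one sentence.
    pos_tags: POS tag per token; arg1_index: token index of the gold ARG1."""
    negatives = [i for i, pos in enumerate(pos_tags)
                 if pos in CANDIDATE_POS and i != arg1_index]
    return arg1_index, negatives
```

This makes the slide's complaint concrete: with Method 1, nearly every noun, number, and preposition in the sentence becomes a negative example.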

Extracting Features
• f: candToken=output
• f: tokenBeforeCand=service-industry
• f: tokenAfterCand=in
• f: tokensBetweenCandPRED=in_August_rose_0.4
• f: numberOfTokensBetween=4
• f: existVerbBetweenCandPred=true
• f: existSUPPORTBetweenCandPred=true
• f: candTokenPOS=NN
• f: posBeforeCand=NN
• f: posAfterCand=IN
• f: possBetweenCandPRED=IN_NNP_VBD_CD
• f: BIOChunkChain=I-NP_B-PP_B-NP_B-VP_B-NP_I-NP
• f: chunkChain=NP_PP_NP_VP_NP
• f: candPredInSameNP=False
• f: candPredInSameVP=False
• f: candPredInSamePP=False
• f: shortestPathBetweenCandPred=NP_NP-SBJ_S_VP_NP-EXT
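A few of the token- and POS-based features above can be computed directly from the token and POS sequences. A sketch (the helper name and the BOS/EOS padding values are assumptions; the chunk and parse-path features need extra inputs not shown here):

```python
def extract_features(tokens, pos_tags, cand, pred):
    """Compute some of the slide features for candidate index `cand`
    and predicate index `pred`."""
    lo, hi = min(cand, pred), max(cand, pred)
    between_toks, between_pos = tokens[lo + 1:hi], pos_tags[lo + 1:hi]
    return {
        'candToken': tokens[cand],
        'tokenBeforeCand': tokens[cand - 1] if cand > 0 else 'BOS',
        'tokenAfterCand': tokens[cand + 1] if cand + 1 < len(tokens) else 'EOS',
        'tokensBetweenCandPRED': '_'.join(between_toks),
        'numberOfTokensBetween': str(len(between_toks)),
        'existVerbBetweenCandPred': str(any(p.startswith('VB') for p in between_pos)).lower(),
        'candTokenPOS': pos_tags[cand],
        'possBetweenCandPRED': '_'.join(between_pos),
    }
```

Running this on the example sentence with `output` as the candidate and `%` as the predicate reproduces the feature values shown on the slides.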

• Training of MaxEnt Model
  – Each training example is one line:
    • candToken=output . . . . . class=Y
    • candToken=Canada . . . . . class=N

– Put all examples in one file, the training file

– Use the MaxEnt wrapper or the program you wrote in HW5 to train your relation extraction model

Decoding

• For each sentence
  – Generate testing examples as you did for training
    • One example per feature line (without class=Y/N)
  – Apply your trained model to each of the testing examples
  – Choose the example with the highest probability returned by your model as the ARG1
  – So there must be exactly one ARG1 for each sentence
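The decoding rule above is a per-sentence argmax. A sketch, where `model_prob` stands in for however your HW5 MaxEnt model exposes P(class=Y | features) (an assumption, not a fixed API):

```python
def decode_sentence(candidates, model_prob):
    """candidates: list of feature dicts, one per candidate token.
    model_prob(feats) -> P(class=Y | feats).
    Returns the index of the candidate to tag as ARG1."""
    return max(range(len(candidates)), key=lambda i: model_prob(candidates[i]))
```

Because `max` always returns one index, every sentence gets exactly one ARG1, as required.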

Scoring

• You are required to tag exactly one ARG1 for each sentence
• Your system will therefore be evaluated on accuracy:
  – Accuracy = #correct_ARG1s / #sentences
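Since every sentence gets exactly one predicted ARG1, the metric reduces to a per-sentence exact match. A sketch of the formula above:

```python
def accuracy(gold, predicted):
    """gold, predicted: lists of ARG1 token indices, one entry per sentence.
    Accuracy = #correct_ARG1s / #sentences."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)
```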

Good Luck!
