
PASCAL CHALLENGE ON INFORMATION EXTRACTION & MACHINE LEARNING


Page 1: PASCAL CHALLENGE ON INFORMATION EXTRACTION & MACHINE LEARNING

PASCAL

PASCAL CHALLENGE ON INFORMATION EXTRACTION & MACHINE LEARNING

Designing Knowledge Management using Adaptive Information Extraction from Text

PASCAL Network of Excellence on Pattern Analysis, Statistical Modelling and Computational Learning

Call for participation:

Evaluating Machine Learning for Information Extraction

July 2004 - November 2004

The Dot.Kom European project and the Pascal Network of Excellence invite you to participate in the Challenge on Evaluation of Machine Learning for Information Extraction from Documents. The goal of the challenge is to assess the current state of Machine Learning (ML) algorithms for Information Extraction (IE), to identify future challenges, and to foster additional research in the field. Given a corpus of annotated documents, participants will be expected to perform a number of tasks, each examining a different aspect of the learning process.

Corpus: A standardised corpus of 1100 Workshop Call for Papers (CFP) will be provided. 600 of these documents will be annotated with 12 tags that relate to pertinent information (names, locations, dates, etc.). Of the annotated documents, 400 will be provided to the participants as a training set; the remaining 200 will form the unseen test set used in the final evaluation. All the documents will be pre-processed to include tokenisation, part-of-speech and named-entity information.

Tasks

Full scenario: The only mandatory task for participants is learning to annotate implicit information: given the 400 training documents, learn the textual patterns necessary to extract the annotated information. Each participant provides results of a four-fold cross-validation experiment using the same document partitions for pre-competitive tests. A final test will be performed on the 200 unseen documents.

Active learning: Learning to select documents: the 400 training documents will be divided into fixed subsets of increasing size (e.g. 10, 20, 30, 50, 75, 100, 150, and 200). The use of the subsets for training will show the effect of limited resources on the learning process. Secondly, given each subset, the participants can select the documents to add to reach the next size (i.e. 10 to 20, 20 to 30, etc.), thus showing the ability to select the most suitable set of documents to annotate.

Enriched scenario: The same procedure as task 1, except the participants will be able to use the unannotated part of the corpus (500 documents). This will show how the use of unsupervised or semi-supervised methods can improve the results of supervised approaches. An interesting variant of this task could concern the use of unlimited resources, e.g. the Web.

Participation: Participants from different fields such as machine learning, text mining, natural language processing, etc. are welcome. Participation in the challenge is free. After registration, participants will receive the corpus of documents to train on and precise instructions on the tasks to be performed. At an established date, participants will be required to submit their systems' answers via a Web portal. An automatic scorer will compute the accuracy of extraction. Participants must also produce a paper describing their system and the results obtained. The results of the challenge will be discussed in a dedicated workshop.

Timetable
5th July 2004: Formal definition of the tasks, annotated corpus and evaluation server
15th October 2004: Formal evaluation
November 2004: Presentation of evaluation at Pascal workshop

Organisers: Fabio Ciravegna, University of Sheffield, UK (coordinator); Mary Elaine Califf, Illinois State University, USA

Neil Ireson

Local Challenge Coordinator

Web Intelligent Group, Department of Computer Science, University of Sheffield

Page 2

Organisers

• Sheffield – Fabio Ciravegna

• UCD Dublin – Nicholas Kushmerick

• ITC-IRST – Alberto Lavelli

• University of Illinois – Mary-Elaine Califf

• FairIsaac – Dayne Freitag

Website

• http://tyne.shef.ac.uk/Pascal

Page 3

Outline

• Challenge Goals

• Data

• Tasks

• Participants

• Results on Each Task

• Conclusion

Page 4

Goal: Provide a testbed for comparative evaluation of ML-based IE

• Standardised data
• Partitioning
• Same set of features

– Corpus preprocessed using GATE
– No features allowed other than the ones provided

• Explicit Tasks
• Standard Evaluation

• Provided independently by a server

• For future use
• Available for further tests with the same or new systems
• Possible to publish new corpora or tasks

Page 5

Data (Workshop CFP)
[Timeline diagram with year marks 1993, 2000 and 2005: Training Data – 400 Workshop CFP; Testing Data – 200 Workshop CFP]

Page 6

Data (Workshop CFP)
[Same timeline; the 400 training documents are divided into Set0–Set3]

Page 7

Data (Workshop CFP)
[Same timeline; each of Set0–Set3 is further subdivided into ten numbered parts (0–9)]
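The fixed partitioning illustrated above can be sketched as follows. This is a minimal sketch, assuming interleaved assignment of document IDs to Set0–Set3; the actual partition files were distributed with the corpus, so the splitting scheme here is an illustrative assumption:

```python
# Sketch of the shared four-fold partition: the 400 training documents
# are split into Set0..Set3, and each fold trains on three sets while
# validating on the held-out fourth. Shared partitions make results
# comparable across participants.
# NOTE: the interleaved split below is a hypothetical stand-in for the
# partition files actually distributed with the corpus.

def four_fold_partitions(doc_ids):
    """Yield (train_ids, validation_ids) for each of the four folds."""
    sets = [doc_ids[i::4] for i in range(4)]   # Set0..Set3, interleaved
    for held_out in range(4):
        train = [d for i, s in enumerate(sets) if i != held_out for d in s]
        yield train, sets[held_out]

folds = list(four_fold_partitions(list(range(400))))
```

Each fold then trains on 300 documents and validates on the remaining 100, so every document is used for validation exactly once.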

Page 8

Data (Workshop CFP)
[Same timeline, plus unannotated enrichment data: Enrich Data 1 – 250 Workshop CFP; Enrich Data 2 – 250 Conference CFP; and the WWW]

Page 9

Preprocessing

• GATE
  – Tokenisation
  – Part-Of-Speech
  – Named-Entities

• Date, Location, Person, Number, Money

Page 10

Annotation Exercise

• 4+ months
• Initial consultation
• 40 documents – 2 annotators
• Second consultation
• 100 documents – 4 annotators
• Determine annotation disagreement
• Full annotation – 10 annotators

Annotators: Christopher Brewster, Sam Chapman, Fabio Ciravegna, Claudio Giuliano, Jose Iria, Ashred Khan, Vita Lanfranchi, Alberto Lavelli, Barry Norton

Page 11

Page 12

Annotation Slots

Slot                                Training Corpus    Test Corpus
Workshop:
  name                                 543  11.8%       245  10.8%
  acronym                              566  12.3%       243  10.7%
  homepage                             367   8.0%       215   9.5%
  location                             457  10.0%       224   9.9%
  date                                 586  12.8%       326  14.3%
  paper submission date                590  12.9%       316  13.9%
  notification of acceptance date      391   8.5%       190   8.4%
  camera-ready copy date               355   7.7%       163   7.2%
Conference:
  name                                 204   4.5%        90   4.0%
  acronym                              420   9.2%       187   8.2%
  homepage                             104   2.3%        75   3.3%
Total                                 4583 100.0%      2274 100.0%

Page 13

Evaluation Tasks

• Task1 – ML for IE: Annotating implicit information
  – 4-fold cross-validation on 400 training documents
  – Final test on 200 unseen test documents
• Task2a – Learning Curve
  – Effect of increasing amounts of training data on learning
• Task2b – Active Learning: Learning to select documents
  – Given seed documents, select the documents to add to the training set
• Task3a – Enriched Data
  – Same as Task1, but can use the 500 unannotated documents
• Task3b – Enriched & WWW Data
  – Same as Task1, but can use all available unannotated documents
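The Task2a protocol of training on nested subsets of fixed, increasing size can be sketched as below. The subset sizes follow the call for participation; the `train_fn` and `score_fn` callables are hypothetical placeholders for a participant's system, not part of the challenge infrastructure:

```python
# Sketch of the Task2a learning-curve protocol: train on nested subsets
# of increasing size and score each model on the fixed test set.
# train_fn and score_fn are illustrative placeholders.
import random

SUBSET_SIZES = [10, 20, 30, 50, 75, 100, 150, 200]

def learning_curve(train_docs, test_docs, train_fn, score_fn, seed=0):
    """Return (size, score) pairs for nested subsets of train_docs."""
    rng = random.Random(seed)
    order = train_docs[:]
    rng.shuffle(order)                 # one fixed order -> nested subsets
    curve = []
    for size in SUBSET_SIZES:
        model = train_fn(order[:size])             # train on first `size` docs
        curve.append((size, score_fn(model, test_docs)))
    return curve
```

Because the subsets are nested, each point on the curve only adds documents to the previous one, isolating the effect of extra training data.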

Page 14

Evaluation

• Precision / Recall / F1 measure

• MUC Scorer

• Automatic Evaluation Server

• Exact matching

• Extract every slot occurrence
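Exact-match scoring of slot occurrences, as described above, can be sketched as follows. This is a minimal illustration of the metric, not the actual MUC scorer; the `(slot, start, end)` triple representation is an assumption:

```python
# Minimal sketch of exact-match slot scoring: a predicted
# (slot, start, end) triple counts as correct only if an identical
# triple appears in the gold annotation. Illustrative only; the
# challenge used the MUC scorer via an automatic evaluation server.

def precision_recall_f1(gold, predicted):
    """gold, predicted: iterables of (slot, start, end) triples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Under exact matching, a span that is off by even one token counts as both a false positive and a false negative, which is why boundary errors are penalised twice.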

Page 15

Participants

Participant | ML | 4-fold X-validation (tasks 1, 2a, 2b, 3a, 3b) | Test Corpus (tasks 1, 2a, 2b, 3a, 3b)

Amilcare (Sheffield, UK) LP2 2 2 1 1 1 1 1

Bechet (Avignon, France) HMM 2 1 2 2

Canisius (Netherlands) SVM, IBL 1 1

Finn (Dublin, Ireland) SVM 1 1

Hachey (Edinburgh, UK) MaxEnt, HMM 1 1

ITC-IRST (Italy) SVM 3 3 1

Kerloch (France) HMM 2 2 3 2

Sigletos (Greece) LP2, BWI, ? 1 3

Stanford (USA) CRF 1 1

TRex (Sheffield, UK) SVM 2

Yaoyong (Sheffield, UK) SVM 3 3 3 3 3 3

Total 15 8 4 0 0 20 10 5 1 1

Page 16

Task1

Information Extraction with all the available data

Page 17

Task1: Test Corpus
[Precision vs. recall scatter plot, axes 0–1, for Amilcare, Stanford, Yaoyong, ITC-IRST, Sigletos, Canisius, Trex, Bechet, Finn, Kerloch]

Page 18

Task1: Test Corpus
[Same precision vs. recall plot, zoomed to the 0.2–0.9 range]

Page 19

Task1: 4-Fold Cross-validation
[Precision vs. recall plot (0.2–0.9) for Amilcare, Yaoyong, ITC-IRST, Sigletos, Canisius, Bechet, Finn, Kerloch]

Page 20

Task1: 4-Fold & Test Corpus
[Precision vs. recall plot (0.2–0.9) comparing the same systems on both evaluations]

Page 21

Task1: Slot FMeasure
[Chart of mean and max F-measure per slot, scale 0–1]

Page 22

Best Slot FMeasures (Task1: Test Corpus)

Slot               Amilcare1  Yaoyong1  Stanford1  Yaoyong2  ITC-IRST2
Workshop name        0.352     0.58      0.596      0.542     0.66
Workshop acro        0.865     0.612     0.496      0.6       0.383
Workshop date        0.694     0.731     0.752      0.69      0.589
Workshop home        0.721     0.748     0.671      0.705     0.516
Workshop loca        0.488     0.641     0.647      0.66      0.542
Workshop pape        0.864     0.74      0.712      0.696     0.712
Workshop noti        0.889     0.843     0.819      0.856     0.853
Workshop came        0.87      0.75      0.784      0.747     0.783
Conference name      0.551     0.503     0.493      0.477     0.481
Conference acro      0.905     0.445     0.491      0.387     0.348
Conference home      0.393     0.149     0.151      0.116     0.119

Page 23

Slot Recall: All Participants
[Chart of recall (0–1) per participant for each slot: workshop name, acro, date, home, loca, pape, noti, came; conference name, acro, home]

Page 24

Task 2a

Learning Curve

Page 25

Task2a: Learning Curve FMeasure
[F-measure vs. training-set proportion (0.1–0.9) for Amilcare, Yaoyong1–3, ITC-IRST1, Bechet1–2, Kerloch2–3, Hachey, and the mean]

Page 26

Task2a: Learning Curve Precision
[Precision vs. training-set proportion (0.1–0.9) for the same systems and the mean]

Page 27

Task2a: Learning Curve Recall
[Recall vs. training-set proportion (0.1–0.9) for the same systems and the mean]

Page 28

Task 2b

Active Learning

Pages 29–34

Active Learning
[Sequence of diagrams: starting from 400 potential training documents and 200 test documents, each round selects 40 documents to annotate, trains on the accumulated subsets (Subset0 – 40 documents, Subset0,1 – 80 documents, …), extracts, and tests on the 200 test documents; the potential pool shrinks 400 → 360 → 320 → 280]

Page 35: P ASCAL  C HALLENGE ON  I NFORMATION  E XTRACTION  &  M ACHINE  L EARNING

PASCAL

Task2b: Active Learning

• Amilcare – Maximum divergence from expected number of tags.
• Hachey – Maximum divergence between two classifiers built on different feature sets.
• Yaoyong (Gram-Schmidt) – Maximum divergence between example subsets.
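A generic selection loop of this kind can be sketched as below. This is not any participant's actual implementation: the `train_fn`, `uncertainty_fn` and `score_fn` callables, and the batch sizes, are illustrative assumptions that mirror the 40-document rounds in the preceding diagrams:

```python
# Sketch of an active-learning loop: each round trains on the documents
# selected so far, scores on the held-out test set, then moves the
# `batch` most uncertain documents from the unlabelled pool into the
# training set. The model interface is hypothetical.

def active_learning(pool, train_fn, uncertainty_fn, score_fn,
                    seed_size=40, batch=40, rounds=3):
    """Return (training-set size, score) pairs, one per round."""
    selected, remaining = pool[:seed_size], pool[seed_size:]
    scores = []
    for _ in range(rounds):
        model = train_fn(selected)                 # train on annotated docs
        scores.append((len(selected), score_fn(model)))
        # rank the unlabelled pool, most uncertain document first
        remaining.sort(key=lambda d: uncertainty_fn(model, d), reverse=True)
        selected += remaining[:batch]              # "annotate" the next batch
        remaining = remaining[batch:]
    return scores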

Page 36

Task2b: Active Learning – Increased FMeasure over random selection
[Plot of F-measure gain (−0.03 to 0.05) vs. training-set proportion (0.1–0.9) for Amilcare, Yaoyong1–3, and Hachey]

Page 37

Task 3

Semi-supervised learning

(no significant participation)

Page 38

Conclusions (Task1)

• The top four systems use different algorithms
  – Amilcare: Rule Induction
  – Yaoyong: SVM
  – Stanford: CRF
  – Hachey: HMM

Page 39

Conclusions (Task1: Test Corpus)

• The same algorithm (SVM) produced different results
[Precision vs. recall plot (0.2–0.9) for the SVM-based systems: Yaoyong, ITC-IRST, Canisius, Trex, Finn]

Page 40

Conclusions (Task1: 4-fold Corpus)

• The same algorithm (SVM) produced different results
[Precision vs. recall plot (0.2–0.9) for the SVM-based systems: Yaoyong, ITC-IRST, Canisius, Finn]

Page 41

Conclusions (Task1)

• Large variation in slot performance
• Good performance on:
  – "Important" dates and workshop homepage
  – Acronyms (for Amilcare)
• Poor performance on:
  – Workshop name and location
  – Conference name and homepage

Page 42

Conclusions (Task2 & Task3)

• Task 2a: Learning Curve
  – Systems' performance is largely as expected
• Task 2b: Active Learning
  – Two approaches, Amilcare and Hachey, showed benefits
• Task 3: Enriched Data
  – Insufficient participation to evaluate the use of enriched data

Page 43

Future Work

• Performance differences:
  – Systems: what determines good/bad performance
  – Slots: different systems were better/worse at identifying different slots
• Combine approaches
• Active learning
• Enriched data
  – Overcoming the need for annotated data
• Extensions
  – Data: use different data sets and other features, including (HTML) structured data
  – Tasks: relation extraction

Page 44

Why is Amilcare Good?

Page 45

Contextual Rules
[Per-slot precision, recall and F-measure (0–0.9) without contextual rules]

Page 46

Contextual Rules
[Same chart, comparing with-context vs. no-context precision, recall and F-measure per slot]

Page 47

Rule Redundancy
[Scatter plot: F-measure (0.3–0.9) vs. number of rules (0–20,000) per slot, with a linear trend line]