Semi-Automatic Indexing of Full Text Biomedical Articles
Washington D.C., October 25, 2005
Clifford W. Gay
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA



Acknowledgments

• Alan R. Aronson, Ph.D.
• Mehmet Kayaalp, M.D., Ph.D.


Outline

• Introduction
  – The System: Medical Text Indexer (MTI)
  – The Data: Online biomedical journals
  – The Task: Emulate Medline indexing using full text
• Results
  – Observations on PubMed Central articles
  – Model selection results
  – Recent work


• Introduction
  – The System: Medical Text Indexer (MTI)


Why Semi-Automatic Indexing?

• U.S. National Library of Medicine indexes 5,000 journal titles
• Supports over 60 million PubMed searches each month
• Has 130 indexers
• Indexed 570,000 articles in 2004
• Will need to index 1,000,000 very soon
• Automated support is helping to meet this demand
  – MTI was used on 26% of articles in 2004
• More about MTI
  – Aronson AR, Mork JG, Gay CW, Humphrey SM, Rogers WJ. The NLM Indexing Initiative's Medical Text Indexer. Medinfo. 2004; 11(Pt 1): 268-72. PMID: 15360816


Medical Text Indexer (MTI)

[Diagram: MTI processing flow. Recoverable labels: Title + Abstract et al.; MetaMap; Phrases; Phrasex; Trigram Phrase Matching; PubMed Related Citations; Rel. Cits.; Extract MeSH; UMLS Concepts; Restrict to MeSH; MeSH Headings; Postprocessing; Ordered list of MeSH Terms]


DCMS with MTI Suggestions


• Introduction
  – The Data: Online biomedical journals


Why Full Text?

• Medical Text Indexer uses article title and abstract
• However
  – Human indexers taught not to use abstract
  – Author’s complete intent may not be in abstract
  – Check tags may only appear in a table or methods section
• If MTI indexes from full text articles it may
  – Find central concepts missing from abstract
  – Identify terms when article has no abstract
  – More accurately select check tags
  – Be in better compliance with indexing policy


Test Collection Selection

• Available online from PubMed Central
• Consistent XML format
  – Identifies title, abstract, sections, tables, figures, references, etc.
• 500 articles from 17 diverse biomedical journals
• Did not use:
  – References
  – Graphics
  – Math
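The exclusions above could be applied with a small filter over the article XML. This is a minimal sketch, not the preprocessing actually used for the test collection: the tag names (`sec`, `title`, `p`, `ref-list`, `graphic`, `disp-formula`, `inline-formula`) assume JATS-style markup, and the helper names `collect_text` and `extract_sections` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed JATS-style tag names; the real PubMed Central markup may differ.
EXCLUDED_TAGS = {"ref-list", "graphic", "disp-formula", "inline-formula"}


def collect_text(element):
    """Return the text under element, skipping excluded subtrees."""
    if element.tag in EXCLUDED_TAGS:
        return ""
    parts = [element.text or ""]
    for child in element:
        parts.append(collect_text(child))
        # Text that follows a child element is kept even if the child is dropped.
        parts.append(child.tail or "")
    return "".join(parts)


def extract_sections(xml_string):
    """Map each section header to the joined text of its direct paragraphs."""
    root = ET.fromstring(xml_string)
    sections = {}
    for sec in root.iter("sec"):
        title = sec.find("title")
        header = collect_text(title).strip() if title is not None else "NO HEADER"
        paragraphs = [" ".join(collect_text(p).split()) for p in sec.findall("p")]
        sections[header] = " ".join(paragraphs)
    return sections


SAMPLE = """<article><body>
<sec><title>Methods</title>
<p>Strains were grown <inline-formula>x=1</inline-formula>overnight.</p>
<p><graphic/>MICs were determined.</p></sec>
<ref-list><ref>Some reference</ref></ref-list>
</body></article>"""

print(extract_sections(SAMPLE))
```

Sections with the same header overwrite each other in this sketch; a real pipeline would need to keep them apart.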


Test Collection

• 5 Clinical journals (165):
  – Breast Cancer Research (11)
  – Journal of Clinical Microbiology (80)
• 3 Organization based journals (28):
  – Journal of the American Medical Informatics Assoc. (10)
  – Proceedings of the National Academy of Sciences (11)
• 9 Journals in other categories:
  – Pharmacology (65); Biochemistry (65); Plants (46); Molecular Biology (45); Learning (30); Hospitals (22)


• Introduction
  – The Task: Emulate Medline indexing using full text


Indexing Task

[Diagram: MTI processing flow, shown twice. Recoverable labels: Title + Abstract et al.; MetaMap; Phrases; Phrasex; Trigram Phrase Matching; PubMed Related Citations; Rel. Cits.; Extract MeSH; UMLS Concepts; Restrict to MeSH; MeSH Headings; Postprocessing; Ordered list of MeSH Terms]


Example Article

Medline Indexing
• beta-Lactamases/*genetics/*metabolism
• Enterobacteriaceae/drug effects/*enzymology/genetics
• Plasmids/*genetics
• Genes, Bacterial/genetics
• Genotype
• Kinetics
• Microbial Sensitivity Tests
• Molecular Sequence Data
• Research Support, Non-U.S. Gov't

MTI Indexing (legend: MMI; REL; MMI & REL)
• beta-Lactamases
• Plasmids
• Enterobacteriaceae
• beta-Lactam Resistance
• Conjugation, Genetic
• Cephalosporin Resistance
• Cefotaxime
• Nucleotide Sequences
• Molecular Sequence Data
• Cephalosporins
• Chromosomes, Bacterial
• DNA, Bacterial
• DNA Transposable Elements
• Escherichia coli
• Genes, Bacterial
• Cloning, Molecular
• Klebsiella pneumoniae
• Amino Acid Sequence
• Microbial Sensitivity Tests
• Cephalothin
• Proteus mirabilis
• Erwinia
• Salmonella typhimurium
• Enterobacteriaceae Infections
• Lactams

Recall = 0.67, Precision = 0.24, F2 measure = 0.492
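The scores for this example can be reproduced from the two term lists. The sketch below is an illustration, not the evaluation code behind the slide: it assumes that only main headings are compared (subheadings such as /*genetics are ignored) and that all nine Medline entries count as reference terms. The variable names are hypothetical.

```python
# Main headings assigned by the Medline indexers (subheadings dropped).
MEDLINE_TERMS = {
    "beta-Lactamases", "Enterobacteriaceae", "Plasmids", "Genes, Bacterial",
    "Genotype", "Kinetics", "Microbial Sensitivity Tests",
    "Molecular Sequence Data", "Research Support, Non-U.S. Gov't",
}

# The 25 terms recommended by MTI for the same article.
MTI_TERMS = {
    "beta-Lactamases", "Plasmids", "Enterobacteriaceae",
    "beta-Lactam Resistance", "Conjugation, Genetic",
    "Cephalosporin Resistance", "Cefotaxime", "Nucleotide Sequences",
    "Molecular Sequence Data", "Cephalosporins", "Chromosomes, Bacterial",
    "DNA, Bacterial", "DNA Transposable Elements", "Escherichia coli",
    "Genes, Bacterial", "Cloning, Molecular", "Klebsiella pneumoniae",
    "Amino Acid Sequence", "Microbial Sensitivity Tests", "Cephalothin",
    "Proteus mirabilis", "Erwinia", "Salmonella typhimurium",
    "Enterobacteriaceae Infections", "Lactams",
}

matched = MEDLINE_TERMS & MTI_TERMS
recall = len(matched) / len(MEDLINE_TERMS)   # share of Medline terms found
precision = len(matched) / len(MTI_TERMS)    # share of MTI terms that are correct
f2 = 5 * precision * recall / (4 * precision + recall)

print(sorted(matched))
print(round(recall, 2), round(precision, 2), round(f2, 3))
```

Under these assumptions six terms match, which gives the recall, precision, and F2 values shown on the slide.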


Evaluation

• F2 Measure
  – Weighted harmonic mean of Recall and Precision
  – Weights Recall twice as important as Precision
  – Values: 0.0 to 1.0
• Computed for each article and averaged
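A minimal sketch of the measure, assuming the standard F-beta definition with beta = 2. The function names `f_beta` and `average_f2` are hypothetical, and scoring an article with no matches as 0.0 is a convention chosen here, not something the slides state.

```python
def f_beta(recall, precision, beta=2.0):
    """Weighted harmonic mean of recall and precision (beta=2 favors recall)."""
    if recall == 0.0 and precision == 0.0:
        return 0.0  # assumed convention when nothing matches
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def average_f2(per_article):
    """Average the per-article F2 over (recall, precision) pairs."""
    scores = [f_beta(recall, precision) for recall, precision in per_article]
    return sum(scores) / len(scores)


# Two made-up articles, for illustration only.
print(average_f2([(1.0, 1.0), (0.5, 0.5)]))
```

Because F2 is computed per article and then averaged, the averaged F2 need not equal the F2 computed from averaged recall and averaged precision.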


• Results
  – Observations on PubMed Central articles


Section Header Classes

• Semantically equivalent section headers
• MATERIALS AND METHODS class:
  – Materials and Method(s)
  – Method(s)
  – Scoring Methods
  – Experimental Procedures
  – Other Methods Tested
• CAPTIONS class:
  – the titles and captions from tables and figures
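Grouping headers into classes amounts to a lookup after light normalization. The sketch below is hypothetical: the table holds only the MATERIALS AND METHODS examples listed above, the function name `section_class` is invented for illustration, and the fallback classes OTHER and NO HEADER are taken from later slides.

```python
import re

# Hypothetical lookup built from the examples on this slide; the full set
# of classes and member headers is not given in the source.
HEADER_CLASSES = {
    "MATERIALS AND METHODS": {
        "materials and method", "materials and methods",
        "method", "methods", "scoring methods",
        "experimental procedures", "other methods tested",
    },
}


def section_class(header):
    """Return the section class for a raw section header."""
    if header is None or not header.strip():
        return "NO HEADER"
    # Lowercase, collapse whitespace, and drop trailing punctuation.
    key = re.sub(r"\s+", " ", header.strip().lower()).rstrip(".:")
    for name, members in HEADER_CLASSES.items():
        if key in members:
            return name
    return "OTHER"


print(section_class("Experimental  Procedures"))
print(section_class("Bacterial strains."))
```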


Section Class Performance

Section Class    Average F2
CAPTIONS         0.3175
ABSTRACT         0.2960
INTRODUCTION     0.2869
RESULTS          0.2790
DISCUSSION       0.2734
NO HEADER        0.2574
…                …
CONCLUSIONS      0.1961
ABBREVIATIONS    0.1304


• Results
  – Model selection results


Experiments

• Varied MTI components used
  – MetaMap Indexing (MMI)
  – Related Citations (REL)
• Varied section classes processed
  – Used model selection
  – Used binary weighting for sections
• A model is
  – A selection of section classes and
  – The text in those sections
  – That represents the article
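The definition of a model above can be sketched as a filter with binary weights: a section class is either in the model (weight 1) or not (weight 0). The data layout, the class names used as keys, and the function name `represent_article` are assumptions made for illustration.

```python
def represent_article(sections, model):
    """Return the text of the sections whose class the model selects.

    sections: list of (section_class, text) pairs in document order.
    model: set of selected section classes (weight 1); all others get 0.
    """
    weights = {section_class: 1 for section_class in model}
    kept = [text for section_class, text in sections
            if weights.get(section_class, 0) == 1]
    return "\n".join(kept)


# Made-up article, for illustration only.
ARTICLE = [
    ("TITLE", "Plasmid-encoded beta-lactamases"),
    ("ABSTRACT", "We characterized a new enzyme."),
    ("MATERIALS AND METHODS", "Strains were grown overnight."),
    ("CAPTIONS", "Table 1. MICs of beta-lactams."),
]
BASELINE_MODEL = {"TITLE", "ABSTRACT"}
CAPTIONS_MODEL = {"TITLE", "ABSTRACT", "CAPTIONS"}

print(represent_article(ARTICLE, CAPTIONS_MODEL))
```

Model selection then means trying different sets of section classes and keeping the set whose text gives the best average F2.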


Production Baseline

[Diagram labels: Title+Abstract; MMI; REL]

F2 = 0.457


Naive Mode

[Diagram labels: Title+Abstract; Materials and Methods; Results and Discussion; No Header; All Section Classes; MMI; REL]

F2 = 0.453 (-0.9%)


MetaMap Indexing Mode

[Diagram labels: Title+Abstract; Introduction; Results; Discussion; Other; No Header; Captions; MMI; REL]

F2 = 0.373 (-18.4%)


Augmented Mode

[Diagram labels: Title+Abstract; Introduction; Results; Discussion; Other; No Header; Captions; MMI; REL]

F2 = 0.475 (+3.9%)


Refined Augmented Mode

[Diagram labels: Title+Abstract; Captions; Results; Background; MMI; REL]

F2 = 0.485 (+6.1%)


Full MTI Mode

[Diagram labels: Title+Abstract; Introduction; Results; Discussion; Other; No Header; Captions; MMI model; MMI; REL]

F2 = 0.488 (+6.8%)


Refined Full MTI

[Diagram labels: Title+Abstract; Results; Results and Discussion; No Header; Captions; Conclusions; MMI; REL]

F2 = 0.491 (+7.4%)


MTI Performance Summary

Indexing Model                          Recall  Precision  Avg. F2
Production Baseline (Ti, Ab)            0.53    0.32       0.457
Naive Mode (full text)                  0.57    0.27       0.453
Augmented Mode (MMI + REL (Ti, Ab))     0.59    0.29       0.475
Augmented Mode (refined)                0.60    0.30       0.485
Full MTI (MMI + REL common sections)    0.60    0.30       0.488
Full MTI (refined)                      0.60    0.31       0.491
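The percentages quoted on the individual model slides are relative changes in average F2 against the production baseline. A short check, using only the F2 values from the summary; the function name `relative_change` is hypothetical.

```python
BASELINE_F2 = 0.457  # Production Baseline (Ti, Ab)

MODEL_F2 = {
    "Naive Mode (full text)": 0.453,
    "Augmented Mode (MMI + REL (Ti, Ab))": 0.475,
    "Augmented Mode (refined)": 0.485,
    "Full MTI (MMI + REL common sections)": 0.488,
    "Full MTI (refined)": 0.491,
}


def relative_change(f2, baseline=BASELINE_F2):
    """Percentage change of an average F2 relative to the baseline."""
    return 100.0 * (f2 - baseline) / baseline


for name, f2 in MODEL_F2.items():
    print(f"{name}: {relative_change(f2):+.1f}%")
```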


• Results
  – Recent work


Improvement Potential

• With current model
  – No cut off at 25 terms yields maximum recall of 0.79
  – If all good terms prioritized correctly, F2 = 0.64
• Improvement over baseline: from 7% to 40%


Increase REL Citations

• MTI currently uses 10 Related Citations
• Optimal number for full text articles is 15
• Best model confirmed for this setting
• Additional Improvement in F2 = 0.01


Summarization

• Selecting important text before MTI processing
• Using Yeh, Ke, Yang, Meng approach
• Combines
  – Latent Semantic Analysis and
  – Salton’s Text Relationship Map
• Start with current model
• Document representation includes
  – Bag of words
  – MetaMap identified concepts
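The text relationship map idea can be sketched as follows: sentences are nodes, two sentences are linked when their similarity exceeds a threshold, and the most linked sentences are kept. This is a simplified illustration under stated assumptions, not the approach as implemented for MTI: it uses plain bag-of-words cosine similarity, omits the Latent Semantic Analysis step and MetaMap concepts, and the threshold, `k`, and function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased token counts for one sentence."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def select_sentences(sentences, k=2, threshold=0.3):
    """Keep the k sentences with the most links in the relationship map."""
    vectors = [bag_of_words(s) for s in sentences]
    links = [0] * len(sentences)
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                links[i] += 1
                links[j] += 1
    # Rank by link count, breaking ties by position in the text.
    ranked = sorted(range(len(sentences)), key=lambda i: (-links[i], i))
    return [sentences[i] for i in sorted(ranked[:k])]


# Made-up sentences, for illustration only.
SENTENCES = [
    "Plasmids carry beta-lactamase genes.",
    "Beta-lactamase genes confer resistance.",
    "Resistance genes spread on plasmids.",
    "The weather was sunny.",
]
print(select_sentences(SENTENCES, k=3))
```

The selected sentences would then stand in for the full section text as input to the indexer.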


NLM Indexing Initiative

Clifford W. Gay
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Contact: [email protected]@nlm.nih.gov
Web: ii.nlm.nih.gov/fulltext.shtml


NONE Sections

• Most appear in articles that have no abstract
  – 20/23
• Some are errors
  – 4 have “Introduction” header in publisher version
  – 2 appear within other sections with headers
• Many contain the primary text of the article
  – Comments, Editorials, Letters (11/23)


Other Sections

• Other section class has 525 sections (16%)
• Non-standard article organization
  – Common in Review articles
• Example: β-Lactamases of Kluyvera ascorbata, Probable Progenitors of Some Plasmid-Encoded CTX-M Types
  – Bacterial strains.
  – Antimicrobial agents and susceptibility testing.
  – Kinetic and IEF analyses.
  – Genetic characterization of blaKLUA.
  – Genetic environment of blaKLUA-1.
  – Arguments for mobilization of chromosomal blaKLUA gene.


Ranking Function

• Made ranking function for Related Citations more like MetaMap Indexing
• Resulted in a more inclusive model
  – Materials and Methods
  – Introduction
• F2 measure = 0.4865


Tuning Path Weight

• Ratio of weights between the two indexing paths
  – MetaMap Indexing: 7
  – Related Citations: 2
• No improvement possible
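One way to picture the 7:2 ratio is as a weighted sum of per-term scores from the two paths. The slides give only the ratio, so the weighted sum, the score scale, and the function name `combine_paths` below are all assumptions made for illustration; this is not MTI's actual ranking function.

```python
MMI_WEIGHT = 7  # MetaMap Indexing path
REL_WEIGHT = 2  # Related Citations path


def combine_paths(mmi_scores, rel_scores,
                  mmi_weight=MMI_WEIGHT, rel_weight=REL_WEIGHT):
    """Rank terms by an assumed weighted sum of the two path scores."""
    terms = set(mmi_scores) | set(rel_scores)
    combined = {
        term: mmi_weight * mmi_scores.get(term, 0.0)
        + rel_weight * rel_scores.get(term, 0.0)
        for term in terms
    }
    # Highest combined score first; ties broken alphabetically.
    return sorted(combined, key=lambda term: (-combined[term], term))


# Made-up path scores, for illustration only.
mmi = {"Plasmids": 0.5, "Kinetics": 0.2}
rel = {"Kinetics": 0.9, "Genotype": 1.0}
print(combine_paths(mmi, rel))
```

Tuning the path weight would mean rerunning the evaluation with other ratios and comparing the average F2.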


Partial Weight for Singleton Headers

• OTHER section class
  – Header is unique
  – Contain content terms
• Gave section class weight between 0 and 1
  – Some recall improvement
  – No collection wide improvement in F2