Table mining and data curation from biomedical literature Nikola Milosevic Supervisors: Dr Goran Nenadic, Robert Hernandez

DESCRIPTION

Presentation about my research given at the Manchester Institute of Biotechnology PhD symposium

Page 1: Table mining and data curation from biomedical literature

Table mining and data curation from biomedical literature
Nikola Milosevic
Supervisors: Dr Goran Nenadic, Robert Hernandez

Page 2: Table mining and data curation from biomedical literature

Why are we doing this?

Growth of published research

Page 3: Table mining and data curation from biomedical literature

Information growth

Page 4: Table mining and data curation from biomedical literature

Text mining

Text mining developed tools and methods to help scientists
Focused mainly on the body of the article
Tables and figures are typically ignored

Page 5: Table mining and data curation from biomedical literature

What about tables?

Page 6: Table mining and data curation from biomedical literature

What about tables?

Page 7: Table mining and data curation from biomedical literature

Challenge

Visually structured text
May be ungrammatical and ambiguous
Various layouts
Value representation types
◦ Numeric
◦ Text
◦ Ranges
◦ Formulas
◦ Complex
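
To make the value representation types concrete, here is a minimal Python sketch that assigns a cell string to one of the listed types. The regular expressions, the function name `classify_value`, the extra "empty" label, and the reading of "formula" as "contains an operator such as ± or <" are all assumptions made for illustration; they are not taken from the presentation.

```python
import re

# Hypothetical patterns; real biomedical tables need many more cases.
NUMBER = r"[+-]?\d+(?:\.\d+)?"
NUMERIC_RE = re.compile(rf"^{NUMBER}\s*%?$")                    # e.g. "12.5", "40%"
RANGE_RE = re.compile(rf"^{NUMBER}\s*(?:-|\u2013|to)\s*{NUMBER}$")  # e.g. "18-65"
FORMULA_RE = re.compile(r"[±=<>≤≥×]")                           # assumed operator set


def classify_value(cell_text):
    """Return a coarse value type for the text of one table cell."""
    value = cell_text.strip()
    if not value:
        return "empty"  # extra label, not one of the types on the slide
    if NUMERIC_RE.match(value):
        return "numeric"
    if RANGE_RE.match(value):
        return "range"
    if FORMULA_RE.search(value):
        return "formula"
    if not any(ch.isdigit() for ch in value):
        return "text"
    # Digits mixed with other content, e.g. "45 (12%)".
    return "complex"
```

The checks run from the most specific pattern to the least specific one, so a cell such as "18-65" is treated as a range before the fallback "complex" label is considered.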

Page 8: Table mining and data curation from biomedical literature

Method overview

Page 9: Table mining and data curation from biomedical literature

Method overview

Page 10: Table mining and data curation from biomedical literature

Table decomposition

Aim: Decompose the table into structures suitable for further processing
Cell structures that keep information about the navigational path (headers, stubs, etc.)
Heuristic-based approach
Cell structure, alignment, content, neighbourhood
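
A cell structure that keeps its navigational path could look roughly like the Python sketch below. It assumes the number of header rows and stub columns is already known, whereas the approach described above detects them heuristically; the names `Cell` and `decompose` and the parameters `header_rows` and `stub_cols` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    value: str
    row: int
    col: int
    headers: List[str]  # column headers above the cell, top to bottom
    stubs: List[str]    # row stubs to the left of the cell, left to right


def decompose(grid, header_rows=1, stub_cols=1):
    """Turn a rectangular grid of strings into data cells with their paths.

    Assumption: the first `header_rows` rows are headers and the first
    `stub_cols` columns are stubs. Spanning cells are not handled.
    """
    cells = []
    for r in range(header_rows, len(grid)):
        for c in range(stub_cols, len(grid[r])):
            headers = [grid[h][c] for h in range(header_rows)]
            stubs = [grid[r][s] for s in range(stub_cols)]
            cells.append(Cell(grid[r][c], r, c, headers, stubs))
    return cells


# Made-up example table, for illustration only.
example_grid = [
    ["", "Placebo", "Drug"],
    ["Patients (n)", "50", "48"],
    ["Weight (kg)", "80.1", "79.4"],
]
example_cells = decompose(example_grid)
```

Each data cell then carries enough context to be interpreted on its own: the last cell of the example has the value "79.4" with header "Drug" and stub "Weight (kg)".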

Page 11: Table mining and data curation from biomedical literature

Table decomposition

Page 12: Table mining and data curation from biomedical literature

Information extraction

Performed a number of experiments
Extraction of number of patients, weight, BMI
Approaches:
◦ Rules
◦ MetaMap
◦ White and black lists
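
As a rough illustration of how a rule can be combined with white and black lists, the Python sketch below extracts weight values from rows of (stub, column header, value). The list contents, the function names, and the input shape are assumptions made for this example, not the rules used in the experiments; MetaMap is not used here.

```python
import re

# Hypothetical term lists for the "weight" information class.
WHITE_LIST = {"weight", "body weight"}
BLACK_LIST = {"birth weight", "weight loss", "weight gain"}

NUMBER_RE = re.compile(r"[+-]?\d+(?:\.\d+)?")


def matches_class(label, white_list=WHITE_LIST, black_list=BLACK_LIST):
    """True if the label contains a white-listed term and no black-listed one."""
    text = label.lower()
    if any(term in text for term in black_list):
        return False
    return any(term in text for term in white_list)


def extract_weight(rows):
    """Rule: keep the first number of every cell whose stub names the class.

    `rows` is an iterable of (stub, column_header, value) triples.
    Returns a list of (column_header, number) pairs.
    """
    results = []
    for stub, header, value in rows:
        if not matches_class(stub):
            continue
        match = NUMBER_RE.search(value)
        if match:
            results.append((header, float(match.group())))
    return results


# Made-up rows, for illustration only.
example_rows = [
    ("Weight (kg)", "Placebo", "80.1 ± 12.3"),
    ("Birth weight (g)", "Placebo", "3400"),
    ("Age (years)", "Placebo", "45"),
]
example_weights = extract_weight(example_rows)
```

The black list is checked first so that a stub such as "Birth weight (g)" is rejected even though it also contains the white-listed term "weight".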

Page 13: Table mining and data curation from biomedical literature

Results

Achieved promising results
Some of the information classes are easier to extract than others

Page 14: Table mining and data curation from biomedical literature

Conclusion & Future work

Information extraction from tables is feasible
Future work:
◦ Value and table type categorisation
◦ Development of normalisation and extraction engine
◦ Extraction rules
◦ Data storage format (triple store, linked data)
◦ Data curation interface
◦ Data querying interface

Page 15: Table mining and data curation from biomedical literature

Thank you! Q&A

Email: [email protected]