Eberhard Karls Universität Tübingen, Seminar für Sprachwissenschaft
Dictionary Entry Parsing
Lothar Lemnitzer, Claudia Kunze
Computational Lexicography at ESSLLI 2005
Dictionary Entry Parsing – p.1
Topics
What is Dictionary Entry Parsing?
The Structure of Dictionary Entries
Problems of Parsing Entries
Architecture of the LexParse Parser
Resources
What is Dictionary Entry Parsing?
Dictionary Entry Parsing
takes entries of print dictionaries as input
segments the entries
classifies the segments according to their function
converts the entry into a tree-like or an SGML representation
Why parse dictionary entries?
Printed dictionaries are considered a store of valuable linguistic information
They have been reused as a source for lexical databases in NLP
Some print dictionaries have been converted into electronic dictionaries (electronic publishing)
The structure of dictionary entries
Figure 1: Bilingual dictionary entry for the headword black
The structure of dictionary entries
Most dictionary entries share the following structural characteristics:
They consist of information items (e.g. part of speech, equivalent(s))
Information items provide information about the headword
Some of the information items are optional
Some of the information items are grouped
The structure of dictionary entries
Structural relations between information items in dictionary entries:
Linear precedence: some information items precede or follow others
(Immediate) dominance: some higher nodes dominate groups of information items (e.g. grammar → part of speech, inflection)
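The two relations can be pictured on a small entry tree. The following is a minimal sketch, not LexParse's own data model: the class `Node`, the helper `precedes`, and the labels and values of the sample entry are assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of an entry tree: a label, optional text, ordered children."""
    label: str
    text: str = ""
    children: list[Node] = field(default_factory=list)

    def dominates(self, label: str) -> bool:
        """True if a node with this label occurs anywhere below this node."""
        return any(c.label == label or c.dominates(label) for c in self.children)

    def leaves(self) -> list[Node]:
        """Leaves in left-to-right order, i.e. in linear precedence."""
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]


def precedes(root: Node, first: str, second: str) -> bool:
    """True if the first leaf labelled `first` comes before the first leaf labelled `second`."""
    order = [leaf.label for leaf in root.leaves()]
    return order.index(first) < order.index(second)


# Illustrative entry; labels and values are assumptions, not parser output.
entry = Node("entry", children=[
    Node("headword", "black"),
    Node("grammar", children=[Node("pos", "adj"), Node("inflection", "-er, -est")]),
    Node("equivalent", "schwarz"),
])
grammar = entry.children[1]
```

Here `grammar.dominates("pos")` holds because the part of speech sits below the grammar node, while `precedes(entry, "headword", "pos")` holds because the headword is printed first.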
Additional difficulties
Implicit information must be made explicit (e.g. gender → part of speech)
Abbreviations must be resolved (e.g. the tilde symbol representing the headword)
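Both difficulties can be sketched in a few lines. This is an illustration under stated assumptions: the function names and the `GENDER_LABELS` table are hypothetical, and the rule that a gender label implies a noun is one plausible reading of the gender example.

```python
# Hypothetical lookup: a bare gender label is taken to imply the part of speech "noun".
GENDER_LABELS = {"m": "masculine", "f": "feminine", "n": "neuter"}


def expand_tilde(text: str, headword: str) -> str:
    """Replace every tilde placeholder with the headword it stands for."""
    return text.replace("~", headword)


def make_explicit(label: str) -> dict[str, str]:
    """Turn a bare gender label into explicit gender and part-of-speech fields."""
    if label in GENDER_LABELS:
        return {"pos": "noun", "gender": GENDER_LABELS[label]}
    return {"pos": label}
```

For the headword black, `expand_tilde("~ market", "black")` yields the full phrase, so later processing steps never see the abbreviation.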
Structure indicators
Structure indicators are essential to the format of dictionary entries
They mark the beginning and end of information items ('fields')
Punctuation and other symbols are used as non-typographic structure indicators
Fonts and typefaces are used as typographic structure indicators
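A rough sketch of how both kinds of indicator can drive segmentation. The input convention is an assumption: typeface changes are taken to be marked with `<b>…</b>` and `<i>…</i>`, and only the semicolon and comma are treated as punctuation indicators. Real typesetting data will use other codes.

```python
import re

# Assumed input convention: <b>...</b> marks bold type, <i>...</i> marks italics.
TOKEN = re.compile(r"<(b|i)>(.*?)</\1>|([;,])|([^<;,]+)")


def segment(entry: str) -> list[tuple[str, str]]:
    """Split an entry into (indicator, text) pairs."""
    segments = []
    for m in TOKEN.finditer(entry):
        if m.group(1):                        # typographic indicator
            kind = "bold" if m.group(1) == "b" else "italic"
            segments.append((kind, m.group(2).strip()))
        elif m.group(3):                      # non-typographic indicator
            segments.append(("punct", m.group(3)))
        elif m.group(4).strip():              # plain text between indicators
            segments.append(("plain", m.group(4).strip()))
    return segments
```

Each pair records which indicator delimited the segment, which is the information a later classification step needs to decide the segment's function.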
Elements of a dictionary entry grammar
The dictionary entry grammar guides the analysis of the entries
It defines the set of well-formed entries
A dictionary entry grammar is a quadruple (CEI, CNI, R, WA)
CEI = terminal alphabet; CNI = non-terminal symbols; R = set of rules; WA = initial symbol
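The quadruple maps directly onto a small record type. The sketch below is an assumption about how one might hold such a grammar in memory, not the LexParse file format; the class `EntryGrammar`, its `check` method, and the terminal names in the sample are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EntryGrammar:
    """A dictionary entry grammar as the quadruple (CEI, CNI, R, WA)."""
    cei: frozenset[str]                       # terminal alphabet
    cni: frozenset[str]                       # non-terminal symbols
    rules: dict[str, list[tuple[str, ...]]]   # R: non-terminal -> right-hand sides
    wa: str                                   # initial symbol

    def check(self) -> list[str]:
        """Return a list of consistency problems; empty if none are found."""
        problems = []
        if self.wa not in self.cni:
            problems.append(f"initial symbol {self.wa!r} is not a non-terminal")
        if self.cei & self.cni:
            problems.append("terminal and non-terminal symbols overlap")
        known = self.cei | self.cni
        for lhs, alternatives in self.rules.items():
            if lhs not in self.cni:
                problems.append(f"rule for unknown non-terminal {lhs!r}")
            for rhs in alternatives:
                for symbol in rhs:
                    if symbol not in known:
                        problems.append(f"unknown symbol {symbol!r} in rule for {lhs!r}")
        return problems


# Sample grammar; the terminal names are illustrative assumptions.
sample = EntryGrammar(
    cei=frozenset({"headword", "pos", "equivalent"}),
    cni=frozenset({"WA", "FK", "SK"}),
    rules={"WA": [("FK", "SK")], "FK": [("headword", "pos")], "SK": [("equivalent",)]},
    wa="WA",
)
```

Checking the grammar before parsing separates errors in the grammar from errors in the dictionary data.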
Tasks of the LexParse parser
Split any (standard) entry of any dictionary into segments
Reconstruct the hierarchical structure of the entry
Resolve abbreviations and make all information explicit
Report on malformed entries
Represent the data in a well-defined format (e.g. SGML, database records)
Parser configuration and input
General: executable programme, configuration file
Specific: dictionary entry grammar, dictionary data
Architecture of LexParse
Figure 2: The LexParse parser, developed by Storrer and Hauser
Directives for preprocessing
Directives prepare the input file or typesetting tape for the parser
Delete superfluous lines and patterns
Convert patterns into XCodes
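The deletion directives can be sketched with regular expressions. The actual directive syntax is not shown in these slides, so everything here is assumed: the two pattern lists, the comment-line marker `%%`, and the discretionary-hyphen code `\-` are invented examples of what might count as superfluous.

```python
import re

# Hypothetical directives; the patterns are illustrative, not LexParse syntax.
DELETE_LINES = [re.compile(r"^\s*$"), re.compile(r"^%%")]   # blank and comment lines
DELETE_PATTERNS = [re.compile(r"\\-")]                      # discretionary hyphens


def preprocess(lines: list[str]) -> list[str]:
    """Drop superfluous lines, then strip superfluous patterns from the rest."""
    kept = []
    for line in lines:
        if any(p.search(line) for p in DELETE_LINES):
            continue
        for p in DELETE_PATTERNS:
            line = p.sub("", line)
        kept.append(line)
    return kept
```

Keeping the directives as data rather than code means a new dictionary needs only a new pattern list, which matches the split between general and dictionary-specific parser input.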
XCodes
XCodes reflect structure indicators
Typeface information, brackets and some special characters should be converted into XCodes
e.g.: Cat → XBRPO, *, XBRPC (a category is expanded to a string enclosed by brackets)
Ambiguous cases are resolved by treating the majority of cases correctly or by defining sub-patterns (e.g. for the semicolon)
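A small sketch of the bracket case. `XBRPO` and `XBRPC` are the codes named on the slide; mapping them to the round brackets `(` and `)` is an assumption, as are the function names `to_xcodes` and `match_cat`.

```python
from __future__ import annotations

import re

# XBRPO / XBRPC appear on the slide; mapping them to "(" and ")" is an assumption.
XCODES = {"(": "XBRPO", ")": "XBRPC"}


def to_xcodes(text: str) -> list[str]:
    """Tokenise text, replacing each bracket with its XCode."""
    tokens = re.findall(r"[()]|[^()\s]+", text)
    return [XCODES.get(tok, tok) for tok in tokens]


def match_cat(tokens: list[str], start: int = 0) -> tuple[list[str], int] | None:
    """Match the pattern Cat -> XBRPO, *, XBRPC at position `start`.

    Returns the enclosed tokens and the index after XBRPC, or None if no match.
    """
    if start >= len(tokens) or tokens[start] != "XBRPO":
        return None
    try:
        end = tokens.index("XBRPC", start + 1)
    except ValueError:
        return None
    return tokens[start + 1:end], end + 1


tokens = to_xcodes("black (colour) schwarz")
```

Once brackets are XCodes, the grammar can refer to them by name like any other terminal symbol, and the wildcard `*` is simply whatever lies between the two codes.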
Grammar
The grammar is a set of rewrite rules
Non-terminal symbols are expanded to sequences of terminal and non-terminal symbols
e.g.: WA → FK, SK
LexParse style: WA → XFLBE, FK, SK, XFLEN
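How such rewrite rules guide the analysis can be sketched with a tiny top-down matcher. This is not the LexParse algorithm: it handles one right-hand side per non-terminal and no optional items. The rule for `WA` follows the slide; the expansions of `FK` and `SK`, the token types, and the reading of `XFLBE`/`XFLEN` as entry-boundary tokens are assumptions.

```python
# The WA rule follows the slide; the expansions of FK and SK are assumptions.
RULES = {
    "WA": ["XFLBE", "FK", "SK", "XFLEN"],
    "FK": ["headword", "pos"],
    "SK": ["equivalent"],
}


def parse(symbol, tokens, pos=0):
    """Expand `symbol` against tokens[pos:]; return (tree, next_pos) or raise ValueError.

    Tokens are (type, text) pairs. A tree is (symbol, children) for a
    non-terminal and (symbol, text) for a terminal.
    """
    if symbol in RULES:                       # non-terminal: expand its right-hand side
        children = []
        for part in RULES[symbol]:
            child, pos = parse(part, tokens, pos)
            children.append(child)
        return (symbol, children), pos
    if pos < len(tokens) and tokens[pos][0] == symbol:   # terminal: must match token type
        return (symbol, tokens[pos][1]), pos + 1
    found = tokens[pos][0] if pos < len(tokens) else "end of entry"
    raise ValueError(f"expected {symbol} at token {pos}, found {found}")


# Illustrative token stream for one entry.
tokens = [
    ("XFLBE", ""),
    ("headword", "black"),
    ("pos", "adj"),
    ("equivalent", "schwarz"),
    ("XFLEN", ""),
]
tree, end = parse("WA", tokens)
```

An entry that does not fit the rules raises `ValueError` with the position of the mismatch, which is one simple way to report malformed entries.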
Display options
Specify the format of the output
Options are: SGML, Tree, Map
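Two of the output formats can be sketched from one parse tree. The tree shape, nested `(label, content)` tuples, and the element names are assumptions; the Map option is not described in the slides and is left out. The SGML renderer does not escape markup characters in the content.

```python
def to_sgml(node) -> str:
    """Render a (label, content) tree as SGML-style markup."""
    label, content = node
    if isinstance(content, str):
        return f"<{label}>{content}</{label}>"
    return f"<{label}>" + "".join(to_sgml(c) for c in content) + f"</{label}>"


def to_tree(node, depth: int = 0) -> str:
    """Render the same tree as an indented outline."""
    label, content = node
    pad = "  " * depth
    if isinstance(content, str):
        return f"{pad}{label}: {content}"
    lines = [f"{pad}{label}"]
    lines += [to_tree(c, depth + 1) for c in content]
    return "\n".join(lines)


# Illustrative parse result; labels and values are assumptions.
entry = ("WA", [
    ("FK", [("headword", "black"), ("pos", "adj")]),
    ("SK", [("equivalent", "schwarz")]),
])
```

Because both renderers walk the same structure, choosing a display option changes only the presentation, not the analysis.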
Directives for postprocessing
Directives clean the output
Delete superfluous lines and patterns
Convert patterns (e.g. German umlauts)
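The umlaut conversion can be sketched as a table-driven substitution. The source encoding is an assumption: a double quote before a vowel or `s` is taken to mark an umlaut or ß. The slides do not say how the original data encodes these characters.

```python
import re

# Hypothetical encoding: a double quote before a vowel or s marks an umlaut or ß.
UMLAUTS = {'"a': "ä", '"o': "ö", '"u': "ü", '"A': "Ä", '"O': "Ö", '"U': "Ü", '"s': "ß"}
PATTERN = re.compile(r'"[aouAOUs]')


def postprocess(text: str) -> str:
    """Replace each encoded umlaut with the corresponding character."""
    return PATTERN.sub(lambda m: UMLAUTS[m.group(0)], text)
```

Doing this after parsing rather than before keeps the quote character available as a plain symbol while the entry grammar is applied.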
Conclusion
LexParse is a general parser for (standard) dictionary entries
LexParse deviates in some respects from a general language parser, since the language of dictionary entries is special
LexParse prepares the data for subsequent formal processing (e.g. in a lexical database)
LexParse provides error reports and is therefore useful for consistency checking