View
446
Download
0
Tags:
Embed Size (px)
Citation preview
A Pattern-Based Approach to Hyponymy Relation Acquisition for the Agricultural Thesaurus *Makoto Nakamura Ryusei Kobayashi Yasuhiro Ogawa Katsuhiko Toyama (Nagoya University, Japan)
First of All... • Japan Legal Information Institute, Graduate School of Law, Nagoya University, Japan • established in order to provide Japan’s legal information to the world
• Tasks • Base for issuing English translations of Japanese legal information
• Provision of Japanese legal information to overseas
• Development of software for legislative support • etc.
• Natural language processing for legal texts
Outline 1. Introduction 1. Example of a hyponymy relation 2. AGROVOC 3. Purpose
2. Previous Works on Legal Text Processing 3. Acquisition of Legal Terms from Legal Texts 4. Experiments 5. Conclusion
Introduction The goal of this study: • to construct a legal ontology based on legal terms Legal terms are: • special, idiomatic expressions that often describe legal matters in legal documents
• defined by law prior to use
Example of a Hyponymy Relation in a Law (1/2)
Input text: 漁業法 第六条3 「定置漁業」とは、漁具を定置して営む漁業であつて次に掲げるものをいう。 (以下略) Fishery Act Article 6-3 The “fixed gear fishery” refers to a fishery operated with fixed gear, which falls under any of the following items. *snip*
Example of a Hyponymy Relation in a Law (2/2)
Input text: 漁業法 第六条3 「定置漁業」とは、漁具を定置して営む漁業であつて次に掲げるものをいう。 (以下略) Fishery Act Article 6-3 The “fixed gear fishery” refers to a fishery operated with fixed gear, which falls under any of the following items. *snip*
Hypernym: 漁業 / fishery Hyponym: 定置漁業 / fixed gear fishery
Hyponymy Relation
AGROVOC (Niu et al., 2012) • the world’s most comprehensive multilingual agricultural vocabulary
• contains more than 40,000 concepts in 21 languages • covers topics on food, nutrition, agriculture, fisheries, forestry, environment, and other related domains
• expressed in a Simple Knowledge Organization System (SKOS) and published as Linked Data
• All the terms or concepts have been added to the thesaurus by the domain experts in different languages.
• This laborious human work is very time consuming and expensive.
AGROVOC
Purpose • to acquire hyponymy relations from the legal corpus
• to increment the vocabulary of AGROVOC
Assumption Legal terms are qualified for AGROVOC as long as they are related to the agricultural domain.
Example of Word Tree in AGROVOC Labels Status Scope Created Last modified fisheries (EN) Descriptor (20) n/a 1981-01-09 1981-01-09 00:00:00
fisheries
capture fisheries
Commercial fisheries economic activities
fishery economics
Fishery oceanography
fishing methods
related term used for
narrower term
broader term activities
Fishing industry
Inland fisheries Marine fisheries
AGROVOC
How to Add Legal Terms to AGROVOC Labels Status Scope Created Last modified fisheries (EN) Descriptor (20) n/a 1981-01-09 1981-01-09 00:00:00
fisheries
capture fisheries
Commercial fisheries economic activities
fishery economics
Fishery oceanography
fishing methods
related term used for
narrower term
broader term activities
Fishing industry
Inland fisheries Marine fisheries
fixed gear fisheries
fisheries
AGROVOC
Outline 1. Introduction 2. Previous Works on Legal Text Processing 3. Acquisition of Legal Terms from Legal Texts 4. Experiments 5. Conclusion
Previous Works on Legal Text Processing • Legal text processing using surface pattern recognition • Knowledge acquisition from itemized expressions (Kimura et al., 2008)
• Detection of legal definitions (Höfler et al., 2012) Surface pattern recognition is sufficient for boilerplate (fixed) expressions
• Hyponymy relation acquisition • The expressions “y is a (kind of) x,” “such x as y” (Miller et al., 1990, Hearst., 1992)
• This approach is applicable to Japanese (Ando et al., 2004) Legal ontologies could automatically be constructed from legal texts containing boilerplate expressions.
Outline 1. Introduction 2. Previous Works on Legal Text Processing 3. Acquisition of Legal Terms from Legal Texts 1. Extracting terms and their explanations in Japanese legal texts
2. Text processing for hyponymy relations 4. Experiments 5. Conclusion
Legal Corpus • A set of statutory sentences from laws and regulations
• A set of 109,380 Japanese legal sentences in 241 laws and regulations.
• A wide variety of laws and regulations Bankruptcy Act / Measurement Act / Act on Promotion of Global Warming Countermeasures, etc.
Step-1: Example of Surface Pattern Rules
Input text: ガス事業法 第二条 この法律において「一般ガス事業」とは、一般の需要に応じ導管によりガスを供給する事業をいう。 第二条10 この法律において「ガス事業」とは、一般ガス事業、簡易ガス事業、ガス導管事業及び大口ガス事業をいう。 Gas Business Act Article 2-1 The term “General Gas Utility Business” as used in this Act shall mean the business of supplying gas via pipelines to meet general demand. Article 2-10 The term “Gas Business” as used in this Act shall mean a General Gas Utility Business, Community Gas Utility Business, Gas Pipeline Service Business or Large-Volume Gas Business.
Pattern of Definitions /「(.+)」とは、(.+)(を、|をいい、|といい、|という。|とする。)/ /”(.+)” (as used in this Act)? (shall mean|means) (.+)./
Step-1: Example of Surface Pattern Rules
Input text: ガス事業法 第二条 この法律において「一般ガス事業」とは、一般の需要に応じ導管によりガスを供給する事業をいう。 第二条10 この法律において「ガス事業」とは、一般ガス事業、簡易ガス事業、ガス導管事業及び大口ガス事業をいう。 Gas Business Act Article 2-1 The term “General Gas Utility Business” as used in this Act shall mean the business of supplying gas via pipelines to meet general demand. Article 2-10 The term “Gas Business” as used in this Act shall mean a General Gas Utility Business, Community Gas Utility Business, Gas Pipeline Service Business or Large-Volume Gas Business.
Pattern of Definitions /「(.+)」とは、(.+)(を、|をいい、|といい、|という。|とする。)/ /”(.+)” (as used in this Act)? (shall mean|means) (.+)./
Step-1: Acquisition of Definitions and Explanations
We made 6 patterns for extracting definitions from legal corpus.
Output definition: 1. Term: 一般ガス事業
Explanation: 一般の需要に応じ導管によりガスを供給する事業 2. Term: ガス事業
Explanation: 一般ガス事業、簡易ガス事業、ガス導管事業及び大口ガス事業
1. Term: General Gas Utility Business Explanation: the business of supplying gas via pipelines to meet general demand
2. Term: Gas Business Explanation: a General Gas Utility Business, Community Gas Utility Business, Gas Pipeline Service Business or Large-Volume Gas Business
Step-2: Extraction of the Hypernym from a Dependency Tree (intensive)
事業 business
供給する supply
ガスを gas
より via
導管に pipelines
応じ meet
需要に demand
一般の general
Explanation of ‘General Gas Utility Business’ ( the business of supplying gas via pipelines to meet general demand )
• Intensive (hypernym), extensive (hyponym), and mixed patterns • Head word(s) becomes a hypernym of the term.
• CaboCha ‒ a Japanese dependency parser (Kudo et al., 2002) • Complement to the parser with special terms and syntactic rules peculiar to the legal domain (Ogawa et al., 2011)
Head word Hypernym
Step-2: Extraction of the Hyponyms by a cue phrase (extensive)
大口ガス事業 Large-Volume Gas Business
ガス導管事業 Gas Pipeline
Service Business
簡易ガス事業 Community Gas
Utility Business
、 ,
一般ガス事業 General Gas Utility Business
Explanation of ‘Gas Business’ ( General Gas Utility Business, Community Gas Utility Business, Gas Pipeline Service Business or Large-Volume Gas Business )
• classification by cue phrases (comma (,) and ‘or’) • Hyponymy relation - a tuple of two noun phrases and a conceptual relation
Hyponym 、 ,
及び or
Hyponym Hyponym Hyponym
Hypernym: ガス事業 / Gas Business Hyponym: 一般ガス事業 / General Gas Utility Business
Hyponymy Relation
Outline 1. Introduction 2. Previous Works on Legal Text Processing 3. Acquisition of Legal Terms from Legal Texts 4. Experiments 5. Conclusion
Experiments
• Purpose • Acquisition of hyponymy relations qualified for AGROVOC
• The legal corpus • 109,380 Japanese legal sentences
• Classification of hyponymy relations • Category (i): Neither the hypernym or the hyponym is registered in AGROVOC.
• Category (ii): Only the hyponym is not registered.
• Category (iii): Only the hypernym is not registered.
• Category (iv): Both are registered. AGROVOC
Hypernym Hyponym
new existing
(i) (ii)
(iii) (iv)
Experimental Result
Category of a hyponymy pair # of types Precision Category (i)-(iv) 1,027 †64.0% Category (ii) & (iii) 222 67.1%
Category (ii) 137 89.1% Category (iii) 75 21.3% Unknown 10 -
Category (iv) 25 88.0% Existing relations 9 88.9% New relations 16 87.5%
† is calculated from 100 samples chosen at random.
Experimental result in finding terms related to AGROVOC
AGROVOC
Hypernym Hyponym
new existing
(i) (ii)
(iii) (iv)
Example of Hyponymy Relations
Category Example 1 Example 2
(i) district court *maximum limit
bankruptcy court total allowable effort
(ii) business *injurious plant
General Gas Utility Business fungus
(iii) oocyte *measuring instrument
Unfertilized Egg equipment
(iv) greenhouse gases real property
Carbon dioxide land
- *common fishery -- fishery
Hypernym Hyponym
new existing
Experimental Result
Category of a hyponymy pair # of types Precision Category (i)-(iv) 1,027 †64.0% Category (ii) & (iii) 222 67.1%
Category (ii) 137 89.1% Category (iii) 75 21.3% Unknown 10 -
Category (iv) 25 88.0% Existing relations 9 88.9% New relations 16 87.5%
† is calculated from 100 samples chosen at random.
Experimental result in finding terms related to AGROVOC
AGROVOC
Hypernym Hyponym
new existing
(i) (ii)
(iii) (iv)
Example of Hyponymy Relations
Category Example 1 Example 2
(i) district court *maximum limit
bankruptcy court total allowable effort
(ii) business *injurious plant
General Gas Utility Business fungus
(iii) oocyte *measuring instrument
Unfertilized Egg equipment
(iv) greenhouse gases real property
Carbon dioxide land
- *common fishery -- fishery
Hypernym Hyponym
new existing
Experimental Result
Category of a hyponymy pair # of types Precision Category (i)-(iv) 1,027 †64.0% Category (ii) & (iii) 222 67.1%
Category (ii) 137 89.1% Category (iii) 75 21.3% Unknown 10 -
Category (iv) 25 88.0% Existing relations 9 88.9% New relations 16 87.5%
† is calculated from 100 samples chosen at random.
Experimental result in finding terms related to AGROVOC
AGROVOC
Hypernym Hyponym
new existing
(i) (ii)
(iii) (iv)
Example of Hyponymy Relations
Category Example 1 Example 2
(i) district court *maximum limit
bankruptcy court total allowable effort
(ii) business *injurious plant
General Gas Utility Business fungus
(iii) oocyte *measuring instrument
Unfertilized Egg equipment
(iv) greenhouse gases real property
Carbon dioxide land
- *common fishery -- fishery
Hypernym Hyponym
new existing
Experimental Result
Category of a hyponymy pair # of types Precision Category (i)-(iv) 1,027 †64.0% Category (ii) & (iii) 222 67.1%
Category (ii) 137 89.1% Category (iii) 75 21.3% Unknown 10 -
Category (iv) 25 88.0% Existing relations 9 88.9% New relations 16 87.5%
† is calculated from 100 samples chosen at random.
Experimental result in finding terms related to AGROVOC
AGROVOC
Hypernym Hyponym
new existing
(i) (ii)
(iii) (iv)
Example of Hyponymy Relations
Category Example 1 Example 2
(i) district court *maximum limit
bankruptcy court total allowable effort
(ii) business *injurious plant
General Gas Utility Business fungus
(iii) oocyte *measuring instrument
Unfertilized Egg equipment
(iv) greenhouse gases real property
Carbon dioxide land
- *common fishery -- fishery
Hypernym Hyponym
new existing
Conclusion • Since legal documents are likely to use fixed expressions, surface pattern rules work well for term extraction.
• We succeeded in finding 222 terms that seem qualified for AGROVOC with high precision.
• Some error-prone rules and a procedural mistake are detected.
• We plan to expand our method to multilingualism. • As long as boilerplate expressions are used often, our simple method is applicable to any language.
• The other method is to use bilingual lexicons as a dictionary (Jin et al., 2012)
Thank you
A Pattern-Based Approach to Hyponymy Relation Acquisition for the Agricultural Thesaurus
Makoto Nakamura ( [email protected] )