EU Vocabularies - Facilitating
the linking of legal data
John Dann Le Gouvernement du Grand-Duché de Luxembourg
Ministère d'État, Service central de législation
Anikó Gerencsér
Publications Office of the European Union Standardisation Unit, Metadata Sector
Law via the Internet Conference, Florence
11-12 October 2018
Publications Office of the European Union
Publishing EU law and publications in the 24 official EU languages
Multilingual legal data
Controlled vocabularies:
• EuroVoc multilingual thesaurus
• 75 Authority tables
– Language codes
– Corporate bodies (names of EU institutions)
– Countries
– Legal proceeding
– Treaty
• EU Vocabularies: merge of EuroVoc and Metadata Registry websites
https://publications.europa.eu/en/web/eu-vocabularies
Multilingual, multidisciplinary thesaurus
Legal domain
All official EU languages
Releases: 2/year
Hierarchical structure:
21 domains
127 microthesauri
7180 concepts
Controlled vocabularies
Official Journal of Luxembourg
Opportunities
• Modernisation of publishing the OJ
• Facilitate the accessibility and participation
• Further digitalisation of public services
• Promote Linked Open Data
• Access to public information - PSI Directive
• ...
• Provide controlled vocabularies
‒ Re-use of existing vocabularies
Legal Data
Legiwrite
Web-based XML editor
Electronic OJ with legal value
legilux.lu
User Tools
ELI Open Data
From Draft to Publication
Project “Casemates”
• Reduce costs
• Faster publications
• Efficiency
Treaties
Metadata Ontology RDF XML HTML etc
Directives
Vocabulary
EUR-Lex
Visualisation of the vocabulary
http://data.legilux.public.lu/vocabulaires/fr/
Re-use from OPUE
Wish to align with EuroVoc …
Objective of alignment?
Towards a thematic cross-national search on legislation
• Industry can search for “pesticide” across national legislations
• Enrich N-Lex with thematic cross-national search
‒ http://eur-lex.europa.eu/n-lex
Allow searching national legislation using EuroVoc, or EU legislation using a national vocabulary
• Better research and access to European and national legislation
Potentially enrich national concepts with EuroVoc translations, notes, synonyms, etc.
Align EuroVoc thesaurus with thematic vocabularies used by Member States to annotate their national legislation
Build semantic interoperability based on thematic classification:
• between national legislations and EU legislation
• but also from Member State legislation to Member State legislation
EuroVoc concept: “résidu de pesticide”@fr, “pesticide residue”@en
Luxembourg theme concept: “pesticide”@fr
Member State theme concept: “pesticide residue”@en
Alignments: alignment 1, alignment 2, and alignment 3 (inferred from alignment 1 and alignment 2)
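The inference of alignment 3 can be sketched as a join on the shared EuroVoc concept. This is a minimal illustration, assuming alignments 1 and 2 each link a national concept to a EuroVoc concept and that both links are treated as equivalences; the function name and the prefixed identifiers (`lux:`, `ms:`, `eurovoc:`) are hypothetical and do not come from the slides.

```python
def infer_alignments(alignment_1, alignment_2):
    """Pair concepts from two national vocabularies that point to the same pivot concept.

    Each argument maps a national concept to the pivot (EuroVoc) concept it is aligned with.
    """
    # Index the second alignment by pivot concept for quick lookup.
    by_pivot = {}
    for national, pivot in alignment_2.items():
        by_pivot.setdefault(pivot, []).append(national)

    # Every pair of national concepts sharing a pivot yields an inferred link.
    inferred = []
    for national, pivot in alignment_1.items():
        for other in by_pivot.get(pivot, []):
            inferred.append((national, other, pivot))
    return inferred


# Hypothetical identifiers, for illustration only.
alignment_1 = {"lux:pesticide": "eurovoc:pesticide-residue"}
alignment_2 = {
    "ms:pesticide-residue": "eurovoc:pesticide-residue",
    "ms:noise": "eurovoc:noise",
}
alignment_3 = infer_alignments(alignment_1, alignment_2)
```

Such an inference is only as reliable as the two alignments it combines; if either link is a looser relation (related, broad, narrow), the inferred link would likely need human validation.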
Methodology
2 methods proposed for building alignments
Method 1: Lexical alignment between concepts
• Tools used: SILK for generating links between datasets and VocBench for uploading and managing the alignments
Method 2: Analyse the transposition of EU directives in Luxembourg legislation
• Compare the EuroVoc concepts used to index an EU directive with the Legilux concepts used to index the corresponding transposition
• Combine this transposition analysis with a lexical analysis
METHOD 1
Alignment between EuroVoc and LegiLux
Two methods to align
Method 1: lexical alignment
SILK – linking exercise: Legal subject theme / EuroVoc
Purpose: establish mappings between the labels of two datasets
Find pairs of exact or close matches between the lexical forms
of the concepts in two vocabularies
http://silkframework.org/
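The idea of finding exact or close matches between lexical forms can be illustrated with a short standard-library sketch. It does not reproduce SILK or its configuration; the normalisation steps, the similarity measure (`difflib.SequenceMatcher`), the 0.85 threshold and the sample labels are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
import unicodedata


def normalise(label):
    """Case-fold a label and strip accents so that 'Résidu' equals 'residu'."""
    decomposed = unicodedata.normalize("NFKD", label.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).strip()


def lexical_matches(source_labels, target_labels, close_threshold=0.85):
    """Return (source, target, kind, similarity) tuples for exact or close label pairs."""
    matches = []
    for src in source_labels:
        for tgt in target_labels:
            a, b = normalise(src), normalise(tgt)
            if a == b:
                matches.append((src, tgt, "exact", 1.0))
            else:
                ratio = SequenceMatcher(None, a, b).ratio()
                if ratio >= close_threshold:
                    matches.append((src, tgt, "close", round(ratio, 3)))
    return matches


# Illustrative labels only, not taken from the actual datasets.
legilux = ["pesticide", "bruit"]
eurovoc = ["Pesticide", "résidu de pesticide", "bruits"]
result = lexical_matches(legilux, eurovoc)
```

With these sample labels, “pesticide” matches “Pesticide” exactly after normalisation and “bruit” is a close match for “bruits”, while “résidu de pesticide” falls below the threshold. A purely lexical comparison misses that last pair, which is the kind of link the transposition-based method is meant to recover.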
Statistics - EuroVoc
LegiLux                 EuroVoc    Exact matches
Treaty subject theme    EuroVoc    31%
International actor     EuroVoc    23%
Legal subject theme     EuroVoc    22%
Treaty type             EuroVoc    16%
Management of vocabularies: VocBench
Open source, collaborative tool for managing multilingual
controlled vocabularies
Semantic technologies
Funded by ISA2 programme:
https://ec.europa.eu/isa2/solutions/vocbench3_en
Supports import/export in various formats (OWL, SKOS, RDF, Excel)
Allows uploading and managing alignments
Model: INRIA's Alignment API (same as OnaGUI)
Validate results
Re-export results
Transform results into OWL/SKOS mapping triples and load them into the project's dataset
http://vocbench.uniroma2.it/
https://joinup.ec.europa.eu/solution/vocbench3
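The step of transforming validated results into SKOS mapping triples can be sketched as follows. This is not VocBench's export code: the relation keywords and the `example.org` URIs are hypothetical, and only the SKOS mapping property names (`exactMatch`, `closeMatch`, `broadMatch`, `narrowMatch`, `relatedMatch`) come from the SKOS vocabulary itself.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical relation keywords mapped to standard SKOS mapping properties.
RELATION_TO_SKOS = {
    "exact": "exactMatch",
    "close": "closeMatch",
    "broad": "broadMatch",
    "narrow": "narrowMatch",
    "related": "relatedMatch",
}


def to_ntriples(alignments):
    """Serialise (source_uri, relation, target_uri) tuples as N-Triples lines."""
    lines = []
    for source_uri, relation, target_uri in alignments:
        prop = RELATION_TO_SKOS[relation]
        lines.append(f"<{source_uri}> <{SKOS}{prop}> <{target_uri}> .")
    return "\n".join(lines)


# Placeholder URIs, for illustration only.
alignments = [
    ("http://example.org/legilux/pesticide", "exact", "http://example.org/eurovoc/pesticide"),
    ("http://example.org/legilux/pesticide", "related", "http://example.org/eurovoc/pesticide-residue"),
]
output = to_ntriples(alignments)
```

The resulting text can be loaded by any RDF store that accepts N-Triples; in practice an RDF library would handle escaping and validation of the URIs.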
Method 2 Alignment based on EU Directive Transposition Analysis
Reuse existing classification work
EU texts are classified with EuroVoc
Luxembourg texts transposing an EU text are classified with the Luxembourg vocabulary
Most of the time, “close” concepts are used on both sides
Reuse this proximity of concepts to build an alignment based on usage
Approach: use transpositions
Commission Directive 2000/48/EC of 25 July 2000, subjects: produit pharmaceutique; protection du consommateur; produit phytosanitaire; résidu de pesticide; autorisation de vente
Grand-Ducal Regulation of 8 April 2000 (transposes the directive), subjects: denrée alimentaire et produit usuel; pesticide
Resulting candidate alignment: EuroVoc “résidu de pesticide” / Lux theme “pesticide”
Reuse existing classification work
Strengths of the method
• Only concepts actually used to classify the law are aligned
• The alignment follows the usage of legal professionals
• The alignment takes into account the difference in granularity between EuroVoc and the national vocabulary
Limits of the method
• The alignment is focused on the vocabulary used for legal domains covered by the EU. It cannot be used for the civil domain or any legal domain not covered by EU legislation
Focus alignment on concepts used to classify EU legislation and national transposition
Transposition analysis
EuroVoc (7159 concepts)
• 1956 concepts (27%) are used to index directives
• 3882 concepts are used to index directives or regulations
LegiLux thematic vocabulary (1593 concepts)
• 347 concepts (22%) are used to index legislation that transposes a directive
• 876 concepts are used to index legislation that does not transpose a directive
• Total: 1223 of the 1593 concepts are used to index legislation in Luxembourg (370 concepts are not used)
Transposition analysis example
We count the number of times a national concept used to index a transposition co-occurs with a EuroVoc concept used to index the transposed directive
In this example, alignments with a high score are more meaningful and alignments with a low score are less meaningful.
Many potential «relatedMatch» links
Score = number of co-occurrences / total number of transpositions indexed (rounded to three decimals)

National concept  EuroVoc concept                 Co-occurrences  Total transpositions  Score
bruit             protection contre le bruit      7               7                     1
bruit             pollution acoustique            5               7                     0.714
bruit             bruit                           4               7                     0.571
bruit             programme d'action              3               7                     0.429
bruit             méthode d'évaluation            3               7                     0.429
bruit             diffusion de l'information      3               7                     0.429
bruit             accès à l'information           2               7                     0.286
bruit             niveau sonore                   2               7                     0.286
bruit             rapprochement des législations  2               7                     0.286
bruit             appareil électrodomestique      1               7                     0.143
bruit             machine électrique              1               7                     0.143
bruit             matériel de levage              1               7                     0.143
bruit             norme européenne                1               7                     0.143
bruit             harmonisation des normes        1               7                     0.143
bruit             matériel de construction        1               7                     0.143
bruit             aéroport                        1               7                     0.143
bruit             norme environnementale          1               7                     0.143
bruit             soins de santé                  1               7                     0.143
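The score in the table can be computed with a simple counting pass. The sketch below assumes each transposition is available as a pair (national concepts indexing the transposing text, EuroVoc concepts indexing the transposed directive); this input shape, the function name and the toy data are assumptions, not the project's actual data model or figures.

```python
from collections import Counter, defaultdict


def cooccurrence_scores(transpositions):
    """Score each (national, EuroVoc) pair as co-occurrences / transpositions indexed."""
    totals = Counter()            # transpositions indexed with each national concept
    pairs = defaultdict(Counter)  # co-occurrence counts per national concept

    for national_concepts, eurovoc_concepts in transpositions:
        for national in set(national_concepts):
            totals[national] += 1
            for eurovoc in set(eurovoc_concepts):
                pairs[national][eurovoc] += 1

    return {
        (national, eurovoc): count / totals[national]
        for national, counter in pairs.items()
        for eurovoc, count in counter.items()
    }


# Toy data for illustration; these are not the real Legilux figures.
transpositions = [
    (["bruit"], ["protection contre le bruit", "pollution acoustique"]),
    (["bruit"], ["protection contre le bruit", "niveau sonore"]),
    (["bruit", "aéroport"], ["protection contre le bruit", "pollution acoustique"]),
    (["bruit"], ["protection contre le bruit"]),
]
scores = cooccurrence_scores(transpositions)
```

In this toy input, “bruit” indexes four transpositions and always co-occurs with “protection contre le bruit”, giving a score of 1.0, while “niveau sonore” appears once and scores 0.25. A cut-off on the score would be one way to separate candidate exact matches from weaker, related links.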
Transposition analysis : tool
Use of OnaGUI (Ontology Alignment Graphical User Interface): https://github.com/lmazuel/onagui
1) Enrich the statistical alignment with a linguistic alignment
2) Perform human refinement and validation of the calculated alignments
Validate alignments expressed in the INRIA EDOAL format (same as VocBench)
Validation in OnaGUI
Exact match • Pesticide / pesticide
Specific match • Polluant / polluant atmosphérique
Generic match • Polluant / substance dangereuse
Related match • Polluant / contrôle de la pollution
« no match » • Personnel / …
Conclusion
An alignment based on transpositions, consolidated with lexical proximity
189 exact matches, 586 other links (related, narrow, broad or close relationships)
76% of concepts considered are aligned, 54% with exact match
Produces a rich alignment: not only exact matches, but also many related / broad / narrow matches
Alignment made possible by:
Vocabularies available as structured data (SKOS)
SPARQL access to legislation indexing
Collaboration
Potential future steps:
Align with other controlled vocabularies as well
Test alignments on a document retrieval use case
Thank you!
Contacts :
John Dann [email protected]
Publications Office:
• Denis Dechandon, Head of Sector
• Anikó Gerencsér [email protected]