SVV.lu: software verification & validation
Legal Markup Generation in the Large: An Experience Report
Nicolas Sannier1, Morayo Adedjouma1, Mehrdad Sabetzadeh1, Lionel Briand1, John Dann2, Marc Hisette2, Pascal Thill2
1 – SnT / University of Luxembourg
2 – Service Central de Législation (SCL)
Research Context: SCL
• SCL: Service central de législation
• Publication of all the Luxembourgish legislation through the Legilux portal (http://legilux.public.lu)
• ~91000 legislative documents
• Since January 1st, 2017: more than 32 million accesses to Legilux and more than 34 million searches
• Legal professionals, companies, citizens
Research Context: SCL’s Challenge
• Until 2016, the legal texts in Legilux were PDF documents
• Transitioning from PDF documents to digital resources
• Semantic web technologies and legal metadata about the legal texts
• Challenge:
• How to generate all the metadata for the complete legislative framework?
Legal Metadata
• Legal metadata is machine-readable information about a legal text
• High-level information about the text
• Coordination information (authors, publication dates, amendments, jurisdiction, period of effect)
• Structural information: books, articles, clauses, citations… [Adedjouma, Maxwell, Sannier, Zeni]
• Semantic information: rights, permissions, obligations… [Breaux, Ghanavati, Ingolfo, Maxwell, Zeni]
• Essential for (automated) legal (compliance) activities
Legal Metadata
Legal Metadata: Current Practices
• Shallow structure
• Cross reference not handled systematically
• Extremely tedious
Legal Metadata Generation: Why is it so Difficult?
• Variations in the structure of the texts
  • Short texts and long texts do not have the same structure
  • Complex nesting in subdivisions
• Variations in drafting practices over time and jurisdictions
  • Variations in element labeling
  • Amendments
• Human errors or (intended) inconsistencies
  • Formatting decisions, misspelling of labels, under-specified structure or cross references
Legal Metadata Generation
• SCL already has in place:
  • Facilities for manually editing XML documents
  • In-house developments for short text conversion with shallow information
  • No scalable solutions for large and/or complex legal texts
  • No scalable solutions to treat the whole corpus
• Our challenge:
  • Develop a solution for markup (metadata) generation in the large
  • Focus on structural markup
Automated Legal Metadata Generation
• Builds on our previous work on cross-reference detection and resolution [Adedjouma, Sannier], extended to structural markup generation
Pipeline (diagram):
1. Define conceptual model for text structure → text organization metamodel
2. Detect text organization: legal text (non-markup) → text with organization annotations
3. Resolve cross references → text with full structural annotations
4. Generate XML → legal text in markup format (XML)
Structural Metadata Extraction: Annotated Text
• Deep structure resolution
• Systematic cross reference detection and resolution
Section IV.- On civil servants and officers in charge of certain judicial police functions
Subsection 1. - The mayors
Art. 13-1. (L. 16 June 1989) The mayors are in charge of enforcing […].
Subsection 2. - Rural guards and rangers
Art. 14. (L. 16 June 1989) The rural guards are in charge of investigating […].
Art. 14-1. (1) They are in charge of looking after removed goods from the place of removal to the place where the goods are put under administrative custody.
(2) They cannot enter houses […], unless […]. …
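The organization-detection step can be illustrated on this excerpt with a few label patterns. The sketch below is only an assumption about how such rules might look: the regular expressions, the `annotate` helper and the tuple layout are hypothetical and modeled solely on the lines above, not on the tool's actual rule set. It also leaves a paragraph number that follows an article label on the same line, such as "(1)" after "Art. 14-1.", inside the article text.

```python
import re

# Hypothetical label patterns, modeled only on the excerpt above.
PATTERNS = [
    ("section", re.compile(r"^Section\s+([IVXLC]+)\s*\.\s*-\s*(.+)$")),
    ("subsection", re.compile(r"^Subsection\s+(\d+)\s*\.\s*-\s*(.+)$")),
    ("article", re.compile(r"^Art\.\s+(\d+(?:-\d+)?)\.\s*(.*)$")),
    ("paragraph", re.compile(r"^\((\d+)\)\s*(.*)$")),
]


def annotate(lines):
    """Tag each line with the first matching division kind, else 'text'."""
    annotated = []
    for line in lines:
        line = line.strip()
        for kind, pattern in PATTERNS:
            match = pattern.match(line)
            if match:
                annotated.append((kind, match.group(1), match.group(2)))
                break
        else:
            # No label recognized: keep the line as plain text.
            annotated.append(("text", None, line))
    return annotated


SAMPLE = [
    "Section IV.- On civil servants and officers in charge of certain judicial police functions",
    "Subsection 1. - The mayors",
    "Art. 13-1. (L. 16 June 1989) The mayors are in charge of enforcing [...].",
    "Art. 14-1. (1) They are in charge of looking after removed goods [...].",
    "(2) They cannot enter houses [...], unless [...].",
]

for kind, number, rest in annotate(SAMPLE):
    print(kind, number, rest[:40])
```

Every deviation in labeling (a different separator, a misspelled label) needs its own pattern in a scheme like this one.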
Structural Metadata Extraction: Generated XML
<section>
  <num>Section IV</num>
  <heading>.- On civil servants and officers in charge of certain judicial police functions</heading>
  <subsection>
    <num>Subsection 1</num>
    <heading>. - The mayors</heading>
    <article scl:uri="http://eli.legilux.public.lu/eli/etat/leg/code/instruction/art_13-1">
      <num>13-1</num>
      <alinea><content><p>(<ref href="http://eli.legilux.public.lu/eli/etat/leg/loi/1989/06/16/n1">L. 16 June 1989</ref>) The mayors are in charge of enforcing […].</p></content></alinea>
    </article>
  </subsection>
  <subsection>
    <num>Subsection 2</num>
    <heading>. - Rural guards and rangers</heading>
    <article scl:uri="http://eli.legilux.public.lu/eli/etat/leg/code/instruction/art_14">
      <num>14</num>
      <alinea><content><p>(<ref href="http://eli.legilux.public.lu/eli/etat/leg/loi/1989/06/16/n1">L. 16 June 1989</ref>) The rural guards are in charge of investigating […].</p></content></alinea>
    </article>
    <article scl:uri="http://eli.legilux.public.lu/eli/etat/leg/code/instruction/art_14-1">
      <num>14-1</num>
      <paragraph>
        <num>(1)</num>
        <alinea><content><p>They are in charge of looking after removed goods from the place of removal to the place where the goods are put under administrative custody.</p></content></alinea>
      </paragraph>
      <paragraph>
        <num>(2)</num>
        <alinea><content><p>They cannot enter houses […], unless [...].</p></content></alinea>
        [...]
      </paragraph>
      [...]
    </article>
    [...]
  </subsection>
  [...]
</section>
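The nesting shown in this XML can be produced from a tree of annotated divisions. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, not the tool's actual generator. The namespace URI bound to the `scl` prefix, the `build` helper and the tuple-based input tree are assumptions made for illustration; the element names and the article URI pattern are taken from the excerpt.

```python
import xml.etree.ElementTree as ET

SCL_NS = "http://example.org/scl"  # hypothetical namespace URI for the scl prefix
CODE_URI = "http://eli.legilux.public.lu/eli/etat/leg/code/instruction"  # from the excerpt
ET.register_namespace("scl", SCL_NS)


def build(node):
    """Turn a (kind, num, payload) tuple into an XML element, recursively."""
    kind, num, payload = node
    elem = ET.Element(kind)
    ET.SubElement(elem, "num").text = num
    if kind == "article":
        # Articles carry a URI and a list of alinea texts.
        elem.set(f"{{{SCL_NS}}}uri", f"{CODE_URI}/art_{num}")
        for text in payload:
            alinea = ET.SubElement(elem, "alinea")
            content = ET.SubElement(alinea, "content")
            ET.SubElement(content, "p").text = text
    else:
        # Higher-level divisions carry a heading and child divisions.
        heading, children = payload
        ET.SubElement(elem, "heading").text = heading
        for child in children:
            elem.append(build(child))
    return elem


tree = ("section", "Section IV", (".- On civil servants [...]", [
    ("subsection", "Subsection 1", (". - The mayors", [
        ("article", "13-1", ["The mayors are in charge of enforcing [...]."]),
    ])),
]))

xml_text = ET.tostring(build(tree), encoding="unicode")
print(xml_text)
```

The sketch omits paragraphs and cross-reference elements; both would be additional node kinds handled in the same recursive way.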
Evaluation
• Case study: enhancing 5 major codes selected by SCL with structural metadata
• How well does the tool perform in generating structural metadata?
• Lessons from the case study
• Tailoring the tool for the processing of the 5 legislative codes
• (Manual) verification of each generated document
• How to make the metadata generation process more effective?
Cost-effective Metadata Generation
• Keep in mind that these documents will have to be manually validated
• Do not over-specify (depth allowed by the XML schema, URI strategy)
• The more metadata available, the more metadata SCL has to review for validation
• One shot activity
• No one size fits all solutions
• Do not try to automatically resolve all the tricky situations and exceptions in the texts
Easing Metadata Generation
• Manual preprocessing (for high-level divisions) to ease the automated processing
• Deviations in labeling that require additional detection rules
• Provide artificial labels for unlabeled high-level divisions
• Try to detect (and address) inconsistencies during the pre-analysis of the document
• Favor manual post processing for fixing hard to resolve issues
• Cross references using rare patterns
• (Complex) (implicit) structure nesting in subdivisions
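A pre-analysis pass of the kind suggested above could flag labels that are close to, but not exactly, a known division label, so that they are fixed manually or covered by an extra detection rule. The sketch below is one possible approach using Python's standard `difflib`; it is an assumption for illustration, not the technique used in the case study. The `check_label` helper and the 0.8 similarity cutoff are hypothetical.

```python
import difflib

# Division labels named in this report.
KNOWN_LABELS = ["Part", "Book", "Title", "Chapter", "Section", "Subsection", "Article"]


def check_label(word):
    """Return (canonical_label, is_exact), or (None, False) if nothing is close."""
    if word in KNOWN_LABELS:
        return word, True
    # Hypothetical cutoff: tolerate small misspellings only.
    close = difflib.get_close_matches(word, KNOWN_LABELS, n=1, cutoff=0.8)
    return (close[0], False) if close else (None, False)


for candidate in ["Section", "Secton", "Chaptre", "Annex"]:
    label, exact = check_label(candidate)
    if label is None:
        print(f"{candidate}: no known label, leave for manual review")
    elif not exact:
        print(f"{candidate}: probable misspelling of {label}")
```

A flagged near-miss is only a hint: whether to correct the source text or add a rule remains a manual decision.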
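Application: Generated Markup for 5 Legislative Codes
High-level divisions: Part, Book, Title, Chapter, Section, Subsection. Sub-article divisions: Paragraph, Alinea, L/N/D. CRs: cross references. Effort is in FTE workdays.

Code | Part | Book | Title | Chapter | Section | Subsection | Article | Paragraph | Alinea | L/N/D | CRs | Effort
Civil Code | 3 | 3 | 36 | 131 | 154 | 43 | 2316 | 33 | 3474 | 361 | 997 | 24
Commercial Code | 0 | 4 | 21 | 14 | 12 | 0 | 261 | 14 | 489 | 85 | 232 | 12
Penal Code | 0 | 2 | 11 | 86 | 46 | 0 | 671 | 64 | 1094 | 471 | 912 | 7
Code of Criminal Procedure | 0 | 3 | 15 | 33 | 36 | 7 | 529 | 542 | 1315 | 325 | 1065 | 10
New Code for Civil Procedure | 2 | 11 | 90 | 19 | 42 | 30 | 1322 | 206 | 2316 | 342 | 821 | 7
Total | 5 | 23 | 173 | 283 | 290 | 80 | 5099 | 859 | 8688 | 1584 | 4027 | 60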
Application: Quality of the Generated Markup
Columns 2 to 5: Civil Code. Columns 6 to 9: New Code of Civil Procedure.

Element | FC | PC | M | Q(%) | FC | PC | M | Q(%)
Part | 0 | 0 | 3 | 0 | 2 | 0 | 0 | 100
Book | 2 | 1 | 0 | 66.6 | 11 | 0 | 0 | 100
Title | 36 | 0 | 0 | 100 | 90 | 0 | 0 | 100
Chapter | 129 | 0 | 2 | 98.5 | 19 | 0 | 0 | 100
Section | 154 | 0 | 0 | 100 | 42 | 0 | 0 | 100
Subsection | 39 | 4 | 0 | 90.7 | 30 | 0 | 0 | 100
Article | 2305 | 10 | 1 | 99.5 | 1321 | 1 | 0 | 99.9
Paragraph | 33 | 0 | 0 | 100 | 206 | 0 | 0 | 100
Alinea | 3358 | 53 | 63 | 96.6 | 2297 | 12 | 7 | 99.2
L/N/D | 318 | 12 | 31 | 88.1 | 307 | 3 | 32 | 89.8
Internal CR | 436 | 10 | 0 | 97.8 | 367 | 21 | 1 | 94.3
External CR | 245 | 297 | 9 | 44.5 | 74 | 355 | 3 | 17.1

FC: Fully correct markup
PC: Partially correct markup
M: Missing markup
Q: Quality of the generated markup
Q = FC / (FC+PC+M)
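As a worked check of this formula, the short sketch below recomputes Q for a few Civil Code rows from their FC, PC and M counts. The `quality` helper and the dictionary layout are illustrative names, not part of the tool, and the last printed digit may differ from the slide depending on rounding.

```python
def quality(fc, pc, m):
    """Q = FC / (FC + PC + M), expressed as a percentage."""
    total = fc + pc + m
    return 100.0 * fc / total if total else float("nan")


# (FC, PC, M) counts for the Civil Code, copied from the quality table.
civil_code = {
    "Article": (2305, 10, 1),
    "Alinea": (3358, 53, 63),
    "Internal CR": (436, 10, 0),
    "External CR": (245, 297, 9),
}

for element, counts in civil_code.items():
    print(f"{element:12s} Q = {quality(*counts):.1f}%")
```

The gap between internal and external cross references is visible directly in the counts: most external references are partially correct rather than missing.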
Overall Results
• >21,000 structural markup elements
• Including >5000 articles and >4000 cross references
• ~91% of generated markup is fully correct
• ~8% of markup needs tweaks (often minor)
• ~1% of markup needs to be manually inserted
• Manual work mainly related to verification and external cross references
(<ref href="…/eli/etat/leg/loi/1989/06/16/n1">L. 16 June 1989</ref>)
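To illustrate why external cross references need manual work, the sketch below derives only the date-based part of such a target from a citation like "L. 16 June 1989". It is an assumption for illustration: the `LAW_CITATION` pattern and the `eli_prefix` helper are hypothetical, and the base URL and path layout are inferred from the `href` values shown in the generated XML. The trailing identifier (`n1` in the example) does not appear in the citation text, so this sketch cannot produce it.

```python
import re

ELI_BASE = "http://eli.legilux.public.lu/eli/etat/leg"  # from the generated XML excerpt
MONTHS = {name: index for index, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Hypothetical pattern covering only citations shaped like "L. 16 June 1989".
LAW_CITATION = re.compile(r"\bL\.\s+(\d{1,2})\s+([A-Z][a-z]+)\s+(\d{4})\b")


def eli_prefix(text):
    """Return the date-based URI prefix for a law citation, or None."""
    match = LAW_CITATION.search(text)
    if not match or match.group(2) not in MONTHS:
        return None
    day = int(match.group(1))
    month = MONTHS[match.group(2)]
    year = int(match.group(3))
    # The final path segment (e.g. "n1") is not derivable from the citation.
    return f"{ELI_BASE}/loi/{year:04d}/{month:02d}/{day:02d}"


print(eli_prefix("(L. 16 June 1989) The mayors are in charge of enforcing [...]."))
```

Resolving the remaining segment needs information from outside the cited text, which is consistent with external cross references being the main source of partially correct markup.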
Conclusion
• Structural Metadata (Markup) generation in the large
• Apparently simple but hard to achieve in practice
• Case study conducted over 5 legislative codes
• ~91% of the generated markup is fully correct
• Perfect automation is impossible to achieve
• Cost-effective balance between automation and manual work
On-going Work
• Reengineering the tool into a more robust and configurable tool: ARMLET
• Going for a pilot deployment at SCL (and commercialization?)
• Research partnership with SCL on semantic metadata and compliance rules extraction
• Deontic modalities (rights, obligations, permissions), actors, beneficiaries, events, conditions and consequences
• Going for smart legal search and dynamic Q/A systems
• (Automated) legal compliance activities
References
• Morayo Adedjouma, Mehrdad Sabetzadeh, Lionel C. Briand: Automated detection and resolution of legal cross references: Approach and a study of Luxembourg's legislation. RE 2014: 63-72
• Travis D. Breaux, Annie I. Antón, Jon Doyle: Semantic parameterization: A process for modeling domain descriptions. ACM Trans. Softw. Eng. Methodol. 18(2): 5:1-5:27 (2008)
• Sepideh Ghanavati, Daniel Amyot, André Rifaut: Legal goal-oriented requirement language (legal GRL) for modeling regulations. MiSE 2014: 1-6
• Silvia Ingolfo, Ivan Jureta, Alberto Siena, Anna Perini, Angelo Susi: Nòmos 3: Legal Compliance of Roles and Requirements. ER 2014: 275-288
• Jeremy C. Maxwell, Annie I. Antón: The production rule framework: developing a canonical set of software requirements for compliance with law. IHI 2010: 629-636
• Jeremy C. Maxwell, Annie I. Antón, Peter P. Swire, Maria Riaz, Christopher M. McCraw: A legal cross-references taxonomy for reasoning about compliance requirements. Requir. Eng. 17(2): 99-115 (2012)
• Nicolas Sannier, Morayo Adedjouma, Mehrdad Sabetzadeh, Lionel C. Briand: An automated framework for detection and resolution of cross references in legal texts. Requir. Eng. 22(2): 215-237 (2017)
• Nicola Zeni, Nadzeya Kiyavitskaya, Luisa Mich, James R. Cordy, John Mylopoulos: GaiusT: supporting the extraction of rights and obligations for regulatory compliance. Requir. Eng. 20(1): 1-22 (2015)
• Nicola Zeni, E. A. Seid, Priscila Engiel, Silvia Ingolfo, John Mylopoulos: Building Large Models of Law with NómosT. ER 2016: 233-247