Legal Markup Generation in the Large: An Experience Report

Nicolas Sannier¹, Morayo Adedjouma¹, Mehrdad Sabetzadeh¹, Lionel Briand¹, John Dann², Marc Hisette², Pascal Thill²

¹ SnT / University of Luxembourg
² Service Central de Législation (SCL)

SVV.lu: software verification & validation


Research Context: SCL

• SCL: Service central de législation

• Publishes all Luxembourgish legislation through the Legilux portal (http://legilux.public.lu)

• ~91,000 legislative documents

• Since January 1, 2017: more than 32 million accesses to Legilux and more than 34 million searches

• Users: legal professionals, companies, citizens


Research Context: SCL’s Challenge

• Until 2016, the legal texts in Legilux were PDF documents

• Transitioning from PDF documents to digital resources

• Semantic web technologies and legal metadata about the legal texts

• Challenge:

• How to generate all the metadata for the complete legislative framework?


Legal Metadata

• Legal metadata is machine-readable information about a legal text

• High-level information about the text

• Coordination information (authors, publication dates, amendments, jurisdiction, period of effect)

• Structural information: books, articles, clauses, citations… [Adedjouma, Maxwell, Sannier, Zeni]

• Semantic information: rights, permissions, obligations… [Breaux, Ghanavati, Ingolfo, Maxwell, Zeni]

• Essential for (automated) legal (compliance) activities


Legal Metadata: Current Practices

• Shallow structure

• Cross references not handled systematically

• Extremely tedious


Legal Metadata Generation: Why Is It So Difficult?

• Variations in the structure of the texts
  • Short texts and long texts do not have the same structure
  • Complex nesting in subdivisions

• Variations in drafting practices over time and across jurisdictions
  • Variations in element labeling
  • Amendments

• Human errors or (intended) inconsistencies
  • Formatting decisions, misspelled labels, under-specified structure or cross references


Legal Metadata Generation

• SCL already has in place:
  • Facilities for manually editing XML documents
  • In-house developments for short text conversion with shallow information
  • No scalable solutions for large and/or complex legal texts
  • No scalable solutions to treat the whole corpus

• Our challenge:
  • Develop a solution for markup (metadata) generation in the large
  • Focus on structural markup


Automated Legal Metadata Generation

• Builds on our previous work on cross reference detection and resolution [Adedjouma, Sannier], extended to structural markup generation

Figure: the four-step approach

1. Define conceptual model for text structure → text organization metamodel

2. Detect text organization: legal text (non-markup) → text with organization annotations

3. Resolve cross references → text with full structural annotations

4. Generate XML → legal text in markup format (XML)
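
The text organization metamodel of step 1 could be approximated as an ordered list of division levels. The sketch below is only an illustration, not the authors' model: the level names come from the tables and XML shown in these slides, while `LEVELS`, `can_nest`, and the rule that a division may only contain deeper levels are assumptions.

```python
# Hypothetical ordering of division levels, from outermost to innermost.
# Names are taken from the slides; the strict ordering is an assumption.
LEVELS = [
    "part", "book", "title", "chapter", "section", "subsection",
    "article", "paragraph", "alinea",
]


def can_nest(parent: str, child: str) -> bool:
    """Return True if `child` sits at a deeper level than `parent`."""
    return LEVELS.index(child) > LEVELS.index(parent)


print(can_nest("section", "subsection"))  # a subsection inside a section
print(can_nest("article", "alinea"))      # levels may be skipped
print(can_nest("article", "chapter"))     # a chapter inside an article
```

Allowing levels to be skipped reflects the generated XML, where an article can hold an alinea directly, without an intermediate paragraph.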


Structural Metadata Extraction: Annotated Text

• Deep structure resolution

• Systematic cross reference detection and resolution

Section IV.- On civil servants and officers in charge of certain judicial police functions
Subsection 1. - The mayors
Art. 13-1. (L. 16 June 1989) The mayors are in charge of enforcing […].
Subsection 2. - Rural guards and rangers
Art. 14. (L. 16 June 1989) The rural guards are in charge of investigating […].
Art. 14-1. (1) They are in charge of looking after removed goods from the place of removal to the place where the goods are put under administrative custody.
(2) They cannot enter houses […], unless […]. …
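
Detecting the text organization in an excerpt like the one above can be sketched with a few label patterns. This is a minimal illustration under stated assumptions: `PATTERNS` and `classify` are hypothetical names, the rules cover only the label forms visible in this excerpt, and a real corpus would need many more rules for label variants and misspellings.

```python
import re

# Hypothetical detection rules for the labels seen in the excerpt.
PATTERNS = [
    ("section", re.compile(r"^Section\s+([IVXLC]+)\b")),
    ("subsection", re.compile(r"^Subsection\s+(\d+)\b")),
    ("article", re.compile(r"^Art\.\s+(\d+(?:-\d+)?)\.")),
    ("paragraph", re.compile(r"^\((\d+)\)")),
]


def classify(line: str):
    """Return (kind, number) for a labeled line, or None if no rule matches."""
    for kind, pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            return kind, match.group(1)
    return None


sample = [
    "Section IV.- On civil servants and officers in charge of certain judicial police functions",
    "Subsection 1. - The mayors",
    "Art. 13-1. (L. 16 June 1989) The mayors are in charge of enforcing [...].",
    "(2) They cannot enter houses [...], unless [...].",
]
for line in sample:
    print(classify(line))
```

The sketch classifies only the label that starts a line, so a paragraph label that follows an article label on the same line, as in "Art. 14-1. (1)", would need a second pass.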

Structural Metadata Extraction: Generated XML

<section>
  <num>Section IV</num>
  <heading>.- On civil servants and officers in charge of certain judicial police functions</heading>
  <subsection>
    <num>Subsection 1</num>
    <heading>. - The mayors</heading>
    <article scl:uri="http://eli.legilux.public.lu/eli/etat/leg/code/instruction/art_13-1">
      <num>13-1</num>
      <alinea>
        <content>
          <p>(<ref href="http://eli.legilux.public.lu/eli/etat/leg/loi/1989/06/16/n1">L. 16 June 1989</ref>) The mayors are in charge of enforcing […].</p>
        </content>
      </alinea>
    </article>
  </subsection>
  <subsection>
    <num>Subsection 2</num>
    <heading>. - Rural guards and rangers</heading>
    <article scl:uri="http://eli.legilux.public.lu/eli/etat/leg/code/instruction/art_14">
      <num>14</num>
      <alinea>
        <content>
          <p>(<ref href="http://eli.legilux.public.lu/eli/etat/leg/loi/1989/06/16/n1">L. 16 June 1989</ref>) The rural guards are in charge of investigating […].</p>
        </content>
      </alinea>
    </article>
    <article scl:uri="http://eli.legilux.public.lu/eli/etat/leg/code/instruction/art_14-1">
      <num>14-1</num>
      <paragraph>
        <num>(1)</num>
        <alinea>
          <content>
            <p>They are in charge of looking after removed goods from the place of removal to the place where the goods are put under administrative custody.</p>
          </content>
        </alinea>
      </paragraph>
      <paragraph>
        <num>(2)</num>
        <alinea>
          <content>
            <p>They cannot enter houses […], unless [...].</p>
          </content>
        </alinea>
        [...]
      </paragraph>
      [...]
    </article>
    [...]
  </subsection>
  [...]
</section>
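
To illustrate the XML generation step, the sketch below builds one `<article>` element with Python's standard `xml.etree.ElementTree`. It is not the authors' tool: `build_article` is a hypothetical helper, it handles plain text only (no inline `<ref>` elements), and it writes `scl:uri` as a literal attribute name without declaring the `scl` namespace, which is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def build_article(number: str, uri: str, blocks: Iterable[str]) -> ET.Element:
    """Build an <article> with one <alinea> per text block.

    Element names mirror the generated XML shown in the slides.
    """
    # Assumption: the prefixed attribute is emitted as-is, with no
    # namespace declaration for "scl".
    article = ET.Element("article", {"scl:uri": uri})
    ET.SubElement(article, "num").text = number
    for text in blocks:
        alinea = ET.SubElement(article, "alinea")
        content = ET.SubElement(alinea, "content")
        ET.SubElement(content, "p").text = text
    return article


article = build_article(
    "14",
    "http://eli.legilux.public.lu/eli/etat/leg/code/instruction/art_14",
    ["The rural guards are in charge of investigating [...]."],
)
print(ET.tostring(article, encoding="unicode"))
```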


Evaluation

• Case study: enhancing 5 major codes selected by SCL with structural metadata

• How well does the tool perform in generating structural metadata?

• Lessons from the case study

• Tailoring the tool for the processing of the 5 legislative codes

• (Manual) verification of each generated document

• How to make the metadata generation process more effective?


Cost-effective Metadata Generation

• Keep in mind that these documents will have to be manually validated

• Do not over-specify (depth allowed by the XML schema, URI strategy)

• The more metadata available, the more metadata SCL has to review for validation

• One-shot activity

• No one-size-fits-all solution

• Do not try to automatically resolve all the tricky situations and exceptions in the texts


Easing Metadata Generation

• Manual preprocessing (for high-level divisions) to ease the automated processing

• Deviations in labeling that require additional detection rules

• Provide artificial labels for unlabeled high-level divisions

• Try to detect (and address) inconsistencies during the pre-analysis of the document

• Favor manual post-processing for fixing hard-to-resolve issues

• Cross references using rare patterns

• (Complex) (implicit) structure nesting in subdivisions
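
One inconsistency that a pre-analysis might look for is a duplicated article label. The slides do not describe the actual checks, so the sketch below is only an assumption about what such a check could look like; `duplicate_labels` and the sample labels are hypothetical.

```python
from collections import Counter


def duplicate_labels(labels):
    """Return the labels that occur more than once, in first-seen order."""
    counts = Counter(labels)
    return [label for label in dict.fromkeys(labels) if counts[label] > 1]


# Hypothetical article labels extracted from a text before markup generation.
labels = ["13", "13-1", "14", "14-1", "14"]
print(duplicate_labels(labels))
```

Flagging such cases for a human, rather than trying to repair them automatically, fits the advice above to favor manual fixes for hard-to-resolve issues.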


Application: Generated Markup for 5 Legislative Codes

High-level divisions: Part, Book, Title, Chapter, Section, Subsection. Sub-article divisions: Paragraph, Alinea, L/N/D. CRs: cross references. Effort: FTE workdays.

Code                          Part  Book  Title  Chapter  Section  Subsection  Article  Paragraph  Alinea  L/N/D   CRs  Effort
Civil Code                       3     3     36      131      154          43     2316         33    3474    361   997      24
Commercial Code                  0     4     21       14       12           0      261         14     489     85   232      12
Penal Code                       0     2     11       86       46           0      671         64    1094    471   912       7
Code of Criminal Procedure       0     3     15       33       36           7      529        542    1315    325  1065      10
New Code of Civil Procedure      2    11     90       19       42          30     1322        206    2316    342   821       7
Total                            5    23    173      283      290          80     5099        859    8688   1584  4027      60
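
As a quick consistency check, the per-code counts can be summed column by column and compared with the Total row. The snippet below is a small sketch with the numbers copied from the table above; `COUNTS` and `column_totals` are names introduced here for illustration, and the effort column is left out.

```python
# Counts per code, copied from the table, in column order:
# Part, Book, Title, Chapter, Section, Subsection, Article,
# Paragraph, Alinea, L/N/D, CRs.
COUNTS = {
    "Civil Code": [3, 3, 36, 131, 154, 43, 2316, 33, 3474, 361, 997],
    "Commercial Code": [0, 4, 21, 14, 12, 0, 261, 14, 489, 85, 232],
    "Penal Code": [0, 2, 11, 86, 46, 0, 671, 64, 1094, 471, 912],
    "Code of Criminal Procedure": [0, 3, 15, 33, 36, 7, 529, 542, 1315, 325, 1065],
    "New Code of Civil Procedure": [2, 11, 90, 19, 42, 30, 1322, 206, 2316, 342, 821],
}

# Sum each column across the five codes.
column_totals = [sum(column) for column in zip(*COUNTS.values())]
print(column_totals)       # should reproduce the Total row
print(sum(column_totals))  # all markup elements across the five codes
```

The grand total comes to 21,111 elements, of which 5,099 are articles and 4,027 are cross references.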


Application: Quality of the Generated Markup

              Civil Code               New Code of Civil Procedure
Element         FC   PC   M  Q (%)       FC   PC   M  Q (%)
Part             0    0   3      0        2    0   0    100
Book             2    1   0   66.6       11    0   0    100
Title           36    0   0    100       90    0   0    100
Chapter        129    0   2   98.5       19    0   0    100
Section        154    0   0    100       42    0   0    100
Subsection      39    4   0   90.7       30    0   0    100
Article       2305   10   1   99.5     1321    1   0   99.9
Paragraph       33    0   0    100      206    0   0    100
Alinea        3358   53  63   96.6     2297   12   7   99.2
L/N/D          318   12  31   88.1      307    3  32   89.8
Internal CR    436   10   0   97.8      367   21   1   94.3
External CR    245  297   9   44.5       74  355   3   17.1

FC: fully correct markup
PC: partially correct markup
M: missing markup
Q: quality of the generated markup

Q = FC / (FC + PC + M)
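
The quality measure is straightforward to compute. The sketch below applies the formula to two rows copied from the table; the function name `quality`, the rounding to one decimal, and the error raised for an empty row are choices made here, not something the slides specify.

```python
def quality(fc: int, pc: int, m: int) -> float:
    """Q = FC / (FC + PC + M), as a percentage rounded to one decimal."""
    total = fc + pc + m
    if total == 0:
        raise ValueError("no markup elements to evaluate")
    return round(100 * fc / total, 1)


print(quality(2305, 10, 1))  # Civil Code, Article
print(quality(74, 355, 3))   # New Code of Civil Procedure, external CRs
```

The two calls give 99.5 and 17.1, the values reported for those rows. Most of the external cross references that are not fully correct are partially correct rather than missing.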


Overall Results

• >21,000 structural markup elements

• Including >5,000 articles and >4,000 cross references

• ~91% of generated markup is fully correct

• ~8% of markup needs tweaks (often minor)

• ~1% of markup needs to be manually inserted

• Manual work mainly related to verification and external cross references

(<ref href="…/eli/etat/leg/loi/1989/06/16/n1">L. 16 June 1989</ref>)
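
The external cross reference above pairs a short citation with an ELI-style URI. The sketch below shows how a citation of this form could be mapped to a URI following the pattern visible in the generated XML. It is a simplified assumption, not the authors' resolver: `resolve_law_reference` is hypothetical, it only recognizes the "L. day month year" form, and the trailing sequence segment (`n1`) cannot be derived from the citation text, so it is passed in as a parameter.

```python
import re

MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}

# Base taken from the URIs in the generated XML excerpt.
ELI_BASE = "http://eli.legilux.public.lu/eli/etat/leg"


def resolve_law_reference(text: str, sequence: str = "n1"):
    """Map a citation such as 'L. 16 June 1989' to an ELI-style URI.

    Returns None when no citation of that form is found. The `sequence`
    default is an assumption; the real value needs a lookup.
    """
    match = re.search(r"\bL\.\s+(\d{1,2})\s+([A-Z][a-z]+)\s+(\d{4})\b", text)
    if not match or match.group(2) not in MONTHS:
        return None
    day = int(match.group(1))
    month = MONTHS[match.group(2)]
    year = match.group(3)
    return f"{ELI_BASE}/loi/{year}/{month:02d}/{day:02d}/{sequence}"


print(resolve_law_reference("(L. 16 June 1989) The mayors are in charge of enforcing"))
```

Citations that do not fit this one pattern are exactly the rare patterns that the slides suggest leaving to manual post-processing.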


Conclusion

• Structural Metadata (Markup) generation in the large

• Apparently simple but hard to achieve in practice

• Case study conducted over 5 legislative codes

• ~91% of the generated markup is fully correct

• Perfect automation is impossible to achieve

• Cost-effective balance between automation and manual work


On-going Work

• Reengineering the tool into a more robust and configurable tool: ARMLET

• Going for a pilot deployment at SCL (and commercialization?)

• Research partnership with SCL on semantic metadata and compliance rules extraction

• Deontic modalities (rights, obligations, permissions), actors, beneficiaries, events, conditions and consequences

• Going for smart legal search and dynamic Q/A systems

• (Automated) legal compliance activities


References

• Morayo Adedjouma, Mehrdad Sabetzadeh, Lionel C. Briand: Automated detection and resolution of legal cross references: Approach and a study of Luxembourg's legislation. RE 2014: 63-72

• Travis D. Breaux, Annie I. Antón, Jon Doyle: Semantic parameterization: A process for modeling domain descriptions. ACM Trans. Softw. Eng. Methodol. 18(2): 5:1-5:27 (2008)

• Sepideh Ghanavati, Daniel Amyot, André Rifaut: Legal goal-oriented requirement language (legal GRL) for modeling regulations. MiSE 2014: 1-6

• Silvia Ingolfo, Ivan Jureta, Alberto Siena, Anna Perini, Angelo Susi: Nòmos 3: Legal Compliance of Roles and Requirements. ER 2014: 275-288

• Jeremy C. Maxwell, Annie I. Antón: The production rule framework: developing a canonical set of software requirements for compliance with law. IHI 2010: 629-636

• Jeremy C. Maxwell, Annie I. Antón, Peter P. Swire, Maria Riaz, Christopher M. McCraw: A legal cross-references taxonomy for reasoning about compliance requirements. Requir. Eng. 17(2): 99-115 (2012)

• Nicolas Sannier, Morayo Adedjouma, Mehrdad Sabetzadeh, Lionel C. Briand: An automated framework for detection and resolution of cross references in legal texts. Requir. Eng. 22(2): 215-237 (2017)

• Nicola Zeni, Nadzeya Kiyavitskaya, Luisa Mich, James R. Cordy, John Mylopoulos: GaiusT: supporting the extraction of rights and obligations for regulatory compliance. Requir. Eng. 20(1): 1-22 (2015)

• Nicola Zeni, E. A. Seid, Priscila Engiel, Silvia Ingolfo, John Mylopoulos: Building Large Models of Law with NómosT. ER 2016: 233-247
