Metadata in NIR Fabio Vitali University of Bologna Maria Guercio University of Urbino


Page 1: Metadata in NIR

Metadata in NIR

Fabio Vitali, University of Bologna

Maria Guercio, University of Urbino

Page 2: Metadata in NIR

Introduction

Metadata support has always been present in NIR.

Recently (June/July 2004) deep (and heated) discussions have taken place within the WG about identifying a full set of metadata information.

This is the current state of that discussion.

Page 3: Metadata in NIR

Some terminology

Automatic: any task that can be left entirely to the machine.
– All kinds of data format conversion
– E.g. XML -> HTML or NIR XML -> NIR RDF.

Semi-automatic: any task that can, with a certain degree of precision, be performed by the machine, but that still requires a human for final verification and approval.
– Identification of structures
– E.g. partitioning of documents, identification and interpretation of citations

Manual: any task that needs to be decided upon and performed by a thinking human, even though the machine can provide support to help him/her and ease the task itself.
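As an illustration of a fully automatic task, here is a minimal sketch of an XML -> HTML conversion. The element names come from the NIR fragments shown later in this presentation; the mapping to HTML tags is a hypothetical choice made for this example, not something defined by NIR.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from NIR-style element names to HTML tags.
# This table is an assumption of the sketch, not part of the NIR DTDs.
TAG_MAP = {"articolo": "div", "comma": "div", "num": "span", "corpo": "p"}

def to_html(element):
    """Recursively copy an XML element into an HTML element tree."""
    html = ET.Element(TAG_MAP.get(element.tag, "span"), {"class": element.tag})
    if "id" in element.attrib:
        # Keep the id so references into the text keep working after conversion.
        html.set("id", element.attrib["id"])
    html.text = element.text
    for child in element:
        converted = to_html(child)
        converted.tail = child.tail
        html.append(converted)
    return html

source = (
    '<articolo id="art1"><num>1.</num>'
    '<comma id="art1-com1"><num>1</num><corpo>blah blah</corpo></comma>'
    '</articolo>'
)
html = to_html(ET.fromstring(source))
print(ET.tostring(html, encoding="unicode"))
```

No human decision is involved at any step, which is what makes the task automatic in the sense above.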

Page 4: Metadata in NIR

Some terminology (2)

Objective
– An objective datum is something for which no reasonable discussion can exist as to its value.
– E.g. the title of article 15, the publication date

Subjective
– A subjective datum is something that requires an active interpretation from a human that may be wrong, or for which different opinions exist
– E.g. resolution of implicit citations, classification of provisions

Explicit
– A datum that is actually written somewhere in the text

Implicit
– A datum that needs to be deduced from external sources, or through the application of specific reasoning

Page 5: Metadata in NIR

Some terminology (3)

Low competence
– The kind of competence one may expect from a non-specialized employee, such as a secretary, armed with just common sense and some topical experience
– E.g.: where does article 1 end and article 2 start

High competence
– The kind of competence one may expect from overspecialized jurists who reach their results after careful and painful reasoning
– E.g.: dates and times in norms.

Editorial intervention
– By the publisher of a document

Authorial intervention
– By the author of a document

Page 6: Metadata in NIR

Design issues for NIR (1)

Data structure rather than application
– Norme In Rete knows about applications, but is not dependent on any use of the data and is not targeted towards any specific application (except presentation)
– The same text should be marked up in the same way by different editors (at least in the most fundamental structures)

Page 7: Metadata in NIR

Design issues for NIR (2)

Rigorous distinction of roles
– The author of a norm is the legislator, the provider of the actual XML document is the editor.
– The legislator is GOD (his decisions cannot be discussed), but He only speaks through the text of the norms.
– The editor can add a large quantity of information, but it has no official status
– The very act of adding tags is an editorial operation, subjective and open to discussion.
– In fact, any addition coming from editors (structure identification, notes, comments, interpretation) happens outside of the document content (in markup structures or in special metadata sections)

Page 8: Metadata in NIR

Design issues for NIR (3)

Complexity of the access to texts
– Many editors, many publishing systems, many copies in different stages of evolution
– There is no authoritative source of XML documents (only of printed documents).
– One web site could forget to update a law to the latest version
– The use of URNs makes it possible to refer to the text of a law without identifying a single existing authoritative source.

Page 9: Metadata in NIR

Design issues for NIR (4)

Support for description and prescription
– Tagging of existing texts can only be descriptive (supporting any possible mess that the legislator may have put in)
– Support for legal drafting can be provided, suggesting or enforcing legal drafting rules in the writing.

Page 10: Metadata in NIR

Design issues for NIR (5)

Everything has a reliable name
– Every legal structure needs to be referenced and accessible.
– References need to be unambiguous, universal, definitive.
– URN for whole documents,
– id attributes for substructures and spans
– XPointers for even smaller entities.

Page 11: Metadata in NIR

Design issues for NIR (6)

Clean separation between objective properties and interpretation
– Objective properties can be marked by low-level editors, while interpretation requires experts and high-level editors.
– Objective (manifest) properties include identification of boundaries (articles, clauses, etc.) and official facts about texts (publication dates, etc.)
– Interpretation includes identification of troublesome dates (dies coactu, dies valens), identification of the normative content of the text's provisions, application of modifications.

Page 12: Metadata in NIR

Design issues for NIR (7)

Specific support for multiple interpretations
– “Disposizioni” (law provisions) can be identified and specified on the text.
– Multiple different interpretations of the same text must be allowed
– So they can be placed outside of the main document.

Page 13: Metadata in NIR

Basic structures (1)

Containers
– Documents, parts, subparts, articles, etc.
– All numbered and titled

Text containers
– Clauses (comma), list elements, etc.

Inline elements
– Presentation oriented (bold, italics, etc.): discouraged, we rely on HTML elements and CSS styles
– Legal oriented (references, modifications, specification of dates, organizations, roles, places, etc.): we rely on specific NIR elements.

Page 14: Metadata in NIR

Basic structures (2)

Metadata
– Publication information and other data supplied by editors (publication notes, document evolution, etc.)
– Law provisions for the interpretation of the semantics of the content

Support for irregular texts (those that do not comply with standard legal drafting rules) is available through relaxed syntax in some cases (documentoNIR)

Page 15: Metadata in NIR

The Schemas for NIR documents

3 different DTDs
– Strict rules (prescriptive)
– Loose rules (descriptive)
– Light rules (support for most common cases)
– They are intercompatible

The vocabulary is exactly the same

All light documents are also loose

All strict documents are also loose

Page 16: Metadata in NIR

The needs for metadata

Metadata are the only place to put information that was not explicitly written by the legislator.

All possible types of additional information beyond those provided in the text need to find a place here.

Uses: archival, analysis, annotations, automatic processing (consolidation), etc.

Page 17: Metadata in NIR

Official classification of metadata

A starting point is provided by NISO (US National Information Standards Organization) in the guide “Understanding metadata” (2004):
– descriptive metadata to describe a resource “for purposes such as discovery and identification”
– structural metadata to indicate “how compound objects are put together”
– administrative metadata to provide information “to help manage a resource”, articulated (only) as rights management metadata and preservation metadata (“information needed to archive and preserve a resource”)

Page 18: Metadata in NIR

But…

The distinction between descriptive, structural and administrative metadata has no concrete basis in real practice:
– All the communities involved in the preservation of documents have developed and used information about structure identification as a subset of the information in their descriptive systems. They never consider structural data an independent component.
– The ambiguity of administrative metadata is even more evident, specifically in digital systems, where the technological components are less and less relevant for long-term preservation and serve for the physical retrieval of a resource in a digital repository, but are considered part of the descriptive system in the case of web resources.

Page 19: Metadata in NIR

Metadata in the NIR DTD

Any kind of information that is provided by the editor rather than by the author.

In a way even tagging text is metadata

Deriving new versions out of an original and a few modification documents is also adding metadata.

But adding proper metadata means providing additional information to a version of a document that can be used to better search, contextualize and understand a document.

[Figure: a plain text, its XML version (<xml>Text</xml>) with an attached meta section, and several XML modification documents (<xml>Changes</xml>)]

Page 20: Metadata in NIR

Proper metadata in the NIR DTD

Can be specified
– In an external document (in RDF - still underspecified)
– In an internal section at the beginning of the document (meta), in a NIR vocabulary
– In many internal sections near the parts of the text they refer to, in a NIR vocabulary

Conversion back and forth is always possible and automatic.

Deals with description, structure, administration, as well as:
– Interpretation of content
– Relationships with other documents
– Comments and notes

Page 21: Metadata in NIR

Seven types of proper metadata

Reflective information
– Things the document knows about itself

Positioning information
– Things the document knows about the norms it expresses and the legal system it belongs to

Lifecycle information
– Special moments in the history of the document and of its norms, and the list of other documents that justify them

Editorial notes
– Things the editor wants to attach to specific parts of the document but cannot, since the DTD does not allow editorial intervention on content

Iter-connected texts
– The history of the document before its approval

Proprietary extensions

Provisions (disposizioni)

Page 22: Metadata in NIR

Reflection info (descrittori)

Refers to the document, not its content
– Publication date. Re-publications. Errata. Official clarifications.
– URN(s), aliases
– Objective data, easy to find even with low competence

Storing freshness information?
– A document does not usually know whether it is up-to-date. We may deal with stale documents, dead web sites, CD-ROMs
– The best we can do is to provide them with a last-updated date
– The normative system will confirm whether this is the last interesting date, or whether more recent versions of the same document exist
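The freshness check described above can be sketched as follows. The registry stands in for "the normative system" and is purely hypothetical, as is the placeholder URN, which does not follow real NIR URN syntax.

```python
from datetime import date

# Hypothetical registry standing in for the normative system: it maps a
# document URN to the date of the most recent version it knows about.
# The URN below is a placeholder, not a real NIR identifier.
LATEST_KNOWN = {"urn:nir:example-law": date(2001, 1, 1)}

def is_stale(urn, last_updated, registry=LATEST_KNOWN):
    """Return True if the registry knows a version newer than this copy."""
    latest = registry.get(urn)
    if latest is None:
        # Nothing is known about this document, so it cannot be called stale.
        return False
    return latest > last_updated

# A copy last updated in 1999 is stale once a 2001 version is registered.
print(is_stale("urn:nir:example-law", date(1999, 9, 24)))
```

The document itself only carries its last-updated date; the comparison always happens outside it.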

Page 23: Metadata in NIR

Positioning info (inquadramento)

Refers to the norms contained in the doc
– Missing parts
– Rank, function, nature and proposers of the law
– Keywords and taxonomies they belong to

Objective data (mostly), but requiring high competence to write down.

Page 24: Metadata in NIR

Lifecycle (altriatti) - 1

Over time, documents undergo changes (in content, efficacy, power and so on)

These changes happen at specific points in time and depend on specific documents (modification documents).

Usually modification documents specify several changes on the same modified document, and may specify multiple modification dates.

Therefore it makes sense to create a secondary structure where all relevant moments and documents can be matched

Page 25: Metadata in NIR

Lifecycle (altriatti) - 2

[Timeline figure: events t01 (1/1/1996), t02 (1/3/1997), t03 (12/6/1998), t04 (24/9/1999), t05 (1/1/2001); document states shown along the timeline, in order: original (v01), modified (v02), suspended, resumed (v02), repealed]

ID    URN of law                relation
r01   urn:nir:xxxxxxx12/1995    original
r02   urn:nir:xxxxxxx1/1997     passive
r03   urn:nir:xxxxxxx5/1998     passive
r04   urn:nir:xxxxxxx12/2000    passive

ID    date         idref
t01   1/1/1996     r01
t02   1/3/1997     r02
t03   12/6/1998    r03
t04   24/9/1999    r03
t05   1/1/2001     r04
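The secondary structure that matches moments to documents can be sketched as two lookup tables. The data is copied from the two tables on this slide (dates read as day/month/year); the dictionary layout and the function names are assumptions of this sketch, not the NIR encoding.

```python
from datetime import date

# Documents the law is related to (URNs are the placeholders used on the slide).
RELATIONS = {
    "r01": {"urn": "urn:nir:xxxxxxx12/1995", "relation": "original"},
    "r02": {"urn": "urn:nir:xxxxxxx1/1997", "relation": "passive"},
    "r03": {"urn": "urn:nir:xxxxxxx5/1998", "relation": "passive"},
    "r04": {"urn": "urn:nir:xxxxxxx12/2000", "relation": "passive"},
}

# Relevant moments, each pointing (idref) to the document that causes it.
EVENTS = {
    "t01": {"date": date(1996, 1, 1), "idref": "r01"},
    "t02": {"date": date(1997, 3, 1), "idref": "r02"},
    "t03": {"date": date(1998, 6, 12), "idref": "r03"},
    "t04": {"date": date(1999, 9, 24), "idref": "r03"},
    "t05": {"date": date(2001, 1, 1), "idref": "r04"},
}

def source_of(event_id):
    """Return the URN of the document responsible for an event."""
    return RELATIONS[EVENTS[event_id]["idref"]]["urn"]

def events_caused_by(relation_id):
    """Return the ids of all events that depend on one related document."""
    return [eid for eid, event in EVENTS.items() if event["idref"] == relation_id]

print(events_caused_by("r03"))
print(source_of("t05"))
```

Keeping dates and documents in separate tables lets one modification document (r03 here) account for several moments without repeating its URN.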

Page 26: Metadata in NIR

Lifecycle (altriatti) - 3

The lifecycle section only provides information about the relation to the document that causes the modifications

This information is objective and can be provided with low competence

Information about each actual modification is optional and placed in the provision section.

That information is sometimes subjective and can be provided only with significant competence

Page 27: Metadata in NIR

Other sections

Editorial notes (redazionale)
– Footnotes, comments, and any text the editor feels like adding. It can point to specific places in the text through <ndr> elements

Iter-connected data (lavoripreparatori)
– An official blurb detailing the iter for the approval of the act, with presentation dates, discussion dates, etc. Plain text.

Proprietary
– An open-ended section where editors can add their own metadata with freedom.

Page 28: Metadata in NIR

Provisions

Provisions describe the meaning of each meaningful fragment of the text according to a predefined (and hopefully complete) taxonomy (ontology???)

Divided into three main sections plus a residual category:
– Justifications
– Analytical provisions
– Modifications
– Other

Page 29: Metadata in NIR

Justifications

Some norms (e.g., decrees) place a foreword before the actual text, providing a number of justifications:
– Considered…
– Consulted…
– Based on a proposal by
– Considering…
– Etc.

Page 30: Metadata in NIR

Analytical provisions

Describe properties and meaning of fragments of the actual text.

A full taxonomy exists, including concepts like definition, obligation, right, etc.

Carlo will be speaking about them

Page 31: Metadata in NIR

Modifications

In a modifying law, each modification can be described in detail with a provision.

The provision describes in detail what kind of modification it is, the document it is applied to, where inside it, and when.

Possible modifications are: abrogation, substitution, insertion, renumbering, change of terms, prorogation, repetition, suspension, retro-activity, ultra-activity, etc. (a total of 24 different types).

There is currently no way to express the normal case (dies coactu = dies valens = 15 days after publication for the whole act), but a way will be found soon.
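The normal case mentioned above amounts to simple date arithmetic. A minimal sketch, assuming plain calendar days counted from the publication date (the exact counting convention is an assumption here, and the function name is hypothetical):

```python
from datetime import date, timedelta

def default_term(publication_date, days=15):
    """Return the default date for both dies coactu and dies valens."""
    # Assumption: calendar days added directly to the publication date.
    return publication_date + timedelta(days=days)

published = date(1996, 1, 1)
print(default_term(published))
```

Since the default applies to the whole act, it would only need to be stated once per document rather than once per provision.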

Page 32: Metadata in NIR

Arguments for provisions

All provisions have some specific arguments, plus some shared arguments

E.g.:

    <motivazioni>
    <regole>
      <obbligo>
        <pos href="#art12com5"/>
        <destinatario>sindaco</destinatario>
        <controparte>ufficio tributi</controparte>
        <termine da="r01" a="r02"/>
      </obbligo>
      …
    </regole>

Important shared arguments are positions and terms
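To show how shared and specific arguments could be told apart when processing a provision, here is a minimal sketch that reads the obbligo element from the example. The function name and the returned dictionary layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

PROVISION = """
<obbligo>
  <pos href="#art12com5"/>
  <destinatario>sindaco</destinatario>
  <controparte>ufficio tributi</controparte>
  <termine da="r01" a="r02"/>
</obbligo>
"""

SHARED = ("pos", "termine")  # arguments common to all provisions

def read_provision(xml_text):
    """Split a provision into its type, shared arguments and specific arguments."""
    node = ET.fromstring(xml_text)
    pos = node.find("pos")
    termine = node.find("termine")
    # Every child that is not a shared argument is specific to this provision type.
    specific = {child.tag: child.text for child in node if child.tag not in SHARED}
    return {
        "type": node.tag,
        "position": pos.get("href") if pos is not None else None,
        "term": (termine.get("da"), termine.get("a")) if termine is not None else None,
        "specific": specific,
    }

provision = read_provision(PROVISION)
print(provision)
```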

Page 33: Metadata in NIR

Positions

All provisions point to a position inside the document where the text of the provision is placed.

    <articolo id="art1">
      <num>1.</num>
      <comma id="art1-com1">
        <num>1</num>
        <corpo>blah blah</corpo>
    …
    <obbligo>
      <pos href="#art1com"/>
      <destinatario>xxx</destinatario>
      <controparte>y1</controparte>
    </obbligo>

The pos element points to the id, or XPointer, or the text content, of the part of the document that contains the provision.
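Resolving a pos reference by id can be sketched as below. Only the simple "#id" form is handled; XPointers and text-content matching are left out, and the function name is hypothetical. The sample document closes the elements that the slide leaves open.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """
<articolo id="art1">
  <num>1.</num>
  <comma id="art1-com1">
    <num>1</num>
    <corpo>blah blah</corpo>
  </comma>
</articolo>
"""

def resolve_pos(root, href):
    """Return the element whose id matches a '#id' href, or None if not found."""
    if not href.startswith("#"):
        return None  # XPointers and external references are out of scope here
    target = href[1:]
    for element in root.iter():
        if element.get("id") == target:
            return element
    return None

root = ET.fromstring(DOCUMENT)
comma = resolve_pos(root, "#art1-com1")
print(comma.tag, comma.find("corpo").text)
```

A reference that matches no id resolves to None, which a validating tool could report as a dangling position.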

Page 34: Metadata in NIR

Terms

Specify conditions, and specific efficacy (dies coactu) and validity (dies valens) intervals.

No formal language exists yet for specifying conditions
– E.g.: “after the approval of the corresponding regulation”

Dates are specified by referring to the id of the relevant date as placed in the lifecycle section.

Page 35: Metadata in NIR

Conclusions

Metadata are still under heavy evolution within the NIR WG.

In the last 4 months a major effort has been started to perform a systematic analysis of the desired metadata information for NIR documents.

I haven’t even mentioned namespaces

Some details are still shaky (required elements, repeatable elements, conditions, default values), but the structure should be reasonably stable.

These are not in the published version: it is still way too early.