
sTeX+ – a System for Flexible Formalization of Linked Data


I-Semantics 2010


Andrea Kohlhase and Michael Kohlhase and Christoph Lange

Computer Science, Jacobs University Bremen, Germany

September 1, 2010

Christoph Lange: sTeX+, September 1, 2010


SAMS: A Case Study for Semantic Authoring

• SAMS: Safety Component for Autonomous Mobile Systems [FHL+08]

• Develop a safety component for autonomous mobile service robots and get it certified as compliant with the SIL-3 standard

1 Implement in MISRA-C

2 Verify safety properties in Isabelle

3 Submit to the certification agency (TÜV)

• Follow the V-Model discipline

• Document collection SAMSDocs (Word, LaTeX)

• Idea: turn documents into Linked [Closed] Data


Linked Data in Software Engineering: Motivation

Information needs in a software engineering project:

Programmer:
• Symbol: defined where?
• Specification: how much already implemented?
• Proof: already verified?

Project Manager:
• Has the code been verified? (needs high-level figures)
• Changes since last certification? Affected parts?
• Who is in charge of what? How can one employee be replaced?

Certifier:
• Get an overview
• Follow links through specification and implementation
• What needs to be re-certified?

All: Whom to ask for details about something?

Goal: Answer questions from a distributed document collection

• Partially formalize LaTeX documents (sTeX+)

• Linked Data also works in the [enterprise] intranet [Ser08] (it’s just an architecture after all)


Agenda: From LaTeX to Linked Data

1 SAMSDocs were available as LaTeX (state of the art for technical writing)

2 We had a semantic extension of LaTeX: sTeX (for formalizing mathematics)

3 sTeX was too rigid w.r.t. the existing layout and had no support for project-specific metadata

4 “sTeX+”: identify objects, define an ad hoc metadata vocabulary, annotate objects with these metadata terms

5 Output targets: PDF and OMDoc+RDFa; the latter yields RDF and XHTML+MathML+RDFa

6 Offer interactive services in XHTML+MathML+RDFa documents


sTeX, a Semantic Variant of TeX/LaTeX

• Problem: Semantic services need content markup formats, but mathematicians write LaTeX

• Idea: Enable the author to make structure explicit and disambiguate meanings
  • use the TeX macro mechanism for this (well established)
  • the author knows the semantics best (at least she understands it)
  • the burden is alleviated by manageability savings (MKM on TeX/LaTeX)

• Definition 1 (sTeX Approach) Semantic pre-loading of TeX/LaTeX documents.
  • Introduce semantic macros: e.g. \union{a,b,c} renders as a ∪ b ∪ c
  • Mark up discourse structure (largely invisible): e.g. \begin{sproof}[id=Wiles,for=Fermat] . . . \end{sproof}
  • Generate PDF and OMDoc from that (via LaTeXML [Mil])

http://trac.kwarc.info/sTeX/
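The idea behind a semantic macro, one source expression with both a presentation and a content rendering, can be illustrated with a small Python sketch. This is not how sTeX or LaTeXML are implemented: the function names are hypothetical, and the OpenMath symbol `union` from the `set1` content dictionary is chosen here only as a plausible content form.

```python
from xml.sax.saxutils import quoteattr


def union_presentation(args):
    """Presentation form: roughly what the reader sees in the PDF."""
    return " ∪ ".join(args)


def union_content(args):
    """Content form: an OpenMath-style application of a union symbol.

    The cd/name pair is an assumption made for this illustration.
    """
    variables = "".join(f"<OMV name={quoteattr(a)}/>" for a in args)
    return f'<OMA><OMS cd="set1" name="union"/>{variables}</OMA>'


# One semantic expression, two renderings.
args = ["a", "b", "c"]
print(union_presentation(args))
print(union_content(args))
```

The point of the sketch is only that the author writes the arguments once, and that the notation and the machine-readable meaning are both derived from that single source.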


sTeXIDE: Integrated sTeX Development Environment

http://stexide.googlecode.com [JK10]


The Formalization Workflow with sTeX


Coping with SAMS Practices in sTeX I

• Problem: vanilla sTeX is not enough for this project. SAMS has its own structures (and we want to preserve their appearance)

• Example 2 (Definition tables) Definitions appear in tables (OMDoc only allows sequences of definitions)

Idea: Extend sTeX to sTeX-SD with custom macros and environments to cope.


Coping with SAMS Practices in sTeX II

• Idea: Tabular environment that only outputs definitions to OMDoc (LaTeXML bindings)


Non-Logical Relations in SAMSDocs

• The V-Model introduces relations between document fragments

• Formalize the V-Model vocabulary:
  • refines
  • implements
  • describesUse
  • . . .

• Idea: Mark up these secondary (non-logical) relations as OMDoc metadata

• OMDoc allows flexibly extensible metadata [LK09]:
  • annotate relations via RDFa (RDF in XML)
  • specify their meaning via ontologies (also expressible in OMDoc)

• Example 3 \SemVMrel[cd=reqspec,refid=R12,rel=refines] generates the RDFa metadata

    <link rel="svm:refines" href="../requirementsspec#R12"/>
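The mapping in Example 3 can be sketched in a few lines of Python. This is a minimal illustration, not the actual LaTeXML binding: the function name and the `CD_PATHS` lookup table are hypothetical, and the single table entry is merely what the slide's example suggests for `cd=reqspec`.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical mapping from the cd key to a document path; only the
# "reqspec" entry is suggested by the slide's example.
CD_PATHS = {"reqspec": "../requirementsspec"}


def semvmrel_to_rdfa(cd, refid, rel, prefix="svm"):
    """Render one V-Model relation annotation as an RDFa <link/> element."""
    href = f"{CD_PATHS[cd]}#{refid}"
    return f"<link rel={quoteattr(prefix + ':' + rel)} href={quoteattr(href)}/>"


print(semvmrel_to_rdfa(cd="reqspec", refid="R12", rel="refines"))
```

Each option of the macro ends up in exactly one place: `rel` becomes the RDFa property, while `cd` and `refid` together form the link target.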


The sTeX-SD Vocabulary Extensions Classified


Vocabulary Extensions Without LaTeXML Hacking

Dual role of sTeX:

• define modular, domain-specific vocabularies

• . . . and use them to annotate documents

So far: mathematical theories (“vocabularies” of mathematical symbols)
• use semantic macros for symbols in formulae
• (otherwise fixed vocabulary of mathematical structures)

Now (sTeX+): ontologies (RDF vocabularies)
• annotate documents with metadata


Defining RDF Vocabularies in sTeX+

our ad hoc vocabulary so far

We want something that

• . . . is more scalable and reusable than hand-crafted LaTeXML bindings

• . . . translates to a real RDFS/OWL ontology (via OMDoc, or directly)

    \begin{module}[id=certification]
    \metalanguage[../background/rdfs]{rdfs}
    % metadata property with domain:
    \keydef{document}{hasState}
    \symdef{state-doc-rd}[1]{rd. #1}
    \symdef{tuev}{\text{T\"UV}}
    \begin{definition}[for=hasState]
      A document \defi[hasState]{%
        has state} $x$, iff the project manager
      decrees it so.
    \end{definition}
    \begin{definition}[for=state-doc-rd]
      A document has state \definiendum[%
        state-doc-rd]{rd. $x$},
      iff it has been submitted to
      $x$ for certification.
    \end{definition}
    \begin{definition}[for=tuev]
      The $\tuev$ (Technischer
      \"Uberwachungsverein) is a
      well-known certification agency
      in Germany.
    \end{definition}
    \end{module}

    % Usage
    \importmodule[../onto/cert]{certification}
    \begin{document}[hasState=$\statedocrd{\tuev}$]
    . . .
    \end{document}


Our Infrastructure

http://kwarc.info/LinkedLectures/ [DKL+10]


Finding a Substitute for an Employee via the V-Model

• Harvest the RDFa from OMDoc into an RDF triple store (standard)

• Send the following query to a SPARQL endpoint

    PREFIX vm: <http://www.sams-projekt.de/ontologies/VersionManagement#>
    PREFIX omdoc: <http://omdoc.org/ontology#> # OMDoc
    PREFIX semVM: <http://www.sams-projekt.de/ontologies/V-model#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/> # Dublin Core
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> # XML Schema datatypes
    PREFIX foaf: <http://xmlns.com/foaf/0.1/> # FOAF

    SELECT ?potentialSubstituteName WHERE {
      # parts of the documents Alice is responsible for
      ?document vm:responsible <.../employees#Alice> ;
                omdoc:hasPart ?object .
      # objects related to those parts
      { ?object semVM:refines ?relatedObject }
      UNION
      { ?object omdoc:occursInDefinitionOf ?relatedObject }
      # recent documents containing the related objects, and their authors
      ?otherDocument omdoc:hasPart ?relatedObject ;
                     dc:date ?date ;
                     vm:responsible ?potentialSubstitute .
      FILTER (?date > "2009-01-01"^^xsd:date)
      ?potentialSubstitute foaf:name ?potentialSubstituteName .
    }

• Present the result to the user.
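To see what the graph pattern computes, the same traversal can be sketched over a tiny in-memory list of triples in plain Python, with no triple store or SPARQL engine involved. All data below (document names, people, dates) is made up for illustration, and the date filter is applied to the document's `dc:date` value, which is what the `dc:date ?date` pattern suggests.

```python
from datetime import date

# Hypothetical toy triples (subject, predicate, object); all names are made up.
TRIPLES = [
    ("spec.tex", "vm:responsible", "Alice"),
    ("spec.tex", "omdoc:hasPart", "S1"),
    ("S1", "semVM:refines", "R12"),
    ("reqs.tex", "omdoc:hasPart", "R12"),
    ("reqs.tex", "dc:date", date(2009, 6, 1)),
    ("reqs.tex", "vm:responsible", "Bob"),
    ("old.tex", "omdoc:hasPart", "R12"),
    ("old.tex", "dc:date", date(2008, 3, 1)),
    ("old.tex", "vm:responsible", "Carol"),
    ("Bob", "foaf:name", "Bob B."),
    ("Carol", "foaf:name", "Carol C."),
]


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is in the store."""
    return [o for (s, p, o) in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is in the store."""
    return [s for (s, p, o) in TRIPLES if p == predicate and o == obj]


def substitutes(employee, since):
    """Names of people responsible for recent documents related to the employee's."""
    names = set()
    for doc in subjects("vm:responsible", employee):
        for part in objects(doc, "omdoc:hasPart"):
            # The UNION of the two relation patterns.
            related = (objects(part, "semVM:refines")
                       + objects(part, "omdoc:occursInDefinitionOf"))
            for rel in related:
                for other in subjects("omdoc:hasPart", rel):
                    # The FILTER on the document date.
                    if any(d > since for d in objects(other, "dc:date")):
                        for person in objects(other, "vm:responsible"):
                            names.update(objects(person, "foaf:name"))
    return sorted(names)


print(substitutes("Alice", date(2009, 1, 1)))
```

In this toy data only one of the two documents containing the related requirement is newer than the cutoff, so only its responsible person is returned. Like the pattern it mirrors, the traversal does not exclude the employee herself.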


RDFa and MathML Annotations as Anchors for Interactive Services/Mashups

Here: an example in mathematical lecture notes
In SAMSDocs: e.g. information on the processing state of a document


Conclusions

• Software engineering documents: contracts, requirements, mathematical models, manuals

• We wanted to exploit their structures as Linked Data

• We formalized them from LaTeX to sTeX+, making project-specific semantic structures explicit

• We can define RDF vocabularies and annotate documents in the same language: document and ontology co-development (reduces formalization barriers)

• We obtain OMDoc+RDFa output, and further . . .

Plain RDF: SPARQL queries: How much is implemented/verified? How to replace an employee? What needs to be re-certified?

XHTML+MathML+RDFa: hook interactive (lookup) services right into the annotations

• Ongoing work: stabilize the setup (current focus: CS/math lecture notes)


[DKL+10] Catalin David, Michael Kohlhase, Christoph Lange, Florian Rabe, Nikita Zhiltsov, and Vyacheslav Zholudev. Publishing math lecture notes as linked data. In Lora Aroyo, Grigoris Antoniou, Eero Hyvönen, Annette ten Teije, Heiner Stuckenschmidt, Liliana Cabral, and Tania Tudorache, editors, The Semantic Web: Research and Applications (Part II), number 6089 in Lecture Notes in Computer Science, pages 370–375. Springer Verlag, 2010.

[FHL+08] Udo Frese, Daniel Hausmann, Christoph Lüth, Holger Täubig, and Dennis Walter. The importance of being formal. In Hardi Hungar, editor, International Workshop on the Certification of Safety-Critical Software Controlled Systems SafeCert’08, volume 238 of Electronic Notes in Theoretical Computer Science, pages 57–70, September 2008.

[JK10] Constantin Jucovschi and Michael Kohlhase. sTeXIDE: An integrated development environment for sTeX collections. In Serge Autexier, Jacques Calmet, David Delahaye, Patrick D. F. Ion, Laurence Rideau, Renaud Rioboo, and Alan P. Sexton, editors, Intelligent Computer Mathematics, number 6167 in LNAI. Springer Verlag, 2010.

[LK09] Christoph Lange and Michael Kohlhase. A mathematical approach to ontology authoring and documentation. In Jacques Carette, Lucas Dixon, Claudio Sacerdoti Coen, and Stephen M. Watt, editors, MKM/Calculemus Proceedings, number 5625 in LNAI, pages 389–404. Springer Verlag, July 2009.

[Mil] Bruce Miller. LaTeXML: A LaTeX to XML converter. Web Manual at http://dlmf.nist.gov/LaTeXML/. Seen September 2011.

[Ser08] François-Paul Servant. Linking enterprise data. In Christian Bizer, Tom Heath, Kingsley Idehen, and Tim Berners-Lee, editors, Linked Data on the Web (LDOW), number 369 in CEUR Workshop Proceedings, Aachen, April 2008.