Sasaki – SOAP! 2014
Value Beyond Content Creation: Introducing ITS 2.0
Felix Sasaki DFKI / W3C Fellow
Slides at http://www.w3.org/Talks/2014/1003-soap-sasaki.pdf

For a nice visualization of what ITS 2.0 is, go here :-)
“Linguini a la translation: An Introduction to ITS 2.0”
https://www.youtube.com/watch?v=5Goet3hX6Jo

What content authors normally do
• Make money by creating
  – Content
  – Layout
  – Apps
• More and more difficult
  – Growing amount of content & apps
  – What is the differentiator?

What content authors may do in the future
• Make money by enriching content
  – Using automatic tools with manual correction
  – Create the basis for further processes
    • Translation, search engine optimization, contextualization, personalization, ...
  – Authors become content curators :-)
• Background: R&D projects and their results

Background 1: LIDER project http://lider-project.eu/
• EU funded project – aims:
  – Demonstrating the value of multilingual linguistic linked data sources
  – Exploring usage scenarios & requirements in various domains
  – Creating an R&D roadmap around the topic

Background 2: ITS 2.0 http://www.w3.org/TR/its20/
• W3C standard to foster multilingual content creation
• Defines metadata (“data categories”) to support the multilingual content life cycle
• A way to interlink Web content and multilingual linked data sources

ITS 2.0 data categories
• Translate
• Localization Note
• Terminology
• Directionality
• Language Information
• Elements Within Text
• Domain
• Text Analysis
• Locale Filter
• Provenance
• External Resource
• Target Pointer
• ID Value
• Preserve Space
• Localization Quality Issue
• Localization Quality Rating
• MT Confidence
• Allowed Characters
• Storage Size

ITS 2.0: High level features
• Can be applied to general XML content and to HTML5
• Partially natively supported in HTML5
  – E.g. HTML5 “translate” attribute
• Applying data categories
  – locally: ITS attributes in content
  – globally: CSS like selector mechanism, using XPath
• Independent data categories: no need to support (as tool maker or user) everything :-)

Example: “Translate” local and global
<p>The <span translate=no>World Wide Web Consortium</span> is making the World Wide Web worldwide!</p>
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
  <its:translateRule selector="//h:code" translate="no" xmlns:h="http://www.w3.org/1999/xhtml"/>
</its:rules>
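The interplay of the global rule and the local attribute can be sketched in Python. This is only an illustration under stated assumptions: the sample document is invented, the prefix map is passed in by hand because ElementTree does not expose the xmlns:h declaration of the rule, only the XPath subset of ElementTree is supported, and neither inheritance to child elements nor the precedence of local over global markup is modeled.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XHTML document, invented for this sketch.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p>Run <code>make install</code>, says the
<span translate="no">World Wide Web Consortium</span>.</p>
</body></html>"""

RULES = """<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
<its:translateRule selector="//h:code" translate="no"
    xmlns:h="http://www.w3.org/1999/xhtml"/>
</its:rules>"""


def untranslatable(doc_xml, rules_xml, prefixes):
    """Return elements marked translate="no" by a global rule or locally."""
    root = ET.fromstring(doc_xml)
    rules = ET.fromstring(rules_xml)
    found = []
    # Global: each its:translateRule with translate="no" selects nodes.
    for rule in rules.findall(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") != "no":
            continue
        selector = rule.get("selector")
        # ElementTree needs a relative path, so "//x" becomes ".//x".
        if selector.startswith("//"):
            selector = "." + selector
        found.extend(root.findall(selector, prefixes))
    # Local: the translate attribute on the element itself.
    for element in root.iter():
        if element.get("translate") == "no" and element not in found:
            found.append(element)
    return found


elements = untranslatable(DOC, RULES, {"h": XHTML_NS})
print([e.tag.split("}")[1] for e in elements])
```

A full ITS 2.0 processor would also resolve inheritance and defaults; a complete XPath engine would be needed for arbitrary selectors.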

Example: “Localization Note”
<data its:locNote="%1\$s is the original text's date in the format YYYY-MM-DD HH:MM always in GMT" …>
  <value>Translated from English content dated <span id="version-info">%1\$s</span> GMT.</value>
</data>

Example: “Elements within Text”
<text xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <body>
    <par>Text with <bold its:withinText="yes">bold</bold>.</par>
  </body>
</text>

Example: “Locale Filter”
<book xmlns:its="http://www.w3.org/2005/11/its">
  <info>
    <legalnotice its:localeFilterList="en-CA, fr-CA">
      <para>This legal notice is only for English and French Canadian locales.</para>
    </legalnotice>
  </info>
</book>
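How a tool might decide whether such content applies to a given locale can be sketched as follows. The function names are hypothetical, and the matching is a compact reading of extended filtering of language ranges (RFC 4647); treat it as a sketch rather than a conformant implementation.

```python
def extended_match(language_range, tag):
    """Match one language range (e.g. "en-CA" or "*-CA") against a tag."""
    r = language_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must agree unless the range starts with a wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1                      # wildcard matches any run of subtags
        elif ti >= len(t):
            return False                 # range is longer than the tag
        elif r[ri] == t[ti]:
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:
            return False                 # do not skip over singleton subtags
        else:
            ti += 1                      # skip a non-matching tag subtag
    return True


def applies_to_locale(locale_filter_list, locale, filter_type="include"):
    """Evaluate a comma-separated localeFilterList for one locale."""
    ranges = [r.strip() for r in locale_filter_list.split(",") if r.strip()]
    matched = any(extended_match(r, locale) for r in ranges)
    return matched if filter_type == "include" else not matched


print(applies_to_locale("en-CA, fr-CA", "fr-CA"))
print(applies_to_locale("en-CA, fr-CA", "en-US"))
```

With the list from the slide, the legal notice would be kept for fr-CA and dropped for en-US.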

Example: “Allowed Characters”
<p>Login names can only use letters from A to Z (upper or lowercase) and the character underscore (_) and minus (-). For example: <code its-allowed-characters=[a-zA-Z_\-]>Huck_Finn</code>.</p>
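A translation or QA tool could use this value to validate text. The following minimal sketch assumes that the character class from the slide can be handed to Python's re module unchanged; the function name is hypothetical.

```python
import re


def uses_only_allowed(text, allowed_characters):
    """True if every character of text matches the allowed character class."""
    # Wrap the class so that it must match the whole string, zero or more times.
    return re.fullmatch(f"(?:{allowed_characters})*", text) is not None


ALLOWED = r"[a-zA-Z_\-]"  # value of its-allowed-characters on the slide
print(uses_only_allowed("Huck_Finn", ALLOWED))
print(uses_only_allowed("Huck Finn", ALLOWED))
```

The second call fails the check because the space is not part of the allowed class.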

Example: “Terminology”
<p>And he said: you need a new <quote its:term="yes" its:termInfoRef="http://www.directron.com/motherboards1.html" its:termConfidence="0.5">motherboard</quote></p>

Example: “MT Confidence”
<body its-annotators-ref="mt-confidence|file:///tools.xml#T1">
  <p>
    <span its-mt-confidence=0.8982>Dublin is the capital of Ireland.</span>

Example: “Provenance”
<p its-tool-ref="http://www.onlinemtex.com/2012/7/25/wsdl/"
   its-org="acme-CAT-v2.3"
   its-prov-ref="http://www.examplelsp.com/excontent987/production/prov/e6354"
   its-rev-org="acme-CAT-v2.3">This paragraph was translated from the machine.</p>

Example: “Localization Quality Issue”
<p>
  <span data-mytool-qacode=named_entity_not_found
        its-loc-quality-issue-comment="Should be Thomas Cahill."
        its-loc-quality-issue-profile-ref=http://example.org/qaMovel/v1
        its-loc-quality-issue-severity=100
        its-loc-quality-issue-type=inconsistent-entities>Christian Bale</span> (1867–1934) conceived of an instrument …
</p>

Example: “Text Analysis”
• Identify concepts in content, like named entities
  – persons, places, events, …
• Store identifiers in (Web) content
• Provide a link to multilingual linked data sources
  – a basis for content curation

Example: “Text Analysis”
<p><span its-ta-confidence="0.7"
         its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location"
         its-ta-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span> is the <span
         its-ta-source="Wordnet3.0"
         its-ta-ident="301467919"
         its-ta-confidence="0.5">capital</span> of Ireland.</p>
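Reading such annotations back out of HTML needs very little code. The sketch below uses Python's standard html.parser; the class name is hypothetical, and it assumes the markup contains no unclosed void elements such as <br>, which would unbalance the simple element stack.

```python
from html.parser import HTMLParser

HTML = (
    '<p><span its-ta-confidence="0.7" '
    'its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location" '
    'its-ta-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span> '
    'is the <span its-ta-source="Wordnet3.0" its-ta-ident="301467919" '
    'its-ta-confidence="0.5">capital</span> of Ireland.</p>'
)


class TextAnalysisCollector(HTMLParser):
    """Collect its-ta-* attributes together with the annotated text."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._open = []  # one entry per open element: annotation dict or None

    def handle_starttag(self, tag, attrs):
        annotation = {k: v for k, v in attrs if k.startswith("its-ta-")}
        if annotation:
            annotation["text"] = ""
            self.annotations.append(annotation)
        self._open.append(annotation or None)

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every annotated element that is currently open.
        for annotation in self._open:
            if annotation is not None:
                annotation["text"] += data


collector = TextAnalysisCollector()
collector.feed(HTML)
for annotation in collector.annotations:
    print(annotation["text"], annotation.get("its-ta-ident-ref"))
```

The first annotation carries a DBpedia identifier, the second only a source and a source-specific identifier, so its its-ta-ident-ref is absent.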

What content authors can do with multilingual linked data sources and ITS 2.0
• Add value to content beyond the content itself
• Curate content: provide identifiers, context, cross lingual information
• Tool examples:
  1) Generation of ITS 2.0 “Text Analysis” for ePub, and Schema.org markup
  2) Generation of translation suggestions
  3) Working with linked data in the browser – without understanding details

TOOLING 1): GENERATION OF ITS 2.0 “TEXT ANALYSIS” AND SCHEMA.ORG MARKUP FOR EPUB

Setup
• oXygen XML editor, modified for ePub / XHTML5 author mode
• Input: ePub or XHTML5 documents
• Output: documents enriched with Schema.org structured information
• The user generates the information in a WYSIWYG mode

Process
1. Automatic generation of entity annotation, using DBpedia Spotlight, producing DBpedia identifiers
2. Access to DBpedia information with pre-defined linked data queries
3. Generation of Schema.org markup

1. Automatic generation of entity annotation
• Input:
<p>Welcome to Dublin in Ireland, the home of Samuel Beckett.</p>

1. Automatic generation of entity annotation
• Output, stored with ITS 2.0 “Text Analysis” markup:
<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin" ...>Dublin</span> in Ireland, the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>

2. Access to DBpedia information
• Using DBpedia identifiers from previous steps in linked data query templates. Example query (part of the query), checking whether entity is a person:
SELECT ?birthPlace ... WHERE{
  <http://dbpedia.org/resource/Samuel_Beckett> rdf:type foaf:Person.
  ...
}
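The idea of a query template can be sketched in Python. Since the query on the slide is only shown in part, this sketch does not try to reproduce it: it swaps in a SPARQL ASK query that covers only the visible person check. The template text, the function name and the character check are all assumptions made for illustration.

```python
# Hypothetical template: an ASK query standing in for the elided SELECT.
PERSON_CHECK_TEMPLATE = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
ASK { <%(entity)s> rdf:type foaf:Person . }"""


def build_person_check(entity_iri):
    """Fill the template with an identifier taken from its-ta-ident-ref."""
    # Reject characters that could break out of the <...> IRI delimiters.
    if any(ch in '<>"{}|^`\\' or ch.isspace() for ch in entity_iri):
        raise ValueError(f"not a safe IRI: {entity_iri!r}")
    return PERSON_CHECK_TEMPLATE % {"entity": entity_iri}


print(build_person_check("http://dbpedia.org/resource/Samuel_Beckett"))
```

Checking the identifier before substitution matters because the value comes from document markup and is placed into the query verbatim.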

3. Generation of Schema.org structured information
• Using output of previous step (query result)
• Generating Schema.org structured information
  – Taking types derived from DBpedia into account, currently
    • http://schema.org/Person
    • http://schema.org/Place

3. Generation of Schema.org structured information
• Input: linked data query result and marked-up document
<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin" ...>Dublin</span> in Ireland, the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>

3. Generation of Schema.org structured information
• Output: marked-up document with Schema.org structured information
<p>Welcome to <span ... itemscope="" itemtype="http://schema.org/Place"><a itemprop="url" href="http://en.wikipedia.org/wiki/Dublin"><span itemprop="name">Dublin</span></a></span>…</p>

3. Generation of Schema.org structured information
• Output: auto-generated markup + text
<p>... Samuel Beckett ... (born in <span itemscope="" itemtype="http://schema.org/Place"><a itemprop="url" href="http://en.wikipedia.org/wiki/Foxrock"><span itemprop="name">Foxrock</span></a></span>)</p>
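Producing this microdata from a query result is mostly string assembly. The sketch below mirrors the markup shape shown on the slides; the function name and its parameters are hypothetical, and the values would come from the linked data query of the previous step.

```python
from html import escape


def microdata_span(schema_type, name, url):
    """Wrap a name in Schema.org microdata, as in the slide output."""
    return (
        f'<span itemscope="" itemtype="http://schema.org/{escape(schema_type, quote=True)}">'
        f'<a itemprop="url" href="{escape(url, quote=True)}">'
        f'<span itemprop="name">{escape(name)}</span></a></span>'
    )


birth_place = microdata_span("Place", "Foxrock", "http://en.wikipedia.org/wiki/Foxrock")
print(f"<p>... Samuel Beckett ... (born in {birth_place})</p>")
```

Escaping the inserted values keeps the generated markup well-formed even if a label contains characters such as & or <.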

Checking output with Structured Data Testing Tool

Broad review: a view of Schema.org types that may work well
Book (dbpedia-owl:Book)
City (dbpedia-owl:City)
Country (dbpedia-owl:Country)
Event (dbpedia-owl:Event)
Hotel (dbpedia-owl:Hotel)
Library (dbpedia-owl:Library)
Movie (dbpedia-owl:Film)
Person (foaf:Person)
Place (dbpedia-owl:Place)
Organization (dbpedia-owl:Organization)
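Such a correspondence is naturally expressed as a lookup table. In the sketch below the pairs are copied from the slide; the ordering from more specific to more general types, and the function name, are assumptions of this sketch and not something the slides specify.

```python
# (DBpedia or FOAF type, Schema.org type), more specific types first.
TYPE_MAP = [
    ("dbpedia-owl:Book", "Book"),
    ("dbpedia-owl:City", "City"),
    ("dbpedia-owl:Country", "Country"),
    ("dbpedia-owl:Event", "Event"),
    ("dbpedia-owl:Hotel", "Hotel"),
    ("dbpedia-owl:Library", "Library"),
    ("dbpedia-owl:Film", "Movie"),
    ("foaf:Person", "Person"),
    # Spelled as on the slide; the DBpedia ontology may spell it differently.
    ("dbpedia-owl:Organization", "Organization"),
    ("dbpedia-owl:Place", "Place"),
]


def schema_type_for(entity_types):
    """Pick the first mapped Schema.org type for a set of rdf:type values."""
    for source_type, schema_name in TYPE_MAP:
        if source_type in entity_types:
            return "http://schema.org/" + schema_name
    return None


print(schema_type_for({"dbpedia-owl:Place", "dbpedia-owl:City"}))
```

Because City precedes Place in the table, an entity typed as both is marked up with the more specific type.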

TOOLING 2): GENERATION OF TRANSLATION SUGGESTIONS

Generating translation suggestions
• Input: like before
• Steps:
  1. Entity annotations (again)
  2. Access to DBpedia and Wikidata to get translation suggestions
  3. Storing the results as a localization note

1. Automatic generation of entity annotation
• Output, stored with ITS 2.0 “Text Analysis” markup:
<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin" ...>Dublin</span> in Ireland, the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>

2. Access to DBpedia and Wikidata to get translation suggestions
• Get translation suggestion from DBpedia
SELECT ?o WHERE { <http://dbpedia.org/resource/Samuel_Beckett> rdfs:label ?o }
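The labels returned by such a query carry language tags, which is what makes them usable as translation suggestions. The sketch below picks labels per language from a result in the SPARQL 1.1 JSON results format; the sample result is invented for illustration and is not actual DBpedia output, and the function name is hypothetical.

```python
# Hypothetical, hand-written result in the SPARQL JSON results format.
SAMPLE_RESULT = {
    "head": {"vars": ["o"]},
    "results": {
        "bindings": [
            {"o": {"type": "literal", "xml:lang": "en", "value": "Samuel Beckett"}},
            {"o": {"type": "literal", "xml:lang": "ja", "value": "サミュエル・ベケット"}},
        ]
    },
}


def labels_by_language(sparql_json, variable="o"):
    """Map language tag to the first label found for that language."""
    labels = {}
    for binding in sparql_json["results"]["bindings"]:
        term = binding.get(variable)
        if term and term.get("type") == "literal" and "xml:lang" in term:
            labels.setdefault(term["xml:lang"], term["value"])
    return labels


print(labels_by_language(SAMPLE_RESULT).get("ja"))
```

A tool would look up the target language of the translation job and offer the matching label, if any, as a suggestion.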

2. Access to DBpedia and Wikidata to get translation suggestions
• Get translation suggestion from Wikidata
http://www.wikidata.org/w/api.php?action=wbgetentities&sites=itwiki&titles=Samuel%20Beckett
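Building this request URL in code avoids hand-encoding the title. The sketch below only constructs the URL with the three parameters shown on the slide and performs no network access; the function name and its default site value are illustrative assumptions.

```python
from urllib.parse import quote, urlencode

WIKIDATA_API = "http://www.wikidata.org/w/api.php"


def wikidata_entity_url(title, site="itwiki"):
    """Build a wbgetentities request URL for a page title on a given wiki."""
    params = {"action": "wbgetentities", "sites": site, "titles": title}
    # quote_via=quote encodes spaces as %20 instead of "+".
    return WIKIDATA_API + "?" + urlencode(params, quote_via=quote)


print(wikidata_entity_url("Samuel Beckett"))
```

For the title used on the slide this yields the same URL as shown above.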

3. Storing the results as ITS 2.0 localization note
• Input: DBpedia + Wikidata query result and marked-up document
<p>… the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>

3. Storing the results as localization note
• Output: Translation suggestions stored as localization note
<p>… the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" its-loc-note="TRANSLATION SUGGESTIONS: 1) wikidata:サミュエル・ベケット 2) dbpedia:サミュエル・ベケット" ...>Samuel Beckett</span>.</p>
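Assembling the note text from the collected suggestions could look like the following sketch. The note layout copies the slide; the function names and the (source, label) pair representation are assumptions made for illustration.

```python
from html import escape


def loc_note(suggestions):
    """Format (source, label) pairs as one numbered note string."""
    parts = [
        f"{number}) {source}:{label}"
        for number, (source, label) in enumerate(suggestions, start=1)
    ]
    return "TRANSLATION SUGGESTIONS: " + " ".join(parts)


def loc_note_attribute(suggestions):
    """Serialize the note as an HTML5 its-loc-note attribute."""
    return f'its-loc-note="{escape(loc_note(suggestions), quote=True)}"'


suggestions = [
    ("wikidata", "サミュエル・ベケット"),
    ("dbpedia", "サミュエル・ベケット"),
]
print(loc_note_attribute(suggestions))
```

Keeping the source name next to each suggestion lets the translator judge where a proposed label came from.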

TOOLING 3): WORKING WITH LINKED DATA IN THE BROWSER – WITHOUT UNDERSTANDING DETAILS

MLOD4CON
• Working with links to external multilingual data sources
• Under the hood: lots of technology
  – ITS 2.0, RDF, SPARQL, JavaScript, …
• Good news: the user does not need to know about these :-)
Demo at http://www.w3.org/People/fsasaki/mlod4con/

EVERYTHING DONE?

Issues
• Learn from communities what they want to do with ITS 2.0 and linked data sources
  – Content creators and content architects, translators, XML / Web tool makers, researchers in the data and language technology area, …
• Provide adequate tooling
• Look carefully into requirements: “Too much information is no information!”

What next for you?
• ITS 2.0 Tooling
  https://www.w3.org/International/its/wiki/ITS_Implementations
• Videos explaining ITS 2.0 usage
  https://www.youtube.com/user/W3CITS20/videos
• Linked Data for Language Technology Community Group: discuss use cases and requirements for multilingual linked data
  http://www.w3.org/community/ld4lt/
• ITS Interest Group: Join the community of ITS 2.0 users and implementers
  https://www.w3.org/International/its/ig/