Zanichelli XML-based Dictionaries Editing System
Daniele Fusi
Slide 2
1 - System Requirements
Multiple presentations, legacy content, operating environment
Slide 3
One content, multiple presentations
From the same data:
- CD-ROM / DVD
- web sites or services
- paper books
- e-books
Slide 4
Existing environment: requirements
- authors: accustomed to WYSIWYG editing in word processors; no technical training
- IT point of view: text as a database; query and interactivity; multiple media and forms
- editors: content validation and standardization; text-based tools; simple content structure
- designers: DTP pagination; flattened structure; import / export
Slide 5
Existing content: conversion
- word processor documents
- 3rd party formats
Slide 6
Digital format requirements
- text-based storage, both machine- and user-readable
- using standard technologies (portable & durable)
- open to expansion and customization
- easy to manipulate
- easy to transform for import/export
- focused on semantics: content rather than its presentation
Slide 7
Content and semantics: dictionary... lemma:
Slide 8
Marking semantics in text: fields
lemma, morphology, etymon, translation, sample, work, etc.
Slide 9
Semantic markup: applications
Fields: lemma, morphology, etymon, translation, sample, work
- alphabetical lemmata list, normal or inverted
- list of lemmata grouped by grammatical category
- list of lemmata grouped by etymon (roots dictionary)
- rudimentary bidirectional dictionary
- look for quotation
- list of quoted works and authors
- etc.
- complex searches: lemma, morphology, etymon, work, etc.
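As a rough illustration of why field-level markup makes these lists cheap to produce, the sketch below derives an inverted list and a grouping by grammatical category from a few entries. The in-memory representation (lists of field name/value pairs), the sample entries and the function names are assumptions made for this example, not the system's actual data model.

```python
from collections import defaultdict

# Hypothetical entries: each lemma is an ordered list of (field, value) pairs.
LEMMATA = [
    [("lemma", "cab"), ("grammar", "n."), ("translation", "taxi")],
    [("lemma", "by"), ("grammar", "prep."), ("translation", "da")],
    [("lemma", "abbey"), ("grammar", "n."), ("translation", "abbazia")],
]


def field(entry, name):
    """Return the first value of the named field, or None if it is absent."""
    return next((value for n, value in entry if n == name), None)


def inverted_list(lemmata):
    """Headwords sorted by their reversed spelling (a reverse dictionary)."""
    return sorted((field(e, "lemma") for e in lemmata), key=lambda s: s[::-1])


def group_by(lemmata, name):
    """Group headwords by the value of another field, e.g. 'grammar'."""
    groups = defaultdict(list)
    for entry in lemmata:
        groups[field(entry, name)].append(field(entry, "lemma"))
    return dict(groups)
```

Calling `group_by(LEMMATA, "grammar")` collects the headwords under each grammatical label; the same function applied to an etymon field would give the skeleton of a roots dictionary.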
Slide 10
2 - Solution overview
XML-based implementation
Slide 11
Implementation: XML
XML:
- Unicode text files
- widely used standard
- built for openness and transformation (XSLT)
- representation of any kind of data, independently of its presentation
- hierarchical model
Dictionary:
- well suited to a hierarchical model: letter, lemma, fields
- typically stored as text for existing works
Hierarchy: dictionary > letter > lemma > field
Slide 12
Sample: lemma and fields
- lemma = dizionàrio
- date = 1965
- grammar = s.m.
- translation 1 = complesso dei lemmi di un dizionario e sim.
- separator
- translation 2 = lista dei lemmi
Rendered entry: dizionàrio [1965] s.m. complesso dei lemmi di un dizionario e sim. lista dei lemmi
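A minimal sketch of how this sample could be serialized as XML follows. The slides do not show the actual schema, so the element name `field`, the `type` attribute and the helper `build_lemma` are assumptions; the headword spelling is also assumed.

```python
import xml.etree.ElementTree as ET


def build_lemma(fields):
    """Build a <lemma> element holding one <field> child per (type, value) pair.

    Element and attribute names are hypothetical; the real schema is not
    shown in the slides.
    """
    lemma = ET.Element("lemma")
    for field_type, value in fields:
        child = ET.SubElement(lemma, "field", {"type": field_type})
        child.text = value  # None for fields without text, e.g. a separator
    return lemma


entry = build_lemma([
    ("lemma", "dizionàrio"),
    ("date", "1965"),
    ("grammar", "s.m."),
    ("translation", "complesso dei lemmi di un dizionario e sim."),
    ("separator", None),
    ("translation", "lista dei lemmi"),
])

# Serialize to a Unicode string, as it might appear inside a letter file.
xml_text = ET.tostring(entry, encoding="unicode")
```

The point of the flat layout is that a new field type only needs a new `type` value, not a change to the element structure.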
Slide 13
Hierarchical structure
dictionary > letter (letters...) > lemma (lemmata...) > fields: grammar, date, translation, separator, translation
Slide 14
Minimalist structure
Flat, yet extensible:
- smallest depth that satisfies practical requirements
- fields vary at will according to the dictionary language and type
- variability of fields compensates for the relatively flat hierarchy
Hierarchy: dictionary > letter > lemma > field...
Slide 15
Structure and compromises
Practical devices:
- fields define lemma parts: etymon, translation, grammar, samples, ...
- formatting is automatically derived from semantic structure (lemma = bold, grammar = italic, author = small caps, ...)
- text escapes define specific formatting for portions of field values, whenever they are not considered semantically relevant
Focus on semantics: for the sample "I came by cab", one field (sample) in the lemma is enough; the hierarchy need not be deeper, yet it still allows emphasis on "by".
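The idea of deriving formatting from field types, with escapes for local emphasis, can be sketched as follows. The style table, the HTML-like output and especially the `_..._` escape syntax are assumptions for illustration; the slides do not say how escapes are actually written.

```python
import re
from html import escape

# Hypothetical style table: field type -> inline tag used in the preview.
FIELD_STYLE = {"lemma": "b", "grammar": "i"}

# Hypothetical text escape: _word_ marks emphasis inside a field value.
ESCAPE = re.compile(r"_(.+?)_")


def render_field(name, value):
    """Render one field: escapes first, then the style implied by its type."""
    text = ESCAPE.sub(r"<em>\1</em>", escape(value))
    tag = FIELD_STYLE.get(name)
    return f"<{tag}>{text}</{tag}>" if tag else text


def render_lemma(fields):
    """Render all fields of a lemma in order, separated by spaces."""
    return " ".join(render_field(name, value) for name, value in fields)
```

With this approach `render_lemma([("lemma", "cab"), ("sample", "I came _by_ cab")])` emphasizes "by" without adding a level to the hierarchy: the sample remains a single field.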
Slide 16
Storage: data
- XML files: one file per letter
- each dictionary has its own alphabet and sorting scheme
- lemmata: automatically inserted in the proper file and at the proper position according to their content
- lemma ID overriding for special sorting
XML files (letters), lemma -> overriding ID:
- à côté (du) -> acote
- ABS -> abiesse
- 10 minutes -> tenminutes
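One way to picture the automatic placement and the ID override is sketched below. The rule used for the default ID (strip diacritics, keep only the letters a to z), the `<letter>.xml` file naming and the function names are assumptions; only the lemma/ID pairs (for instance "ABS" filed as "abiesse") follow the slide's examples.

```python
import bisect
import unicodedata


def default_id(lemma):
    """Derive a sort ID from the lemma text: no diacritics, letters a-z only."""
    decomposed = unicodedata.normalize("NFD", lemma).lower()
    return "".join(ch for ch in decomposed if "a" <= ch <= "z")


def lemma_id(lemma, override=None):
    """Use the author-supplied override when the derived ID sorts badly."""
    return override or default_id(lemma)


def letter_file(lemma, override=None):
    """Name of the letter file the lemma belongs to (hypothetical naming)."""
    return lemma_id(lemma, override)[0] + ".xml"


def insert_sorted(ids, new_id):
    """Insert an ID into an already sorted list, keeping it sorted."""
    bisect.insort(ids, new_id)
    return ids
```

Under these assumptions "10 minutes" would derive the ID "minutes" and land in the wrong letter file, which is the kind of case the override "tenminutes" addresses.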
Slide 17
Storage: metadata
Self-descriptive dictionary: additional XML files define:
- fields list and types within each dictionary
- alphabet and sort order for each dictionary, including diacritics sensitivity
- other dictionary-specific support resources (e.g. frequently typed symbols, preview styles)
Sample fields list: prelemma, etymon, abbreviation, phonetics, translation, variant, grammar, category (A, B, C...), section (1, 2, 3...), separator ( ... ), ...
Sample alphabet (Croatian): abcd efghijkl mn oprstuvz
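A per-dictionary alphabet can drive sorting directly. The sketch below builds a sort key from an ordered list of letters, using the standard 30-letter Croatian alphabet (with its digraphs dž, lj, nj) as sample data. How the real metadata files encode the alphabet is not shown in the slides, so the list form and the function name `make_sort_key` are assumptions; diacritics sensitivity is not modeled here.

```python
# Standard Croatian alphabet; digraphs count as single letters.
CROATIAN = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i", "j",
    "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t", "u", "v",
    "z", "ž",
]


def make_sort_key(alphabet):
    """Return a key function that ranks words by the given alphabet order."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in alphabet)

    def key(word):
        word = word.lower()
        ranks, i = [], 0
        while i < len(word):
            # Try the longest letters first so that "lj" wins over "l".
            for size in range(longest, 0, -1):
                chunk = word[i:i + size]
                if chunk in rank:
                    ranks.append(rank[chunk])
                    i += len(chunk)
                    break
            else:
                i += 1  # skip characters that are not in the alphabet
        return ranks

    return key


croatian_key = make_sort_key(CROATIAN)
```

Sorting with `sorted(words, key=croatian_key)` places "luk" before "ljeto", because lj is a separate letter that follows l, whereas a plain code point sort would reverse them.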
Slide 18
3 - Editing
Authors
Slide 19
Visual Editing
- visual UI: authors build lemmata visually by blocks, and are shielded from the underlying XML code
- XML code integrity is guaranteed by the software
- a typographical preview is provided for authors accustomed to WYSIWYG
Diagram: XML data file = letter (letter > lemmata > fields); XML metadata
Slide 20
Editing software: editing by blocks
UI panels: lemmata list; visual editing: fields in lemma; typographical preview; letter selector
Slide 21
Editing in distributed scenarios
Web-based visual editing
Slide 22
Web: distributed scenario
- dictionaries are stored centrally on a web server
- an ASP.NET web site manages access and versioning for different authors and works
- visual editing is implemented as a Silverlight RIA, running on the author's own computer, yet inside a web page:
  - desktop-class responsiveness for the application
  - true platform independence (Mac / PC, IE / Mozilla / Safari)
  - no need for software distribution and installation
  - centralized software maintenance
Slide 23
Distributed editing
- SQL database for managing access
- ASP.NET server application manages users and versions of works
- Silverlight application runs on the client computer for visual editing
Diagram labels: XML, author, specialized author, editor
Slide 24
Visual editing in your web browser
UI panels: lemmata list; visual editing: fields in lemma; typographical preview; letter selector
Slide 25
4 - Revision
Editors
Slide 26
Content revisions and transformations
- merging different versions (multiple-author scenarios)
- editors: validation and standardization
- DTP: pagination for printing
Slide 27
Automated revision and correction
UI panels: test selection; test description; results
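The slide only names the parts of the revision tool (a selectable test, its description, its results). A minimal sketch of that shape follows, with each test as a described check that returns messages. The two checks, the allowed field set and all names are hypothetical examples, not the tool's actual tests.

```python
# Hypothetical set of declared fields; in practice it would come from
# the dictionary's metadata files.
ALLOWED_FIELDS = {"lemma", "grammar", "translation", "separator"}


def unknown_fields(entry):
    """Report fields that the dictionary metadata does not declare."""
    return [f"unknown field '{name}'" for name, _ in entry
            if name not in ALLOWED_FIELDS]


def missing_headword(entry):
    """Report entries that have no non-empty lemma field."""
    has_lemma = any(name == "lemma" and value for name, value in entry)
    return [] if has_lemma else ["missing lemma field"]


# Each test pairs a human-readable description with a check function.
TESTS = [
    ("fields must be declared in the metadata", unknown_fields),
    ("every item needs a headword", missing_headword),
]


def run_tests(lemmata, tests=TESTS):
    """Run every test on every entry; return (index, description, message)."""
    results = []
    for index, entry in enumerate(lemmata):
        for description, check in tests:
            for message in check(entry):
                results.append((index, description, message))
    return results
```

An empty result list means the checked entries pass every selected test.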
Slide 28
5 - Publication
Editors
Slide 29
One content, multiple outputs
- print
- CD / DVD
- mobile devices (Mobipocket)
- web sites
Slide 30
Extending the model
Sample: RTL languages and root-based dictionaries
Slide 31
Arabic-Italian dictionary
- clashing RTL/LTR text flows
- special alphabetical order: several letters share the same rank (U+0621...U+0627 = 1, U+0628 = 2, U+062A = 3, ...)
- different sorting according to level: roots are sorted according to a predefined scheme, lemmata within roots are arbitrarily sorted by authors
- root-based dictionary: letter > root > lemma > field
- the structure of existing dictionaries must be kept unchanged, even if a deeper hierarchy would be required
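The rank-sharing order can be sketched as a lookup table from characters to ranks, used as a sort key for roots only. Only the three ranks listed on the slide are filled in; the fallback rank for other letters, the item layout and the function names are assumptions.

```python
# Ranks from the slide: U+0621..U+0627 share rank 1, U+0628 is 2, U+062A is 3.
RANK = {chr(cp): 1 for cp in range(0x0621, 0x0628)}
RANK["\u0628"] = 2
RANK["\u062A"] = 3

UNKNOWN = 10**6  # assumption: letters missing from the table sort last


def root_key(root):
    """Sort key for a root: the list of ranks of its letters."""
    return [RANK.get(ch, UNKNOWN) for ch in root]


def sort_roots(items):
    """Sort root items by the predefined scheme.

    Each item is a hypothetical dict {"root": str, "lemmata": list}; the
    lemmata inside each root keep the order chosen by the author.
    """
    return sorted(items, key=lambda item: root_key(item["root"]))
```

Because U+0623 and U+0627 both map to rank 1, two roots that differ only in those letters get the same key, and Python's stable sort leaves them in their existing relative order.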
Slide 32
Hierarchy depths
- Other dictionaries: letter > lemma...; item = lemma = set of fields
- Arabic: letter > roots... > lemma...; item = root = set of fields, some delimiting lemmata boundaries
Slide 33
Deeper hierarchy illusion: special editor
Arabic-X editor:
- XML structure unchanged: each file is a letter containing items, each item contains fields
- items are roots, not lemmata
- a special field defines lemmata boundaries within each root
- the user sees letters, roots, lemmata in root, fields in lemmata; the XML structure remains letter-items-fields
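The boundary-field device can be sketched as a single pass over a root's flat field list, starting a new lemma each time the boundary field appears. The slides do not name the special field, so using a field called `lemma` as the boundary is an assumption, as are the function name and the transliterated sample values.

```python
def split_lemmata(root_fields, boundary="lemma"):
    """Split a root's flat field list into root-level fields and lemmata.

    Every occurrence of the boundary field opens a new lemma; fields that
    come before the first boundary belong to the root itself.
    """
    root_level, lemmata = [], []
    for name, value in root_fields:
        if name == boundary:
            lemmata.append([(name, value)])
        elif lemmata:
            lemmata[-1].append((name, value))
        else:
            root_level.append((name, value))
    return root_level, lemmata


# Hypothetical root item as stored: one flat list of fields.
ROOT_ITEM = [
    ("root", "k-t-b"),
    ("lemma", "kataba"), ("translation", "scrivere"),
    ("lemma", "kitāb"), ("grammar", "s.m."), ("translation", "libro"),
]
```

The stored item stays flat, so other tools that expect letter-items-fields are unaffected; only the editor applies `split_lemmata` to show the second list.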
Slide 34
Specialized editor: Arabic
UI panels: letter selector; roots in letter; lemmata in root; visual editing: fields in lemma; typographical preview, bidirectional flows
Slide 35
Arabic editor trick: advantages
- user experience is almost unchanged (there are 2 lists instead of 1 to choose from for editing: roots and lemmata)
- XML structure unchanged: all the other editorial processes require no change, so the new dictionary fits into them easily
- field variability (already responsible for the structure's expandability) makes this trick possible
- one model, several views