eBooks and ePublishing
Vilas Wuwongse, Asian Institute of Technology
22 January 2003
Outline
• Introduction
• eBook Structure
• Metadata
• Conclusions
What is an eBook? (1)
• “…a digital object that is an electronic representation of a book. While an eBook can consist of a single page, it is normally thought of as an electronic analog of a multi-page hardcover or paperback book. An eBook may exist in a variety of formats…”
What is an eBook? (2)
• “…short for electronic book. 1. A Literary Work in the form of a Digital Object, consisting of one or more standard Unique Identifiers, metadata, and a Monographic body of content, intended to be published and accessed electronically. 2. May also refer to the hardware device created for the purpose of reading eBooks (RocketBook, SoftBook, Franklin e-bookman),”
Goal: provision of content
Parties Involved
• Creators
• Publishers
• Conversion Services
• Distributors
• eRetailers
• Tool Developers
• Device Manufacturers
• Software Vendors
• Libraries
• Users
Print Business Model
[Figure: value chain with stages Creation, Editing, Pre-press, Printing, Distribution, Sale, Consumption]
eBook/ePublishing Business Model
[Figure: value chain with stages Creation, Editing, Pre-press, Printing, Distribution, Delivery, Consumption]
• Efficient
• Inexpensive
• Flexible
Outline
• Introduction
• eBook Structure
• Metadata
• Conclusions
Open eBook (OEB) Specification
• OEB defines a standard format for exchanging eBooks between Publishers and eBook Reader Systems.
• The Publisher who invests in converting content to OEB will be guaranteed that numerous systems will be able to display that content.
• OEB was NOT designed to be the format displayed on rendering devices, whether hand-held devices or desktop computers.
• It is expected that any OEB eBook Reading System will convert OEB to a proprietary format before final delivery to the end-user.
OEB is not...
• OEB is to eBooks as MP3 is to music …. NOT!!!
• MP3 is delivered to end-user who uses any player to “render” the music file.
• OEB is processed and some proprietary format is delivered to the end-user.
OEB is not...
• OEB does not deal with digital rights management
• OEB does not deal with distribution
• OEB does not deal with eCommerce
OEB Compliant Authoring System
[Figure: Content, Authoring System, OEB Files]
The Authoring System converts content from some format to OEB format.
The “xyz” OEB Compliant Reading System
[Figure: OEB Files, “xyz” proprietary format, “device”]
Stage I. Convert OEB files to “xyz” proprietary format for efficient handling and secure delivery
Stage II. “Services”, including delivery to “rendering device”
• e-Commerce
• Security
• Encryption
• Rights Management
• Delivery to “device”
Stage III. Render eBook on a “Device”
Once upon a time...
The OEB specification goals
•The specification should bolster consumer and publisher confidence in the performance of eBook Readers and the utility of eBooks
•The specification should limit the burden on content providers of adopting the specification – in particular, by exploiting existing data, tools and expertise and ensuring predictable Reading System performance
•The specification should limit burden on Reading System developers – specifically, by defining a reasonable base-line functionality for OEB compliance
•The specification should have an immediate and direct impact on the creation of a flourishing eBook industry
•The specification must align industry practices to scale with emerging standards (particularly XML)
•The specification must include a standardized mechanism for adding features beyond the base functionality
•The specification must support interoperability between vendor systems
•The specification must encourage innovation and competitive differentiation
XML
• a W3C standard to complement HTML
• origins: structured text SGML
• motivation:
  – HTML describes presentation
  – XML describes content
• http://www.w3.org/TR/REC-xml (2/98)
From HTML to XML
HTML describes the presentation
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
    Abiteboul, Hull, Vianu
    <br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
    Abiteboul, Buneman, Suciu
    <br> Morgan Kaufmann, 1999
XML
<bibliography>
  <book> <title> Foundations… </title>
         <author> Abiteboul </author>
         <author> Hull </author>
         <author> Vianu </author>
         <publisher> Addison Wesley </publisher>
         <year> 1995 </year>
  </book>
  …
</bibliography>
XML describes the content
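Because the XML version labels each piece of content, a program can pull out authors or years without guessing from layout. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the sample document is modelled on the slide (with the title spelled out), and the function name `list_books` is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the slide's bibliography; not the author's actual data file.
DOC = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

def list_books(xml_text):
    """Return a (title, authors, year) tuple for each <book> element."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        title = book.findtext("title", default="").strip()
        authors = [a.text.strip() for a in book.findall("author")]
        year = int(book.findtext("year"))
        books.append((title, authors, year))
    return books

books = list_books(DOC)
```

The same query against the HTML version would have to rely on the position of italics and line breaks, which is the contrast the two slides draw.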
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red> abbrv. <red/>
• an XML document: single root element
well formed XML document: if it has matching tags
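Well-formedness can be checked mechanically by any XML parser. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is illustrative:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the text parses as XML: matching tags and a single root element."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

ok = is_well_formed("<book><title>T</title></book>")       # tags match
mismatched = is_well_formed("<book><title>T</book>")       # <title> never closed
empty = is_well_formed("<red/>")                           # abbreviated empty element
two_roots = is_well_formed("<a/><b/>")                     # more than one root
```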
Terminology
The segment of an XML document between an opening and a corresponding closing tag is called an element.
<person>
  <name> Malcolm Atchison </name>
  <tel> (215) 898 4321 </tel>
  <tel> (215) 898 4321 </tel>
  <email> [email protected] </email>
</person>
[Figure callouts: “element”; “not an element”; “element, a sub-element of”]
More XML: Attributes
<book price = "55" currency = "USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>
attributes are alternative ways to represent data
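To illustrate that attributes and sub-elements can carry the same data, here is a small sketch with two hypothetical encodings of the price. The second encoding, with a `<price>` sub-element, is an assumption made for comparison and does not appear on the slide.

```python
import xml.etree.ElementTree as ET

# Encoding from the slide: price and currency as attributes of <book>.
AS_ATTRIBUTES = '<book price="55" currency="USD"><title>Foundations of Databases</title></book>'
# Hypothetical alternative: price as a sub-element carrying its own attribute.
AS_ELEMENTS = '<book><price currency="USD">55</price><title>Foundations of Databases</title></book>'

def price_of(xml_text):
    """Return (amount, currency) from either encoding."""
    book = ET.fromstring(xml_text)
    if "price" in book.attrib:
        return book.get("price"), book.get("currency")
    price = book.find("price")
    return price.text, price.get("currency")

from_attributes = price_of(AS_ATTRIBUTES)
from_elements = price_of(AS_ELEMENTS)
```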
Schemas in XML
• Document Type Definition (DTD)
• XML Schema
Document Type Definition: DTD
• part of the original XML specification
• an XML document may have a DTD
• terminology for XML:
  – well-formed: if tags are correctly closed
  – valid: if it has a DTD and conforms to it
• validation is useful in data exchange
DTDs as Grammars
<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<paper>
  <section> <text> </text> </section>
  <section> <title> </title>
    <section> … </section>
    <section> … </section>
  </section>
</paper>
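Reading the DTD as a grammar, each content model becomes a rule that can be checked recursively. The sketch below hand-codes the two rules for `paper` and `section` in Python. It is not a DTD validator (Python's standard library does not provide one), it ignores character data, and the function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def section_ok(section):
    """Check the rule: section -> ((title, section*) | text)."""
    children = list(section)
    tags = [child.tag for child in children]
    if tags == ["text"]:
        return True
    if tags and tags[0] == "title" and all(t == "section" for t in tags[1:]):
        # Every nested section must satisfy the same rule.
        return all(section_ok(child) for child in children[1:])
    return False

def paper_ok(xml_text):
    """Check the rule: paper -> (section*)."""
    root = ET.fromstring(xml_text)
    if root.tag != "paper":
        return False
    return all(child.tag == "section" and section_ok(child) for child in root)

valid = paper_ok(
    "<paper><section><text/></section>"
    "<section><title/><section><text/></section></section></paper>"
)
invalid = paper_ok("<paper><section><title/><text/></section></paper>")
```

The second document is well-formed but not valid: a `section` that starts with `title` may only be followed by further `section` elements.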
XSL Overview
• For any eBook it is better to separate its content from its presentation (rendering)
• XSL (Extensible Stylesheet Language) is a stylesheet specification language for XML documents
• XSL stylesheets are written in XML syntax
• XSL components:
  1. a language for transforming XML documents (XSLT: integral part of the XSL specification)
  2. an XML formatting vocabulary
XSLT Processing Model
[Figure: an XML source tree and an XSLT stylesheet feed a Transformation, which produces a result tree (XML, HTML, pdf, text…)]
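The source-tree to result-tree idea can be imitated in a few lines of ordinary code. The sketch below is plain Python, not XSLT (the standard library has no XSLT processor). It builds an HTML result tree from a bibliography source tree, with the loop body playing the role of a template rule. The sample data and the function name `to_html` are assumptions.

```python
import xml.etree.ElementTree as ET

SOURCE = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
    <publisher>Addison Wesley</publisher><year>1995</year>
  </book>
</bibliography>"""

def to_html(xml_text):
    """Transform the XML source tree into an HTML result tree, then serialize it."""
    source = ET.fromstring(xml_text)            # source tree
    body = ET.Element("body")                   # root of the result tree
    ET.SubElement(body, "h1").text = "Bibliography"
    for book in source.findall("book"):         # one "template" application per book
        p = ET.SubElement(body, "p")
        i = ET.SubElement(p, "i")
        i.text = book.findtext("title")
        i.tail = " " + ", ".join(a.text for a in book.findall("author"))
        br = ET.SubElement(p, "br")
        br.tail = f"{book.findtext('publisher')}, {book.findtext('year')}"
    return ET.tostring(body, encoding="unicode", method="html")

html = to_html(SOURCE)
```

A real XSLT stylesheet expresses the same mapping declaratively, so the content file stays untouched when the presentation changes.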
Outline
• Introduction
• eBook Structure
• Metadata
• Conclusions
Metadata
• Metadata is “structured data about data”
• Metadata is a language and is used to:
  – Organize and manage content
  – Support discovery of resources
  – Filter and direct content in channels
  – Enable automated discovery and manipulation of resources
• As the eBook industry grows, metadata becomes more important
eBook Metadata Issues
• Who provides metadata?
  – author? “publisher”? professional cataloger? extracted from content?
• Is metadata “integrated” with data?
  – related question: is metadata a first class object?
• Formats!
  – which ones?
  – extensible?
  – paradox: the more powerful the format, the less likely it will be used...
Metadata Formats
• MARC is very rich
  – good candidate for an “archival” metadata format, from which simpler formats can be derived
• Dublin Core designed to be simple enough for the average author to generate by hand
  – only 15 core fields defined
• Other formats defined for specific purposes:
  – BibTeX: TeX/LaTeX publishing
  – RFC-1807: email exchange
MARC
Leader:  01663ngm 22002771 4500
005:     19950927090218. 0
007:     vducgaiuu
008:     950927s1993 mau--- d vlfre d
Ctrl Numb   001      200312310
Cntl Iden   003      OBgNWOET
ISBN Numb   020  --  a 0300056958
Catl Orig   040  __  a OBgNWOET
     Tran            c OBgNWOET
Lang Summ   041  --  b fre
Titl Main   245  00  a A la recontre de Philippe
     GMD             h [videorecording] /
     Resp            c Massachusetts Institute of Technology ; written by Gilberte Furstenburg ; directed by Janet H. Murray ; software programmed by Stuart A. Malone.
Pubn City   260  __  a Cambridge, MA :
     Publ            b dist. by Annenberg/CPB.,
     Date            c 1993.
Desc Extn   300  __  a 1 laserdisc (CAV) :
     Othr            b sd., col. :
     Dimn            c 12 in. +
     Accm            e Teacher guide + 3 computer disks.
Note Genl   500  __  a Issued as videodisc.
Note Genl   500  __  a Title from cover.
Note Summ   520  __  a Provides an engaging way to sharpen comprehension skills. Students navigate through Paris neighborhoods and shops, dealing with friends, tradespeople, telephones and answering machines with the goal of finding an apartment for the hapless Philippe. Includes many helpful tools, such as self-testing exercises and an electronic glossary, visual and audio resources, including maps, telephones and newspapers which help students function within the story. Teachers can customize the program according to their students levels and abilities.
Note Targ   521  2_  a Senior high and college.
Note Targ   521  2_  a 09-adult.
Note Tech   538  --  a Macintosh computer ; system 6.0 or later ; 2 MB of RAM ; 3.5 MB of hard disk space ; videodisc player ; video monitor.
Subj Topc   650  _0  a Languages, Modern.
Subj Topc   650  _0  a Language and languages.
Subj Topc   658  _7  a Foreign languages, French.
     Srce            2 nwoet
Locn Coll   852  1_  a OBgNWOET
     SubA            b Northwest Ohio Media Center
     Clas            h 200312310
     BarC            p 200312310
from: http://m27-5.bgsu.edu/nwoetf/marc/phillippe.html
BibTeX
@InProceedings{dha96:pods,
  author = {Chanda Dharap and C. Mic Bowman},
  title = {Typed Structured Documents for Information Retrieval},
  booktitle = {Third International Workshop on Principles of Document Processing},
  year = 1996,
  month = sep,
  address = {Palo Alto, California}
}
from: http://www.transarc.com/afs/transarc.com/public/mic/html/Bio.html
RFC-1807
BIB-VERSION:: CS-TR-v2.1
ID:: OUKS//CS-TR-91-123
ENTRY:: January 15, 1992
ORGANIZATION:: Oceanview University, Kansas, Computer Science
TYPE:: Technical Report
REVISION:: January 5, 1995; FTP access information added
TITLE:: Scientific Communication must be timely
AUTHOR:: Finnegan, James A.
CONTACT:: Prof. J. A. Finnegan, CS Dept, Oceanview Univ, Oceanview, KS 54321 Tel: 913-456-7890 <[email protected]>
AUTHOR:: Pooh, Winnie The
CONTACT:: 100 Aker Wood
DATE:: December 1991
PAGES:: 48
COPYRIGHT:: Copyright for the report (c) 1991, by J. A. Finnegan. All rights reserved. Permission is granted for any academic use of the report.
HANDLE:: hdl:oceanview.electr/CS-TR-91-123
OTHER_ACCESS:: url:http://electr.oceanview.edu/CS-TR-91-123
OTHER_ACCESS:: url:ftp://electr.oceanview.edu/CS-TR-91-123
RETRIEVAL:: send email to [email protected] with fax number
KEYWORD:: Scientific Communication
CR-CATEGORY:: D.0
CR-CATEGORY:: C.2.2 Computer Sys Org, Communication nets, Net Protocols
SERIES:: Communication
FUNDING:: FAS
CONTRACT:: FAS-91-C-1234
MONITORING:: FNBO
LANGUAGE:: English
NOTES:: This report is the full version of the paper with the same title in IEEE Trans ASSP Dec 1976
ABSTRACT::
Many alchemists in the country work on important fusion problems. All of them cooperate and interact with each other through the scientific literature. This scientific communication methodology has many advantages. Timeliness is not one of them.
END:: OUKS//CS-TR-91-123
from: http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1807.txt
A Grammar of Dublin Core
• DC: a language used to describe metadata
• Simpler than natural language, but easy to learn and useful in practice
• Pidgins: small vocabularies (Dublin Core: fifteen special nouns and some optional adjectives)
• Simple grammars: sentences (statements) follow a simple fixed pattern
• http://www.dlib.org/dlib/october00/baker/10baker.html
DC Simple Grammar
• {Resource}/{has}/PROPERTY/X, where PROPERTY, a noun, is one of the 15 DC elements, with optional qualifiers serving as adjectives, and X is the property value.
• Example: These slides/have/Revised DC:DATE/29 August 2002.
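The fixed sentence pattern is simple enough to generate mechanically. A minimal sketch, assuming the fifteen element names listed in this deck; the function name `statement`, the `DC:` prefix formatting and the fixed verb "has" are illustrative choices, not part of any Dublin Core specification.

```python
# The fifteen Dublin Core elements (the "nouns").
DC_ELEMENTS = {
    "Creator", "Title", "Subject", "Contributor", "Date", "Description",
    "Publisher", "Type", "Format", "Coverage", "Rights", "Relation",
    "Source", "Language", "Identifier",
}

def statement(resource, prop, value, qualifier=None):
    """Build a sentence of the form Resource/has/[Qualifier ]DC:PROPERTY/X."""
    if prop not in DC_ELEMENTS:
        raise ValueError(f"{prop!r} is not one of the 15 DC elements")
    noun = f"DC:{prop.upper()}"
    if qualifier:                      # optional "adjective"
        noun = f"{qualifier} {noun}"
    return f"{resource}/has/{noun}/{value}"

sentence = statement("These slides", "Date", "29 August 2002", qualifier="Revised")
```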
DC Qualifiers
• The fifteen elements should be usable and understandable with or without the qualifiers
• Qualifiers refine meaning (but may be harder to understand)
• Nouns can stand on their own without adjectives
Metadata is Language
• Metadata is a language for making statements about resources:
  – Book has TITLE “Harry Potter”
  – Web page has PUBLISHER “AIT”
• Vocabulary terms (elements) are defined in standards like Dublin Core
• Metadata grammars constrain the statements and data models one can form
The 15 Special Nouns (Properties)
Creator Title Subject
Contributor Date Description
Publisher Type Format
Coverage Rights Relation
Source Language Identifier
Dublin Core, XML-encoded
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://purl.org/dc/schemas/dcmes-xml-20000714.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://foo.edu/dl/report-1">
    <dc:title>Perpetual Motion Machine</dc:title>
    <dc:description>This report redefines physics.</dc:description>
    <dc:date>1998-10-10</dc:date>
    <dc:format>text/html</dc:format>
    <dc:language>en</dc:language>
    <dc:contributor>Kant, B. Reproduced</dc:contributor>
  </rdf:Description>
</rdf:RDF>
example adapted from: http://www.purl.org/dc/documents/wd/dcmes-xml-20000714.htm
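A record encoded this way can be read back with any namespace-aware XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample is a shortened copy of the slide's record with the DOCTYPE line omitted, and the function name `dc_fields` is an assumption.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Shortened copy of the slide's record (DOCTYPE omitted for this sketch).
RECORD = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://foo.edu/dl/report-1">
    <dc:title>Perpetual Motion Machine</dc:title>
    <dc:date>1998-10-10</dc:date>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>"""

def dc_fields(xml_text):
    """Map each described resource to a dict of DC element name -> list of values."""
    root = ET.fromstring(xml_text)
    records = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        fields = {}
        for child in desc:
            if child.tag.startswith(f"{{{DC_NS}}}"):
                name = child.tag.split("}", 1)[1]     # strip the namespace part
                fields.setdefault(name, []).append(child.text)
        records[desc.get("about")] = fields
    return records

records = dc_fields(RECORD)
```

Values are kept in lists because every Dublin Core element may be repeated.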
Qualified Dublin Core: Element Refinements
• Extend the core elements of DC with domain-specific qualifications
• Make the meaning of an element narrower or more specific
  – A Date Created versus a Date Modified
  – An IsReplacedBy Relation versus a Replaces Relation
• If your software does not understand the qualifier, you can safely ignore it
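The "safely ignore" rule can be sketched as a fallback from a qualified element to its unqualified parent. The tuple representation, the set of understood qualifiers and the function name `interpret` below are hypothetical, chosen only to illustrate the rule.

```python
def interpret(element, qualifier, value, understood):
    """Return (element, qualifier, value), dropping a qualifier the software does not know."""
    if qualifier is not None and (element, qualifier) not in understood:
        qualifier = None          # fall back to the plain element
    return (element, qualifier, value)

# Hypothetical software that only understands date refinements.
UNDERSTOOD = {("date", "created"), ("date", "modified")}

kept = interpret("date", "created", "2002-08-29", UNDERSTOOD)
dropped = interpret("relation", "isReplacedBy", "report-2", UNDERSTOOD)
```

After the fallback the statement is less specific but still true: a Date Created is still a Date, and an IsReplacedBy Relation is still a Relation.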
RDF
• http://www.w3.org/TR/REC-rdf-syntax (2/99)
• purpose: metadata for Web
  – help search engines
• syntax in XML
• semantics: edge-labeled graphs
RDF Syntax
<rdf:Description about="www.mypage.com">
  <about> birds, butterflies, snakes </about>
  <author>
    <rdf:Description>
      <firstname> John </firstname>
      <lastname> Smith </lastname>
    </rdf:Description>
  </author>
</rdf:Description>
RDF Data Model
[Figure: edge-labeled graph. www.mypage.com has an "about" edge to "birds, butterflies, snakes" and an "author" edge to a node that has a "firstname" edge to "John" and a "lastname" edge to "Smith"]
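An edge-labeled graph can be held as a list of (subject, predicate, object) triples. The sketch below encodes the graph for www.mypage.com; the blank-node label `_:a1` and the helper `follow` are illustrative assumptions.

```python
# Each triple is one labeled edge: (source node, edge label, target node or literal).
TRIPLES = [
    ("www.mypage.com", "about", "birds, butterflies, snakes"),
    ("www.mypage.com", "author", "_:a1"),      # _:a1 is an unnamed (blank) node
    ("_:a1", "firstname", "John"),
    ("_:a1", "lastname", "Smith"),
]

def follow(triples, start, *path):
    """Follow a sequence of edge labels from start; return the nodes reached."""
    nodes = [start]
    for label in path:
        nodes = [o for n in nodes for (s, p, o) in triples if s == n and p == label]
    return nodes

last_names = follow(TRIPLES, "www.mypage.com", "author", "lastname")
topics = follow(TRIPLES, "www.mypage.com", "about")
```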
More RDF Examples
[Figure: the www.mypage.com graph extended with www.anotherpage.com, which has a "related" edge to www.mypage.com, an "author" edge to the John Smith node, and an "author" edge to "Joe Doe"]
<rdf:Description about="www.mypage.com">
  <about> birds, butterflies, snakes </about>
  <author>
    <rdf:Description ID="&o55">
      <firstname> John </firstname>
      <lastname> Smith </lastname>
    </rdf:Description>
  </author>
</rdf:Description>
<rdf:Description about="www.anotherpage.com">
  <related> <rdf:Description about="www.mypage.com"/> </related>
  <author rdf:resource="&o55"/>
  <author> Joe Doe </author>
</rdf:Description>
RDF Terminology
subject
object
predicate
statement
More RDF: Higher Order Statements
“the author of www.thispage.com says: ‘the topic of www.thatpage.com is environment’”
[Figure: the statement (www.thatpage.com, topic, environment) is drawn as a node of its own; www.thispage.com has an "author" edge to a node that has a "says" edge to that statement]
RDF uses reification
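Reification turns a statement into a resource that other statements can point at. The sketch below uses the RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) over plain tuples; the node labels `_:s1` and `_:p1` and the function name `reify` are assumptions.

```python
def reify(statement_id, subject, predicate, obj):
    """Describe a statement as a resource, without asserting the statement itself."""
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]

# 'the topic of www.thatpage.com is environment', as a resource named _:s1
triples = reify("_:s1", "www.thatpage.com", "topic", "environment")
# the author of www.thispage.com (an unnamed node _:p1) says _:s1
triples += [
    ("www.thispage.com", "author", "_:p1"),
    ("_:p1", "says", "_:s1"),
]
```

The graph records who made the claim, while the claim itself is not asserted as a fact.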
Simple DC in RDF Example
[Figure: Page.html has a "DC: Creator" edge to "John Smith" and a "DC: Title" edge to "John’s Home Page"]
Simple DC RDF Example - 1b
<RDF:RDF>
<RDF:Description RDF:HREF = "page.html">
<DC:Creator> John Smith </DC:Creator>
<DC:Title> John’s Home Page </DC:Title>
</RDF:Description>
</RDF:RDF>
Selling vs. Licensing
• Encryption
• Migration
• Taking a Portion
• What does it mean “to sell” an ebook?
Outline
• Introduction
• eBook Structure
• Metadata
• Conclusions
Conclusions (1)
• Although no single standard for eBooks has emerged, some common basic technologies have, e.g., XML, XSL, RDF and DC
• Authors and publishers could employ these technologies to create their eBooks and be assured that the eBooks, through some transformation, would be readable and accessible on any device
Conclusions (2)
• Other issues:
  – Digital Rights Management
    • “enables digital commerce”
    • “protection of digital content”
    • “secure ebook distribution”
    • “ensures content authenticity”
    • “participant identification”
  – Licensing
  – Inter-Library Loan
  – Business Models for Selling eBooks