
22 January 2003 1

eBooks and ePublishing

Vilas Wuwongse
Asian Institute of Technology

22 January 2003 2

Outline

• Introduction
• eBook Structure
• Metadata
• Conclusions

22 January 2003 3

Outline

• Introduction
• eBook Structure
• Metadata
• Conclusions

22 January 2003 4

What is an eBook? (1)

• “…a digital object that is an electronic representation of a book. While an eBook can consist of a single page, it is normally thought of as an electronic analog of a multi-page hardcover or paperback book. An eBook may exist in a variety of formats…”


22 January 2003 5

What is an eBook? (2)

• “…short for electronic book. 1. A Literary Work in the form of a Digital Object, consisting of one or more standard Unique Identifiers, metadata, and a Monographic body of content, intended to be published and accessed electronically. 2. May also refer to the hardware device created for the purpose of reading eBooks (RocketBook, SoftBook, Franklin e-bookman),”

22 January 2003 6

Goal: provision of content

22 January 2003 7

Parties Involved

• Creators

• Publishers

• Conversion Services

• Distributors

• eRetailers

• Tool Developers

• Device Manufacturers

• Software Vendors

• Libraries

• Users

22 January 2003 8

Print Business Model

[Figure: print publishing chain with the stages Creation, Editing, Pre-press, Printing, Distribution, Sale, and Consumption]


22 January 2003 9

eBook/ePublishing Business Model

[Figure: eBook publishing chain with the stages Creation, Editing, Pre-press, Printing, Distribution, Delivery, and Consumption]

• Efficient
• Inexpensive
• Flexible

22 January 2003 10

Outline

• Introduction
• eBook Structure
• Metadata
• Conclusions

22 January 2003 11

Open eBook (OEB) Specification

• OEB defines a standard format for exchanging eBooks between Publishers and eBook Reader Systems.

• The Publisher who invests in converting content to OEB will be guaranteed that numerous systems will be able to display that content.

• OEB was NOT designed to be the format displayed on rendering devices, whether hand-held devices or desktop computers.

• It is expected that any OEB eBook Reading System will convert OEB to a proprietary format before final delivery to the end-user.

22 January 2003 12

OEB is not...

• OEB is to eBooks as MP3 is to music …. NOT!!!

• An MP3 file is delivered to the end-user, who uses any player to “render” the music file.

• OEB is processed, and some proprietary format is delivered to the end-user.


22 January 2003 13

OEB is not...

• OEB does not deal with digital rights management

• OEB does not deal with distribution

• OEB does not deal with eCommerce

22 January 2003 14

OEB Compliant Authoring System

[Figure: Content → Authoring System → OEB Files]

The Authoring System converts content from some format... to OEB format.

22 January 2003 15

The “xyz” OEB Compliant Reading System

[Figure: OEB Files → “xyz” proprietary format → “device”; the device screen shows “Once upon a time...”]

Stage I. Convert OEB files to “xyz” proprietary format for efficient handling and secure delivery

Stage II. “Services”, including delivery to “rendering device”
• e-Commerce
• Security
• Encryption
• Rights Management
• Delivery to “device”

Stage III. Render eBook on a “Device”

22 January 2003 16

The OEB specification goals

• The specification should bolster consumer and publisher confidence in the performance of eBook Readers and the utility of eBooks

• The specification should limit the burden on content providers of adopting the specification – in particular, by exploiting existing data, tools and expertise and ensuring predictable Reading System performance

• The specification should limit the burden on Reading System developers – specifically, by defining a reasonable base-line functionality for OEB compliance

• The specification should have an immediate and direct impact on the creation of a flourishing eBook industry

• The specification must align industry practices to scale with emerging standards (particularly XML)

• The specification must include a standardized mechanism for adding features beyond the base functionality

• The specification must support interoperability between vendor systems

• The specification must encourage innovation and competitive differentiation


22 January 2003 17

XML

• a W3C standard to complement HTML
• origins: structured text SGML
• motivation:
  – HTML describes presentation
  – XML describes content
• http://www.w3.org/TR/REC-xml (2/98)

22 January 2003 18

From HTML to XML

HTML describes the presentation

22 January 2003 19

HTML

<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
    Abiteboul, Hull, Vianu
    <br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
    Abiteboul, Buneman, Suciu
    <br> Morgan Kaufmann, 1999

22 January 2003 20

XML

<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>

XML describes the content
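Because the tags name the content rather than its layout, a program can ask for authors or years directly. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the shortened document string and the variable names are illustrative assumptions, and the title that is elided on the slide is left elided.

```python
import xml.etree.ElementTree as ET

# Illustrative copy of the slide's bibliography (one book, title still elided).
BIBLIOGRAPHY = """
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>
"""

root = ET.fromstring(BIBLIOGRAPHY)

# Collect each book's fields by element name, not by position or styling.
books = []
for book in root.findall("book"):
    books.append({
        "authors": [a.text.strip() for a in book.findall("author")],
        "publisher": book.findtext("publisher").strip(),
        "year": int(book.findtext("year")),
    })

print(books)
```

The same query against the HTML version would have to guess which italic run is a title and which line is a publisher.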


22 January 2003 21

XML Terminology

• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red> abbrv. <red/>
• an XML document: single root element

well formed XML document: if it has matching tags

22 January 2003 22

Terminology

The segment of an XML document between an opening and a corresponding closing tag is called an element.

<person>
  <name> Malcolm Atchison </name>
  <tel> (215) 898 4321 </tel>
  <tel> (215) 898 4321 </tel>
  <email> [email protected] </email>
</person>

[Figure callouts on the sample: “element”; “not an element”; “element, a sub-element of”]

22 January 2003 23

More XML: Attributes

<book price = "55" currency = "USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>

attributes are alternative ways to represent data

22 January 2003 24

Schemas in XML

• Document Type Definition (DTD)
• XML Schema


22 January 2003 25

Document Type Definition: DTD

• part of the original XML specification
• an XML document may have a DTD
• terminology for XML:
  – well-formed: if tags are correctly closed
  – valid: if it has a DTD and conforms to it
• validation is useful in data exchange
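The well-formed half of this distinction is easy to check mechanically. The sketch below uses Python's standard `xml.etree.ElementTree`, which is a non-validating parser: it reports tag-matching errors but does not check a document against a DTD, so validity would need a separate validating tool. The function name and sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML (matching tags, single root)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples: two well-formed documents and two broken ones.
samples = {
    "matching tags": "<book><title>Foundations</title></book>",
    "empty element": "<book><red/></book>",
    "missing end tag": "<book><title>Foundations</book>",
    "two roots": "<book/><book/>",
}
results = {name: is_well_formed(doc) for name, doc in samples.items()}
print(results)
```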

22 January 2003 26

DTDs as Grammars

<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>

<paper>
  <section> <text> </text> </section>
  <section> <title> </title>
    <section> … </section>
    <section> … </section>
  </section>
</paper>
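To make the grammar reading concrete, each `<!ELEMENT …>` rule above can be written as one branch of a recursive check. This is a hand-written sketch for this one DTD only, not a general DTD validator; it looks only at child element order and ignores stray text between children. The function name and the two sample documents are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def conforms(elem) -> bool:
    """Check `elem` against the slide's DTD, one branch per ELEMENT rule."""
    kids = list(elem)
    tags = [k.tag for k in kids]
    if elem.tag == "paper":                  # paper -> section*
        ok = all(t == "section" for t in tags)
    elif elem.tag == "section":              # section -> (title, section*) | text
        ok = tags == ["text"] or (
            len(tags) >= 1
            and tags[0] == "title"
            and all(t == "section" for t in tags[1:])
        )
    elif elem.tag in ("title", "text"):      # #PCDATA: character data only
        ok = not kids
    else:
        ok = False                           # element not declared in the DTD
    return ok and all(conforms(k) for k in kids)

GOOD = ("<paper><section><text/></section>"
        "<section><title/><section><text/></section>"
        "<section><text/></section></section></paper>")
BAD = "<paper><section><title/><text/></section></paper>"  # title followed by text

print(conforms(ET.fromstring(GOOD)), conforms(ET.fromstring(BAD)))
```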

22 January 2003 27

XSL Overview

• For any eBook it is better to separate its content from its presentation (rendering)

• XSL (Extensible Stylesheet Language) is a stylesheet specification language for XML documents

• XSL stylesheets are written in XML syntax

• XSL components:
  1. a language for transforming XML documents (XSLT: integral part of the XSL specification)
  2. an XML formatting vocabulary

22 January 2003 28

XSLT Processing Model

[Figure: XML source tree → Transformation (driven by an XSLT stylesheet) → result tree (XML, HTML, pdf, text…)]
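The source-tree-to-result-tree idea can be illustrated without an XSLT engine. The sketch below is plain Python standing in for a stylesheet, not XSLT itself: it walks a bibliography source tree and builds an HTML result tree shaped like the earlier HTML slide. The function name and the sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SOURCE = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>"""

def to_html(source_root):
    """Build an HTML result tree from a <bibliography> source tree."""
    body = ET.Element("body")
    ET.SubElement(body, "h1").text = "Bibliography"
    for book in source_root.findall("book"):
        p = ET.SubElement(body, "p")
        title = ET.SubElement(p, "i")            # title rendered in italics
        title.text = book.findtext("title")
        authors = ", ".join(a.text for a in book.findall("author"))
        title.tail = " " + authors               # text that follows </i>
        br = ET.SubElement(p, "br")
        br.tail = f" {book.findtext('publisher')}, {book.findtext('year')}"
    return body

result = to_html(ET.fromstring(SOURCE))
html = ET.tostring(result, encoding="unicode", method="html")
print(html)
```

Swapping `to_html` for another function would produce a different rendering from the same source, which is the separation of content and presentation that XSL aims for.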


22 January 2003 29

Outline

• Introduction
• eBook Structure
• Metadata
• Conclusions

22 January 2003 30

Metadata

• Metadata is “structured data about data”
• Metadata is a language, used to:
  – Organize and manage content
  – Support discovery of resources
  – Filter and direct content in channels
  – Enable automated discovery and manipulation of resources
• As the eBook industry grows, metadata becomes more important

22 January 2003 31

eBook Metadata Issues

• Who provides metadata?
  – author? “publisher”? professional cataloger? extracted from content?
• Is metadata “integrated” with data?
  – related question: is metadata a first class object?
• Formats!
  – which ones?
  – extensible?
  – paradox: the more powerful the format, the less likely it will be used...

22 January 2003 32

Metadata Formats

• MARC is very rich
  – good candidate for an “archival” metadata format, from which simpler formats can be derived
• Dublin Core designed to be simple enough for the average author to generate by hand
  – only 15 core fields defined
• Other formats defined for specific purposes:
  – BibTeX: TeX/LaTeX publishing
  – RFC-1807: email exchange


22 January 2003 33

MARC

Leader: : 01663ngm 22002771 4500:
005: : 19950927090218. 0 :
007: :vducgaiuu:
008: : 950927s1993 mau--- d vlfre d :

Ctrl Numb 001 200312310
Cntl Iden 003 OBgNWOET
ISBN Numb 020 -- a 0300056958
Catl Orig 040 __ a OBgNWOET
  Tran c OBgNWOET
Lang Summ 041 -- b fre
Titl Main 245 00 a A la recontre de Philippe
  GMD h [videorecording] /
  Resp c Massachusetts Institute of Technology ; written by Gilberte Furstenburg ; directed by Janet H. Murray ; software programmed by Stuart A. Malone.
Pubn City 260 __ a Cambridge, MA :
  Publ b dist. by Annenberg/CPB.,
  Date c 1993.
Desc Extn 300 __ a 1 laserdisc (CAV) :
  Othr b sd., col. :
  Dimn c 12 in. +
  Accm e Teacher guide + 3 computer disks.
Note Genl 500 __ a Issued as videodisc.
Note Genl 500 __ a Title from cover.
Note Summ 520 __ a Provides an engaging way to sharpen comprehension skills. Students navigate through Paris neighborhoods and shops, dealing with friends, tradespeople, telephones and answering machines with the goal of finding an apartment for the hapless Philippe. Includes many helpful tools, such as self-testing exercises and an electronic glossary, visual and audio resources, including maps, telephones and newspapers which help students function within the story. Teachers can customize the program according to their students levels and abilities.
Note Targ 521 2_ a Senior high and college.
Note Targ 521 2_ a 09-adult.
Note Tech 538 -- a Macintosh computer ; system 6.0 or later ; 2 MB of RAM ; 3.5 MB of hard disk space ; videodisc player ; video monitor.
Subj Topc 650 _0 a Languages, Modern.
Subj Topc 650 _0 a Language and languages.
Subj Topc 658 _7 a Foreign languages, French.
  Srce 2 nwoet
Locn Coll 852 1_ a OBgNWOET
  SubA b Northwest Ohio Media Center
  Clas h 200312310
  BarC p 200312310

from: http://m27-5.bgsu.edu/nwoetf/marc/phillippe.html

22 January 2003 34

BibTeX

@InProceedings{dha96:pods,
  author = {Chanda Dharap and C. Mic Bowman},
  title = {Typed Structured Documents for Information Retrieval},
  booktitle = {Third International Workshop on Principles of Document Processing},
  year = 1996,
  month = sep,
  address = {Palo Alto, California}
}

from: http://www.transarc.com/afs/transarc.com/public/mic/html/Bio.html

22 January 2003 35

RFC-1807

BIB-VERSION:: CS-TR-v2.1
ID:: OUKS//CS-TR-91-123
ENTRY:: January 15, 1992
ORGANIZATION:: Oceanview University, Kansas, Computer Science
TYPE:: Technical Report
REVISION:: January 5, 1995; FTP access information added
TITLE:: Scientific Communication must be timely
AUTHOR:: Finnegan, James A.
CONTACT:: Prof. J. A. Finnegan, CS Dept, Oceanview Univ,
   Oceanview, KS 54321 Tel: 913-456-7890
   <[email protected]>
AUTHOR:: Pooh, Winnie The
CONTACT:: 100 Aker Wood
DATE:: December 1991
PAGES:: 48
COPYRIGHT:: Copyright for the report (c) 1991, by J. A.
   Finnegan. All rights reserved. Permission is granted
   for any academic use of the report.
HANDLE:: hdl:oceanview.electr/CS-TR-91-123
OTHER_ACCESS:: url:http://electr.oceanview.edu/CS-TR-91-123
OTHER_ACCESS:: url:ftp://electr.oceanview.edu/CS-TR-91-123
RETRIEVAL:: send email to [email protected] with fax number
KEYWORD:: Scientific Communication
CR-CATEGORY:: D.0
CR-CATEGORY:: C.2.2 Computer Sys Org, Communication nets, Net
   Protocols
SERIES:: Communication
FUNDING:: FAS
CONTRACT:: FAS-91-C-1234
MONITORING:: FNBO
LANGUAGE:: English
NOTES:: This report is the full version of the paper with
   the same title in IEEE Trans ASSP Dec 1976
ABSTRACT::

Many alchemists in the country work on important fusion problems.
All of them cooperate and interact with each other through the
scientific literature. This scientific communication methodology
has many advantages. Timeliness is not one of them.

END:: OUKS//CS-TR-91-123

from: http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1807.txt
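The record above is line-oriented: each field starts with an upper-case name followed by `::`, and fields such as AUTHOR may repeat. A rough Python sketch of reading such a record follows. It assumes that any line without a `FIELD::` prefix continues the previous field, which matches the sample but is a simplification of the full RFC; the function name and the shortened record are illustrative.

```python
def parse_record(text: str):
    """Parse 'FIELD:: value' lines into a list of (field, value) pairs.

    A list (not a dict) is used because fields such as AUTHOR can repeat.
    Lines without a 'FIELD::' prefix are treated as continuations.
    """
    fields = []
    for line in text.splitlines():
        head, sep, rest = line.partition("::")
        is_field_name = (
            bool(sep)
            and head.isupper()
            and head.replace("-", "").replace("_", "").isalnum()
        )
        if is_field_name:
            fields.append((head, rest.strip()))
        elif line.strip() and fields:
            name, value = fields[-1]
            fields[-1] = (name, (value + " " + line.strip()).strip())
    return fields

# Shortened, illustrative excerpt of the record on the slide.
RECORD = """ID:: OUKS//CS-TR-91-123
AUTHOR:: Finnegan, James A.
AUTHOR:: Pooh, Winnie The
NOTES:: This report is the full version of the paper with
   the same title in IEEE Trans ASSP Dec 1976
"""

parsed = parse_record(RECORD)
authors = [value for name, value in parsed if name == "AUTHOR"]
print(authors)
```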

22 January 2003 36

A Grammar of Dublin Core

• DC: a language used to describe metadata
• Simpler than natural language, but easy to learn and useful in practice
• Pidgins: small vocabularies (Dublin Core: fifteen special nouns and some optional adjectives)
• Simple grammars: sentences (statements) follow a simple fixed pattern
• http://www.dlib.org/dlib/october00/baker/10baker.html


22 January 2003 37

DC Simple Grammar

• {Resource}/{has}/PROPERTY/X, where PROPERTY, a noun, is one of the 15 DC elements, with optional qualifiers serving as adjectives, and X is the property value.

• Example: These slides/have/Revised DC:DATE/29 August 2002.
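The fixed sentence pattern is simple enough to generate and check in a few lines. The Python sketch below is only an illustration of the pattern: the function name is an assumption, the element names are the fifteen Dublin Core elements, and the verb is always written as "has" regardless of number.

```python
# The fifteen Dublin Core elements (the "special nouns").
DC_ELEMENTS = {
    "Creator", "Title", "Subject", "Contributor", "Date", "Description",
    "Publisher", "Type", "Format", "Coverage", "Rights", "Relation",
    "Source", "Language", "Identifier",
}

def statement(resource, element, value, qualifier=None):
    """Render 'Resource/has/[Qualifier ]DC:ELEMENT/value' or raise ValueError."""
    if element not in DC_ELEMENTS:
        raise ValueError(f"{element!r} is not one of the 15 Dublin Core elements")
    prop = f"DC:{element.upper()}"
    if qualifier:                      # the optional "adjective"
        prop = f"{qualifier} {prop}"
    return f"{resource}/has/{prop}/{value}"

sentence = statement("These slides", "Date", "29 August 2002", qualifier="Revised")
print(sentence)
```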

22 January 2003 38

DC Qualifiers

• The fifteen elements should be usable and understandable with or without the qualifiers
• Qualifiers refine meaning (but may be harder to understand)
• Nouns can stand on their own without adjectives

22 January 2003 39

Metadata is Language

• Metadata is a language for making statements about resources:
  – Book has TITLE “Harry Potter”
  – Web page has PUBLISHER “AIT”
• Vocabulary terms (elements) are defined in standards like Dublin Core
• Metadata grammars constrain the statements and data models one can form

22 January 2003 40

The 15 Special Nouns (Properties)

Creator Title Subject

Contributor Date Description

Publisher Type Format

Coverage Rights Relation

Source Language Identifier


22 January 2003 41

Dublin Core, XML-encoded

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://purl.org/dc/schemas/dcmes-xml-20000714.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc ="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://foo.edu/dl/report-1">
    <dc:title>Perpetual Motion Machine</dc:title>
    <dc:description>This report redefines physics.</dc:description>
    <dc:date>1998-10-10</dc:date>
    <dc:format>text/html</dc:format>
    <dc:language>en</dc:language>
    <dc:contributor>Kant, B. Reproduced</dc:contributor>
  </rdf:Description>
</rdf:RDF>

example adapted from: http://www.purl.org/dc/documents/wd/dcmes-xml-20000714.htm
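A record like this can be read back with any namespace-aware XML parser. The following is a minimal Python sketch using the standard `xml.etree.ElementTree`; the record is shortened and its DOCTYPE line is left out for brevity, and the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened copy of the slide's record (DOCTYPE omitted).
RECORD = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://foo.edu/dl/report-1">
    <dc:title>Perpetual Motion Machine</dc:title>
    <dc:date>1998-10-10</dc:date>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(RECORD)
description = root.find("rdf:Description", NS)

# The resource being described, then its Dublin Core properties.
resource = description.get("about")
# ElementTree stores tags as '{namespace}name'; keep only the local name.
metadata = {child.tag.split("}", 1)[1]: child.text for child in description}

print(resource, metadata)
```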

22 January 2003 42

Qualified Dublin Core: Element Refinements

• Extend the core elements of DC with domain-specific qualifications
• Make the meaning of an element narrower or more specific
  – A Date Created versus a Date Modified
  – An IsReplacedBy Relation versus a Replaces Relation
• If your software does not understand the qualifier, you can safely ignore it
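The "safely ignore it" rule can be sketched as a small filter: a statement keeps its element and value even when the qualifier is dropped. In the Python sketch below the function name, the tuple layout, and the sample values other than the 1998-10-10 date are hypothetical.

```python
def drop_unknown_qualifiers(statements, known_qualifiers=frozenset()):
    """Drop qualifiers the software does not understand, keeping element and value."""
    simplified = []
    for element, qualifier, value in statements:
        if qualifier not in known_qualifiers:
            qualifier = None          # fall back to the unqualified element
        simplified.append((element, qualifier, value))
    return simplified

# Hypothetical qualified record: (element, qualifier, value).
record = [
    ("Date", "Created", "1998-10-10"),
    ("Date", "Modified", "2002-08-29"),
    ("Relation", "IsReplacedBy", "http://foo.edu/dl/report-2"),
]

# Software that only understands the "Created" refinement.
seen = drop_unknown_qualifiers(record, known_qualifiers={"Created"})
print(seen)
```

The result is less precise (a Date Modified becomes just a Date) but still a correct statement, which is why the fallback is safe.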

22 January 2003 43

RDF

• http://www.w3.org/TR/REC-rdf-syntax (2/99)
• purpose: metadata for Web
  – help search engines
• syntax in XML
• semantics: edge-labeled graphs

22 January 2003 44

RDF Syntax

<rdf:Description about="www.mypage.com">
  <about> birds, butterflies, snakes </about>
  <author>
    <rdf:Description>
      <firstname> John </firstname>
      <lastname> Smith </lastname>
    </rdf:Description>
  </author>
</rdf:Description>


22 January 2003 45

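RDF Data Model

[Figure: edge-labeled graph. The node www.mypage.com has an “about” edge to “birds, butterflies, snakes” and an “author” edge to an unnamed node, which has a “firstname” edge to “John” and a “lastname” edge to “Smith”.]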
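An edge-labeled graph of this kind can be held as a plain list of (subject, predicate, object) triples, one per edge. The Python sketch below is an illustration only; the `_:author1` label for the unnamed author node and the helper name are assumptions.

```python
# Each triple is one labeled edge: (subject, predicate, object).
AUTHOR_NODE = "_:author1"   # hypothetical label for the unnamed author node

triples = [
    ("www.mypage.com", "about", "birds, butterflies, snakes"),
    ("www.mypage.com", "author", AUTHOR_NODE),
    (AUTHOR_NODE, "firstname", "John"),
    (AUTHOR_NODE, "lastname", "Smith"),
]

def objects(subject, predicate):
    """Follow every edge labeled `predicate` that leaves `subject`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Walk two edges: page -> author node -> name parts.
author = objects("www.mypage.com", "author")[0]
full_name = " ".join(objects(author, "firstname") + objects(author, "lastname"))
print(full_name)
```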

22 January 2003 46

More RDF Examples

[Figure: the graph from the previous slide, extended with a second node www.anotherpage.com. It has a “related” edge to www.mypage.com, an “author” edge to the same John Smith node, and a second “author” edge to “Joe Doe”.]

22 January 2003 47

<rdf:Description about="www.mypage.com">
  <about> birds, butterflies, snakes </about>
  <author>
    <rdf:Description ID="&o55">
      <firstname> John </firstname>
      <lastname> Smith </lastname>
    </rdf:Description>
  </author>
</rdf:Description>

<rdf:Description about="www.anotherpage.com">
  <related> <rdf:Description about="www.mypage.com"/> </related>
  <author rdf:resource="&o55"/>
  <author> Joe Doe </author>
</rdf:Description>

22 January 2003 48

RDF Terminology

[Figure: a statement is made up of a subject, a predicate, and an object]


22 January 2003 49

More RDF: Higher Order Statements

“the author of www.thispage.com says: ‘the topic of www.thatpage.com is environment’”

[Figure: www.thispage.com has an “author” edge to an author node; that node has a “says” edge pointing at the statement that www.thatpage.com has “topic” environment]
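A statement about a statement can be sketched by giving the inner statement its own node and describing it with the RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The Python below is an illustration only; the `_:s1` and `_:author` node labels and the helper name are assumptions.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(statement_id, subject, predicate, obj):
    """Describe one statement as a resource, using the RDF reification vocabulary."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]

STMT = "_:s1"        # hypothetical node standing for the quoted statement
AUTHOR = "_:author"  # hypothetical node for the author of www.thispage.com

# The quoted claim is described, not asserted ...
graph = reify(STMT, "www.thatpage.com", "topic", "environment")
# ... and the author is linked to it with a "says" edge.
graph += [
    ("www.thispage.com", "author", AUTHOR),
    (AUTHOR, "says", STMT),
]

for triple in graph:
    print(triple)
```

Note that the plain triple (www.thatpage.com, topic, environment) never appears in the graph: the graph records that the author says it, not that it is true.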

RDF uses reification

22 January 2003 50

Simple DC in RDF Example

[Figure: Page.html has a “DC: Creator” edge to “John Smith” and a “DC: Title” edge to “John’s Home Page”]

22 January 2003 51

Simple DC RDF Example - 1b

<RDF:RDF>

<RDF:Description RDF:HREF = "page.html">

<DC:Creator> John Smith </DC:Creator>

<DC:Title> John’s Home Page </DC:Title>

</RDF:Description>

</RDF:RDF>

22 January 2003 52

Selling vs. Licensing

• Encryption
• Migration
• Taking a Portion
• What does it mean “to sell” an ebook?


22 January 2003 53

Outline

• Introduction
• eBook Structure
• Metadata
• Conclusions

22 January 2003 54

Conclusions (1)

• Although no single standard for eBooks has emerged, some common basic technologies have, e.g., XML, XSL, RDF and DC

• Authors and publishers could employ these technologies to create their eBooks and be assured that the eBooks, through some transformation, would be readable and accessible on any device

22 January 2003 55

Conclusions (2)

• Other issues:
  – Digital Rights Management
    • “enables digital commerce”
    • “protection of digital content”
    • “secure ebook distribution”
    • “ensures content authenticity”
    • “participant identification”
  – Licensing
  – Inter-Library Loan
  – Business Models for Selling eBooks