How Publishing Works in the Digital Era


Markup, Metadata, Formats, and Workflows

Bill Kasdorf, VP and Principal Consultant, Apex Content Solutions

Part I: Markup & Metadata

Content: Think of content as the stuff you can see.

Markup: Think of markup as the engineering that makes it work like a well-oiled machine.

Metadata: Think of metadata as the oil.


Content

Think first about the content, not about the publication. That helps you focus on what things are, not what they look like.

That leads to adaptable markup that you can optimize for print, online, ebooks, or apps.


Content Analysis

What kind of content is this? Who needs it? Why? (Later, ask “how?”)

What pieces are meaningful? What chunks are needed for rendering?

What chunks will people want to point to? How does one chunk relate to other chunks . . . across all your publications?

The Goal: THOUGHTFUL CHUNKING

Vocabulary and Markup: What to name the components and how to tag them for editing, typesetting, and digital publishing.

It works best if the same vocabulary (but not necessarily the same markup syntax) can be used for all of these phases of your workflow.

Design: Typography and Layout

Typography is really implied “markup.” Typography distinguishes the components.

Layout is a navigation guide. This is a centuries-in-the-making collection of design conventions.

Design is based on semantic distinctions: What is this thing? How important is it? How does it relate to the other things around it?

What do you see on this page?

“Huge numeral?” “24 pt Meta, fl rr?” “11 pt Charter, letterspaced?” “Rag right para indented on left?” “12 pt Meta Black all caps, & sm caps?” “Bold term?” I don’t think so. . . .

Here’s what we “see” on this page: “Chapter number,” “Chapter title,” “Author’s name,” “Introductory paragraph,” “Level 1 subhead,” “Level 2 subhead,” “Glossary term.” We see structure and semantics, not specs.

XML

XML enables the separation of structure and semantics from rendering and presentation.

Here’s one possible markup scheme:
• “Chapter number”: <CN> </CN>
• “Chapter title”: <CT> </CT>
• “Author’s name”: <AU> </AU>
• “Introductory paragraph”: <INTRO> </INTRO>
• “Level 1 subhead”: <H1> </H1>
• “Level 2 subhead”: <H2> </H2>
• “Glossary term”: <GLOSS> </GLOSS>

That’s XML markup. Those are “tags.”


You don’t have to use XML. You do need some form of markup, even if in the form of styles, to distinguish the components.

XML is the most powerful, future-proof markup.

XML

Extensible Markup Language

Extensible: Designed to adapt to various
• kinds of documents
• modes of publication
• patterns of access and use

Markup: Tagging a document to provide
• structural information
• semantic information
• formatting information
• supplemental information

Language: A standard way to express markup. Not a set of tags or a vocabulary, but an agreed-upon way to express a given vocabulary or tag set.

XML

XML liberates your content from any particular page design, any particular reading system, any particular workflow. Print, app, ebook, and online: all from the same XML document!
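To make “all from the same XML document” concrete, here is a minimal Python sketch using only the standard library. It reuses the illustrative CN/CT/AU/INTRO tags from the markup scheme above; the sample content, the tag-to-HTML mapping, and the function names are hypothetical. One function renders an HTML fragment for web or ebook use, the other a one-line table-of-contents entry, and neither changes the source.

```python
import xml.etree.ElementTree as ET

# Illustrative source: semantic tags only, no formatting information.
SOURCE = """<chapter>
  <CN>3</CN>
  <CT>Workflows</CT>
  <AU>A. Author</AU>
  <INTRO>We all know the stages of the workflow.</INTRO>
</chapter>"""

# Hypothetical mapping from semantic tags to HTML elements for one channel.
TAG_TO_HTML = {"CN": "p", "CT": "h1", "AU": "p", "INTRO": "p"}


def render_html(xml_text: str) -> str:
    """Web/ebook channel: an HTML fragment; CSS would supply the actual look."""
    src = ET.fromstring(xml_text)
    section = ET.Element("section", {"class": "chapter"})
    for el in src:
        out = ET.SubElement(section, TAG_TO_HTML[el.tag], {"class": el.tag.lower()})
        out.text = el.text
    return ET.tostring(section, encoding="unicode")


def render_toc_entry(xml_text: str) -> str:
    """Another channel: a table-of-contents line built from the same source."""
    src = ET.fromstring(xml_text)
    return f"Chapter {src.findtext('CN')}: {src.findtext('CT')} ({src.findtext('AU')})"


print(render_html(SOURCE))
print(render_toc_entry(SOURCE))  # Chapter 3: Workflows (A. Author)
```

Adding a print or app channel would mean adding another renderer, not editing the XML.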


XML is not a set of tags. It is a LANGUAGE for expressing:
• Semantic information: what the pieces are
• Structural information: how the pieces fit together
• Metadata: information about the content
• Presentation information, but only where semantics and structure don’t apply

. . . creating an unlimited number of presentations from a single XML document.

So where do the tags come from? Surely you don’t just make them up.

Wasn’t the whole point to make the tagging clear, consistent, and non-proprietary?


Well, technically, you can just make them up. But then only you know what they mean. As long as you follow the XML rules, it’s called “well-formed” XML.

It’s better to have a formal specification (a DTD or other schema), and if your XML also conforms to that, it’s called “valid” XML (which is also well-formed). That lets any XML-based system interpret and use your markup.
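The difference between “well-formed” and “valid” can be sketched in Python. Checking well-formedness only needs a parser, here the standard library’s xml.etree.ElementTree. Real DTD validation needs a validating parser (lxml’s etree.DTD is one option), so this sketch substitutes a deliberately tiny, hypothetical “model” that only lists which child elements each element may contain. It illustrates the idea; it is not a DTD implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "model": which child elements each element may contain.
# A real DTD or schema also constrains order, repetition, attributes, and text.
TOY_MODEL = {
    "chapter": {"title", "para", "section"},
    "section": {"title", "para"},
    "title": set(),
    "para": set(),
}


def is_well_formed(xml_text: str) -> bool:
    """True if the text follows XML's syntax rules (e.g., tags nest properly)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def is_valid(xml_text: str, model: dict) -> bool:
    """True if well-formed AND every element and child is allowed by the model."""
    if not is_well_formed(xml_text):
        return False
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag not in model:
            return False  # element not declared in the model
        if any(child.tag not in model[elem.tag] for child in elem):
            return False  # child not permitted here
    return True


made_up = "<gizmo><thingy>Hi</thingy></gizmo>"          # invented tags
bad_nesting = "<chapter><title>Intro</chapter></title>"  # overlapping tags
good = "<chapter><title>Intro</title><para>Some text.</para></chapter>"

print(is_well_formed(made_up), is_valid(made_up, TOY_MODEL))  # True False
print(is_well_formed(bad_nesting))                              # False
print(is_valid(good, TOY_MODEL))                                # True
```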

DTD

Document Type Definition: A special formal syntax used to define a particular type of document or set of related documents. It defines a tag set: the specific tags and how they’re used.

DTD

Elements are the nouns: e.g., <title> or <blockquote>.

A chunk of content is surrounded by a “start tag” and an “end tag”: e.g., <title>This Publication</title>; and elements must “nest” properly.

Now systems can tell the chunks apart and process them appropriately.

DTD

Attributes are the adjectives that describe the elements: e.g., <title class="title-page"> vs. <title class="chapter">.

Now they can be distinguished, processed, and rendered differently.

Unique IDs identify “this specific one,” e.g., <section class="chapter" id="ch001">.
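Here is a short Python sketch (standard library only) of what attributes and IDs make possible, reusing the slide’s class and id examples in a made-up document. ElementTree’s limited XPath support accepts [@attr='value'] predicates, which is enough to select one kind of title or one specific section.

```python
import xml.etree.ElementTree as ET

# Made-up document reusing the slide's attribute and ID examples.
BOOK = """<book>
  <title class="title-page">This Publication</title>
  <section class="chapter" id="ch001">
    <title class="chapter">Getting Started</title>
  </section>
  <section class="chapter" id="ch002">
    <title class="chapter">Going Further</title>
  </section>
</book>"""

root = ET.fromstring(BOOK)

# Same element name, different attribute value: select only the chapter titles.
chapter_titles = [t.text for t in root.findall(".//title[@class='chapter']")]

# The title-page title is a direct child of <book>.
title_page = root.find("title[@class='title-page']").text

# A unique ID pinpoints "this specific one."
second_chapter = root.find(".//section[@id='ch002']")

print(chapter_titles)                     # ['Getting Started', 'Going Further']
print(title_page)                         # This Publication
print(second_chapter.find("title").text)  # Going Further
```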

DTD

DTDs can also define metadata: information about the content. For example:
• Bibliographic information
• Subject codes
• Author and publisher information
• Technical information
• Rights and usage information
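Metadata can travel in the same XML document as the content it describes. In this minimal Python sketch the <metadata> element and its children are hypothetical names and placeholder values, not taken from any standard model; real models define their own structures (JATS, for example, puts article metadata in its <front> matter).

```python
import xml.etree.ElementTree as ET

# Hypothetical element names and placeholder values.
BOOK = """<book>
  <metadata>
    <title>Example Title</title>
    <author>Example Author</author>
    <publisher>Example Press</publisher>
    <subject scheme="example-codes">Publishing / Workflow</subject>
    <rights>All rights reserved</rights>
  </metadata>
  <body><p>The content itself.</p></body>
</book>"""


def read_metadata(xml_text: str) -> dict:
    """Map each child of <metadata> to its text, keyed by element name."""
    meta = ET.fromstring(xml_text).find("metadata")
    return {child.tag: child.text for child in meta}


print(read_metadata(BOOK)["publisher"])  # Example Press
```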

DTD

DTDs (or other types of schemas) are often called “models.”

Most publishers’ models today are based on one of a number of standard models that are widely used and well known in a certain “community.”

Some Standard Models
• DocBook: A generic book model, initially developed for technical books and documentation
• TEI, the Text Encoding Initiative: Mainly used for textual research
• NLM/JATS/BITS: The model for scholarly journals and books
• XHTML: The language of the Web and EPUB, expressed as XML

These each provide a standard, widely used framework to which a publisher’s specific vocabulary can be added to address its needs.
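One real example of layering extra vocabulary onto a standard framework is EPUB 3’s epub:type attribute, which adds finer-grained semantics to ordinary XHTML elements through its own namespace. A publisher-specific vocabulary could be layered on in a similar way; this minimal Python sketch uses only epub:type, with made-up sample content, and lists which elements carry which semantic type.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
OPS_NS = "http://www.idpf.org/2007/ops"  # namespace of EPUB's epub:type attribute
EPUB_TYPE = "{" + OPS_NS + "}type"

# Standard XHTML elements, with finer-grained semantics added by epub:type.
CHAPTER = f"""<section xmlns="{XHTML_NS}" xmlns:epub="{OPS_NS}" epub:type="chapter">
  <h1>Workflows</h1>
  <p>Think of metadata as the <dfn epub:type="glossterm">oil</dfn>.</p>
</section>"""

root = ET.fromstring(CHAPTER)
typed = [
    (el.tag.split("}")[1], el.get(EPUB_TYPE))  # (local element name, semantic type)
    for el in root.iter()
    if el.get(EPUB_TYPE) is not None
]
print(typed)  # [('section', 'chapter'), ('dfn', 'glossterm')]
```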

Part II: Workflows

We all know what the stages of the editorial and production workflow are . . .

Design. Copyediting. Typesetting. Artwork. Indexing. Quality Control. Online/Ebook Creation.

. . . but we need to look deeper to optimize how they work in any given organization.

They’re usually done in silos, which are hard to see into and are starting to break down.

Thinking of these stages in the traditional way leads to suboptimization. In today’s digital ecosystem we need to deconstruct them in order to optimize:

Who does what? At what stage(s) of the workflow? How best to manage the process?

Who Does What?

Do it in-house? Outsource it? Automate it?

You can’t answer these questions properly without deconstructing the categories. And the answers differ from publisher to publisher.

At What Stage(s) of the Workflow?

How do these aspects intersect? How do you avoid duplication and rework? How do you get out of “loopy QC”?

Getting the right things right upstream eliminates a lot of headaches downstream.

How Best to Manage the Process?

Balancing predictability and creativity: where to be strict, and where to be flexible?

How can systems and standards help? Buy vs. build vs. wing it?

Your systems, partners, and processes should make it easy for you to do the right work and keep you from doing the wrong work.

Let’s deconstruct two key workflow stages to see what options there are for optimizing them.

Copyediting
• Editing in Word?
• Who cleans up the author’s messy MS files?
• Who “normalizes” the styling?
• Who designs those styles in the first place?
• Who checks all the links to figures, tables, cross references, notes?
• Who actually does the intellectual work?
• How do the files get trafficked?
• What about version control?

Who cleans up the files and “normalizes” the styling? The copyeditor? The project or production editor? A dedicated in-house file prep team? Outsourced to a vendor? Or: “Normalizing? What’s that?”

Who designs those styles in the first place? They need to be aligned with your XML markup and easy for the copyeditor to use.
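Normalizing styles is a good candidate for the “Software?” end of the spectrum. A minimal Python sketch, assuming a hypothetical house vocabulary whose style names match the XML element names used later, plus a hand-built table of variants seen in authors’ files: known styles are normalized, and anything unrecognized is flagged for a person rather than guessed at.

```python
from typing import List, Optional, Tuple

# Hypothetical house vocabulary: style names match the XML element names.
HOUSE_STYLES = {"CN", "CT", "AU", "INTRO", "H1", "H2", "GLOSS", "P"}

# Hypothetical variants found in authors' files, mapped to house styles.
VARIANTS = {
    "heading 1": "H1",
    "heading1": "H1",
    "heading 2": "H2",
    "chapter title": "CT",
    "normal": "P",
    "body text": "P",
}


def normalize_style(name: str) -> Optional[str]:
    """Return the house style for a raw style name, or None if unrecognized."""
    key = " ".join(name.strip().lower().split())  # trim, lowercase, collapse spaces
    if key.upper() in HOUSE_STYLES:
        return key.upper()
    return VARIANTS.get(key)


def normalize_paragraphs(paras: List[Tuple[str, str]]):
    """Split (style, text) pairs into normalized pairs and pairs needing review."""
    normalized, flagged = [], []
    for style, text in paras:
        house = normalize_style(style)
        if house is None:
            flagged.append((style, text))  # leave the judgment call to a person
        else:
            normalized.append((house, text))
    return normalized, flagged


normalized, flagged = normalize_paragraphs([
    ("Heading  1", "Origins"),
    ("Normal", "Once upon a time."),
    ("Fancy Quote", "To be, or not to be"),
])
print(normalized)  # [('H1', 'Origins'), ('P', 'Once upon a time.')]
print(flagged)     # [('Fancy Quote', 'To be, or not to be')]
```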

Who checks all the links to figures, tables, cross references, notes? The copyeditor? An editorial assistant? The editorial vendor? The typesetter? Software?
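Checking links is a job software does well once the content is in XML. A minimal Python sketch, assuming a hypothetical convention (loosely modeled on JATS-style cross-references) in which every cross-reference is an <xref> whose rid attribute must match some element’s id; the sample chapter is made up.

```python
import xml.etree.ElementTree as ET


def find_broken_links(xml_text: str) -> list:
    """Return rid values of <xref> elements whose target id does not exist."""
    root = ET.fromstring(xml_text)
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    return [x.get("rid") for x in root.iter("xref") if x.get("rid") not in ids]


CHAPTER = """<chapter>
  <p>See <xref rid="fig1">Figure 1</xref> and <xref rid="tab2">Table 2</xref>,
     and note <xref rid="n1">1</xref>.</p>
  <fig id="fig1"><caption>A figure</caption></fig>
  <table-wrap id="tab1"><caption>A table</caption></table-wrap>
  <fn id="n1"><p>A note.</p></fn>
</chapter>"""

print(find_broken_links(CHAPTER))  # ['tab2']
```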

Who actually does the intellectual work? An in-house copyeditor? A freelance copyeditor? An editorial service? A full-service comp vendor?

How do the files get trafficked? What about version control?
• Email files, named whatever. . . .
• Consistent file naming, FTP, transmittals.
• Digital Asset Management System (DAM).
• Content Management System (CMS).
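Consistent file naming is easiest to enforce when software can read the names. A minimal Python sketch, assuming a hypothetical house convention of <project>_ch<NN>_<stage>_v<N>.<ext>: conforming names are parsed so the newest version of each chapter at each stage can be picked out automatically, and anything else is set aside for renaming. It is only the first step up from “named whatever,” not a substitute for a DAM or CMS.

```python
import re

# Hypothetical house convention: <project>_ch<NN>_<stage>_v<N>.<ext>
# stage: ms = manuscript, ce = copyedited, comp = composed pages
NAME_RE = re.compile(
    r"^(?P<project>[a-z0-9]+)_ch(?P<chapter>\d{2})_(?P<stage>ms|ce|comp)"
    r"_v(?P<version>\d+)\.(?P<ext>docx|xml|indd|pdf)$"
)


def latest_versions(filenames):
    """Return ({(chapter, stage): newest filename}, [nonconforming names])."""
    newest = {}  # (chapter, stage) -> (version, filename)
    rejects = []
    for name in filenames:
        m = NAME_RE.match(name)
        if m is None:
            rejects.append(name)  # needs renaming before it enters the workflow
            continue
        key = (int(m["chapter"]), m["stage"])
        version = int(m["version"])
        if key not in newest or version > newest[key][0]:
            newest[key] = (version, name)
    return {key: name for key, (_, name) in newest.items()}, rejects


incoming = [
    "proj_ch03_ce_v1.docx",
    "proj_ch03_ce_v2.docx",
    "proj_ch04_ce_v1.docx",
    "Chapter 3 FINAL final.docx",
]
latest, rejects = latest_versions(incoming)
print(latest)   # {(3, 'ce'): 'proj_ch03_ce_v2.docx', (4, 'ce'): 'proj_ch04_ce_v1.docx'}
print(rejects)  # ['Chapter 3 FINAL final.docx']
```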

Typesetting
• Who determines the tags or style names?
• How do the editing styles translate to comp?
• Who does the artwork?
• How are figures, tables, etc. placed?
• Are links preserved or implemented?
• How do the files get trafficked?
• What about version control?

Who determines the tags or style names? A freelance designer, ad hoc? The compositor’s own system? The publisher’s system? XML?

How do the editing styles translate to comp?
• “They don’t.”
• “The typesetter does it, we don’t know what they do.”
• Word styles imported into InDesign.
• Programmatic transforms to XML.
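A “programmatic transform to XML” can be as simple as a lookup table from style names to element names, provided the styles were applied consistently. This minimal Python sketch assumes hypothetical Word style names and maps them to the illustrative CN/CT/AU/INTRO/H1/H2 tags from Part I, plus a P element for body text; real conversions also handle nesting, inline formatting, notes, and figures.

```python
import xml.etree.ElementTree as ET

# Hypothetical Word style names mapped to XML element names.
STYLE_TO_TAG = {
    "Chapter Number": "CN",
    "Chapter Title": "CT",
    "Author": "AU",
    "Intro Para": "INTRO",
    "Head 1": "H1",
    "Head 2": "H2",
    "Body": "P",
}


def styles_to_xml(paragraphs):
    """Turn (style_name, text) pairs into a flat <chapter> element.

    An unmapped style raises ValueError, so it is caught at conversion time
    instead of being silently mis-tagged downstream.
    """
    chapter = ET.Element("chapter")
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            raise ValueError(f"No XML mapping for style {style!r}")
        ET.SubElement(chapter, tag).text = text
    return chapter


paras = [("Chapter Number", "3"), ("Chapter Title", "Workflows"),
         ("Body", "We all know the stages.")]
print(ET.tostring(styles_to_xml(paras), encoding="unicode"))
# <chapter><CN>3</CN><CT>Workflows</CT><P>We all know the stages.</P></chapter>
```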

Who does the artwork? “The author. Sorta.”
• “Then we fix it in-house.”
• “We send it to an art studio.”
• “The typesetter fixes it.”
• “We make the author fix it.”
• “It depends. . . .”

How are figures, tables, etc. placed?
• Manually, based on callouts marked by the copyeditor.
• Automatically from XML in Typefi, 3B2.

Are links preserved or implemented?
• “Nope.”
• “The typesetter adds them.”
• “We put them in when we make the ebook.”
• “Yes, they’re in the XML.”

How do the files get trafficked? What about version control?
• Email files, named whatever. . . .
• Consistent file naming, FTP, transmittals.
• Digital Asset Management System (DAM).
• Content Management System (CMS).

Sound familiar?

Workflow

Workflow is where it all comes together:
• A vocabulary that fits your publications.
• Markup that makes your content agile.
• Metadata that makes it meaningful.
• The standards that make it interoperable.
• The technologies that fit your capabilities.

Part III: File Formats and Standards

Publications today are composed of a multitude of files and formats:
• Text Files
• Metadata
• Image Files
• Video and Audio Files
• Scripts
• Fonts
• Stylesheets
• Deliverable Products

XML is not the whole story!

Some Common Text File Formats
• Microsoft Word: Used for most authoring and editing
• TeX/LaTeX: Common for math, statistics, engineering
• InDesign: The leading design/page layout format
• XML: The foundation of most modern publishing
• HTML5: The format of the World Wide Web

Microsoft Word
• Ubiquitous but typically undisciplined: authors do lots of inconsistent, messy things
• Style templates work well for editing: visually distinct styles for elements, with names that align with terms in the rest of the workflow
• Old .doc is “binary”; new .docx is XML. Don’t get excited: this WordprocessingML (often called “WordML”) is full of messy stuff, but at least it can be worked with
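Because a .docx file is a ZIP package of XML parts, its paragraph styles can be read without Word. A minimal Python sketch using only the standard library: it reads word/document.xml and reports each paragraph’s style ID (the internal ID, such as Heading1, which can differ from the display name) and text. The demo builds a stripped-down stand-in package in memory containing only that one part; a real .docx has several more, and the function names are illustrative.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def paragraph_styles(docx_file):
    """Return [(style_id or None, text)] for each paragraph in word/document.xml.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    result = []
    for p in root.iter(W + "p"):
        style_el = p.find(f"{W}pPr/{W}pStyle")
        style = style_el.get(W + "val") if style_el is not None else None
        # A paragraph's text is split across runs (<w:r>/<w:t>); join them.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        result.append((style, text))
    return result


def demo_package():
    """Build an in-memory stand-in containing only word/document.xml."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
        "<w:r><w:t>Origins</w:t></w:r></w:p>"
        '<w:p><w:r><w:t xml:space="preserve">Plain </w:t></w:r>'
        "<w:r><w:t>text</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", body)
    buf.seek(0)
    return buf


print(paragraph_styles(demo_package()))
# [('Heading1', 'Origins'), (None, 'Plain text')]
```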

TeX/LaTeX
• Very specialized: encountered only in specific disciplines
• Often used for authoring + typesetting
• Difficult to convert, so publishers often treat TeX as an outlier and skip XML

InDesign
• Ideal for design-intensive publications
• Integrated with Adobe’s full toolset, now cloud-based
• Structure: paragraph & character styles; align their vocabulary with the rest of the workflow
• Can import and export XML: this is how Typefi and PShift work
• IDML and EPUB export can be problematic

XML
• Most flexible, future-proof format: adapts as technologies change and new products are developed
• Optimal for multi-channel delivery: the same XML file for print, ebook, app, & online, either directly or with automated transformation

HTML5
• Can be expressed as XML: XHTML5, the HTML “tag set” following XML syntax and rules
• HTML5 is structure + semantics; presentation is via CSS (Cascading Style Sheets)
• Basis of the Open Web Platform (OWP) and EPUB 3
• The OWP is a huge collection of standards that form the Web ecosystem: HTML5, CSS3, JavaScript, and many more
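The practical difference between HTML syntax and XHTML can be shown with any XML parser. This Python sketch (standard library only, with made-up sample markup) shows that an unclosed <br>, acceptable in HTML, is rejected as XML, while the XHTML form parses and its elements carry the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def parses_as_xml(markup: str) -> bool:
    """True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Fine as HTML5 syntax, but not XML: <br> is never closed.
html_syntax = "<p>First line<br>Second line</p>"

# Same content in XHTML (XML) syntax: empty element self-closed, namespace declared.
xhtml_syntax = f'<p xmlns="{XHTML_NS}">First line<br/>Second line</p>'

print(parses_as_xml(html_syntax))   # False
print(parses_as_xml(xhtml_syntax))  # True

# Once parsed, XHTML elements carry the XHTML namespace like any XML vocabulary.
print(ET.fromstring(xhtml_syntax).tag)  # {http://www.w3.org/1999/xhtml}p
```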

Some Common Image Formats
• TIFF (.tif or .tiff): “Tagged Image File Format”
• JPEG (.jpg or .jpeg): “Joint Photographic Experts Group”
• GIF (.gif): “Graphics Interchange Format”
• PNG (.png): “Portable Network Graphics”
• SVG (.svg): “Scalable Vector Graphics”

TIFF
• Mainly used for photos (continuous tone)
• “Raster” or “bitmap” (grid of pixels)
• Typically “lossless”: keeps all the image data
• Primarily for print
• Grayscale or CMYK high-resolution images
• File sizes are usually quite large, esp. color images

JPEG
• Also mainly for continuous-tone images
• “Lossy” compression: can adjust the balance of quality and file size
• Primarily for online, ebooks, etc.
• Time to “load” is a factor (plus device capacity)
• Preserve more data when zooming is needed

GIF
• Mainly for line art (diagrams, flat color)
• Small file size: designed for online/digital
• Lossless compression
• Can be animated: “Animated GIF”

[Editorial comment: also can be annoying. ;-) ]

PNG
• Created as an open, patent-free successor to GIF
• Small file size for line art, flat color; offers excellent quality, good transparency, lossless compression
• Can be used for photos or line art: better than JPEG at flat color areas, but PNG photos are larger files than JPEGs

SVG
• W3C standard XML-based vector format
• Vector math based on Adobe’s PDF/PostScript
• Searchable, accessible text
• No loss of quality when resized
• Sharp on laptop, tablet, and phone, and when zoomed, like PDF
• Not widely or consistently implemented yet, but should become a dominant image format
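Because SVG is XML, the text in a diagram stays text. This minimal Python sketch, using only the standard library, pulls the title and labels out of a small, made-up SVG; the same property is what lets that text be searched and read by assistive technology.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
S = "{" + SVG_NS + "}"

# Illustrative diagram: two boxes drawn as vectors, labels kept as real text.
DIAGRAM = f"""<svg xmlns="{SVG_NS}" viewBox="0 0 200 100">
  <title>Editorial workflow</title>
  <rect x="10" y="30" width="80" height="40" fill="none" stroke="black"/>
  <text x="20" y="55">Copyedit</text>
  <rect x="110" y="30" width="80" height="40" fill="none" stroke="black"/>
  <text x="120" y="55">Typeset</text>
</svg>"""


def svg_text(svg_markup: str):
    """Return (title, [text labels]): the searchable, accessible text in an SVG."""
    root = ET.fromstring(svg_markup)
    title = root.findtext(S + "title")
    labels = [t.text for t in root.iter(S + "text")]
    return title, labels


print(svg_text(DIAGRAM))  # ('Editorial workflow', ['Copyedit', 'Typeset'])
```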

. . . and Some Common Proprietary Formats
• AI (.ai): Adobe Illustrator
• PSD (.psd): Photoshop
• EPS (.eps): Encapsulated PostScript
• PPT (.ppt): PowerPoint
• WMF/EMF: Windows Metafile / Enhanced Metafile

These are used in production but don’t belong in deliverable products.

Audio and Video Formats

HTML5 vs. Proprietary
• Best: open formats permitted by HTML5 in the <audio> and <video> elements; they work natively in browsers & e-readers
• Proprietary formats like Flash (.swf) and QuickTime (.mov, .qt) require plug-ins

Ideal: Formats Recommended by EPUB 3
• Audio: MP3 and MP4 AAC LC
• Video: H.264 and VP8/WebM (often both, due to browser and reading-system inconsistency)
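Offering both video formats inside a single <video> element is what the HTML5 approach looks like in practice. This minimal Python sketch builds that markup with ElementTree; the file names, fallback text, and function name are hypothetical, and the element is shown without the XHTML namespace declaration a full content document would carry.

```python
import xml.etree.ElementTree as ET


def video_element(base_name: str, fallback: str) -> ET.Element:
    """Build a <video> offering both H.264 (MP4) and VP8 (WebM) sources.

    A player uses the first <source> it can play; the fallback paragraph
    is shown only where <video> itself is not supported.
    """
    video = ET.Element("video", {"controls": "controls"})
    for ext, mime in (("mp4", "video/mp4"), ("webm", "video/webm")):
        ET.SubElement(video, "source", {"src": f"{base_name}.{ext}", "type": mime})
    ET.SubElement(video, "p").text = fallback
    return video


clip = video_element("media/interview", "This reading system cannot play video.")
print(ET.tostring(clip, encoding="unicode"))
```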

Scripts
• JavaScript: Fundamental to the Open Web Platform
• JavaScript Libraries: “Pre-written” scripts to adapt as needed; most popular: open-source jQuery
• Widgets: Interactive features like quizzes, sliders, “assessments” in educational content, graphing data from a table, etc.

Fonts
• OpenType: Primary font format for print
• WOFF: Primary font format for web
• Licensing: Know what rights you’ve got!
• Obfuscating and Embedding: Enable an ebook to contain the fonts it needs
• Unicode Fonts: Character encoding of the Web & XML

OpenType and WOFF
• The “legal” fonts in EPUB 3: reading systems are required to handle both, but many systems just use their own default fonts now
• Many fonts are available in both formats
• WOFF is a “wrapper” for underlying font data/metrics

Licensing and Embedding
• Need a license to embed a font in an ebook. Beware “free” fonts! Fonts under an open license, such as the SIL Open Font License, are safe
• Need “fallbacks” for embedded fonts: “system fonts” are built into a reading system
• “Web fonts” require you to be online, so they are not for ebooks. The CSS lets you default to “serif” or “sans-serif”
• Embedded fonts for “special characters”: math, linguistics, quotes from non-Latin languages

Unicode
• All the characters in XML are Unicode by definition
• This enables unambiguous character specification
• Word, InDesign, and XML-based systems all understand and use Unicode
• Use Unicode fonts throughout your workflow!
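A quick way to see unambiguous character specification in action: an XML parser resolves a numeric character reference and the literal character to the same Unicode code point. A minimal Python sketch with the standard library; the <m> element name is arbitrary.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The same Greek letter written two ways: as a numeric character reference
# and as the literal character. The parser resolves both to one code point.
doc = ET.fromstring("<p><m>&#x3B1;</m><m>α</m></p>")
as_reference, as_literal = (m.text for m in doc.findall("m"))

print(as_reference == as_literal)    # True
print(f"U+{ord(as_literal):04X}")    # U+03B1
print(unicodedata.name(as_literal))  # GREEK SMALL LETTER ALPHA
```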

Stylesheets
• Word: A good “styles library” helps add structure and semantics
• InDesign/Quark: Paragraph styles and character styles ensure consistency, efficiency
• Browsers/Ebooks: CSS (Cascading Style Sheets) adapts rendering for context/device and enables “responsive design”

Deliverable Products
• PDF: Preserves the look of the typeset page. Used for printing and online delivery. Doesn’t “reflow” for different screen sizes
• EPUB: International standard format. Non-proprietary, works almost everywhere. Reflowable or fixed layout
• KF8: Amazon’s proprietary ebook format
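An EPUB file is a ZIP-based container with a few fixed rules, which makes some basic checks easy to automate. This Python sketch (standard library only) checks just two of those rules from the EPUB Open Container Format: the first entry must be an uncompressed mimetype file containing application/epub+zip, and META-INF/container.xml must point to the package document. The function names and the in-memory demo file are illustrative; a dedicated validator such as EPUBCheck covers far more.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
C = "{" + CONTAINER_NS + "}"


def epub_package_path(epub_file) -> str:
    """Check two basic EPUB container rules and return the package document path.

    1. The first ZIP entry is "mimetype", stored uncompressed, containing
       "application/epub+zip".
    2. META-INF/container.xml names a rootfile (the package document).
    """
    with zipfile.ZipFile(epub_file) as z:
        first = z.infolist()[0]
        if first.filename != "mimetype" or first.compress_type != zipfile.ZIP_STORED:
            raise ValueError("mimetype must be the first, uncompressed entry")
        if z.read("mimetype") != b"application/epub+zip":
            raise ValueError("mimetype has the wrong content")
        container = ET.fromstring(z.read("META-INF/container.xml"))
    rootfile = container.find(f"{C}rootfiles/{C}rootfile")
    if rootfile is None:
        raise ValueError("container.xml names no rootfile")
    return rootfile.get("full-path")


def demo_container() -> io.BytesIO:
    """In-memory stand-in with only the two files the check looks at."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:  # default compression is ZIP_STORED
        z.writestr("mimetype", "application/epub+zip")
        z.writestr(
            "META-INF/container.xml",
            f'<container version="1.0" xmlns="{CONTAINER_NS}"><rootfiles>'
            '<rootfile full-path="OEBPS/content.opf" '
            'media-type="application/oebps-package+xml"/>'
            "</rootfiles></container>",
        )
    buf.seek(0)
    return buf


print(epub_package_path(demo_container()))  # OEBPS/content.opf
```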

Thanks!

Bill Kasdorf
bkasdorf@apexcovantage.com
+1 734 904 6252
@BillKasdorf
