Leiden University. The university to discover. DMT Week 3 Adriaan van der Weel and Peter Verhaar

DMT Week 3

Adriaan van der Weel and Peter Verhaar

Where do we stand?

Principles of markup

- HTML:
  - Document instance (your CV)
  - Stylesheet (CSS)
- XML:
  - Application
  - Document instance (your CV)
  - Stylesheet (CSS)
  - DTD/Schema
  - Add: Prologue (XML declaration; DTD)

Text and markup

Knowledge representation

- Structure and content
- Ontology
  - What knowable things exist
  - What are the relationships that hold between them
- Tree diagram
  - The book has structure and content: chapters, paragraphs, footnotes, etc.
- XML represents structure and content
- Various ontologies - various DTDs
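The tree view of a book can be made concrete with a short Python sketch. The miniature book and its element names (book, chapter, title, p, footnote) are invented for illustration and do not come from any particular DTD; the helper name outline is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature book; the element names are illustrative only.
BOOK = """<book>
  <chapter>
    <title>One</title>
    <p>First paragraph.<footnote>A note.</footnote></p>
    <p>Second paragraph.</p>
  </chapter>
  <chapter>
    <title>Two</title>
    <p>Third paragraph.</p>
  </chapter>
</book>"""


def outline(element, depth=0):
    """Return the element hierarchy as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(BOOK))
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the markup, which is the same information a tree diagram shows.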

XML Basics 1

- Elements <p>...</p>
- Attributes <title type="play">...</title> (attribute values must be quoted)
- Entities
  - Character: &#xE8; = è
  - General entities, referencing:
    • Chunks of text defined elsewhere
    • Text or image files, etc.
    • E.g., <p>The &BTCP; aims to ... </p>
- Well-formedness, validation
- Prologue (XML declaration; DTD)
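A minimal Python sketch of these basics, using only the standard library. The document below is invented for illustration, and the expansion text of the BTCP entity is a placeholder, since the slides do not say what BTCP stands for. Note that this parser checks well-formedness only; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Prologue (XML declaration plus DOCTYPE) followed by the document instance.
# The internal DTD subset declares one general entity; its replacement text
# is a placeholder, not the real meaning of BTCP.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE text [
  <!ENTITY BTCP "[expansion of BTCP]">
]>
<text>
  <title type="play">Ph&#xE8;dre</title>
  <p>The &BTCP; aims to inform.</p>
</text>"""

root = ET.fromstring(DOCUMENT)
title = root.find("title")
paragraph = root.find("p")

print(title.get("type"))  # attribute value
print(title.text)         # character reference resolved to è
print(paragraph.text)     # general entity replaced by its declared text


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# An unquoted attribute value and an unclosed element are both rejected.
print(is_well_formed("<title type=play>Text</title>"))
print(is_well_formed("<p>No closing tag"))
```

Well-formedness is the lower bar: the parser only checks the syntax rules of XML. Validation, which checks a document against the element and attribute declarations in a DTD, needs a validating parser and is not shown here.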

XML Basics 2

- Open standard (cf. de facto standard):
  - Publicly available
  - Royalty-free
  - Fully and publicly documented
- NB: ‘Who owns your data?’
- (Lower) ASCII and Unicode:
  - Platform independent
  - Software independent
  - Device independent
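The relation between Unicode, ASCII and character references can be sketched in a few lines of Python. This is an illustration only; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

text = "è"

# Every character has a Unicode code point, independent of platform or software.
code_point = ord(text)

# UTF-8 stores this character as two bytes outside the (lower) ASCII range.
utf8_bytes = text.encode("utf-8")

# In an ASCII-only file, the character can travel as a numeric character
# reference instead. Python's built-in error handler emits the decimal form.
ascii_safe = text.encode("ascii", "xmlcharrefreplace")

# The hexadecimal form of the same reference.
hex_reference = f"&#x{code_point:X};"

# An XML parser turns the reference back into the character.
round_trip = ET.fromstring(f"<p>{hex_reference}</p>").text

print(code_point, utf8_bytes, ascii_safe, hex_reference, round_trip)
```

Because the reference itself consists only of lower ASCII characters, a file that uses it can be read on any platform and still yields the same Unicode character once parsed.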

Open standards 1

- Open standards in a networking world
- Why?
- Which? E.g., Internet Protocol Suite:
  - Link layer (physical/data, e.g., Ethernet)
  - Internet layer, facilitating transport, e.g., IP
  - Transport layer, e.g., TCP
  - Application layer, e.g., HTTP, SMTP, FTP

Open standards 2

- E.g.:
  - File format: PDF, txt
  - Programming language: PHP
  - Operating system: Linux
  - Style language: CSS, XSLT
  - Markup metalanguage: SGML, XML
  - Markup language: DocBook, HTML, EAD, TEI

TEI basics

- Text Encoding Initiative, 1987
- Text exchange in the humanities
- TEI is a DTD
  - TEI is a collection of DTD fragments or modules
- Platform and software independent (ASCII); open standard; open source
- Used in an XML application (diagram)
- Document ‘instances’ should be validated against the TEI DTD
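As a rough illustration of what a TEI document instance looks like, the Python sketch below parses a minimal TEI P5 document and checks that the main structural parts are present. The sample content and the names REQUIRED_PATHS and missing_parts are hypothetical, and the list of paths is a simplification. This is not validation against the TEI DTD: Python's standard library parser does not validate, so a validating parser is needed for that.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# A minimal TEI P5 instance with invented content.
MINIMAL_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample title</title></titleStmt>
      <publicationStmt><p>Unpublished classroom example.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>Sample text.</p></body>
  </text>
</TEI>"""

# Simplified checklist of parts; the real rules live in the TEI DTD.
REQUIRED_PATHS = [
    "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
    "tei:teiHeader/tei:fileDesc/tei:publicationStmt",
    "tei:teiHeader/tei:fileDesc/tei:sourceDesc",
    "tei:text/tei:body",
]


def missing_parts(xml_text):
    """Return the checklist paths that are absent from the document."""
    root = ET.fromstring(xml_text)
    namespaces = {"tei": TEI_NS}
    return [path for path in REQUIRED_PATHS if root.find(path, namespaces) is None]


print(missing_parts(MINIMAL_TEI))
```

A document with a missing header part, for example one without sourceDesc, would be reported by this check, but only a full validation against the DTD covers every rule.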

TEI DTD

- The TEI DTD is modular. We use:

  <!DOCTYPE TEI PUBLIC "-//TEI P5//DTD Main Document Type//EN"
    "http://www.tei-c.org/release/xml/tei/schema/dtd//tei.dtd" [
    <!ENTITY % TEI.header "INCLUDE">
    <!ENTITY % TEI.core "INCLUDE">
    <!ENTITY % TEI.textstructure "INCLUDE">
    <!ENTITY % TEI.transcr "INCLUDE">
    <!ENTITY % TEI.linking "INCLUDE">
    <!ENTITY % TEI.namesdates "INCLUDE">
  ]>

http://www.tei-c.org/release/xml/tei/schema/dtd/

Why this rigmarole?

- Print (‘Order of the Book’):
  - Author’s brain > Book > reader’s brain
  - Instrument: typography
- Digital (‘Digital Order’?):
  - Author’s brain > Computer > reader’s brain
  - Instrument: markup
  - For both typography (= form) and content
- So: Need to make text intelligent

Using the computer / UM

- Author’s brain > Computer > reader’s brain
- Vary output format (paper, PDF, HTML, mobile phone, etc.)
- Exchange
- Reuse
- Search and select
- Count
- Change content (order) and form
- Etcetera
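Several of the operations listed above can be tried on a small marked-up text. The Python sketch below is an illustration under stated assumptions: the collection, its element names and the year attribute are invented (only l, for a line of verse, is borrowed from TEI), and the HTML output is deliberately simple.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample data; structure and attribute names are illustrative.
POEMS = """<collection>
  <poem year="1850"><title>Later Poem</title><l>Line one</l><l>Line two</l></poem>
  <poem year="1798"><title>Early Poem</title><l>Only line</l></poem>
</collection>"""

root = ET.fromstring(POEMS)

# Count: how many lines of verse does the collection contain?
line_count = len(root.findall(".//l"))

# Search and select: titles of poems dated before 1800.
early_titles = [
    poem.findtext("title")
    for poem in root.findall("poem")
    if int(poem.get("year")) < 1800
]

# Change order: sort the poems chronologically.
poems_by_year = sorted(root.findall("poem"), key=lambda poem: int(poem.get("year")))


def to_html(poems):
    """Vary the output format: render the poems as simple HTML."""
    parts = []
    for poem in poems:
        parts.append(f"<h2>{escape(poem.findtext('title'))}</h2>")
        parts.extend(f"<p>{escape(line.text)}</p>" for line in poem.findall("l"))
    return "\n".join(parts)


html_page = to_html(poems_by_year)
print(line_count, early_titles)
print(html_page)
```

None of these operations needs to know how the text looks on the page. They work because the markup records what each piece of text is, which is the sense in which markup makes text ‘intelligent’.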

New research questions?

- Chris Anderson (The Long Tail), in Wired: ‘The end of theory’
- But: need for hypothesis remains
- But: humanities data:
  - Quantity: not such a wealth of data. Bitty. Discontinuous.
  - Quality: narrative, evaluative, ambiguous, subjective, conceptual
- Who decides the agenda? Need to lead, rather than follow.