XML: an introduction
David Nathan

XML

an in-line markup system
a single sequence of plain text only (but it can be Unicode)
equivalent to a tree structure
consists of elements and content
elements use tag syntax, eg <noun>...</noun>
entities use entity syntax, eg &amp;
reserved characters: < > & " ' (written as &lt; &gt; &amp; &quot; &apos;)
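
A minimal Python sketch of the reserved characters in practice, using the standard library's xml.sax.saxutils.escape and xml.etree.ElementTree. The sample text is made up. escape() replaces &, < and > by default; the two quote characters are added through its optional mapping. Parsing the escaped text inside an element gives back the original characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Made-up text containing all five reserved characters.
text = 'cats & dogs say "a < b > c" and it\'s true'

# escape() always replaces &, < and >; the mapping adds the two quote characters.
encoded = escape(text, {'"': "&quot;", "'": "&apos;"})
print(encoded)

# The parser turns the predefined entities back into the original characters.
note = ET.fromstring("<note>" + encoded + "</note>")
print(note.text == text)
```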

XML syntax

structures are defined by tags in angle brackets, eg: <noun>

tags are usually in pairs, a start (open) tag and an end (close) tag:

the <noun>dog</noun> chased ...

but a tag can also be single and self-closing:

the dog <pause/> sat down
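
To see the two kinds of tag as a parser does, here is a small Python sketch using xml.etree.ElementTree. The wrapper element <s> is an assumption, added because a document needs a single root element.

```python
import xml.etree.ElementTree as ET

# The slide's examples inside one root element, <s> (a hypothetical name).
sentence = ET.fromstring(
    "<s>the <noun>dog</noun> chased ... the dog <pause/> sat down</s>"
)

for child in sentence:
    # A start/end pair can hold text; a self-closing tag holds none (None).
    print(child.tag, child.text)

# <pause/> and <pause></pause> are the same empty element.
print(ET.tostring(ET.fromstring("<pause></pause>")))  # b'<pause />'
```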

XML syntax

tags can have attributes with values (in single or double straight quotes):

the <noun num="1">dog</noun> sat down

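you can give your tags and attributes (almost) any names, and attributes (almost) any values: names cannot contain spaces or start with a digit, and names beginning with "xml" are reserved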
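
A rough Python check of attributes and names, again with xml.etree.ElementTree. The helper well_formed and the test snippets are hypothetical; the last snippet shows that curly "smart" quotes, which word processors often insert, are not accepted around attribute values.

```python
import xml.etree.ElementTree as ET

# Attribute values sit inside straight quotes and come back as strings.
sentence = ET.fromstring('<s>the <noun num="1">dog</noun> sat down</s>')
print(sentence.find("noun").get("num"))  # the string '1'

def well_formed(snippet):
    """Return True if the parser accepts the snippet as XML (hypothetical helper)."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False

# A legal name, a name starting with a digit, and curly quotes on a value.
for snippet in ['<noun2>dog</noun2>', '<2noun>dog</2noun>', '<noun num=“1”>dog</noun>']:
    print(snippet, well_formed(snippet))
```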

there are some restrictions: you can have hierarchies, but not overlaps:

<a>the <b><c>cat</c> sat</b> on the mat</a> (allowed: <c> is nested inside <b>)

<a>the <b><c>cat</b> sat</c> on the mat</a> (not allowed: <b> and <c> overlap)
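
The nesting rule can be tried out directly: the parser builds a tree from the nested example and rejects the overlapping one. In this Python sketch the helper outline is hypothetical and simply lists element names by depth.

```python
import xml.etree.ElementTree as ET

nested = "<a>the <b><c>cat</c> sat</b> on the mat</a>"
overlapping = "<a>the <b><c>cat</b> sat</c> on the mat</a>"

def outline(elem, depth=0):
    """Return element names as lines, indented two spaces per level."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(nested))))

try:
    ET.fromstring(overlapping)
    rejected = False
except ET.ParseError as err:
    # </b> arrives while <c> is still open, so this is not well-formed.
    print("rejected:", err)
    rejected = True
```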

XML is used to add knowledge ...

add knowledge to content: usually structures and labels

add the knowledge that’s relevant to your domain or task

knowledge priorities:
  what’s required
  what’s visually represented (eg by format/layout)
  what’s implicit

Compare to HTML

... the man who really liked the book The Lawyer Who Lost, about habeas corpus ...

in HTML: ... the man who <i>really</i> liked the book <i>The Lawyer Who Lost</i>, about <i>habeas corpus</i> ...

in XML, we can define our own elements that describe logical structure rather than visual format
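
One way to picture the difference, as a Python sketch: the element names emph, booktitle and foreign (with lang="la") are hypothetical choices, not a standard vocabulary. The same XML can be turned into the slide's HTML by mapping each element to <i>, while a program can still find, say, every book title.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML encoding: each element says what the text is, not how it looks.
source = (
    "<p>... the man who <emph>really</emph> liked the book "
    "<booktitle>The Lawyer Who Lost</booktitle>, about "
    '<foreign lang="la">habeas corpus</foreign> ...</p>'
)

# The logical structure stays queryable.
titles = [e.text for e in ET.fromstring(source).iter("booktitle")]

# One presentation choice: all three kinds of element are shown in italics.
TO_HTML = {"emph": "i", "booktitle": "i", "foreign": "i"}
root = ET.fromstring(source)
for elem in root:
    elem.tag = TO_HTML[elem.tag]
    elem.attrib.clear()  # drop information the HTML version does not keep

print(titles)
print(ET.tostring(root, encoding="unicode"))
```

Going from XML to HTML loses information (three different elements all become <i>), which is why it is easier to keep the logical markup and generate the formatting from it than the other way round.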

Compare to HTML

XML:
  is flexible and extensible
  must be well-formed
  can be validated (checked against a formal description of its structure, such as a DTD or schema)
  is application-, platform-, and vendor-independent
  is machine-readable (ie it can be parsed by computer programs)
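
Python's standard ElementTree parser only checks well-formedness; it does not validate against a DTD or schema. The sketch below imitates validation with a single hand-written rule (every <noun> must have a num attribute), which is an assumption made for illustration, not a real schema language.

```python
import xml.etree.ElementTree as ET

def problems(root):
    """Check one hypothetical rule: every <noun> must have a num attribute."""
    return [
        f"<noun>{noun.text}</noun> has no num attribute"
        for noun in root.iter("noun")
        if "num" not in noun.attrib
    ]

# Both documents are well-formed (they parse), but only the first follows the rule.
good = ET.fromstring('<s>the <noun num="1">dog</noun> sat down</s>')
bad = ET.fromstring("<s>the <noun>dogs</noun> sat down</s>")
print(problems(good))
print(problems(bad))
```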

Where does XML come from?

write “raw” XML (we will do this)
use XML editors
generate it, eg from databases or with programs
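
Generating XML from data can be sketched in Python with xml.etree.ElementTree. The rows below stand in for the result of a database query; the element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, as they might come out of a database query.
rows = [
    {"id": "w1", "form": "dog", "pos": "noun"},
    {"id": "w2", "form": "sat", "pos": "verb"},
]

lexicon = ET.Element("lexicon")
for row in rows:
    # Each row becomes an <entry>; some columns become attributes, one becomes text.
    entry = ET.SubElement(lexicon, "entry", id=row["id"], pos=row["pos"])
    entry.text = row["form"]

print(ET.tostring(lexicon, encoding="unicode"))
```

Building elements this way also means the library escapes reserved characters in text and attribute values, so the output stays well-formed.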

What is XML used for?

any symbolic data
data exchange
data transformation:
  structure
  format
  content

Why do I need to know about it?

you already consume a lot of XML!
many linguistic software tools (eg ELAN) use XML as their data format
XML is very powerful and flexible, especially for certain tasks, and for archiving
XML is an open standard (a W3C Recommendation)
XML is growing in use and support!
XML is easy!

XML exercise 1

James departed from Manilla on Wednesday 11 May and arrived in Boston on Thursday 12 May.

1. identify times and names, and code them as XML
2. draw the result as a tree structure
3. add more information to your XML as attributes
4. draw the tree structure again
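
For checking your work, here is one possible answer in Python (try the exercise first). The tag and attribute names (person, place, date, type, weekday, day, month) are our own choices; any consistent scheme would do. The year is not given, so the date attributes leave it out. The hypothetical helper tree_lines prints the tree from steps 2 and 4 as indented text.

```python
import xml.etree.ElementTree as ET

# One possible encoding of the sentence (names and attributes are our own choice).
journey = ET.fromstring(
    "<journey>"
    '<person>James</person> departed from <place type="city">Manilla</place> '
    'on <date weekday="Wednesday" day="11" month="5">Wednesday 11 May</date> '
    'and arrived in <place type="city">Boston</place> '
    'on <date weekday="Thursday" day="12" month="5">Thursday 12 May</date>.'
    "</journey>"
)

def tree_lines(elem, depth=0):
    """Return one indented line per element; leaf elements also show their text."""
    line = "  " * depth + elem.tag
    if elem.attrib:
        line += " " + " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
    if len(elem) == 0 and elem.text:
        line += ": " + elem.text
    lines = [line]
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
    return lines

print("\n".join(tree_lines(journey)))
```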

XML exercise 2

1. draw a simple linguistic tree structure
2. represent it as XML
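
Going from a tree to XML is mechanical, as this Python sketch shows. The sentence, the category labels and the (label, children) representation are assumptions; a preterminal holds a single word as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree for "the dog barked": (label, children) for phrases,
# (label, "word") for preterminals that dominate a single word.
tree = ("S", [
    ("NP", [("Det", "the"), ("N", "dog")]),
    ("VP", [("V", "barked")]),
])

def to_xml(node):
    """Convert one tree node (and everything below it) into an element."""
    label, content = node
    elem = ET.Element(label)
    if isinstance(content, str):
        elem.text = content          # a word becomes the element's text
    else:
        for child in content:        # each daughter becomes a child element
            elem.append(to_xml(child))
    return elem

print(ET.tostring(to_xml(tree), encoding="unicode"))
```

Each node maps to exactly one element, so the nesting of the XML has the same shape as the drawn tree.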

Congratulations

you have now taken your first steps in XML