XML Programming
CSPP51038

Overview
• Simple Schema
• XML Sample Applications

Course specifics

Prerequisites
• Since XML is about interoperability, I'm going to do my best to make the class language-agnostic.
• Being master of any one of the following is required: Python, Perl, Java, C, C++, Visual Basic
• I will base class lectures/examples on Java

Format
• Five homeworks: 70%
• In-class quizzes (8 total): 30%
• No midterm or final
• Class participation can help grade

Getting help
• Mandatory: Register for discussion group via course website
• To post to list, send mail to [email protected]
• TA info, office hours/locations all on web-site starting Wed.
• http://people.cs.uchicago.edu/~asiegel/cspp51038
• Consult website frequently for updates/announcements, homework, readings, etc.

Policies
• Late homework allowed up to three days: 10% penalty charged automatically.
• If you need any special considerations, please see me in advance or I won't be able to help you
• Will turn over homework in 7 days
Programming models

Distributed programming models: typical Web-based
• Easy to deploy but slow, not great user experience
• Many programming models: JSP, Servlets, PHP, CGI (python, perl, C), Cold Fusion
[Diagram: an html browser talks over http to a Web server, which returns dynamically generated html (plus, optionally, JavaScript to jazz up the html) and is backed by a database.]

Distributed programming models: typical Web-based (html + applet)
• Better user experience. Heavier, less portable, requires socket programming to stream to server.
[Diagram labels: Web server, http, dynamically generated html, html + applet, applet, socket, database.]

Direct connections
[Diagram: an application client connects through sockets to ports for App1, App2, and App3.]

Remote procedures
[Diagram: an application client reaches App1, App2, and App3 through remote procedures; a box labeled NDS also appears.]
• Examples: Java's rmi, CORBA
XML basics

XML Basics, cont
• Most modern languages have a method of representing structured data.
• Typical flow of events in application:
  1. Read data (file, db, socket)
  2. Unmarshal into objects
  3. Manipulate in program
  4. Marshal back out (file, db, socket)
•Many language-specific technologies to reduce these steps: RMI, object serialization in any language, CORBA (actually somewhat language neutral), MPI, etc.
•XML provides a very appealing alternative that hits the sweet spot for many applications
User-defined types in programming languages
• XML is a text-based, programming-language-neutral way of representing structured information. Compare:

C:
    struct Student {
        char* name;
        char* ssn;
        int age;
        float gpa;
    };

Java:
    class Student {
        public String name;
        public String ssn;
        public int age;
        public float gpa;
    }

Fortran:
    type Student
        character(len=*) :: name
        character(len=*) :: ssn
        integer :: age
        real :: gpa
    end type Student
Sample XML Schema
• In XML, (a common) datatype description is called an XML schema.
• DTD and Relax NG are other common alternatives
• Below uses schema just for illustration purposes

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified"
               attributeFormDefault="unqualified">
      <xs:element name="student">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="ssn" type="xs:string"/>
            <xs:element name="age" type="xs:integer"/>
            <xs:element name="gpa" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

(Slide callout: "Ignore this for now")
Alternative schema
• In this example studentType is defined separately rather than anonymously

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="student" type="studentType"/>
      <!-- new type defined separately -->
      <xs:complexType name="studentType">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="ssn" type="xs:string"/>
          <xs:element name="age" type="xs:integer"/>
          <xs:element name="gpa" type="xs:decimal"/>
        </xs:sequence>
      </xs:complexType>
    </xs:schema>
Alternative: DTD
• Can also use a DTD (Document Type Definition). A DTD is much simpler than a schema but also much less powerful (notice the lack of types)

    <!DOCTYPE Student [
      <!-- The DOCTYPE name must match the root element -->
      <!ELEMENT Student (name,ssn,age,gpa)>
      <!-- Student has four child elements -->
      <!ELEMENT name (#PCDATA)>
      <!-- name is parsed character data -->
      <!ELEMENT ssn (#PCDATA)>
      <!ELEMENT age (#PCDATA)>
      <!ELEMENT gpa (#PCDATA)>
    ]>
Another alternative: Relax NG
• Gaining in popularity
• Can be very simple to write but also contain many more features than DTD
• Still much less common than schema
Creating instances of types
• In programming languages, we instantiate objects:

C:
    struct Student s1, s2;
    s1.name = "Andrew";
    s1.ssn = "123-45-6789";
    ...

Java:
    Student s1 = new Student();
    s1.name = "Andrew";
    s1.ssn = "123-45-6789";
    ...

Fortran:
    type(Student) :: s1
    s1%name = 'Andrew'
    ...
Creating XML documents
• XML is not a programming language!
• In XML we make a Student "object" in an xml file (Student.xml):

    <Student>
      <name>Andrew</name>
      <ssn>123-45-6789</ssn>
      <age>36</age>
      <gpa>2.0</gpa>
    </Student>

• Think of this as like a serialized object.
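The "serialized object" idea can be sketched in Python with the standard-library `xml.etree.ElementTree` module. This is only a hand-rolled illustration: the `Student` dataclass and the helper names `student_from_xml` and `student_to_xml` are assumptions made for this example, and nothing here validates the document against a schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

STUDENT_XML = """
<Student>
  <name>Andrew</name>
  <ssn>123-45-6789</ssn>
  <age>36</age>
  <gpa>2.0</gpa>
</Student>
"""

@dataclass
class Student:
    name: str
    ssn: str
    age: int
    gpa: float

def student_from_xml(text: str) -> Student:
    """Build a Student from a <Student> element (no validation is done)."""
    root = ET.fromstring(text)
    return Student(
        name=root.findtext("name"),
        ssn=root.findtext("ssn"),
        age=int(root.findtext("age")),      # XML text is untyped, so convert by hand
        gpa=float(root.findtext("gpa")),
    )

def student_to_xml(student: Student) -> str:
    """Write the object back out as a <Student> element."""
    root = ET.Element("Student")
    for field in ("name", "ssn", "age", "gpa"):
        ET.SubElement(root, field).text = str(getattr(student, field))
    return ET.tostring(root, encoding="unicode")

s1 = student_from_xml(STUDENT_XML)
s1.age += 1                                  # manipulate the object in the program
print(student_to_xml(s1))
```

Note that the type conversions (`int`, `float`) live in the program, not in the XML file; that is the gap a schema is meant to close.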
XML and Schema
• Note that there are two parts to what we did
  • Defining the "structure" layout
  • Defining an "instance" of the structure
• The first is done with an appropriate Schema or DTD.
• The second is the XML part
• Both can go in the same file, or an XML file can refer to an external Schema or DTD (typical)
• From this point on we use only Schema
Exercise
• Create an XML file that contains a list of two Car "objects" (pick five relevant fields).
• Note that each XML file must have a single root node, so each car element must be under a common parent (e.g. cars).

Exercise Solution

    <?xml version="1.0" encoding="UTF-8"?>
    <cars>
      <car>
        <make>dodge</make>
        <model>ram</model>
        <color>red</color>
        <year>2004</year>
        <mileage>22000</mileage>
      </car>
      <car>
        <make>Ford</make>
        <model>Pinto</model>
        <color>white</color>
        <year>1980</year>
        <mileage>100000</mileage>
      </car>
    </cars>
Question: What can we do with such a file?
• Write corresponding Schema to define its content
• Write XSL transformation to display
• Parse into a programming language
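As a rough sketch of the last option, the cars document from the exercise can be read into ordinary Python dictionaries with the standard-library `xml.etree.ElementTree` module. The variable names are illustrative only, and every value arrives as a string because no schema is involved.

```python
import xml.etree.ElementTree as ET

# Bytes literal so the encoding declaration is honored by the parser.
CARS_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<cars>
  <car>
    <make>dodge</make><model>ram</model><color>red</color>
    <year>2004</year><mileage>22000</mileage>
  </car>
  <car>
    <make>Ford</make><model>Pinto</model><color>white</color>
    <year>1980</year><mileage>100000</mileage>
  </car>
</cars>
"""

# One dict per <car>, keyed by child element name.
cars = [
    {field.tag: field.text for field in car}
    for car in ET.fromstring(CARS_XML).findall("car")
]

for car in cars:
    print(f'{car["year"]} {car["make"]} {car["model"]}: {car["mileage"]} miles')

# Values are text, so numeric comparisons need an explicit conversion.
oldest = min(cars, key=lambda c: int(c["year"]))
print("oldest:", oldest["model"])
```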
Some sample XML documents

Order / Whitespace
• Note that element order is important, but whitespace used only to lay out the tags generally is not. As far as the application is concerned, the document below is the same however it is indented or broken across lines:

    <Article>
      <Headline>Direct Marketer Offended by Term 'Junk Mail'</Headline>
      <authors>
        <author> Joe Garden</author>
        <author> Tim Harrod</author>
      </authors>
      <abstract>Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of Savings, took umbrage Monday at the use of the term <it>junk mail</it></abstract>
      <body type="url"> http://www.theonion.com/archive/3-11-01.html </body>
    </Article>
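A small Python sketch of the order/whitespace point, using the authors fragment from the article. The parser itself hands all character data (including spaces) to the program, so the trimming below is the application's choice; the helper name `author_names` is an assumption for this example.

```python
import xml.etree.ElementTree as ET

COMPACT = "<authors><author>Joe Garden</author><author>Tim Harrod</author></authors>"
INDENTED = """
<authors>
    <author> Joe Garden</author>
    <author> Tim Harrod</author>
</authors>
"""
SWAPPED = "<authors><author>Tim Harrod</author><author>Joe Garden</author></authors>"

def author_names(text):
    """Return author names in document order, with surrounding whitespace trimmed."""
    return [author.text.strip() for author in ET.fromstring(text).iter("author")]

# Layout whitespace does not change what the application sees...
print(author_names(COMPACT) == author_names(INDENTED))
# ...but swapping two elements does.
print(author_names(COMPACT) == author_names(SWAPPED))
```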
Molecule Example

    <?xml version="1.0"?>
    <CML>
      <MOL TITLE="Water">
        <ATOMS>
          <ARRAY BUILTIN="ELSYM">H O H</ARRAY>
        </ATOMS>
        <BONDS>
          <ARRAY BUILTIN="ATID1">1 2</ARRAY>
          <ARRAY BUILTIN="ATID2">2 3</ARRAY>
          <ARRAY BUILTIN="ORDER">1 1</ARRAY>
        </BONDS>
      </MOL>
    </CML>
Rooms example

    <?xml version="1.0"?>
    <rooms>
      <room name="Red">
        <capacity>10</capacity>
        <equipmentList>
          <equipment>Projector</equipment>
        </equipmentList>
      </room>
      <room name="Green">
        <capacity>5</capacity>
        <equipmentList />
        <features>
          <feature>No Roof</feature>
        </features>
      </room>
    </rooms>
Suggestion
• Try building each of those documents in XMLSpy, Oxygen, etc.
• Note: it is not required to create a schema to do this. Just create new XML document and start building.
Dissecting an XML Document

Things that can appear in an XML document
• ELEMENTS: simple, complex, empty, or mixed content; attributes.
• The XML declaration
• Processing instructions (PIs): <? … ?>
  • Most common is <?xml-stylesheet …?>
  • <?xml-stylesheet type="text/css" href="mys.css"?>
• Comments: <!-- comment text -->
Parts of an XML document
[Annotated example: the slide labels the declaration, the tags (begin tags and end tags), the attributes, and the attribute values in the document below.]

    <?xml version="1.0"?>
    <CML>
      <MOL TITLE="Water">
        <ATOMS>
          <ARRAY BUILTIN="ELSYM">H O H</ARRAY>
        </ATOMS>
        <BONDS>
          <ARRAY BUILTIN="ATID1">1 2</ARRAY>
          <ARRAY BUILTIN="ATID2">2 3</ARRAY>
          <ARRAY BUILTIN="ORDER">1 1</ARRAY>
        </BONDS>
      </MOL>
    </CML>

• Declaration: <?xml version="1.0"?>
• Begin tags and end tags: for example <ATOMS> and </ATOMS>
• Attributes and attribute values: for example BUILTIN with the value "ELSYM"
• An XML element is everything from (including) the element's start tag to (including) the element's end tag.
XML and Trees
• Tags give the structure of a document. They divide the document up into elements, starting at the topmost element, the root element. The stuff inside an element is its content; content can include other elements along with 'character data'.

    CML                    (root element)
      MOL
        ATOMS
          ARRAY: H O H     (character data)
        BONDS
          ARRAY: 1 2
          ARRAY: 2 3
          ARRAY: 1 1
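The tree view can be reproduced with a short recursive walk. The sketch below uses Python's standard-library `xml.etree.ElementTree`; the function name `outline` is an assumption for this example, and attributes are left out to keep the output close to the slide's diagram.

```python
import xml.etree.ElementTree as ET

CML = """
<CML>
  <MOL TITLE="Water">
    <ATOMS>
      <ARRAY BUILTIN="ELSYM">H O H</ARRAY>
    </ATOMS>
    <BONDS>
      <ARRAY BUILTIN="ATID1">1 2</ARRAY>
      <ARRAY BUILTIN="ATID2">2 3</ARRAY>
      <ARRAY BUILTIN="ORDER">1 1</ARRAY>
    </BONDS>
  </MOL>
</CML>
"""

def outline(element, depth=0):
    """Return one line per element; indentation shows depth, leaf text follows the tag."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (f": {text}" if text else "")]
    for child in element:                  # children come back in document order
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(CML)                  # root is the <CML> element
print("\n".join(outline(root)))
```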
XML and Trees

    rooms
      room
        capacity: 10
        equipmentList
          equipment: Projector
      room
        capacity: 5
        equipmentList
        features
          feature: No Roof
More detail on elements

Element relationships

    <book>
      <title>My First XML</title>
      <prod id="33-657" media="paper"></prod>
      <chapter>Introduction to XML
        <para>What is HTML</para>
        <para>What is XML</para>
      </chapter>
      <chapter>XML Syntax
        <para>Elements must have a closing tag</para>
        <para>Elements must be properly nested</para>
      </chapter>
    </book>

• Book is the root element.
• Title, prod, and chapter are child elements of book.
• Book is the parent element of title, prod, and chapter.
• Title, prod, and chapter are siblings (or sister elements) because they have the same parent.
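These relationships can be queried from a program. The following Python sketch uses the standard-library `xml.etree.ElementTree`, whose elements do not keep a pointer to their parent, so a child-to-parent map is built first; the name `parent_of` is an assumption for this example.

```python
import xml.etree.ElementTree as ET

BOOK = """
<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>
"""

root = ET.fromstring(BOOK)

# Map every element to its parent; the root has no entry.
parent_of = {child: parent for parent in root.iter() for child in parent}

title = root.find("title")
children = [child.tag for child in root]
siblings = [e.tag for e in parent_of[title] if e is not title]

print("root:", root.tag)
print("children of book:", children)
print("siblings of title:", siblings)
```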
Element content
• Elements can have different content types.
• An element is everything from (including) the element's start tag to (including) the element's end tag.
• An element can have element content, mixed content, simple content, or empty content, and attributes.
• Exercise
  • List the content type for each element in the previous example
Exercise answer
• In the previous example
  • Book has element content, because it contains other elements.
  • Chapter has mixed content because it contains both text and other elements.
  • Para has simple content (or text content) because it contains only text.
  • Prod has empty content, because there is nothing between its start and end tags (its information is carried in attributes).
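The classification in this answer can be written down as a small function. This is a sketch using Python's standard-library `xml.etree.ElementTree`; the function name `content_type` is made up for the example, and it assumes that whitespace-only text (indentation between child tags) does not count as text content.

```python
import xml.etree.ElementTree as ET

def content_type(element):
    """Classify an element as 'empty', 'simple', 'element', or 'mixed' content."""
    has_children = len(element) > 0
    # Text may sit before the first child (element.text) or after any child (child.tail).
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    return "simple" if has_text else "empty"

# A shortened version of the book example.
BOOK = (
    "<book><title>My First XML</title>"
    "<prod id='33-657' media='paper'></prod>"
    "<chapter>Introduction to XML <para>What is HTML</para></chapter></book>"
)

root = ET.fromstring(BOOK)
for element in root.iter():
    print(element.tag, "->", content_type(element))
```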
Element naming
XML elements must follow these naming rules:
• Names can contain letters, numbers, and other characters
• Names must not start with a number or punctuation character (an underscore is allowed as the first character)
• Names must not start with the letters xml (or XML or Xml ..)
• Names cannot contain spaces
Take care when you "invent" element names and follow these simple rules:
• Any name can be used, no words are reserved, but the idea is to make names descriptive. Names with an underscore separator are nice.
• Examples: <first_name>, <last_name>.
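The naming rules above can be approximated with a regular expression. The Python sketch below is deliberately simplified and is not the full XML name grammar: it accepts ASCII names only and rejects colons (which XML reserves in practice for namespaces). The function name `is_reasonable_name` is an assumption for this example.

```python
import re

# First character: letter or underscore; then letters, digits, '_', '.', or '-'.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_reasonable_name(name: str) -> bool:
    """Simplified check of the slide's rules for ASCII element names."""
    if name.lower().startswith("xml"):     # names beginning with xml/XML/Xml are reserved
        return False
    return _NAME.fullmatch(name) is not None

for candidate in ["first_name", "last_name", "2nd_name", "xmlData", "first name", "-id"]:
    print(f"{candidate!r}: {is_reasonable_name(candidate)}")
```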
Well formed XML

Well-formed vs Valid
• An XML document is said to be well-formed if it obeys basic semantic and syntactic constraints.
• This is different from a valid XML document, which (as we will see in more depth) properly matches a schema.
Rules for Well-Formed XML
An XML document is considered well-formed if it obeys the following rules:
• There must be one element that contains all others (root element)
• All tags must be balanced
  • <BOOK>...</BOOK>
  • <BOOK />
• Tags must be nested properly:
  • <BOOK> <LINE> This is OK </LINE> </BOOK>
  • <LINE> <BOOK> This is </LINE> definitely NOT </BOOK> OK
• Tag names are case-sensitive, so
  • <P>This is not ok, even though we do it all the time in HTML!</p>
More Rules for Well-Formed XML
• The attributes in a tag must be in quotes
  • <ITEM CATEGORY="Home and Garden" Name="hoe-matic t500">
• Comments are allowed
  • <!-- They are done just as in HTML… -->
• Should begin with <?xml version='1.0' ?> (the declaration is optional in XML 1.0 but recommended; if present, it must come first)
• Special characters must be escaped: the most common are < (&lt;), " (&quot;), ' (&apos;), > (&gt;), & (&amp;)
  • <formula> x &lt; y+2x </formula>
  • <cd title="&quot; mmusic">
• An XML document that obeys these rules is Well-Formed
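One practical way to test these rules is to hand a document to a parser and see whether it complains. The sketch below uses Python's standard-library `xml.etree.ElementTree` and `xml.sax.saxutils.escape`; it checks well-formedness only, not validity against a schema, and the helper name `is_well_formed` is an assumption for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(text: str) -> bool:
    """True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "balanced": "<BOOK><LINE>This is OK</LINE></BOOK>",
    "bad nesting": "<LINE><BOOK>This is</LINE> definitely NOT</BOOK>",
    "case mismatch": "<P>not ok</p>",
    "unquoted attribute": "<ITEM CATEGORY=Home></ITEM>",
    "unescaped <": "<formula> x < y+2x </formula>",
    # escape() turns & < > into &amp; &lt; &gt; before the text is embedded.
    "escaped <": "<formula>" + escape(" x < y+2x ") + "</formula>",
    "two roots": "<a/><b/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```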
Some aspects of XML syntax
• It is illegal to omit closing tags
  • unlike e.g. html
• XML tags are case-sensitive
• XML elements must be properly nested
• XML documents must have a root element
• XML comments:
  • <!-- This is a comment -->
XML Tools
• XML can be created with any text editor
• Normally we use an XML-friendly editor
  • e.g. XMLSpy
  • nXML emacs extensions
  • Oxygen
  • Etc etc.
• To check and validate XML, use either these tools and/or xmllint on Unix systems.
Another View
• XML-as-data is one way to introduce XML
• Another is as a markup language similar to html.
• One typically says that html has a fixed tag set, whereas XML allows the definition of arbitrary tags
• This analogy is particularly useful when the goal is to use XML for text presentation, that is, when most of our data fields contain text
• Note that mixed element/text fields are permissible in XML
Article example

    <Article>
      <Headline>Direct Marketer Offended by Term 'Junk Mail'</Headline>
      <authors>
        <author> Joe Garden</author>
        <author> Tim Harrod</author>
      </authors>
      <abstract>Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of Savings, took umbrage Monday at the use of the term <it>junk mail</it>.</abstract>
      <body type="url"> http://www.theonion.com/archive/3-11-01.html </body>
    </Article>
XML Schema
• There are many details to cover of schema specification.
• We will do this in detail next lecture
• Now, we detour to study the usefulness of this simple model
How is XML Useful
Part I
Simple Mortgage Calculator

Mortgage payment calculator
• Design a simple application which does the following:
  • Accepts user input
    • Loan amount
    • Loan term
    • Interest rate
    • Extras (assessments + taxes)
  • Returns per-month table of
    • total payment
    • interest
    • Principal
    • Some other fun stuff
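The per-month table can be sketched as follows. The slides do not give the calculation, so this assumes a standard fixed-rate loan with monthly compounding, where the payment is M = P·r / (1 − (1 + r)^(−n)) for monthly rate r and n payments. The function names and the sample loan figures are illustrative only.

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed monthly payment for a fully amortizing loan (assumed model)."""
    r = annual_rate / 12                   # monthly interest rate
    n = years * 12                         # number of payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)

def schedule(principal, annual_rate, years, extras=0.0):
    """Yield (month, total_payment, interest, principal_paid, balance) rows."""
    payment = monthly_payment(principal, annual_rate, years)
    r = annual_rate / 12
    balance = principal
    for month in range(1, years * 12 + 1):
        interest = balance * r             # interest accrues on the remaining balance
        principal_paid = payment - interest
        balance -= principal_paid
        # Extras (assessments + taxes) are added on top of the loan payment.
        yield month, payment + extras, interest, principal_paid, balance

rows = list(schedule(200_000, 0.06, 30, extras=250.0))
for month, total, interest, principal_paid, balance in rows[:3]:
    print(f"{month:3d} {total:10.2f} {interest:10.2f} {principal_paid:10.2f} {balance:12.2f}")
```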
Mortgage Calculator General Requirements
• Must be
  • Clean simple interface (easy)
  • Remotely accessible with security
  • Portable
  • Not require too much installation on the part of the user
  • Sufficiently fast not to be embarrassing
Some Possible architectures
• Web server
  • Server-side scripting with pure html
  • Server-side scripting with html+javascript
  • Server-side scripting with html+applet
• Direct connection
  • Raw sockets
  • Distributed objects
Initial architecture
• Front-end: pure html form
• Back end: python cgi
  • Python generates web page dynamically after making calculations
  • No use of higher-level web generation libraries at this point
• What are advantages/disadvantages of this architecture?
• Run application: http://people.cs.uchicago.edu/~asiegel/mortgage
Disadvantages
• Two obvious disadvantages are:
  • Formatted web content in print statements: low-level, ugly, error-prone
  • Data is not decoupled from formatting. What if we want to switch to an application client?
• Several strategies can help with both of these (higher-level htmlgen libraries, server-side scripting model, beans, etc.) and XML
• We will look at how XML fits in
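To make the decoupling concrete, here is a sketch of a back end that emits the payment table as XML data rather than as HTML inside print statements. It uses Python's standard-library `xml.etree.ElementTree`; the element and attribute names (`schedule`, `payment`, `month`, `total`, `interest`, `principal`) are invented for this example and are not taken from the course application.

```python
import xml.etree.ElementTree as ET

def payments_to_xml(rows):
    """Serialize (month, total, interest, principal) tuples as an XML document string."""
    root = ET.Element("schedule")
    for month, total, interest, principal in rows:
        payment = ET.SubElement(root, "payment", month=str(month))
        ET.SubElement(payment, "total").text = f"{total:.2f}"
        ET.SubElement(payment, "interest").text = f"{interest:.2f}"
        ET.SubElement(payment, "principal").text = f"{principal:.2f}"
    return ET.tostring(root, encoding="unicode")

# Two made-up rows; a real back end would compute these from the user's input.
doc = payments_to_xml([(1, 1449.10, 1000.00, 199.10), (2, 1449.10, 999.00, 200.10)])
print(doc)
```

Because the output carries no layout, the same document could feed a browser (through a stylesheet) or an application client without changing the back end.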
XML-based architecture
[Diagram: a web browser talks over http to a Web server running python CGI. The labels "hand-rolled" XML (twice), XML, and File system mark the data exchanged and where it is stored.]
Observations/questions
• What does browser do with XML?
  • Can it display?
  • Does it even understand XML?
  • If not, what good is this?
• Do we have to hand roll our programming language objects from XML?
Some answers
• Regarding first point, try this with your web browser
  • Note that XML is displayed/formatted nicely, but not nearly to the same level of utility as the html table
  • To add formatting instructions, we must associate a separate XSL file with the XML file. We will study XSL soon.
• Regarding XML-language conversion, we will study language binding for various high-level ways of doing this! For now, we will hand-roll ourselves!
XSL
• We will not cover details of XSL until the third week.
• However, for now we can easily create XSL at a high level using XMLSpy
• See example application
Lottery application

Lottery overview
• Given a list of student members of a dormitory, perform an ordered randomized sort of the students to determine a room draft order.

Lottery details
• Students are defined by
  • Last name
  • First name
  • Seniority
    • Quarters in the House
    • Quarters in the College
• The sort keys are
  1. Quarters in House
  2. Quarters in College
  3. Random
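The three sort keys can be expressed as a single tuple key. The sketch below is one possible reading: it assumes that more quarters means an earlier pick (the slides do not state the direction), and the `Student` fields, the `draft_order` function, and the sample names are all made up for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Student:
    last_name: str
    first_name: str
    quarters_in_house: int
    quarters_in_college: int

def draft_order(students, rng=None):
    """Sort by house seniority, then college seniority, then a random tiebreak."""
    rng = rng or random.Random()
    # Draw the random key once per student so the ordering is self-consistent.
    keyed = [
        (-s.quarters_in_house, -s.quarters_in_college, rng.random(), s)
        for s in students
    ]
    keyed.sort(key=lambda entry: entry[:3])
    return [entry[3] for entry in keyed]

students = [
    Student("Lee", "Ann", 3, 6),
    Student("Diaz", "Raul", 6, 6),
    Student("Kim", "Jo", 3, 9),
    Student("Roy", "Sam", 3, 9),
]

for position, s in enumerate(draft_order(students, random.Random(0)), start=1):
    print(position, s.last_name, s.quarters_in_house, s.quarters_in_college)
```

Passing a seeded `random.Random` makes a particular draw reproducible, which may be useful when the order has to be audited.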
Software requirements
• Secure login
  • House name
  • Password
• Remotely accessible
• Prototypes:
  • Standalone excel
  • Web-based
Architectural requirements
[Diagram labels: Web client; Web server hosting the login and lottery components; filesystem holding XML login info and XML student data; XML and XSL.]