Data Formats and APIs

Michael Carey/Padhraic Smyth, UC Irvine: Stats 170A/B, Winter 2018

Mike Carey


Announcements

• Keep watching the course wiki page (especially its attachments):
  • https://grape.ics.uci.edu/wiki/asterix/wiki/stats170ab-2018
• Ditto for the Piazza page (for Q&A):
  • http://piazza.com/uci/winter2018/stats170a/home
• Note: HW#3 is due tonight (11:45pm)
  • HW#4 should be available by then as well
• Today:
  • More PostgreSQL techniques and tips
  • Twitter APIs and Python’s Tweepy package
  • Beyond tables: XML and JSON


XML

• Stands for eXtensible Markup Language
• XML 1.0 – a recommendation from W3C, 1998
• Roots: SGML (a complex document markup language)
• After the roots: a format for sharing data as well


Why XML is of Interest

• XML is just syntax for data
  • (Note: we have no syntax for relational data!)
  • XML is not relational: it’s semistructured
• XML’s data syntax is exciting because:
  • Can translate any data to XML
  • Can ship XML over the Web (HTTP)
  • Can input XML into any application
  • Thus: Data sharing and exchange on the Web!

(Note: JSON is another similar technology today.)


HTML (a descendant of SGML)

<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
    Abiteboul, Hull, Vianu
    <br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
    Abiteboul, Buneman, Suciu
    <br> Morgan Kaufmann, 1999

HTML describes the presentation


XML

<bibliography>
  <book>
    <title> Foundations of Databases </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  . . . .
</bibliography>

XML describes the content
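Because the tags name the content, a program can pull fields out by name. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name list_books and the choice to return dicts are illustrative assumptions, and the elided books from the slide are simply left out.

```python
import xml.etree.ElementTree as ET

# The slide's bibliography, with the elided books left out.
BIBLIOGRAPHY = """
<bibliography>
  <book>
    <title> Foundations of Databases </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>
"""


def list_books(xml_text):
    """Return one dict per <book>, keyed by the tag names in the document."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append({
            "title": book.findtext("title").strip(),
            # findall returns every matching child, so repeated authors are easy
            "authors": [a.text.strip() for a in book.findall("author")],
            "publisher": book.findtext("publisher").strip(),
            "year": int(book.findtext("year")),
        })
    return books


books = list_books(BIBLIOGRAPHY)
print(books)
```

Doing the same against the HTML version would mean guessing which italic run is a title and which line is an author list.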


XML Terminology: Elements & Tags

• Tags: book, title, author, …
• Start tag: <book>, end tag: </book>
• Elements: <book>…</book>, <author>…</author>
• Elements can be nested
• Empty element: <red></red> (abbreviated <red/>)
• XML document: single root element

Well-formed XML document: matching/nested tags
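One way to see what "well formed" means in practice is to hand a few strings to a parser and watch which ones it rejects. This is a small sketch with Python's xml.etree.ElementTree; the helper is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text parses as XML: one root, tags matched and properly nested."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "nested": "<book><title>T</title></book>",
    "empty element": "<book><red/></book>",
    "crossed tags": "<book><title>T</book></title>",
    "two roots": "<book/><book/>",
}
results = {name: is_well_formed(text) for name, text in samples.items()}
print(results)
```

Note that well-formedness is purely syntactic; it says nothing about whether the document follows any particular schema.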


More XML: Attributes

<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>

Attributes are alternative ways to represent data


More XML: Attributes Revisited

<book>
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
  <price currency="USD"> 55 </price>
</book>

Attributes are best used to represent “metadata”!
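The two encodings of the price carry the same information but are read differently. A rough sketch with xml.etree.ElementTree follows; the shortened book documents and the two helper functions are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Price and currency both as attributes of <book>.
AS_ATTRIBUTES = (
    '<book price="55" currency="USD">'
    "<title>Foundations of Databases</title></book>"
)
# Price as element content, currency as metadata on <price>.
AS_ELEMENT = (
    "<book><title>Foundations of Databases</title>"
    '<price currency="USD"> 55 </price></book>'
)


def price_from_attributes(xml_text):
    book = ET.fromstring(xml_text)
    return float(book.get("price")), book.get("currency")


def price_from_element(xml_text):
    price = ET.fromstring(xml_text).find("price")
    return float(price.text), price.get("currency")


a = price_from_attributes(AS_ATTRIBUTES)
b = price_from_element(AS_ELEMENT)
print(a, b)
```

In the second form the value (55) is data and the currency describes how to read it, which is the "metadata" role the slide recommends for attributes.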


XML Semantics: Tree of Data

<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street> <no> 345 </no> <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>

[Figure: the document drawn as a tree. Element nodes: data, person, name, address, street, no, city, phone. Attribute node: id = o555. Text nodes: Mary, Maple, 345, Seattle, John, Thailand, 23456.]

Also: Order matters! (Or at least it can…)
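The tree view can be reproduced by walking a parsed document. The sketch below uses xml.etree.ElementTree and labels each node with the three kinds from the figure legend; the describe function and its output format are illustrative choices, not part of the slides.

```python
import xml.etree.ElementTree as ET

DATA = """
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street> <no> 345 </no> <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>
"""


def describe(element, depth=0):
    """Return indented lines naming each element, attribute, and text node."""
    pad = "  " * depth
    lines = [f"{pad}element {element.tag}"]
    for key, value in element.attrib.items():
        lines.append(f"{pad}  attribute {key}={value}")
    text = (element.text or "").strip()
    if text:
        lines.append(f"{pad}  text {text}")
    for child in element:  # children are visited in document order
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(ET.fromstring(DATA))
print("\n".join(tree_lines))
```

Iterating over an element yields its children in the order they appear in the document, which is why sibling order is available to applications that care about it.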


XML Data

• XML is self-describing
  • Schema information is part of the data
  • Consider a relational schema: person(name, phone)
  • In XML <person>, <name>, <phone> are part of the data (and are repeated for each person)
• Consequence: XML is much more flexible
  • Can have variations from instance to instance
  • Supports “schema later” (or “schema never”) methodology
• XML = semistructured data


Ex: Relational Data as XML

person relation:

name  phone
John  3634
Sue   6343
Dick  6363

XML:

<person>
  <row> <name>John</name> <phone>3634</phone> </row>
  <row> <name>Sue</name> <phone>6343</phone> </row>
  <row> <name>Dick</name> <phone>6363</phone> </row>
</person>

[Figure: the XML as a tree. The root person has three row children; each row has a name child and a phone child, holding "John"/3634, "Sue"/6343, and "Dick"/6363.]
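The relation-to-XML mapping is mechanical: one element for the relation, one row element per tuple, one child per column. A minimal sketch with xml.etree.ElementTree, where relation_to_xml is a hypothetical helper written for this example:

```python
import xml.etree.ElementTree as ET

columns = ("name", "phone")
rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]


def relation_to_xml(relation_name, columns, rows):
    """Build <relation><row><col>value</col>...</row>...</relation>."""
    root = ET.Element(relation_name)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            # The column name is repeated as a tag in every row.
            ET.SubElement(row_el, column).text = value
    return root


person = relation_to_xml("person", columns, rows)
xml_text = ET.tostring(person, encoding="unicode")
print(xml_text)
```

The output makes the "self-describing" cost visible: the column names travel with every row instead of being stated once in a schema.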


XML is Semi-structured Data

• Missing elements and/or attributes:

<person>
  <name>John</name>
  <phone>1234</phone>
</person>
<person>
  <name>Joe</name>
</person>                ← No phone!

• Could represent in a table with nulls:

name  phone
John  1234
Joe   -
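Turning such a document into a table with nulls might look like the sketch below. It assumes the two person elements sit under a wrapper element named people (added here only because an XML document needs a single root) and uses Python's None as the null.

```python
import xml.etree.ElementTree as ET

# <people> is a hypothetical wrapper; the slide shows only the <person> elements.
PEOPLE = """
<people>
  <person> <name>John</name> <phone>1234</phone> </person>
  <person> <name>Joe</name> </person>
</people>
"""


def to_rows(xml_text, columns=("name", "phone")):
    """One tuple per <person>; findtext returns None when a child is missing."""
    root = ET.fromstring(xml_text)
    return [
        tuple(person.findtext(column) for column in columns)
        for person in root.findall("person")
    ]


table = to_rows(PEOPLE)
print(table)
```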


XML is Semi-structured Data

• Repeated attributes

<person>
  <name> Mary</name>
  <phone>2345</phone>
  <phone>3456</phone>    ← Two phones!
</person>

• Impossible in tables (w/o normalization – due to 1NF)

name  phone
Mary  2345 3456 ???
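The usual relational answer is to normalize: keep the person in one table and put each phone in its own row of a second table. The sketch below shows that split for the Mary example; the normalize helper and the use of the name as the joining key are simplifying assumptions.

```python
import xml.etree.ElementTree as ET

MARY = "<person><name> Mary</name><phone>2345</phone><phone>3456</phone></person>"


def normalize(person_xml):
    """Split one person into a person row and one phone row per <phone>."""
    person = ET.fromstring(person_xml)
    name = person.findtext("name").strip()
    # findall returns all repeated <phone> children, in document order.
    phone_rows = [(name, phone.text) for phone in person.findall("phone")]
    return (name,), phone_rows


person_row, phone_rows = normalize(MARY)
print(person_row, phone_rows)
```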


XML is Semi-structured Data

• Attributes with different types in different objects

<person>
  <name>
    <first> John </first>
    <last> Smith </last>
  </name>                ← Structured name!
  <phone>1234</phone>
</person>

• Nested collections (not 1NF)
• Heterogeneous collections:
  • <db> containing both <book>’s and <publisher>’s


XML: So What Is It Again…?

• A standard, flexible, self-describing syntax used to represent and exchange data of all shapes and sizes
  • Regular, structured data (think records)
    • E.g., a purchase order (customer info and line items)
    • Record-like, typed, nested data values
  • Irregular, unstructured data (think documents)
    • E.g., a book (title, author, chapters, and text)
    • Text-like, untyped, variant, marked-up data values
• Uses include document storage, data exchange, Web service calls, B2B messaging, information integration, even configuration metadata…


XML: One Final Example

<?xml version="1.0" encoding="ISO-8859-1" ?>
<catalog>
  <book isbn="ISBN 1565114302">
    <title>No Such Thing as a Bad Day</title>
    <author>Hamilton Jordan</author>
    <publisher>Longstreet Press, Inc.</publisher>
    <price currency="USD">17.60</price>
    <review>
      <reviewer>Publisher</reviewer>: This book is the moving account
      of one man's successful battles against three cancers ...
      <title>No Such Thing as a Bad Day</title> is warmly recommended.
    </review>
  </book>
  <!-- more books and specifications -->
</catalog>

(Mixed content: the <review> element interleaves text with child elements.)
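Mixed content is where document-style XML differs most from record-style XML when it is parsed. As a sketch of how Python's xml.etree.ElementTree represents the review element (shown here on its own, on one logical line, which is a simplification of the slide's layout):

```python
import xml.etree.ElementTree as ET

REVIEW = (
    "<review><reviewer>Publisher</reviewer>: This book is the moving account "
    "of one man's successful battles against three cancers ... "
    "<title>No Such Thing as a Bad Day</title> is warmly recommended. </review>"
)

review = ET.fromstring(REVIEW)

# Text that follows a child element is stored on that child's .tail.
reviewer_tail = review.find("reviewer").tail

# itertext() yields element text and tails in document order,
# so joining it recovers the running prose.
flat_text = "".join(review.itertext()).strip()
print(flat_text)
```

Code that only reads each element's text would silently drop most of the review, which is one reason mixed content takes extra care.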


JSON

• JavaScript Object Notation
  • Born from JavaScript, now language-independent
• Minimal
  • Much (much!) simpler than XML
• Textual
  • Machine- and human-readable format
• Subset of JavaScript
  • But similar to many languages’ types (including Python)


Values

• Primitive values
  • Strings
  • Numbers
  • Booleans
• Structured values
  • Objects
  • Arrays
• A special “missing” value
  • null
  • (or a field can be altogether missing)
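In Python, the standard json module maps each kind of JSON value onto a built-in type. The short sketch below parses one made-up document (the key names are arbitrary) and records the resulting types; it also shows that a null field and an absent field are distinguishable.

```python
import json

DOCUMENT = (
    '{"s": "text", "i": 42, "r": 2.5, "b": true, '
    '"o": {"k": "v"}, "a": [1, 2], "n": null}'
)

value = json.loads(DOCUMENT)
python_types = {key: type(v).__name__ for key, v in value.items()}
print(python_types)

# null and "altogether missing" are different situations for the reader:
has_n = "n" in value          # present, with value None
has_missing = "zzz" in value  # absent from the object
```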


Numbers

• Integer
• Real
• Scientific
• No octal or hex
• No NaN or Infinity
  • Use null instead
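These rules can be checked against Python's json module. One caveat: by default the module is lenient and will write NaN and Infinity, which are not valid JSON, so the sketch passes allow_nan=False to get strict behavior. The helpers rejects and to_strict_json are hypothetical names for this example.

```python
import json
import math

# Integer, real, and scientific notation are all valid JSON numbers.
parsed = json.loads("[12, -3.5, 6.02e23]")


def rejects(text):
    """True if the text is not accepted as a JSON document."""
    try:
        json.loads(text)
        return False
    except json.JSONDecodeError:
        return True


hex_rejected = rejects("0x1F")
octal_rejected = rejects("017")

# With allow_nan=False, non-finite floats raise instead of being written.
try:
    json.dumps(float("inf"), allow_nan=False)
    strict_dump_failed = False
except ValueError:
    strict_dump_failed = True


def to_strict_json(x):
    """Serialize a value, writing null for NaN or Infinity as the slide suggests."""
    if isinstance(x, float) and not math.isfinite(x):
        x = None
    return json.dumps(x, allow_nan=False)


print(parsed, hex_rejected, octal_rejected, to_strict_json(float("nan")))
```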


Booleans

• true
• false

null

• A value that isn't anything


Object

• Objects are unordered containers of key/value pairs
• Objects are wrapped in { }
• , separates key/value pairs
• : separates keys and values
• Keys are strings
• Values are any JSON values
• Similar to struct, record, hashtable, object, dict, …


Object Example

{
  "name": "Jack B. Nimble",
  "at large": true,
  "grade": "A",
  "format": {
    "type": "rect",
    "width": 1920,
    "height": 1080,
    "interlace": false,
    "framerate": 24
  }
}
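Loaded with Python's json module, this object becomes a dict whose nested object is another dict. A brief sketch follows; the variable names are arbitrary.

```python
import json

OBJECT_TEXT = """
{
    "name": "Jack B. Nimble",
    "at large": true,
    "grade": "A",
    "format": {
        "type": "rect",
        "width": 1920,
        "height": 1080,
        "interlace": false,
        "framerate": 24
    }
}
"""

record = json.loads(OBJECT_TEXT)

# Nested objects are reached by chaining key lookups.
pixels = record["format"]["width"] * record["format"]["height"]

# Keys are plain strings, so a key containing a space is fine.
at_large = record["at large"]

# Serializing and re-parsing gives back an equal structure.
round_trip = json.loads(json.dumps(record)) == record
print(pixels, at_large, round_trip)
```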


Array

• Arrays are ordered sequences of values
• Arrays are wrapped in []
• , separates values
• JSON does not talk about indexing
  • JSON is just a data format (not a language)
  • An implementation can start array indexing at 0 or 1


Array Examples

["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

[
  [0, -1, 0],
  [1, 0, 0],
  [0, 0, 1]
]
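Parsed in Python, both examples become lists, and it is Python (not JSON) that decides indexing starts at 0. A small sketch:

```python
import json

days = json.loads(
    '["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]'
)
matrix = json.loads("[[0, -1, 0], [1, 0, 0], [0, 0, 1]]")

first_day = days[0]    # Python's choice: the first element has index 0
corner = matrix[0][1]  # nested arrays become nested lists: row 0, column 1
print(first_day, corner)
```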


Arrays vs. Objects

• Use objects when the key names are arbitrary strings – i.e., for record-like data
  • Similar to a dict in Python (slightly more restrictive)
• Use arrays when the key names are sequential integers – i.e., for indexed sequences
  • Similar to a tuple or an array in Python
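A round trip through Python's json module shows both comparisons in action, including one sense in which JSON objects are more restrictive than dicts: keys must be strings. The values below are invented for the example.

```python
import json

point = (3, 4)                           # a Python tuple
as_list = json.loads(json.dumps(point))  # written as an array, read back as a list

by_id = {1: "one", 2: "two"}             # integer keys are fine in a dict
as_obj = json.loads(json.dumps(by_id))   # JSON keys are strings, so they come back as str
print(as_list, as_obj)
```

Because of the key conversion, a dict with non-string keys does not survive a JSON round trip unchanged.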


JSON vs. Relational (and CSV)


                       Relational (and CSV)                                     JSON
Structure              Flat (Tables)                                            Nested (Complex Objects)
Schema                 Per collection (and static)                              Per object
Query Support          SQL standard                                             Varies (no standard)
Ordering               None (sets/bags)                                         Includes arrays
Native System Support  DB2, Oracle, SQL Server, SQLite, PostgreSQL, MySQL, …    MongoDB, Couchbase Server, AsterixDB, …
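The "flat vs. nested" row is easiest to appreciate by forcing nested JSON into CSV. The sketch below flattens a made-up people array (the field names name and phones are assumptions) into one row per phone, using the standard csv and json modules.

```python
import csv
import io
import json

people = json.loads("""
[
    {"name": "Mary", "phones": ["2345", "3456"]},
    {"name": "Joe"}
]
""")


def flatten(people):
    """One (name, phone) row per phone; a person without phones gets an empty phone."""
    rows = []
    for person in people:
        phones = person.get("phones") or [""]
        rows.extend((person["name"], phone) for phone in phones)
    return rows


rows = flatten(people)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["name", "phone"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The flat form repeats Mary's name and needs a convention for Joe's missing phone, while the JSON form carries both facts directly in its structure.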


JSON vs. XML


                   XML                       JSON
Verbosity          Higher                    Lower
Complexity         Higher                    Lower
Use of Validation  Common (DTD, XSD)         Rare (JSON Schema)
PL Friendliness    Low (impedance mismatch)  High
Query Support      XSLT, XPath, XQuery       JAQL, AQL, JSONiq, SQL++
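The verbosity and PL-friendliness rows can be illustrated with one small record written both ways. This is only a sketch on a made-up person record, using Python's json and xml.etree.ElementTree modules; a single example does not measure the formats in general.

```python
import json
import xml.etree.ElementTree as ET

AS_XML = "<person><name>John</name><phone>1234</phone></person>"
AS_JSON = '{"name": "John", "phone": "1234"}'

# JSON parses straight into a native dict.
from_json = json.loads(AS_JSON)

# XML parses into a tree of elements that must be walked to build the same dict.
from_xml = {child.tag: child.text for child in ET.fromstring(AS_XML)}

same_data = from_json == from_xml
sizes = (len(AS_XML), len(AS_JSON))  # characters in each encoding
print(same_data, sizes)
```

The extra step on the XML side is a small instance of the impedance mismatch: the parsed form is a document tree, not the program's own data types.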


Questions?

• Next time we’ll talk about data management technologies (databases and query languages) for “modern data”
