raghu-nath
The .NET XML Parsing Model
1. XML is a natural element of all forms of programming life.
2. XML in the .NET Framework
The .NET Framework XML core classes can be categorized according to their functions:
1. reading and writing documents
2. validating documents
3. navigating and selecting nodes
4. managing schema information
5. performing document transformations
The entire XML implementation of the .NET Framework resides in the System.Xml.dll assembly.
The most commonly used namespaces are listed here:
1. System.Xml
2. System.Xml.Schema
3. System.Xml.XPath
4. System.Xml.Xsl
The .NET Framework also provides for XML object serialization.
The classes involved with this functionality are grouped in the System.Xml.Serialization namespace.
XML serialization writes objects to, and reads them from, XML documents.
This kind of serialization is particularly useful over the Web in combination with the Simple Object Access Protocol (SOAP) and within the boundaries of .NET Framework XML Web services.
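In .NET this job is done by XmlSerializer. Purely to illustrate the object-to-XML round trip, here is a minimal Python sketch that writes each field of a flat object to a child element and reads it back. The Customer class and the to_xml/from_xml helpers are hypothetical, and the sketch assumes fields of simple types (int, str) that can be rebuilt from their text; it is not a port of XmlSerializer.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Customer:
    # Hypothetical example type: flat, with simple field types only.
    number: int
    name: str
    city: str


def to_xml(obj) -> str:
    """Serialize a flat dataclass: one child element per field, named after the field."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")


def from_xml(cls, xml_text: str):
    """Deserialize by reading each field's element text and converting it with the field's type."""
    root = ET.fromstring(xml_text)
    return cls(**{f.name: f.type(root.findtext(f.name)) for f in fields(cls)})


original = Customer(42, "Ada", "London")
text = to_xml(original)
print(text)  # <Customer><number>42</number><name>Ada</name><city>London</city></Customer>
assert from_xml(Customer, text) == original
```

Converting via each field's declared type is what restores `number` as an int rather than a string; a real serializer also has to handle nesting, collections, and missing elements, which this sketch ignores.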
Areas of the .NET Framework in Which XML Is Key
ADO.NET: Data container objects (for example, the DataSet object) are always transferred and remoted via XML.
Configuration: Application settings are stored in XML files, making use of predefined and user-defined section readers.
Remoting: Remote .NET Framework objects can be accessed by using SOAP packets to prepare and perform the call.
Web services: SOAP is a lightweight XML protocol that Web services use to exchange information in a decentralized, distributed environment.
XML parsing: The core classes provide XML parsing and manipulation through both the stream-based API and the XML Document Object Model (XMLDOM).
XML serialization: Supplies the ability to save and restore live instances of objects to and from XML documents.
Classes for Parsing
The available XML parsers fall into one of two main categories:
1. tree-based parsers
2. event-based parsers
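To make the two categories concrete, here is a small Python sketch that counts the same elements both ways, using the standard library's xml.dom.minidom (tree-based) and xml.sax (event-based) modules. These are Python analogues chosen for illustration, not the .NET classes, and the catalog document is made up.

```python
import xml.sax
from xml.dom import minidom

DOC = "<catalog><book id='1'/><book id='2'/><magazine/></catalog>"

# Tree-based: the parser builds the complete node tree in memory first,
# and the application then walks or queries that tree as it likes.
tree = minidom.parseString(DOC)
tree_count = len(tree.getElementsByTagName("book"))


# Event-based: the parser reads the text once and calls back into a handler
# for each piece of markup; nothing is retained unless the handler keeps it.
class BookCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "book":
            self.count += 1


handler = BookCounter()
xml.sax.parseString(DOC.encode("utf-8"), handler)

print(tree_count, handler.count)  # 2 2
```

Both give the same answer; the difference is that the tree-based version keeps every node available afterwards, while the event-based version keeps only the counter.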
XML and ADO.NET
The interaction between ADO.NET classes and XML documents takes one of two forms:
1. Serialization of ADO.NET objects (in particular, the DataSet object) to XML documents, and the corresponding deserialization. Data can be saved to XML in a variety of formats: with or without schema information, as a full snapshot of the in-memory data including pending changes and errors, or with just the current instance of the data.
2. A dual-access model that lets you access and update the same piece of data either through a hierarchical programming interface or through the ADO.NET relational API. Basically, you can transform a DataSet object into an XMLDOM object and view the XMLDOM's subtrees as tables merged with the DataSet object's tables.
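In .NET the dual-access model is provided by pairing a DataSet with an XmlDataDocument. The Python sketch below only illustrates the underlying idea of two views over one set of nodes. It assumes a DataSet-style layout in which each repeated element is a row and each child element is a column; the document and the table_rows helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# A DataSet-style document: each <Customer> element is one row of a
# "Customer" table, and each of its child elements is one column.
DOC = """<NewDataSet>
  <Customer><Id>1</Id><Name>Ada</Name></Customer>
  <Customer><Id>2</Id><Name>Bo</Name></Customer>
</NewDataSet>"""

root = ET.fromstring(DOC)


def table_rows(root, table):
    """Relational view: read the elements named `table` as a list of row dicts."""
    return [{col.tag: col.text for col in row} for row in root.findall(table)]


# Hierarchical access: update one node directly in the tree...
root.find("Customer[Id='2']/Name").text = "Bob"

# ...and the relational view sees the change, because both views
# read the same underlying elements rather than separate copies.
print(table_rows(root, "Customer"))
# [{'Id': '1', 'Name': 'Ada'}, {'Id': '2', 'Name': 'Bob'}]
```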
The .NET Framework XML API
The essence of XML in the .NET Framework is found in two abstract classes: XmlReader and XmlWriter. These classes are at the core of all other .NET Framework XML classes, including the XMLDOM classes, and are used extensively by various subsystems to parse or generate XML text. For example, ADO.NET data adapters retrieve the data to store in a DataSet object using a database reader, and the DataSet object serializes its contents to the DiffGram format using an XmlTextWriter object, which derives from XmlWriter.
The XML API for the .NET Framework comprises the following set of functionalities:
1. XML readers
2. XML writers
3. XML document classes
Streams can be read and written using purpose-built reader and writer classes. The base classes are TextReader, TextWriter, BinaryReader, BinaryWriter, and Stream. With the exception of the binary classes, all of these classes are marked as abstract (MustInherit, if you speak Visual Basic) and cannot be instantiated directly in code. You can, however, use a variable of an abstract class type to reference a live instance of a derived class.
In the .NET Framework, the base reader and writer classes have a number of concrete implementations, including StreamReader and StringReader and their writing counterparts, StreamWriter and StringWriter.
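Python's io module follows a similar layering, so the "program against the base class, pass a concrete instance" idea can be sketched as follows. Here io.TextIOBase stands in for TextReader, and count_lines is a hypothetical helper; the sketch shows the pattern, not the .NET types.

```python
import io


def count_lines(reader: io.TextIOBase) -> int:
    """Works with any text stream: the parameter is typed as the common base
    class, and callers pass whichever concrete subclass they have."""
    return sum(1 for _ in reader)


# Concrete reader over an in-memory string (the role StringReader plays in .NET).
from_string = io.StringIO("<a/>\n<b/>\n<c/>\n")

# Concrete text reader layered over a byte stream (roughly the role of StreamReader over a Stream).
from_bytes = io.TextIOWrapper(io.BytesIO(b"<a/>\n<b/>\n"), encoding="utf-8")

print(count_lines(from_string), count_lines(from_bytes))  # 3 2
```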
XML Readers
An XML reader exposes a programming interface through which callers can connect and pull out all the data they need. This is no different from what happens when you connect to a database and fetch data. The database server returns a reference to an internal object, the cursor, which manages all the query results and makes them available on demand. This holds regardless of which of the several flavors of cursors the database world provides: client-side, scrollable, server-side, and so on.
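Python's standard library offers a comparable caller-driven loop through xml.etree.ElementTree.iterparse. The sketch below is an analogue, not XmlReader itself: each next() call pulls one more parsing event, much as XmlReader.Read advances to the next node. The orders document is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"<orders><order id='1'/><order id='2'/></orders>"

# iterparse hands back an iterator: each next() pulls one more parsing event,
# much as a database cursor returns the next row only when asked.
cursor = ET.iterparse(io.BytesIO(DOC), events=("start", "end"))

event, elem = next(cursor)
print(event, elem.tag)  # start orders

event, elem = next(cursor)
print(event, elem.tag, elem.get("id"))  # start order 1
```

Nothing beyond the second event has been requested at this point; the caller decides whether, and when, to keep reading.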
Readers vs. XMLDOM
XML readers don't require you to keep more data in memory than you actually need. When you open the XML document, you get back a simple logical pointer that corresponds to a node. You can easily skip over nodes to locate the one you need, and in doing so you don't burden the application's memory with any data beyond what is required to buffer the currently selected node.
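A minimal Python sketch of the same "skip to the node you need" pattern, again using iterparse as a stand-in for an XML reader: each order element is examined and then cleared. The find_order_total helper and the document layout are hypothetical. Note that clear() empties an element but leaves the now-empty element attached to its parent, so a scanner for very large files would typically also detach processed elements from the root.

```python
import io
import xml.etree.ElementTree as ET


def find_order_total(stream, order_id):
    """Scan an <orders> document for one order's <total>, discarding each
    <order> element's contents once it has been inspected."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "order":
            if elem.get("id") == order_id:
                return elem.findtext("total")
            elem.clear()  # free this order's children, text, and attributes
    return None


DOC = b"""<orders>
  <order id="1"><total>10.00</total></order>
  <order id="2"><total>24.50</total></order>
  <order id="3"><total>7.25</total></order>
</orders>"""

print(find_order_total(io.BytesIO(DOC), "2"))  # 24.50
```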
Readers vs. SAX
A SAX parser directly controls the evolution of the parsing process and pushes data to the client application. A cursor parser (that is, an XML reader), on the other hand, plays a more passive role and lets the client application control the process.
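The push/pull difference shows up clearly when the task is "find the first title and stop". The Python sketch below does the same job both ways, with xml.sax for push and iterparse standing in for a pull-style reader. The FirstTitle handler and the feed document are hypothetical.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

DOC = b"<feed><title>First</title><title>Second</title></feed>"


# Push (SAX): the parser drives, and the handler can only react to callbacks.
class FirstTitle(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.calls = 0      # how many startElement callbacks were pushed at us
        self.result = None

    def startElement(self, name, attrs):
        self.calls += 1
        self.in_title = name == "title" and self.result is None

    def characters(self, content):
        if self.in_title:  # text may arrive in several chunks, so accumulate
            self.result = (self.result or "") + content

    def endElement(self, name):
        self.in_title = False


handler = FirstTitle()
xml.sax.parseString(DOC, handler)  # runs to the end of the document regardless

# Pull: the caller drives and simply stops asking once it has the answer.
pull_result = None
for event, elem in ET.iterparse(io.BytesIO(DOC), events=("end",)):
    if elem.tag == "title":
        pull_result = elem.text
        break

print(handler.result, handler.calls, pull_result)  # First 3 First
```

The SAX handler has to track state across callbacks and still receives every element; the pull loop keeps its state in ordinary local variables and exits with a plain `break`.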
XML Writers
The .NET XML API separates parsing from editing and writing, and offers a set of methods designed for both performance and usability. When writing, you create new XML documents at a fairly high level of abstraction, explicitly indicating the XML elements to create: nodes, attributes, comments, or processing instructions. The writer works on a stream, dumping content incrementally, one node after the next, without the random-access capabilities of the XMLDOM but also without its memory footprint.
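Python's xml.sax.saxutils.XMLGenerator is a forward-only writer in the same spirit. The following minimal sketch shows incremental writing with it; it is an analogue of XmlWriter, not a port of it. Each call immediately appends markup to the underlying stream, and no tree is built.

```python
import io
from xml.sax.saxutils import XMLGenerator

out = io.StringIO()
writer = XMLGenerator(out, encoding="utf-8")

# Each call appends markup to the stream right away; nothing is kept
# as a tree, and earlier output cannot be revisited or edited.
writer.startDocument()
writer.startElement("orders", {})
writer.startElement("order", {"id": "1"})
writer.characters("10.00 & tax")  # text is escaped automatically
writer.endElement("order")
writer.endElement("orders")
writer.endDocument()

print(out.getvalue())
# <?xml version="1.0" encoding="utf-8"?>
# <orders><order id="1">10.00 &amp; tax</order></orders>
```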
The XML Document Object API in .NET
As mentioned, along with XML readers and writers, the .NET Framework also provides classes that load and edit XML documents according to the W3C DOM Level 1 and Level 2 Core. The key XMLDOM class in the .NET Framework is XmlDocument, which is not much different from the DOMDocument class you might recognize from working with MSXML.
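Python's xml.dom.minidom is a lightweight implementation of the same W3C DOM interfaces, so it can serve as a stand-in for XmlDocument in a short load-navigate-edit sketch. XmlDocument exposes similarly named members (CreateElement, SetAttribute, AppendChild, GetElementsByTagName). The books document is made up for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString("<books><book id='1'><title>XML Basics</title></book></books>")

# Navigate: the whole tree is in memory, so any node can be reached directly.
title = doc.getElementsByTagName("title")[0]
print(title.firstChild.data)  # XML Basics

# Edit: create new nodes and graft them anywhere in the tree.
book = doc.createElement("book")
book.setAttribute("id", "2")
new_title = doc.createElement("title")
new_title.appendChild(doc.createTextNode("Advanced XML"))
book.appendChild(new_title)
doc.documentElement.appendChild(book)

print(doc.documentElement.toxml())
# <books><book id="1"><title>XML Basics</title></book><book id="2"><title>Advanced XML</title></book></books>
```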
XPath Expressions and XSLT
In the .NET Framework, XSLT and XPath expressions are fully supported but are implemented in classes distinct from those that parse and write XML text. This is a key feature of the overall .NET XML API: each piece of functionality is provided through a small hierarchy of objects, and each subtree connects and interoperates well with the others.
The XMLDOM API is built on top of readers and writers, but both XSLT and XPath expressions need a complete, XMLDOM-based view of the entire XML document to process it.
XML readers and writers are the primitive elements of the .NET XML API. Whenever XML text must be parsed or written, all classes refer to them, directly or indirectly. A more complex primitive element is the XMLDOM tree. Transformations and advanced queries require the entire document to be held in memory and accessible through a well-known interface: the XMLDOM.
The XSLT Processor
The key class for XSLT is XslTransform. The class works as an XSLT processor and complies with version 1.0 of the XSLT recommendation. It has two key methods, Load and Transform, whose behavior is for the most part self-explanatory.
The XPath Query Engine
XPath is a language that allows you to navigate within XML documents. Think of XPath as a general-purpose query language for addressing, sorting, and filtering both the elements and the text of an XML document.
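Python's xml.etree.ElementTree supports only a limited subset of XPath, whereas .NET accepts full XPath 1.0 expressions (for example, through XmlNode.SelectNodes). With that caveat, here is a minimal sketch of addressing and filtering over an in-memory tree. The catalog document is made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<catalog>
  <book lang="en"><title>XML Basics</title><price>30</price></book>
  <book lang="it"><title>Guida XML</title><price>25</price></book>
  <book lang="en"><title>Advanced XML</title><price>45</price></book>
</catalog>""")

# Address: every <title> under a <book> whose lang attribute is "en".
english = [t.text for t in root.findall("./book[@lang='en']/title")]
print(english)  # ['XML Basics', 'Advanced XML']

# Filter and sort: ElementTree's XPath subset has no numeric comparisons
# or sorting, so those steps are done in Python on the selected nodes.
cheap = sorted(b.findtext("title") for b in root.findall("book")
               if int(b.findtext("price")) < 40)
print(cheap)  # ['Guida XML', 'XML Basics']
```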
Further Reading
1. The W3C organization is currently working on a draft of the DOM Level 3 Core to include support for an abstract modeling schema and I/O serialization. Check out the most recent draft at http://www.w3.org/TR/2002/WD-DOM-Level3-ASLS-20020409. The approved standard, DOM Level 2 Core, is available at http://www.w3.org/TR/DOMLevel-
2. Relevant information about XML standards is available from the W3C Web site, at http://www.w3.org. If you want to learn more about the SAX specification, look at the new Web site for the SAX project, at http://www.saxproject.org.