Applied XML Programming for Microsoft .NET PART 2

XML Readers

In the Microsoft .NET Framework, two distinct sets of classes provide for XML-driven reading and writing operations. These classes are known globally as XML readers and writers. The base class for readers is XmlReader, whereas XmlWriter provides the base programming interface for writers.

The Programming Interface of Readers

XmlReader is an abstract class available from the System.Xml namespace. It defines the set of functionalities that an XML reader exposes to let developers access an XML stream in a noncached, forward-only, read-only way.

An XML reader works on a read-only stream by jumping from one node to the next in a forward-only direction. The XML reader maintains an internal pointer to the current node and its attributes and text but has no notion of previous and next nodes. You can't modify text or attributes, and you can move only forward from the current node. If you are visiting attribute nodes, however, you can move back to the parent node or access an attribute by index.

The specification for the XmlReader class recommends that any derived class should check at least whether the XML source is well-formed and throw exceptions if an error is encountered.

XML exceptions are handled through the tailor-made XmlException class. The XmlReader class specification does not say anything about XML validation.
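The well-formedness check described above can be observed with a short sketch. The class and method names (WellFormedCheck, FindError) are hypothetical and serve only this illustration; the reader calls themselves (Read, Close) and XmlException are part of System.Xml.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: the names are illustrative, not part of the .NET Framework.
public static class WellFormedCheck
{
    // Returns null when the text parses cleanly, or the parser's message otherwise.
    public static string FindError(string xmlText)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xmlText));
        try
        {
            // Visit every node; the reader throws as soon as it meets malformed markup.
            while (reader.Read()) { }
            return null;
        }
        catch (XmlException e)
        {
            return e.Message;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        Console.WriteLine(FindError("<a><b></b></a>") ?? "well-formed");
        Console.WriteLine(FindError("<a><b></a>") ?? "well-formed");
    }
}
```

Note that the reader reports only well-formedness problems here; nothing in this sketch validates the document against a schema or DTD.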

An OOP Refresher

1. In the .NET Framework, an interface is a container for a named collection of method, property, and event definitions referred to as a contract. An interface can be used as a reference type, but it is not a creatable type.

2. A class is a container that can include data and function members (methods, properties, events, operators, and constructors). Classes support inheritance from other classes as well as from interfaces. Any class from which another class inherits is called a base class.

3. An abstract class declares members that it can leave without an implementation, for derived classes to supply. Like interfaces, abstract classes are not creatable but can be used as reference types. An abstract class differs from an interface in that it has a slightly richer set of internal members (constructors, constants, and operators). Members of an abstract class can be scoped as private, public, or protected, whereas members of an interface are implicitly public. In addition, child classes can implement multiple interfaces but can inherit from only one class.
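The distinction between interfaces, abstract classes, and concrete classes can be sketched in a few lines. Every type name below (INodeSource, NodeSourceBase, ArrayNodeSource, OopDemo) is hypothetical and invented for this illustration; none of them belongs to the .NET Framework.

```csharp
using System;

// All type names below are hypothetical and exist only for this illustration.

// An interface is a pure contract: members are declared, never implemented.
public interface INodeSource
{
    bool Read();
    string Current { get; }
}

// An abstract class may add constants, constructors, and protected state,
// but it still cannot be instantiated directly.
public abstract class NodeSourceBase : INodeSource
{
    public const int BeforeFirst = -1;
    protected int m_position;

    protected NodeSourceBase() { m_position = BeforeFirst; }

    public abstract bool Read();
    public abstract string Current { get; }
}

// A concrete class inherits from exactly one base class
// and can implement any number of additional interfaces.
public sealed class ArrayNodeSource : NodeSourceBase, IDisposable
{
    private readonly string[] m_items;

    public ArrayNodeSource(string[] items) { m_items = items; }

    public override bool Read()
    {
        if (m_position + 1 >= m_items.Length) return false;
        m_position++;
        return true;
    }

    public override string Current
    {
        get { return m_position == BeforeFirst ? null : m_items[m_position]; }
    }

    public void Dispose() { }
}

public static class OopDemo
{
    // The parameter is typed as the interface: a reference type that is not creatable.
    public static string Join(INodeSource source)
    {
        string result = "";
        while (source.Read()) result += source.Current;
        return result;
    }

    public static void Main()
    {
        INodeSource source = new ArrayNodeSource(new string[] { "a", "b", "c" });
        Console.WriteLine(Join(source));
    }
}
```

XmlReader follows the same pattern: it is the abstract base, and classes such as XmlTextReader supply the concrete behavior.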

Parsing with the XmlTextReader Class

The XmlTextReader class is designed to provide fast access to streams of XML data in a forward-only and read-only manner. The reader verifies that the submitted XML is well-formed. It also performs a quick check for correctness on the referenced DTD, if one exists. In no case, though, does this reader validate against a schema or DTD. If you need more functionality (for example, validation), you must resort to other reader classes such as XmlNodeReader or XmlValidatingReader.

An instance of the XmlTextReader class can be created in a number of ways and from a variety of sources, including disk files, URLs, streams, and text readers. To process an XML file, you start by calling one of the class constructors, as shown here:

XmlTextReader reader = new XmlTextReader(file);

Accessing Nodes

The following example shows how to use an XmlTextReader object to parse the contents of an XML file and build the node layout. Let's begin by considering the following XML data:

<platforms type="software">
    <platform vendor="Microsoft">.NET</platform>
    <platform vendor="" OpenSource="yes">Linux</platform>
    <platform vendor="Microsoft">Win32</platform>
    <platform vendor="Sun">Java</platform>
</platforms>
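A minimal sketch of how the node layout of such a document could be built with XmlTextReader follows. The class and method names (NodeLayout, Build) are assumptions made for this illustration, and the sample is trimmed to two platform elements; only the element structure is emitted, with text and attributes skipped.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: the names are illustrative, not part of the .NET Framework.
public static class NodeLayout
{
    // Walks the source once and returns the element structure as a string.
    public static string Build(TextReader source)
    {
        StringBuilder sb = new StringBuilder();
        XmlTextReader reader = new XmlTextReader(source);
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    sb.Append('<').Append(reader.Name).Append('>');
                    // An empty element such as <a/> never produces an EndElement node.
                    if (reader.IsEmptyElement)
                        sb.Append("</").Append(reader.Name).Append('>');
                    break;
                case XmlNodeType.EndElement:
                    sb.Append("</").Append(reader.Name).Append('>');
                    break;
            }
        }
        reader.Close();
        return sb.ToString();
    }

    public static void Main()
    {
        string xml =
            "<platforms type=\"software\">" +
            "<platform vendor=\"Microsoft\">.NET</platform>" +
            "<platform vendor=\"Sun\">Java</platform>" +
            "</platforms>";
        Console.WriteLine(Build(new StringReader(xml)));
    }
}
```

The switch on NodeType is the core of the technique: the reader exposes one node at a time, and the caller decides what to do with each node type.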

Character Encoding

XML documents can contain an attribute to specify the encoding. Character encoding provides a mapping between numeric indexes and corresponding characters that users read from a document. The following declaration shows how to set the required encoding for an XML document:

<?xml version="1.0" encoding="ISO-8859-5"?>

The Encoding property of the XML reader returns the character encoding found in the document. The default encoding is UTF-8 (UCS Transformation Format, 8 bits).
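The following sketch shows one way to inspect the Encoding property of an XmlTextReader. The class and method names (EncodingProbe, DetectEncodingName) are hypothetical. The sample declares UTF-8 rather than ISO-8859-5 because some runtimes do not ship every code page by default; that substitution is an assumption made to keep the sketch portable.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: the names are illustrative, not part of the .NET Framework.
public static class EncodingProbe
{
    // Returns the web name of the encoding the reader detected, or null if none.
    public static string DetectEncodingName(Stream xmlStream)
    {
        XmlTextReader reader = new XmlTextReader(xmlStream);
        reader.Read();                      // the encoding is known only once parsing has started
        Encoding encoding = reader.Encoding;
        reader.Close();
        return encoding == null ? null : encoding.WebName;
    }

    public static void Main()
    {
        // Sample document; UTF-8 is used here as a portable stand-in.
        byte[] bytes = Encoding.UTF8.GetBytes(
            "<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>");
        Console.WriteLine(DetectEncodingName(new MemoryStream(bytes)));
    }
}
```

The property is read after the first call to Read because the reader has no encoding to report before it has looked at the document.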

Accessing Attributes

Of all the node types supplied in the .NET Framework, only Element, DocumentType, and XmlDeclaration support attributes. To check whether a given node contains attributes, use the HasAttributes Boolean property. The AttributeCount property returns the number of attributes available for the current node.

This next example demonstrates how to programmatically access any sequence of attributes for a node and concatenate their names and values in a single string. Consider the following XML fragment:

<employee id="1" lastname="Users" firstname="Joe" />
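One possible way to walk the attributes of that fragment is sketched below. The class and method names (AttributeDump, Describe) are hypothetical; HasAttributes, AttributeCount, MoveToAttribute, and MoveToElement are the reader members discussed in this section.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: the names are illustrative, not part of the .NET Framework.
public static class AttributeDump
{
    // Concatenates name="value" pairs for the first element in the fragment.
    public static string Describe(string xmlFragment)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xmlFragment));
        StringBuilder sb = new StringBuilder();

        reader.MoveToContent();             // position the reader on the first element
        if (reader.HasAttributes)
        {
            for (int i = 0; i < reader.AttributeCount; i++)
            {
                reader.MoveToAttribute(i);  // select the attribute by index
                sb.Append(reader.Name).Append("=\"").Append(reader.Value).Append("\" ");
            }
            reader.MoveToElement();         // return to the owning element
        }

        reader.Close();
        return sb.ToString().TrimEnd();
    }

    public static void Main()
    {
        Console.WriteLine(
            Describe("<employee id=\"1\" lastname=\"Users\" firstname=\"Joe\" />"));
    }
}
```

Calling MoveToElement after the loop matters whenever the caller goes on to read the element's content, because the reader would otherwise remain positioned on the last attribute.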

Attribute Normalization

The W3C XML 1.0 Recommendation defines attribute normalization as the preliminary process that an attribute value should be subjected to prior to being returned to the application. The normalization process can be summarized in a few basic rules:

1. Any character or entity reference (for example, &#160; or &amp;) is expanded.

2. Any white space character (blanks, carriage returns, linefeeds, and tabs) is replaced with a blank (ASCII 0x20) character.

3. Any leading or trailing sequence of blanks is discarded.

4. Any other sequence of blanks is replaced with a single blank character (ASCII 0x20).

The XmlTextReader parser lets you toggle the normalization process on and off through the Normalization Boolean property. By default, the Normalization property is set to false, meaning that attribute values are not normalized. If the normalization process is disabled, an attribute can contain any character, including characters in the &#00; to &#20; range, which are normally considered invalid and not permitted. When normalization is on, using any of those character entities results in an XmlException being thrown.
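The effect of the Normalization property can be sketched on an attribute that contains a literal tab. The class and method names (NormalizationDemo, ReadFirstAttribute) are hypothetical, and the sketch illustrates only the replacement of a white space character with a blank; it makes no claim about how the reader treats leading, trailing, or repeated blanks.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: the names are illustrative, not part of the .NET Framework.
public static class NormalizationDemo
{
    // Reads the value of the first attribute of the first element.
    public static string ReadFirstAttribute(string xmlText, bool normalize)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xmlText));
        reader.Normalization = normalize;   // false is the default
        reader.MoveToContent();
        reader.MoveToFirstAttribute();
        string value = reader.Value;
        reader.Close();
        return value;
    }

    public static void Main()
    {
        // The attribute value holds a literal tab between "a" and "b".
        string xml = "<item note=\"a\tb\" />";
        Console.WriteLine(ReadFirstAttribute(xml, false).Replace("\t", "\\t"));
        Console.WriteLine(ReadFirstAttribute(xml, true));
    }
}
```

With normalization off the tab is returned untouched; with normalization on it comes back as a single blank.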

Parsing XML Fragments

The XmlTextReader class provides the basic set of functionalities to process any XML data coming from a disk file, a stream, or a URL. This kind of reader works sequentially, reading one node after the next, and deliberately does not provide any ad hoc search function to parse only a particular subtree.

In the .NET Framework, to process only fragments of XML data, excerpted from a variety of sources, you can take one of two routes. You can initialize the text reader with the XML string that represents the fragment, or you can use another, more specific, reader class: the XmlNodeReader class.
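The second route, XmlNodeReader, can be sketched as follows. The class and method names (SubtreeReader, ReadElementNames) and the XPath expression in Main are assumptions made for this illustration; XmlDocument, SelectSingleNode, and XmlNodeReader are standard System.Xml types.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper: the names are illustrative, not part of the .NET Framework.
public static class SubtreeReader
{
    // Lists the element names found in the subtree rooted at the selected node.
    public static string ReadElementNames(string xmlText, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmlText);

        XmlNode node = doc.SelectSingleNode(xpath);
        if (node == null) return "";

        // The reader is scoped to the given node and its descendants only.
        XmlNodeReader reader = new XmlNodeReader(node);
        StringBuilder sb = new StringBuilder();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
                sb.Append(reader.Name).Append(' ');
        }
        reader.Close();
        return sb.ToString().TrimEnd();
    }

    public static void Main()
    {
        string xml = "<root><a><b/><c/></a><d/></root>";
        Console.WriteLine(ReadElementNames(xml, "/root/a"));
    }
}
```

The trade-off is that the whole document must first be loaded into an XmlDocument, so this route suits data that is already held in memory as a DOM tree.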

Parsing Well-Formed XML Strings

The trick to initializing a text reader from a string is all in packing the string into a StringReader object. One of the XmlTextReader constructors looks like this:

public XmlTextReader(TextReader);

TextReader is an abstract class that represents a .NET reader object capable of reading a sequence of characters no matter where they are physically stored. The StringReader class inherits from TextReader and simply makes itself capable of reading the characters of an in-memory string. Because StringReader derives from TextReader, you can safely use it to initialize XmlTextReader.

string xmlText = "…";
StringReader strReader = new StringReader(xmlText);
XmlTextReader reader = new XmlTextReader(strReader);

Writing a Custom XML Reader

We have one more topic to consider on the subject of XML readers, which opens up a whole new world of opportunities: creating customized XML readers. An XML reader class is merely a programming interface for reading data that appears to be XML. The XmlTextReader class represents the simplest and the fastest of all possible XML readers but, and this is what really matters, it is just one reader. Its inherent simplicity and effectiveness stem from two key points. First, the class operates as a read-only, forward-only, nonvalidating parser. Second, the class is assumed to work on native XML data. It has no need, and no subsequent overhead, to map input data internally to XML data structures.

Mapping Data Structures to XML Nodes

INI files have been a fundamental part of Microsoft Windows applications. You can read and write the content of an INI file using file and I/O classes, or you might resort to making calls to the underlying Win32 unmanaged platform.

Mapping CSV Files to XML

1. A CSV file consists of one or more lines of text. Each line contains strings of text separated by commas. Each line of a CSV file can be naturally associated with a database row in which each token maps to a column.

2. Likewise, a line in a CSV file can also be correlated to an XML node with as many attributes as the comma-separated tokens. The following code shows a typical CSV file:

Davolio,Nancy,Sales Representative
Fuller,Andrew,Sales Manager
Leverling,Janet,Sales Representative
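The mapping from a CSV line to an XML node with one attribute per token can be sketched in isolation. The class and method names (CsvRowMapper, ToRowXml), the element name row, and the col prefix are all assumptions made for this illustration. The split is deliberately naive and does not handle quoted fields that contain commas.

```csharp
using System;
using System.Text;

// Hypothetical helper: the names are illustrative, not part of the .NET Framework.
public static class CsvRowMapper
{
    // Escapes the characters that cannot appear raw inside a quoted attribute value.
    private static string Escape(string text)
    {
        return text.Replace("&", "&amp;").Replace("<", "&lt;").Replace("\"", "&quot;");
    }

    // Maps one CSV line to <row prefix1="..." prefix2="..." />.
    public static string ToRowXml(string csvLine, string columnPrefix)
    {
        string[] tokens = csvLine.Split(',');   // naive split: no quoted-field support
        StringBuilder sb = new StringBuilder("<row");
        for (int i = 0; i < tokens.Length; i++)
        {
            sb.Append(' ').Append(columnPrefix).Append(i + 1);
            sb.Append("=\"").Append(Escape(tokens[i])).Append('"');
        }
        sb.Append(" />");
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToRowXml("Davolio,Nancy,Sales Representative", "col"));
    }
}
```

Each token becomes an attribute named after a prefix and a progressive number, which mirrors how a row without a header line would be exposed.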

Exposing Data as XML

In a true XML reader, methods like ReadInnerXml and ReadOuterXml serve the purpose of returning the XML source code embedded in, or sitting around, the currently selected node. For a CSV reader, of course, there is no XML source code to return. You might want to return an XML description of the current CSV node, however. Assuming that this is how you want the CSV reader to work, the ReadInnerXml method for a CSV XML reader can only return either null or the empty string, as shown in the following code. By design, in fact, each element has an empty body.

public override string ReadInnerXml()
{
    if (m_readState != ReadState.Interactive)
        return null;

    return String.Empty;
}

In contrast, the outer XML text for a CSV node can be designed like a node with a sequence of attributes, as follows:

<row attr1="…" attr2="…" />

The source code to obtain this output is shown here:

public override string ReadOuterXml()
{
    if (m_readState != ReadState.Interactive)
        return null;

    // Enumerating m_tokenValues is assumed to yield the attribute names (keys).
    StringBuilder sb = new StringBuilder("");
    sb.Append("<");
    sb.Append(CsvRowName);
    sb.Append(" ");

    foreach(object o in m_tokenValues)
    {
        sb.Append(o);
        sb.Append("=");
        sb.Append(QuoteChar);
        sb.Append(m_tokenValues[o.ToString()].ToString());
        sb.Append(QuoteChar);
        sb.Append(" ");    // keep consecutive attributes separated
    }

    sb.Append("/>");
    return sb.ToString();
}

The CSV XML Reader in Action

In this section, you'll see the CSV XML reader in action and learn how to instantiate and use it in the context of a realistic application. In particular, I'll show you how to load the contents of a CSV file into a DataTable object to appear in a Windows Forms DataGrid control.

You start by instantiating the reader object, passing the name of the CSV file to be processed and a Boolean flag. The Boolean value indicates whether the values in the first row of the CSV source file must be read as the column names or as data. If you pass false, the row is considered a plain data row and each column name is formed by a prefix and a progressive number. You control the prefix through the CsvColumnPrefix property.

// Instantiate the reader on a CSV file
XmlCsvReader reader;
reader = new XmlCsvReader("employees.csv", hasHeader.Checked);
reader.CsvColumnPrefix = colPrefix.Text;
reader.Read();

// Define the target table
DataTable dt = new DataTable();
for(int i=0; i<reader.AttributeCount; i++)
{
    reader.MoveToAttribute(i);
    DataColumn col = new DataColumn(reader.Name, typeof(string));
    dt.Columns.Add(col);
}
reader.MoveToElement();

Before you load data rows into the table and populate the data grid, you must define the layout of the target DataTable object. To do that, you must scroll the attributes of one row, typically the first row. You move to each of the attributes in the first row and create a DataColumn object with the same name as the attribute and specified as a string type. You then add the DataColumn object to the DataTable object and continue until you've added all the attributes. The MoveToElement call restores the focus to the CSV row element.

// Loop through the rows and populate a DataTable
do
{
    DataRow row = dt.NewRow();
    for(int i=0; i<reader.AttributeCount; i++)
    {
        row[i] = reader[i].ToString();
    }
    dt.Rows.Add(row);
}
while (reader.Read());
reader.Close();

// Bind the table to the grid
dataGrid1.DataSource = dt;

Next you walk through the various data rows of the CSV file and create a new DataRow object for each. The row will then be filled in with the values of the attributes. Because the reader is already positioned in the first row when the loop begins, you must use a do…while loop instead of the perhaps more natural while loop. At the end of the loop, you simply close the reader and bind the freshly created DataTable object to the DataGrid control for display.

The CSV XML reader now reads the column names from the first row in the source file.

Readers and XML Readers

To cap off our examination of XML readers and custom readers, let's spend a few moments looking at the difference between an XML reader and a generic reader for a non-XML data structure.

A reader is a basic and key concept in the .NET Framework. Several different types of reader classes do exist in the .NET Framework: binary readers, text readers, XML readers, and database readers, just to name a few. Of course, you can add your own data-specific readers to the list. But that's the point. How would you write your new reader? The simplest answer would be, you write the reader by inheriting from one of the existing reader classes.

Further Reading

An article that summarizes in a few pages the essence of XML readers and writers was written for the January 2001 issue of MSDN Magazine. Although based on a beta version of .NET, it is still of significant value and can be found at http://msdn.microsoft.com/msdnmag/issues/01/01/xml/xml.asp. Fresh, up-to-date, and handy information about XML in the .NET world (and other topics) can be found monthly in the "Extreme XML" column on MSDN Online.

If you need to know more about ADO.NET and its integration with XML, you can check out my book Building Web Solutions with ASP.NET and ADO.NET (Microsoft Press, 2002) or David Sceppa's book Microsoft ADO.NET (Core Reference) (Microsoft Press, 2002).

XML extensions for SQL Server 2000 are described in detail in Chapter 2. Finally, for a very informative article about the development of XML custom readers, see "Implementing XmlReader Classes for Non-XML Data Structures and Formats," available on MSDN at http://msdn.microsoft.com/library/enus/dndotnet/html/Custxmlread.asp.