SQLPASS AD501-M XQuery MRys

October 11-14, Seattle, WA

Best Practices and Performance Tuning of XML Queries in SQL Server (AD-501-M)

Michael Rys, Principal Program Manager, Microsoft Corp

mrys@microsoft.com, @SQLServerMike

Session Objectives
• Understand when and how to use XML in SQL Server
• Understand and correct common performance problems with XML and XQuery

Session Agenda
• XML Scenarios and when to store XML
• XML Design Optimizations
• General Optimizations
• XML Datatype method Optimizations
• XQuery Optimizations
• XML Index Optimizations

AD-501-M| XQuery Performance


XML Scenarios

Data exchange between loosely coupled systems
• XML is a ubiquitous, extensible, platform-independent transport format
• Message envelope in XML: Simple Object Access Protocol (SOAP), RSS, REST
• Message payload/business data in XML
• Vertical industry exchange schemas

Document management
• XHTML, DocBook, home-grown, domain-specific markup (e.g. contracts), OpenOffice, Microsoft Office XML (both default and user-extended)

Ad-hoc modeling of semistructured data
• Storing and querying heterogeneous complex objects
• Semistructured data with sparse, highly varying structure at the instance level
• XML provides a self-describing format and extensible schemas

→ Transport, store, and query XML data

Decision Tree: Processing XML In SQL Server

[Figure: decision tree. The branch layout is not recoverable from the extracted text; the node labels are listed below.]

Questions:
• Does the data fit the relational model?
• Is the data semi-structured?
• Is the data a document?
• Query into the XML?
• Search within the XML?
• Is the XML constrained by schemas?

Branch labels: Yes, structured, Known sparse, Open schema

Actions:
• Shred the XML into relations
• Shred the structured XML into relations, store semistructured aspects as XML and/or sparse columns
• Shred known sparse data into sparse columns
• Promote frequently queried properties relationally
• Store as XML
• Store as varbinary(max)
• Define a full-text index
• Use primary and secondary XML indexes as needed
• Constrain XML if validation cost is ok

SQL Server XML Data Type Architecture

[Figure: architecture diagram. Recoverable component labels:]
• XML, XML Parser, Validation, XML Schema Collection, XML Schemata
• XML data type (binary XML)
• Relational, Rowsets, OpenXML/nodes(), FOR XML with TYPE directive
• XQuery, XML-DML
• Node Table, PRIMARY XML INDEX, PATH Index, PROP Index, VALUE Index

General Impacts

Concurrency control
• Locks on both the XML data type and relevant rows in primary and secondary XML indices
• Lock escalation on indices
• Snapshot isolation reduces locks and lock contention

Transaction logs
• Bulk insert into XML indices may fill the transaction log
• Delay the creation of the XML indexes and use the SIMPLE recovery model
• Preallocate the database file instead of growing it dynamically
• Place the log on a different disk

In-row/out-of-row storage of XML large objects
• Moving XML into a side table or out-of-row if mixed with relational data reduces scan time

Due to clustering, insertion into the XML index may not be linear
• Choose an integer/bigint identity column as key
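The storage advice above can be sketched in T-SQL as follows. This is a minimal illustration, not part of the original slides: the table dbo.DocsDemo and its columns are hypothetical names, and the out-of-row setting is applied with the documented sp_tableoption option 'large value types out of row'.

```sql
-- Hypothetical table: an integer identity column as clustered key,
-- relational data and an XML column side by side.
CREATE TABLE dbo.DocsDemo
(
    DocID int IDENTITY(1,1) NOT NULL
          CONSTRAINT PK_DocsDemo PRIMARY KEY CLUSTERED,
    Title nvarchar(200) NOT NULL,
    Body  xml NULL
);
GO

-- Keep the XML values out of the data row so that scans over the
-- relational columns touch fewer pages.
EXEC sys.sp_tableoption
     @TableNamePattern = N'dbo.DocsDemo',
     @OptionName       = 'large value types out of row',
     @OptionValue      = 1;
```

Whether out-of-row storage helps depends on how often the XML is read together with the relational columns, so it is worth measuring on a realistic workload.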

Choose The Right XML Model

• Element-centric versus attribute-centric
<Customer><name>Joe</name></Customer>
<Customer name="Joe" />
+: Attributes often perform better when querying
-: Parsing attributes requires a uniqueness check

• Generic element names with type attribute vs specific element names
<Entity type="Customer"> <Prop type="Name">Joe</Prop> </Entity>
<Customer><name>Joe</name></Customer>
+: Specific names give shorter path expressions
+: Specific names need no filter on the type attribute: /Entity[@type="Customer"]/Prop[@type="Name"] vs /Customer/name

• Wrapper elements
<Orders><Order id="1"/></Orders>
+: No wrapper elements means smaller XML and shorter path expressions

Use an XML Schema Collection?

Using no XML Schema (untyped XML)
• Can still use XQuery and XML Index!
• Atomic values are always weakly typed strings: compare as strings to avoid runtime conversions and loss of index usage
• No schema validation overhead
• No schema evolution revalidation costs

XML Schema provides structural information
• Atomic typed elements now use only one row instead of two in the node table/XML index (closer to attributes)
• Static typing can detect cardinality and feasibility of expression

XML Schema provides semantic information
• Elements/attributes have the correct atomic type for comparison and order semantics
• No runtime casts required and better use of index for value lookup
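The contrast between untyped and typed XML can be sketched as follows. The schema collection name PersonSchemaDemo and the sample document are assumptions made for illustration only; the untyped query compares as a string, the typed query compares as an integer because the schema declares age as xs:int.

```sql
-- Hypothetical schema collection declaring <person><age> as xs:int.
CREATE XML SCHEMA COLLECTION PersonSchemaDemo AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="age" type="xs:int"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>';
GO

DECLARE @untyped xml = N'<person><age>42</age></person>';
DECLARE @typed xml(DOCUMENT PersonSchemaDemo) = N'<person><age>42</age></person>';

SELECT
    -- Untyped: compare against a string literal to avoid a runtime cast.
    @untyped.exist('/person/age/text()[. = "42"]') AS UntypedStringCompare,
    -- Typed: age is statically known to be xs:int, so a numeric literal fits.
    @typed.exist('/person[age = 42]') AS TypedIntCompare;
GO

DROP XML SCHEMA COLLECTION PersonSchemaDemo;
```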

XQuery Methods

query() creates a new, untyped XML data type instance
exist() returns 1 if the XQuery expression returns at least one item, 0 otherwise
value() extracts an XQuery value into the SQL value and type space
• Expression has to statically be a singleton
• String value of the atomized XQuery item is cast to the SQL type
• SQL type has to be a SQL scalar type (no XML or CLR UDT)
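A small sketch of the three methods on one XML variable may help here. The sample document is made up for illustration; note the parenthesized path with [1] in value(), which makes the expression a static singleton.

```sql
DECLARE @x xml = N'<doc>
  <customer id="1" name="Joe"/>
  <customer id="2" name="Ann"/>
</doc>';

SELECT
    -- query(): returns a new untyped XML fragment
    @x.query('/doc/customer[@id = 2]')         AS Fragment,
    -- exist(): returns a bit, 1 if the expression yields at least one item
    @x.exist('/doc/customer[@name = "Joe"]')   AS HasJoe,
    -- value(): needs a static singleton, hence the (...)[1]
    @x.value('(/doc/customer/@id)[1]', 'int')  AS FirstCustID;
```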

XQuery: nodes()

Returns a row per selected node as a special XML data type instance
• Preserves the original structure and types
• Can only be used with the XQuery methods (but not modify()), count(*), and IS (NOT) NULL

Appears as a Table-valued Function (TVF) in the query plan if no index is present
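As an illustration of the row-per-node behavior, the following sketch shreds a made-up document with nodes() and applies the other XQuery methods to each context node. The alias names N and c follow the convention used elsewhere in this deck.

```sql
DECLARE @x xml = N'<doc>
  <customer id="1" name="Joe"><order id="10"/><order id="11"/></customer>
  <customer id="2" name="Ann"><order id="20"/></customer>
</doc>';

-- One row per <customer>; c is the context node for the relative paths.
SELECT
    c.value('@id', 'int')            AS CustID,
    c.value('@name', 'nvarchar(50)') AS CName,
    c.query('order')                 AS Orders,
    c.exist('order[@id = 20]')       AS HasOrder20
FROM @x.nodes('/doc/customer') AS N(c);
```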

sql:column()/sql:variable()

Map SQL value and type into XQuery values and types in context of XQuery or XML-DML

• sql:variable(): accesses a SQL variable/parameter

declare @value int
set @value = 42
select * from T
where T.x.exist('/a/b[@id=sql:variable("@value")]') = 1

• sql:column(): accesses another column value

tables: T([key] int, x xml), S([key] int, val int)

select * from T join S on T.[key] = S.[key]
where T.x.exist('/a/b[@id=sql:column("S.val")]') = 1

• Restrictions in SQL Server: No XML, CLR UDT, datetime, or deprecated text/ntext/image

Improving Slow XQueries, Bad FOR XML (demo)

Optimal Use Of Methods: How to Cast from XML to SQL

BAD: CAST( CAST(xmldoc.query('/a/b/text()') as nvarchar(500)) as int)
GOOD: xmldoc.value('(/a/b/text())[1]', 'int')

BAD: node.query('.').value('@attr', 'nvarchar(50)')
GOOD: node.value('@attr', 'nvarchar(50)')

Grouping value() method

Group value() methods on the same XML instance next to each other if the path expressions in the value() methods
• Are simple path expressions that only use the child and attribute axes and do not contain wildcards, predicates, node tests, or ordinals
• Statically infer a singleton

The singleton can be statically inferred from
• The DOCUMENT and XML Schema Collection
• Relative paths on the context node provided by the nodes() method

Requires an XML index to be present

Optimal Use of Methods: Using the right method to join and compare

Use the exist() method, sql:column()/sql:variable() and an XQuery comparison for checking for a value or joining if secondary XML indices are present

BAD:*
select doc
from doc_tab join authors
on doc.value('(/doc/mainauthor/lname/text())[1]', 'nvarchar(50)') = lastname

GOOD:
select doc
from doc_tab join authors
on 1 = doc.exist('/doc/mainauthor/lname/text()[. = sql:column("lastname")]')

* If applied on an XML variable, or if no index is present, the value() method is most of the time more efficient

Optimal Use of Methods: Avoiding bad costing with nodes()

nodes() without an XML index is a Table-valued function (details later)
Bad cardinality estimates can lead to bad plans

• BAD:
select c.value('@id', 'int') as CustID, c.value('@name', 'nvarchar(50)') as CName
from Customer, @x.nodes('/doc/customer') as N(c)
where Customer.ID = c.value('@id', 'int')

• BETTER (if only one wrapper doc element):
select c.value('@id', 'int') as CustID, c.value('@name', 'nvarchar(50)') as CName
from Customer, @x.nodes('/doc[1]') as D(d)
cross apply d.nodes('customer') as N(c)
where Customer.ID = c.value('@id', 'int')

Use a temp table (insert into #temp select … from nodes()) or a table-valued parameter instead of XML to get better estimates
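The temp-table suggestion can be sketched as below. The tables #Customer and #shredded, the Region column, and the sample data are hypothetical stand-ins; only the shape of the nodes() query comes from the slide.

```sql
-- Hypothetical stand-in for the Customer table used on the slide.
CREATE TABLE #Customer (ID int PRIMARY KEY, Region nvarchar(20));
INSERT INTO #Customer (ID, Region) VALUES (1, N'West'), (2, N'East');

DECLARE @x xml =
    N'<doc><customer id="1" name="Joe"/><customer id="2" name="Ann"/></doc>';

-- Shred once into a temp table so the join below is costed on the
-- actual row count rather than on the XML reader's fixed estimate.
CREATE TABLE #shredded (CustID int PRIMARY KEY, CName nvarchar(50));

INSERT INTO #shredded (CustID, CName)
SELECT c.value('@id', 'int'),
       c.value('@name', 'nvarchar(50)')
FROM @x.nodes('/doc/customer') AS N(c);

SELECT s.CustID, s.CName, cu.Region
FROM #Customer AS cu
JOIN #shredded AS s ON cu.ID = s.CustID;

DROP TABLE #shredded;
DROP TABLE #Customer;
```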

Avoiding multiple method evaluations

Use subqueries

• BAD:
SELECT CASE isnumeric(doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)'))
       WHEN 1 THEN doc.value('(/doc/customer/order/price)[1]', 'decimal(5,2)')
       ELSE 0 END
FROM T

• GOOD:
SELECT CASE isnumeric(Price)
       WHEN 1 THEN CAST(Price as decimal(5,2))
       ELSE 0 END
FROM (SELECT doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)') as Price FROM T) X

Use subqueries also with NULLIF()

Combined SQL And XQuery/DML Processing

[Figure: processing pipeline for a statement such as SELECT x.query('…'), y FROM T WHERE …. Recoverable labels:]
• Static Phase: SQL Parser, XQuery Parser, Algebrization, Static Typing, Metadata (XML Schema Collection), Static Optimization of combined Logical and Physical Operation Tree
• Dynamic Phase: Runtime Optimization and Execution of physical Op Tree, XML and rel. Indices

New XQuery Algebra Operators: XML Reader TVF

Table-Valued Function XML Reader UDF with XPath Filter
• Used if no Primary XML Index is present
• Creates node table rowset in query flow
• Multiple XPath filters can be pushed in to reduce the node table to a subtree
• Base cardinality estimate is always 10,000 rows! Some adjustment based on pushed path filters

XMLReader node table format example (simplified)

ID | TAG ID | Node | Type-ID | VALUE | HID
1.3.1 | 4 (TITLE) | Element | 2 (xs:string) | Bad Bugs | #title#section#book

New XQuery Algebra Operators

• Serializer UDX: serializes the query result as XML
• XQuery String UDX: evaluates the XQuery string() function
• XQuery Data UDX: evaluates the XQuery data() function
• Check UDX: validates XML being inserted
• UDX name visible in SSMS properties window

Optimal Use Of XQuery: Atomization of nodes

Value comparisons, XQuery casts and value() method casts require atomization of the item

• Attribute: /person[@age = 42] → /person[data(@age) = 42]
• Atomic typed element: /person[age = 42] → /person[data(age) = 42]
• Untyped, mixed content typed element (adds UDX): /person[age = 42] → /person[data(age) = 42] → /person[string(age) = 42]
• If only one text node for untyped element (better): /person[age/text() = 42] → /person[data(age/text()) = 42]
• value() method on untyped elements: value('/person/age', 'int') vs value('/person/age/text()', 'int')

string() aggregates all text nodes and prohibits index use

Optimal Use Of XQuery: Casting Values

Value comparisons require casts and type promotion
• Untyped attribute: /person[@age = 42] → /person[xs:decimal(@age) = 42]
• Untyped text node(): /person[age/text() = 42] → /person[xs:decimal(age/text()) = 42]
• Typed element (typed as xs:int): /person[salary = 3e4] → /person[xs:double(salary) = 3e4]

Casting is expensive and prohibits index lookup

Tips to avoid casting
• Use appropriate types for comparison (string for untyped)
• Use a schema to declare the type

Maximize XPath expressions

Single paths are more efficient than twig paths
Avoid predicates in the middle of path expressions

book[@ISBN = "1-8610-0157-6"]/author[first-name = "Davis"]

/book[@ISBN = "1-8610-0157-6"] "∩" /book/author[first-name = "Davis"]

Move ordinals to the end of path expressions
• Make sure you get the same semantics!
• /a[1]/b[1] ≠ (/a/b)[1] ≠ /a/b[1]
• (/book/@isbn)[1] is better than /book[1]/@isbn
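The warning about ordinal semantics can be checked with a small experiment. The XML fragment below is made up so that the three expressions select different nodes; the comments describe which <b> elements each path selects under standard XPath rules.

```sql
-- Three top-level <a> elements; the first one has no <b> child.
DECLARE @x xml = N'<a/><a><b>x</b><b>y</b></a><a><b>z</b></a>';

SELECT
    -- First <b> of the first <a>: selects nothing, the first <a> is empty.
    @x.query('/a[1]/b[1]') AS FirstBOfFirstA,
    -- First <b> in document order across all <a>: selects <b>x</b>.
    @x.query('(/a/b)[1]')  AS FirstBOverall,
    -- First <b> of every <a> that has one: selects <b>x</b> and <b>z</b>.
    @x.query('/a/b[1]')    AS FirstBOfEachA;
```

Moving an ordinal to the end of a path is therefore only safe when the data guarantees that the rewritten path selects the same node.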

Maximize XPath expressions in exist()

Use the context item in the predicate to lengthen the path in exist()
• Existential quantification makes the returned node irrelevant

• BAD:
SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject[text() = "security"]')

• GOOD:
SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject/text()[. = "security"]')

• BAD:
SELECT * FROM docs WHERE 1 = xCol.exist('/book[@price > 9.99 and @price < 49.99]')

• GOOD:
SELECT * FROM docs WHERE 1 = xCol.exist('/book/@price[. > 9.99 and . < 49.99]')

This does not work with or-predicates

Inefficient operations: Parent axis

Most frequent offender: parent axis with nodes()

• BAD:
select o.value('../@id', 'int') as CustID, o.value('@id', 'int') as OrdID
from T cross apply x.nodes('/doc/customer/orders') as N(o)

• GOOD:
select c.value('@id', 'int') as CustID, o.value('@id', 'int') as OrdID
from T cross apply x.nodes('/doc/customer') as N1(c)
cross apply c.nodes('orders') as N2(o)

Inefficient operations

Avoid descendant axes and // in the middle of path expressions if the data structure is known
• // still can use the HID lookup, but is less efficient

XQuery construction performs worse than FOR XML

• BAD:
SELECT notes.query('
  <Customer cid="{sql:column(''cid'')}">{
    <name>{sql:column("name")}</name>,
    /
  }</Customer>')
FROM Customers WHERE cid=1

• GOOD:
SELECT cid as "@cid", name, notes as "*"
FROM Customers WHERE cid=1
FOR XML PATH('Customer'), TYPE

Optimal Use Of FOR XML

Use the TYPE directive when assigning the result to XML

• BAD:
declare @x xml;
set @x = (select * from Customers for xml raw);

• GOOD:
declare @x xml;
set @x = (select * from Customers for xml raw, type);

Prefer FOR XML PATH over FOR XML EXPLICIT for complex grouping and additional hierarchy levels
Use FOR XML EXPLICIT for complex nesting if FOR XML PATH performance is not appropriate
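An additional hierarchy level with FOR XML PATH can be sketched with a nested subquery that also uses the TYPE directive. The table variables @Customers and @Orders and their contents are hypothetical; the column aliases in double quotes follow the style of the slide and assume QUOTED_IDENTIFIER is ON.

```sql
DECLARE @Customers TABLE (cid int PRIMARY KEY, name nvarchar(50));
DECLARE @Orders    TABLE (oid int PRIMARY KEY, cid int);

INSERT INTO @Customers (cid, name) VALUES (1, N'Joe'), (2, N'Ann');
INSERT INTO @Orders (oid, cid) VALUES (10, 1), (11, 1), (20, 2);

-- Each customer row becomes a <Customer> element; the unnamed nested
-- subquery is inlined, so its <Order> elements become direct children.
SELECT c.cid AS "@cid",
       c.name,
       (SELECT o.oid AS "@id"
        FROM @Orders AS o
        WHERE o.cid = c.cid
        ORDER BY o.oid
        FOR XML PATH('Order'), TYPE)
FROM @Customers AS c
ORDER BY c.cid
FOR XML PATH('Customer'), TYPE;
```

Without TYPE on the inner query, the nested result would be returned as a string and entitized instead of being embedded as XML.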

XML Indices

Create XML index on XML column
CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc)

Create secondary indexes on tags, values, paths

Creation:
• Single-threaded only for primary XML index
• Multi-threaded for secondary XML indexes

Uses:
• Primary index will always be used if defined (not a cost-based decision)
• Results can be served directly from the index
• SQL's cost-based optimizer will consider secondary indexes

Maintenance:
• Primary and secondary indices will be efficiently maintained during updates
• Only the subtree that changes will be updated
• No online index rebuild
• Clustered key may lead to non-linear maintenance cost

Schema revalidation still checks the whole instance
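For reference, the index DDL might look like the sketch below. The table dbo.docs_demo and the secondary index names are hypothetical; a primary XML index requires a clustered primary key on the base table, and each secondary index is created on top of the primary one.

```sql
-- Hypothetical table with the clustered primary key that a primary XML index needs.
CREATE TABLE dbo.docs_demo
(
    id   int IDENTITY(1,1) NOT NULL
         CONSTRAINT PK_docs_demo PRIMARY KEY CLUSTERED,
    xDoc xml NOT NULL
);
GO

CREATE PRIMARY XML INDEX idx_1 ON dbo.docs_demo (xDoc);
GO

-- Secondary XML indexes are defined over the primary XML index.
CREATE XML INDEX idx_1_path  ON dbo.docs_demo (xDoc) USING XML INDEX idx_1 FOR PATH;
CREATE XML INDEX idx_1_value ON dbo.docs_demo (xDoc) USING XML INDEX idx_1 FOR VALUE;
CREATE XML INDEX idx_1_prop  ON dbo.docs_demo (xDoc) USING XML INDEX idx_1 FOR PROPERTY;
```

Following the transaction log advice earlier in the deck, it can be cheaper to bulk load the table first and create these indexes afterwards.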

Example Index Contents

insert into Person values (42, '<book ISBN="1-55860-438-3">
  <section>
    <title>Bad Bugs</title>
    Nobody loves bad bugs.
  </section>
  <section>
    <title>Tree Frogs</title>
    All right-thinking people <bold>love</bold> tree frogs.
  </section>
</book>')

Primary XML Index
CREATE PRIMARY XML INDEX PersonIdx ON Person (Pdesc)

Assumes typed data; columns and values are simplified, see VLDB 2004 paper for details

PK | XID | TAG ID | Node | Type-ID | VALUE | HID
42 | 1 | 1 (book) | Element | 1 (bookT) | null | #book
42 | 1.1 | 2 (ISBN) | Attribute | 2 (xs:string) | 1-55860-438-3 | #@ISBN#book
42 | 1.3 | 3 (section) | Element | 3 (sectionT) | null | #section#book
42 | 1.3.1 | 4 (TITLE) | Element | 2 (xs:string) | Bad Bugs | #title#section#book
42 | 1.3.3 | -- | Text | -- | Nobody loves bad bugs. | #text()#section#book
42 | 1.5 | 3 (section) | Element | 3 (sectionT) | null | #section#book
42 | 1.5.1 | 4 (title) | Element | 2 (xs:string) | Tree frogs | #title#section#book
42 | 1.5.3 | -- | Text | -- | All right-thinking people | #text()#section#book
42 | 1.5.5 | 7 (bold) | Element | 4 (boldT) | love | #bold#section#book
42 | 1.5.7 | -- | Text | -- | tree frogs | #text()#section#book

Secondary XML Indices

[Figure: index layout. Recoverable labels:]
• XML column in table T(id, x): rows 1, 2, 3 each hold Binary XML
• Primary XML Index (1 per XML column), clustered on Primary Key (of table T), XID; columns: PK, XID, NID, TID, VALUE, LVALUE, HID, xsinil, …
• Non-clustered Secondary Indices (n per primary index): Value Index, Path Index, Property Index

XQueries And XML Indices (demo)

Takeaway: XML Indices

• PRIMARY XML Index: use when there are lots of XQuery queries
• FOR VALUE: useful for queries where values are more selective than paths, such as //*[.="Seattle"]
• FOR PATH: useful for path expressions; avoids joins by mapping paths to hierarchical index (HID) numbers. Example: /person/address/zip
• FOR PROPERTY: useful when the optimizer chooses another index in addition (for example, on a relational column, or a full-text index), so the row is already known

Shredding Approaches

Approach | Complex Shapes | Bulkload | Server vs Midtier | Business logic | Programming | Scale/Performance
SQLXML Bulkload with annotated schema | Yes with limits | Yes | midtier | staging tables on server, XSLT on midtier | annotated XSD and small API | very good/very good
ADO.Net DataSet | No | No | midtier | midtier, SSIS | DataSet API or SSIS | good/good
CLR Table-valued function | Yes | No | Server or midtier | Server or midtier | C#, VB custom code | limited/good
OpenXML | Yes | No | Server | T-SQL | declarative T-SQL, XPath against variable | limited/good
nodes() | Yes | No | Server | T-SQL | declarative SQL, XQuery against var or table | good/careful
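The two server-side T-SQL approaches from the table, OpenXML and nodes(), can be compared on the same input. The sample document is made up; both queries are intended to return one row per customer with its id and name.

```sql
DECLARE @x xml =
    N'<doc><customer id="1" name="Joe"/><customer id="2" name="Ann"/></doc>';

-- OpenXML: parse into a document handle, query it, then release it.
DECLARE @h int;
EXEC sys.sp_xml_preparedocument @h OUTPUT, @x;

SELECT id, name
FROM OPENXML(@h, '/doc/customer', 1)   -- flag 1: attribute-centric mapping
     WITH (id int, name nvarchar(50));

EXEC sys.sp_xml_removedocument @h;     -- always free the parsed document

-- nodes(): the same shredding expressed with the XML data type methods.
SELECT c.value('@id', 'int')            AS id,
       c.value('@name', 'nvarchar(50)') AS name
FROM @x.nodes('/doc/customer') AS N(c);
```

OpenXML works only against a variable through a handle, while nodes() can also be applied to an XML column with CROSS APPLY, which matches the "against var or table" note in the table.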

To Promote or Not Promote…

Promotion pre-calculates paths

Requires relational query
• XQuery does not know about promotion

Promotion during loading of the data
• Using any of the shredding mechanisms
• 1-to-1 or 1-to-many relationships

Promotion using computed columns
• 1-to-1 only
• Persist computed column: fast lookup and retrieval
• Relational index on persisted computed column: fast lookup
• Have to be precise

Promotion using triggers
• 1-to-1 or 1-to-many relationships
• Trigger overhead

Relational view over XML data
• Filters on the relational view are not pushed down due to the different type/value system

Promotion using computed columns

Use a schema-bound UDF that encapsulates XQuery

Persist computed column• Fast lookup and retrieval

Relational index on persisted computed column• Fast lookup

Query will have to use the schema-bound UDF to match

CAVEAT: No parallel plans with a persisted computed column based on a UDF
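A sketch of this promotion pattern follows. The table dbo.docs_promo_demo, the function dbo.udf_book_isbn_demo, and the index name are hypothetical; the ISBN attribute is borrowed from the book example used elsewhere in this deck.

```sql
-- Hypothetical table holding book documents in an XML column.
CREATE TABLE dbo.docs_promo_demo
(
    id   int IDENTITY(1,1) NOT NULL
         CONSTRAINT PK_docs_promo_demo PRIMARY KEY CLUSTERED,
    xCol xml NOT NULL
);
GO

-- Schema-bound UDF that encapsulates the XQuery.
CREATE FUNCTION dbo.udf_book_isbn_demo (@x xml)
RETURNS varchar(20)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @x.value('(/book/@ISBN)[1]', 'varchar(20)');
END;
GO

-- Promote the value into a persisted computed column and index it.
ALTER TABLE dbo.docs_promo_demo
    ADD ISBN AS dbo.udf_book_isbn_demo(xCol) PERSISTED;
GO
CREATE INDEX IX_docs_promo_demo_ISBN ON dbo.docs_promo_demo (ISBN);
GO

-- The query uses the same UDF expression so it can match the computed column.
SELECT id
FROM dbo.docs_promo_demo
WHERE dbo.udf_book_isbn_demo(xCol) = '1-55860-438-3';
```

This covers the 1-to-1 case only; a property that occurs several times per document would need one of the shredding or trigger-based approaches instead.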

Use of Full-Text Index for Optimization

Can provide improvement for XQuery contains() queries

Query for documents where section title contains “optimization”

SELECT * FROM docs
WHERE 1 = xCol.exist('/book/section/title/text()[contains(.,"optimization")]')

Use full-text index to prefilter candidates (includes false positives)

SELECT * FROM docs
WHERE contains(xCol, 'optimization')
AND 1 = xCol.exist('/book/section/title/text()[contains(.,"optimization")]')

Futures: Selective XML Index

CREATE SELECTIVE XML INDEX pxi_index ON Tbl(xmlcol)
FOR (
  -- the first four match XQuery predicates
  -- in all XML data type methods

  -- simple flavor - default mapping (xs:untypedAtomic),
  -- no optimization hints
  node42 = '/a/b',
  pathatc = '/a/b/c/@atc',

  -- advanced flavor - use of optimization hints
  path02 = '/a/b/c' as XQUERY 'xs:string' MAXLENGTH(25),
  node13 = '/a/b/d' as XQUERY 'xs:double' SINGLETON,

  -- the next two match value() method
  -- require regular SQL Server type semantics
  -- they can be mixed with the XQUERY ones
  -- specifying a type is mandatory for the SQL type semantics
  pathfloat = '/a/b/c' as SQL FLOAT,
  pathabd = '/a/b/d' as SQL VARCHAR(200)
)

Session Takeaways
• Understand when and how to use XML in SQL Server
• Understand and correct common performance problems with XML and XQuery
• Shred “relational” XML to relations
• Use the XML datatype for semistructured and markup scenarios
• Write your XQueries so that XML indices can be used
• Use persisted computed columns to promote XQuery results (with caveat)

Related Content

Optimization whitepapers
http://msdn2.microsoft.com/en-us/library/ms345118.aspx
http://msdn2.microsoft.com/en-us/library/ms345121.aspx

Online WebCasts
http://www.microsoft.com/events/series/msdnsqlserver2005.mspx#SQLXML

Newsgroups & Forum
microsoft.public.sqlserver.xml
http://communities.microsoft.com/newsgroups/default.asp?ICP=sqlserver2005&sLCID=us
http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=89

General XML and Databases whitepapers
http://msdn2.microsoft.com/en-us/xml/bb190603.aspx

My E-mail: mrys@microsoft.com
My Weblog: http://sqlblog.com/blogs/michael_rys/

