SQLPASS presentation on performance tuning and best practices for XML and XQuery in Microsoft SQL Server 2005, SQL Server 2008, SQL Server 2008 R2 and SQL Server 2012.
October 11-14, Seattle, WA
Best Practices and Performance Tuning of XML Queries in SQL Server (AD-501-M)
Michael Rys, Principal Program Manager, Microsoft Corp
mrys@microsoft.com, @SQLServerMike
Session Objectives
• Understand when and how to use XML in SQL Server
• Understand and correct common performance problems with XML and XQuery
Session Agenda
• XML Scenarios and when to store XML
• XML Design Optimizations
• General Optimizations
• XML Datatype method Optimizations
• XQuery Optimizations
• XML Index Optimizations
AD-501-M| XQuery Performance
XML Scenarios
Data Exchange between loosely-coupled systems
• XML is a ubiquitous, extensible, platform-independent transport format
• Message Envelope in XML: Simple Object Access Protocol (SOAP), RSS, REST
• Message Payload/Business Data in XML
• Vertical Industry Exchange schemas
Document Management
• XHTML, DocBook, home-grown, domain-specific markup (e.g. contracts), OpenOffice, Microsoft Office XML (both default and user-extended)
Ad-hoc modeling of semistructured data
• Storing and querying heterogeneous complex objects
• Semistructured data with sparse, highly-varying structure at the instance level
• XML provides self-describing format and extensible schemas
→ Transport, Store, and Query XML data
Decision Tree: Processing XML In SQL Server
[Flowchart; the Yes/No edges between nodes are not recoverable from the extracted text]
Decision nodes:
• Does the data fit the relational model?
• Is the data semi-structured? (branches labeled: structured, open schema, known sparse)
• Is the data a document?
• Query into the XML?
• Search within the XML?
• Is the XML constrained by schemas?
Outcomes:
• Shred the XML into relations
• Shred the structured XML into relations, store semistructured aspects as XML and/or sparse columns
• Shred known sparse data into sparse columns
• Promote frequently queried properties relationally
• Store as XML
• Store as varbinary(max)
• Define a full-text index
• Use primary and secondary XML indexes as needed
• Constrain XML if validation cost is ok
SQL Server XML Data Type Architecture
[Architecture diagram. Recoverable labels: XML Parser; XML Validation; XML Schema Collection; XML Schemata; XML data type (binary XML); Relational; Rowsets; OpenXML/nodes(); FOR XML with TYPE directive; XQuery; XML-DML; Node Table; PRIMARY XML INDEX; PATH Index; PROP Index; VALUE Index]
General Impacts
Concurrency Control
• Locks on both XML data type and relevant rows in primary and secondary XML Indices
• Lock escalation on indices
• Snapshot Isolation reduces locks and lock contention
Transaction Logs
• Bulk insert into XML Indices may fill transaction log
• Delay the creation of the XML indexes and use the SIMPLE recovery model
• Preallocate database file instead of dynamically growing
• Place log on different disk
In-Row/Out-of-Row of XML large object
• Moving XML into side table or out-of-row if mixed with relational data reduces scan time
Due to clustering, insertion into XML Index may not be linear
• Choose integer/bigint identity column as key
Choose The Right XML Model
• Element-centric versus attribute-centric
    <Customer><name>Joe</name></Customer>
    <Customer name="Joe" />
  +: Attributes often perform better when querying
  -: Parsing attributes requires a uniqueness check
• Generic element names with type attribute vs specific element names
    <Entity type="Customer"> <Prop type="Name">Joe</Prop> </Entity>
    <Customer><name>Joe</name></Customer>
  +: Specific names give shorter path expressions
  +: Specific names need no filter on type attribute
    /Entity[@type="Customer"]/Prop[@type="Name"] vs /Customer/name
• Wrapper elements
    <Orders><Order id="1"/></Orders>
  +: No wrapper elements means smaller XML and shorter path expressions
Use an XML Schema Collection?
Using no XML Schema (untyped XML)
• Can still use XQuery and XML Index!!!
• Atomic values are always weakly typed strings: compare as strings to avoid runtime conversions and loss of index usage
• No schema validation overhead
• No schema evolution revalidation costs
XML Schema provides structural information
• Atomic typed elements now use only one instead of two rows in node table/XML index (closer to attributes)
• Static typing can detect cardinality and feasibility of expression
XML Schema provides semantic information
• Elements/attributes have correct atomic type for comparison and order semantics
• No runtime casts required and better use of index for value lookup
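The typed versus untyped trade-off can be sketched as follows. This is a minimal illustration, not taken from the session: the schema collection name PersonSchema_demo and the <person>/<age> shape are assumptions made up for the example.

```sql
-- Hypothetical schema collection: declares <age> as xs:int.
CREATE XML SCHEMA COLLECTION PersonSchema_demo AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="age" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';
GO

-- Typed XML: age is known to be xs:int, so a numeric comparison needs no runtime cast.
DECLARE @typed xml(PersonSchema_demo);
SET @typed = N'<person><age>42</age></person>';
SELECT @typed.exist('/person[age = 42]') AS TypedMatch;

-- Untyped XML: values are weakly typed strings, so compare as a string.
DECLARE @untyped xml;
SET @untyped = N'<person><age>42</age></person>';
SELECT @untyped.exist('/person/age/text()[. = "42"]') AS UntypedMatch;
```

GO is the batch separator used by SSMS and sqlcmd; with other clients, run the two parts as separate batches.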
XQuery Methods
query() creates new, untyped XML data type instance
exist() returns 1 if the XQuery expression returns at least one item, 0 otherwise
value() extracts an XQuery value into the SQL value and type space
• Expression has to statically be a singleton
• String value of atomized XQuery item is cast to SQL type
• SQL type has to be SQL scalar type (no XML or CLR UDT)
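A small sketch of the three methods side by side on one XML variable. The sample document and column aliases are made up for illustration.

```sql
DECLARE @x xml;
SET @x = N'<a><b id="1">10</b><b id="2">20</b></a>';

SELECT
    @x.query('/a/b[@id = "2"]')          AS MatchingNodes, -- xml: <b id="2">20</b>
    @x.exist('/a/b[@id = "2"]')          AS HasMatch,      -- bit: 1
    @x.value('(/a/b/text())[1]', 'int')  AS FirstValue;    -- int: 10
```

The ordinal [1] in the value() call is what makes the expression a static singleton; without it, the call is rejected at compile time.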
XQuery: nodes()
Returns a row per selected node as a special XML data type instance
• Preserves the original structure and types
• Can only be used with the XQuery methods (but not modify()), count(*), and IS (NOT) NULL
Appears as Table-valued Function (TVF) in query plan if no index present
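As a minimal sketch, nodes() can turn each selected element into a row, with value() applied relative to that row's context node. The sample data and aliases are illustrative only.

```sql
DECLARE @x xml;
SET @x = N'<doc><customer id="1" name="Joe"/><customer id="2" name="Ann"/></doc>';

-- One row per <customer>; c is the context node for the relative paths.
SELECT c.value('@id', 'int')            AS CustID,
       c.value('@name', 'nvarchar(50)') AS CName
FROM @x.nodes('/doc/customer') AS N(c);
```

Because @x is a variable with no XML index, the plan for this query is expected to show the XML Reader table-valued function mentioned above.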
sql:column()/sql:variable()
Map SQL value and type into XQuery values and types in context of XQuery or XML-DML
• sql:variable(): accesses a SQL variable/parameter
    declare @value int
    set @value = 42
    select * from T where T.x.exist('/a/b[@id=sql:variable("@value")]') = 1
• sql:column(): accesses another column value
    tables: T([key] int, x xml), S([key] int, val int)
    select * from T join S on T.[key] = S.[key]
    where T.x.exist('/a/b[@id=sql:column("S.val")]') = 1
• Restrictions in SQL Server: No XML, CLR UDT, datetime, or deprecated text/ntext/image
Demo: Improving Slow XQueries, Bad FOR XML
Optimal Use Of Methods: How to Cast from XML to SQL
BAD: CAST( CAST(xmldoc.query('/a/b/text()') as nvarchar(500)) as int)
GOOD: xmldoc.value('(/a/b/text())[1]', 'int')
BAD: node.query('.').value('@attr', 'nvarchar(50)')
GOOD: node.value('@attr', 'nvarchar(50)')
Optimal Use Of Methods: Grouping value() method
Group value() methods on same XML instance next to each other if the path expressions in the value() methods
• are simple path expressions that only use child and attribute axis and do not contain wildcards, predicates, node tests, ordinals
• statically infer a singleton
The singleton can be statically inferred from
• the DOCUMENT and XML Schema Collection
• Relative paths on the context node provided by the nodes() method
Requires XML index to be present
Optimal Use of Methods: Using the right method to join and compare
Use exist() method, sql:column()/sql:variable() and an XQuery comparison for checking for a value or joining if secondary XML indices present
BAD:*
    select doc
    from doc_tab join authors
    on doc.value('(/doc/mainauthor/lname/text())[1]', 'nvarchar(50)') = lastname
GOOD:
    select doc
    from doc_tab join authors
    on 1 = doc.exist('/doc/mainauthor/lname/text()[. = sql:column("lastname")]')
* If applied on XML variable/no index present, value() method is most of the time more efficient
Optimal Use of Methods: Avoiding bad costing with nodes()
nodes() without XML index is a Table-valued function (details later)
Bad cardinality estimates can lead to bad plans
• BAD:
    select c.value('@id', 'int') as CustID, c.value('@name', 'nvarchar(50)') as CName
    from Customer, @x.nodes('/doc/customer') as N(c)
    where Customer.ID = c.value('@id', 'int')
• BETTER (if only one wrapper doc element):
    select c.value('@id', 'int') as CustID, c.value('@name', 'nvarchar(50)') as CName
    from Customer, @x.nodes('/doc[1]') as D(d) cross apply d.nodes('customer') as N(c)
    where Customer.ID = c.value('@id', 'int')
Use temp table (insert into #temp select … from nodes()) or Table-valued parameter instead of XML to get better estimates
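The temp-table advice for nodes() can be sketched like this. It is one possible shape under stated assumptions: #Customer_demo is a hypothetical stand-in for the Customer table, and the Region column and sample rows are invented for the example.

```sql
DECLARE @x xml;
SET @x = N'<doc><customer id="1" name="Joe"/><customer id="2" name="Ann"/></doc>';

-- Hypothetical stand-in for the relational Customer table.
CREATE TABLE #Customer_demo (ID int PRIMARY KEY, Region nvarchar(20));
INSERT INTO #Customer_demo (ID, Region) VALUES (1, N'West');
INSERT INTO #Customer_demo (ID, Region) VALUES (2, N'East');

-- Shred once into a temp table so the join sees real row counts
-- instead of the fixed estimate of the XML Reader TVF.
CREATE TABLE #shredded (CustID int PRIMARY KEY, CName nvarchar(50));
INSERT INTO #shredded (CustID, CName)
SELECT c.value('@id', 'int'), c.value('@name', 'nvarchar(50)')
FROM @x.nodes('/doc/customer') AS N(c);

SELECT s.CustID, s.CName, cu.Region
FROM #Customer_demo AS cu
JOIN #shredded AS s ON cu.ID = s.CustID;

DROP TABLE #shredded;
DROP TABLE #Customer_demo;
```

The extra write to tempdb is the cost of this approach; it tends to pay off when the shredded rows are joined or filtered more than once.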
Optimal Use Of Methods: Avoiding multiple method evaluations
Use subqueries
• BAD:
    SELECT CASE isnumeric(doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)'))
             WHEN 1 THEN doc.value('(/doc/customer/order/price)[1]', 'decimal(5,2)')
             ELSE 0 END
    FROM T
• GOOD:
    SELECT CASE isnumeric(Price)
             WHEN 1 THEN CAST(Price as decimal(5,2))
             ELSE 0 END
    FROM (SELECT doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)') as Price FROM T) X
Use subqueries also with NULLIF()
Combined SQL And XQuery/DML Processing
[Pipeline diagram for a query such as: SELECT x.query('…'), y FROM T WHERE …]
Static Phase
• SQL side: SQL Parser, Algebrization, Static Typing
• XQuery side: XQuery Parser, Static Typing, Algebrization (with XML Schema Collection metadata)
• Static Optimization of combined Logical and Physical Operation Tree
Dynamic Phase
• Runtime Optimization and Execution of physical Op Tree (with XML and relational indices)
New XQuery Algebra Operators: XML Reader TVF
Table-Valued Function XML Reader UDF with XPath Filter
• Used if no Primary XML Index is present
• Creates node table rowset in query flow
• Multiple XPath filters can be pushed in to reduce node table to subtree
• Base cardinality estimate is always 10,000 rows! Some adjustment based on pushed path filters
XMLReader node table format example (simplified)
ID    | TAG ID    | Node    | Type-ID       | VALUE    | HID
1.3.1 | 4 (title) | Element | 2 (xs:string) | Bad Bugs | #title#section#book
New XQuery Algebra Operators: UDX
• Serializer UDX serializes the query result as XML
• XQuery String UDX evaluates the XQuery string() function
• XQuery Data UDX evaluates the XQuery data() function
• Check UDX validates XML being inserted
• UDX name visible in SSMS properties window
Optimal Use Of XQuery: Atomization of nodes
Value comparisons, XQuery casts and value() method casts require atomization of item
• Attribute:
    /person[@age = 42] → /person[data(@age) = 42]
• Atomic typed element:
    /person[age = 42] → /person[data(age) = 42]
• Untyped, mixed content typed element (adds UDX):
    /person[age = 42] → /person[data(age) = 42] → /person[string(age) = 42]
• If only one text node for untyped element (better):
    /person[age/text() = 42] → /person[data(age/text()) = 42]
• value() method on untyped elements:
    value('/person/age', 'int') → value('/person/age/text()', 'int')
string() aggregates all text nodes, prohibits index use
Optimal Use Of XQuery: Casting Values
Value comparisons require casts and type promotion
• Untyped attribute:
    /person[@age = 42] → /person[xs:decimal(@age) = 42]
• Untyped text node():
    /person[age/text() = 42] → /person[xs:decimal(age/text()) = 42]
• Typed element (typed as xs:int):
    /person[salary = 3e4] → /person[xs:double(salary) = 3e4]
Casting is expensive and prohibits index lookup
Tips to avoid casting
• Use appropriate types for comparison (string for untyped)
• Use schema to declare type
Optimal Use Of XQuery: Maximize XPath expressions
Single paths are more efficient than twig paths
Avoid predicates in the middle of path expressions
    book[@ISBN = "1-8610-0157-6"]/author[first-name = "Davis"]
    /book[@ISBN = "1-8610-0157-6"] "∩" /book/author[first-name = "Davis"]
Move ordinals to the end of path expressions
• Make sure you get the same semantics!
• /a[1]/b[1] ≠ (/a/b)[1] ≠ /a/b[1]
• (/book/@isbn)[1] is better than /book[1]/@isbn
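Why the three ordinal placements are not interchangeable can be seen on a small fragment. This is an illustrative sketch; the sample XML is made up so that each expression selects a different result.

```sql
DECLARE @x xml;
-- Three top-level <a> elements: the first has no <b> children.
SET @x = N'<a/><a><b>1</b><b>2</b></a><a><b>3</b></a>';

SELECT
    @x.query('/a[1]/b[1]') AS FirstBofFirstA,  -- empty: the first <a> has no <b>
    @x.query('(/a/b)[1]')  AS FirstBOverall,   -- <b>1</b>
    @x.query('/a/b[1]')    AS FirstBofEachA;   -- <b>1</b><b>3</b>
```

query() is used instead of value() here because /a/b[1] is not a static singleton and value() would reject it.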
Optimal Use Of XQuery: Maximize XPath expressions in exist()
Use context item in predicate to lengthen path in exist()
• Existential quantification makes returned node irrelevant
• BAD:
    SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject[text() = "security"]')
• GOOD:
    SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject/text()[. = "security"]')
• BAD:
    SELECT * FROM docs WHERE 1 = xCol.exist('/book[@price > 9.99 and @price < 49.99]')
• GOOD:
    SELECT * FROM docs WHERE 1 = xCol.exist('/book/@price[. > 9.99 and . < 49.99]')
This does not work with or-predicate
Optimal Use Of XQuery: Inefficient operations: Parent axis
Most frequent offender: parent axis with nodes()
• BAD:
    select o.value('../@id', 'int') as CustID, o.value('@id', 'int') as OrdID
    from T cross apply x.nodes('/doc/customer/orders') as N(o)
• GOOD:
    select c.value('@id', 'int') as CustID, o.value('@id', 'int') as OrdID
    from T cross apply x.nodes('/doc/customer') as N1(c)
           cross apply c.nodes('orders') as N2(o)
Optimal Use Of XQuery: Inefficient operations
Avoid descendant axes and // in the middle of path expressions if the data structure is known.
• // still can use the HID lookup, but is less efficient
XQuery construction performs worse than FOR XML
• BAD:
    SELECT notes.query('
      <Customer cid="{sql:column(''cid'')}">{
        <name>{sql:column("name")}</name>, /
      }</Customer>')
    FROM Customers WHERE cid=1
• GOOD:
    SELECT cid as "@cid", name, notes as "*"
    FROM Customers WHERE cid=1
    FOR XML PATH('Customer'), TYPE
Optimal Use Of FOR XML
Use TYPE directive when assigning result to XML
• BAD:
    declare @x xml;
    set @x = (select * from Customers for xml raw);
• GOOD:
    declare @x xml;
    set @x = (select * from Customers for xml raw, type);
Use FOR XML PATH for complex grouping and additional hierarchy levels over FOR XML EXPLICIT
Use FOR XML EXPLICIT for complex nesting if FOR XML PATH performance is not appropriate
XML Indices
Create XML index on XML column
    CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc)
Create secondary indexes on tags, values, paths
Creation:
• Single-threaded only for primary XML index
• Multi-threaded for secondary XML indexes
Uses:
• Primary Index will always be used if defined (not a cost based decision)
• Results can be served directly from index
• SQL's cost based optimizer will consider secondary indexes
Maintenance:
• Primary and Secondary Indices will be efficiently maintained during updates
• Only subtree that changes will be updated
• No online index rebuild
• Clustered key may lead to non-linear maintenance cost
Schema revalidation still checks whole instance
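For reference, a minimal sketch of creating a primary XML index and the three kinds of secondary index. The table docs_demo and all index names are hypothetical; the base table needs a clustered primary key before a primary XML index can be created.

```sql
-- Hypothetical table; an integer identity key follows the clustering advice above.
CREATE TABLE dbo.docs_demo (
    id   int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    xDoc xml NOT NULL
);
GO

CREATE PRIMARY XML INDEX pxi_docs_demo ON dbo.docs_demo (xDoc);
GO

-- Secondary XML indexes are defined on top of the primary XML index.
CREATE XML INDEX sxi_docs_demo_path ON dbo.docs_demo (xDoc)
    USING XML INDEX pxi_docs_demo FOR PATH;
CREATE XML INDEX sxi_docs_demo_value ON dbo.docs_demo (xDoc)
    USING XML INDEX pxi_docs_demo FOR VALUE;
CREATE XML INDEX sxi_docs_demo_property ON dbo.docs_demo (xDoc)
    USING XML INDEX pxi_docs_demo FOR PROPERTY;
GO
```

In practice one would probably create only the secondary indexes that match the query workload, since each one adds maintenance cost on insert and update.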
Example Index Contents
    insert into Person values (42,
    '<book ISBN="1-55860-438-3">
       <section>
         <title>Bad Bugs</title>
         Nobody loves bad bugs.
       </section>
       <section>
         <title>Tree Frogs</title>
         All right-thinking people <bold>love</bold> tree frogs.
       </section>
     </book>')
Primary XML Index
    CREATE PRIMARY XML INDEX PersonIdx ON Person (Pdesc)
Assumes typed data; columns and values are simplified, see VLDB 2004 paper for details
PK | XID   | TAG ID      | Node      | Type-ID       | VALUE                     | HID
42 | 1     | 1 (book)    | Element   | 1 (bookT)     | null                      | #book
42 | 1.1   | 2 (ISBN)    | Attribute | 2 (xs:string) | 1-55860-438-3             | #@ISBN#book
42 | 1.3   | 3 (section) | Element   | 3 (sectionT)  | null                      | #section#book
42 | 1.3.1 | 4 (title)   | Element   | 2 (xs:string) | Bad Bugs                  | #title#section#book
42 | 1.3.3 | --          | Text      | --            | Nobody loves bad bugs.    | #text()#section#book
42 | 1.5   | 3 (section) | Element   | 3 (sectionT)  | null                      | #section#book
42 | 1.5.1 | 4 (title)   | Element   | 2 (xs:string) | Tree Frogs                | #title#section#book
42 | 1.5.3 | --          | Text      | --            | All right-thinking people | #text()#section#book
42 | 1.5.5 | 7 (bold)    | Element   | 4 (boldT)     | love                      | #bold#section#book
42 | 1.5.7 | --          | Text      | --            | tree frogs                | #text()#section#book
Secondary XML Indices
[Diagram: an XML column, its primary XML index, and the secondary indices built on it]
• XML Column in table T(id, x): rows with id 1, 2, 3, each holding a Binary XML instance
• Primary XML Index (1 per XML column), clustered on Primary Key (of table T), XID; columns: PK, XID, NID, TID, VALUE, LVALUE, HID, xsinil, …
• Non-clustered Secondary Indices (n per primary Index): Value Index, Path Index, Property Index
Demo: XQueries And XML Indices
Takeaway: XML Indices
• PRIMARY XML Index: Use when lots of XQuery
• FOR VALUE: Useful for queries where values are more selective than paths such as //*[.="Seattle"]
• FOR PATH: Useful for Path expressions: avoids joins by mapping paths to hierarchical index (HID) numbers. Example: /person/address/zip
• FOR PROPERTY: Useful when optimizer chooses other index (for example, on relational column, or FT Index) in addition so row is already known
Shredding Approaches
Approach | Complex Shapes | Bulkload | Server vs Midtier | Business logic | Programming | Scale/Performance
SQLXML Bulkload with annotated schema | Yes with limits | Yes | midtier | staging tables on server, XSLT on midtier | annotated XSD and small API | very good/very good
ADO.Net DataSet | No | No | midtier | midtier, SSIS | DataSet API or SSIS | good/good
CLR Table-valued function | Yes | No | Server or midtier | Server or midtier | C#, VB custom code | limited/good
OpenXML | Yes | No | Server | T-SQL | declarative T-SQL, XPath against variable | limited/good
nodes() | Yes | No | Server | T-SQL | declarative SQL, XQuery against var or table | good/careful
To Promote or Not Promote…
Promotion pre-calculates paths
Requires relational query
• XQuery does not know about promotion
Promotion during loading of the data
• Using any of the shredding mechanisms
• 1-to-1 or 1-to-many relationships
Promotion using computed columns
• 1-to-1 only
• Persist computed column: Fast lookup and retrieval
• Relational index on persisted computed column: Fast lookup
• Have to be precise
Promotion using Triggers
• 1-to-1 or 1-to-many relationships
• Trigger overhead
Relational View over XML data
• Filters on relational view are not pushed down due to different type/value system
Promotion using computed columns
Use a schema-bound UDF that encapsulates XQuery
Persist computed column
• Fast lookup and retrieval
Relational index on persisted computed column
• Fast lookup
Query will have to use the schema-bound UDF to match
CAVEAT: No parallel plans with a persisted computed column based on a UDF
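A sketch of the promotion pattern described above. The table books_demo, the function udf_first_title, the column first_title and the /book/section/title path are all assumptions chosen for the example, not objects from the session.

```sql
CREATE TABLE dbo.books_demo (
    id   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    xCol xml NOT NULL
);
GO

-- Schema-bound scalar UDF that encapsulates the XQuery (1-to-1 promotion only).
CREATE FUNCTION dbo.udf_first_title (@x xml)
RETURNS nvarchar(200)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @x.value('(/book/section/title/text())[1]', 'nvarchar(200)');
END;
GO

-- Persisted computed column plus a relational index on it.
ALTER TABLE dbo.books_demo
    ADD first_title AS dbo.udf_first_title(xCol) PERSISTED;
GO
CREATE INDEX ix_books_demo_first_title ON dbo.books_demo (first_title);
GO

-- The query references the same UDF so it can be matched to the computed column.
SELECT id
FROM dbo.books_demo
WHERE dbo.udf_first_title(xCol) = N'Bad Bugs';
```

Querying the first_title column directly is the other obvious option; either way the caveat about parallel plans still applies.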
Use of Full-Text Index for Optimization
Can provide improvement for XQuery contains() queries
Query for documents where section title contains "optimization"
    SELECT * FROM docs
    WHERE 1 = xCol.exist('/book/section/title/text()[contains(.,"optimization")]')
Use Fulltext index to prefilter candidates (includes false positives)
    SELECT * FROM docs
    WHERE contains(xCol, 'optimization')
      AND 1 = xCol.exist('/book/section/title/text()[contains(.,"optimization")]')
Futures: Selective XML Index
    CREATE SELECTIVE XML INDEX pxi_index ON Tbl(xmlcol)
    FOR (
        -- the first four match XQuery predicates
        -- in all XML data type methods
        -- simple flavor - default mapping (xs:untypedAtomic),
        -- no optimization hints
        node42 = '/a/b',
        pathatc = '/a/b/c/@atc',
        -- advanced flavor - use of optimization hints
        path02 = '/a/b/c' as XQUERY 'xs:string' MAXLENGTH(25),
        node13 = '/a/b/d' as XQUERY 'xs:double' SINGLETON,
        -- the next two match value() method
        -- require regular SQL Server type semantics
        -- they can be mixed with the XQUERY ones
        -- specifying a type is mandatory for the SQL type semantics
        pathfloat = '/a/b/c' as SQL FLOAT,
        pathabd = '/a/b/d' as SQL VARCHAR(200)
    )
Session Takeaways
• Understand when and how to use XML in SQL Server
• Understand and correct common performance problems with XML and XQuery
• Shred “relational” XML to relations
• Use XML datatype for semistructured and markup scenarios
• Write your XQueries so that XML Indices can be used
• Use persisted computed columns to promote XQuery results (with caveat)
Q&A
Related Content
Optimization whitepapers
• http://msdn2.microsoft.com/en-us/library/ms345118.aspx
• http://msdn2.microsoft.com/en-us/library/ms345121.aspx
Online WebCasts
• http://www.microsoft.com/events/series/msdnsqlserver2005.mspx#SQLXML
Newsgroups & Forum
• microsoft.public.sqlserver.xml
• http://communities.microsoft.com/newsgroups/default.asp?ICP=sqlserver2005&sLCID=us
• http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=89
General XML and Databases whitepapers
• http://msdn2.microsoft.com/en-us/xml/bb190603.aspx
My E-mail: mrys@microsoft.com
My Weblog: http://sqlblog.com/blogs/michael_rys/
Thank you for attending this session and the 2011 PASS Summit in Seattle