Querying XML fields using t-sql
· 2010-10-06 15:48 by Thijs Kroesbergen for Brokenwire.NET
Recently I got several questions about this topic, so I thought it would be nice to write a little bit about how to use XML inside SQL Server.
First of all: *don’t* query XML using substring(), pos() and other string manipulations. If you still do ugly stuff like that then you have to read this article and you also have to buy a big pie for your co-workers.
So we had this project where we had several nvarchar(max) fields which actually contained xml documents. This way of storing XML is fine when you don’t need to look inside the xml data itself. But when you want to use the data inside the xml documents its convenient to use the xml datatype. This datatype is available from Sql Server 2005 and up. In this post i’ll try to introduce you to the magical world of XML inside sql server.
All examples below are in this downloadable xmldemo.sql file, so you can easily try them yourself.
The Xml datatype allows you to perform several operations on the xml data from within t-sql. Although this is not very fast, it’s often better than round-tripping and doing the xml parsing in your application layers.
You can use the xml datatype just like any other datatype. You can use it for tables, table variables and also for regular variables and procedure/function parameters.
This creates a table with one column of the xml datatype and inserts one row with a (simple) XML document:
create table #demo (field1 xml) insert into #demo (field1) values ('<document> <header>Alphabet</header> <items> <item id="a">a is for apple</item> <item id="b">b is for balloon</item> </items> </document>')
As you can observe the document structure is like this:
<document> <header>Alphabet</header> <items> <item id="a">a is for apple</item> <item id="b">b is for balloon</item> </items> </document>
Now the cool thing is that there are several methods that can be executed on the Xml datatype, such as:
In this brief introduction I’ll highlight the usage of the Value and Query methods. All methods are well documented in the MSDN library.
To get the “header” tag from our example xml document a query like this could be used:
select field1.query('/document/header') from #demo
This would return only the <header> element, with its contents. The language used to write the query is called “XQuery”, which is an XPath based query language defined by W3C. But be warned, only a subset of the official language is supported.
Say you’d like to get the xml node which has an “id” attribute with a value of “a”. You can use a query like this:
select field1.query('/document/items/item[@id="a"]') from #demo
This would return only the first of the two <item> elements, as an XML fragment in an XML field. If you are puzzeled by the syntax here, then please take a moment to look at these examples, that’ll give you a quick start.
The value method is one of the methods that I personally have used the most. With this method you can select contents from an XML field and convert it to a “regular” sql datatype (also known as scalar values).
select field1.value('(/document/items/item)', 'nvarchar(max)') from #demo
Will return: “a is for apple”, because we asked for value of the the first  node that matched the “/document/items/item” expression.
But you can use these methods & expressions also in the WHERE clause of a t-sql query. Like so:
select * from #demo where field1.value('(/document/items/item)', 'nvarchar(max)') like 'a%'
This statement would return all records which have an xml document in their “field1” column where the value first “item” tag starts with “a”. Please remember: this is nice but it will kill performance if you attempt this on a large table with loads of xml documents in it: in sql 2005 you cannot put indexes on values inside XML field! Sql server 2008 (and up) will let you create XML indexes as well.
One of the things I get asked most is how to query xml documents which have one or more xml namespaces associated.
So lets add another example record
insert into #demo (field1) values ('<document xmlns="http://www.brokenwire.net/xmldemo"> <header>Alphabet</header> <items> <item id="a">a is for apple</item> <item id="b">b is for balloon</item> </items> </document>')
Please notice that the only difference is the addition of a namespace.
Running the first .value method example query on this record will return a NULL value. This is because the xml elements now live inside the “http://www.brokenwire.net/xmldemo” namespace, and the “/document/items/item” xpath expression looks for these elements without a namespace, so it doesn’t find anything in this document.
To be able to query with namespaces there are (at least) 2 possibilities. You can either completely ignore the namespaces and go directly to the elements you need, or you can declare the namespace and use it properly in your expressions.
The quick-and-dirty and I-don’t-care-about-namespaces way:
select field1.value('(//*[local-name()="item"])', 'nvarchar(max)') from #demo
This expression will work for both the document WITH a namespace and the document WITHOUT a namespace.
The proper way:
select field1.value('declare namespace bw=http://www.brokenwire.net/xmldemo;
(/bw:document/bw:items/bw:item)', 'nvarchar(max)') from #demo
This expression will only work for the document WITH the namespace, because in the other document the elements live in the default namespace.
If you can’t get enough of this then this introduction to XQuery article on MSDN is worth reading too. You’ll need a lot more Xpath and Xquery knowledge than covered here to make really good use of the potential powers that are enabled by the xml datatype. Fortunately these are both standardized by W3C, so learning them will enable you to do much more cool stuff.
And don’t forget, you can post your ideas, notes, thoughts and “please send me the codez” requests in the comments, that’s what they are for.