Schema Aware XSLT

This page compiles some facts about schema-aware XSLT processing, a feature introduced in XSLT 2.0. It is one of the more innovative features of the language and has several benefits, as explained below. I assume the reader understands the W3C XML Schema language, which is necessary for using the schema-related features of XSLT 2.0. A schema defines rules against which XML documents can be validated, and schemas play a central role in the type system of XSLT 2.0.

XSLT 2.0 stylesheets can take advantage of the W3C XML Schemas (XSD) defined for input and output documents. Schema awareness is an optional feature: the XSLT 2.0 standard defines two conformance levels for processors, the basic XSLT processor and the schema-aware XSLT processor, and a processor is not required to implement the schema-aware part of the standard.

Post Schema Validation Infoset (PSVI): An XML Schema processor can validate an XML document against a schema and give a "yes" or "no" answer on whether the document is valid. The designers of XML Schema also defined a concept known as the PSVI. When a document is validated against a schema, the schema processor can attach labels to the validated document. These labels indicate which schema definitions the elements and attributes were validated against. The data model used by XSLT and XPath is based on the PSVI, but it only retains a subset of the information in the PSVI: specifically, the type annotations attached to element and attribute nodes.

Let's say we have the following type definition in XML Schema:

<xs:complexType name="xyz">

</xs:complexType>

And we have the following element definition which uses the above type:

<xs:element name="some-element" type="xyz" />

In the XSLT 2.0 stylesheet, we can write a template to process elements of a particular type:

<xsl:template match="element(*, xyz)">
  <!-- template contents -->
</xsl:template>
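
To see the pieces together, here is a minimal sketch of a complete stylesheet, assuming a schema-aware processor and a source document that has been validated. The schema is written inline inside xsl:import-schema so that the example is self-contained; the content model of xyz (a single value child) and the output element names are my own placeholders, not part of the original example.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <!-- Inline schema with no target namespace.
       The content of xyz is a hypothetical filler. -->
  <xsl:import-schema>
    <xs:schema>
      <xs:complexType name="xyz">
        <xs:sequence>
          <xs:element name="value" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
      <xs:element name="some-element" type="xyz"/>
    </xs:schema>
  </xsl:import-schema>

  <!-- Matches any element annotated with type xyz (or a type derived
       from it), whatever the element's name. -->
  <xsl:template match="element(*, xyz)">
    <typed-match name="{name()}"/>
  </xsl:template>

  <!-- Lower-priority fallback: elements that were not validated,
       or that were validated against some other type. -->
  <xsl:template match="*">
    <other-match name="{name()}"/>
  </xsl:template>

</xsl:stylesheet>
```

If the source is not validated, some-element carries no xyz annotation and falls through to the second template, which is an easy way to notice that validation was not switched on.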


Declaring types in stylesheet
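XSLT 2.0 allows us to declare the type of a variable.

e.g. <xsl:variable name="var-name" as="xs:integer" select="42" />

This works whether or not the XSLT processor is schema aware, and whether or not the XML document was validated against a schema, because xs:integer is a built-in type. A value is supplied with select here because a variable with no select attribute and no content is an empty sequence, which does not match xs:integer.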

Types can also be declared for other things, such as the return type of a function or of a named template.

e.g. <xsl:function name="ns:func-name" as="xs:integer">
         <!-- function body -->
      </xsl:function>

(the above function returns an integer value)

or, <xsl:template name="template-name" as="element()?">
        <!-- template body -->
     </xsl:template>

(the above template returns an element node or an empty sequence)

Type declarations are useful in the following ways:
1) The XSLT processor has extra information about the values that the variable can take. This allows the processor to generate more efficient code.
2) The XSLT processor can check that the values supplied to a variable match the declared type. If a value doesn't match the type, the processor reports an error. This results in faster debugging: the earlier an error is found, the shorter the development time.
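
The second point can be illustrated with a small sketch that needs only a basic XSLT 2.0 processor. The function name my:double and its namespace URI are made up for this illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:my="http://example.com/my-functions"
                exclude-result-prefixes="xs my"
                version="2.0">

  <!-- Both the parameter and the result are declared as xs:integer -->
  <xsl:function name="my:double" as="xs:integer">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence select="$n * 2"/>
  </xsl:function>

  <xsl:template match="/">
    <result>
      <!-- Accepted: the argument is an integer -->
      <xsl:value-of select="my:double(21)"/>
      <!-- Rejected with a type error if enabled, because a string
           is not an xs:integer and is not converted automatically: -->
      <!-- <xsl:value-of select="my:double('21')"/> -->
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Without the as attributes, a call with a string argument would not be caught by a type check on the parameter, and the mistake would only surface later, if at all.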

What appears in the as attribute above is known as a sequence type descriptor. The following are some examples of sequence type descriptors that can be used in both a basic and a schema-aware XSLT processor:

xs:integer - an integer value
xs:integer* - a sequence of zero or more integers
xs:string? - a string or an empty sequence
xs:date - a date value
xs:anyAtomicType - an atomic value of any type (formerly xdt:anyAtomicType; the XSL WG moved the types of the xdt namespace into the XML Schema namespace)
node() - any node in the tree
node()* - a sequence of zero or more nodes, of any kind
element() - any element node
attribute()+ - a sequence of one or more attribute nodes
document-node() - a document node
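
As a rough sketch of how these descriptors behave, the stylesheet below declares a few typed variables and tests some values with instance of. It should run in a basic processor against any source document; the variable names and output element names are placeholders of mine.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <xsl:template match="/">
    <!-- Each select expression must produce a value matching the "as" type -->
    <xsl:variable name="total" as="xs:integer"   select="count(//*)"/>
    <xsl:variable name="names" as="xs:string*"   select="for $e in //* return name($e)"/>
    <xsl:variable name="first" as="element()?"   select="(//*)[1]"/>
    <xsl:variable name="attrs" as="attribute()*" select="//@*"/>

    <checks total="{$total}" names="{count($names)}"
            has-first="{exists($first)}" attrs="{count($attrs)}">
      <!-- true: an empty sequence matches xs:string? -->
      <check><xsl:value-of select="() instance of xs:string?"/></check>
      <!-- false: "+" requires at least one item -->
      <check><xsl:value-of select="() instance of attribute()+"/></check>
      <!-- true: three integers match xs:integer* -->
      <check><xsl:value-of select="(1, 2, 3) instance of xs:integer*"/></check>
      <!-- false: three items do not match xs:integer (exactly one) -->
      <check><xsl:value-of select="(1, 2, 3) instance of xs:integer"/></check>
    </checks>
  </xsl:template>

</xsl:stylesheet>
```

The occurrence indicators are the part that is easiest to get wrong: no indicator means exactly one item, ? means zero or one, * means zero or more, and + means one or more.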

[Diagram not reproduced here: the type hierarchy used in XSLT 2.0 and XPath 2.0.]

The types in this hierarchy include the built-in types defined by the XML Schema specification (on the right of the diagram) and the types defined in the XPath 2.0 data model (on the left).

A schema-aware XSLT processor adds the following type-related features:
1) All the built-in atomic types of the XML Schema specification become available (a basic processor supports only a subset of them).
2) User-defined types can be imported from an XML Schema definition.

This is a very powerful way of writing XSLT stylesheets, and a big change from XSLT 1.0.

To use user-defined types in a stylesheet, the XML Schema must be imported into the stylesheet with the following declaration:

<xsl:import-schema namespace="http://some-uri" schema-location="some-schema.xsd"/>

We can import any number of schemas, provided the namespaces do not clash.

Let's say we have defined a type address:pincode in the schema (a subtype of xs:string conforming to a specific pattern). We can then declare an XSLT variable as follows:

<xsl:variable name="pin" as="address:pincode" select="expression"/>
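
A self-contained sketch of this idea follows, assuming a schema-aware processor. The six-digit pattern and the literal '560001' are illustrative guesses at what a pincode type might look like; the article does not specify the actual pattern. The schema is written inline so that no separate file is needed.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:address="http://some-uri"
                exclude-result-prefixes="xs address"
                version="2.0">

  <!-- Inline schema; the pattern is a hypothetical six-digit code -->
  <xsl:import-schema namespace="http://some-uri">
    <xs:schema targetNamespace="http://some-uri">
      <xs:simpleType name="pincode">
        <xs:restriction base="xs:string">
          <xs:pattern value="[0-9]{6}"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:schema>
  </xsl:import-schema>

  <xsl:template match="/">
    <!-- The constructor function address:pincode() checks the pattern.
         A value such as '56-001' would be rejected with an error. -->
    <xsl:variable name="pin" as="address:pincode"
                  select="address:pincode('560001')"/>
    <pin><xsl:value-of select="$pin"/></pin>
  </xsl:template>

</xsl:stylesheet>
```

Note that a plain string literal would not be accepted for $pin: the value has to be an instance of address:pincode, which is why the constructor function is used.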


Validating Source Documents
The source XML document can be validated against an XML Schema. Validation produces two important pieces of information:
1) A yes or no answer, specifying whether the document is valid according to the rules of the schema.
2) Type annotations (which are like labels) attached to the nodes of the source document, recording which schema type each node was validated against.

A schema-annotated XML document can be put to the following uses:
1) Many operations on nodes require the typed value of a node. Extracting the typed value is known as atomization, and for a validated node the result has the type assigned by the schema rather than being untyped.
2) Various constructs in XSLT can refer to types defined in the schema (e.g. variable types, function and template return types, function and template parameter types). Declaring types in the stylesheet means that many type errors are detected very early in the development cycle, which reduces the overall delivery time. With XSLT 1.0, because of its very limited set of data types, the processor often reports no error and we simply get wrong results, which leads to longer debugging cycles.
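
The effect of typed values can be sketched with a sort. The prices/price vocabulary below is invented for this illustration, and a schema-aware processor is assumed. When the source is validated, each price atomizes to an xs:decimal and the sort is numeric; when it is not validated, the values are untyped and are sorted as strings.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <!-- Hypothetical schema: a list of decimal prices -->
  <xsl:import-schema>
    <xs:schema>
      <xs:element name="prices">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="price" type="xs:decimal" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </xsl:import-schema>

  <xsl:template match="prices">
    <!-- typed is "true" only if the source was validated -->
    <sorted typed="{data(price[1]) instance of xs:decimal}">
      <xsl:for-each select="price">
        <!-- No data-type attribute: the sort follows the typed value -->
        <xsl:sort select="."/>
        <price><xsl:value-of select="."/></price>
      </xsl:for-each>
    </sorted>
  </xsl:template>

</xsl:stylesheet>
```

For an input with the prices 100, 9.5 and 10, a validated source would be sorted numerically, whereas an unvalidated one would typically come out in string order, with 9.5 last. This is the kind of silently wrong result that type annotations help to avoid.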

Schema validation of the XML source cannot be requested from within the stylesheet. How validation of the source is requested is specific to the XSLT processor; with Saxon, for example, it can be done with the -val command line option. What we can do within the stylesheet is check whether the source was validated (this helps ensure that the right XML source is processed).

This stylesheet checks whether the XML source was validated:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:import-schema schema-location="books.xsd" />

<xsl:template match="/">
    <xsl:if test="not(* instance of schema-element(BOOKLIST))">
        <xsl:message terminate="yes">
            Source document is not a validated book list
        </xsl:message>
    </xsl:if>
    <xsl:apply-templates/>
</xsl:template>

</xsl:stylesheet>

The input XML used is books.xml, and schema used is books.xsd.
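
The same check can also be expressed as a match pattern on the document node instead of an xsl:if. The sketch below is only an alternative formulation, assuming a schema-aware processor. The inline schema is a deliberately loose stand-in for books.xsd, whose real contents are not shown in this article, and the output element name is a placeholder.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <!-- Hypothetical stand-in for books.xsd: BOOKLIST with any content -->
  <xsl:import-schema>
    <xs:schema>
      <xs:element name="BOOKLIST">
        <xs:complexType>
          <xs:sequence>
            <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </xsl:import-schema>

  <!-- Chosen only when the document element was validated as BOOKLIST -->
  <xsl:template match="document-node(schema-element(BOOKLIST))" priority="2">
    <validated-booklist children="{count(BOOKLIST/*)}"/>
  </xsl:template>

  <!-- Any other document: stop with a message -->
  <xsl:template match="/" priority="1">
    <xsl:message terminate="yes">
      Source document is not a validated book list
    </xsl:message>
  </xsl:template>

</xsl:stylesheet>
```

The explicit priority attributes are there so that the choice between the two templates does not depend on remembering the default priority rules.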

Validating Result Documents
The result XML document can be validated against a schema from within the stylesheet.

The following is an example of such a stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:import-schema schema-location="result-schema.xsd" />

<xsl:template match="/">
    <xsl:result-document validation="strict">
      <output>
         <xsl:apply-templates/>
      </output>
    </xsl:result-document>
</xsl:template>

</xsl:stylesheet>

The result-schema.xsd file is:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="output" type="xs:string" />
</xs:schema>

The input XML used is books.xml.

Another way of validating the result tree is to use the type attribute on the xsl:result-document element. The value of the type attribute is a QName that names a global type definition in the schema. We can use either the validation attribute or the type attribute on xsl:result-document, but not both.
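
A minimal sketch of the type attribute follows, assuming a schema-aware processor. The type name output-type, the title element and the //TITLE path are assumptions made for this illustration; they are not taken from result-schema.xsd or books.xml.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <!-- Hypothetical schema: a named global type, no element declaration -->
  <xsl:import-schema>
    <xs:schema>
      <xs:complexType name="output-type">
        <xs:sequence>
          <xs:element name="title" type="xs:string"
                      minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:schema>
  </xsl:import-schema>

  <xsl:template match="/">
    <!-- The single element child of the result document is validated
         against output-type, whatever that element is called. -->
    <xsl:result-document type="output-type">
      <output>
        <xsl:for-each select="//TITLE">
          <title><xsl:value-of select="."/></title>
        </xsl:for-each>
      </output>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```

The difference from validation="strict" is that strict validation looks up a global element declaration by the name of the output element, while type names the type to validate against directly, so the schema does not need to declare the output element at all.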

Using types in XPath
Let's say we have defined the following element in the schema:
<xs:element name="elem" type="type-name"/>

The XPath expression //element(*, type-name) selects all element nodes in the document whose type annotation is type-name (or a type derived from it).

We can also test whether a variable contains an element or attribute of a particular type, as follows:
if ($x instance of element(*, type-name)) ...
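
Both expressions can be tried out with a sketch like the following, assuming a schema-aware processor and a validated source. The definition of type-name as a restriction of xs:string and the output element names are placeholders, since the article leaves the type unspecified.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <!-- Hypothetical definition of type-name -->
  <xsl:import-schema>
    <xs:schema>
      <xs:simpleType name="type-name">
        <xs:restriction base="xs:string"/>
      </xs:simpleType>
      <xs:element name="elem" type="type-name"/>
    </xs:schema>
  </xsl:import-schema>

  <xsl:template match="/">
    <!-- Count of all elements annotated with type-name -->
    <report typed-elements="{count(//element(*, type-name))}">
      <xsl:for-each select="//*">
        <xsl:variable name="x" select="."/>
        <!-- Per-element test with instance of -->
        <node name="{name($x)}"
              is-type-name="{$x instance of element(*, type-name)}"/>
      </xsl:for-each>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Run against an unvalidated document, the count is 0 and every is-type-name attribute is false, because no node carries the type annotation.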

I'll end the article here. I hope I have been able to convey the power of using schemas within XSLT stylesheets. There are many other details in the XSLT language related to schemas; I encourage you to read the resources below.

References:
1) XSL Transformations (XSLT) Version 2.0
2) XML Path Language (XPath) 2.0
3) XQuery 1.0 and XPath 2.0 Functions and Operators
4) XSLT 2.0 Programmer's Reference, 3rd Edition - by Michael Kay
5) XPath 2.0 Programmer's Reference - by Michael Kay
6) Saxon XSLT processor - All the examples mentioned in this article are tested with Saxon.