Techniques, XSLT 2.0

Miscellaneous techniques, with XSLT 2.0

1. Order as defined in the sequence

The following question was asked on XSL-List.

Is there any possibility to set the order of the selected nodes as defined in the sequence?

i.e.
<xsl:for-each select="/node[@class = ('z', 'a', 'b')]">

I would like to process the nodes exactly as defined in the sequence z-a-b,
that means:
1) node[@class eq 'z']
2) node[@class eq 'a']
3) node[@class eq 'b']

Or do I have to use 3 <xsl:apply-templates/>.

David Carlisle replied:

Just as you have used the comma operator to order the sequence of strings, you could use it to order a sequence of nodes:

<xsl:for-each select="node[@class = 'z'],node[@class = 'a'],node[@class = 'b']">

I'm assuming the leading / in /node[@class = ('z', 'a', 'b')] was a typo since in a well formed document there can only be one element node
below / (although XDM allows more than one).

Of course if the sequence ('z', 'a', 'b') is at all regular, or even if it is not, it is probably more convenient to view this as a sorting operation:

<xsl:for-each select="node">
<xsl:sort select="index-of(('z', 'a', 'b'),@class)"/>

David further wrote:

or

<xsl:for-each select="for $c in ('z', 'a', 'b') return node[@class=$c]">
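
The sorting and "for expression" variants can be tried out with a small self-contained sketch. The sample data (a root element with three node children, held in a temporary tree) and the variable names $data and $order are assumptions made for illustration; they do not come from the original thread.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="text"/>

<!-- Hypothetical sample data, held in a temporary tree -->
<xsl:variable name="data">
    <root>
        <node class="a">A</node>
        <node class="b">B</node>
        <node class="z">Z</node>
    </root>
</xsl:variable>

<!-- The required processing order -->
<xsl:variable name="order" select="('z', 'a', 'b')"/>

<xsl:template match="/">
    <!-- Sorting approach: the sort key is the position of @class in $order -->
    <xsl:for-each select="$data/root/node[@class = $order]">
        <xsl:sort select="index-of($order, @class)"/>
        <xsl:value-of select="."/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
    <!-- "for expression" approach: visit the classes in the listed order -->
    <xsl:value-of select="for $c in $order return $data/root/node[@class = $c]" separator=""/>
</xsl:template>

</xsl:stylesheet>

With this sample data, both instructions should output ZAB. The predicate [@class = $order] restricts the sorted set to nodes whose class is actually listed in $order.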

2. Extract only numeric value

The following question was asked on XSL-List.

Only the numeric characters in the input need to be preserved; e.g.

For the input
<year>2007a</year>

Output should be
<year>2007</year>

I answered as follows (an XSLT 2.0 solution):

<year><xsl:value-of select="replace(year, '[a-zA-Z]', '')" /></year>

(Any ASCII alphabetic character, anywhere in the string, is replaced by an empty string ('').)

But there were better replies:

Jeff Sese replied:

in XSLT 2.0 you can do:

replace(.,'\D','')

this would delete all non-digit characters.

Abel Braaksma replied:

This is a faq, though the faq entry is hidden behind the name "check for integer", it actually shows how to use a regex to match an integer. It
should be easy to extract the integer part based on these regexes: http://www.dpawson.co.uk/xsl/rev2/datatypes.html#d15622e974

Or you can also use simply (\d+) which will greedily grab the number.

Or you can do tokenize($var, '\D+') which will give you all numbers. You may want to change that to tokenize($var, '\D+')[.] to remove empty
items of the sequence.
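
The replace() and tokenize() suggestions can be compared side by side in a short sketch. The input strings are made up for illustration.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="text"/>

<xsl:template match="/">
    <!-- Delete every non-digit character: gives 2007 -->
    <xsl:value-of select="replace('2007a', '\D', '')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Split on runs of non-digits; [.] drops the empty leading token: gives 12,345 -->
    <xsl:value-of select="tokenize('ab12cd345', '\D+')[.]" separator=","/>
</xsl:template>

</xsl:stylesheet>

The difference matters when the input holds more than one number: replace() glues all the digits together into one string, while tokenize() keeps each run of digits as a separate item.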

3. Special string manipulation

The following question was asked on XSL-List.

The XML file contains single strings with upper and lower letters and I need to split them into several words, always just before the first
upper letter starts.

Example:
<SomeTag>thisTextNeedToBeSplit</SomeTag>

My output should look like:

this Text Need To Be Split

I provided the following solutions.

XSLT 1.0

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method="text" />

<xsl:variable name="caps" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />

<xsl:template match="/">
    <xsl:call-template name="walkString">
        <xsl:with-param name="string" select="SomeTag" />
    </xsl:call-template>
</xsl:template>

<xsl:template name="walkString">
    <xsl:param name="string" />
    <xsl:choose>
        <xsl:when test="contains($caps, substring($string,1,1))">
            <xsl:text> </xsl:text><xsl:value-of select="substring($string,1,1)" />
            <xsl:if test="not(substring($string,2) = '')">
                <xsl:call-template name="walkString">
                    <xsl:with-param name="string" select="substring($string,2)" />
                </xsl:call-template>
            </xsl:if>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="substring($string,1,1)" />
            <xsl:if test="not(substring($string,2) = '')">
                <xsl:call-template name="walkString">
                    <xsl:with-param name="string" select="substring($string,2)" />
                </xsl:call-template>
            </xsl:if>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

XSLT 2.0

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="text" />

<xsl:template match="/">
    <xsl:variable name="result">
        <xsl:analyze-string
                select="SomeTag"
                regex="[A-Z][a-z]*">
            <xsl:matching-substring>
                <xsl:value-of select="." /><xsl:text> </xsl:text>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="." /><xsl:text> </xsl:text>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:variable>
    <xsl:value-of select="normalize-space($result)" />
</xsl:template>

</xsl:stylesheet>

G. Ken Holman suggested the following XSLT 2.0 solution:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">

<xsl:output method="text"/>

<xsl:template match="/*">
<xsl:value-of select="replace(.,'([a-z])([A-Z])','$1 $2')"/>
</xsl:template>

</xsl:stylesheet>

Dimitre Novatchev suggested the following solution:

http://www.stylusstudio.com/xsllist/200302/post60380.html

4. Using XPath 2.0 'collection' function to process a set of XML files in a directory

Let's say we have a set of XML files in a directory, and we need to produce a separate HTML file for each XML file in it.

The following is an XSLT 2.0 example for this scenario:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="html" />


<xsl:variable name="docs" select="collection('file:///E:/xml/xsleg/xslt?select=test*.xml')" />

<xsl:template match="/">
    <xsl:for-each select="$docs">
        <xsl:result-document href="{position()}.html">
            <html>
                <head>
                    <title/>
                </head>
                <body>
                    
                </body>
            </html>
        </xsl:result-document>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

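The xsl:result-document instruction is used to produce a separate HTML file for each XML file. Note that the ?select=test*.xml query on the collection URI is a Saxon convention; other processors may resolve collection URIs differently.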
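
The stylesheet above names the output files 1.html, 2.html and so on, and leaves the HTML body empty. A possible variation, sketched here as a replacement for the template in that stylesheet, derives each output name from the source file name instead. It assumes that base-uri() of each document returned by collection() is the URI of the file it was read from, which is worth checking with the processor in use. The variable name $name and the body content are invented for illustration.

<xsl:template match="/">
    <xsl:for-each select="$docs">
        <!-- Take the last path segment and swap the extension,
             e.g. .../test1.xml becomes test1.html -->
        <xsl:variable name="name"
                      select="replace(tokenize(base-uri(.), '/')[last()], '\.xml$', '.html')"/>
        <xsl:result-document href="{$name}">
            <html>
                <head>
                    <title><xsl:value-of select="$name"/></title>
                </head>
                <body>
                    <p>Root element: <xsl:value-of select="name(*)"/></p>
                </body>
            </html>
        </xsl:result-document>
    </xsl:for-each>
</xsl:template>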

5. Find the node with maximum elements

The following question was asked on XSL-List.

I have a requirement to write an XSL transformation to find the node which has the maximum number of elements.

Below is a sample XML file:

<Sample>
    <Toyota>
        <Car>Camry</Car>
        <Car>Corrola</Car>
    </Toyota>
    <Honda>
        <Car>Accord</Car>
        <Car>Civic</Car>
        <Car>Pilot</Car>
    </Honda>
    <Mitsubishi>
        <Car>Lancer</Car>
        <Car>Lancer</Car>
        <Car>Lancer</Car>
    </Mitsubishi>
    <Hyundai>
        <Car>Sonata</Car>
        <Car>Accent</Car>
    </Hyundai>
</Sample>

The XSL should return Honda and Mitsubishi.

Following were the replies to this question.

G. Ken Holman ...

XSLT 2 provides the max() function that makes this very easy.

<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                      version="2.0">

<xsl:output method="text"/>

<xsl:template match="/">
    <xsl:for-each select="/Sample/*[count(Car)=max(/Sample/*/count(Car))]">
        <xsl:value-of select="name(.)"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Ken further wrote: You can optimize the speed by putting the max value into a variable and testing against that.
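
Ken's suggestion could look like the following sketch, which is not his code. The variable name $max is an assumption; the rest follows the stylesheet above.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output method="text"/>

<xsl:template match="/">
    <!-- Compute the maximum number of Car children once -->
    <xsl:variable name="max" select="max(/Sample/*/count(Car))"/>
    <!-- Select every child of Sample that reaches that maximum -->
    <xsl:for-each select="/Sample/*[count(Car) = $max]">
        <xsl:value-of select="name(.)"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

For the sample file this should list Honda and Mitsubishi, one per line, since both have three Car children.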

Michael Kay remarked ...

Easiest (even in 2.0) is to sort elements according to the number of children and take the last:

<xsl:for-each select="Sample/*">
    <xsl:sort select="count(child::*)" data-type="number"/>
    <xsl:if test="position()=last()">
        <xsl:value-of select="name()"/>
    </xsl:if>
</xsl:for-each>

Mike further said ...

The solution *[count(*) = max(current()/*/count(*))] is easy to write, but it's very dependent on optimization. Saxon will move the condition
max(current()/*/count(*)) out of the loop if it's written this way, but not if it's written *[count(*) = max(../*/count(*))]. Even if the max() is
calculated outside the loop, you're visiting each node twice and calculating the "key" (count(*)) twice for each node. Hence the slight preference for the
sorting approach.

The most efficient solution is probably a recursive function, but that's not the easiest to write. It really calls out for a higher-order function along
the lines of saxon:highest().

Scott Trenda provided the following XSLT 1.0 solution:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:key name="cars" match="*[Car]" use="count(Car)"/>

<xsl:template match="Sample">
    <xsl:variable name="max-cars">
        <xsl:for-each select=".//*[Car]">
            <xsl:sort select="count(Car)" data-type="number"/>
            <xsl:if test="position() = last()">
                <xsl:value-of select="count(Car)"/>
            </xsl:if>
        </xsl:for-each>
    </xsl:variable>
    <xsl:for-each select="key('cars', $max-cars)">
        
        <xsl:value-of select="name()"/>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

(A nice use of RTFs and keys ...)

6. An error diagnosing utility (from XPath to surrounding context)

The following question was asked on the list, xml-dev@lists.xml.org.

Given an XPath expression that points into a document, can anyone suggest a way to extract an arbitrary chunk of the document immediately before and after that point? Say, 40 characters before and after?

Here's the use case: we have an error report that is generated as part of an XSLT batch transformation process. The error message includes the XPath that describes the element that caused the problem; e.g., /ART[1]/BM[1]/BIBL[1]/BIB[5]/NOTE[1]/P[1]. The users of the report have requested that we add a bit of context to help them assess the errors without having to open every single file...a preview snippet as it were.

It doesn't have to be a pure XML solution nor does it have to be extremely efficient in processor time. These are not large documents and it would be okay to reread the files as part of the error report generation.

I provided the following XSLT 2.0 solution, using the Saxon extension function, saxon:evaluate.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                        xmlns:xs="http://www.w3.org/2001/XMLSchema"
                        xmlns:saxon="http://saxon.sf.net/"
                        xmlns:fn="http://custom-functions"
                        version="2.0">

<xsl:output method="text" />

<xsl:variable name="xpath" select="'xpath expression ...'" />

<xsl:template match="/">
    <xsl:variable name="beforeText" select="fn:beforeNow(saxon:evaluate($xpath), 40)" />
    <xsl:variable name="afterText" select="fn:afterNow(saxon:evaluate($xpath), 40)" />
    <xsl:value-of select="$beforeText" />
    <xsl:text>
</xsl:text>
    <xsl:value-of select="$xpath" />
    <xsl:text>
</xsl:text>
    <xsl:value-of select="$afterText" />
</xsl:template>

<xsl:function name="fn:beforeNow" as="xs:string">
    <xsl:param name="items" as="item()*" />
    <xsl:param name="x" as="xs:integer" />

    <!-- All text that precedes the selected node(s), joined into one string -->
    <xsl:variable name="text" select="string-join($items/preceding::text(),'')" />
    <!-- Keep the last $x characters (the whole string if it is shorter than $x) -->
    <xsl:variable name="result" select="substring($text, string-length($text) - $x + 1)" />

    <xsl:sequence select="$result" />
</xsl:function>

<xsl:function name="fn:afterNow" as="xs:string">
    <xsl:param name="items" as="item()*" />
    <xsl:param name="x" as="xs:integer" />

    <!-- Keep the first $x characters of the text that follows the selected node(s) -->
    <xsl:variable name="result" select="substring(string-join($items/following::text(),''), 1, $x)" />

    <xsl:sequence select="$result" />
</xsl:function>

</xsl:stylesheet>

7. 'except' operator on atomic integer values

I asked the following question on XSL-List.

I needed to do a set difference operation using the 'except' operator provided in XPath 2.0. My constraint was that the input data is a set of atomic integer values, like (1, 2, 3).

My first attempt was the following stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                       version="2.0">

<xsl:output method="text" />

<xsl:template match="/">
   <xsl:variable name="seq1" select="(1,2,3)" />
   <xsl:variable name="seq2" select="(1,2)" />

   <xsl:value-of select="$seq1 except $seq2" />

</xsl:template>

</xsl:stylesheet>

This proved to be wrong, and Saxon 9-b produced the following error:

XPTY0004: Required item type of first operand of 'except' is node(); supplied value has item type xs:integer.

If we look at the semantics of the 'except' operator in the F&O spec (http://www.w3.org/TR/xpath-functions/#func-except), the arguments to this operator are of type node()*, and the return type is node()*. This means the except operator works on the identity of nodes in a tree, not on atomic values.

There are two solutions possible for this problem.

1) Use a temporary tree, and use 'except' operator

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                     version="2.0">

<xsl:output method="text" />

<xsl:template match="/">
   <xsl:variable name="integers">
     <one>1</one>
     <two>2</two>
     <three>3</three>
   </xsl:variable>

   <xsl:value-of select="($integers/one, $integers/two, $integers/three) except ($integers/one, $integers/two)" />

</xsl:template>

</xsl:stylesheet>

2) Use an XPath expression to compare by values

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                       version="2.0">

<xsl:output method="text" />

<xsl:template match="/">
   <xsl:variable name="seq1" select="(1,2,3)" />
   <xsl:variable name="seq2" select="(1,2)" />

   <xsl:value-of select="$seq1[not(. = $seq2)]" />

</xsl:template>

</xsl:stylesheet>

Please note: It's interesting to see the following example from the F&O spec: http://www.w3.org/TR/xpath-functions/#value-except, which illustrates the same idea.

Thanks to Michael Kay, Colin Paul Adams and G. Ken Holman for ideas.

8. Stack implementation with XSLT 2.0

There was an interesting discussion in one of the threads on XSL-List, about how to implement a Stack data structure in an XSLT stylesheet.

Certainly, it's impossible (or nearly impossible) to implement mutable stacks in pure XSLT (in both the 1.0 and 2.0 versions of the language). This is because of the well-known feature of XSLT that a variable cannot be modified once it has been bound.

I had the following thoughts about this:

"To implement Stack in the XSLT stylesheet, we might use a Java extension for using Stack in an external Java object."

Using a Stack in an external Java object is a good option, if we are ready to tolerate mutable objects and side-effect-producing code in the stylesheet.

Michael Kay commented on my view as follows:

If you're going to resort to escaping to Java and using mutable objects that way, then you can probably use a completely different algorithm. But that's cheating...

In any case, using XSLT 2.0 sequences to maintain a stack is really easy.

Florent Georges provided the following implementation of Stack for XSLT 2.0:

<xsl:function name="x:push" as="item()+">
    <xsl:param name="stack" as="item()*"/>
    <xsl:param name="item" as="item()"/>
    <xsl:sequence select="$item, $stack"/>
</xsl:function>

<xsl:function name="x:pop" as="item()*">
    <xsl:param name="stack" as="item()*"/>
    <xsl:sequence select="remove($stack, 1)"/>
</xsl:function>

<xsl:function name="x:top" as="item()?">
    <xsl:param name="stack" as="item()*"/>
    <xsl:sequence select="$stack[1]"/>
</xsl:function>

(This is nice ...)

Michael Kay further commented:

One very minor observation: I think (though I would need to verify by testing) that in Saxon this might perform better if you added elements at the end rather than the start. In particular, this might allow the "new" stack to share underlying space with the "old" stack in many cases, and to avoid physical copying.

(I seem to be unusual in that I think of the top of the stack as being at the high-address end. Comes from years of exposure to a hardware architecture that worked that way.)
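
Michael's variant, with the top of the stack at the end of the sequence, might look like the sketch below. It is an illustration rather than code from the thread: the namespace URI bound to the prefix x is made up, and whether this form really performs better in Saxon is untested here, as Michael himself notes.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:x="http://example.org/stack"
                exclude-result-prefixes="x"
                version="2.0">

<xsl:output method="text"/>

<!-- push: append the new item at the end of the sequence -->
<xsl:function name="x:push" as="item()+">
    <xsl:param name="stack" as="item()*"/>
    <xsl:param name="item" as="item()"/>
    <xsl:sequence select="$stack, $item"/>
</xsl:function>

<!-- pop: everything except the last item -->
<xsl:function name="x:pop" as="item()*">
    <xsl:param name="stack" as="item()*"/>
    <xsl:sequence select="subsequence($stack, 1, count($stack) - 1)"/>
</xsl:function>

<!-- top: the last item, or the empty sequence for an empty stack -->
<xsl:function name="x:top" as="item()?">
    <xsl:param name="stack" as="item()*"/>
    <xsl:sequence select="$stack[last()]"/>
</xsl:function>

<!-- Usage: push 'a' then 'b', read the top, pop, read the top again -->
<xsl:template match="/">
    <xsl:variable name="s" select="x:push(x:push((), 'a'), 'b')"/>
    <xsl:value-of select="x:top($s), x:top(x:pop($s))"/>
</xsl:template>

</xsl:stylesheet>

Run against any input document, this should output "b a". Each call returns a new sequence and $s itself is never modified, which is how the stack works without mutable variables.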

9. Benefits of Schema Aware XPath processing

Roger L. Costello provided the following useful arguments on XSL-List for schema-aware XPath processing:

Consider this XML document:

<?xml version="1.0"?>
<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillan Publishing</Publisher>
</Book>

Here is an XPath expression to count the number of <Author> elements:

count(/Book/Authr)

Notice that Author has been accidentally misspelled in the XPath expression.

The XML document conforms to this XML Schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

<xs:element name="Book">
    <xs:complexType>
        <xs:sequence>
            <xs:element ref="Title" />
            <xs:element ref="Author" minOccurs="0" maxOccurs="unbounded" />
            <xs:element ref="Date" />
            <xs:element ref="ISBN" />
            <xs:element ref="Publisher" />
        </xs:sequence>
    </xs:complexType>
</xs:element>
<xs:element name="Title" type="xs:string"/>
<xs:element name="Author" type="xs:string"/>
<xs:element name="Date" type="xs:string"/>
<xs:element name="ISBN" type="xs:string"/>
<xs:element name="Publisher" type="xs:string"/>

</xs:schema>

Note that it is particularly important to design the XPath in such a way that the processor catches the misspelled tag name, since the XML Schema declares the number of occurrences of the <Author> element to be 0-to-unbounded. The XPath count function may return a result of 0,
which is a legitimate value and so the misspelling error may go undetected for a long time.

It should be possible for the XPath processor to detect, by consulting the XML Schema, that Authr is not a legal child of Book and generate an error or warning.

And it is possible. However, it cannot be accomplished entirely within XPath; features from the host language must be utilized.

For example, if the host language is XSLT then first create a variable for the <Book> element and use the XSLT variable declaration capability to specify its type, using the "as" attribute:

<xsl:variable name="bk" select="/Book" as="schema-element(Book)" />

Then use the variable in the XPath expression:

count($bk/Authr)

Now the processor will generate an error or warning message. Saxon generates this warning: "The complex type of element Book does not allow a child element named Authr".
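
Put together, a minimal schema-aware stylesheet might look like the following sketch. The file name Book.xsd is a hypothetical name for the schema shown above. The sketch also assumes a schema-aware XSLT 2.0 processor and a source document that is validated against the schema on input; without validation, /Book would not match the type schema-element(Book).

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<!-- Book.xsd is a hypothetical file name for the schema shown above -->
<xsl:import-schema schema-location="Book.xsd"/>

<xsl:output method="text"/>

<xsl:template match="/">
    <!-- Declaring the type lets the processor check child names statically -->
    <xsl:variable name="bk" select="/Book" as="schema-element(Book)"/>
    <!-- Correctly spelled; changing Author to Authr should now be reported -->
    <xsl:value-of select="count($bk/Author)"/>
</xsl:template>

</xsl:stylesheet>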

10. Using the XPath 2.0 "for expression"

XPath 2.0 has an interesting and quite useful syntax: the "for expression", which lets us write loops inside XPath expressions.

The XPath 2.0 "for expression" has two flavors:

1) Using an XPath sequence to drive the iterations

e.g.,

for $x in $seq return expression

Here $seq is a sequence. $x is the current iterator value, which can be utilized in the 'expression'.

2) Using a range expression to drive the iterations

e.g.,

for $i in $start to $end return expression

Here $start and $end are xs:integer values. $i is the current iterator value, which can be utilized in the 'expression'.

How "for expression" fits into a larger XPath expression:

Examples:

/a/b[count(for $x in p return ..) > 0]

<xsl:value-of select="for $i in $start to $end return concat('xyz', $i)" />

Note: As must be evident, the XPath "for expression" is different from the XSLT for-each instruction.
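
Both flavors can be seen at work in a small sketch. The literal values used here are arbitrary and chosen only for illustration.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="text"/>

<xsl:template match="/">
    <!-- Flavor 1: a sequence drives the iterations; gives A B C -->
    <xsl:value-of select="for $x in ('a', 'b', 'c') return upper-case($x)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Flavor 2: a range expression drives the iterations; gives 1, 4, 9 -->
    <xsl:value-of select="for $i in 1 to 3 return $i * $i" separator=", "/>
</xsl:template>

</xsl:stylesheet>

In both cases the "for expression" returns a sequence, which xsl:value-of then joins using its separator (a single space by default when the select attribute is used).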

Last Updated: Mar 22, 2009