XSLT 2.0 Grouping techniques
 

On this page I am compiling some grouping problems on XML data, together with their solutions in XSLT 2.0.
 

1. Grouping problem

The following question was asked on XSL-List.

I have the following source XML:

<All_Results>
    <Result>
        <Name>John</Name>
        <Country>UK</Country>
        <!-- other upto 100 elements -->
        <Color>Red</Color>
    </Result>
    <Result>
        <Name>John</Name>
        <Country>US</Country>
        <!-- other upto 100 elements -->
        <Color>Green</Color>
    </Result>
    <Result>
        <Name>Thomas</Name>
        <Country>Estonia</Country>
        <!-- other upto 100 elements -->
        <Color>
        </Color>
    </Result>
    <Result>
        <Name>
        </Name>
        <Country>UK</Country>
        <!-- other upto 100 elements -->
        <Color>Red</Color>
    </Result>
</All_Results>

Each <Result> has the same list of sub-elements; some might not have a text value.

I want to aggregate and get something like this:

<Totals>
    <Name>
        <Tag value="John" count="2" />
        <Tag value="Thomas" count="1" />
    </Name>
    <Country>
        <Tag value="UK" count="2" />
        <Tag value="US" count="1" />
        <Tag value="Estonia" count="1" />
    </Country>
    <Color>
        <Tag value="Red" count="2" />
        <Tag value="Green" count="1" />
    </Color>
    <!-- other elements grouped by element name, sorted by total of element values-->
</Totals>

The following is an XSLT 2.0 solution (sorting is not implemented). It takes the element names from the first <Result>, so it assumes that every element appears there:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="xml" indent="yes" />

<xsl:template match="/">
    <Totals>
        <xsl:for-each select="All_Results/Result[1]/*">
            <xsl:variable name="name" select="name()" />
            <xsl:element name="{$name}">
                <xsl:for-each-group select="../../Result/*[name() = $name]" group-by=".">
                    <xsl:if test="not(normalize-space(.) = '')">
                        <Tag value="{.}" count="{count(current-group())}" />
                    </xsl:if>
                </xsl:for-each-group>
            </xsl:element>
        </xsl:for-each>
    </Totals>
</xsl:template>

</xsl:stylesheet>

Andrew Welch suggested:

Here's another way which doesn't rely on all elements being present in the first <Result>:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
    <Totals>
        <xsl:for-each-group select="/All_Results/Result/*[normalize-space()]" group-by="name()">
            <xsl:element name="{current-grouping-key()}">
                <xsl:for-each-group select="current-group()" group-by=".">
                    <Tag value="{current-grouping-key()}" count="{count(current-group())}"/>
                </xsl:for-each-group>
            </xsl:element>
        </xsl:for-each-group>
    </Totals>
</xsl:template>

</xsl:stylesheet>
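
Neither stylesheet above implements the sorting requested in the question. Here is a minimal sketch of one way to add it, assuming that "sorted by total" means most frequent value first: an xsl:sort is placed inside the inner xsl:for-each-group of Andrew's version. Grouping on normalize-space() instead of "." is my own addition, so that values differing only in surrounding whitespace are counted together:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes" />

<xsl:template match="/">
    <Totals>
        <!-- outer grouping: one output element per distinct child element name -->
        <xsl:for-each-group select="/All_Results/Result/*[normalize-space()]" group-by="name()">
            <xsl:element name="{current-grouping-key()}">
                <!-- inner grouping: one Tag per distinct (whitespace-trimmed) value -->
                <xsl:for-each-group select="current-group()" group-by="normalize-space()">
                    <!-- largest count first; the sort is stable, so ties keep first-occurrence order -->
                    <xsl:sort select="count(current-group())" order="descending" />
                    <Tag value="{current-grouping-key()}" count="{count(current-group())}" />
                </xsl:for-each-group>
            </xsl:element>
        </xsl:for-each-group>
    </Totals>
</xsl:template>

</xsl:stylesheet>

For the sample input this should give the <Tag> order shown in the desired output, because values with equal counts stay in the order in which they first appear.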


2. Eliminating duplicates

The following question was asked on XSL-List.

What's the best way of getting rid of duplicate nodes which contain more than one attribute? Suppose I have the following XML:

<edge source="IGetter" target="CGetter" dependency="positive"/>
<edge source="IGetter" target="CGetter" dependency="positive"/>
<edge source="IGetter" target="CCount" dependency="positive"/>
<edge source="ICount" target="IGetter" dependency="positive"/>
<edge source="ICount" target="CGetter" dependency="positive"/>
<edge source="ICount" target="ICount" dependency="positive"/>
<edge source="ICount" target="CCount" dependency="positive"/>
<edge source="ICount" target="CCount" dependency="positive"/>

How do I get rid of one
<edge source="IGetter" target="CGetter" dependency="positive"/>
and one
<edge source="ICount" target="CCount" dependency="positive"/>
which appear twice?

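The following solution uses two XPath 2.0 constructs: a quantified expression (some ... satisfies) and the deep-equal() function. An <edge> is copied only if no preceding sibling <edge> is deep-equal to it: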

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="xml" indent="yes" />

<xsl:template match="x">
    <x>
        <xsl:for-each select="edge[not(some $i in preceding-sibling::edge satisfies deep-equal($i, .))]">
            <xsl:copy-of select="." />
        </xsl:for-each>
    </x>
</xsl:template>

</xsl:stylesheet>

Here is the stylesheet applied to a complete input document:

<x>
    <edge source="IGetter" target="CGetter" dependency="positive"/>
    <edge source="IGetter" target="CGetter" dependency="positive"/>
    <edge source="IGetter" target="CCount" dependency="positive"/>
    <edge source="ICount" target="IGetter" dependency="positive"/>
    <edge source="ICount" target="CGetter" dependency="positive"/>
    <edge source="ICount" target="ICount" dependency="positive"/>
    <edge source="ICount" target="CCount" dependency="positive"/>
    <edge source="ICount" target="CCount" dependency="positive"/>
</x>

The output is:

<?xml version="1.0" encoding="UTF-8"?>
<x>
    <edge source="IGetter" target="CGetter" dependency="positive"/>
    <edge source="IGetter" target="CCount" dependency="positive"/>
    <edge source="ICount" target="IGetter" dependency="positive"/>
    <edge source="ICount" target="CGetter" dependency="positive"/>
    <edge source="ICount" target="ICount" dependency="positive"/>
    <edge source="ICount" target="CCount" dependency="positive"/>
</x>

(Thanks to Abel Braaksma for ideas.)
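
The preceding-sibling test compares every <edge> with all the edges before it, which can become slow on large inputs. As an alternative sketch, the duplicates can also be removed with xsl:for-each-group and a composite key. Unlike deep-equal(), this compares only the three attributes named in the key, and the '|' separator is my assumption: it must not occur in the attribute values.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="xml" indent="yes" />

<xsl:template match="x">
    <x>
        <!-- composite key built from the three attributes; '|' is assumed not to occur in their values -->
        <xsl:for-each-group select="edge" group-by="concat(@source, '|', @target, '|', @dependency)">
            <!-- inside the group, the context item is the first edge having this key -->
            <xsl:copy-of select="." />
        </xsl:for-each-group>
    </x>
</xsl:template>

</xsl:stylesheet>

Groups are processed in order of first appearance, so the surviving edges should come out in the same order as in the output shown above.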


3. Positional grouping problem


The following question was asked on XSL-List.

The input XML is as follows:

<Orders>
    <StartOrderGroup>
        <Id>1</Id>
    </StartOrderGroup>
    <Car>
        <Id>2</Id>
    </Car>
    <Car>
        <Id>3</Id>
    </Car>
    <Bus>
        <Id>4</Id>
    </Bus>
    <EndOrderGroup>
        <Id>5</Id>
    </EndOrderGroup>
    <Car>
        <Id>6</Id>
    </Car>
    <Truck>
        <Id>7</Id>
    </Truck>
    <StartOrderGroup>
        <Id>8</Id>
    </StartOrderGroup>
    <Truck>
        <Id>9</Id>
    </Truck>
    <EndOrderGroup>
        <Id>10</Id>
    </EndOrderGroup>
</Orders>

What I need to do is select all nodes between a <StartOrderGroup> element and an <EndOrderGroup> element, so that I get output like:

Order
------
Car - 2
Car - 3
Bus - 4

Order
------
Truck - 9

Here's a solution to this problem from Michael Kay:

In XSLT 2.0, use

<xsl:for-each-group select="*" group-starting-with="StartOrderGroup">
    <!-- the marker elements that delimit the order inside this group -->
    <xsl:variable name="start" select="current-group()[self::StartOrderGroup]"/>
    <xsl:variable name="end" select="current-group()[self::EndOrderGroup]"/>
    <!-- keep only the items after $start and before $end in document order -->
    <xsl:variable name="group" select="current-group()[. >> $start and . &lt;&lt; $end]"/>
    <order>
        <xsl:for-each select="$group">
            <xsl:value-of select="name()"/> - <xsl:value-of select="Id"/>
        </xsl:for-each>
    </order>
</xsl:for-each-group>

plus some formatting as required.
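
As a rough sketch of what "some formatting" could look like, the fragment can be wrapped in a complete stylesheet that writes the text layout asked for in the question. The [1] on the end marker and the xsl:if test are my own additions, not part of Michael's answer. They assume that a group might contain more than one <EndOrderGroup>, or might lack one of the two markers:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="text" />

<xsl:template match="Orders">
    <xsl:for-each-group select="*" group-starting-with="StartOrderGroup">
        <xsl:variable name="start" select="current-group()[self::StartOrderGroup]" />
        <!-- [1] keeps the node comparison below valid if a group holds several end markers -->
        <xsl:variable name="end" select="current-group()[self::EndOrderGroup][1]" />
        <!-- skip groups that lack either marker, e.g. items before the first StartOrderGroup -->
        <xsl:if test="$start and $end">
            <xsl:text>Order&#xa;------&#xa;</xsl:text>
            <!-- items strictly between the two markers, in document order -->
            <xsl:for-each select="current-group()[. >> $start and . &lt;&lt; $end]">
                <xsl:value-of select="concat(name(), ' - ', Id, '&#xa;')" />
            </xsl:for-each>
            <xsl:text>&#xa;</xsl:text>
        </xsl:if>
    </xsl:for-each-group>
</xsl:template>

</xsl:stylesheet>

Items that follow an <EndOrderGroup>, such as the Car and Truck with Ids 6 and 7 in the sample, stay in the same group but fail the ". << $end" test, so they are not printed.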

I attempted to solve this as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="text" />

<xsl:template match="Orders">
    <xsl:for-each-group select="*" group-starting-with="StartOrderGroup">
        <xsl:text>&#xa;Order&#xa;</xsl:text>
        <xsl:text>------&#xa;</xsl:text>
        <xsl:variable name="curr-group" select="current-group()" />
        <xsl:variable name="indx" select="index-of(for $x in $curr-group return $x/local-name(), 'EndOrderGroup')" />
        <xsl:for-each select="$curr-group[position() &gt; 1 and position() &lt; $indx]">
            <xsl:value-of select="local-name()" /> - <xsl:value-of select="Id" /><xsl:text>&#xa;</xsl:text>
        </xsl:for-each>
    </xsl:for-each-group>
</xsl:template>

</xsl:stylesheet>




Last Updated: Jan 11, 2009