XSLT 2.0 Grouping techniques
On this page, I am compiling some grouping problems on XML data,
along with their solutions in XSLT 2.0.
1. Grouping problem
The following question was asked on XSL-List.
I have the following source XML:
<All_Results>
<Result>
<Name>John</Name>
<Country>UK</Country>
<!-- other upto 100 elements -->
<Color>Red</Color>
</Result>
<Result>
<Name>John</Name>
<Country>US</Country>
<!-- other upto 100 elements -->
<Color>Green</Color>
</Result>
<Result>
<Name>Thomas</Name>
<Country>Estonia</Country>
<!-- other upto 100 elements -->
<Color>
</Color>
</Result>
<Result>
<Name>
</Name>
<Country>UK</Country>
<!-- other upto 100 elements -->
<Color>Red</Color>
</Result>
</All_Results>
Each <Result> has the same list of sub-elements, though some might not have a text
value.
I want to aggregate and get something like this:
<Totals>
<Name>
<Tag value="John" count="2" />
<Tag value="Thomas" count="1" />
</Name>
<Country>
<Tag value="UK" count="2" />
<Tag value="US" count="1" />
<Tag value="Estonia" count="1" />
</Country>
<Color>
<Tag value="Red" count="2" />
<Tag value="Green" count="1" />
</Color>
<!-- other elements grouped by element name, sorted by total
of element values-->
</Totals>
The following is an XSLT 2.0 solution for this (the sorting is not implemented):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml" indent="yes" />
  <xsl:template match="/">
    <Totals>
      <xsl:for-each select="All_Results/Result[1]/*">
        <xsl:variable name="name" select="name()" />
        <xsl:element name="{$name}">
          <xsl:for-each-group select="../../Result/*[name() = $name]" group-by=".">
            <xsl:if test="not(normalize-space(.) = '')">
              <Tag value="{.}" count="{count(current-group())}" />
            </xsl:if>
          </xsl:for-each-group>
        </xsl:element>
      </xsl:for-each>
    </Totals>
  </xsl:template>
</xsl:stylesheet>
Andrew Welch suggested:
Here's another way which doesn't rely on all
elements being present in the first <Result>:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Totals>
      <xsl:for-each-group select="/All_Results/Result/*[normalize-space()]"
                          group-by="name()">
        <xsl:element name="{current-grouping-key()}">
          <xsl:for-each-group select="current-group()" group-by=".">
            <Tag value="{current-grouping-key()}" count="{count(current-group())}"/>
          </xsl:for-each-group>
        </xsl:element>
      </xsl:for-each-group>
    </Totals>
  </xsl:template>
</xsl:stylesheet>
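Neither stylesheet above implements the sorting asked for in the question. As a minimal sketch, and assuming that "sorted by total" means the most frequent value comes first, an xsl:sort keyed on count(current-group()) could be added to the inner xsl:for-each-group of the second stylesheet:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <Totals>
      <!-- Outer grouping: one output element per distinct element name -->
      <xsl:for-each-group select="/All_Results/Result/*[normalize-space()]"
                          group-by="name()">
        <xsl:element name="{current-grouping-key()}">
          <!-- Inner grouping: one Tag per distinct value -->
          <xsl:for-each-group select="current-group()" group-by=".">
            <!-- Assumption: largest count first. Inside xsl:sort,
                 current-group() refers to the group being ordered. -->
            <xsl:sort select="count(current-group())" order="descending"/>
            <Tag value="{current-grouping-key()}" count="{count(current-group())}"/>
          </xsl:for-each-group>
        </xsl:element>
      </xsl:for-each-group>
    </Totals>
  </xsl:template>
</xsl:stylesheet>
```

Because xsl:sort is stable by default, values with equal counts should stay in order of first appearance, so for the sample input UK would still be listed before US and Estonia under <Country>.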
2. Eliminating duplicates
The following question was asked on XSL-List.
What's the best way of getting rid of duplicate nodes which contain more than
one attribute? Suppose I have the following XML:
<edge source="IGetter" target="CGetter" dependency="positive"/>
<edge source="IGetter" target="CGetter" dependency="positive"/>
<edge source="IGetter" target="CCount" dependency="positive"/>
<edge source="ICount" target="IGetter" dependency="positive"/>
<edge source="ICount" target="CGetter" dependency="positive"/>
<edge source="ICount" target="ICount" dependency="positive"/>
<edge source="ICount" target="CCount" dependency="positive"/>
<edge source="ICount" target="CCount" dependency="positive"/>
How do I get rid of one
<edge source="IGetter" target="CGetter" dependency="positive"/>
and one
<edge source="ICount" target="CCount" dependency="positive"/>
which appear twice?
The following solution uses two XPath 2.0 constructs, the quantified expression (some ... satisfies) and the deep-equal() function:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml" indent="yes" />
  <xsl:template match="x">
    <x>
      <xsl:for-each select="edge[not(some $i in preceding-sibling::edge
                                     satisfies deep-equal($i, .))]">
        <xsl:copy-of select="." />
      </xsl:for-each>
    </x>
  </xsl:template>
</xsl:stylesheet>
The stylesheet is applied to the following input XML:
<x>
  <edge source="IGetter" target="CGetter" dependency="positive"/>
  <edge source="IGetter" target="CGetter" dependency="positive"/>
  <edge source="IGetter" target="CCount" dependency="positive"/>
  <edge source="ICount" target="IGetter" dependency="positive"/>
  <edge source="ICount" target="CGetter" dependency="positive"/>
  <edge source="ICount" target="ICount" dependency="positive"/>
  <edge source="ICount" target="CCount" dependency="positive"/>
  <edge source="ICount" target="CCount" dependency="positive"/>
</x>
It produces the following output:
<?xml version="1.0" encoding="UTF-8"?>
<x>
  <edge source="IGetter" target="CGetter" dependency="positive"/>
  <edge source="IGetter" target="CCount" dependency="positive"/>
  <edge source="ICount" target="IGetter" dependency="positive"/>
  <edge source="ICount" target="CGetter" dependency="positive"/>
  <edge source="ICount" target="ICount" dependency="positive"/>
  <edge source="ICount" target="CCount" dependency="positive"/>
</x>
(Thanks to Abel Braaksma for ideas.)
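Since this page is about grouping, the same duplicates can also be removed with xsl:for-each-group. The sketch below is an alternative, not the solution given above. It assumes that two edges count as duplicates when their source, target and dependency attributes match, that all three attributes are always present, and that none of their values contains the '|' character, which is used here as a separator.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="x">
    <x>
      <!-- Composite key built from the three attributes;
           '|' is an assumed separator that must not occur in the values -->
      <xsl:for-each-group select="edge"
                          group-by="string-join((@source, @target, @dependency), '|')">
        <!-- Keep only the first edge of each group -->
        <xsl:copy-of select="current-group()[1]"/>
      </xsl:for-each-group>
    </x>
  </xsl:template>
</xsl:stylesheet>
```

Unlike deep-equal(), this key looks only at the three named attributes, so edges that differ in other attributes or in child content would still be treated as duplicates. In exchange, it avoids comparing every edge with all of its preceding siblings.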
3. Positional grouping problem
The following question was asked on XSL-List.
The input XML is as follows:
<Orders>
<StartOrderGroup>
<Id>1</Id>
</StartOrderGroup>
<Car>
<Id>2</Id>
</Car>
<Car>
<Id>3</Id>
</Car>
<Bus>
<Id>4</Id>
</Bus>
<EndOrderGroup>
<Id>5</Id>
</EndOrderGroup>
<Car>
<Id>6</Id>
</Car>
<Truck>
<Id>7</Id>
</Truck>
<StartOrderGroup>
<Id>8</Id>
</StartOrderGroup>
<Truck>
<Id>9</Id>
</Truck>
<EndOrderGroup>
<Id>10</Id>
</EndOrderGroup>
</Orders>
What I need to do is select all nodes between a <StartOrderGroup> element
and an <EndOrderGroup> element, so that I get output like:
Order
------
Car - 2
Car - 3
Bus - 4
Order
------
Truck - 9
Here's a solution to this problem from Michael Kay:
In XSLT 2.0, use
<xsl:for-each-group select="*" group-starting-with="StartOrderGroup">
  <xsl:variable name="start" select="current-group()[self::StartOrderGroup]"/>
  <xsl:variable name="end" select="current-group()[self::EndOrderGroup]"/>
  <!-- keep only the nodes strictly between the start and end markers -->
  <xsl:variable name="group"
                select="current-group()[. &gt;&gt; $start and . &lt;&lt; $end]"/>
  <order>
    <xsl:for-each select="$group">
      <xsl:value-of select="name()"/> - <xsl:value-of select="Id"/>
    </xsl:for-each>
  </order>
</xsl:for-each-group>
plus some formatting as required.
I attempted to solve this as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="text" />
  <xsl:template match="Orders">
    <xsl:for-each-group select="*" group-starting-with="StartOrderGroup">
      <xsl:text>&#10;Order&#10;</xsl:text>
      <xsl:text>-----&#10;</xsl:text>
      <xsl:variable name="curr-group" select="current-group()" />
      <!-- position of the EndOrderGroup element within the current group -->
      <xsl:variable name="indx"
                    select="index-of(for $x in $curr-group return $x/local-name(), 'EndOrderGroup')" />
      <!-- items after the StartOrderGroup (position 1) and before the EndOrderGroup -->
      <xsl:for-each select="$curr-group[position() &gt; 1 and position() &lt; $indx]">
        <xsl:value-of select="local-name()" /> - <xsl:value-of select="Id" /><xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
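In both solutions above, a group that has no matching <EndOrderGroup> would still produce an empty order heading. As a hedged variation, the sketch below nests a group-ending-with grouping inside the group-starting-with grouping, and prints an order only when a sub-group really begins with <StartOrderGroup> and ends with <EndOrderGroup>. The xsl:if test and the use of position() and last() are an illustration, not part of either solution above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="text"/>
  <xsl:template match="Orders">
    <!-- Outer grouping: a new group begins at every StartOrderGroup -->
    <xsl:for-each-group select="*" group-starting-with="StartOrderGroup">
      <!-- Inner grouping: cut that group after every EndOrderGroup -->
      <xsl:for-each-group select="current-group()" group-ending-with="EndOrderGroup">
        <!-- Only report sub-groups delimited by both markers -->
        <xsl:if test="current-group()[1][self::StartOrderGroup]
                      and current-group()[last()][self::EndOrderGroup]">
          <xsl:text>&#10;Order&#10;------&#10;</xsl:text>
          <!-- Skip the two marker elements themselves -->
          <xsl:for-each select="current-group()[position() gt 1 and position() lt last()]">
            <xsl:value-of select="local-name()"/> - <xsl:value-of select="Id"/>
            <xsl:text>&#10;</xsl:text>
          </xsl:for-each>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Using the value comparison operators gt and lt instead of > and < also avoids having to escape the less-than sign inside the attribute value.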
Last Updated: Jan 11, 2009