XML Schema topics
 

On this page I've compiled some topics related to W3C XML Schema that readers might find useful. W3C XML Schema (also known as XSD; other well-known XML schema languages are RELAX NG and Schematron) is used to validate XML documents, i.e. to specify structural and data-type constraints for them. It is considerably more expressive than the earlier XML validation language, DTD (which is defined in the XML specification itself). XML Schema has gained wide popularity in recent years because other W3C specifications have aligned themselves with it (e.g. XSLT 2.0 and XQuery 1.0, which use XSD as their type system).




1. A simple Schema example

I asked the following question on the xml-dev mailing list.

It's required to validate the following XML instance (e.g. sample.xml) with XML Schema:

<OBJECTS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="sample.xsd">
   <B>jack</B>
   <A>dooley</A>
   <C>john</C>
   <B>jill</B>
   <A>mike</A>
   <B>jane</B>
</OBJECTS>

Elements A, B and C can each occur zero or more times, and can appear in any order. What XML Schema meets this requirement?

Solution
The following Schema validates this document:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified" attributeFormDefault="unqualified">
   <xs:element name="OBJECTS">
      <xs:complexType>
         <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="A" type="xs:string"/>
            <xs:element name="B" type="xs:string"/>
            <xs:element name="C" type="xs:string"/>
         </xs:choice>
      </xs:complexType>
   </xs:element>
</xs:schema>

At first glance, it seemed to me that the following XML Schema was right:

<xs:element name="OBJECTS">
   <xs:complexType>
      <xs:all>
         <xs:element name="A" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
         <xs:element name="B" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
         <xs:element name="C" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:all>
   </xs:complexType>
</xs:element>

But this is wrong in XML Schema 1.0, because for an element within xs:all, maxOccurs can only be 0 or 1; it cannot be unbounded. (XML Schema 1.1 relaxes this restriction.)
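
For contrast, here is a minimal sketch of what xs:all does permit in XML Schema 1.0, assuming each of A, B and C should occur at most once. It allows the children in any order, but it would reject the sample.xml shown above, because B occurs there three times.

```xml
<xs:element name="OBJECTS">
   <xs:complexType>
      <!-- each child is optional and may appear at most once, in any order -->
      <xs:all>
         <xs:element name="A" type="xs:string" minOccurs="0"/>
         <xs:element name="B" type="xs:string" minOccurs="0"/>
         <xs:element name="C" type="xs:string" minOccurs="0"/>
      </xs:all>
   </xs:complexType>
</xs:element>
```

This is why the repeating xs:choice is the right tool when the children may repeat.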

Thanks to G. Ken Holman for ideas.
 

2. Type inheritance and polymorphism in XML Schema

I initiated this discussion on the xml-dev mailing list.

The W3C XML Schema language allows us to construct types based on other types (by extension or restriction), a concept known as inheritance. An element declared with a base type can then hold content of any derived type, which gives us polymorphism.

Consider the following XML Schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified" attributeFormDefault="unqualified">

<xs:complexType name="shape">
    <xs:sequence>
        <xs:element name="id" type="xs:integer" />
        <xs:element name="color" type="xs:string" />
    </xs:sequence>
</xs:complexType>

<xs:complexType name="circle">
    <xs:complexContent>
        <xs:extension base="shape">
            <xs:sequence>
                <xs:element name="radius" type="xs:float" />
            </xs:sequence>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>

<xs:complexType name="rectangle">
    <xs:complexContent>
        <xs:extension base="shape">
            <xs:sequence>
                <xs:element name="side1" type="xs:float" />
                <xs:element name="side2" type="xs:float" />
                <xs:element name="side3" type="xs:float" />
                <xs:element name="side4" type="xs:float" />
            </xs:sequence>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>

<xs:element name="OBJECTS">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="OBJECT" type="shape" maxOccurs="unbounded" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

</xs:schema>

The corresponding XML instance is:

<?xml version="1.0" encoding="UTF-8"?>
<OBJECTS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="sample.xsd">
    <OBJECT xsi:type="circle">
        <id>1</id>
        <color>red</color>
        <radius>10.5</radius>
    </OBJECT>
    <OBJECT xsi:type="rectangle">
        <id>2</id>
        <color>green</color>
        <side1>4</side1>
        <side2>2</side2>
        <side3>4</side3>
        <side4>2</side4>
    </OBJECT>
</OBJECTS>

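In the XML instance, each OBJECT element uses the xsi:type attribute to name the Schema type ("circle" or "rectangle") against which it should be validated. The XML Schema declares the element "OBJECT" to be of type "shape". Because the "circle" and "rectangle" types both extend the type "shape", either of them may be used wherever "shape" is expected.

This is an example of type inheritance and polymorphism in XML Schema.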
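
To illustrate the role of xsi:type, here is a sketch of an instance (the values are made up for illustration) in which OBJECT carries no xsi:type attribute. In that case the element is validated against its declared type "shape", so only the id and color children are allowed; adding a radius child here would make the instance invalid.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<OBJECTS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="sample.xsd">
    <!-- no xsi:type: validated against the declared type "shape" -->
    <OBJECT>
        <id>3</id>
        <color>blue</color>
    </OBJECT>
</OBJECTS>
```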

Thanks to Pete Cordell for ideas.
 

3. Usage of targetNamespace in XML Schema

An XML instance can contain elements in no namespace, or in a namespace.

An example of an XML instance whose elements are in no namespace is:

<test>
   <a/>
</test>

An example of an XML instance that uses a namespace is:

<ns:test xmlns:ns="http://www.example.com/myNs">
   <a/>
</ns:test>

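It is possible to define XML Schemas that validate XML instances whose elements are in a namespace. This is done with the targetNamespace attribute on xs:schema, which places the Schema's global declarations in the given namespace.

Following is an example of an XML Schema and the corresponding XML instance.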

XML Schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="https://gandhimukul.tripod.com/mySchema" xmlns:test="https://gandhimukul.tripod.com/mySchema" elementFormDefault="unqualified" attributeFormDefault="unqualified">

<xs:element name="SAMPLE">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="x" type="xs:string" />
            <xs:element name="y" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

</xs:schema>

XML instance that is valid according to the above Schema:

<?xml version="1.0" encoding="UTF-8"?>
<test:SAMPLE xmlns:test="https://gandhimukul.tripod.com/mySchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://gandhimukul.tripod.com/mySchema tns.xsd">
    <x/>
    <y/>
</test:SAMPLE>

Note: In the above XML instance, the choice of the prefix "test" for the namespace URI "https://gandhimukul.tripod.com/mySchema" is arbitrary. Any prefix can be used, provided that it is bound to the correct namespace URI.
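
As an illustration of the point about prefixes, the following instance uses the prefix "p" (an arbitrary choice made for this sketch) and should validate against the same Schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:SAMPLE xmlns:p="https://gandhimukul.tripod.com/mySchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://gandhimukul.tripod.com/mySchema tns.xsd">
    <!-- x and y stay unprefixed: the Schema uses elementFormDefault="unqualified" -->
    <x/>
    <y/>
</p:SAMPLE>
```

Declaring the namespace as the default namespace (with xmlns="...") on SAMPLE would not be equivalent. The unprefixed children x and y would then also be in that namespace, whereas the Schema, with elementFormDefault="unqualified", expects them in no namespace.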
 

4. A custom numeric data type for an XML Schema

The following question was asked on an IBM developerWorks forum:

I am trying to define the data type which should allow the number between 1 to 4099 with the exception of 4070. How can I define?

Solution:

Consider this Schema (named test.xsd):

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="x" type="myNumber" />

<xs:simpleType name="myNumber">
    <xs:union>
        <xs:simpleType>
            <xs:restriction base="xs:positiveInteger">
                <xs:maxInclusive value="4069"/>
            </xs:restriction>
        </xs:simpleType>
        <xs:simpleType>
            <xs:restriction base="xs:positiveInteger">
                <xs:minInclusive value="4071"/>
                <xs:maxInclusive value="4099"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:union>
</xs:simpleType>

</xs:schema>

The test input is:

<x xsi:noNamespaceSchemaLocation="test.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">100</x>

(Change the value of x to something outside the allowed range, such as 4070 or 5000, and the XML Schema validator will report a validation error.)

I tested this example using Xerces-J 2.9.1, and it works fine.
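
As a possible alternative, the same constraint could be written more directly with an assertion facet. This is only a sketch and it assumes an XML Schema 1.1 processor; an XML Schema 1.0 validator will reject xs:assertion, so it does not apply to the 1.0 setup described above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="x" type="myNumber" />

<!-- XSD 1.1 only: $value refers to the value being validated -->
<xs:simpleType name="myNumber">
    <xs:restriction base="xs:positiveInteger">
        <xs:maxInclusive value="4099"/>
        <xs:assertion test="$value ne 4070"/>
    </xs:restriction>
</xs:simpleType>

</xs:schema>
```

The xs:union solution has the advantage of working with any XML Schema 1.0 validator.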
 

5. An example of the 'nillable' property on XSD element declarations

The XML Schema language allows an attribute named 'nillable' on xs:element declarations. This is a very useful XML Schema facility, but many Schema authors don't use it because they are not aware of it.

Please consider the following example, and the explanation that follows it.

person.xsd (An XML Schema representing a list of PERSON elements)

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="PEOPLE">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="PERSON" type="PersonType" minOccurs="1" maxOccurs="unbounded" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

<xs:complexType name="PersonType">
    <xs:sequence>
        <xs:element name="FNAME" type="xs:string" />
        <xs:element name="LNAME" type="xs:string" />
        <xs:element name="DOB" type="xs:date" nillable="true" />
        <xs:element name="SEX">
            <xs:simpleType>
                <xs:restriction base="xs:string">
                    <xs:enumeration value="M" />
                    <xs:enumeration value="F" />
                </xs:restriction>
            </xs:simpleType>
        </xs:element>
    </xs:sequence>
</xs:complexType>

</xs:schema>

The corresponding valid XML instance is:

<PEOPLE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="person.xsd">
    <PERSON>
        <FNAME>Mukul</FNAME>
        <LNAME>Gandhi</LNAME>
        <DOB xsi:nil="true" />
        <SEX>M</SEX>
    </PERSON>
    <PERSON>
        <FNAME>John</FNAME>
        <LNAME>Good</LNAME>
        <DOB xsi:nil="true" />
        <SEX>M</SEX>
    </PERSON>
</PEOPLE>

Notes:
1) Note the 'nillable' attribute in the Schema (on the desired element declarations), and the xsi:nil attribute on the corresponding element instances.

2) According to the XML Schema specification,
If {nillable} is true, then an element may also be valid if it carries the namespace qualified attribute with local name nil from namespace http://www.w3.org/2001/XMLSchema-instance and value true even if it has no text or element content despite a {content type} which would otherwise require content.

Both the 'nillable' attribute in the Schema and the corresponding xsi:nil attribute in an instance document take boolean values. The default value of 'nillable' is false.

In the above example, the Schema declares the element 'DOB' with the default minOccurs of 1, so 'DOB' must be present in the instance document. Its content may be empty, however, provided that xsi:nil="true" is specified. If xsi:nil is not specified in the instance document, then the content of the 'DOB' element must conform to the Schema type xs:date.

3) I tested this sample with Xerces-J, the popular open source XML parser, which implements this feature.
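
The cases described in note 2 can be summarized with a few variations of the DOB element. This is an illustrative fragment only (the date is made up, and the xsi prefix is assumed to be declared as in the instance above), with the expected outcome given in the comments:

```xml
<!-- valid: xsi:nil="true" and no content -->
<DOB xsi:nil="true" />

<!-- valid: no xsi:nil, and the content is a proper xs:date -->
<DOB>1980-05-17</DOB>

<!-- invalid: no xsi:nil, and empty content is not a valid xs:date -->
<DOB />

<!-- invalid: an element with xsi:nil="true" must be empty -->
<DOB xsi:nil="true">1980-05-17</DOB>
```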
 


Useful references

*    http://www.w3.org/TR/xmlschema-1/ (XML Schema Part 1: Structures), W3C Recommendation

*    http://www.w3.org/TR/xmlschema-2/ (XML Schema Part 2: Datatypes), W3C Recommendation     

*    http://www.xfront.com/BestPracticesHomepage.html (XML Schemas: best practices), by Roger L. Costello




Last Updated: Aug 22, 2011