Eric van der Vlist
September 26, 2002
The "classical" way to define interoperability frameworks seems to be external: "outside" schemas and pre-validation transformations, using a push mechanism such as the one defined by XPipe. Assuming that we use the namespace prefix "if", this could lead to constructs such as:
<if:pipe name="canonicalValidation">
<if:transform type="http://www.w3.org/1999/XSL/Transform" href="myC14n.xsl"/>
<if:validate type="http://relaxng.org/ns/structure/1.0" href="mySchema.rng"/>
</if:pipe>
to define a Relax NG validation performed after an XSLT canonicalization.
The big benefit of such an external framework is that it is compatible with any existing tool. However, because it is non-intrusive, it may become heavy when the transformations and the validation are intertwined, as is the case for the transformation between the parsed and the lexical space.
If we wanted to define the "pre-lexical" transformation (which may depend on the text node or attribute under validation) using an external framework, we would need to split the validation into two phases: a first phase, stopped before the pre-lexical transformation, that produces an annotated document; then the pre-lexical transformation, which uses these annotations to do its job; and the final validation, performed on the result of the pre-lexical transformation. This whole process seems messy and intrusive.
The other solution, which is the subject of my approach, is to include
framework elements within the schemas and define transformations to be performed
on the nodes during the validation.
The interoperability framework may be seen as a language- and context-neutral vocabulary to define which transformations should be applied to XML fragments. This neutrality is important to ensure that it can be used either standalone or within any XML application (its range of application could be extended beyond XML schema languages, for instance to XSLT) and that it can be used to launch any transformation.
In the examples shown above, this neutrality has been achieved by focusing
on this feature of defining transformations applied to an XML "context node".
The description takes the form of a "push" pipeline of transformations:
if "x" is the context node and "y" is the result of applying the transformation
T to "x" (i.e. "y = T(x)"), the transformation
is defined as:
The context nodeset (x) is defined by the host language here
<if:transform type="URI identifying the nature of T">
<if:apply>
Implementation of T
</if:apply>
</if:transform>
The result of the transformation (y) is the context nodeset here
The implementation of T can be defined using an if:apply
element or an apply attribute, or referenced using the href
attribute of an if:apply element (if:apply/@href).
Several transformations can be chained: to define "y=T1(T2(x))",
the mechanism is repeated from top to bottom:
The context nodeset (x) is defined by the host language here
<if:transform type="URI identifying the nature of T2">
<if:apply>
Implementation of T2
</if:apply>
</if:transform>
The result T2(x) is the context nodeset here
<if:transform type="URI identifying the nature of T1">
<if:apply>
Implementation of T1
</if:apply>
</if:transform>
The result y=T1(T2(x)) is the context nodeset here
In addition to transformations, we need to define validations, i.e. transformations that produce a boolean result (true/false) and raise an exception aborting the pipe of transformations when their result is false. To define that a validation V needs to be performed on the context node x, we would write:
The context nodeset (x) is defined by the host language here
<if:validate type="URI identifying the nature of V">
<if:apply>
Implementation of V
</if:apply>
</if:validate>
The pipe is aborted if the result is false; otherwise, the context nodeset is left unchanged
Finally, the integration into existing vocabularies may be facilitated by a container that groups sets of transformations and validations; an "if:pipe" element may be used for this purpose:
<if:pipe>
The context nodeset (x) is defined by the host language here
<if:transform type="URI identifying the nature of T2">
<if:apply>
Implementation of T2
</if:apply>
</if:transform>
The result T2(x) is the context nodeset here
<if:validate type="URI identifying the nature of V1">
<if:apply>
Implementation of V1
</if:apply>
</if:validate>
The pipe is aborted with an exception if the validation fails. The context node is unchanged otherwise.
<if:transform type="URI identifying the nature of T1">
<if:apply>
Implementation of T1
</if:apply>
</if:transform>
The result y=T1(T2(x)) is the context nodeset here
<if:validate type="URI identifying the nature of V">
<if:apply>
Implementation of V
</if:apply>
</if:validate>
The result of the validation of y by V is the result of the pipe.
</if:pipe>
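The semantics described above can be modelled in a few lines of Python. This is only a sketch under my own assumptions: the names `run_pipe` and `PipeAborted` and the representation of a step as a (kind, function) pair are hypothetical and are not part of xvif, and the sketch returns the final context rather than the boolean result of a last validation.

```python
class PipeAborted(Exception):
    """Raised when a validation step returns False (hypothetical name)."""


def run_pipe(context, steps):
    """Run the steps from top to bottom.

    A "transform" step replaces the context with its result; a "validate"
    step leaves the context unchanged but aborts the pipe when it fails.
    """
    for kind, func in steps:
        if kind == "transform":
            context = func(context)
        elif kind == "validate":
            if not func(context):
                raise PipeAborted(f"validation failed on {context!r}")
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return context


# y = T1(T2(x)) with a validation V1 between the two transformations.
steps = [
    ("transform", str.strip),             # T2: remove surrounding whitespace
    ("validate", lambda s: s.isdigit()),  # V1: only digits are accepted
    ("transform", int),                   # T1: convert to an integer
]
result = run_pipe("  42 ", steps)
print(result)
```

The exception mirrors the "pipe is aborted" behavior: once a validation fails, none of the following steps is executed.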
One of the interesting aspects of this proposal is that it can easily be
embedded within Relax NG schemas and that it is compatible with the derivative
algorithm presented by James Clark to implement Relax NG validators:
a pipe can be considered as a new class of Relax NG pattern.
There are even several Relax NG patterns which already perform transformations
on the nodesets: the list pattern does a "split" transformation
on text nodes and most of the patterns do whitespace normalization. The
integration of the interoperability framework within Relax NG can thus
be seen as an extension and customization of the transformations performed
on the nodes at validation time.
The current implementation is built on a partial implementation of Relax
NG and the framework supports the following transformations:
The framework currently supports the following validations:
The use cases presented below are:
Note: all the examples presented here have been tested on my prototype and included in its test suite. The side effect is that I haven't included examples relying on features not yet supported such as some of the W3C XML Schema datatypes and facets or external references to transformations.
Relax NG doesn't support separators other than whitespace in lists, and
our first example shows how a comma-separated list can be expressed
using xvif:
<?xml version="1.0" encoding="utf-8"?>
<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/,/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<oneOrMore>
<choice>
<value>foo</value>
<value>bar</value>
</choice>
</oneOrMore>
</if:apply>
</if:validate>
</if:pipe>
</element>
[example 1 - try it]
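In this first example, we have defined a pipe composed of a regular expression transformation, which splits the text node at each comma, and a Relax NG validation, which checks that the resulting list contains one or more "foo" or "bar" values.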
Note that such lists of strings may contain empty strings when a list contains
two consecutive commas ("foo,,bar") or a leading or trailing comma (",foo,bar"
or "foo,bar,") and that the schema above will consider all these cases as
invalid since an empty string doesn't match either "foo" or "bar".
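A quick way to see where these empty strings come from is to reproduce the split in Python. The sketch below assumes that the xvif "split" behaves like Python's `str.split` with an explicit separator; `is_valid` is a hypothetical stand-in for the Relax NG validation of the schema above.

```python
ALLOWED = {"foo", "bar"}


def split_on_comma(text):
    # An explicit separator keeps the empty strings between, before
    # and after the commas.
    return text.split(",")


def is_valid(text):
    # Stand-in for <oneOrMore><choice>foo|bar</choice></oneOrMore>.
    tokens = split_on_comma(text)
    return len(tokens) >= 1 and all(token in ALLOWED for token in tokens)


for sample in ("foo,bar,foo", "foo,,bar", "foo,bar,", ",foo,bar"):
    print(repr(sample), split_on_comma(sample), is_valid(sample))
```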
What can be done for commas can be done for whitespace, and a first attempt
to emulate the behavior of the Relax NG "list" pattern could be:
<?xml version="1.0" encoding="utf-8"?>
<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/[ \t\n]+/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<oneOrMore>
<choice>
<value>foo</value>
<value>bar</value>
</choice>
</oneOrMore>
</if:apply>
</if:validate>
</if:pipe>
</element>
[example 2 - try it]
Note that the regular expression has been modified to include several whitespace
characters instead of the single character we had in our first example, and also to
treat several consecutive whitespace characters as a single separator. Per this regular
expression, a document such as:
<foo>foo bar foo</foo>
will be valid since the content of the element "foo" is split into the list ("foo", "bar", "foo"), but leading and trailing whitespace is still treated as a delimiter, and the document:
<foo> foo bar foo </foo>
is invalid since the content of the element "foo" is ("", "foo", "bar", "foo", "").
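The same behavior can be observed with Python's `re.split`, assuming that the xvif regexp transformation follows the same splitting rules. By contrast, `str.split()` called without an argument drops leading and trailing whitespace, which is closer to what the Relax NG "list" pattern does.

```python
import re

WHITESPACE = re.compile(r"[ \t\n]+")


def split_ws(text):
    # A separator matched at the very start or end of the string
    # produces an empty string at that end of the list.
    return WHITESPACE.split(text)


print(split_ws("foo bar foo"))
print(split_ws(" foo bar foo "))
print(" foo bar foo ".split())
```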
To fix this issue, we can update the schema that checks the result of the transformation so that it accepts leading and trailing empty strings:
<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/[ \t\n]+/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<optional>
<value/>
</optional>
<oneOrMore>
<choice>
<value>foo</value>
<value>bar</value>
</choice>
</oneOrMore>
<optional>
<value/>
</optional>
</if:apply>
</if:validate>
</if:pipe>
</element>
[example 3 - try it]
But we could as well modify the transformation to remove leading and trailing whitespaces:
<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="s/(^[ \t\n]+|[ \t\n]+$)//"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/[ \t\n]+/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<oneOrMore>
<choice>
<value>foo</value>
<value>bar</value>
</choice>
</oneOrMore>
</if:apply>
</if:validate>
</if:pipe>
</element>
[example 4 - try it]
Here, we have added a first transformation which removes the leading and
trailing whitespace before splitting the text node. This example is a good
illustration of the flexibility given by pipes of independent transformations
and validations: the adaptations needed to express a model can be done at
each step and new steps can be added when needed. Finally, different techniques
can be combined and, in this case, we might have preferred to use a datatype
library to "canonicalize" the text node as a Relax NG token before splitting
it:
<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/datatypes" apply="http://www.w3.org/2001/XMLSchema-datatypes#token"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/[ \t\n]/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<oneOrMore>
<choice>
<value>foo</value>
<value>bar</value>
</choice>
</oneOrMore>
</if:apply>
</if:validate>
</if:pipe>
</element>
[example 5 - try it]
Here we are still taking the approach of transforming the text node before
splitting it, but we rely on the W3C XML Schema xs:token datatype to "tokenize"
the string. Note that although there are some similarities with the Relax
NG "data" pattern, the effect is different since the value computed by a "data"
pattern is used only locally to validate the string and is lost even for the embedded
patterns, which may use different types. Here, on the contrary, we transform
a text node into its canonical value, and this canonical value is available
for the next step of the pipe while the original value is lost.
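The two approaches (trimming with a regular expression, or collapsing whitespace the way xs:token does) can be compared with a small Python sketch. It assumes that the substitution is applied to every match, as Python's `re.sub` does; the function names `trim` and `collapse` are hypothetical.

```python
import re


def trim(text):
    # Same pattern as s/(^[ \t\n]+|[ \t\n]+$)//, applied to every match.
    return re.sub(r"(^[ \t\n]+|[ \t\n]+$)", "", text)


def collapse(text):
    # Whitespace handling of xs:token: tabs, line feeds and carriage
    # returns become spaces, runs of spaces are collapsed, and leading
    # and trailing spaces are removed.
    return re.sub(r"[ \t\n\r]+", " ", text).strip(" ")


raw = " foo \n bar\tfoo "
trimmed_then_split = re.split(r"[ \t\n]+", trim(raw))
collapsed_then_split = collapse(raw).split(" ")
print(trimmed_then_split)
print(collapsed_then_split)
```

After collapsing, a single space is the only separator left, which is why the second pipe can split on one whitespace character instead of a repeated class.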
As with the Relax NG "list" pattern, each token may have its
own definition, and we can use this to split the different parts of a date,
do some basic checks on the month and day and possibly add other constraints,
such as:
<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="date"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/-/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<data type="unsignedInt">
<param name="minInclusive">2000</param>
</data>
<data type="unsignedByte">
<param name="maxInclusive">12</param>
</data>
<data type="unsignedByte">
<param name="maxInclusive">31</param>
</data>
</if:apply>
</if:validate>
</if:pipe>
</element>
[example 6 - try it]
This schema will validate dates with a year greater than or equal to 2000 and
check that the month is at most 12 and the day at most 31. Although this
schema does a pretty good job of validating dates, one should note that each
"token" is treated as a W3C XML Schema unsigned int or byte and that the constraints
are applied on the value space, i.e. after canonicalization and whitespace
normalization. Instances such as:
<date>0000002001-012-031</date>
or
<date>
2001
-
12
-
31
</date>
are thus valid per this schema. Here again, we have several possibilities
to solve this issue. The first that comes to mind is to add a check on the lexical
value before doing the split, for instance:
<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="date" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<if:pipe>
<if:validate type="http://namespaces.xmlschemata.org/xvif/regexp" apply="m/^[ \t\n]*[0-9]{4}-[0-9]{2}-[0-9]{2}[ \t\n]*$/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/-/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<data type="unsignedInt">
<param name="minInclusive">2000</param>
</data>
<data type="unsignedByte">
<param name="maxInclusive">12</param>
</data>
<data type="unsignedByte">
<param name="maxInclusive">31</param>
</data>
</if:apply>
</if:validate>
</if:pipe>
</element>
[example 7 - try it]
This pipe defines a transformation which splits text nodes into a list of
text nodes using "-" as a separator and a validation which checks that this
list has 3 nodes and includes constraints on each of these three nodes. The
limitation of this method is that it manipulates lists of text nodes which
are the result of the split and not something which can be found natively
in XML documents, and we may prefer to split the text node into native XML
nodes. This can be done using, for instance, the technology called "Regular
Fragmentations" proposed by Simon St.Laurent:
<grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<if:pipe>
<if:transform type="http://simonstl.com/ns/fragments/">
<if:apply>
<fragmentRules xmlns="http://simonstl.com/ns/fragments/">
<fragmentRule pattern="^[ \t\n]*([0-9]{4})-([0-9]{2})-([0-9]{2})[ \t\n]*$">
<applyTo>
<element localName="date"/>
</applyTo>
<produce>
<element localName="year"/>
<element localName="month"/>
<element localName="day"/>
</produce>
</fragmentRule>
</fragmentRules>
</if:apply>
</if:transform>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<element name="date">
<element name="year">
<data type="unsignedInt">
<param name="minInclusive">2000</param>
</data>
</element>
<element name="month">
<data type="unsignedByte">
<param name="maxInclusive">12</param>
</data>
</element>
<element name="day">
<data type="unsignedByte">
<param name="maxInclusive">31</param>
</data>
</element>
</element>
</if:apply>
</if:validate>
</if:pipe>
</start>
</grammar>
[example 8 - try it]
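To make the effect of the fragmentation rule more concrete, here is a rough Python approximation using the same regular expression. It is not the Regular Fragmentations implementation: `fragment_date` is a hypothetical helper, and returning `None` when the text does not match is an assumption of this sketch.

```python
import re
import xml.etree.ElementTree as ET

DATE_RE = re.compile(r"^[ \t\n]*([0-9]{4})-([0-9]{2})-([0-9]{2})[ \t\n]*$")


def fragment_date(date_elem):
    """Build a new <date> whose text is replaced by year/month/day children.

    Returns None when the text does not match the pattern.
    """
    match = DATE_RE.match(date_elem.text or "")
    if match is None:
        return None
    result = ET.Element("date")
    for name, value in zip(("year", "month", "day"), match.groups()):
        ET.SubElement(result, name).text = value
    return result


fragmented = fragment_date(ET.fromstring("<date>2001-12-31</date>"))
print(ET.tostring(fragmented, encoding="unicode"))
```

Because the pattern is anchored and fixes the number of digits, an instance such as "0000002001-012-031" is rejected by the fragmentation itself, before any datatype check is applied.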
Having replaced our regular expressions with Regular Fragmentations, we have
a slightly more verbose syntax since we need to define the mapping between
the tokens matched by the regular expression and the elements to generate. On
the other hand, we have been able to specify a full regular expression which
checks the syntax of the text node during the fragmentation. Another difference
is that Regular Fragmentations work on an XML fragment, so we have needed
to embed the definition of the "date" element within the pipe. A side effect
is that, since an "if:pipe" element can't be used as a Relax NG document element,
we have needed to add explicit grammar and start elements to embed the "if:pipe".
The benefit is that the output of Regular Fragmentations is, instead
of a list of tokens, an XML fragment which can be queried using XPath, and we
can easily implement new validation checks, such as co-occurrence constraints
between days, months and years:
<grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<if:pipe>
<if:transform type="http://simonstl.com/ns/fragments/">
<if:apply>
<fragmentRules xmlns="http://simonstl.com/ns/fragments/">
<fragmentRule pattern="^[ \t\n]*([0-9]{4})-([0-9]{2})-([0-9]{2})[ \t\n]*$">
<applyTo>
<element localName="date"/>
</applyTo>
<produce>
<element localName="year"/>
<element localName="month"/>
<element localName="day"/>
</produce>
</fragmentRule>
</fragmentRules>
</if:apply>
</if:transform>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<element name="date">
<element name="year">
<data type="unsignedInt">
<param name="minInclusive">2000</param>
</data>
</element>
<element name="month">
<data type="unsignedByte">
<param name="maxInclusive">12</param>
</data>
</element>
<element name="day">
<data type="unsignedByte">
<param name="maxInclusive">31</param>
</data>
</element>
</element>
</if:apply>
</if:validate>
<if:validate type="http://www.w3.org/TR/1999/REC-xpath-19991116"
apply="/date[day &lt; 30 or month != 2]"/>
<if:validate type="http://www.w3.org/TR/1999/REC-xpath-19991116"
apply="/date[day &lt; 29 or month != 2 or (year mod 4 = 0 and (year mod 100 != 0 or year mod 400 = 0))]"/>
<if:validate type="http://www.w3.org/TR/1999/REC-xpath-19991116"
apply="/date[day &lt; 31 or (month!=4 and month!=6 and month!=9 and month!=11)]"/>
</if:pipe>
</start>
</grammar>
[example 9 - try it]
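The three XPath validations of the schema above can be paraphrased in Python, one boolean per validation. The function name `day_fits_month` is hypothetical, and this sketch uses the complete Gregorian leap year rule, in which years divisible by 400 are leap years.

```python
def day_fits_month(year, month, day):
    """Co-occurrence constraints between day, month and year."""
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    checks = [
        day < 30 or month != 2,                  # February never has 30 days
        day < 29 or month != 2 or leap,          # 29 February needs a leap year
        day < 31 or month not in (4, 6, 9, 11),  # 30-day months
    ]
    return all(checks)


print(day_fits_month(2004, 2, 29))
print(day_fits_month(2001, 2, 29))
print(day_fits_month(2002, 4, 31))
```

Each entry of `checks` corresponds to one validation of the pipe: the date is accepted only if every one of them holds.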
The last examples have more or less been emulating the W3C XML Schema "date"
type. Although I have shown them to demonstrate the flexibility of the xvif
framework, pragmatic minds would object that this could be written more easily
as:
<element xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" name="date">
<data type="date">
<param name="minInclusive">2000-01-01</param>
</data>
</element>
[example 10 - try it]
While we must admit that this is true, we must note that here we have
just moved the complexity inherent in the date formats into the datatype library
itself and that we have lost all our ability to customize these formats. In
addition to (or instead of) using such complex formats, xvif can be used, for
instance, to localize the date formats. If we need to use French
date formats such as:
<date>26 septembre 2002</date>
we may use regular expressions to transform these formats into ISO
8601 and write:
<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
name="date" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) janvier ([0-9]+)/\2-01-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) février ([0-9]+)/\2-02-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) mars ([0-9]+)/\2-03-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) avril ([0-9]+)/\2-04-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) mai ([0-9]+)/\2-05-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) juin ([0-9]+)/\2-06-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) juillet ([0-9]+)/\2-07-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) août ([0-9]+)/\2-08-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) septembre ([0-9]+)/\2-09-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) octobre ([0-9]+)/\2-10-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) novembre ([0-9]+)/\2-11-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) décembre ([0-9]+)/\2-12-\1/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<data type="date">
<param name="minInclusive">2002-09-01</param>
<param name="maxInclusive">2002-12-31</param>
</data>
</if:apply>
</if:validate>
</if:pipe>
</element>
[example 11 - try it]
Note that because of the semantics of the regular expression transformation
(i.e. a basic regexp "substitute"), the strings which do not match are left
unchanged, and a date element which is already in ISO 8601 format would pass through
all these transformations unchanged and be validated by the Relax NG validate
element. While this gives us the flexibility to accept either French
or ISO 8601 dates, this may not be what is expected, and it may be necessary to
add a first validation before the transformations. Another flaw in this example
is that leading zeros are not added to the day, which can lead to
invalid ISO 8601 dates. Both issues can be fixed, for instance, as:
<element xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
name="date" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<if:pipe>
<if:validate type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="m/[0-9]+ .+ [0-9]+/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/^[ \t\n]*([0-9] .*)$/0\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) janvier ([0-9]+)/\2-01-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) février ([0-9]+)/\2-02-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) mars ([0-9]+)/\2-03-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) avril ([0-9]+)/\2-04-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) mai ([0-9]+)/\2-05-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) juin ([0-9]+)/\2-06-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) juillet ([0-9]+)/\2-07-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) août ([0-9]+)/\2-08-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) septembre ([0-9]+)/\2-09-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) octobre ([0-9]+)/\2-10-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) novembre ([0-9]+)/\2-11-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) décembre ([0-9]+)/\2-12-\1/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<data type="date">
<param name="minInclusive">2002-09-01</param>
<param name="maxInclusive">2002-12-31</param>
</data>
</if:apply>
</if:validate>
</if:pipe>
</element>
[example 12 - try it]
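The chain of substitutions and the two fixes can be replayed in Python to check the reasoning. This is a sketch, not the xvif implementation: `french_to_iso` and `validate_french_date` are hypothetical names, and `datetime.date` stands in for the W3C XML Schema "date" datatype and its facets.

```python
import re
from datetime import date

MONTHS = ["janvier", "février", "mars", "avril", "mai", "juin",
          "juillet", "août", "septembre", "octobre", "novembre", "décembre"]


def french_to_iso(text):
    """One substitution per month; strings that match nothing are unchanged."""
    for number, name in enumerate(MONTHS, start=1):
        text = re.sub(rf"([0-9]+) {name} ([0-9]+)",
                      rf"\2-{number:02d}-\1", text)
    return text


def validate_french_date(text):
    """Initial validation, zero padding, substitutions, then a range check."""
    if re.search(r"[0-9]+ .+ [0-9]+", text) is None:
        return False  # not in the "day month year" format
    text = re.sub(r"^[ \t\n]*([0-9] .*)$", r"0\1", text)  # pad one-digit days
    try:
        value = date.fromisoformat(french_to_iso(text))
    except ValueError:
        return False
    return date(2002, 9, 1) <= value <= date(2002, 12, 31)


print(french_to_iso("26 septembre 2002"))
print(french_to_iso("1 octobre 2002"))
print(validate_french_date("1 octobre 2002"))
```

Used alone, `french_to_iso` shows both flaws discussed above: an ISO 8601 string passes through unchanged and a one-digit day is not padded.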
Xvif supports different sets of technologies and one might want to use
XSLT and take advantage of its formatting features. A similar transformation
could then be defined as:
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<if:pipe>
<if:transform type="http://www.w3.org/1999/XSL/Transform">
<if:apply>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:vdv="http://eric.van-der-vlist.com/tmpns" version="1.0">
<vdv:dates>
<month name="janvier"/>
<month name="février"/>
<month name="mars"/>
<month name="avril"/>
<month name="mai"/>
<month name="juin"/>
<month name="juillet"/>
<month name="août"/>
<month name="septembre"/>
<month name="octobre"/>
<month name="novembre"/>
<month name="décembre"/>
</vdv:dates>
<xsl:template match="*|@*|text()">
<xsl:copy>
<xsl:apply-templates select="@*|*|text()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/date/text()">
<xsl:variable name="n" select="normalize-space(.)"/>
<xsl:if test="contains(., $n)">
<xsl:variable name="d" select="substring-before($n, ' ')"/>
<xsl:variable name="m" select="substring-before(substring-after($n, ' '), ' ')"/>
<xsl:variable name="y" select="substring-after(substring-after($n, ' '), ' ')"/>
<xsl:value-of select="format-number($y, '0000')"/>
<xsl:text>-</xsl:text>
<xsl:apply-templates select="document('')/xsl:stylesheet/vdv:dates/month[@name=$m]" mode="month"/>
<xsl:text>-</xsl:text>
<xsl:value-of select="format-number($d, '00')"/>
</xsl:if>
</xsl:template>
<xsl:template match="month" mode="month">
<xsl:value-of select="format-number(count(preceding-sibling::month)+1, '00')"/>
</xsl:template>
</xsl:stylesheet>
</if:apply>
</if:transform>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<element name="date">
<data type="date">
<param name="minInclusive">2002-09-01</param>
<param name="maxInclusive">2002-12-31</param>
</data>
</element>
</if:apply>
</if:validate>
</if:pipe>
</start>
</grammar>
[example 13 - try it]
In a previous example, we've seen how an ISO 8601 date could be split into three elements. The reverse operation can be done even more simply:
<grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<if:pipe>
<if:transform type="http://www.w3.org/1999/XSL/Transform">
<if:apply>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="*|@*|text()">
<xsl:copy>
<xsl:apply-templates select="@*|*|text()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/date">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:copy-of select="*[not(self::year|self::month|self::day)]"/>
<xsl:value-of select="format-number(year, '0000')"/>
<xsl:text>-</xsl:text>
<xsl:value-of select="format-number(month, '00')"/>
<xsl:text>-</xsl:text>
<xsl:value-of select="format-number(day, '00')"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
</if:apply>
</if:transform>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<element name="date">
<data type="date">
<param name="minInclusive">2002-09-01</param>
<param name="maxInclusive">2002-12-31</param>
</data>
</element>
</if:apply>
</if:validate>
</if:pipe>
</start>
</grammar>
[example 14 - try it]
In the two previous examples, we have taken care to copy elements and attributes
which might be present in the document in order to catch potential
invalidities. In practice it may be simpler (and less error-prone) to just
validate the element before and after the transformation:
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<if:pipe>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<element name="date">
<element name="year"><text/></element>
<element name="month"><text/></element>
<element name="day"><text/></element>
</element>
</if:apply>
</if:validate>
<if:transform type="http://www.w3.org/1999/XSL/Transform">
<if:apply>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/date">
<xsl:copy>
<xsl:value-of select="format-number(year, '0000')"/>
<xsl:text>-</xsl:text>
<xsl:value-of select="format-number(month, '00')"/>
<xsl:text>-</xsl:text>
<xsl:value-of select="format-number(day, '00')"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
</if:apply>
</if:transform>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<element name="date">
<data type="date">
<param name="minInclusive">2002-09-01</param>
<param name="maxInclusive">2002-12-31</param>
</data>
</element>
</if:apply>
</if:validate>
</if:pipe>
</start>
</grammar>
[example 15 - try it]
After the relative complexity of the last examples, converting numbers from a French decimal format (using a comma as a decimal separator and forbidding decimal parts without a leading 0) is very easy:
<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
name="decimal" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<if:pipe>
<if:validate type="http://namespaces.xmlschemata.org/xvif/regexp" apply="m/^[ \t\n]*[-+]?[0-9]+(,[0-9]+)?[ \t\n]*$/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="s/,/./"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<data type="decimal">
<param name="minInclusive">-1</param>
<param name="maxInclusive">1</param>
</data>
</if:apply>
</if:validate>
</if:pipe>
</element>
[example 16 - try it]
As we are now used to, we have a first validation to check the initial
format (here using a regular expression), a transformation to convert the
French format into the English format expected by the W3C XML Schema "decimal"
datatype, and the Relax NG validation on the result of the transformation, which
can use the facets available for this format.
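The three steps of this pipe translate directly into Python. In this sketch, `validate_french_decimal` is a hypothetical name and `decimal.Decimal` plays the role of the W3C XML Schema "decimal" datatype with its minInclusive and maxInclusive facets.

```python
import re
from decimal import Decimal

FRENCH_DECIMAL = re.compile(r"^[ \t\n]*[-+]?[0-9]+(,[0-9]+)?[ \t\n]*$")


def validate_french_decimal(text):
    # Step 1: lexical validation of the French format.
    if FRENCH_DECIMAL.match(text) is None:
        return False
    # Step 2: transformation, the comma becomes a dot.
    converted = text.replace(",", ".", 1).strip()
    # Step 3: validation in the value space, between -1 and 1 inclusive.
    return Decimal(-1) <= Decimal(converted) <= Decimal(1)


for sample in ("0,5", " -1 ", "1,5", ",5", "0.5"):
    print(repr(sample), validate_french_decimal(sample))
```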
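Up to now, we have not considered the issue of the compatibility of our
schemas with Relax NG processors which do not support xvif. Since Relax NG
processors ignore elements from foreign namespaces, such a processor would
see example 16 as: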
<element xmlns="http://relaxng.org/ns/structure/1.0"
name="decimal" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
</element>
[example 16 seen by a non xvif Relax NG processor]
This is not even a valid Relax NG schema. If we want to write a schema which can be read by a non-xvif processor and performs a minimal amount of validation, this processor should see at least a "text" pattern in the "element" pattern. On the other hand, this "text" pattern should be ignored by an xvif processor, which will already have derived the text node against the if:pipe considered as a Relax NG pattern. To instruct an xvif processor to ignore a Relax NG pattern, we can introduce an "if:ignore" attribute (allowed and ignored by Relax NG processors per the Relax NG specification). A schema with fallback could be:
<element xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
name="decimal" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<if:pipe>
<if:validate type="http://namespaces.xmlschemata.org/xvif/regexp" apply="m/^[ \t\n]*[-+]?[0-9]+(,[0-9]+)?[ \t\n]*$/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="s/,/./"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<data type="decimal">
<param name="minInclusive">-1</param>
<param name="maxInclusive">1</param>
</data>
</if:apply>
</if:validate>
</if:pipe>
<data if:ignore="1" type="token">
<param name="pattern">[-+]?[0-9]+(,[0-9]+)?</param>
</data>
</element>
[example 17 - try it]
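To visualize what a processor without xvif support sees, we can strip the foreign elements and attributes from the schema with a few lines of Python. This is only an illustration: `strip_foreign` is a hypothetical helper, the pipe in `SCHEMA` is abbreviated, and a real Relax NG processor performs this removal as part of its own simplification.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Abbreviated version of the schema with fallback shown above.
SCHEMA = """<element xmlns="http://relaxng.org/ns/structure/1.0"
    xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
    name="decimal" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <if:pipe>
    <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="s/,/./"/>
  </if:pipe>
  <data if:ignore="1" type="token">
    <param name="pattern">[-+]?[0-9]+(,[0-9]+)?</param>
  </data>
</element>"""


def strip_foreign(elem):
    """Remove, in place, elements and attributes from foreign namespaces."""
    prefix = "{" + RNG_NS + "}"
    for child in list(elem):
        if child.tag.startswith(prefix):
            strip_foreign(child)
        else:
            elem.remove(child)
    for name in list(elem.attrib):
        # ElementTree writes namespace-qualified names as "{uri}local".
        if name.startswith("{") and not name.startswith(prefix):
            del elem.attrib[name]
    return elem


root = strip_foreign(ET.fromstring(SCHEMA))
print([child.tag for child in root], root[0].attrib)
```

Only the fallback "data" pattern survives, without its "if:ignore" attribute, so the "element" pattern now has the content it needs to be valid.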
We can note that there is some redundancy between the check done
in the pipe before the transformation:
<if:validate type="http://namespaces.xmlschemata.org/xvif/regexp" apply="m/^[ \t\n]*[-+]?[0-9]+(,[0-9]+)?[ \t\n]*$/"/>
and the test done in the fallback pattern:
<param name="pattern">[-+]?[0-9]+(,[0-9]+)?</param>
but I have not found any syntactical construction which could avoid this
repetition while conforming to the Relax NG specification.
The if:pipe element is now acting as a Relax NG pattern and can be implemented using the derivative algorithm described by James Clark:
There are a couple of possibilities about what needs to be used as the
source of this transformation, though: should we send the current node or
a pointer to the current node in the document? The tradeoff is that sending
the document doesn't play well with streaming validation but gives the transformation
access to the entire document.
Also, my assumption is that the result of the transformation T(p)
is not persistent, in the sense that an implementation may cache this result
to reuse it if it needs to reevaluate the derivation but that a user cannot
explicitly reuse it in a schema. While this is the simpler solution, I
am wondering if there wouldn't be use cases where explicit reuse of the results
of the transformations would be needed. Could this be the case, maybe, for
running integrity checks or additional Schematron tests? If this were the
case, how should we deal with it? Should we let people define variables? But
then what would be the scope of these variables? Or should we just add flags
to say whether constraints are run on the parsed, lexical or value spaces?
I am also wondering if we need conditional structures and, if so, whether we
can rely on an exception-like logic (a sub-pipe is continued until it fails,
in which case an alternative is tried) or whether we should prefer more traditional
if/then/else structures.