Proposal: bringing the framework within schemas

Eric van der Vlist

September 26, 2002

History

September 26, 2002: cleaner syntax
June 19, 2002: change of some element names, formalization and clarification
May 29, 2002: Addition of Rick's use cases, example using STX and considerations on implementation and variables.
May 25, 2002: creation

Introduction

The "classical" way to define interoperability frameworks seems to be external: the framework sits "outside" the schemas and the pre-validation transformations and drives them through a push mechanism such as the one defined by XPipe. Assuming that we use the namespace prefix "if", this could lead to constructs such as:

<if:pipe name="canonicalValidation">
 <if:transform type="http://www.w3.org/1999/XSL/Transform" href="myC14n.xsl"/>
 <if:validate type="http://relaxng.org/ns/structure/1.0" href="mySchema.rng"/>
</if:pipe>

to define a Relax NG validation performed after an XSLT canonicalization.

The big benefit of such an external framework is that it is compatible with any existing tool. However, being non-intrusive, it may become heavy when the transformations and the validation get mixed together, as is the case for the transformation between the parsed and the lexical space.

If we wanted to define the "pre-lexical" transformation (which may depend on the text node or attribute under validation) using an external framework, we would need to split the validation in two phases around the transformation: a first phase, stopped before the pre-lexical transformation, which produces an annotated document; then the pre-lexical transformation, which uses these annotations to do its job; and a final validation performed on the result of the pre-lexical transformation. This whole process seems messy and intrusive.

The other solution, which is the subject of my approach, is to include framework elements within the schemas and to define transformations to be performed on the nodes during the validation.

Semantic

The interoperability framework may be seen as a language- and context-neutral vocabulary for defining which transformations should be applied to XML fragments. This neutrality is important to ensure that it can be used either standalone or within any XML application (its range of application could be extended to applications other than XML schemas, such as XSLT) and that it can be used to launch any transformation.

In the examples shown here, this neutrality has been achieved by focusing on the single feature of defining transformations applied to an XML "context node".

The description takes the form of a "push" pipeline of transformations. If "x" is the context node and "y" is the result of the transformation T applied to "x" (i.e. "y = T(x)"), the transformation is defined as:


  The context nodeset (x) is defined by the host language here
  <if:transform type="URI identifying the nature of T">
    <if:apply>
       Implementation of T
    </if:apply>
  </if:transform>
  The result of the transformation (y) is the context nodeset here

The implementation of T can be given inline, using an if:apply element or an apply attribute, or referenced using the href attribute of an if:apply element (if:apply/@href).

Several transformations can be chained: to define "y = T1(T2(x))", the mechanism is repeated from top to bottom:

  The context nodeset (x) is defined by the host language here
  <if:transform type="URI identifying the nature of T2">
    <if:apply>
       Implementation of T2
    </if:apply>
  </if:transform>
  The result T2(x) is the context nodeset here
  <if:transform type="URI identifying the nature of T1">
    <if:apply>
       Implementation of T1
    </if:apply>
  </if:transform>
  The result y=T1(T2(x)) is the context nodeset here

In addition to transformations, we need to define validations, i.e. transformations producing a boolean result (true/false) and raising an exception that aborts the pipe of transformations when their result is false. To define that a validation V needs to be performed on the context node x, we would write:

  The context nodeset (x) is defined by the host language here
  <if:validate type="URI identifying the nature of V">
    <if:apply>
       Implementation of V
    </if:apply>
  </if:validate>
  The pipe is aborted if the result is false; otherwise, the context nodeset is left unchanged

Finally, the integration into existing vocabularies may be facilitated by a container that groups sets of transformations and validations; an "if:pipe" element may be used for this purpose:

<if:pipe>
  The context nodeset (x) is defined by the host language here
  <if:transform type="URI identifying the nature of T2">
    <if:apply>
       Implementation of T2
    </if:apply>
  </if:transform>
  The result T2(x) is the context nodeset here
  <if:validate type="URI identifying the nature of V1">
    <if:apply>
       Implementation of V1
    </if:apply>
  </if:validate>
  The pipe is aborted with an exception if the validation fails. The context nodeset is unchanged otherwise.
  <if:transform type="URI identifying the nature of T1">
    <if:apply>
       Implementation of T1
    </if:apply>
  </if:transform>
  The result y=T1(T2(x)) is the context nodeset here
  <if:validate type="URI identifying the nature of V">
    <if:apply>
       Implementation of V
    </if:apply>
  </if:validate>
  The result of the validation of y by V is the result of the pipe.
</if:pipe>
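
The semantics described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prototype's code: the names run_pipe, PipeAborted and the ("transform" | "validate", function) step encoding are hypothetical, and the sketch returns the final context for convenience instead of the result of the last validation.

```python
from typing import Any, Callable, List, Tuple


class PipeAborted(Exception):
    """Raised when a validation step returns False (hypothetical name)."""


# A step is ("transform", fn) or ("validate", fn); this encoding is illustrative.
Step = Tuple[str, Callable[[Any], Any]]


def run_pipe(steps: List[Step], context: Any) -> Any:
    """Push `context` through the steps in document order."""
    for kind, fn in steps:
        if kind == "transform":
            # y = T(x): the result becomes the context of the next step.
            context = fn(context)
        elif kind == "validate":
            # A failed validation aborts the pipe; a successful one
            # leaves the context unchanged.
            if not fn(context):
                raise PipeAborted(f"validation failed on {context!r}")
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return context


# y = T1(T2(x)) with T2 = strip and T1 = upper-case, plus two validations.
steps = [
    ("transform", str.strip),          # T2
    ("validate", lambda s: s != ""),   # V1
    ("transform", str.upper),          # T1
    ("validate", str.isalpha),         # V
]
print(run_pipe(steps, "  foo  "))  # prints FOO
```

Written this way, chaining is just list order: the step listed first is applied first, which matches the "top/down" reading of the pipe.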

Integration within Relax NG schemas

One of the interesting aspects of this proposal is that it can easily be embedded within Relax NG schemas and that it is compatible with the derivative algorithm presented by James Clark for implementing Relax NG validators: a pipe can be considered as a new class of Relax NG pattern.

There are even several Relax NG patterns which already perform transformations on the nodesets: the list pattern does a "split" transformation on text nodes and most of the patterns do whitespace normalization. The integration of the interoperability framework within Relax NG can thus be seen as an extension and customization of the transformations performed on the nodes at validation time.

Current implementation

The current implementation is built on a partial implementation of Relax NG and the framework supports the following transformations:

XPath (the node is transformed into the result of the XPath expression)
Regular expression (text nodes are passed through a regular expression substitution)
XSLT (the node is replaced by the result of the XSLT transformation)
Regular Fragmentation (the node is fragmented)

The framework currently supports the following validations:

XPath (the result of the XPath expression is converted into a boolean which must be true)
Regular expression (text nodes are matched on the regular expression)
Relax NG

Use cases

The use cases presented below are:

list type separator adjustment: transforming "foo, bar" into "foo bar".
number localization: transforming "123,45" into "123.45".
date localization: transforming "25 mai 2002" into "2002-05-25".
date decomposition: transforming "2002-05-25" into "<year>2002</year><month>05</month><day>25</day>".
date recomposition: transforming "<year>2002</year><month>05</month><day>25</day>" into "2002-05-25" .

Note: all the examples presented here have been tested on my prototype and included in its test suite. The side effect is that I haven't included examples relying on features not yet supported such as some of the W3C XML Schema datatypes and facets or external references to transformations.

List type separator adjustment

Relax NG doesn't support separators other than whitespace in lists, and our first example shows how a comma-separated list can be expressed using xvif:

<?xml version="1.0" encoding="utf-8"?>
<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
 <if:pipe>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/,/"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
    <if:apply>
     <oneOrMore>
      <choice>
       <value>foo</value>
       <value>bar</value>
      </choice>
     </oneOrMore>
    </if:apply>
  </if:validate>
 </if:pipe>
</element>
[example 1]

In this first example, we have defined a pipe composed of:

A transformation, the regular expression operation split/,/, which splits comma-separated lists into a list of strings.
A validation which checks that this list consists of one or more tokens equal to "foo" or "bar".

Note that such lists of strings may contain empty strings when a list contains two consecutive commas ("foo,,bar"), a trailing comma ("foo,bar,") or a leading comma (",foo,bar"), and that the schema above will consider all these cases as invalid since an empty string doesn't match either "foo" or "bar".
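
The behaviour of this pipe on the edge cases just mentioned can be reproduced with a small Python sketch. It assumes that Python's re.split is a fair stand-in for the prototype's split operation; the function names are hypothetical.

```python
import re

ALLOWED = {"foo", "bar"}


def split_on_commas(text):
    # Stand-in for the split/,/ transformation (assumed equivalent).
    return re.split(",", text)


def one_or_more_allowed(tokens):
    # Stand-in for the oneOrMore/choice/value validation of the schema.
    return len(tokens) > 0 and all(token in ALLOWED for token in tokens)


for sample in ["foo,bar", "foo,,bar", "foo,bar,", ",foo,bar"]:
    tokens = split_on_commas(sample)
    print(repr(sample), tokens, one_or_more_allowed(tokens))
```

Only the first sample passes: each of the others produces an empty string in the token list, which matches neither value.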

What can be done for commas can be done for whitespace, and a first attempt to emulate the behavior of the Relax NG "list" pattern could be:

<?xml version="1.0" encoding="utf-8"?>
<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
 <if:pipe>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/[ \t\n]+/"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
    <if:apply>
     <oneOrMore>
      <choice>
       <value>foo</value>
       <value>bar</value>
      </choice>
     </oneOrMore>
    </if:apply>
  </if:validate>
 </if:pipe>
</element>
[example 2]

Note that the regular expression has been modified to include all the whitespace characters instead of the single character we had in our first example, and also to catch several consecutive whitespace characters as a single separator. Per this regular expression, a document such as:

<foo>foo      bar        foo</foo>

will be valid since the content of the element "foo" is split into the list ("foo", "bar", "foo"), but the leading and trailing whitespace is still caught as delimiters and the document:

<foo>     foo      bar        foo         </foo>

is invalid since the content of the element "foo" is ("", "foo", "bar", "foo", "").
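
The two results quoted above can be checked with Python's re.split, used here as an assumed stand-in for the prototype's split operation; the helper name is hypothetical.

```python
import re


def split_on_whitespace(text):
    # Stand-in for split/[ \t\n]+/ : a run of whitespace is one separator.
    return re.split(r"[ \t\n]+", text)


print(split_on_whitespace("foo      bar        foo"))
# ['foo', 'bar', 'foo']
print(split_on_whitespace("     foo      bar        foo         "))
# ['', 'foo', 'bar', 'foo', '']
```

The empty strings at both ends come from the separators found before the first token and after the last one.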

To fix this issue, we can update the schema that checks the result of the transformation so that it accepts leading and trailing empty strings:

<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
 <if:pipe>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/[ \t\n]+/"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
    <if:apply>
     <optional>
      <value/>
     </optional>
     <oneOrMore>
      <choice>
       <value>foo</value>
       <value>bar</value>
      </choice>
     </oneOrMore>
     <optional>
      <value/>
     </optional>
    </if:apply>
  </if:validate>
 </if:pipe>
</element>
[example 3]

But we could as well modify the transformation to remove leading and trailing whitespace:

<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
 <if:pipe>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="s/(^[ \t\n]+|[ \t\n]+$)//"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/[ \t\n]+/"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
    <if:apply>
     <oneOrMore>
      <choice>
       <value>foo</value>
       <value>bar</value>
      </choice>
     </oneOrMore>
    </if:apply>
  </if:validate>
 </if:pipe>
</element>
[example 4]

Here, we have added a first transformation which removes the leading and trailing whitespace before splitting the text node. This example is a good illustration of the flexibility given by pipes of independent transformations and validations: the adaptations needed to express a model can be made at each step, and new steps can be added when needed. Finally, different techniques can be combined; in this case, we might have preferred to use a datatype library to "canonicalize" the text node as a Relax NG token before splitting it:

<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
 <if:pipe>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/datatypes" apply="http://www.w3.org/2001/XMLSchema-datatypes#token"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/[ \t\n]/"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
    <if:apply>
     <oneOrMore>
      <choice>
       <value>foo</value>
       <value>bar</value>
      </choice>
     </oneOrMore>
    </if:apply>
  </if:validate>
 </if:pipe>
</element>
[example 5]

Here we are still taking the approach of transforming the text node before splitting it, but we rely on the W3C XML Schema xs:token datatype to "tokenize" the string. Note that although there are some similarities with the Relax NG "data" pattern, the effect is different: the value computed by a "data" pattern is used only locally to validate the string and is then lost, even for the embedded patterns, which may use different types. Here, on the contrary, we transform a text node into its canonical value, and this canonical value is available for the next step of the pipe while the original value is lost.
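
A minimal Python sketch of this "canonicalize, then split" sequence follows. The function xs_token only approximates the whitespace handling of xs:token (collapse runs of space, tab, CR and LF, then trim); both function names are hypothetical and the sketch is not the datatype library itself.

```python
import re


def xs_token(text):
    """Approximate xs:token canonicalization: collapse runs of space,
    tab, CR and LF into a single space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def split_on_single_whitespace(text):
    # Stand-in for split/[ \t\n]/ : one whitespace character per separator.
    return re.split(r"[ \t\n]", text)


raw = "  foo \t bar\n\nfoo  "
canonical = xs_token(raw)
print(repr(canonical))                        # 'foo bar foo'
print(split_on_single_whitespace(canonical))  # ['foo', 'bar', 'foo']
```

Because the canonical value never contains two consecutive separators, splitting on a single whitespace character is enough in the second step.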

As with the Relax NG "list" pattern, each of the tokens may have its own definition, and we can use this to split the different parts of a date, do some basic checks on the month and day and possibly add other constraints, such as:

<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="date" 
 datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <if:pipe>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/-/"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
   <if:apply>
    <data type="unsignedInt">    
     <param name="minInclusive">2000</param>
    </data>
    <data type="unsignedByte">   
     <param name="maxInclusive">12</param>
    </data>
    <data type="unsignedByte">   
     <param name="maxInclusive">31</param>
    </data>
   </if:apply>
  </if:validate>
 </if:pipe>
</element>
[example 6]

This schema will validate dates with a year greater than or equal to 2000 and check that the month is at most 12 and the day at most 31. Although this schema does a pretty good job of validating dates, one should note that each "token" is treated as a W3C XML Schema unsigned int or byte and that the constraints are applied in the value space, i.e. after canonicalization and whitespace normalization. Instances such as:

<date>0000002001-012-031</date>

<date>
	2001
	-
	12
	-
	31
</date>

are thus valid per this schema. Here again, we have several possibilities to solve this issue. The first that comes to mind is to add a check on the lexical value before doing the split, for instance:

<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="date" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <if:pipe>
  <if:validate type="http://namespaces.xmlschemata.org/xvif/regexp" apply="m/[ \t\n]*[0-9]{4}-[0-9]{2}-[0-9]{2}[ \t\n]*/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/-/"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
   <if:apply>
    <data type="unsignedInt">    
     <param name="minInclusive">2000</param>
    </data>
    <data type="unsignedByte">   
     <param name="maxInclusive">12</param>
    </data>
    <data type="unsignedByte">   
     <param name="maxInclusive">31</param>
    </data>
   </if:apply>
  </if:validate>
 </if:pipe>
</element>
[example 7]

This pipe defines a validation of the lexical form, a transformation which splits text nodes into a list of text nodes using "-" as a separator, and a validation which checks that this list has three nodes and includes constraints on each of them. The limitation of this method is that it manipulates lists of text nodes which are the result of the split, and not something which can be found natively in XML documents; we may prefer to split the text node into native XML nodes. This can be done using, for instance, the technology called "Regular Fragmentations" proposed by Simon St.Laurent:

<grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" 
 datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <start>
  <if:pipe>
   <if:transform type="http://simonstl.com/ns/fragments/">
    <if:apply>
     <fragmentRules xmlns="http://simonstl.com/ns/fragments/">
      <fragmentRule pattern="^[ \t\n]*([0-9]{4})-([0-9]{2})-([0-9]{2})[ \t\n]*$">
       <applyTo>
        <element localName="date"/>
       </applyTo>
       <produce>
        <element localName="year"/>
        <element localName="month"/>
        <element localName="day"/>
       </produce>
      </fragmentRule>
     </fragmentRules>
    </if:apply>
   </if:transform> 
   <if:validate type="http://relaxng.org/ns/structure/1.0">
    <if:apply>
     <element name="date">
      <element name="year">
       <data type="unsignedInt">    
        <param name="minInclusive">2000</param>
       </data>
      </element>
      <element name="month">
       <data type="unsignedByte">   
        <param name="maxInclusive">12</param>
       </data>
      </element>
      <element name="day">
       <data type="unsignedByte">   
        <param name="maxInclusive">31</param>
       </data>
      </element>
     </element>
    </if:apply>
   </if:validate>
  </if:pipe>
 </start>
</grammar>
[example 8]

Having replaced our regular expressions with Regular Fragmentations, we have a slightly more verbose syntax since we need to define the mapping between the tokens produced by the regular expression and the elements to generate. On the other hand, we have been able to specify a full regular expression which checks the syntax of the text node during the fragmentation. Another difference is that Regular Fragmentations work on an XML fragment, so we needed to embed the definition of the "date" element within the pipe. A side effect is that, since an "if:pipe" element can't be used as a Relax NG document element, we needed to add explicit grammar and start elements to embed the "if:pipe".

The benefit is that the output of Regular Fragmentations is, instead of a list of tokens, an XML fragment which can be queried using XPath, so we can easily implement new validation checks such as co-occurrence constraints between days, months and years:
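
The effect of such a fragmentation rule can be emulated with the Python standard library. This is a sketch of the idea only, not of the Regular Fragmentations implementation: the function name is hypothetical, and returning None when the pattern does not match is an assumption made for the illustration.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as the fragmentRule: three capture groups, one per element.
DATE_PATTERN = re.compile(r"^[ \t\n]*([0-9]{4})-([0-9]{2})-([0-9]{2})[ \t\n]*$")
PRODUCED = ("year", "month", "day")


def fragment_date(date_element):
    """Return a new element whose text is replaced by <year>, <month>
    and <day> children, or None when the text does not match."""
    match = DATE_PATTERN.match(date_element.text or "")
    if match is None:
        return None
    result = ET.Element(date_element.tag, dict(date_element.attrib))
    # Map each captured group onto the element name at the same position.
    for name, value in zip(PRODUCED, match.groups()):
        ET.SubElement(result, name).text = value
    return result


fragmented = fragment_date(ET.fromstring("<date> 2002-05-25 </date>"))
print(ET.tostring(fragmented, encoding="unicode"))
# <date><year>2002</year><month>05</month><day>25</day></date>
```

The mapping between capture groups and element names is positional, which is the extra information the more verbose syntax has to carry.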

<grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" 
 datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <start>
  <if:pipe>
   <if:transform type="http://simonstl.com/ns/fragments/">
    <if:apply>
     <fragmentRules xmlns="http://simonstl.com/ns/fragments/">
      <fragmentRule pattern="^[ \t\n]*([0-9]{4})-([0-9]{2})-([0-9]{2})[ \t\n]*$">
       <applyTo>
        <element localName="date"/>
       </applyTo>
       <produce>
        <element localName="year"/>
        <element localName="month"/>
        <element localName="day"/>
       </produce>
      </fragmentRule>
     </fragmentRules>
    </if:apply>
   </if:transform> 
   <if:validate type="http://relaxng.org/ns/structure/1.0">
    <if:apply>
     <element name="date">
      <element name="year">
       <data type="unsignedInt">    
        <param name="minInclusive">2000</param>
       </data>
      </element>
      <element name="month">
       <data type="unsignedByte">   
        <param name="maxInclusive">12</param>
       </data>
      </element>
      <element name="day">
       <data type="unsignedByte">   
        <param name="maxInclusive">31</param>
       </data>
      </element>
     </element>
    </if:apply>
   </if:validate>
   <if:validate type="http://www.w3.org/TR/1999/REC-xpath-19991116" 
    apply="/date[day &lt; 30 or month != 2]"/>
   <if:validate type="http://www.w3.org/TR/1999/REC-xpath-19991116" 
    apply="/date[day &lt; 29 or month != 2 or (year mod 4 = 0 and (year mod 100 != 0 or year mod 400 = 0))]"/>
   <if:validate type="http://www.w3.org/TR/1999/REC-xpath-19991116" 
    apply="/date[day &lt; 31 or (month!=4 and month!=6 and month!=9 and month!=11)]"/>
  </if:pipe>
 </start>
</grammar>
[example 9]
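
Co-occurrence constraints of this kind are easy to get subtly wrong, so it can help to cross-check them outside the schema. The Python sketch below restates three such checks, using the full Gregorian leap-year rule, and compares them with the standard datetime module; the function names are hypothetical and the sketch assumes the month (1 to 12) and day (1 to 31) ranges have already been validated.

```python
import datetime


def is_leap_year(year):
    # Divisible by 4, except century years not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def day_fits_month(year, month, day):
    """Co-occurrence checks between day, month and year."""
    # February never has 30 days or more.
    if month == 2 and day >= 30:
        return False
    # 29 February only exists in leap years.
    if month == 2 and day == 29 and not is_leap_year(year):
        return False
    # April, June, September and November have 30 days.
    if month in (4, 6, 9, 11) and day >= 31:
        return False
    return True


def exists_in_calendar(year, month, day):
    # Reference answer from the standard library, used for cross-checking.
    try:
        datetime.date(year, month, day)
        return True
    except ValueError:
        return False


print(day_fits_month(2000, 2, 29), exists_in_calendar(2000, 2, 29))  # True True
print(day_fits_month(2002, 2, 29), exists_in_calendar(2002, 2, 29))  # False False
```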

The last examples have more or less emulated the W3C XML Schema "date" type. Although I have shown them to demonstrate the flexibility of the xvif framework, pragmatic minds would object that this could be more easily written as:

<element xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" name="date">
 <data type="date">
  <param name="minInclusive">2000-01-01</param>
 </data>
</element>
[example 10]

While we must acknowledge that this is true, we must note that here we have just moved the complexity inherent in the date formats into the datatype library itself and that we have lost all ability to customize these formats. In addition to (or instead of) using such complex formats, xvif can be used, for instance, to localize the date formats. If we need to use French date formats such as:

<date>26 septembre 2002</date>

We may use regular expressions to transform these formats into ISO 8601 and write:

<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" 
 name="date" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <if:pipe>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) janvier ([0-9]+)/\2-01-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) février ([0-9]+)/\2-02-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) mars ([0-9]+)/\2-03-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) avril ([0-9]+)/\2-04-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) mai ([0-9]+)/\2-05-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) juin ([0-9]+)/\2-06-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) juillet ([0-9]+)/\2-07-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) août ([0-9]+)/\2-08-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) septembre ([0-9]+)/\2-09-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) octobre ([0-9]+)/\2-10-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) novembre ([0-9]+)/\2-11-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) décembre ([0-9]+)/\2-12-\1/"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
   <if:apply>
    <data type="date"> 
     <param name="minInclusive">2002-09-01</param>
     <param name="maxInclusive">2002-12-31</param>
    </data>
   </if:apply>
  </if:validate>
 </if:pipe>
</element>
[example 11]

Note that because of the semantics of the regular expression transformation (i.e. a basic regexp "substitute"), strings which do not match are left unchanged: a date element which is already in ISO 8601 format would pass through all these transformations unchanged and be validated by the Relax NG validate element. While this gives us the flexibility to accept either French or ISO 8601 dates, this may not be what is expected, and a first validation may need to be added before the transformations. Another flaw in this example is that leading zeros are not added to the day, which can lead to invalid ISO 8601 dates. Both issues can be fixed, for instance, as:

<element xmlns="http://relaxng.org/ns/structure/1.0" 
 xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" 
 name="date" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <if:pipe>
  <if:validate type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="m/[0-9]+ .+ [0-9]+/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/^[ \t\n]*([0-9] .*)$/0\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) janvier ([0-9]+)/\2-01-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) février ([0-9]+)/\2-02-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) mars ([0-9]+)/\2-03-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) avril ([0-9]+)/\2-04-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) mai ([0-9]+)/\2-05-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) juin ([0-9]+)/\2-06-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) juillet ([0-9]+)/\2-07-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) août ([0-9]+)/\2-08-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) septembre ([0-9]+)/\2-09-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) octobre ([0-9]+)/\2-10-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) novembre ([0-9]+)/\2-11-\1/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
   apply="s/([0-9]+) décembre ([0-9]+)/\2-12-\1/"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
   <if:apply>
    <data type="date"> 
     <param name="minInclusive">2002-09-01</param>
     <param name="maxInclusive">2002-12-31</param>
    </data>
   </if:apply>
  </if:validate>
 </if:pipe>
</element>
[example 12]
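
The same sequence of steps (shape check, zero padding, one substitution per month, final date validation) can be sketched in Python. This is an emulation under assumptions: re.search and re.sub stand in for the prototype's m/ and s/ operations, datetime.date.fromisoformat stands in for the xs:date datatype, and all function names are hypothetical.

```python
import datetime
import re

MONTHS = ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
          "août", "septembre", "octobre", "novembre", "décembre"]


def localize_french_date(text):
    """Return the ISO 8601 form of a French date, or None when the
    initial shape check (number, word, number) fails."""
    if re.search(r"[0-9]+ .+ [0-9]+", text) is None:
        return None
    # Pad a single-digit day with a leading zero.
    text = re.sub(r"^[ \t\n]*([0-9] .*)$", r"0\1", text)
    # One substitution per month; strings that do not match pass through.
    for number, name in enumerate(MONTHS, start=1):
        text = re.sub(r"([0-9]+) %s ([0-9]+)" % name,
                      r"\2-%02d-\1" % number, text)
    return text


def in_allowed_range(iso_text):
    # Stand-in for the xs:date check with minInclusive/maxInclusive facets.
    try:
        value = datetime.date.fromisoformat(iso_text)
    except ValueError:
        return False
    return datetime.date(2002, 9, 1) <= value <= datetime.date(2002, 12, 31)


def validate_french_date(text):
    iso = localize_french_date(text)
    return iso is not None and in_allowed_range(iso)


print(localize_french_date("26 septembre 2002"))  # 2002-09-26
print(localize_french_date("5 octobre 2002"))     # 2002-10-05
print(validate_french_date("2002-09-26"))         # False
```

With the initial shape check in place, a value that is already in ISO 8601 form is rejected instead of passing through the substitutions unchanged.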

Xvif supports different sets of technologies, and one might want to use XSLT to take advantage of its formatting features. A similar transformation could then be defined as:

<grammar xmlns="http://relaxng.org/ns/structure/1.0" 
 xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" 
 datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <start>
  <if:pipe>
   <if:transform type="http://www.w3.org/1999/XSL/Transform">
    <if:apply>
     <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
       xmlns:vdv="http://eric.van-der-vlist.com/tmpns" version="1.0">
      <vdv:dates>
       <month name="janvier"/>
       <month name="février"/>
       <month name="mars"/>
       <month name="avril"/>
       <month name="mai"/>
       <month name="juin"/>
       <month name="juillet"/>
       <month name="août"/>
       <month name="septembre"/>
       <month name="octobre"/>
       <month name="novembre"/>
       <month name="décembre"/>
      </vdv:dates>
      <xsl:template match="*|@*|text()">
       <xsl:copy>
        <xsl:apply-templates select="@*|*|text()"/>
       </xsl:copy>
      </xsl:template>
      <xsl:template match="/date/text()">
       <xsl:variable name="n" select="normalize-space(.)"/>
       <xsl:if test="contains(., $n)">
        <xsl:variable name="d" select="substring-before($n, ' ')"/>
        <xsl:variable name="m" select="substring-before(substring-after($n, ' '), ' ')"/>
        <xsl:variable name="y" select="substring-after(substring-after($n, ' '), ' ')"/>
        
        <xsl:value-of select="format-number($y, '0000')"/>
        <xsl:text>-</xsl:text>
        
        <xsl:apply-templates select="document('')/xsl:stylesheet/vdv:dates/month[@name=$m]" mode="month"/>
        <xsl:text>-</xsl:text>
        
        <xsl:value-of select="format-number($d, '00')"/>
       </xsl:if>
      </xsl:template>
      <xsl:template match="month" mode="month">
       <xsl:value-of select="format-number(count(preceding-sibling::month)+1, '00')"/>
      </xsl:template>
     </xsl:stylesheet>
    </if:apply>
   </if:transform>
   <if:validate type="http://relaxng.org/ns/structure/1.0">
    <if:apply>
     <element name="date">
      <data type="date"> 
       <param name="minInclusive">2002-09-01</param>
       <param name="maxInclusive">2002-12-31</param>
      </data>
     </element>
    </if:apply>
   </if:validate>
  </if:pipe>
 </start>
</grammar>
[example 13]

In a previous example, we've seen how an ISO 8601 date could be split into three elements. The reverse operation can be done still more simply:

<grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" 
 datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <start>
  <if:pipe>
   <if:transform type="http://www.w3.org/1999/XSL/Transform">
    <if:apply>
     <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="*|@*|text()">
       <xsl:copy>
        <xsl:apply-templates select="@*|*|text()"/>
       </xsl:copy>
      </xsl:template>
      <xsl:template match="/date">
       <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:copy-of select="*[not(self::year|self::month|self::day)]"/>
        <xsl:value-of select="format-number(year, '0000')"/>
        <xsl:text>-</xsl:text>
        <xsl:value-of select="format-number(month, '00')"/>
        <xsl:text>-</xsl:text>
        <xsl:value-of select="format-number(day, '00')"/>
       </xsl:copy>
      </xsl:template>
     </xsl:stylesheet>
    </if:apply>
   </if:transform>
   <if:validate type="http://relaxng.org/ns/structure/1.0">
    <if:apply>
     <element name="date">
      <data type="date"> 
       <param name="minInclusive">2002-09-01</param>
       <param name="maxInclusive">2002-12-31</param>
      </data>
     </element>
    </if:apply>
   </if:validate>
  </if:pipe>
 </start>
</grammar>
[example 14]

In the two previous examples, we have taken care to copy elements and attributes which might be present in the document in order to catch potential invalidities. In practice it may be simpler (and less error-prone) to just validate the element before and after the transformation:

<grammar xmlns="http://relaxng.org/ns/structure/1.0" 
 xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" 
 datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <start>
  <if:pipe>
   <if:validate type="http://relaxng.org/ns/structure/1.0">
    <if:apply>
     <element name="date">
      <element name="year"><text/></element>
      <element name="month"><text/></element>
      <element name="day"><text/></element>
     </element>
    </if:apply>
   </if:validate>
   <if:transform type="http://www.w3.org/1999/XSL/Transform">
    <if:apply>
     <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="/date">
       <xsl:copy>
        <xsl:value-of select="format-number(year, '0000')"/>
        <xsl:text>-</xsl:text>
        <xsl:value-of select="format-number(month, '00')"/>
        <xsl:text>-</xsl:text>
        <xsl:value-of select="format-number(day, '00')"/>
       </xsl:copy>
      </xsl:template>
     </xsl:stylesheet>
    </if:apply>
   </if:transform>
   <if:validate type="http://relaxng.org/ns/structure/1.0">
    <if:apply>
     <element name="date">
      <data type="date"> 
       <param name="minInclusive">2002-09-01</param>
       <param name="maxInclusive">2002-12-31</param>
      </data>
     </element>
    </if:apply>
   </if:validate>
  </if:pipe>
 </start>
</grammar>
[example 15]

After the relative complexity of the last examples, converting numbers from a French decimal format (using a comma as a decimal separator and forbidding decimal parts without a leading 0) is very easy:

<element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" 
 name="decimal" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <if:pipe>
  <if:validate type="http://namespaces.xmlschemata.org/xvif/regexp" apply="m/^[ \t\n]*[-+]?[0-9]+(,[0-9]+)?[ \t\n]*$/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="s/,/./"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
   <if:apply>
    <data type="decimal"> 
     <param name="minInclusive">-1</param>
     <param name="maxInclusive">1</param>
    </data>
   </if:apply>
  </if:validate>
 </if:pipe>
</element>
[example 16 - try it]

As we are now used to, we have a first validation to check the initial format (here using a regular expression), a transformation to convert the French format into the English format expected by the W3C XML Schema "decimal" datatype, and a Relax NG validation on the result of the transformation, which can use the facets available for this datatype.
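
For readers who prefer to see this sequence spelled out, the following Python sketch mimics the three steps with the standard re and decimal modules. The function name validate_french_decimal is hypothetical, and Python's Decimal only stands in for the W3C XML Schema "decimal" datatype; this is not how an xvif processor is implemented.

```python
import re
from decimal import Decimal, InvalidOperation

# Same expression as the first if:validate step of the pipe.
FRENCH_DECIMAL = re.compile(r"^[ \t\n]*[-+]?[0-9]+(,[0-9]+)?[ \t\n]*$")


def validate_french_decimal(text, low=Decimal("-1"), high=Decimal("1")):
    # Step 1: check the initial (French) lexical format.
    if not FRENCH_DECIMAL.match(text):
        return False

    # Step 2: s/,/./ replaces the first comma only (there is at most one).
    converted = re.sub(",", ".", text, count=1)

    # Step 3: interpret the result as a decimal and apply the facets.
    try:
        value = Decimal(converted.strip())
    except InvalidOperation:
        return False
    return low <= value <= high
```

With these assumptions, "0,5" is accepted, while "0.5" and ",5" fail at the first step and "1,5" fails at the last one.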

Fallback

Up to now, we have not considered the compatibility of our schemas with Relax NG processors which do not support xvif.

Let's take the last example and ask what such a processor would see of this schema. Per the Relax NG specification, such a processor should consider all the elements which do not belong to the Relax NG namespace as annotations and just drop them during the simplification of the schema. As a result, it would read the last schema as:

<element xmlns="http://relaxng.org/ns/structure/1.0"  
 name="decimal" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
</element>
[example 16 seen by a non xvif Relax NG processor]

This is not even a valid Relax NG schema. If we want to write a schema which can be read by a non xvif processor and still performs a minimal amount of validation, this processor should see at least a "text" pattern in the "element" pattern. On the other hand, this "text" pattern should be ignored by an xvif processor, which will already have derived the text node against the if:pipe, considered as a Relax NG pattern. To instruct an xvif processor to ignore a Relax NG pattern, we can introduce an "if:ignore" attribute (allowed and ignored by Relax NG processors per the Relax NG specification). A schema with fallback could be:

<element xmlns="http://relaxng.org/ns/structure/1.0" 
 xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" 
 name="decimal" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <if:pipe>
  <if:validate type="http://namespaces.xmlschemata.org/xvif/regexp" apply="m/^[ \t\n]*[-+]?[0-9]+(,[0-9]+)?[ \t\n]*$/"/>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="s/,/./"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
   <if:apply>
    <data type="decimal"> 
     <param name="minInclusive">-1</param>
     <param name="maxInclusive">1</param>
    </data>
   </if:apply>
  </if:validate>
 </if:pipe>
 <data if:ignore="1" type="token">
  <param name="pattern">[-+]?[0-9]+(,[0-9]+)?</param>
 </data>
</element>
[example 17 - try it]
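
The two readings of this schema can be illustrated with a small Python sketch based on xml.etree.ElementTree. The function names plain_view and xvif_view are hypothetical, the schema is abridged, and the code only imitates the annotation-dropping step described above; it is not a Relax NG simplification.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
IF = "http://namespaces.xmlschemata.org/xvif/iframe"

# Example 17, abridged (the first regexp validation and the facets are omitted).
SCHEMA = """
<element xmlns="http://relaxng.org/ns/structure/1.0"
 xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
 name="decimal" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <if:pipe>
  <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="s/,/./"/>
  <if:validate type="http://relaxng.org/ns/structure/1.0">
   <if:apply>
    <data type="decimal"/>
   </if:apply>
  </if:validate>
 </if:pipe>
 <data if:ignore="1" type="token">
  <param name="pattern">[-+]?[0-9]+(,[0-9]+)?</param>
 </data>
</element>
""".strip()


def _strip_foreign(elem):
    """Drop elements and attributes that are not in the Relax NG namespace."""
    for child in list(elem):
        if child.tag.startswith("{" + RNG + "}"):
            _strip_foreign(child)
        else:
            elem.remove(child)
    for name in list(elem.attrib):
        if name.startswith("{"):  # namespace-qualified attribute
            del elem.attrib[name]


def _strip_ignored(elem):
    """Drop every pattern carrying an if:ignore attribute."""
    for child in list(elem):
        if child.get("{" + IF + "}ignore") is not None:
            elem.remove(child)
        else:
            _strip_ignored(child)


def plain_view(schema_text):
    """What a Relax NG processor without xvif support would keep."""
    root = ET.fromstring(schema_text)
    _strip_foreign(root)
    return root


def xvif_view(schema_text):
    """What an xvif processor would keep."""
    root = ET.fromstring(schema_text)
    _strip_ignored(root)
    return root
```

Under these assumptions, plain_view(SCHEMA) keeps only the fallback "data" pattern, while xvif_view(SCHEMA) keeps only the if:pipe.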

We can note that there is some redundancy between the check done in the pipe before the transformation:

  <if:validate type="http://namespaces.xmlschemata.org/xvif/regexp" apply="m/^[ \t\n]*[-+]?[0-9]+(,[0-9]+)?[ \t\n]*$/"/>

and the test done in the fallback pattern:

  <param name="pattern">[-+]?[0-9]+(,[0-9]+)?</param>

but I have not found any syntactical construction which could avoid this repetition while conforming to the Relax NG specification.

Implementation, variables

The if:pipe element is now acting as a Relax NG pattern and can be implemented using the derivative algorithm described by James Clark:

There are a couple of possibilities about what needs to be used as the source of this transformation, though: should we send the current node, or a pointer to the current node in the document? The tradeoff is that sending a pointer into the document doesn't play well with streaming validation, but lets the transformation access the entire document.

Also, my assumption is that the result of the transformation, T(p), is not persistent, in the sense that an implementation may cache this result to reuse it if it needs to reevaluate the derivation, but that a user cannot explicitly reuse it in a schema. While this is the simplest solution, I am wondering if there wouldn't be use cases where explicit reuse of the results of the transformations would be needed. Could this be the case, maybe, for running integrity checks or additional Schematron tests? If this were the case, how should we deal with it? Should we let people define variables? But then what could be the scope of these variables? Or should we just add flags to say if constraints are run on the parsed, lexical or value spaces?

I am also wondering if we need conditional structures and, if so, whether we can rely on an exception-like logic (a sub pipe is continued until it fails, in which case an alternative is tried) or if we should prefer more traditional if/then/else structures.
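
To give an idea of what the exception-like logic could look like, here is a purely illustrative Python sketch. Nothing in it is part of xvif: the names PipeFailure, must_match, substitute, run_pipe and first_success are all hypothetical, and the two alternatives (French and English decimals) are only chosen to echo the previous examples.

```python
import re


class PipeFailure(Exception):
    """Raised by a step when its validation fails."""


def must_match(pattern):
    """Build a validation step: pass the value through or fail."""
    regex = re.compile(pattern)

    def step(value):
        if regex.fullmatch(value) is None:
            raise PipeFailure(f"{value!r} does not match {pattern}")
        return value

    return step


def substitute(pattern, replacement):
    """Build a transformation step replacing the first match only."""
    def step(value):
        return re.sub(pattern, replacement, value, count=1)

    return step


def run_pipe(steps, value):
    """Run the steps in order; any failure aborts the whole pipe."""
    for step in steps:
        value = step(value)
    return value


def first_success(alternatives, value):
    """Try each sub pipe in turn until one of them does not fail."""
    for steps in alternatives:
        try:
            return run_pipe(steps, value)
        except PipeFailure:
            continue
    raise PipeFailure(f"no alternative accepted {value!r}")


FRENCH = [must_match(r"[-+]?[0-9]+(,[0-9]+)?"), substitute(",", ".")]
ENGLISH = [must_match(r"[-+]?[0-9]+(\.[0-9]+)?")]
```

Here first_success([FRENCH, ENGLISH], value) plays the role of the conditional: no explicit test is written, and the choice between the alternatives follows from the failure of the first sub pipe. An if/then/else structure would instead need a separate condition to be stated before either branch is run.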

Issues

A compact syntax would be interesting. Process elements have semantics similar to function calls and/or Unix pipes, and it would be nice to build on this...
Error reporting becomes hard to read.