XML Validation Interoperability Framework: "outie"

Purpose

This document is a proposal for the DSDL Interoperability Framework. It has not been endorsed in any way by the ISO DSDL working group and should be considered an early working draft.

Prior work

This proposal follows two previous contributions:

Outie builds on these proposals and may be seen as a non-procedural rewriting of Schemachine.

Main concepts

Reference to documents

A document may be referenced either by its URL or through a variable (see below for the definition and semantics of variables). The two forms are distinguished by the first character of the reference: a dollar sign ($) indicates a reference to a variable, and any other reference is interpreted as a URL.
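
The distinction between the two forms of reference can be sketched as follows. This is only an illustration: the function name parse_reference and the shape of its return value are assumptions, not part of the proposal.

```python
from typing import Tuple


def parse_reference(reference: str) -> Tuple[str, str]:
    """Classify a document reference (hypothetical helper).

    A leading dollar sign marks a reference to a variable; anything
    else is interpreted as a URL and returned unchanged.
    """
    if reference.startswith("$"):
        # Strip the dollar sign to obtain the variable name.
        return ("variable", reference[1:])
    return ("url", reference)
```

With this sketch, "$normalized" is classified as the variable "normalized" and "schema.rng" as a URL.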

Document types

Implementations need to know the document type of the schemas and transformations used during a validation in order to select the corresponding tool. The type is determined from the extension of each document, and implementations must provide a user-updatable configuration that maps extensions to the tools to use.
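
A minimal sketch of such a lookup is given below. The mapping DEFAULT_TOOLS, the tool names it contains and the function document_type are all hypothetical; a real implementation would read the mapping from its own user-updatable configuration.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical mapping from extensions to tool names; in practice this
# would be loaded from a configuration the user can edit.
DEFAULT_TOOLS = {
    ".rng": "relaxng",
    ".sch": "schematron",
    ".xsd": "w3c-xml-schema",
    ".xsl": "xslt",
    ".xslt": "xslt",
}


def document_type(url, tools=DEFAULT_TOOLS):
    """Return the tool configured for the extension of the given URL."""
    # Only the path component is examined, so a query string or a
    # fragment identifier does not disturb the extension.
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    if extension not in tools:
        raise LookupError("no tool configured for extension %r" % extension)
    return tools[extension]
```

Raising an error for an unknown extension is one possible design choice; the proposal does not say how such a case should be handled.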

Documentation

Every element (except documentation itself) may contain documentation elements, which are copied into the validation report. To facilitate the matching between a framework and its reports, every element may also carry an "id" attribute, which is copied as a "ref" attribute into the validation report.

The content model of the documentation element is:

element documentation { documentation }
documentation = text

The definition of the id attribute is:

attribute id { xsd:ID }

Framework

The root element is a "framework".

This root element may be composed of "rules" and "variables". Its content model is:

element framework { framework }
framework =
attribute id { xsd:ID }?
& element documentation { documentation }*
& element rule { rule }*
& element variable { variable }*

Variables

Variables may be defined to hold the results of transformations. All variables are global and the order in which they appear in the framework is not significant. Implementations may choose to defer the execution of the transformations described in variables until the first use of the variable.

Their content model is:

element variable { variable }
variable =
attribute name { text }?
& attribute id { xsd:ID }?
& element documentation { documentation }*
& element transform { transform }
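
The deferred execution allowed for variables can be sketched as a small cache. The class LazyVariables and the idea of representing each transformation as a zero-argument callable are assumptions made for this illustration only.

```python
class LazyVariables:
    """Hold global variables whose transformations run on first use."""

    def __init__(self, definitions):
        # definitions maps a variable name to a zero-argument callable
        # that performs the transformation and returns its result.
        self._definitions = dict(definitions)
        self._results = {}

    def get(self, name):
        # Run the transformation only once, the first time it is needed.
        if name not in self._results:
            self._results[name] = self._definitions[name]()
        return self._results[name]
```

Because all variables are global and their order is not significant, a dictionary keyed by name is sufficient here; a variable that is never referenced is never computed.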

Transformations

Transformations may be applied to transform both instance and schema documents. The source of the transformation is given by the "instance" attribute.

Their content model is:

element transform { transform }
transform =
attribute instance { text }?
& attribute extension { text }?
& attribute transformation { text }?
& attribute id { xsd:ID }?
& element documentation { documentation }*
& element with-param { with-param }*

Parameters

Parameters may be passed to a transformation; each of them consists of a name and a value.

Their content model is:

element with-param { with-param }
with-param =
attribute name { text }?
& attribute select { text }?
& attribute id { xsd:ID }?
& element documentation { documentation }*

Rules

Rules are sets of assertions to test over the same instance document.

By default, the instance document is the original instance document to validate. Other instance documents may be referenced through the instance attribute (which may hold either a reference to a variable or a URL) or be defined inline in an instance element.

The validations to perform on this instance document are grouped in an assert element. A rule without an assert element is considered true.

Rules have an optional "mode" attribute. When the validation starts, only the rules with no mode attribute (or with a mode attribute containing an empty string) are evaluated, but in the course of the validation, rules may request the application of the rules corresponding to other modes (see "apply-rules" below).

The order in which the rules are evaluated is not guaranteed, and implementations may stop their evaluation after the first rule which is false.

Their content model is:

element rule { rule }
rule =
attribute id { xsd:ID }?
& attribute mode { text }?
& attribute instance { text }?
& element documentation { documentation }*
& element instance { schema-or-instance }?
& element assert { assert }?
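
The selection of rules by mode can be sketched with the standard xml.etree.ElementTree module. The function name rules_for_mode is hypothetical, and the sketch assumes a framework without namespaces, as in the examples of this document.

```python
import xml.etree.ElementTree as ET


def rules_for_mode(framework, mode=""):
    """Return the rule children of a framework element for one mode.

    A missing mode attribute is treated like an empty string, so the
    default call selects the rules evaluated when validation starts.
    """
    return [rule for rule in framework.findall("rule")
            if rule.get("mode", "") == mode]


framework = ET.fromstring(
    '<framework>'
    '<rule id="a"/>'
    '<rule id="b" mode=""/>'
    '<rule id="c" mode="mode1"/>'
    '</framework>'
)
initial = [rule.get("id") for rule in rules_for_mode(framework)]
```

Here initial contains the identifiers "a" and "b"; rule "c" is only selected when "mode1" is requested.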

Instance

Instance elements describe instances as transformations performed on another instance document. Their content model is similar to that of a variable (the only difference is that a variable has a "name" attribute).

Their content model is:

element instance { schema-or-instance }
schema-or-instance =
attribute id { xsd:ID }?
& element documentation { documentation }*
& element transform { transform }

Assertions

Assertions are containers for sets of validations. These validations may be individual schema validations (isValid element), choices between assertions (choice element) or the application of the rules for a given mode (apply-rules element). The result of an assert is true if and only if the results of all its non-documentation child elements are true. The order in which the child elements are evaluated is not guaranteed, and implementations may stop evaluating after the first result which is false.

The content model is:

element assert { assert }
assert =
attribute id { xsd:ID }?
& element documentation { documentation }*
& element assert { assert }*
& element isValid { isValid }*
& element choice { assert }*
& element apply-rules { apply-rules }*

isValid

This element performs a schema validation on the current instance (i.e. on the instance defined for its parent rule element). The schema may be selected either by reference (using the schema attribute) or inline through a schema child element. The type of schema language (and thus the tool used to perform the validation) is determined by the extension of the schema. An isValid element is true if and only if the validation succeeds.

The content model is:

element isValid { isValid }
isValid =
attribute id { xsd:ID }?
& attribute schema { text }?
& element schema { schema-or-instance }?

Schema

The schema element defines a schema as a transformation to perform on a document. Its content model is the same as the content model of the instance element:

element schema { schema-or-instance }
schema-or-instance =
attribute id { xsd:ID }?
& element documentation { documentation }*
& element transform { transform }

Choice

The choice element is a container for assertions which is true if and only if at least one of its non-documentation child elements is true. The order in which the child elements are evaluated is not guaranteed, and implementations may stop evaluating after the first one which is true.

The content model is the same as the content model of assert:

element choice { assert }
assert =
attribute id { xsd:ID }?
& element documentation { documentation }*
& element assert { assert }*
& element isValid { isValid }*
& element choice { assert }*
& element apply-rules { apply-rules }*
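
The semantics of assert and choice can be sketched as a conjunction and a disjunction over the non-documentation children. In this sketch the function names and the is_valid callback (which stands in for a real schema validation) are assumptions, and apply-rules is deliberately left out.

```python
import xml.etree.ElementTree as ET


def evaluate(element, is_valid):
    """Evaluate an assert or choice element.

    is_valid is a hypothetical callback that receives an isValid
    element and returns True when the validation succeeds.
    """
    # A generator lets all() and any() stop at the first decisive
    # result, which the proposal explicitly allows.
    results = (evaluate_child(child, is_valid)
               for child in element if child.tag != "documentation")
    if element.tag == "choice":
        return any(results)   # true if at least one child is true
    return all(results)       # assert: true if every child is true


def evaluate_child(child, is_valid):
    if child.tag == "isValid":
        return is_valid(child)
    if child.tag in ("assert", "choice"):
        return evaluate(child, is_valid)
    # apply-rules needs access to the whole framework; not sketched here.
    raise NotImplementedError("unsupported element: %s" % child.tag)
```

Note that this sketch evaluates children in document order, which is one permitted order among others, since the proposal does not guarantee any.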

Apply-rules

This element is an instruction to evaluate the rules for a certain mode. An apply-rules element is true if and only if all the rules for the corresponding mode are true. The order in which the rules are evaluated is not guaranteed, and implementations may stop after the first evaluation which is false. It is an error to provoke, directly or indirectly, recursive applications of the same mode.

The content model is:

element apply-rules { apply-rules }
apply-rules =
attribute mode { text }?
& attribute id { xsd:ID }?
& element documentation { documentation }*
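
One possible way to detect recursive applications of a mode is a static check performed before validation starts. The sketch below is an assumption about how an implementation might do this, not a description of the reference implementation: it is conservative, since it reports a cycle even if a choice would have prevented the recursion at run time. The function name find_recursive_mode is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_recursive_mode(framework):
    """Return a mode that can be applied recursively, or None."""
    # Build a graph: mode of a rule -> modes requested by its apply-rules.
    graph = {}
    for rule in framework.findall("rule"):
        targets = graph.setdefault(rule.get("mode", ""), set())
        for request in rule.iter("apply-rules"):
            targets.add(request.get("mode", ""))

    visiting, done = set(), set()

    def visit(mode):
        # Depth-first search; meeting a mode that is still being
        # visited means that it is reachable from itself.
        if mode in done:
            return None
        if mode in visiting:
            return mode
        visiting.add(mode)
        for target in graph.get(mode, ()):
            found = visit(target)
            if found is not None:
                return found
        visiting.discard(mode)
        done.add(mode)
        return None

    for mode in list(graph):
        found = visit(mode)
        if found is not None:
            return found
    return None
```

The result is compared with None rather than tested for truth because the default mode is represented by an empty string.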

Examples

A very simple framework

There are of course simpler ways to achieve the same result, but the following framework defines a single rule on the instance document, and this rule has a single assertion that tests the instance against a Relax NG schema:

<?xml version="1.0" encoding="utf-8"?>
<framework>
 <rule>
  <assert>
   <isValid schema="schema.rng"/>
  </assert>
 </rule>
</framework>

Adding a second validation

A second validation on the same instance might be added in the same rule. For instance, to add a validation by a Schematron schema:

<?xml version="1.0" encoding="utf-8"?>
<framework>
 <rule>
  <assert>
   <isValid schema="schema.rng"/>
   <isValid schema="schema.sch"/>
  </assert>
 </rule>
</framework>

Adding a pre validation transformation

A transformation before these two validations might be added inline in the definition of the rule:

<?xml version="1.0" encoding="utf-8"?>
<framework>
 <rule>
  <instance>
   <transform transformation="normalize.xslt"/>
  </instance>
  <assert>
   <isValid schema="schema.rng"/>
   <isValid schema="schema.sch"/>
  </assert>
 </rule>
</framework>

Or defined as a variable referenced by the rule:

<?xml version="1.0" encoding="utf-8"?>
<framework>
 <variable name="normalized">
  <transform transformation="normalize.xslt"/>
 </variable>
 <rule instance="$normalized">
  <assert>
   <isValid schema="schema.rng"/>
   <isValid schema="schema.sch"/>
  </assert>
 </rule>
</framework>

Note that in both cases the instance on which the transformation is performed is not specified and is thus the instance presented for validation.

If we needed to apply one of the validations on the original instance and the other on the normalized instance, we would need to define two different rules, for instance:

<?xml version="1.0" encoding="utf-8"?>
<framework>
 <variable name="normalized">
  <transform transformation="normalize.xslt"/>
 </variable>
 <rule instance="$normalized">
  <assert>
   <isValid schema="schema.rng"/>
  </assert>
 </rule>
 <rule>
  <assert>
   <isValid schema="schema.sch"/>
  </assert>
 </rule>
</framework>

Applying transformations on schemas

The same mechanisms (inline or by reference to variables) may be used to transform schemas. For instance, to apply the "getStage" transformation proposed by Bob DuCharme:

<?xml version="1.0" encoding="utf-8"?>
<framework>
 <rule>
  <assert>
   <isValid>
    <schema>
     <transform extension=".xsd" instance="schema.xsd" transformation="getStage.xsl">
      <with-param name="stageName" select="'final'"/>
     </transform>
    </schema>
   </isValid>
  </assert>
 </rule>
</framework>

Unless we prefer using a variable:

<?xml version="1.0" encoding="utf-8"?>
<framework>
 <variable name="final">
  <transform extension=".xsd" instance="schema.xsd" transformation="getStage.xsl">
   <with-param name="stageName" select="'final'"/>
  </transform>
 </variable>
 <rule>
  <assert>
   <isValid schema="$final"/>
  </assert>
 </rule>
</framework>

This feature could also be used to isolate Schematron rules embedded in other schema languages.

Choices

We might want to check that a document is valid for one of several schemas (each schema could for instance describe a version of a vocabulary). In this case, we can use the "choice" element:

<?xml version="1.0" encoding="utf-8"?>
<framework>
 <rule>
  <assert>
   <choice>
    <isValid schema="schema1.sch"/>
    <isValid schema="schema2.sch"/>
   </choice>
  </assert>
 </rule>
</framework>

If we had two schemas to apply for each alternative, we could embed these schemas in "assert" elements:

<?xml version="1.0" encoding="utf-8"?>
<framework>
 <rule>
  <assert>
   <choice>
    <assert>
     <isValid schema="schema1.sch"/>
     <isValid schema="schema1.rng"/>
    </assert>
    <assert>
     <isValid schema="schema2.sch"/>
     <isValid schema="schema2.rng"/>
    </assert>
   </choice>
  </assert>
 </rule>
</framework>

Unless we prefer defining different modes for each alternative:

<?xml version="1.0" encoding="utf-8"?>
<framework>
 <rule>
  <assert>
   <choice>
    <apply-rules mode="mode1"/>
    <apply-rules mode="mode2"/>
   </choice>
  </assert>
 </rule>
 <rule mode="mode1">
  <assert>
   <isValid schema="schema1.sch"/>
   <isValid schema="schema1.rng"/>
  </assert>
 </rule>
 <rule mode="mode2">
  <assert>
   <isValid schema="schema2.sch"/>
   <isValid schema="schema2.rng"/>
  </assert>
 </rule>
</framework>

In this case, the two styles are more or less equivalent, but using apply-rules would have been the only option if the validations performed in the two modes had used different instances.

To be defined

Several points need to be defined:

Reference implementation

A reference implementation written in Python is available at http://downloads/xmlschemata.org/python/xvif/outie. The main script is "outie.py", which expects the name of a framework and the name of an instance document as parameters on the command line. The document "config.xml" defines the correspondence between the document types, identified through their extensions, and the various tools; it needs to be adjusted to match your environment.