Directory content:
FragFilter.py : Module implementing regular fragmentations
test-regfrag.py : Unit tests for FragFilter.py (beware: the results are
platform-dependent)
tryRegFrag.cgi : Online demonstration
Suggestions on the original proposal by Simon St.Laurent:
(All these suggestions are implemented in the version available in this directory)
1) Matching error handling
These errors deserve a more precise specification: what happens when a pattern doesn't match?
When there are more matches than nodes specified to serialize them?
When there are more nodes specified than matches?
My suggestion is to ignore the "overflow" of nodes or matches (i.e. process only the minimum
of the number of matches and the number of specified nodes). To be consistent with
this rule, when there is no match, no nodes should be serialized and the fragmented
node could be left empty.
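A minimal Python sketch of this "ignore the overflow" rule, assuming a rule is reduced to a regular expression with capturing groups plus a list of target node names. The helper name pair_matches_with_nodes is hypothetical and is not taken from FragFilter.py.

```python
import re


def pair_matches_with_nodes(pattern, text, node_names):
    """Pair each captured group with a target node name (hypothetical helper).

    Only min(number of groups, number of node names) pairs are produced,
    and a failed match produces no nodes at all.
    """
    m = re.match(pattern, text)
    if m is None:
        # No match: serialize nothing, so the fragmented node stays empty.
        return []
    # zip() stops at the shorter sequence, which silently drops the
    # overflow on either side (extra matches or extra node names).
    return list(zip(node_names, m.groups()))
```

For example, fragmenting "2001-06-15" with three groups but only the node names year and month yields [("year", "2001"), ("month", "06")]; the day group is simply ignored.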
2) Attribute prefixes are only hints
Namespace prefixes specified for the attributes generated by regular fragmentations
cannot be used when they conflict with prefixes used in the instance document or
required by other attributes in the same element. They should therefore be treated
as hints rather than directives.
The algorithm used in my implementation is the following:
* The requested prefix is used for the generated attribute if it is either not
defined or defined for the namespace URI of the generated attribute.
* Otherwise, if the namespace URI of the generated attribute is already associated
with a prefix, that prefix is used.
* As a last resort, an index is appended to the requested prefix to generate a prefix
not yet used in this element.
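The three steps above can be sketched in Python as follows. This is an illustration, not the code of FragFilter.py: the function name choose_prefix is hypothetical, and in_scope is an assumed dict mapping each prefix already in use on the element to its namespace URI (the empty string standing for the default namespace).

```python
def choose_prefix(hint, ns_uri, in_scope):
    """Choose a prefix for a generated attribute (hypothetical helper).

    in_scope is an assumed dict {prefix: namespace URI} describing the
    prefixes already used in the element.
    """
    # 1. Use the hinted prefix if it is free or already bound to ns_uri.
    if in_scope.get(hint, ns_uri) == ns_uri:
        return hint
    # 2. Otherwise reuse a prefix already bound to ns_uri. The default
    #    namespace ("") is skipped: unprefixed attributes are in no namespace.
    for prefix, uri in in_scope.items():
        if prefix and uri == ns_uri:
            return prefix
    # 3. As a last resort, append an index to the hint until it is unused.
    i = 1
    while f"{hint}{i}" in in_scope:
        i += 1
    return f"{hint}{i}"
```

The caller is then expected to record the chosen prefix in in_scope (and declare it if needed) before generating the next attribute, so that later choices see it.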
3) Generalization of the repeat attribute
An alternative way to write the example could be to generalize the use of the
repeat attribute to match rules:
4) skipFirst
It's often subjective to define default values; however, I think that the default
value for the skipFirst attribute could be "false". Also, it's not clear whether this
attribute applies to all types of rules (match and split); I think that, for
consistency, it should.
5) Duplicate attributes
The current rule is: "Repeating the same attribute name will leave only the last
version in the final output", which I find error-prone, especially when attributes
are generated out of the fragmentation of other attributes: this can lead to
recursion loops, and even when it does not, the order in which the attributes are
processed (and thus generated) is not significant in XML, so "the last version" is
not well defined. I would suggest raising a fragmentation-time error when an
attribute is "overridden".
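A minimal sketch of the suggested error, assuming the attributes of the element being built are held in a plain dict. Both FragmentationError and set_generated_attribute are hypothetical names used for illustration only.

```python
class FragmentationError(Exception):
    """Raised when a fragmentation would overwrite an existing attribute."""


def set_generated_attribute(attrs, name, value):
    """Add a generated attribute, refusing to silently override one.

    attrs is an assumed dict standing in for the element's attributes.
    """
    if name in attrs:
        # Report the conflict instead of keeping whichever value came last.
        raise FragmentationError(
            f"attribute {name!r} would be overridden "
            f"({attrs[name]!r} -> {value!r})")
    attrs[name] = value
```

Because the check does not depend on the order in which attributes are processed, the same input either succeeds or fails consistently.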
6) Escape recursion
An attribute "break" could be added to the fragmentRule element to specify that no
further recursion should take place.
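To illustrate how such a "break" attribute could interact with recursion, here is a small Python model. It assumes a fragmentRule can be reduced to the name of the node it applies to, a pattern, and the names of the nodes it produces; the Rule class, its break_ field and the fragment function are all hypothetical and do not mirror FragFilter.py.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical in-memory form of a fragmentRule element."""
    applies_to: str        # name of the node the rule fragments
    pattern: str           # regular expression with capturing groups
    produce: list          # names of the nodes to create, one per group
    break_: bool = False   # the proposed "break" attribute


def fragment(name, text, rules):
    """Return a (name, children-or-text) tree.

    Nodes produced by a rule are fragmented again by the other rules,
    unless that rule sets break_, in which case they are kept as text.
    """
    for rule in rules:
        if rule.applies_to != name:
            continue
        m = re.match(rule.pattern, text)
        if m is None:
            return (name, [])  # no match: leave the node empty
        children = []
        for child_name, value in zip(rule.produce, m.groups()):
            if rule.break_:
                children.append((child_name, value))  # stop recursing here
            else:
                children.append(fragment(child_name, value, rules))
        return (name, children)
    return (name, text)  # no applicable rule: keep the text as-is
```

With a rule splitting a date into year, month and day, and a second rule splitting a year into century and yy, the year is fragmented further; setting break_ on the first rule keeps "2001" as plain text. The same flag is also a way out when a rule would otherwise produce a node it applies to again and recurse indefinitely.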
7) Attributes fragmentation
I have implemented attribute fragmentation trying to stay as close as possible to the
original idea of using the same mechanism, even though the semantics are slightly
different; this proposal is coherent, even though it is not always deterministic.
The two major issues with fragmenting attributes are that the result of the
fragmentation cannot be kept in the attribute (at least not in the general case),
since attributes are not structured, and that the order of attributes is not meaningful.
Since the result cannot be kept in the attribute, it is placed in the "hosting"
element, and if the result is serialized as elements or characters, the relative
order of the serialized fragments of two or more attributes in the
same element cannot be guaranteed.
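A minimal sketch of placing the result in the hosting element, using the standard xml.dom.minidom module. The helper fragment_attribute and its parameters are hypothetical; only the minidom calls are real API.

```python
import re
from xml.dom.minidom import parseString


def fragment_attribute(element, attr_name, pattern, produce):
    """Fragment an attribute value into child elements of its host element.

    Hypothetical helper: the attribute itself is left untouched, since an
    attribute value cannot hold structure; the generated elements are
    appended to the "hosting" element instead.
    """
    m = re.match(pattern, element.getAttribute(attr_name))
    if m is None:
        return  # no match: nothing is serialized
    doc = element.ownerDocument
    for child_name, value in zip(produce, m.groups()):
        child = doc.createElement(child_name)
        child.appendChild(doc.createTextNode(value))
        element.appendChild(child)


doc = parseString('<event date="2001-06-15"/>')
fragment_attribute(doc.documentElement, "date",
                   r"(\d{4})-(\d{2})-(\d{2})", ["year", "month", "day"])
print(doc.documentElement.toxml())
# <event date="2001-06-15"><year>2001</year><month>06</month><day>15</day></event>
```

If two attributes of the same element are fragmented this way, the order of the generated children depends only on the order in which the caller happens to process the attributes, which is exactly the ordering issue described above.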
8) Other node types
Currently, elements and attributes can be fragmented into elements, attributes and
text nodes. What about adding other types of nodes (i.e. processing instructions and
comments) to the list?
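As a rough illustration of what the extension could involve, this Python sketch creates the requested child node with xml.dom.minidom, adding processing instructions and comments next to elements and text. The make_node function and its kind names are hypothetical; attributes are left out because they are set on the host element rather than appended as children. The checks reflect XML syntax rules that a fragmented value could violate.

```python
from xml.dom.minidom import parseString


def make_node(doc, kind, name, value):
    """Create the child node a fragmentation rule asks for (hypothetical)."""
    if kind == "element":
        node = doc.createElement(name)
        node.appendChild(doc.createTextNode(value))
        return node
    if kind == "text":
        return doc.createTextNode(value)
    if kind == "pi":
        # name is used as the PI target; "?>" cannot appear in PI data.
        if "?>" in value:
            raise ValueError("value cannot be serialized as a PI")
        return doc.createProcessingInstruction(name, value)
    if kind == "comment":
        # XML forbids "--" inside a comment and a trailing "-".
        if "--" in value or value.endswith("-"):
            raise ValueError("value cannot be serialized as a comment")
        return doc.createComment(value)
    raise ValueError(f"unsupported node kind: {kind!r}")


doc = parseString("<r/>")
root = doc.documentElement
root.appendChild(make_node(doc, "pi", "year", "2001"))
root.appendChild(make_node(doc, "comment", None, "month 06"))
print(root.toxml())  # <r><?year 2001?><!--month 06--></r>
```

Unlike elements and text, these node types can reject some matched values outright, so the specification would also need to say what happens in that case.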