Directory content:

  FragFilter.py   : Module implementing the regular fragmentations
  test-regfrag.py : Unit tests for FragFilter.py (beware: the results are platform dependent)
  tryRegFrag.cgi  : Online demonstration

Suggestions on the original proposal by Simon St.Laurent
(all of these suggestions are implemented in the version available in this directory):

1) Matching error handling

These errors deserve more specification: what happens when a pattern doesn't match? When there are more matches than nodes specified to serialize them? When there are more nodes specified than matches?

My suggestion is to ignore node and match "overflow" (i.e. to process only the minimum of the number of matches and the number of specified nodes). To be consistent with this rule, when there is no match, no nodes should be serialized and the fragmented node could be left empty.

2) Attribute prefixes are only hints

Namespace prefixes specified for the attributes generated by regular fragmentations cannot be used when they conflict with prefixes used in the instance document or required by other attributes in the same element. They should therefore be considered hints rather than directives.

The algorithm used in my implementation is the following:

* The requested prefix is used for the generated attribute if it is either not defined or defined for the namespace URI of the generated attribute.
* Otherwise, if the namespace URI of the generated attribute is already associated with a prefix, this prefix is used.
* As a last resort, an index is appended to the requested prefix to generate a prefix not yet used in this element.

3) Generalization of the repeat attribute

An alternative way to write the example:

could be to generalize the use of the repeat attribute to match rules:

4) skipFirst

Choosing default values is often subjective; however, I think that the default value for the skipFirst attribute could be "false".
Also, it's not clear whether this attribute applies to all types of rules (match and split). I think that, for consistency, it should.

5) Duplicate attributes

The current rule is: "Repeating the same attribute name will leave only the last version in the final output". I find this error prone, especially when attributes are generated from the fragmentation of other attributes: this can lead to recursion loops, and even when it does not, the order in which the attributes are processed, and thus generated, is not significant. I would suggest raising a fragmentation-time error when an attribute is overridden.

6) Escape recursion

An attribute "break" could be added to the fragmentRule element to specify that no further recursion should take place.

7) Attribute fragmentation

I have implemented attribute fragmentation while trying to stay as close as possible to the original idea of using the same mechanism, even though the semantics are slightly different. The proposal is coherent, though not always deterministic.

The two major issues with fragmenting attributes are that the result of the fragmentation cannot be kept in the attribute (at least not in the general case), since attributes are not structured, and that the order of attributes is not meaningful.

Since the result cannot be kept in the attribute, it is placed in the "hosting" element, and if the result is serialized as elements or characters, the relative order of the serialized fragments of two or more attributes in the same element cannot be guaranteed.

8) Other node types

Currently, elements and attributes can be fragmented into elements, attributes and text nodes. What about adding other node types (i.e. PIs and comments) to the list?
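
Illustration for suggestions 1 and 2

The overflow rule of suggestion 1 and the prefix selection steps of suggestion 2 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the code of FragFilter.py: the function names pair_matches_with_nodes and choose_prefix are hypothetical, and the namespace bindings in scope on the element are assumed to be available as a plain dict mapping prefixes to namespace URIs.

```python
from itertools import count


def pair_matches_with_nodes(matches, node_names):
    """Suggestion 1: pair each match with a target node, ignoring overflow.

    zip() stops at the shorter sequence, so extra matches or extra nodes
    are dropped, and no match at all yields an empty list.
    """
    return list(zip(node_names, matches))


def choose_prefix(requested, uri, bindings):
    """Suggestion 2: treat the requested prefix as a hint.

    `bindings` (an assumption of this sketch) maps the prefixes already
    in scope on the element to their namespace URIs.
    """
    # Step 1: the requested prefix is free, or already bound to the same URI.
    bound = bindings.get(requested)
    if bound is None or bound == uri:
        return requested
    # Step 2: reuse an existing prefix bound to the URI. The empty prefix
    # is skipped because the default namespace does not apply to attributes.
    for prefix, bound_uri in bindings.items():
        if prefix and bound_uri == uri:
            return prefix
    # Step 3: append an index until the prefix is unused in this element.
    for i in count(1):
        candidate = f"{requested}{i}"
        if candidate not in bindings:
            return candidate
```

For instance, with bindings {"p": "urn:b"}, a request for prefix "p" in namespace "urn:a" falls through to the last step and yields an indexed prefix, while adding "q": "urn:a" to the bindings makes the second step return "q" instead.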