Conditional Levels of a Schema

by Dieter Menne

Hi,

We are currently defining a format for medical data storage
(hrmconsensus.org). The full version is available here:
http://hrmconsensus.org/media/hrm/xhrm/xhrm02/xhrm0_2.xsd

In the simplified example below, the deviceType is always mandatory. For
patientsType, we would like to have a global conditional switch so that
three flavors are possible:

-- minOccurs = "0" for internal clinical use
-- minOccurs = "1" for archiving, must contain patient info
-- minOccurs = "never" anonymized, must not contain patient info

I know that the latter is not possible, that conditionals are not supported
in XSL, and that Schematron would be an alternative. Note that the
conditionals occur at several nesting levels, so that we cannot easily
combine versions of a master element with details, but they are always of
the type "may", "must", "must not".

We would like to avoid having several XSD files and would prefer a common
file with branching. Any ideas or references to ideas are appreciated.

Dieter Menne
on behalf of the hrmconsensus group.


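<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="0.2">
  <xs:element name="xhrm">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="device" type="deviceType"/>
        <xs:element name="patients" type="patientsType" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- Placeholders so this excerpt is a complete schema on its own;
       the real types are defined in the full xhrm0_2.xsd -->
  <xs:complexType name="deviceType"/>
  <xs:complexType name="patientsType"/>
</xs:schema>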

--
View this message in context: http://www.nabble.com/Conditional-Levels-of-a-Schema-tp22842334p22842334.html
Sent from the w3.org - xmlschema-dev mailing list archive at Nabble.com.




RE: Conditional Levels of a Schema

by Michael Kay


This is a common requirement, and there's no simple answer to it.

One approach is to define restricted types: in your more general schema, the
item is defined as optional, and then you have types derived by restriction
that make it mandatory or prohibited. The instance document then has to
indicate which version of the type it wants to use by means of xsi:type; or
alternatively, if your schema validator allows it, you can indicate which
top-level type you want to validate the message against through your
validator's API.
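A minimal sketch of the restriction approach, assuming no target namespace and using xs:string in place of the real deviceType and patientsType; the type names xhrmType, xhrmArchivalType and xhrmAnonymizedType are hypothetical. Note that each restriction has to repeat the content model of the base type.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- General type: patients is optional (internal clinical use) -->
  <xs:complexType name="xhrmType">
    <xs:sequence>
      <xs:element name="device" type="xs:string"/>
      <xs:element name="patients" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Archival: restriction that makes patients mandatory -->
  <xs:complexType name="xhrmArchivalType">
    <xs:complexContent>
      <xs:restriction base="xhrmType">
        <xs:sequence>
          <xs:element name="device" type="xs:string"/>
          <xs:element name="patients" type="xs:string"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <!-- Anonymized: restriction that drops the optional patients particle -->
  <xs:complexType name="xhrmAnonymizedType">
    <xs:complexContent>
      <xs:restriction base="xhrmType">
        <xs:sequence>
          <xs:element name="device" type="xs:string"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <!-- The declared type is the liberal one; instances may narrow it -->
  <xs:element name="xhrm" type="xhrmType"/>
</xs:schema>
```

An instance document would then pick the anonymized flavor with xsi:type, for example:

```xml
<xhrm xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:type="xhrmAnonymizedType">
  <device>example device</device>
</xhrm>
```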

There are a number of difficulties with this approach; one is that you have
to define restricted types not only for the element whose content model is
directly affected, but also for its ancestor elements all the way up to the
top-level message structure. Because restrictions are defined by repeating
the content model rather than simply stating the differences, this can be a
maintenance nightmare.

One approach I have used in the past is to generate the schema documents
defining these restricted types automatically (using XSLT). This reduces the
maintenance burden - but in-house tools like this have their own problems in
terms of maintenance and documentation.

Probably a simpler approach, and one that is closer to your description of
the problem, is to implement the conditional logic not by generating
subtypes but simply by modifying the main type definition. Again you can do
this using XSLT: just define a stylesheet where the schema document is the
body of the <xsl:template match="/"> template rule, and it can then contain
things like <xs:element name="patients" minOccurs="{$param}"/> where $param
is a stylesheet parameter. One disadvantage of this approach is that
different variants of the schema cannot coexist in the same application -
typically your schema cache will only allow one version of a type with a
given name.

XSD 1.1 allows you to supplement the grammar constraints with assertions, so
for example the message in which the <patients> element is prohibited could be
defined using <xs:assert test="empty(//patients)"/>. You could also use
conditional type assignment at the message level to select the type for
validation based on a user-defined attribute such as @message-type. These
facilities are implemented in Saxon-SA 9.1, which you could use to explore
the concept.
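A sketch combining both XSD 1.1 facilities, assuming a processor with XSD 1.1 support; the type names and the flavor values are hypothetical and xs:string stands in for the real types. The assertion uses the relative path .//patients: in an XSD 1.1 assertion the validated element is evaluated as a parentless root, so a leading // (which needs a document node) may raise an error.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- General type: patients optional; the flavor is named by @message-type -->
  <xs:complexType name="xhrmType">
    <xs:sequence>
      <xs:element name="device" type="xs:string"/>
      <xs:element name="patients" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="message-type" type="xs:string"/>
  </xs:complexType>

  <!-- Archival: same content model plus an assertion requiring patients -->
  <xs:complexType name="xhrmArchivalType">
    <xs:complexContent>
      <xs:extension base="xhrmType">
        <xs:assert test="exists(patients)"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Anonymized: assertion forbidding patients anywhere below xhrm -->
  <xs:complexType name="xhrmAnonymizedType">
    <xs:complexContent>
      <xs:extension base="xhrmType">
        <xs:assert test="empty(.//patients)"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Conditional type assignment; no match falls back to xhrmType -->
  <xs:element name="xhrm" type="xhrmType">
    <xs:alternative test="@message-type = 'archival'" type="xhrmArchivalType"/>
    <xs:alternative test="@message-type = 'anonymized'" type="xhrmAnonymizedType"/>
  </xs:element>
</xs:schema>
```

The alternatives derive by extension only to add the assertions, so the content model is written once rather than repeated as in the restriction approach.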

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: xmlschema-dev-request@...
> [mailto:xmlschema-dev-request@...] On Behalf Of Dieter Menne
> Sent: 02 April 2009 19:06
> To: xmlschema-dev@...
> Subject: Conditional Levels of a Schema



Re: Conditional Levels of a Schema

by Pete Cordell-5

I don't know whether this is possible, but could you define an attribute in
another namespace, such that you do:

<xs:element name="patients" minOccurs="0" maxOccurs="1"
                hrm:context-specific="true"/>

and then have a stylesheet that says: if @hrm:context-specific is present,
modify the minOccurs and maxOccurs attributes appropriately for the mode you
want?

One benefit is that the base schema is still a valid schema and hence might
be easier to work with.  It appears to self-document quite well also.

HTH,

Pete Cordell
Codalogic Ltd
Interface XML to C++ the easy way using XML C++
data binding to convert XSD schemas to C++ classes.
Visit http://codalogic.com/lmx/ for more info

----- Original Message -----
From: "Michael Kay" <mike@...>
To: "'Dieter Menne'" <dieter.menne@...>; <xmlschema-dev@...>
Sent: Monday, April 06, 2009 11:04 AM
Subject: RE: Conditional Levels of a Schema






Re: Conditional Levels of a Schema

by Dieter Menne


Pete Cordell-5 wrote:
> I don't know whether this is possible, but could you define an attribute in
> another namespace, such that you do:
>
> <xs:element name="patients" minOccurs="0" maxOccurs="1"
>                 hrm:context-specific="true"/>
>
> and then have a style sheet that says if @hrm:context-specific is present,
> modify the minOccurs and maxOccurs attributes appropriately for the mode you
> want?

Thanks to Pete and Michael. The stylesheet->schema way seemed to be the easiest for me. A minor problem now is that I would like to parameterize the parameters, so that the two alternatives below could be selected with one external parameter, e.g. anonymous="true" (there are many more in the final version). Somehow I failed to get the syntax right.

Apologies again for the multiple postings; it looks like Nabble had not been updating the forum, and now I see the mess I left.

Dieter


<?xml version="1.0"?>
<!-- Following Michael Kay's suggestion on
     http://www.nabble.com/Conditional-Levels-of-a-Schema-td22905179.html
     this stylesheet generates different flavors of the XHRM schema, e.g. for
     use in research or for in-hospital use.
     Alternatively, parameters could be passed upon transformation.
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- For in-hospital use: patient name required -->
  <xsl:param name="minPatients">1</xsl:param>
  <xsl:param name="maxPatients">1</xsl:param>

  <!-- For research: patient name must not be included -->
  <!--
  <xsl:param name="minPatients">0</xsl:param>
  <xsl:param name="maxPatients">0</xsl:param>
  -->
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="0.3">
      <xs:element name="xhrm">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="device" type="xs:string"/>
            <xs:element minOccurs="{$minPatients}" maxOccurs="{$maxPatients}"
                        name="patient" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </xsl:template>
</xsl:stylesheet>
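One way to drive several settings from a single external switch, sketched under the assumption that the switch is called anonymous (taken from the example in the question) and takes the string values 'true'/'false': declare only the switch as a parameter and compute the other settings as variables with xsl:choose.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="xml" indent="yes"/>

  <!-- The single external switch; the default is the in-hospital flavor -->
  <xsl:param name="anonymous" select="'false'"/>

  <!-- Settings derived from the switch -->
  <xsl:variable name="minPatients">
    <xsl:choose>
      <xsl:when test="$anonymous = 'true'">0</xsl:when>
      <xsl:otherwise>1</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="maxPatients">
    <xsl:choose>
      <xsl:when test="$anonymous = 'true'">0</xsl:when>
      <xsl:otherwise>1</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <xsl:template match="/">
    <xs:schema version="0.3">
      <xs:element name="xhrm">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="device" type="xs:string"/>
            <xs:element name="patient" type="xs:string"
                        minOccurs="{$minPatients}" maxOccurs="{$maxPatients}"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </xsl:template>
</xsl:stylesheet>
```

Here both derived values coincide, but the same pattern extends to settings that differ. Since the template only matches "/", any well-formed document can serve as input; with xsltproc, for instance, the switch is set with --stringparam anonymous true (the file names are up to you).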

Re: Conditional Levels of a Schema

by C. M. Sperberg-McQueen-2


On 2 Apr 2009, at 12:05 , Dieter Menne wrote:

> Hi,
>
> we are currently defining a format for medical data storage
> (hrmconsensus.org). The full version is available
> http://hrmconsensus.org/media/hrm/xhrm/xhrm02/xhrm0_2.xsd here .
>
> In the simplified example below, we have the always mandatory
> deviceTyp. For patientsType, we would like to have a global
> conditional switch so that three flavors are possible
>
> -- minOccurs = "0" for internal clinical use
> -- minOccurs = "1" for archiving, must contain patient info
> -- minOccurs = "never" anonymized, must not contain patient info

I may be being dense, but it's not clear to me what your requirement
is.  Is it that

(A) You want the internal clinical systems to use a schema with

   <xs:element name="patients" type="patientsType" minOccurs="0"/>

while the archival system uses

   <xs:element name="patients" type="patientsType" minOccurs="1"/>

while tools and data flows for anonymized data should use

   <xs:element name="patients" type="patientsType" minOccurs="0" maxOccurs="0"/>

?  In other words, you want to work with three related but different
schemas?

Or is it that

(B) based on some signal in the XML, the 'patients' element must occur,
must not occur, or may occur?

You don't seem to mention any visible signal in the XML, so I'm
guessing it's not B.


> I know that the latter is not possible, that conditionals are not
> supported in XSL,

I'm not sure what you mean by that.  There are many conditions one
can check with the subset of regular languages which XSD uses for
content models.  It's true that to check conditions with a content
model you may need to write the content model in a particular way.

> and that Schematron would be an alternative.  Note that the
> conditionals occur in several nesting levels, so that we cannot easily
> combine versions of a master element with details, but they are always of
> the type "may", "must", "must not".

I'm not sure what you mean by this.

> We would like to avoid having several xsd files and prefer a common
> file with branching.

Is this (a) in order to avoid redundancy and eliminate the problem
of inconsistent updates during maintenance of the schema document(s)?
Or (b) because there are some important consumers of your work (maybe
potential users, maybe your bosses, maybe ISO Pascal programmers) who
might, you suspect, find it too hard to grasp the idea of a schema
made up by consulting more than one file at schema construction time?
Or (c) because you have no control over the schema processors to
be used with this schema, and you do not believe that xsd:include
is sufficiently interoperable to be relied upon? (d) Because
you believe in your hearts that you are defining a single language
here, and you want to make that fact manifest by producing a single
schema document?  (In this case, there is the troubling fact that
the 'patients' element follows three different syntactic rules based
not on syntactic context but based on application context, which
suggests that formally speaking you really are defining not one
language, but three.) (e) for some other reason?

Any of these can be a plausible reason (so forgive me if my tone
seems flippant or dismissive -- no offense to you intended), but
what you need to do may vary a lot depending on which reason you have.

> Any ideas or references to ideas are appreciated.

Some possibilities that occur to me off the top of my head.

(1) You single-source the schema document using a literate
programming system (or a macro processor).  So you have eliminated
the inconsistent-maintenance problem.  From your single source
you generate three schema documents, called clinical.xsd,
archival.xsd, and anonymized.xsd.  The appropriate tools and
systems use the appropriate schema document.

The suggestions made by Michael Kay and Pete Cordell both fall
into this category, I think.

(2) A particular variant of the preceding.  In the main schema
document, the relevant declaration reads

   <xs:element name="patients" type="patientsType"
       minOccurs="&patients.minOccurs;"
       maxOccurs="&patients.maxOccurs;"
   />

And the document begins

   <!DOCTYPE xs:schema SYSTEM ... >

By whatever means you choose, the different tools use different
entity declarations for patients.minOccurs and patients.maxOccurs.
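A sketch of option (2), assuming xs:string in place of patientsType and a hypothetical entity file name, flavor.ent. Each tool gets its own copy of flavor.ent (or a catalog maps the system identifier to the right copy); this only works if the XML parser used by the schema processor actually reads the external DTD subset. The archival copy of the entity file might read:

```xml
<!-- flavor.ent, archival copy: patients mandatory.
     The anonymized copy would declare "0" and "0" instead. -->
<!ENTITY patients.minOccurs "1">
<!ENTITY patients.maxOccurs "1">
```

and the single master schema document refers to it:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xs:schema SYSTEM "flavor.ent">
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="xhrm">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="device" type="xs:string"/>
        <xs:element name="patients" type="xs:string"
                    minOccurs="&patients.minOccurs;"
                    maxOccurs="&patients.maxOccurs;"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```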

(3) You declare that the syntactic rule in the language you are
defining is that 'patients' may occur optionally, and specify that
it is up to application-level checking to ensure that each
of the three applications you have described checks to see that
'patients' occurs, or does not occur, as prescribed.  (That is,
you kick the problem over to the business rule people and tell
them it's their problem not yours.)

(4) You wrap 'patients' in an enclosing element indicating
which of the three rules the instance document is supposed to
be following at the moment.  So the sequence which now contains
device and patients reads instead:

    <xsd:sequence>
     <xsd:element name="device" type="deviceType"/>
     <xsd:choice>
      <xsd:element name="clinicalpatients">
       <xsd:complexType>
        <xsd:sequence>
        <xsd:element name="patients" type="patientsType" minOccurs="0"/>
        </xsd:sequence>
       </xsd:complexType>
      </xsd:element>
      <xsd:element name="archivalpatients">
       <xsd:complexType>
        <xsd:sequence>
        <xsd:element name="patients" type="patientsType" minOccurs="1"/>
        </xsd:sequence>
       </xsd:complexType>
      </xsd:element>
      <xsd:element name="anonymizedpatients">
       <xsd:complexType>
        <xsd:sequence/>
       </xsd:complexType>
      </xsd:element>
     </xsd:choice>
    </xsd:sequence>

The systems which transfer records from the clinical applications to
the archiving application, or to applications using anonymized data,
are responsible for changing the wrapper, which thus becomes a visible
signal that the record has been touched by the transfer application.
(This may be useful in debugging records transfer problems.)
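As an illustration of the transfer step for option (4), a sketch of an anonymizing stylesheet, assuming the instance elements are in no namespace (as in the schema above). It is a standard identity transform plus one template that swaps either populated wrapper for the empty anonymized one, dropping the nested patients element with it.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity transform: copy everything not matched more specifically -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace either wrapper (and its patients content) with the
       empty anonymized wrapper, which signals the transfer happened -->
  <xsl:template match="clinicalpatients | archivalpatients">
    <anonymizedpatients/>
  </xsl:template>
</xsl:stylesheet>
```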

(5) You get rid of the nesting and simply replace 'patients'
with three flavors of patients, all using the same type but
with different occurrence requirements.  Your sequence now becomes

    <xsd:sequence>
     <xsd:element name="device" type="deviceType"/>
     <xsd:choice>
      <xsd:element name="clinicalpatients" type="patientsType" minOccurs="0"/>
      <xsd:element name="archivalpatients" type="patientsType" minOccurs="1"/>
      <xsd:element name="anonymizedpatients">
       <xsd:complexType>
        <xsd:sequence/>
       </xsd:complexType>
      </xsd:element>
     </xsd:choice>
    </xsd:sequence>

Again the records transfer tools are responsible for changing the
name of the element in order to signal that they have done their work.

If you really want to document that 'clinicalpatients' and
'archivalpatients' and 'anonymizedpatients' are all really just
flavors of 'patients', by all means define an abstract 'patients'
element and make them all substitutable for it.
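A sketch of that substitution-group variant, written as top-level declarations for the schema document and assuming deviceType and patientsType are declared elsewhere, as in the thread; the complex type name xhrmType is hypothetical. Because the abstract head has no type, its type is xs:anyType, so even the empty anonymized flavor is a valid member.

```xml
<!-- Abstract head: no type given, so its type is xs:anyType -->
<xsd:element name="patients" abstract="true"/>

<!-- The three flavors, all substitutable for 'patients' -->
<xsd:element name="clinicalpatients" type="patientsType"
             substitutionGroup="patients"/>
<xsd:element name="archivalpatients" type="patientsType"
             substitutionGroup="patients"/>
<xsd:element name="anonymizedpatients" substitutionGroup="patients">
  <xsd:complexType>
    <xsd:sequence/>
  </xsd:complexType>
</xsd:element>

<!-- The content model still names the concrete flavors -->
<xsd:complexType name="xhrmType">
  <xsd:sequence>
    <xsd:element name="device" type="deviceType"/>
    <xsd:choice>
      <xsd:element ref="clinicalpatients" minOccurs="0"/>
      <xsd:element ref="archivalpatients"/>
      <xsd:element ref="anonymizedpatients"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
```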

(6) You put an appropriate flag into the content model not as a
wrapper around 'patients' but as a preceding sibling:

    <xsd:sequence>
     <xsd:element name="device" type="deviceType"/>
     <xsd:choice>
      <xsd:sequence>
       <xsd:element name="clinical" type="our:flavor" minOccurs="1"/>
       <xsd:element name="patients" type="patientsType" minOccurs="0"/>
      </xsd:sequence>
      <xsd:sequence>
       <xsd:element name="archival" type="our:flavor" minOccurs="1"/>
       <xsd:element name="patients" type="patientsType" minOccurs="1"/>
      </xsd:sequence>
      <xsd:sequence>
       <xsd:element name="anonymized" type="our:flavor" minOccurs="1"/>
      </xsd:sequence>
     </xsd:choice>
    </xsd:sequence>

Which of these seems most appealing will depend on a lot of things,
including what it is you really want when you say you want a
conditional, and possibly including also what you think the other
tools you work with are going to be capable of doing.

I hope this helps.

Michael Sperberg-McQueen


--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************






Re: Conditional Levels of a Schema

by Dieter Menne


C. M. Sperberg-McQueen-2 wrote:
> I may be being dense, but it's not clear to me what your requirement
> is.  Is it that
>
> (A) You want the internal clinical systems to use a schema with
>
>    <xs:element name="patients" type="patientsType" minOccurs="0"/>
>
> while the archival system uses
>
>    <xs:element name="patients" type="patientsType" minOccurs="1"/>
>
> while tools and data flows for anonymized data should use
>
>    <xs:element name="patients" type="patientsType" minOccurs="0" maxOccurs="0"/>
>
> ?  In other words, you want to work with three related but different
> schemas?

It is A, and Michael's and Steve's as well as some of your ideas are exactly to the point. We would like to keep one master document that is the most liberal and has only the minimal set of required items, plus separate derived ones. There are a few more variants than those mentioned here, most of them "nested" to form a stack of requirements (the patient case is the only non-nested one).

The idea is that hospital administrators can put up a filter allowing only anonymized files out, or that researchers who want calibration information that is not relevant for others can check with their special version of the schema whether all required items are there.

While Michael's $param idea looked easiest to me at first, I think Steve has made a good point, and his way is preferable because it ensures that the master document is always valid.

Being a signal processing and statistics guy with limited XML experience, I now only have to figure out how to get the XPath copying he suggested right. Michael's book is on order but will take a few days to arrive from Amazon.

Dieter












RE: Conditional Levels of a Schema

by Michael Kay

> It is A, and Michael and Steve's as well as some of your
> ideas are exactly to the point. We would like to keep one
> master document that is the most liberal and has only the
> minimal set as required items; and separate derived ones;
> there are a few more variants than those mentioned here, most
> of them "nested" to form a stack of requirements (the patient
> case is the only non-nested).
>
> The idea is that hospital administrators can put up a filter
> allowing only anonymized files out. Or that researcher who
> want calibrations information that is not relevant for others
> can check with their special version of the schema if all
> required items are there.
>
> While Michael's $param idea looked easiest for me at first, I
> think Steve has made a good point and that his way is
> preferred because is ensures that the master document is always valid.
>

I've actually been experimenting with ideas that take the "conditional type
assignment" facility in XSD 1.1 and extend it by allowing access to "schema
parameters" which must be set when starting a validation episode (the
current facility can only be driven by data that appears in the instance
document, not by external data supplied at validation time). This approach
seems to offer a very good fit to your use case (which I think is not at all
uncommon). Unfortunately the schema specs move slowly.

Michael Kay
http://www.saxonica.com/



RE: Conditional Levels of a Schema

by Dieter Menne

In case someone is going to need it, here is Pete's suggestion:

The Master Schema

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="0.3"
   xmlns:hrm="http://www.hrmconsensus.org/layers">
        <xs:element name="xhrm">
                <xs:complexType>
                        <xs:sequence>
                                <xs:element name="device" type="xs:string"/>
                                <xs:element minOccurs="0" maxOccurs="1" name="patient"
                                   type="xs:string" hrm:patientInfo="1"/>
                        </xs:sequence>
                </xs:complexType>
        </xs:element>
</xs:schema>


Use XSLT to convert it to another XSD in which patient info is required. I tried to use result-document with it, but could not get the syntax correct.

<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:hrm="http://www.hrmconsensus.org/layers">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@hrm:patientInfo">
    <xsl:attribute name="minOccurs">
      <xsl:value-of select="1"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>


RE: Conditional Levels of a Schema

by Michael Kay

This transformation isn't reliable.

You're using <xsl:apply-templates select="@*"/> to process all the
attributes, and the result will depend on the order in which they are
processed, which isn't predictable. A safer approach would be

<xsl:copy-of select="@*"/>
<xsl:apply-templates select="@hrm:*"/>

(if you create two attributes with the same name in XSLT, the last one
wins.)

Michael Kay
http://www.saxonica.com/ 
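Applying that fix to the stylesheet above gives roughly the following sketch; the flavor parameter and its values are hypothetical. Attributes are copied wholesale first, so the minOccurs/maxOccurs created from hrm:patientInfo come later and replace the copied ones. The empty template for other hrm:* attributes matters: without it, XSLT's built-in rule would write those attributes' values into the output as text.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:hrm="http://www.hrmconsensus.org/layers">
  <xsl:output method="xml" indent="no"/>

  <!-- Hypothetical switch: 'clinical', 'archival' or 'anonymized' -->
  <xsl:param name="flavor" select="'archival'"/>

  <!-- Copy every node; copy attributes first, then let markers override -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="@hrm:*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Created after copy-of, so these replace the copied attributes -->
  <xsl:template match="@hrm:patientInfo">
    <xsl:choose>
      <xsl:when test="$flavor = 'archival'">
        <xsl:attribute name="minOccurs">1</xsl:attribute>
      </xsl:when>
      <xsl:when test="$flavor = 'anonymized'">
        <xsl:attribute name="minOccurs">0</xsl:attribute>
        <xsl:attribute name="maxOccurs">0</xsl:attribute>
      </xsl:when>
      <!-- 'clinical': keep the master's minOccurs/maxOccurs -->
    </xsl:choose>
  </xsl:template>

  <!-- Other hrm:* markers: nothing to override (copy-of kept them) -->
  <xsl:template match="@hrm:*"/>
</xsl:stylesheet>
```

The hrm:* attributes themselves stay in the output; XSD permits attributes from foreign namespaces on schema elements, so the derived schema remains usable.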

> -----Original Message-----
> From: xmlschema-dev-request@...
> [mailto:xmlschema-dev-request@...] On Behalf Of Dieter Menne
> Sent: 07 April 2009 14:41
> To: xmlschema-dev@...
> Subject: RE: Conditional Levels of a Schema



RE: Conditional Levels of a Schema

by Dieter Menne


Michael Kay wrote:
> This transformation isn't reliable.

Sigh... The book is on its way. I normally work in statistics, so pardon my ignorance.

Dieter
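On the xsl:result-document question raised earlier in the thread, a sketch (XSLT 2.0) that writes all three flavors in one run. The flavor names and the output file names clinical.xsd, archival.xsd and anonymized.xsd are hypothetical; the files are resolved against the processor's base output URI. The flavor is passed as a tunnel parameter, so the intermediate copy template does not need to declare it.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:hrm="http://www.hrmconsensus.org/layers">

  <!-- Entry point: one output document per flavor name -->
  <xsl:template match="/">
    <xsl:variable name="master" select="."/>
    <xsl:for-each select="('clinical', 'archival', 'anonymized')">
      <xsl:result-document href="{.}.xsd" method="xml">
        <xsl:apply-templates select="$master/node()">
          <xsl:with-param name="flavor" select="." tunnel="yes"/>
        </xsl:apply-templates>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Copy attributes wholesale, then let the hrm:* markers override -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="@hrm:*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Receives the tunnelled flavor set in the entry template -->
  <xsl:template match="@hrm:patientInfo">
    <xsl:param name="flavor" tunnel="yes"/>
    <xsl:choose>
      <xsl:when test="$flavor = 'archival'">
        <xsl:attribute name="minOccurs">1</xsl:attribute>
      </xsl:when>
      <xsl:when test="$flavor = 'anonymized'">
        <xsl:attribute name="minOccurs">0</xsl:attribute>
        <xsl:attribute name="maxOccurs">0</xsl:attribute>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

  <!-- Other hrm:* markers produce nothing extra -->
  <xsl:template match="@hrm:*"/>
</xsl:stylesheet>
```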


Re: Conditional Levels of a Schema

by XML4Pharma

Technically this sounds nice ...

But, if I understand it correctly, this means that you have two different
(versions of the) schemas, with the same namespace, and different (although
only slightly different) content.

This is something I do not like, as a matter of principle.
My principle is "new standard (version) => new schema (version) => new
namespace".
I have seen standards where each subsequent version used the same
namespace, though the schemas were different.
I have even seen different standards (with different root elements), with
different schemas, all having the same namespace.

When you allow different schemas (i.e. different rules) to share the same
namespace, you know where you start, but you do not know where you end
(probably in disaster).
For example, how do you proceed when you need to extend one schema with
elements/attributes of the other, when both have the same namespace?

In your case, I would definitely opt for writing a Schematron schema (which
is a good exercise anyway).  ;-)

With best regards,

Jozef

Jozef Aerts
XML4Pharma

============
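A sketch of the Schematron route that still keeps everything in one file, as requested at the start of the thread: ISO Schematron phases select which rules are active, with the liberal XSD continuing to check the basic structure. The phase and pattern names are hypothetical, and how a phase is chosen at validation time depends on the Schematron implementation (XSLT-based ones typically accept it as a parameter).

```xml
<?xml version="1.0" encoding="utf-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt" defaultPhase="clinical">

  <!-- One phase per flavor; 'clinical' activates no patient rule -->
  <sch:phase id="clinical"/>
  <sch:phase id="archival">
    <sch:active pattern="patients-required"/>
  </sch:phase>
  <sch:phase id="anonymized">
    <sch:active pattern="patients-forbidden"/>
  </sch:phase>

  <!-- "must": archival records carry patient information -->
  <sch:pattern id="patients-required">
    <sch:rule context="/xhrm">
      <sch:assert test="patients">An archival record must contain patients.</sch:assert>
    </sch:rule>
  </sch:pattern>

  <!-- "must not": no patients element anywhere in the record -->
  <sch:pattern id="patients-forbidden">
    <sch:rule context="/xhrm">
      <sch:assert test="not(.//patients)">An anonymized record must not contain patients.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```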

----- Original Message -----
From: "Dieter Menne" <dieter.menne@...>
To: <xmlschema-dev@...>
Sent: Tuesday, April 07, 2009 3:41 PM
Subject: RE: Conditional Levels of a Schema


>
> In case someone is going to need it, here is Pete's suggestion:
>
> The Master Schema
>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="0.3"
>     xmlns:hrm="http://www.hrmconsensus.org/layers">
>   <xs:element name="xhrm">
>     <xs:complexType>
>       <xs:sequence>
>         <xs:element name="device" type="xs:string"/>
>         <xs:element minOccurs="0" maxOccurs="1" name="patient"
>             type="xs:string" hrm:patientInfo="1"/>
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
> </xs:schema>
>
>
> Use XSLT to convert it to another XSD in which patient info is required. I
> tried to use xsl:result-document with it, but could not get the syntax
> right.
>
> <?xml version="1.0"?>
> <xsl:stylesheet version="2.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> xmlns:hrm="http://www.hrmconsensus.org/layers">
>  <xsl:output method="xml" indent="no"/>
>  <xsl:template match="node()|@*">
>    <xsl:copy>
>      <xsl:apply-templates select="@*"/>
>      <xsl:apply-templates/>
>    </xsl:copy>
>  </xsl:template>
>
>  <xsl:template match="@hrm:patientInfo">
>    <xsl:attribute name="minOccurs">
>      <xsl:value-of select="1"/>
>    </xsl:attribute>
>  </xsl:template>
> </xsl:stylesheet>
>
>
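
One plausible reason the quoted transformation is not reliable: the identity template copies the existing `minOccurs="0"`, and the `@hrm:patientInfo` template then writes a second `minOccurs` onto the same element. When two attributes with the same name are written, the later one wins, and the order in which `select="@*"` visits attributes is implementation-dependent, so the output can silently keep `minOccurs="0"`. The sketch below is one way around that, assuming an XSLT 2.0 processor such as Saxon: it rebuilds the occurrence attributes of every flagged declaration instead of patching them, and uses `xsl:result-document` to write all three variants from the master in one run. The variant names and output file names (`xhrm-internal.xsd`, `xhrm-archive.xsd`, `xhrm-anonymized.xsd`) are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: derive three schemas from the master. Declarations flagged with
     hrm:patientInfo get minOccurs/maxOccurs rebuilt per variant. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:hrm="http://www.hrmconsensus.org/layers">

  <xsl:output method="xml" indent="yes"/>

  <!-- Entry point: write one derived schema per (hypothetical) variant. -->
  <xsl:template match="/">
    <xsl:variable name="master" select="."/>
    <xsl:for-each select="('internal', 'archive', 'anonymized')">
      <xsl:result-document href="xhrm-{.}.xsd">
        <xsl:apply-templates select="$master/node()">
          <xsl:with-param name="variant" select="." tunnel="yes"/>
        </xsl:apply-templates>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Identity: copy everything else unchanged (tunnel param passes through). -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Declarations flagged with hrm:patientInfo. -->
  <xsl:template match="xs:element[@hrm:patientInfo]">
    <xsl:param name="variant" tunnel="yes"/>
    <xsl:copy>
      <!-- Drop the old occurrence attributes and the marker itself. -->
      <xsl:apply-templates
          select="@* except (@minOccurs, @maxOccurs, @hrm:patientInfo)"/>
      <xsl:choose>
        <!-- "must": exactly one -->
        <xsl:when test="$variant = 'archive'">
          <xsl:attribute name="minOccurs" select="1"/>
          <xsl:attribute name="maxOccurs" select="1"/>
        </xsl:when>
        <!-- "must not": minOccurs = maxOccurs = 0, so the element is not allowed -->
        <xsl:when test="$variant = 'anonymized'">
          <xsl:attribute name="minOccurs" select="0"/>
          <xsl:attribute name="maxOccurs" select="0"/>
        </xsl:when>
        <!-- "may": optional -->
        <xsl:otherwise>
          <xsl:attribute name="minOccurs" select="0"/>
          <xsl:attribute name="maxOccurs" select="1"/>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The three output files are resolved relative to the processor's base output URI. Each derived schema keeps the now unused `xmlns:hrm` declaration on its root, which schema processors ignore.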




Re: Conditional Levels of a Schema

by Dieter Menne


info-1588 wrote:
> but if I understand it correctly, this means that you have two different
> (versions of the) schemas with the same namespace and different (if only
> slightly different) content.
>
> This is something that I dislike purely on principle.
> My principle is "new standard (version) => new schema (version) => new
> namespace".

Point taken, and I was even aware of it, but got lost in namespaces, so to speak. I will
think it over.

info-1588 wrote:
> In your case, I would definitely opt for writing a schematron (which is a
> good exercise anyway).  ;-)

I am no longer sure that is a good idea. Michael Kay already pointed out that in my case I do not want to validate data, but schemas. I want different schemas for hospital administrators and researchers, so that a stacked series of schemas could classify a file as public, research, or hospital.

Dieter


RE: Conditional Levels of a Schema

by Michael Kay

> but if I understand it correctly, this means that you have two
> different (versions of the) schemas with the same namespace
> and different (if only slightly different) content.
>
> This is something that I dislike purely on principle.
> My principle is "new standard (version) => new schema
> (version) => new namespace".

This is a very important question.

I've come to the conclusion that we do need multiple schemas for the same
namespace, for a variety of reasons:

(a) An organisation defines 400 message types for exchanging data between
different applications. There are many data elements shared between these
messages. It would greatly restrict reuse of code to have a different
namespace for each message type. Yet the validation rules are different: a
field that is optional in one message may well be mandatory in another.

(b) Different validation rules apply to the same document at different
stages in its life-cycle. You don't want to apply the same level of
validation to an internal draft document as you do to a final published
document. Yet both have to use the same namespace.

(c) The schema evolves. I don't believe it is practical to change the
namespace every time the schema changes - again, because that inhibits code
reuse. You want to be able to evolve gracefully, which means for example
that when you expand the range of values allowed for an attribute, existing
code continues to work provided the newly permitted values do not appear in
the instance, and might even work in the presence of the new value, if the
code was carefully written. Changing the namespace means that everyone has
to change their code at once, which simply doesn't work.

So the problem is that to identify a schema component, knowing the namespace
(and local name) isn't enough. There needs to be some other handle to
identify the "version" or "variant" we are after. I would like to see this
formalized, so that different versions/variants of the same schema component
can co-exist. At the moment the only identifier available is the schema
location, which is very weak for two reasons - (1) it's an address rather
than a name, and (2) the specs are full of stuff about it only being a hint.
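
To make the point concrete for a no-namespace vocabulary like the xhrm master discussed in this thread: an instance can announce which variant it was written against only through a location hint such as `xsi:noNamespaceSchemaLocation`, and a validator is free to ignore that hint and use whatever schema it was configured with. The file name `xhrm-archive.xsd` and the element contents below are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance. The variant ("archive") is encoded only in the
     location hint; nothing in the element names identifies it. -->
<xhrm xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="xhrm-archive.xsd">
  <device>example-device</device>
  <patient>example-patient-id</patient>
</xhrm>
```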

Michael Kay
http://www.saxonica.com/



Re: Conditional Levels of a Schema

by XML4Pharma

Dear Michael,

In our development team (CDISC ODM standard) we had the same discussions and
issues, but we decided a bit differently - each time we change the schema
(new version, every 2-3 years) we also give it a new namespace.
Our standard is backward compatible, so the changes required (in software
that implements the standard) with the new version are very small (a few
lines of code, the ones that define the namespace).

For your point a), I do not know whether your example is HL7-v3-XML.
This is also something I have been looking into in the last months.
My personal opinion is that the common (base) elements/attributes that can
be reused in all 400 messages should live in a separate namespace, and that
the main schemas for the individual messages should each have their own
namespace, but reference the base elements/attributes, meaning that these
will get a prefix in the instance documents. This also allows the base
elements/attributes to be used differently depending on the message: an
attribute that is mandatory in one message can be optional in another.
Yes, it is more work - but much cleaner.
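
A minimal sketch of the layout Jozef describes: shared elements live in their own namespace, and each message schema imports them and sets its own occurrence constraints. All namespace URIs, file names and element names here are hypothetical.

```xml
<!-- base.xsd: shared building blocks in their own (hypothetical) namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/base"
           elementFormDefault="qualified">
  <xs:element name="accountId" type="xs:string"/>
  <xs:element name="balance" type="xs:decimal"/>
</xs:schema>
```

```xml
<!-- message-a.xsd: one message type with its own (hypothetical) namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:base="http://example.org/base"
           targetNamespace="http://example.org/message-a"
           elementFormDefault="qualified">
  <xs:import namespace="http://example.org/base" schemaLocation="base.xsd"/>
  <xs:element name="messageA">
    <xs:complexType>
      <xs:sequence>
        <!-- mandatory in this message -->
        <xs:element ref="base:accountId"/>
        <!-- optional here; another message schema could make it mandatory -->
        <xs:element ref="base:balance" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

In an instance of message A, `accountId` and `balance` then appear in the base namespace (typically with a prefix), which is the extra work Jozef accepts in exchange for the cleaner separation.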

My approach to schema writing rests on the following principles:
- validate with the schema as much as possible and desirable
- if that no longer works, or rules cannot be expressed in the schema, use
schematron
- only as a last resort, when none of this works, write software to
validate the rules of the standard.

The reason is that writing software to implement/validate rules is
always considerably more expensive, and opaque - usually the source
code is not published.
So for open standards, writing validation software should be avoided as
much as possible, or the software should just run the schema and
schematron validation. If possible, the rules should not be
implemented in the software, but only in the schema and schematron.

One can, in some cases, also write software that acts independently of the
version of the standard, even when the schemas for the different versions
have different namespace names. In the past, I wrote a Clinical Study
Designer, which first reads the XML-Schema and generates all the GUI
elements from the information in the schema. The great advantage is that
when a new version of the standard comes out, the software is immediately
fit for it - little or no need for adaptations. Furthermore, it also
generates widgets for working with extensions of the schema (which is
allowed in our standard, as long as the extension elements/attributes live
in a separate namespace).
When you do it like this, there is no issue with code reuse.

Of course we do not need to agree on all this - but I find this discussion
extremely interesting ...

With best regards,

Jozef Aerts
XML4Pharma



----- Original Message -----
From: "Michael Kay" <mike@...>
To: "'XML4Pharma'" <info@...>; "'Dieter Menne'"
<dieter.menne@...>; <xmlschema-dev@...>
Sent: Tuesday, April 07, 2009 5:08 PM
Subject: RE: Conditional Levels of a Schema


>> but if I understand it correctly, this means that you have two
>> different (versions of the) schemas with the same namespace
>> and different (if only slightly different) content.
>>
>> This is something that I dislike purely on principle.
>> My principle is "new standard (version) => new schema
>> (version) => new namespace".
>
> This is a very important question.
>
> I've come to the conclusion that we do need multiple schemas for the same
> namespace, for a variety of reasons:
>
> (a) an organisation defines 400 message types for exchanging data between
> different applications. There are many data elements shared between these
> messages. It would greatly restrict reuse of code to have a different
> namespace for each message type. Yet the validation rules are different: a
> field that is optional in one message may well be mandatory in another.
>
> (b) Different validation rules apply to the same document at different
> stages in its life-cycle. You don't want to apply the same level of
> validation to an internal draft document as you do to a final published
> document. Yet both have to use the same namespace.
>
> (c) The schema evolves. I don't believe it is practical to change the
> namespace every time the schema changes - again, because that inhibits
> code reuse. You want to be able to evolve gracefully, which means for
> example that when you expand the range of values allowed for an
> attribute, existing code continues to work provided the newly permitted
> values do not appear in the instance, and might even work in the
> presence of the new value, if the code was carefully written. Changing
> the namespace means that everyone has to change their code at once,
> which simply doesn't work.
>
> So the problem is that to identify a schema component, knowing the
> namespace (and local name) isn't enough. There needs to be some other
> handle to identify the "version" or "variant" we are after. I would like
> to see this formalized, so that different versions/variants of the same
> schema component can co-exist. At the moment the only identifier
> available is the schema location, which is very weak for two reasons -
> (1) it's an address rather than a name, and (2) the specs are full of
> stuff about it only being a hint.
>
> Michael Kay
> http://www.saxonica.com/
>
>
>




Re: Conditional Levels of a Schema

by Arshad Noor

Dieter,

I am going to attempt to answer your question by providing a
solution from a different perspective - the Security one -
only because the issue you've raised stems from a security
requirement: preserving patient confidentiality based on
where the data exists/is used.

I am not an XML Schema expert and come to this forum occasionally
to get my own questions answered.  But my career is currently
focused on addressing complex data-security issues, and I believe
that your problem deserves another approach.

If the security-requirement I've stated above is correct, then
your approach to the solution is flawed.  You are making many
assumptions about the software and environment to preserve the
confidentiality of the patient data.  However, it is those very
assumptions by data-model designers and software programmers
that have, unfortunately, resulted in many vulnerable systems
today.  But you can solve your current design problem *and* the
security problem with what I've outlined below.

The security landscape is a vastly different environment today
than it was even five years ago, with professional attackers
being far superior to many standard software developers, IMHO,
in their knowledge of systems, software and vulnerabilities.
Evidence of this superiority is visible in the *known* breaches
at datalossdb.org; what is most problematic are the breaches that
have already happened but will only be discovered many months or
years later (as with Heartland Payment Systems).

That said, I believe the solution to the problem is simple:
assume that the network is compromised and that the host is
compromised.  You may rely on the fact that software has not
been replaced on the computer/device which uses your data, but
you cannot rely on the fact that there isn't something else
running on the system that's reading files and watching
traffic go by on the network adapter.

If the network/host are assumed to be compromised, how do you
address this problem?  By securing the data itself, through
message-level encryption within the application!

By encrypting the data and placing just a reference to the
key-identifier (using the XML Encryption XSD), you can now
use a single XSD for your own data and leave the patient
data in there all the time (minOccurs="1" all the time).
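
A sketch of what this could look like in an instance, assuming W3C XML Encryption with `Type="...#Content"`, so that the `patient` element stays in place and only its content is replaced by `xenc:EncryptedData`. The key name and the algorithm choice are hypothetical, the `CipherValue` is a placeholder rather than real ciphertext, and the schema's declaration of `patient` would have to be changed to accept `xenc:EncryptedData` instead of plain text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: patient content encrypted, device data in clear. -->
<xhrm xmlns:xenc="http://www.w3.org/2001/04/xmlenc#"
      xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
  <device>example-device</device>
  <patient>
    <xenc:EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Content">
      <xenc:EncryptionMethod
          Algorithm="http://www.w3.org/2001/04/xmlenc#aes256-cbc"/>
      <!-- Only a key identifier travels with the data; the key itself
           stays in the key-management system. -->
      <ds:KeyInfo>
        <ds:KeyName>hospital-records-key</ds:KeyName>
      </ds:KeyInfo>
      <xenc:CipherData>
        <!-- placeholder, not real base64 ciphertext -->
        <xenc:CipherValue>...</xenc:CipherValue>
      </xenc:CipherData>
    </xenc:EncryptedData>
  </patient>
</xhrm>
```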

The difference is, those who need to see the actual data -
the hospital, for instance - would have the authorization to
retrieve the decryption-key from their key-management system
and read the data, while all others would not be able to see
it, despite having the data, knowing the key-identifier and
even knowing where to retrieve it from (we create open-source
software that provides this level of security).

This is a radically new paradigm for data-protection.

It allows you to stop worrying about whether the data belongs
in a specific place/application/device/etc. and lets you focus
on just managing access control to your keys.

It also solves your data-design problem: the patient data is
always present in the XSD and application rules are also simple
- unless they are authorized to retrieve the key, the extra data
is just noise.  (That is the only downside: the data is always
present.  But, in these days of megabit speeds to mobile devices,
and gigabit to desktops/laptops, I'm not so sure it's an issue for
new applications).

I'm not trying to detract from the interesting discussion on
conditional processing of XSD elements - I'm sure there are many
other examples where such rules must be addressed.  I've only
offered this alternative, because of the underlying security
requirement in the problem statement.

Regards,

Arshad Noor
StrongAuth, Inc.


Dieter Menne wrote:

> Hi,
>
> we are currently defining a format for medical data storage
> (hrmconsensus.org). The full version is available
> http://hrmconsensus.org/media/hrm/xhrm/xhrm02/xhrm0_2.xsd here .
>
> In the simplified example below, we have the always mandatory deviceType. For
> patientsType, we would like to have a global conditional switch so that
> three flavors are possible
>
> -- minOccurs = "0" for internal clinical use
> -- minOccurs = "1" for archiving, must contain patient info
> -- minOccurs = "never" anonymized, must not contain patient info
>
> I know that the latter is not possible, that conditionals are not supported
> in XSD, and that Schematron would be an alternative.  Note that the
> conditionals occur in several nesting levels, so that we cannot easily
> combine versions of a master element with details, but they are always of
> the type "may", "must", "must not".
>
> We would like to avoid having several xsd files and prefer a common file
> with branching. Any ideas or references to ideas are appreciated.
>
> Dieter Menne
> on behalf of the hrmconsensus group.
>
>
> <?xml version="1.0" encoding="utf-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="0.2">
> <xs:element name="xhrm">
> <xs:complexType>
> <xs:sequence>
> <xs:element name="device" type="deviceType"/>
> <xs:element name="patients" type="patientsType" minOccurs="0"/>
> </xs:sequence>
> </xs:complexType>
> </xs:element>
> </xs:schema>
>


Re: Conditional Levels of a Schema

by Dieter Menne


Arshad Noor wrote:
> By encrypting the data and placing just a reference to the
> key-identifier (using the XML Encryption XSD), you can now
> use a single XSD for your own data and leave the patient
> data in there all the time (minOccurs="1" all the time).

Interesting point, and it certainly is a solution for the case of the patient records. For other things, such as "nice to have, but better left out if not required" data (calibration), it is not an option.

The problem I see is that this requires decryption logic on the client side (but I may be wrong here; please correct me). It is an attractive feature for the hospitals to use the XML format to display customized reports; is it possible to transparently tell the browser "get your encoding if you can"?

Dieter


Re: Conditional Levels of a Schema

by Dieter Menne

Jozef,

info-1588 wrote:
> For your point a), I do not know whether your example is HL7-v3-XML.

No, it is not HL7, but a rather specialized format for high-resolution manometry data (hrmconsensus.org). The most important thing is to get three major vendors to support it, and for that reason it must be as simple as possible to be accepted; adding namespaces would raise yet another hurdle.

Currently, the schema has 1000 lines, which is small compared to HL7, but already too big to gain acceptance. Having several subschemas would kill the project; therefore I was looking for a solution where ONE liberal master (agreed on by all) could semi-automatically spawn specialized versions (done by me each time).

Dieter


RE: Conditional Levels of a Schema

by Michael Kay

> I am going to attempt to answer your question by providing a
> solution from a different perspective - the Security one -
> only because the issue you've raised stems from a security
> requirement: preserving patient confidentiality based on
> where the data exists/is used.

I'm no security expert but it seems very surprising to me that an argument
based on security should lead you to include data in a message that the
recipient doesn't want or need. I would have thought the "need to know"
principle was still relevant.

> That is the only downside: the data is always present.  But, in these days
> of megabit speeds to mobile devices, and gigabit to desktops/laptops, I'm not
> so sure it's an issue for new applications.

Wrong, it's a big issue. In the system I mentioned with 400 messages, many
trivial messages were reaching Gb size because the schema insisted on
inclusion of data that the recipient of the message wasn't interested in.
Rather than designing messages to match what the process model said was
needed on a particular data flow, they were designing messages based on the
static data model, so for example a complete bank account object was being
sent when the recipient only wanted to know the current balance.

Michael Kay
http://www.saxonica.com/




Re: Conditional Levels of a Schema

by Arshad Noor

So, you need to decide what is more important - brevity of schema
design, or brevity of information-transfer for a purpose (hospital
use vs. calibration, etc.).  Not all business cases will have
optimal answers.

If the client application needs to see the decrypted data, then yes,
it must have access to the decryption logic and key.  All others
can exclude the logic and ignore the ciphertext (encrypted data).

Currently, browsers have limited capabilities to work with new
key-management schemes; they only understand SSL/TLS and local
(and proprietary) cryptographic key-stores.  So, any display of
encrypted data within a browser report must be handled by the
web-server before the browser receives the data - but this can be
transparent to the browser.  (I'm assuming this is what you meant
by "encoding").

Arshad Noor
StrongAuth, Inc.

Dieter Menne wrote:

>
> Arshad Noor wrote:
>>
>> By encrypting the data and placing just a reference to the
>> key-identifier (using the XML Encryption XSD), you can now
>> use a single XSD for your own data and leave the patient
>> data in there all the time (minOccurs="1" all the time).
>
> Interesting point, and it certainly is a solution for the case of the
> patient records. For other things, such as "nice to have, but better left
> out if not required" data (calibration), it is not an option.
>
> The problem I see is that this requires decryption logic on the
> client side (but I may be wrong here; please correct me). It is an
> attractive feature for the hospitals to use the XML format to display
> customized reports; is it possible to transparently tell the browser "get
> your encoding if you can"?
>
> Dieter
>
>


Re: Conditional Levels of a Schema

by Arshad Noor

Michael Kay wrote:
>
> I'm no security expert but it seems very surprising to me that an argument
> based on security should lead you to include data in a message that the
> recipient doesn't want or need. I would have thought the "need to know"
> principle was still relevant.
>
        The paradigm we are used to, currently, Michael, is to
        restrict access to data on a need-to-know (NTK) basis.
        In the paradigm that we promote, we stop worrying about
        data-flows and focus on access-control of decryption keys.

        The entire (failed) security model of today exists because
        applications are designed with little or zero data-security
        inherent in them.  They all assume that the network/host
        will provide the protection.  However, evidence shows that
        this is fallacious.

        We believe that the model needs to be turned upside-down;
        that security must begin with the data by armoring the
        data first.  This allows the data to be secure no matter
        where it goes - disks, network, log-files, CDROMs, flash-
        disks, databases, etc.  This is very unlike reality today,
        where data is safe only on the SSL/IPSec wire, but completely
        unprotected the moment it comes out of that pipe.  Where
        do you think the attackers are focusing their attention?
        Outside the encrypted pipe.

        Some vendors tout encrypted databases, encrypted disk-drives,
        encrypted file-systems.  All of these are point-solutions
        that do not cover all the risks.

        With encryption *in the application*, you've addressed the
        vulnerability once and for all, leaving key-management as
        your biggest headache.  And, that's the problem we solved
        three years ago.
       

>> That is the only downside: the data is always present.  But, in these days
>> of megabit speeds to mobile devices, and gigabit to desktops/laptops, I'm not
>> so sure it's an issue for new applications.
>
> Wrong, it's a big issue. In the system I mentioned with 400 messages, many
> trivial messages were reaching Gb size because the schema insisted on
> inclusion of data that the recipient of the message wasn't interested in.
> Rather than designing messages to match what the process model said was
> needed on a particular data flow, they were designing messages based on the
> static data model, so for example a complete bank account object was being
> sent when the recipient only wanted to know the current balance.

        I won't dispute that there are applications where this is a
        problem.  In those situations, the businesses must decide
        which cost they're willing to accept over the long-term:
        maintaining multiple schemas/application-logic that provide
        appropriate levels of data to an application, or dealing with
        extraneous data in a single schema/application-base.

        I believe the old expression - "you can't have your cake and
        eat it too" - comes to mind.  :-)

Arshad Noor
StrongAuth, Inc.