Testing for mixed content when using a DTD

by Jack Rugh

Hi Tatu:

I'm using Woodstox-4.0.2 and am parsing documents that use an XML DTD. In
the input event processing, when I get an
"XMLStreamConstants.START_ELEMENT" event, I need to know whether the
element allows mixed content. For example,
<!ELEMENT no-mixed-content (mixed-content)+> does not allow mixed content,
whereas <!ELEMENT mixed-content (#PCDATA)> and
<!ELEMENT mixed-content (#PCDATA | bold)*> do.

In the XMLValidator class I see static fields such as "CONTENT_ALLOW_WS".
Can they be used to determine that mixed content is not allowed in the
element currently being processed? If so, is there a practical way to get
access to that value for the current element? While browsing the user
list, I did see a message from you dated "25 April 2006" suggesting there
may be no easy way to get at this information, unless Stax2 has been
updated since then.

If all else fails, I can build a list of elements for the specific DTD
being used, but I hope that would not be necessary.
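If it does come to that fallback, a minimal JDK-only sketch of building such a per-DTD table might look like the following. It is an illustration under loud assumptions, not a real DTD parser: it scans "<!ELEMENT name contentspec>" declarations with a regular expression and ignores comments, parameter entities and conditional sections. The class name MixedContentTable is hypothetical. It relies on the XML 1.0 rule that a mixed content model starts with "(#PCDATA", and it also treats ANY as allowing text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: element name -> "may contain character data?"
final class MixedContentTable {

    // Matches <!ELEMENT name contentspec>. Assumes no '>' inside the
    // content spec and no declarations hidden in comments or PE references.
    private static final Pattern ELEMENT_DECL =
        Pattern.compile("<!ELEMENT\\s+([^\\s>]+)\\s+([^>]*)>");

    static Map<String, Boolean> build(String dtdText) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        Matcher m = ELEMENT_DECL.matcher(dtdText);
        while (m.find()) {
            String name = m.group(1);
            String spec = m.group(2).trim();
            // XML 1.0: Mixed ::= '(' S? '#PCDATA' ... ; ANY also permits text.
            // EMPTY and element-only models such as (a, b)+ do not.
            boolean mixed = spec.equals("ANY")
                || spec.replaceAll("\\s", "").startsWith("(#PCDATA");
            result.put(name, mixed);
        }
        return result;
    }

    public static void main(String[] args) {
        String dtd = "<!ELEMENT no-mixed-content (mixed-content)+>\n"
                   + "<!ELEMENT mixed-content (#PCDATA | bold)*>\n"
                   + "<!ELEMENT bold (#PCDATA)>";
        // prints {no-mixed-content=false, mixed-content=true, bold=true}
        System.out.println(build(dtd));
    }
}
```

At a START_ELEMENT event the table would then be consulted with the element's qualified name; this only works as long as the documents keep using the same DTD.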

Thanks in advance,
Jack......
-----------------------------
Jack S. Rugh
Retrieval Systems Corporation
2071 Chain Bridge Road
Suite 510
Vienna, VA 22182
703-749-0012 ext. 335
http://retrievalsystems.com
-----------------------------

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email



Re: Testing for mixed content when using a DTD

by Cowtowncoder

On Mon, Apr 20, 2009 at 7:42 AM, Jack Rugh <jrugh@...> wrote:
> Hi Tatu:

Hi Jack!

> I'm using Woodstox-4.0.2, and am parsing documents using an XML DTD.  In
> the input event processing, when I have an
> "XMLStreamConstants.START_ELEMENT" event, I need to know whether the
> element allows mixed content.  For example <!ELEMENT no-mixed-content
> (mixed-content)+> which does not allow mixed content vs. <!ELEMENT
> mixed-content (#PCDATA)> or <!ELEMENT mixed-content (#PCDATA | bold)*>.

Ok, yes, makes sense.

> In the XMLValidator class I see static fields "CONTENT_ALLOW_WS".  Can it
> be used to determine that mixed content is not allowed in the element
> currently being processed?  If so, is there a practical way to get
> access to that value for the current element?  In browsing the user list,
> I do see a "25 April 2006" message from you that suggests there may be
> no easy way to get at the information unless Stax2 has been updated
> since then.
>
> If all else fails, I can build a list of elements for the specific DTD
> being used, but I hope that would not be necessary.

It has been a while since I went through that part of the code, so I may
be missing something, but I don't think the Stax2 API gives you enough
visibility.
For what it's worth, yes, those CONTENT_ALLOW_xxx constants from
XMLValidator are what you would need.
So one possibility would be to expose the currently used value for the
"legal content", either via the XMLValidator instance or via
XMLStreamReader2.
One potential issue is that there may be multiple validators, so the
reader would have to return an aggregated value. That is probably not a
problem for most cases.
Another problem is that currently there is no way (I think) to get
access to the validators being used by a reader. That is not a problem
if the caller has explicitly started validation, but otherwise it is.

However, there might be a simpler way. The Woodstox DTDSubset class
(which is an implementation of the Stax2 XMLValidationSchema) has a
method "getElementMap" (not part of the Stax2 interface, though). Its
keys are prefixed element names and its values are DTDElement instances.
DTDElement has a method 'getAllowedContent', which is what you would be
interested in.
You may need to read the DTD subset (schema) separately from parsing,
unless there's a way to find it.
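A sketch of that "simpler way", assuming Woodstox 4.x and the Stax2 API are on the classpath: the DTD is loaded on its own through the Stax2 XMLValidationSchemaFactory, cast to Woodstox's DTDSubset, and each DTDElement's getAllowedContent() value is collected. The class and method names are hypothetical. The cast relies on Woodstox internals that are not part of Stax2. Treating XMLValidator.CONTENT_ALLOW_ANY_TEXT as "mixed content allowed" is also an assumption that should be checked against the Woodstox version in use.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.stream.XMLStreamException;

import org.codehaus.stax2.validation.XMLValidationSchema;
import org.codehaus.stax2.validation.XMLValidationSchemaFactory;
import org.codehaus.stax2.validation.XMLValidator;

import com.ctc.wstx.dtd.DTDElement;
import com.ctc.wstx.dtd.DTDSubset;

// Hypothetical helper; depends on Woodstox internals (DTDSubset, DTDElement).
final class DtdMixedContentLookup {

    // Reads the DTD separately from document parsing and maps each
    // declared element name to DTDElement.getAllowedContent().
    static Map<String, Integer> allowedContentByElement(String dtdText)
            throws XMLStreamException {
        XMLValidationSchemaFactory factory =
            XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_DTD);
        XMLValidationSchema schema = factory.createSchema(new StringReader(dtdText));

        // Assumption: Woodstox's DTD schema factory returns a DTDSubset.
        DTDSubset subset = (DTDSubset) schema;

        // Typed as Map<?, ?> so this compiles whether getElementMap()
        // returns a raw or a generic map in the Woodstox version used.
        Map<?, ?> elements = subset.getElementMap();
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : elements.entrySet()) {
            DTDElement element = (DTDElement) entry.getValue();
            // Key is the (possibly prefixed) element name, e.g. "bold" or "ns:bold".
            result.put(entry.getKey().toString(), element.getAllowedContent());
        }
        return result;
    }

    // Assumption: mixed models such as (#PCDATA | bold)* report
    // CONTENT_ALLOW_ANY_TEXT, while element-only models do not.
    static boolean allowsMixedContent(int allowedContent) {
        return allowedContent == XMLValidator.CONTENT_ALLOW_ANY_TEXT;
    }

    public static void main(String[] args) throws XMLStreamException {
        String dtd = "<!ELEMENT no-mixed-content (mixed-content)+>\n"
                   + "<!ELEMENT mixed-content (#PCDATA | bold)*>\n"
                   + "<!ELEMENT bold (#PCDATA)>";
        for (Map.Entry<String, Integer> e : allowedContentByElement(dtd).entrySet()) {
            System.out.println(e.getKey() + " mixed=" + allowsMixedContent(e.getValue()));
        }
    }
}
```

Because the entries come from the element map's own iteration order, the printed order is not guaranteed. During parsing, the resulting map would be looked up by the element's prefixed name on each START_ELEMENT event.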

So that just might work for you -- not the cleanest way, but it's
there and usable.
Let me know if that'd work,

-+ Tatu +-




RE: Testing for mixed content when using a DTD

by Jack Rugh

Hi Tatu:

Thanks for the information.  I implemented your "simpler approach" and
it worked like a champ.

Jack.....

> -----Original Message-----
> From: Tatu Saloranta [mailto:tsaloranta@...]
> Sent: Monday, April 20, 2009 12:54 PM
> To: user@...
> Subject: Re: [woodstox-user] Testing for mixed content when using a DTD



Re: Testing for mixed content when using a DTD

by Cowtowncoder

On Tue, Apr 21, 2009 at 1:51 PM, Jack Rugh <jrugh@...> wrote:
> Hi Tatu:
>
> Thanks for the information.  I implemented your "simpler approach" and
> it worked like a champ.

Great!

-+ Tatu +-
