StAX from source

StAX from source

by Chris Faulkner

Hi

Having previously been a user of various SAX libraries, DOM (and JDOM), I was very happy to have found StAX and Woodstox! It feels a lot more natural to me, even after quite a long time with those other implementations.

Anyway, I am at the point where I need to strip out extraneous characters (linefeeds, long runs of repeated whitespace and tabs, etc.) from within some of my elements. I had hoped to understand how to do this (if possible) by downloading the source jars and beginning to debug in Eclipse by attaching the source in the usual way. I have been through some of the javadoc and documentation but I can't see how I might do this with the existing implementations.

1. Is there a way of doing what I need in terms of stripping those extraneous characters? For most of my elements, I am iterating through elements with nextTag() and getElementText / getElementAs... and I am not doing anything within the elements. Even if there is, I'd like to get my source build working - I may be able to provide help and input back to the project.

2. How do you build Woodstox AND the stax2-3.0-api.jar from source? I downloaded the Woodstox source and tried "ant dist -f build.xml" but I get errors because it tries to import some other build files (for OSGi) which aren't there.

3. I can't find all of the source for the stax2-3.0-api.jar - this contains classes in org.codehaus.stax2 namespace. The org.codehaus.stax2.ri code is in the woodstox source download but not the rest of the stuff in that jar (evt, io, etc).

Thanks all


Re: StAX from source

by Cowtowncoder

On Mon, Sep 21, 2009 at 3:58 AM, Chris Faulkner
<chris.faulkner@...> wrote:
> Hi
>
> Having previously been a  user of various SAX libraries, DOM (and JDOM), I
> was very happy to have found StAX and Woodstox ! It feels a lot more natural
> to me, even after quite a long time with those other implementations.

Thanks!

> Anyway, I am at the point where I have a need to strip out extraneous
> characters (linefeeds, long sections of repeating whitespace and tabs, etc)
> from within some of my elements. I had hoped to understand how do this (if

Makes sense, yes.

> possible) by downloading the source jars and begining to debug in Eclipse by
> attaching source in the normal way in there. I have been through some of the
> javadoc and documentation but I can't see how I might do this with the
> existing implementations.

There is no option that does this directly, unless you have a DTD that
defines which white space is "ignorable".
By default XML treats all textual content as significant, so white
space is reported as regular CHARACTERS.
However, a DTD can define a content model for an element that does not
include any #PCDATA; if so, any white space inside it must be
ignorable white space (for indentation), and it will be reported as
SPACE. So filtering these events out would be one way to achieve this
goal, iff there's a DTD to use. Or theoretically Schema/RNG; but I
don't remember whether that has been tested to work this way (both can
declare element-only content, but I am not sure whether that gets
properly propagated through the validation API).
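
A minimal sketch of that filtering idea, assuming a plain
javax.xml.stream reader and a document whose DTD declares element-only
content. The loop drops every event reported as SPACE and keeps the
remaining text. The class and method names are hypothetical, and
whether a given parser reports SPACE at all depends on its DTD support
being enabled.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Stax or Woodstox.
final class IgnorableSpaceSkipper {

    /** Concatenates all text in the document, skipping events reported as SPACE. */
    static String textWithoutIgnorableSpace(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            StringBuilder sb = new StringBuilder();
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.SPACE) {
                    continue; // ignorable white space, as declared by the DTD
                }
                if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    sb.append(r.getText());
                }
            }
            r.close();
            return sb.toString();
        } catch (XMLStreamException e) {
            // Wrapped only to keep the sketch short for callers.
            throw new IllegalStateException(e);
        }
    }
}
```

Without a DTD the indentation still comes through as CHARACTERS, so
this only helps for elements the DTD declares as element-only.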

> 1. Is there a way of doing what I need in terms of stripping those
> extraneous characters ?  For most of my elements, I am iterating through
> elements with nextTag() and getElementText / getElementAs... and I am not

One way to trim leading/trailing space would be to just call trim() on
the result of getElementText().
One limitation is that getElementText() only works for text-only
content: it throws an exception if it encounters a child element.
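
A minimal sketch of the trim() approach, assuming text-only elements
and the standard javax.xml.stream API. The extra replaceAll() that
collapses inner runs of white space is an addition for illustration,
not something the Stax API provides, and the class and method names
are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Stax or Woodstox.
final class ElementTextNormalizer {

    /** Trims both ends and collapses each run of white space into one space. */
    static String normalize(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }

    /** Returns the normalized text of the first child of the root element. */
    static String firstChildText(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            r.nextTag(); // move to the root START_ELEMENT
            r.nextTag(); // move to the first child START_ELEMENT
            // getElementText() fails if the element has child elements.
            String text = normalize(r.getElementText());
            r.close();
            return text;
        } catch (XMLStreamException e) {
            // Wrapped only to keep the sketch short for callers.
            throw new IllegalStateException(e);
        }
    }
}
```

The same normalize() call can be applied to each getElementText()
result inside an existing nextTag() loop.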

For what it's worth, I personally use StaxMate for much of my XML
processing; it builds on the Stax API and implements fully streaming
extensions that allow somewhat more convenient access. Features
like "advanced" white space processing would fit nicely within that
framework (no such functionality exists in StaxMate yet either, though).

> doing anything within the elements. Even if there is, I'd like to get my
> source build working - I may be able to provide help and input back to the
> project.

Yes, source build should work. And help is always appreciated! So:

> 2. How do you build woodstox AND the stax2-3.0-api.jar from source ? I
> downloaded the woodstox source and tried "ant dist -f build.xml" but I get
> errors because it tries to import some other build files (for osgi) which
> aren't there.
>
> 3. I can't find all of the source for the stax2-3.0-api.jar - this contains
> classes in org.codehaus.stax2 namespace. The org.codehaus.stax2.ri code is
> in the woodstox source download but not the rest of the stuff in that jar
> (evt, io, etc).

Is this from trunk or one of the branches? All sources should of course be
included, so there may be an error in the build setup, probably due to
the refactoring done to split the jars (core vs others). :-/

One known problem is that the OSGi build task is a bit tricky to get
working; I just couldn't get it working without adding the task jar to my
local Ant lib dir (so any help to make that work would be much
appreciated!). The jar in question is under lib/osgi/, and it is declared in
and invoked from build-osgi.xml.
I am not aware of other problems; but it is of course possible
something might be missing from source package. Perhaps you could
download sources from svn repo to see what is missing?

-+ Tatu +-
