analyzing XML structures

by Alan Baljeu

I'm currently interested in examining XML documents to discover and abstract the implicit structure of the document.  This means, in the absence of a schema, proposing a schema.  In the presence of a schema, developing rules that would abstract redundancies in the document, such as
   ("project/file/name" ends with  ".pl") <=> ("project/file/type" = "prolog")

Exact syntax is irrelevant.  I expect to design the inference algorithms myself.
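
A minimal sketch of how a rule like the one above could be checked against a parsed document. It assumes that name and type are attributes of file elements and that the document is already available as an element(Name, Attributes, Content) term. sample_project/1, file_attrs/3, violates/2 and rule_holds/1 are hypothetical names invented for this illustration.

```prolog
:- use_module(library(lists)).

% Hypothetical sample document in element(Name, Attributes, Content) form.
sample_project(element(project, [],
    [ element(file, [name='main.pl',   type=prolog], []),
      element(file, [name='util.pl',   type=prolog], []),
      element(file, [name='notes.txt', type=text],   [])
    ])).

% file_attrs(+Project, -Name, -Type): enumerate the project/file elements.
file_attrs(element(project, _, Content), Name, Type) :-
    member(element(file, Attrs, _), Content),
    memberchk(name=Name, Attrs),
    memberchk(type=Type, Attrs).

% True if the atom Name has the suffix '.pl'.
ends_with_pl(Name) :-
    sub_atom(Name, _, _, 0, '.pl').

% A file violates the rule when exactly one side of the <=> holds.
violates(Project, Name) :-
    file_attrs(Project, Name, Type),
    (   ends_with_pl(Name), Type \== prolog
    ;   \+ ends_with_pl(Name), Type == prolog
    ).

% The rule holds for a document if no file violates it.
rule_holds(Project) :-
    \+ violates(Project, _).
```

Running `?- sample_project(P), rule_holds(P).` tests the candidate rule on the sample; violates/2 enumerates the counterexamples when the rule fails.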

I'm now interested in which tools are most suitable for the tasks of analyzing XML, encoding knowledge about XML, and generating XML from such knowledge.  I think SWI would be a good tool for this, and I'm wondering which semantic web libraries are most helpful.  Do I want to use XML schemas, RDF, RDFS, OWL?  I'd appreciate input from those familiar with the uses of these languages, and with the tool support in SWI.


Alan Baljeu



Re: analyzing XML structures

by Jan Wielemaker-3

Hi Alan,

On Friday 25 September 2009 09:55:41 pm Alan Baljeu wrote:

> I'm currently interested in examining XML documents to discover and
> abstract the implicit structure of the document.  This means, in the
> absence of a schema, proposing a schema.  In the presence of a schema,
> developing rules that would abstract redundancies in the document, such as
> ("project/file/name" ends with  ".pl") <=> ("project/file/type" = "prolog")
>
>
> Exact syntax is irrelevant.  I expect to design the inference algorithms
> myself.
>
> I'm now interested in what tools are most suitable for the tasks of
> analyzing XML, encoding knowledge about XML, and generating XML from such
> knowledge.  I think SWI would be a good tool for this, and I'm thinking
> what semantic web libraries are most helpful.  Do I want to use XML
> schemas, RDF, RDFS, OWL?  I'd appreciate input from those familiar with the
> uses of these languages, and with the tool support in SWI.

I guess SWI-Prolog has about everything you need to start with. Note
that it can generate a DTD from a document that has none and that you
can query that.
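
As a rough illustration of proposing a schema for a document that has none, the sketch below summarises, for each element name, which attribute names and child element names occur. This is not the DTD facility mentioned above. It is plain Prolog over an element(Name, Attributes, Content) term, and elements_in/2 and element_summary/4 are hypothetical names invented for this illustration.

```prolog
:- use_module(library(lists)).

% elements_in(+Root, -Element): Root itself and every element nested in it.
% Character data in the content is skipped because it is not element/3.
elements_in(element(N, A, C), element(N, A, C)).
elements_in(element(_, _, Content), Element) :-
    member(Child, Content),
    elements_in(Child, Element).

% element_summary(+Root, ?Name, -AttrNames, -ChildNames):
% for each element name, the sorted sets of attribute names and
% child element names observed anywhere in the document.
element_summary(Root, Name, AttrNames, ChildNames) :-
    setof(N, A^C^elements_in(Root, element(N, A, C)), Names),
    member(Name, Names),
    findall(AN,
            ( elements_in(Root, element(Name, As, _)),
              member(AN=_, As)
            ),
            ANs),
    sort(ANs, AttrNames),
    findall(CN,
            ( elements_in(Root, element(Name, _, Cs)),
              member(element(CN, _, _), Cs)
            ),
            CNs),
    sort(CNs, ChildNames).
```

Backtracking over element_summary/4 yields one summary per element name, which could serve as the raw material for a proposed DTD or schema.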

How to deal with the more abstract things is more difficult. Roughly,
I'd say that XML is an attributed tree of sequential text fragments. If
the XML is really describing data, it is pretty normal to put values
into the content and although there is ordering there, it is often
irrelevant. That type of data is easily mapped to RDF, which is much
better suited for that purpose. If sequence is of importance, this is
less clear, though it might be possible to go for hybrid
representations.

        Success --- Jan

_______________________________________________
SWI-Prolog mailing list
SWI-Prolog@...
https://mailbox.iai.uni-bonn.de/mailman/listinfo.cgi/swi-prolog

Re: analyzing XML structures

by Alan Baljeu

>> I'm now interested in what tools are most suitable for the tasks of
>> analyzing XML, encoding knowledge about XML, and generating XML from such

>> knowledge.  I think SWI would be a good tool for this, and I'm thinking
>> what semantic web libraries are most helpful.  Do I want to use XML
>> schemas, RDF, RDFS, OWL?  I'd appreciate input from those familiar with the
>> uses of these languages, and with the tool support in SWI.
>
>I guess SWI-Prolog has about everything you need to start with. Note
>that it can generate a DTD from a document that has none and that you
>can query that.
>
>How to deal with the more abstract things is more difficult. Roughly,
>I'd say that XML is an attributed tree of sequential text fragments. If
>the XML is really describing data, it is pretty normal to put values
>into the content and although there is ordering there, it is often
>irrelevant. That type of data is easily mapped to RDF, which is much
>better suited for that purpose. If sequence is of importance, this is
>less clear, though it might be possible to go for hybrid
>representations.
>
>    Success --- Jan

I'm concurrently looking at 10-100 data-oriented documents, where everything is
in attributes.  Sequence is arbitrary.  The most productive way to analyze these
things seems to be to attach ids to all the elements, break everything up into
triples (attribute-name, attribute-value, elementId) and (element-name, elementId,
parentId), and then search for patterns in the data.  Other breakdowns might also
be useful.

Is that the form of mapping you meant?  Is there a standard mapping?
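
The decomposition described above could be sketched as follows, assuming the document is available as an element(Name, Attributes, Content) term. Elements are numbered depth-first, and the result is a list of elem(Name, Id, ParentId) and attr(AttrName, Value, Id) facts. dom_facts/2 and its helpers are hypothetical names, and the atom root stands in for the parent of the top element.

```prolog
% dom_facts(+Element, -Facts): number the elements depth-first and
% emit elem(Name, Id, ParentId) and attr(AttrName, Value, Id) facts.
dom_facts(Element, Facts) :-
    phrase(element_facts(Element, root, 1, _), Facts).

% element_facts(+Element, +ParentId, +Id, -NextFreeId)
element_facts(element(Name, Attrs, Content), Parent, Id, Next) -->
    [elem(Name, Id, Parent)],
    attr_facts(Attrs, Id),
    { Id1 is Id + 1 },
    content_facts(Content, Id, Id1, Next).

attr_facts([], _) --> [].
attr_facts([Name=Value|T], Id) -->
    [attr(Name, Value, Id)],
    attr_facts(T, Id).

content_facts([], _, Next, Next) --> [].
content_facts([element(N, A, C)|T], Parent, Id0, Next) --> !,
    element_facts(element(N, A, C), Parent, Id0, Id1),
    content_facts(T, Parent, Id1, Next).
content_facts([_Text|T], Parent, Id0, Next) -->   % skip character data
    content_facts(T, Parent, Id0, Next).
```

The resulting facts can be asserted or searched with member/2, for example to find which attribute names always occur together on the same element id.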

Alan



Re: analyzing XML structures

by Jan Wielemaker-3

On Monday 28 September 2009 02:50:42 pm Alan Baljeu wrote:

> >> I'm now interested in what tools are most suitable for the tasks of
> >>
> >> analyzing XML, encoding knowledge about XML, and generating XML from
> >> such knowledge.  I think SWI would be a good tool for this, and I'm
> >> thinking what semantic web libraries are most helpful.  Do I want to use
> >> XML schemas, RDF, RDFS, OWL?  I'd appreciate input from those familiar
> >> with the uses of these languages, and with the tool support in SWI.
> >
> >I guess SWI-Prolog has about everything you need to start with. Note
> >that it can generate a DTD from a document that has none and that you
> >can query that.
> >
> >How to deal with the more abstract things is more difficult. Roughly,
> >I'd say that XML is an attributed tree of sequential text fragments. If
> >the XML is really describing data, it is pretty normal to put values
> >into the content and although there is ordering there, it is often
> >irrelevant. That type of data is easily mapped to RDF, which is much
> >better suited for that purpose. If sequence is of importance, this is
> >less clear, though it might be possible to go for hybrid
> >representations.
> >
> >    Success --- Jan
>
> I'm concurrently looking at 10-100 data-oriented documents, where
> everything is in attributes.  Sequence is arbitrary.  The most productive
> way to analyze these things seems to be to
> attach ids to all the elements, and then break everything up into triples
> (attribute-name, attribute-value, elementId) and (element-name, elementId,
> parentId), and then search for patterns in the data.  Other breakdowns
> might also be useful.
>
> Is that the form of mapping you meant?  Is there a standard mapping?

Standard RDF/XML representation basically maps an element onto an RDF
blank-node, maps all attributes and sub-elements that contain only
a simple literal value into literal attributes (using the attribute-name
or element-name) and maps complex elements into a new blank-node.  So,
you map:

<e1 p1="v1">
  <p2>v2</p2>
  <p3 p4="v3">
    <p5>v4</p5>
  </p3>
</e1>

into the following Turtle representation:

[ p1 "v1" ;
  p2 "v2" ;
  p3 [ p4 "v3" ;
       p5 "v4"
     ]
] .

Now, RDF/XML is much more complicated, allowing you to name nodes, put
links to other URIs into values, type literals, etc. Still, the above is
the core idea of RDF/XML.

It should be easy to create this mapping without all the RDF stuff from
an XML document. You can add a bit of extras, such as generating an
explicit URI for XML elements that have an ID attribute and linking to
nodes that have an IDREF attribute.
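
A minimal sketch of the core mapping described above, assuming the XML is available as an element(Name, Attributes, Content) term. Each element becomes a node(Properties) term standing in for a blank node, attributes and text-only sub-elements become Name-literal(Value) pairs, and complex sub-elements become nested nodes. dom_node/2 and child_props/2 are hypothetical names, and ID/IDREF handling is left out.

```prolog
:- use_module(library(lists)).

% dom_node(+Element, -Node): Node = node(Props), where Props is a list
% of Property-Value pairs and Value is literal(Text) or a nested node.
dom_node(element(_, Attrs, Content), node(Props)) :-
    findall(N-literal(V), member(N=V, Attrs), AttrProps),
    child_props(Content, ChildProps),
    append(AttrProps, ChildProps, Props).

child_props([], []).
child_props([element(N, [], [Text])|T], [N-literal(Text)|Ps]) :-
    atomic(Text), !,                    % sub-element holding one literal
    child_props(T, Ps).
child_props([element(N, A, C)|T], [N-Node|Ps]) :- !,
    dom_node(element(N, A, C), Node),   % complex sub-element: new node
    child_props(T, Ps).
child_props([_Text|T], Ps) :-           % stray character data is ignored
    child_props(T, Ps).
```

As in the Turtle example, the name of the outermost element is dropped. A fuller mapping might record it as a type property on the node.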

        Cheers --- Jan




Re: analyzing XML structures

by Alan Baljeu

So are you suggesting that I call load_xml(File, Xml) and then write my own code to convert the XML to what I need, or are you suggesting there are appropriate Turtle or RDF methods I should use?

 
Alan Baljeu
P.S. I've been getting errors with your email address.






Re: analyzing XML structures

by Jan Wielemaker-3

On Mon, 2009-09-28 at 10:47 -0700, Alan Baljeu wrote:
> So are you suggesting to load_xml(File, Xml), and then write my own
> code to convert the XML to what I need, or are you suggesting there's
> some appropriate Turtle or RDF methods I should use?

Yip.  You have to make the conversion from the output of load_xml/2 to
this RDF notation yourself.  That shouldn't be too hard.  This limits
you to files that fit onto the stacks.  If that is a problem, you'll
have to use the callback (event) API, but this complicates the code
significantly.  Of course, a 64-bit machine with plenty of memory generally
fixes the problem too :-)
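
A minimal sketch of the loading step, using load_structure/3 from library(sgml) with the XML dialect. The load_xml/2 call named in this thread is assumed to sit on top of the same parser. xml_string_dom/2, xml_file_dom/2 and root_element/2 are hypothetical helper names. The space(remove) option is chosen here on the assumption that the documents are data-oriented, so whitespace-only text between elements is not wanted.

```prolog
:- use_module(library(sgml)).
:- use_module(library(lists)).

% xml_file_dom(+File, -DOM): parse a whole file into memory.
xml_file_dom(File, DOM) :-
    load_structure(File, DOM, [dialect(xml), space(remove)]).

% xml_string_dom(+String, -DOM): same, for XML text held in a string.
xml_string_dom(String, DOM) :-
    setup_call_cleanup(
        open_string(String, In),
        load_structure(stream(In), DOM, [dialect(xml), space(remove)]),
        close(In)).

% root_element(+DOM, -Element): DOM is a list; pick its element/3 term.
root_element(DOM, element(Name, Attrs, Content)) :-
    memberchk(element(Name, Attrs, Content), DOM).
```

The element(Name, Attributes, Content) term returned by root_element/2 is the starting point for any hand-written conversion. Because the whole document is held as one term, this approach is bounded by available memory, as noted above.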

    --- Jan

> P.S. I've been getting errors with your email address.

I'm not aware of that.  I surely get enough mail :-)  Could you send
details (to one of my other addresses)?
