analyzing XML structures

I'm currently interested in examining XML documents to discover and abstract the implicit structure of the document. This means, in the absence of a schema, proposing a schema. In the presence of a schema, it means developing rules that would abstract redundancies in the document, such as

    ("project/file/name" ends with ".pl") <=> ("project/file/type" = "prolog")

Exact syntax is irrelevant. I expect to design the inference algorithms myself.

I'm now interested in what tools are most suitable for the tasks of analyzing XML, encoding knowledge about XML, and generating XML from such knowledge. I think SWI would be a good tool for this, and I'm wondering which semantic web libraries are most helpful. Do I want to use XML schemas, RDF, RDFS, OWL? I'd appreciate input from those familiar with the uses of these languages, and with the tool support in SWI.
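A minimal sketch of how a candidate redundancy rule like the one above could be checked against a document. It is written in Python purely for illustration, not in SWI-Prolog; the sample document, the assumption that `name` and `type` are child elements of `file`, and the function name `rule_holds` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real documents' layout is not given in the thread.
SAMPLE = """
<project>
  <file><name>main.pl</name><type>prolog</type></file>
  <file><name>util.pl</name><type>prolog</type></file>
  <file><name>notes.txt</name><type>text</type></file>
</project>
"""

def rule_holds(project):
    """Check the biconditional for every project/file element:
    name ends with '.pl'  <=>  type equals 'prolog'."""
    for f in project.findall("file"):
        name = f.findtext("name", default="")
        ftype = f.findtext("type", default="")
        # A biconditional fails exactly when the two sides disagree.
        if name.endswith(".pl") != (ftype == "prolog"):
            return False
    return True

root = ET.fromstring(SAMPLE)
print(rule_holds(root))
```

A rule that holds across all sample documents is only a candidate: it says the data does not contradict the rule, not that the rule is intended by whoever produced the data.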
|
|
Re: analyzing XML structures

Hi Alan,

On Friday 25 September 2009 09:55:41 pm Alan Baljeu wrote:
> I'm currently interested in examining XML documents to discover and
> abstract the implicit structure of the document. This means, in the
> absence of a schema, proposing a schema. In the presence of a schema,
> developing rules that would abstract redundancies in the document, such as
> ("project/file/name" ends with ".pl") <=> ("project/file/type" = "prolog")
>
> Exact syntax is irrelevant. I expect to design the inference algorithms
> myself.
>
> I'm now interested in what tools are most suitable for the tasks of
> analyzing XML, encoding knowledge about XML, and generating XML from such
> knowledge. I think SWI would be a good tool for this, and I'm thinking
> what semantic web libraries are most helpful. Do I want to use XML
> schemas, RDF, RDFS, OWL? I'd appreciate input from those familiar with the
> uses of these languages, and with the tool support in SWI.

I guess SWI-Prolog has about everything you need to start with. Note that it can generate a DTD from a document that has none and that you can query that.

How to deal with the more abstract things is more difficult. Roughly, I'd say that XML is an attributed tree of sequential text fragments. If the XML is really describing data, it is pretty normal to put values into the content and although there is ordering there, it is often irrelevant. That type of data is easily mapped to RDF, which is much better suited for that purpose. If sequence is of importance, this is less clear, though it might be possible to go for hybrid representations.

Success --- Jan

_______________________________________________
SWI-Prolog mailing list
SWI-Prolog@...
https://mailbox.iai.uni-bonn.de/mailman/listinfo.cgi/swi-prolog
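To make "proposing a schema" concrete, here is a rough sketch that records, for each element name, which attributes and which child elements were observed. This is not SWI-Prolog's DTD generation and says nothing about ordering or cardinality; it is a Python illustration, and the sample document and the name `summarize` are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def summarize(root):
    """Map each element name to (attribute names, child element names)
    observed anywhere in the tree, both as sorted lists."""
    attrs = defaultdict(set)
    children = defaultdict(set)
    for elem in root.iter():
        attrs[elem.tag].update(elem.attrib)            # attribute names only
        children[elem.tag].update(c.tag for c in elem)  # direct children only
    return {tag: (sorted(attrs[tag]), sorted(children[tag])) for tag in attrs}

# Hypothetical attribute-oriented sample document.
doc = ET.fromstring(
    '<project name="demo">'
    '<file name="a.pl" type="prolog"/>'
    '<file name="b.txt"/>'
    '</project>'
)
for tag, (attr_names, child_names) in summarize(doc).items():
    print(tag, attr_names, child_names)
```

Comparing such summaries across many documents shows which attributes are always present and which are optional, which is a reasonable starting point for a proposed schema.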
|
|
Re: analyzing XML structures

>I guess SWI-Prolog has about everything you need to start with. Note
>that it can generate a DTD from a document that has none and that you
>can query that.
>
>How to deal with the more abstract things is more difficult. Roughly,
>I'd say that XML is an attributed tree of sequential text fragments. If
>the XML is really describing data, it is pretty normal to put values
>into the content and although there is ordering there, it is often
>irrelevant. That type of data is easily mapped to RDF, which is much
>better suited for that purpose. If sequence is of importance, this is
>less clear, though it might be possible to go for hybrid
>representations.
>
> Success --- Jan

I'm concurrently looking at 10-100 data-oriented documents, where everything is in attributes. Sequence is arbitrary. The most productive way to analyze these things seems to be to attach ids to all the elements, break everything up into triples of the form (attribute-name, attribute-value, elementId) and (element-name, elementId, parentId), and then search for patterns in the data. Other breakdowns might also be useful.

Is that the form of mapping you meant? Is there a standard mapping?

Alan
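The breakdown described above can be sketched as follows, using the two tuple layouts from the post. It is a Python illustration rather than SWI-Prolog code; the integer ids, the sample document, and the name `flatten` are assumptions.

```python
import xml.etree.ElementTree as ET
from itertools import count

def flatten(root):
    """Return (elements, attributes) where
    elements   holds (element-name, elementId, parentId) tuples and
    attributes holds (attribute-name, attribute-value, elementId) tuples."""
    ids = count(1)          # fresh integer id per element (an assumption)
    elements, attributes = [], []

    def walk(elem, parent_id):
        elem_id = next(ids)
        elements.append((elem.tag, elem_id, parent_id))
        for name, value in elem.attrib.items():
            attributes.append((name, value, elem_id))
        for child in elem:
            walk(child, elem_id)

    walk(root, None)        # the root element has no parent
    return elements, attributes

# Hypothetical attribute-oriented sample document.
doc = ET.fromstring(
    '<project name="demo">'
    '<file name="a.pl" type="prolog"/>'
    '<file name="b.txt" type="text"/>'
    '</project>'
)
elements, attributes = flatten(doc)
print(elements)
print(attributes)
```

Once the data is in this flat form, pattern search becomes a matter of joining the two tables on the element id, which is also how the same facts would be queried if they were asserted as Prolog clauses.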
|
|
Re: analyzing XML structures

On Monday 28 September 2009 02:50:42 pm Alan Baljeu wrote:
> I'm concurrently looking at 10-100 data-oriented documents, where
> everything is in attributes. Sequence is arbitrary. The most productive
> way to analyze these things seems to be to attach ids to all the
> elements, and then break everything up into triples (attribute-name,
> attribute-value, elementId) and (element-name, elementId, parentId), and
> then search for patterns in the data. Other breakdowns might also be
> useful.
>
> Is that the form of mapping you meant? Is there a standard mapping?

Standard RDF/XML representation basically maps an element onto an RDF blank-node, maps all attributes and sub-elements that contain only a simple literal value into literal attributes (using the attribute-name or element-name) and maps complex elements into a new blank-node.

So, you map:

    <e1 p1="v1">
      <p2>v2</p2>
      <p3 p4="v3">
        <p5>v4</p5>
      </p3>
    </e1>

into the following Turtle representation:

    [ p1 "v1" ;
      p2 "v2" ;
      p3 [ p4 "v3" ;
           p5 "v4" ]
    ] .

Now, RDF/XML is much more complicated, allowing you to name nodes, put links to other URIs into values, type literals, etc. Still, the above is the core idea of RDF/XML. It should be easy to create this mapping without all the RDF stuff from an XML document. You can add a bit of extras, such as generating an explicit URI for XML elements that have an ID attribute and linking to nodes that have an IDREF attribute.

Cheers --- Jan
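The core mapping described in this post can be sketched in a few lines. This is a Python illustration, not an SWI-Prolog or RDF library API; the name `to_blank_node` is hypothetical. Like the post's example it uses bare property names (real Turtle needs prefixed names or IRIs), and it neither escapes quotes in values nor handles text mixed into complex elements.

```python
import xml.etree.ElementTree as ET

def to_blank_node(elem):
    """Render an element as a Turtle-style blank node:
    attributes and simple sub-elements become literal properties,
    complex sub-elements become nested blank nodes."""
    parts = [f'{name} "{value}"' for name, value in elem.attrib.items()]
    for child in elem:
        if len(child) == 0 and not child.attrib:
            # Simple element: only text content, so emit a literal.
            text = (child.text or "").strip()
            parts.append(f'{child.tag} "{text}"')
        else:
            # Complex element: recurse into a new blank node.
            parts.append(f"{child.tag} {to_blank_node(child)}")
    # Predicate-object pairs of one subject are separated by ';'.
    return "[ " + " ; ".join(parts) + " ]"

doc = ET.fromstring('<e1 p1="v1"><p2>v2</p2><p3 p4="v3"><p5>v4</p5></p3></e1>')
print(to_blank_node(doc) + " .")
# prints: [ p1 "v1" ; p2 "v2" ; p3 [ p4 "v3" ; p5 "v4" ] ] .
```

The ID/IDREF extension mentioned in the post would replace the anonymous `[ ... ]` with a named node for elements carrying an ID attribute; that part is left out here.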
|
|
Re: analyzing XML structures

So are you suggesting that I call load_xml(File, Xml) and then write my own code to convert the XML to what I need, or are you suggesting there are some appropriate Turtle or RDF methods I should use?

P.S. I've been getting errors with your email address.
|
|
Re: analyzing XML structures

On Mon, 2009-09-28 at 10:47 -0700, Alan Baljeu wrote:
> So are you suggesting to load_xml(File, Xml), and then write my own
> code to convert the XML to what I need, or are you suggesting there's
> some appropriate Turtle or RDF methods I should use?

Yip. You have to make the conversion from the output of load_xml/2 to this RDF notation yourself. That shouldn't be too hard. This limits you to files that fit onto the stacks. If that is a problem, you'll have to use the callback (event) API, but this complicates the code significantly. Of course, a 64-bit machine with plenty of memory generally fixes the problem too :-)

--- Jan

> P.S. I've been getting errors with your email address.

I'm not aware of that. I surely get enough mail :-) Could you send details (to one of my other addresses)?
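For a sense of what event-style processing looks like compared with loading the whole tree, here is a sketch using Python's `xml.etree.ElementTree.iterparse`. It is an analogy only and does not show SWI-Prolog's callback API; the sample data and the name `count_attributes` are assumptions.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

def count_attributes(source):
    """Stream through an XML source and count how often each
    (element-name, attribute-name) pair occurs."""
    counts = Counter()
    # An "end" event fires once an element and all its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        for name in elem.attrib:
            counts[(elem.tag, name)] += 1
        # Drop the finished element's children, text and attributes.
        # The parent still keeps a reference to the emptied element.
        elem.clear()
    return counts

# Hypothetical sample; a real use would pass a file name or open file.
data = io.BytesIO(
    b'<project name="demo">'
    b'<file name="a.pl" type="prolog"/>'
    b'<file name="b.txt"/>'
    b'</project>'
)
counts = count_attributes(data)
for key, n in sorted(counts.items()):
    print(key, n)
```

The extra complexity mentioned in the post shows up as soon as the analysis needs context: with events, any parent or sibling information has to be tracked by hand instead of being read off the tree.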