Introducing myself - SOA organised with RDF

Hello

My name is Frank Carvalho, and this is my first post to this forum. I am joining the forum to discuss the use of semantic web technologies in my organisation with other people, since there seem to be very few people actually involved with this around here.

I am a computer scientist, employed by the Danish government in the Central Customs and Tax Administration. We are reengineering our numerous systems to work in a SOA architecture - a considerable task that will take years and years, as we have several hundred systems, maintained by a number of suppliers and developed layer upon layer during the past 37 years. Needless to say, this legacy has turned into a maintenance nightmare of point-to-point wiring of heterogeneous systems. So something had to be done, and the government decided to implement a SOA architecture and reengineer the systems to connect through a service bus, using web services, etc.

It was clear to me from the beginning that a SOA will soon turn into another tower of Babel, unless there's a clear strategy to normalize the contents flowing on the service bus, and to address the issues of versioning and development in knowledge. Therefore I started a parallel activity to organise new in-house development projects and the information they produce, so that a canonical ontology could be developed for the service bus.

I found that RDF and to some extent OWL seemed the most promising technologies to back this effort up, for a number of reasons. First of all, I found its simple and powerful structure an ideal model to describe the numerous modelling techniques we use - UML, BPMN, Rules, WSDL and XSD generation - in a uniform manner, so that information may be combined across the different techniques. Second, we are facing a challenge of controlling our suppliers, rather than being controlled by them. This requires knowledge about the solutions. RDF also seems to be an ideal model for describing the suppliers' source code and documentation, and combining it with our ontologies. The combination will enable us to construct impact analyses that show how changes to our models and ontologies will affect the actual systems and source code. This is the idea at least.

So far we have built an information base with something like 50000 objects defined, combining modelling from six actual projects into one large information base of RDF/XML. To handle an information base of this size, and to make the information available to the organization, I decided to go along with the open source XML database eXist. (If anybody has any practical experience of combining eXist with RDF, I would be interested to know.)

With eXist I have built XQueries to list information of specific interest, and others to enable browsing through the RDF graph. I have also built an XQL-query to make forward chaining of the graph. Performance seems to be an issue. If anybody knows how to tune XQuery and eXist, I would be grateful. I have tried to use CWM, but it seems to crash when I use large graphs. I have also made a simple gawk script that can do both forward chaining and backward chaining very efficiently.

But to cut the story short, I have a lot of activity going with RDF, but I am very alone here in my organization, so I hope to make new friends here with whom I can share experience.

Frank Carvalho
Central Customs and Tax Administration
Denmark
e-mail (work): frank.carvalho@skat.dk
|
|
Re: Introducing myself - SOA organised with RDF

Hi Frank,

Thanks for such an interesting introduction!

> With eXist I have built XQueries to list information of specific
> interest, and others to enable browsing through the RDF graph. I have
> also built an XQL-query to make forward chaining of the graph.
> Performance seems to be an issue. If anybody knows how to tune XQuery
> and eXist, I would be grateful.

I would very much suggest using a dedicated RDF store (any one would do), rather than storing the XML serialization of the RDF graph in an XML database. You will gain the ability to run queries against the graph, rather than just one of its possible tree serializations, and your scalability problem goes away (for a while, at least).

> I have tried to use CWM, but it seems to crash when I use large
> graphs. I have also made a simple gawk-script that can actually both
> make forward-chaining and backward-chaining very efficiently.

cwm is not really designed for large-scale storage. Take a look at this list of alternative systems on the ESW Wiki:

<http://esw.w3.org/topic/SemanticWebTools#head-805c63479c854babe4657d5184de605910f6d3e2>

If you're dealing with large graphs (>100M triples), you might find this list useful.

<http://esw.w3.org/topic/LargeTripleStores>

If you need to do reasoning on large graphs, your choices are more limited, and the kind of reasoning you want to use might dictate your solution. (I won't reveal any biases on a public forum :D)

-R
|
|
RE: Introducing myself - SOA organised with RDF

Hi Frank,

My name is Brian McBride and I work in the Semantic Web group at HPLabs in Bristol, UK. We have been working on Semantic Web technology since around 2000 and I have a particular interest in its application to IT systems inside enterprises, a class that includes government organizations. I'm writing because we seem to have common interests and views.

[...]

> It was clear to me from the beginning that a SOA soon will
> turn into another tower of babel, unless there's a clear
> strategy to normalize the contents flowing on the service
> bus, and to address the issues of versioning and development
> in knowledge.

That is my view too - though I don't have a lot of evidence I can point to in support of it. This is a great opportunity for Semantic Web technology.

> Therefore I started a parallel activity to organise new
> in-house development projects and the information they
> produce, so that a canonical ontology could be developed for
> the service bus. I found that RDF and to some extent OWL
> seemed the most promising technologies to back this effort
> up, for a number of reasons. First of all I found its simple
> and powerful structure an ideal model to describe the
> numerous modelling techniques we use - UML, BPMN, Rules, WSDL
> and XSD generation - in a uniform manner, so that information
> may be combined across the different techniques.

Just so.

> Second we are facing a challenge of controlling our
> suppliers, rather than being controlled by them.

I'm wondering what you mean by control there. It is well known that if a customer invests heavily in implementing systems that depend on the characteristics of system components, e.g. using proprietary data formats or APIs, then this creates a barrier to changing suppliers. I was expecting you to write that because RDF is based on standards, it would be in customers' interests to promote its use to give them the flexibility to change supplier. But that's not what you wrote ...

> This
> requires knowledge about the solutions. RDF also seems to be
> an ideal model for describing the suppliers source code and
> documentation, and combining it with our ontologies. The
> combination will enable us to construct impact analysis that
> will show how changes to our models and ontologies will have
> an impact on the actual systems and source code. This is the
> idea at least.

Ah right. I think there are a number of existing solutions that do this - though not using RDF - e.g. IBM's metadata server. Have you looked at that? Is there something missing from that solution that RDF would address?

> So far we have built an information base that has something
> like 50000 objects defined, or something of that size,
> combining modelling from six actual projects into one large
> information base of RDF/XML. To handle an information base of
> this size, and to enable the information for the
> organization, I decided to go along with the open source XML
> database eXist.
> (If anybody has any practical experience of combining eXist
> with RDF, I would be interested to know).

It is important to bear in mind that it's best to think of RDF in terms of its abstract syntax, i.e. a graph of nodes, rather than the RDF/XML concrete syntax. There are a number of systems around that will store significant numbers of RDF triples in a relational store. We do one, Jena (http://jena.sourceforge.net), and there are others - Sesame, Mulgara, Redland, etc. I'd strongly suggest you take a look at these, or, if you really feel an XML database is the way to go - I'd like to understand why.

> With eXist I have built XQueries to list information of
> specific interest, and others to enable browsing through the
> RDF graph. I have also built an XQL-query to make forward
> chaining of the graph. Performance seems to be an issue. If
> anybody knows how to tune XQuery and eXist, I would be grateful.

An issue with using XML is that the same RDF graph can be represented in many different ways in RDF/XML. This would make your queries dependent on the particular way that an RDF/XML document happened to represent a graph - and that's just - well - wrong - you would be programming to an inappropriate level of abstraction.

> I have tried to use CWM, but it seems to crash when I use
> large graphs. I have also made a simple gawk-script that can
> actually both make forward-chaining and backward-chaining
> very efficiently.

CWM is more generally used for its powerful rule capabilities on relatively small datasets. Jena also has rules - but they only really work on smallish in-memory graphs - they are too slow over large datasets at present.

> But to cut the story short, I have a lot of activity going
> with RDF, but I am very alone here in my organization, so I
> hope to make new friends here with whom I can share experience.

I'd be very interested in talking with you; I'm happy to share our experience with you and am hoping to learn more about your applications and requirements to aid in our development efforts.

Brian
|
|
Re: Introducing myself - SOA organised with RDF

Hello Frank,

My name is Steve Sears, I'm the product manager for AllegroGraph at Franz Inc. in California.

> But to cut the story short, I have a lot of activity going with RDF,
> but I am very alone here in my organization, so I hope to make new
> friends here with whom I can share experience.

I noticed replies to your introduction were mentioning native triple stores. I'd like to pass on information to you about AllegroGraph, a modern persistent, disk-based graph database with support for high-performance, scalable RDF triple stores. AllegroGraph supports various query mechanisms, including SPARQL and RDFS++, which is a high-performance, practical subset of OWL. We've worked with Racer Systems to integrate their full OWL description logic, RacerPro, and with TopQuadrant to integrate with their TopBraid Composer Semantic Application platform. There are evaluation versions you can download and explore on our website, http://www.franz.com/products/

We are currently working with large customers mostly in the United States and Asia, e.g. NASA, Boeing, Raytheon, BAE Systems, NSA, Kodak, GSK, the Malaysian Government, etc. We do know a consultant working in your time zone with knowledge of our products and with customs domain experience. I would be happy to make an introduction if you have interest.

Best regards,
Steve
|
|
Re: Introducing myself - SOA organised with RDF

Hi, and thanks already for some very good and relevant answers.

Richard Newman wrote:

> I would very much suggest using a dedicated RDF store (any one
> would do), rather than storing the XML serialization of the RDF graph
> in an XML database. You will gain the ability to run queries against
> the graph, rather than just one of its possible tree serializations,
> and your scalability problem goes away (for a while, at least).

Well, I don't really understand whether there is any theoretical difference between querying the XML serialization and the graph itself, if the serialization is in fact a representation of the graph. What do you mean when you say "tree serialization", BTW? The only serialization I work with is a large set of triples. I do reckon though that a dedicated store of course may be a lot more efficient than a general purpose XML database.

> cwm is not really designed for large-scale storage.

No, I kind of suspected that from its behaviour. It's really a shame. It was described as a sort of RDF Swiss army knife, and on small graphs it seems to be able to merge graphs nicely. But when I started to load large graphs, it came up with odd errors.

> Take a look at this list of alternative systems on the ESW Wiki:
> <http://esw.w3.org/topic/SemanticWebTools#head-805c63479c854babe4657d5184de605910f6d3e2>
>
> If you're dealing with large graphs (>100M triples), you might find
> this list useful.
>
> <http://esw.w3.org/topic/LargeTripleStores>

Very helpful, thank you. I will take a look at those. Eventually I suspect we will be using very large graphs. The current ones are perhaps up to 20M, but given all the tasks we plan on using the graphs for, we are likely to increase this number significantly.

> If you need to do reasoning on large graphs, your choices are more
> limited, and the kind of reasoning you want to use might dictate your
> solution. (I won't reveal any biases on a public forum :D)

In fact we don't need reasoning so much yet. It is the "resource description" aspect that currently has the biggest importance for us.
We need to be able to do a lot of forward and backward chaining, but if I am not mistaken that really is not the same as reasoning. I do expect to assign some proper OWL interpretations to the UML class diagrams - and probably the contents of the entire modelling tool we're using - some day, but as I don't really see how I can explain any specific benefits to my organization by doing so, that idea has a low priority right now. (I drive this project by visible benefits.)

Brian McBride also wrote, and thank you also, Brian, for your inspiring answers. Comments to the comments:

>> Second we are facing a challenge of controlling our
>> suppliers, rather than being controlled by them.
>
> I'm wondering what you mean by control there. It is well known that if
> a customer invests heavily in implementing systems that depend on the
> characteristics of system components, e.g. using proprietary data
> formats or APIs, then this creates a barrier to changing suppliers. I
> was expecting you to write that because RDF is based on standards, it
> would be in customer's interests to promote its use to give them the
> flexibility to change supplier. But that's not what you wrote ...

What I meant was that the cooperation between us as customers and our suppliers has traditionally been on the terms of the suppliers. Our organization has a lot of business knowledge, but very little professional IT experience. So historically the suppliers have had success convincing the organisation to buy suboptimal solutions at too high a price. Our department is there to change that, and to professionalize us as customers. "Control" was perhaps a bad word. "In charge" would have been better.

Technically what we do is establish well-defined web services between the many systems we have and our SOA infrastructure. We have no intention of dictating the internal designs of the systems - some of them are old COBOL systems anyway.
In a SOA, the systems are characterized entirely by their interfaces - as a black box. So we only dictate the interfaces and leave the internal system design to the suppliers (roughly speaking). We rather want to collect the documentation of those (heterogeneous) solutions, connect the documentation to the main graphs (by generating more bits of RDF from the documentation), and thus enable impact analysis into the system. The purpose is, of course, to be able to assess the extent and cost of changes, by analysing the amount of derived change they may require.

> Ah right. I think there are number of existing solutions that do this -
> though not using RDF - e.g. IBM's metadata server. Have you looked at
> that. Is there something missing from that solution that RDF would
> address?

We are frequently contacted by vendors of metadata systems, and I am not surprised that IBM also has such a product. We are using Telelogic System Architect here. Also, our experiences with suppliers of metadata repositories have been very bad so far. However, my main concern is to avoid vendor lock-in and proprietary internal formats. To do so I believe it is paramount to use open portable standards to carry the meta-information. This is where RDF comes in. RDF is easier to migrate between platforms, using the same core graphs. And it will be much easier to integrate different sources of metadata, without proprietary point-to-point system integrations.

Currently we have a big issue here about carrying information between Systinet Information Manager and Telelogic System Architect. A direct integration may prove costly. But if both tools had an import/export facility for RDF, they could at least add useful information to the same pool of metadata. I am sure I could extract all essential information from both tools entirely into RDF and make it useful in other tools.
It will not solve all problems, as that information still has to be interpreted to be useful, but even so, if two different tools share nodes, their graphs will be able to connect, and new information can be extracted. I think it is a big step in the right direction.

> It is important to bear in mind that it's best to think of RDF in terms
> of its abstract syntax, i.e. a graph of nodes, rather than the RDF/XML
> concrete syntax.

Well, this is also how I think of it.

> There are a number of systems around that will store
> significant numbers of RDF triples in a relational store. We do one,
> Jena (http://jena.sourceforge.net) and there are others - Sesame,
> Mulgara, Redland, etc. I'd strongly suggest you take a look at these,
> or, if you really feel an XML database is the way to go - I'd like to
> understand why.
>
> An issue with using XML is that that same RDF graph can be represented
> many different ways in RDF/XML. This would make your queries dependent
> on the particular way that an RDF/XML document happened to represent a
> graph - and that's just - well - wrong - you would be programming to an
> inappropriate level of abstraction.

Yes, this is my experience too. It took me some time to understand the different weird RDF/XML notations I found in the W3C specification, until I started to see them as "syntactic sugar", which in turn means that each block of RDF/XML can be broken down into a number of simple triples. After realising that, I started to ignore the more "user-friendly" syntaxes of the W3C spec and stick to the simplest form. In fact I always reduce the graphs to simple form before I load them into the database. I looked at cwm mainly to see if it could work as a tool to break the graphs down into triples. My first attempts to break down compound expressions into triples with XQuery were not successful, so currently I'm doing it externally before I ever load the data into the store.
The RDF I generate myself is always RDF/XML in its simplest form - as triples - and I use the XML database on the assumption that I deal with triples exclusively. This makes it much easier to build sensible database indexes in the XML database, where you index the node ids. Performance is not spectacular, but is currently at an acceptable enough level to be useful.

I use an XML database (eXist) mainly because I have a long history with XML, XSD, XSL, and now XQuery, so I can use knowledge I already have. Also because of the portable nature of XML, XSL and XQuery, and the numerous products supporting XML (the vendor lock-in issue), and also because I like how the database integrates with web browsers, and is easy to load and maintain, etc. In any case, as long as the core data are triples, I think that a move from RDF/XML to a dedicated RDF store can be done at any time, should it be necessary for performance reasons.

> I'd be very interested in talking with you; I'm happy to share our
> experience with you and am hoping to learn more about your applications
> and requirements to aid in our development efforts.

Well, I also hope we can continue this discussion. I have already gotten some useful links.

Best
Frank Carvalho
|
|
Re: Introducing myself - SOA organised with RDF

> What do you mean when you say "tree serialization", BTW? The only
> serialization I work with is a large set of triples.

(I see from other email that you already normalize your data, but I already had this written.)

There are multiple possible serializations in RDF/XML of an RDF graph. (In fact, some RDF graphs can't be serialized in RDF/XML.) For example, the triples

    ex:foo ex:bar "baz" .
    ex:foo rdf:type ex:Thing .

can be serialized as

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:ex="http://ex/">
      <ex:Thing rdf:about="http://ex/foo" ex:bar="baz"/>
    </rdf:RDF>

or as

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:ex="http://ex/">
      <rdf:Description rdf:about="http://ex/foo">
        <ex:bar>baz</ex:bar>
        <rdf:type rdf:resource="http://ex/Thing"/>
      </rdf:Description>
    </rdf:RDF>

... or in a few other ways. Now, what XSLT will you use to get the type of ex:foo, or the value of ex:bar?

Using XML tools on RDF/XML will require you to either normalize your representation, or write convoluted queries to handle RDF/XML's many possible appearances. If you're normalizing down into triple-level chunks, why not use a real triple store?

>> cwm is not really designed for large-scale storage.
>
> No, I kind of suspected that from it's behaviour. It's really a
> shame. It was descibed as a sort of RDF swiss army knife, and on small
> graphs it seems to be able to merge graphs nicely. But when I started
> to load large graphs, it came up with odd errors.

Yup, Swiss Army knife, not chainsaw :D

-R
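To make the point about multiple serializations concrete, here is a minimal Python sketch (standard library only) that reduces the two forms from the message above to the same set of triples. It is an illustration, not a general RDF/XML parser: the `triples` and `uri` helpers are hypothetical names, the typed-node form is assumed to name its subject with `rdf:about`, both documents are assumed to declare the `ex` prefix, and literals are not distinguished from URIs.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Typed-node form: the element name carries the rdf:type,
# and the property is written as an attribute.
TYPED_NODE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://ex/">
  <ex:Thing rdf:about="http://ex/foo" ex:bar="baz"/>
</rdf:RDF>
"""

# Plain form: one rdf:Description with one child element per triple.
DESCRIPTION = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://ex/">
  <rdf:Description rdf:about="http://ex/foo">
    <ex:bar>baz</ex:bar>
    <rdf:type rdf:resource="http://ex/Thing"/>
  </rdf:Description>
</rdf:RDF>
"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' name into a plain URI."""
    return tag[1:].replace("}", "", 1)


def triples(rdfxml):
    """Reduce the two forms above to (subject, predicate, object) tuples."""
    found = set()
    for node in ET.fromstring(rdfxml):
        subject = node.get(f"{{{RDF}}}about")
        # A typed node element is shorthand for an rdf:type triple.
        if node.tag != f"{{{RDF}}}Description":
            found.add((subject, RDF + "type", uri(node.tag)))
        # Non-RDF attributes are shorthand for literal-valued triples.
        for name, value in node.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):
                found.add((subject, uri(name), value))
        # Child elements hold either a resource reference or a literal.
        for prop in node:
            obj = prop.get(f"{{{RDF}}}resource")
            found.add((subject, uri(prop.tag),
                       obj if obj is not None else (prop.text or "")))
    return found


print(triples(TYPED_NODE) == triples(DESCRIPTION))
```

Even for two forms the function needs three separate cases, which is the kind of special-casing a query against the XML tree would have to repeat; a query against the triples themselves would not.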
|
|
Re: Introducing myself - SOA organised with RDF

McBride, Brian wrote:

> Hi Frank,
>
> My name is Brian McBride and I work in the Semantic Web group at HPLabs
> in Bristol UK. We have been working on Semantic Web technology since
> around 2000 and I have a particular interest in application to IT
> systems inside enterprises, a class that includes government
> organizations. I'm writing because we seem to have a common interest
> and views.
>
> [...]
>
>> It was clear to me from the beginning that a SOA soon will
>> turn into another tower of babel, unless there's a clear
>> strategy to normalize the contents flowing on the service
>> bus, and to address the issues of versioning and development
>> in knowledge.
>
> That is my view too - though I don't have a lot of evidence I can point
> to in support of it. This is a great opportunity for Semantic Web
> technology.
>
> ...
> Brian

or XML based (i.e. RPC) rather than RDF or RDF rich query (RDF, rules, remote queries), it is usually synchronous, half-duplex, RPC-styled, and layering that may be non-optimal. Not only is there a tower of babel problem to some extent up front, but resulting applications are fragile due to procedural back-end coupling.

We have a Knowledge Oriented Architecture (KOA) that is fully decoupled (both "front" and "back" side), rule-driven, and RDF metadata and knowledge driven. We have been exploring a number of architectural ideas in this area for large clients. A public version is planned.

sdw
|
|
RE: Introducing myself - SOA organised with RDF

Hi

> It is important to bear in mind that it's best to think of RDF in terms
> of its abstract syntax, i.e. a graph of nodes, rather than the RDF/XML
> concrete syntax. There are a number of systems around that will store
> significant numbers of RDF triples in a relational store. We do one,
> Jena (http://jena.sourceforge.net) and there are others - Sesame,
> Mulgara, Redland, etc. I'd strongly suggest you take a look at these,
> or, if you really feel an XML database is the way to go - I'd like to
> understand why

Based on your recommendation I decided to give Jena a go. We installed Postgres on the Linux box we're using for persistence, and we've been able to load up our database and make SPARQL queries on the graph.

However, my initial experiences with the response time of these SPARQL queries are not all that convincing. It seems that using the Java API is very fast, but has the drawback that it requires the queries to be pre-programmed. That may be fine for specific types of standard requests, but for the ability to quickly build and execute more tailor-made requests we really need to be able to use SPARQL.

My test case is to make a SPARQL expression for the immediate graph vicinity of a specific node - an operation that is a must for jumping from node to node through a graphical RDF viewer like Welkin. This operation can be performed in 10-15 seconds on my eXist XML database. The SPARQL query I made for Jena took 75 sec. to execute on the very same set of data. (This of course could be because I am still a wannabe SPARQL programmer.)

So my question is: Is this an inherent problem of executing SPARQL, or do I have to look out for specific properties of the query complexity when I design my queries?

Here's another question: Is it at all possible using SPARQL to define a query that will return a graph containing all nodes by forward-chaining from a specific start node? It seems to me that SPARQL is similar to SQL in that it does not have a "memory" that will enable me to remember visited nodes in a graph. Also I am not aware of any recursive features I can use in my queries. Or do I just need to go back and read some more?
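The "memory" of visited nodes asked about above is easy to express outside the query language. The following Python sketch shows one way to collect the subgraph reachable from a start node by forward chaining over an in-memory list of triples. It is not how Jena or any SPARQL engine works internally; the function name `forward_closure` and the representation of triples as plain tuples are assumptions made for the example.

```python
from collections import defaultdict, deque


def forward_closure(triples, start):
    """Return every triple reachable from `start` by following
    subject -> object links, visiting each node only once."""
    # Index the triples by subject so each expansion is a dictionary lookup.
    by_subject = defaultdict(list)
    for triple in triples:
        by_subject[triple[0]].append(triple)

    visited = {start}        # the "memory" of nodes already queued
    queue = deque([start])   # nodes still waiting to be expanded
    reached = []
    while queue:
        node = queue.popleft()
        for triple in by_subject.get(node, ()):
            reached.append(triple)
            target = triple[2]
            if target not in visited:
                visited.add(target)
                queue.append(target)
    return reached


graph = [
    ("A", "uses", "B"),
    ("B", "uses", "C"),
    ("C", "uses", "A"),   # a cycle back to the start node
    ("D", "uses", "E"),   # not reachable from A
]
for triple in forward_closure(graph, "A"):
    print(triple)
```

The visited set is what keeps the traversal from looping on the cycle. Backward chaining is the same loop over an index keyed by object instead of subject.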
|
|
Re: Introducing myself - SOA organised with RDF

2007/11/21, Frank Carvalho <dko4342@...>:

If I got your point, you want ways to hop from a node to another node in the proximity. Say A is your node; you want node B such that:

    A someproperty B

In this case it may be faster to use the Jena API instead of SPARQL: once you get A in some way, you call A.listProperties() to have all statements with A as subject (including those referring to B). More specialized requests, such as listing the properties with a specific predicate, are easy as well.

HTH,
I.
|
|
Re: Introducing myself - SOA organised with RDF

> If I got your point, you want ways to hop from a node to another node
> in the proximity, say A is your node, you want node B such that:
> A someproperty B

Yes, that's my standard test case, though I also need those nodes where C someproperty A. That is, the immediate vicinity of A - both forward and backward.

> In this case it may be faster to use the Jena API instead of SPARQL: once
> you get A in some way, you call A.listProperties() to have all statements
> with A as subject (including those referring to B). More specialized
> requests, such as list the properties with a specific predicate, are
> easy as well.

We did actually do this using the API, and it is vastly faster, so for the graphical exploration of the graph we would certainly use that approach in any case. I also suspect that the API is really the only way to do forward and backward chaining efficiently.

My concern is really more whether the SPARQL language is nice but inefficient for practical purposes. You see, when the graph is a way to describe the artifacts for a SOA, and we want to enable the business to make intelligent analyses of the impact of changes, we need to be able to tailor-make requests for each purpose. So we need a language like SPARQL to do that in a flexible way. So it is a serious problem if the language - or the implementation of the query engine - is inherently slow.

/Frank
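As an illustration of the "immediate vicinity" test case, the sketch below indexes an in-memory list of triples by subject and by object, so that both the forward statements (A someproperty B) and the backward ones (C someproperty A) come from dictionary lookups. It is written in Python rather than against the Jena API, and the `TripleIndex` class and its method names are hypothetical, chosen only for this example.

```python
from collections import defaultdict


class TripleIndex:
    """Index (subject, predicate, object) tuples in both directions."""

    def __init__(self, triples):
        self.by_subject = defaultdict(list)
        self.by_object = defaultdict(list)
        for triple in triples:
            subject, _, obj = triple
            self.by_subject[subject].append(triple)
            self.by_object[obj].append(triple)

    def vicinity(self, node):
        """Return all statements with `node` as subject or as object."""
        outgoing = self.by_subject.get(node, [])
        # Skip self-loops here; they are already listed as outgoing.
        incoming = [t for t in self.by_object.get(node, []) if t[0] != node]
        return outgoing + incoming


index = TripleIndex([
    ("A", "someproperty", "B"),
    ("C", "someproperty", "A"),
    ("B", "otherproperty", "C"),
])
for statement in index.vicinity("A"):
    print(statement)
```

Both lookups cost the same regardless of graph size, which is one plausible reason an API call that goes straight to such an index can beat a general query that has to be parsed and planned first.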