Re: RDF dataset/SPARQL endpoint descriptions

Hi there,

I've implemented several features for ARQ/Joseki that I need for query federation with SemWIQ [1]. Feel free to try them out and comment: http://ramses.faw.uni-linz.ac.at:8900

As I suggested before, I would separate service descriptions from dataset descriptions (e.g. provide DESCRIBE DATASET and DESCRIBE SERVICE, and later on subqueries such as SELECT * FROM { DESCRIBE DATASET } WHERE ...).

In order to benefit from HTTP caching, the endpoint should not only provide dataset descriptions (possibly including statistics) as SPARQL results; it should also allow clients to retrieve them via plain HTTP and check the Last-Modified header, etc. That's what I do when monitoring data sources for SemWIQ: execute DESCRIBE SERVICE and follow the void:Dataset link.

For query federation, it would be very useful if the future SPARQL Recommendation supported BINDINGS, as introduced earlier by Eric [2]. My proposal works with a set of bindings and a special "null" keyword for unbound variables, e.g.:

  SELECT * WHERE { ?s :p ?a ; :p ?b ; ... }
  BINDINGS ?a ?b {
    bsbm:Product "34"^^xsd:int .
    null "23"^^xsd:int .
    foaf:Person .   // remaining slots are interpreted as empty (null)
  }

It is not much effort for implementers, and a federated query processor can then process pipelined blocks of queries more efficiently.

Regards,
AndyL

[1] http://semwiq.sourceforge.net - available soon
[2] http://www.w3.org/2007/05/SPARQLfed/

> Hi Andy,
>
> Thanks for sharing the experience. Separating the service from the
> dataset makes a lot of sense, I like that idea.
>
> Cheers,
> Benji
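The BINDINGS proposal above is cheap for a client to produce. As a minimal Python sketch (the helper names `format_row`, `bindings_clause` and `pipelined_queries` are hypothetical, not part of ARQ, Joseki or SemWIQ, and terms are assumed to arrive already serialized), a federated query processor might emit the `null` keyword for unbound slots, drop trailing unbound slots as the proposal allows, and split a large binding set into blocks that are sent one query at a time:

```python
from typing import Iterator, List, Optional, Sequence

# Hypothetical helpers, not part of ARQ, SemWIQ or any SPARQL draft.
# Terms are assumed to be already serialized (prefixed names, IRIs in <...>,
# typed literals); None marks an unbound variable.
Row = Sequence[Optional[str]]


def format_row(row: Row) -> str:
    """Serialize one solution row in the proposed BINDINGS syntax."""
    terms: List[Optional[str]] = list(row)
    # The proposal reads missing trailing slots as null, so drop them.
    while terms and terms[-1] is None:
        terms.pop()
    if not terms:
        return "null ."  # a fully unbound row still needs one slot
    return " ".join("null" if t is None else t for t in terms) + " ."


def bindings_clause(variables: Sequence[str], rows: Sequence[Row]) -> str:
    """Build a BINDINGS block for the given variable names and rows."""
    for row in rows:
        if len(row) > len(variables):
            raise ValueError(f"row {row!r} has more terms than variables")
    head = "BINDINGS " + " ".join("?" + v for v in variables) + " {"
    lines = [head] + ["  " + format_row(row) for row in rows] + ["}"]
    return "\n".join(lines)


def pipelined_queries(query: str, variables: Sequence[str],
                      rows: Sequence[Row], block_size: int) -> Iterator[str]:
    """Split a large binding set into blocks and yield one query per block."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        yield query + "\n" + bindings_clause(variables, block)


if __name__ == "__main__":
    rows = [
        ["bsbm:Product", '"34"^^xsd:int'],
        [None, '"23"^^xsd:int'],
        ["foaf:Person", None],  # trailing slot left unbound
    ]
    print(bindings_clause(["a", "b"], rows))
```

Running the module prints the BINDINGS block from the example query. Dropping trailing unbound slots mirrors the shorthand in the proposal; writing them out as explicit `null` would presumably be accepted as well and avoids relying on the implicit rule.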
Re: RDF dataset/SPARQL endpoint descriptions

On Sep 8, 2009, at 2:21 PM, Andreas Langegger wrote:

> As I suggested before I would separate service and dataset
> descriptions (e.g. provide DESCRIBE DATASET and DESCRIBE SERVICE and
> sub queries such as SELECT * FROM { DESCRIBE DATASET } WHERE ...
> later on)

It looks like a "DESCRIBE DATASET/SERVICE" won't be the path taken, as there are some concerns about this operating at the query engine level, when it's really a protocol operation. The exact method for how to do it hasn't been nailed down yet, but some of the options under discussion are:

- an HTTP response header linking to a service description document
- the use of the HTTP OPTIONS verb on the endpoint URI
- using content negotiation on the endpoint URI to request RDF (or possibly having the endpoint URI return RDFa)
- a new protocol operation (/sparql?serviceDescription)

> In order to benefit from HTTP caching, the endpoint should not only
> provide dataset descriptions (possibly including statistics) as
> SPARQL results, it should allow to retrieve them via HTTP and check
> the Last-modified header, etc. That's what I'm doing when monitoring
> datasources for SemWIQ => exec DESCRIBE SERVICE and follow the
> void:Dataset link.

Can you explain why returning a dataset description as SPARQL results would be better than returning it as RDF?

> For query federation, it would be very useful if the future SPARQL
> REC supports BINDINGS such as introduced by Eric [2] before. My
> proposal works with a set of bindings with a special "null" keyword
> for unbound variables, e.g.:
...
> It is not much effort for implementers and a federated query
> processor can then process pipelined blocks of queries more
> efficiently.

Unfortunately, this won't be part of the next SPARQL version, but service descriptions should allow any implementations to declare that they support such an extension.

.greg
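None of these options had been decided at the time. Purely as an illustration, the Python sketch below lists the requests a client might try for each option and extracts a description URI from a `Link` response header. The probe order, the `Accept` media types, the `serviceDescription` parameter name (taken from the list above) and whatever `rel` value a caller passes are assumptions; no network I/O is performed.

```python
import re
from typing import Dict, List, NamedTuple, Optional
from urllib.parse import urljoin


class Probe(NamedTuple):
    """One HTTP request a client could try (no network I/O happens here)."""
    method: str
    url: str
    headers: Dict[str, str]


# Assumed media types for an RDF service description.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9"


def discovery_probes(endpoint: str) -> List[Probe]:
    """Requests matching the four options under discussion, in an arbitrary order."""
    sep = "&" if "?" in endpoint else "?"
    return [
        # 1. Look for a Link-style response header pointing at a description.
        Probe("HEAD", endpoint, {}),
        # 2. Ask via the OPTIONS verb on the endpoint URI.
        Probe("OPTIONS", endpoint, {}),
        # 3. Content negotiation: request RDF from the endpoint URI itself.
        Probe("GET", endpoint, {"Accept": RDF_ACCEPT}),
        # 4. A dedicated protocol operation (parameter name as floated in the thread).
        Probe("GET", endpoint + sep + "serviceDescription", {"Accept": RDF_ACCEPT}),
    ]


_LINK = re.compile(r"<([^>]*)>([^,]*)")
_REL = re.compile(r'rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def link_target(header: str, base: str, rel: str) -> Optional[str]:
    """Return the absolute target of the first Link entry carrying `rel`.

    Simplified parsing: assumes no commas inside parameter values.
    """
    for link in _LINK.finditer(header):
        target, params = link.groups()
        for m in _REL.finditer(params):
            value = m.group(1) if m.group(1) is not None else m.group(2)
            if rel in value.split():
                return urljoin(base, target)
    return None
```

A relative target is resolved against the endpoint URI, so a header entry `</sparql/desc>` on `http://example.com/sparql` yields `http://example.com/sparql/desc`, which is the kind of URI that could then appear in a FROM clause.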
Re: RDF dataset/SPARQL endpoint descriptions

Hi Greg,

Gregory Williams wrote:
> It looks like a "DESCRIBE DATASET/SERVICE" won't be the path taken, as
> there are some concerns about this operating at the query engine level,
> when it's really a protocol operation. The exact method for how to do it
> hasn't been nailed down yet, but some of the options under discussion are:

I also had some concerns when changing the grammar, because parts of a description may be protocol-specific. Why should a non-HTTP query engine (e.g. an RDF store) provide SPARQL endpoint descriptions that are only relevant when using the HTTP protocol? Let me explain what I've experienced.

Many features are query-engine-specific (e.g. full-text search, query language, initial bindings, etc.) and should be announced in a query-engine-specific way, not via the SPARQL HTTP protocol.

I'd advise against announcing them via OPTIONS/X-Headers/etc., since they are relevant to any client using the query engine, even without an HTTP endpoint (e.g. via an in-process API, ODBC/Virtuoso, etc.). Can you give me a reason why HTTP OPTIONS/X-Headers make more sense?

> - an HTTP response header linking to a service description document
> - the use of the HTTP OPTIONS verb on the endpoint URI
> - using content negotiation on the endpoint URI to request RDF (or
>   possibly having the endpoint URI return RDFa)
> - a new protocol operation (/sparql?serviceDescription)

I would also prefer an approach that allows querying endpoint descriptions with subqueries or FROM <uri>. It would let clients read descriptions without having to parse them; they may not have a SPARQL engine themselves. That won't be possible with HTTP OPTIONS, nor with X-Headers. I'd prefer a new protocol operation, as you suggested: ?serviceDescription, or maybe just ?desc. For example,

  http://example.com/sparql?query=select+*+from+<http%3A%2F%2Fexample.com%2Fsparql%3Fdesc>+where+{+%3Fs+%3Fp+%3Fo+}

would work (if FROM is allowed; it should always allow local URIs).

Some features are SPARQL-protocol-only, such as result formats. I would suggest a way for the SPARQL endpoint to inject statements into the description originally generated by the query engine and pass it through. Those parts of the description would only be provided when a client connects via HTTP and the SPARQL protocol.

I think there are many non-LOD applications using SPARQL without HTTP. They should also be able to find out whether a query engine supports full-text search, etc.!

Dataset descriptions such as voiD are not protocol-specific either. They relate exclusively to the dataset served. Why not provide such metadata in a more generic form than via HTTP SPARQL? A DESCRIBE DATASET would really make sense. If the query engine has no such information, it would return an empty model, which would be perfectly correct. Are there any concerns with that?

I would keep in mind that dataset descriptions may become large, because users want to include statistics, summaries, etc. Since it is cumbersome to send HTTP cache headers for specific queries (DESCRIBE DATASET), it may be better to just return a voiD dataset URI, which can then be retrieved (or not, if it hasn't changed).

> Can you explain why returning a dataset description as SPARQL results
> would be better than returning it as RDF?

I meant SPARQL DESCRIBE results, which are RDF (not XML results).

>> For query federation, it would be very useful if the future SPARQL REC
>> supports BINDINGS such as introduced by Eric [2] before. My proposal
>> works with a set of bindings with a special "null" keyword for unbound
>> variables, e.g.:
> ...
>> It is not much effort for implementers and a federated query processor
>> can then process pipelined blocks of queries more efficiently.
>
> Unfortunately, this won't be part of the next SPARQL version, but
> service descriptions should allow any implementations to declare that
> they support such an extension.

Well, not good news, but I understand. Can I find a chat log about that? I'd just like to get a picture of the reasons apart from lack of time (it's a fairly easy feature and simple to implement).

The main bottleneck for large-scale query federation is the lack of statistics anyway, but these can be generated remotely on a periodic basis. Adding support for initial bindings to the SPARQL spec would be much better than advertising it as an optional feature: nobody will implement it (lack of incentives), and thus large-scale query federation will be impossible in the end. Is there still a chance to talk about it?

Regards,
AndyL
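Writing such a URL by hand is error-prone: the example above leaves `<`, `>`, `{` and `}` unescaped, which RFC 3986 does not permit in a URI. A minimal Python sketch using only the standard library builds the same kind of request; the `?desc` operation is the hypothetical one proposed in this message and `example.com` is a placeholder.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def description_query_url(endpoint: str, desc_param: str = "desc") -> str:
    """Build a GET URL that queries the endpoint's own description via FROM.

    `desc_param` stands for the hypothetical ?desc / ?serviceDescription
    operation; no such parameter is standardized.
    """
    desc_uri = f"{endpoint}?{desc_param}"
    query = f"SELECT * FROM <{desc_uri}> WHERE {{ ?s ?p ?o }}"
    # urlencode() escapes with quote_plus: spaces become '+', and reserved
    # characters such as <, >, {, }, :, / and ? are percent-encoded.
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    url = description_query_url("http://example.com/sparql")
    print(url)
    # Round trip: the server sees the original query text again.
    print(parse_qs(urlsplit(url).query)["query"][0])
```

Encoding the whole query string in one place also keeps the inner `?desc` of the FROM URI from being mistaken for a second query parameter.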
Re: RDF dataset/SPARQL endpoint descriptions

On Sep 11, 2009, at 6:47 AM, Andreas Langegger wrote:

> Hi Greg,
>
> Gregory Williams wrote:
>> It looks like a "DESCRIBE DATASET/SERVICE" won't be the path taken,
>> as there are some concerns about this operating at the query engine
>> level, when it's really a protocol operation. The exact method for
>> how to do it hasn't been nailed down yet, but some of the options
>> under discussion are:
>
> I also had some concerns when changing the grammar, because parts of
> a description may be "protocol"-specific. Why should a non-HTTP
> query engine (e.g. RDF store) provide SPARQL endpoint descriptions
> that are only relevant when using the HTTP protocol? I'll explain
> what I've experienced.
>
> Many features are query engine specific (e.g. full-text search, query
> language, initial bindings, etc.) and should be announced in a query
> engine specific way and not via the SPARQL HTTP protocol.
>
> I'd advise not to announce them as OPTIONS/X-Headers/etc. since they
> are relevant to any client using the query engine even without a
> HTTP endpoint (e.g. via in-process API, ODBC/Virtuoso, etc.) - Can
> you give me a reason why HTTP OPTIONS/X-Headers makes more sense?

There are two issues here. The first is the argument for why it should be a protocol-level feature and not a query engine feature. In general, many endpoints may use the same query engine, and it will be easier for protocol-level code to discover the features of the underlying endpoint for inclusion in a service description than for the query engine to discover which endpoint has called it.

For non-HTTP access to a SPARQL engine, the thinking is that there can be specific API calls for discovering service descriptions, depending on the protocol used. I'm not familiar enough with Virtuoso to know what that would look like, but a general "in-process API" can presumably have implementation-specific call(s) for service description.

>> - an HTTP response header linking to a service description document
>> - the use of the HTTP OPTIONS verb on the endpoint URI
>> - using content negotiation on the endpoint URI to request RDF (or
>>   possibly having the endpoint URI return RDFa)
>> - a new protocol operation (/sparql?serviceDescription)
>
> I would also prefer an approach which allows to query endpoint
> descriptions with sub queries or FROM <uri>. It would allow clients
> to read descriptions without the need for parsing them, they may not
> have a SPARQL engine themselves. That won't be possible with HTTP
> OPTIONS, nor with X-Headers, I'd prefer a new protocol operation, as
> you suggested ?serviceDescription or maybe just ?desc.

Agreed that this is a desirable feature. I have the same worries about HTTP OPTIONS, but a header-based method will give you back a URI that could presumably be used in a FROM clause. Again, much of this is still under discussion, but the group is aware of these concerns.

> I think there are many non-LOD applications using SPARQL without
> HTTP. They should also be able to check out if a query engine
> supports full-text search, etc.!

Again, if this is via an in-process API, this is something that will either be known ahead of time or could be discovered without needing to deal with an RDF-based service description.

> Dataset descriptions such as voiD are not protocol specific either.
> They exclusively relate to the dataset served. Why not provide such
> meta data in a more generic form than via HTTP SPARQL? A DESCRIBE
> DATASET would really make sense. If the query engine has no such
> information it would return an empty model, which would be more than
> correct. Are there any concerns with that?

Again, it's probably easier for the endpoint to know where to find statistics about a dataset than for the engine to do it. I can imagine implementations for which this would be relatively simple at the engine level, but I suspect that's not the general case.

> I would keep in mind that dataset descriptions may become large
> because users want to include statistics, summaries, etc. Since it is
> cumbersome to send HTTP cache headers upon specific queries
> (DESCRIBE DATASET), it may be better to just return a voiD dataset
> URI which can be retrieved (or not if it hasn't changed).

Understood, and I think we'll be discussing this. The same argument could be made for the service description as well.

>>> For query federation, it would be very useful if the future SPARQL
>>> REC supports BINDINGS such as introduced by Eric [2] before. My
>>> proposal works with a set of bindings with a special "null"
>>> keyword for unbound variables, e.g.:
>> ...
>>> It is not much effort for implementers and a federated query
>>> processor can then process pipelined blocks of queries more
>>> efficiently.
>
>> Unfortunately, this won't be part of the next SPARQL version, but
>> service descriptions should allow any implementations to declare
>> that they support such an extension.
>
> Well, no good news but I understand. Can I find some chat log about
> that? Just would like to get a picture of the reasons apart from
> lack of time (it's a fairly easy feature and simple to implement).

This was briefly discussed in [1] in the context of the Parameters feature [2], but I think it came down to time constraints, more important features, and the lack of existing implementations of this feature.

> The main bottleneck for large scale query federation is lack of
> statistics anyway. But these can be generated periodically remotely.
> If we add support for initial bindings to the SPARQL spec it would
> be much better than advertise it as a feature, nobody will do that
> (lack of incentives), and thus, impossible to do large scale query
> federation in the end.

I'm not convinced of this. If it's a compelling extension, getting implementations to support it isn't impossible. It just didn't seem as ready for standardization as other features.

.greg

[1] http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JanMar/0128.html
[2] http://www.w3.org/2009/sparql/wiki/Feature:Parameters
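Both messages point towards publishing the voiD (or service) description at its own URI so that ordinary HTTP caching applies. A minimal Python sketch of client-side revalidation with `Last-Modified` and `If-Modified-Since` follows; the class names, the injected `transport` callable and the `application/rdf+xml` media type are assumptions for illustration, not how SemWIQ or any endpoint actually does it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A transport sends a GET with the given headers and returns
# (status, response headers, body). It is injected so the caching logic can
# run without a network; a real one might wrap urllib.request.
Transport = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


@dataclass
class CachedDocument:
    last_modified: str
    body: bytes


@dataclass
class DescriptionCache:
    """Revalidating cache for voiD / service description documents (sketch)."""
    transport: Transport
    accept: str = "application/rdf+xml"  # assumed media type
    entries: Dict[str, CachedDocument] = field(default_factory=dict)

    def get(self, url: str) -> bytes:
        request_headers = {"Accept": self.accept}
        cached = self.entries.get(url)
        if cached is not None:
            # Ask the server to send the document only if it changed.
            request_headers["If-Modified-Since"] = cached.last_modified
        status, headers, body = self.transport(url, request_headers)
        if status == 304 and cached is not None:
            return cached.body  # not modified: reuse the stored copy
        if status != 200:
            raise RuntimeError(f"unexpected HTTP status {status} for {url}")
        # Header names are case-insensitive, so normalize before lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        last_modified = lowered.get("last-modified")
        if last_modified is not None:
            self.entries[url] = CachedDocument(last_modified, body)
        return body
```

Sending the server's own `Last-Modified` value back verbatim, rather than a timestamp from the client's clock, avoids clock-skew problems; a server that also sends an `ETag` could be revalidated with `If-None-Match` in the same way.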
Re: RDF dataset/SPARQL endpoint descriptions

Hello again,

Sorry for my late reply. Will there be a dedicated meeting at ISWC to discuss the future of SPARQL?

Regards,
AndyL

http://www.langegger.at
----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
FAW - Institute for Application-oriented Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69