Free-text search and SPARQL New Features and Rationale draft

by Chris Bizer-2

Dear all,

I really like the SPARQL New Features draft, as it outlines many very useful and down-to-earth features that were missing in the first version of the language.

One question about

http://www.w3.org/TR/2009/WD-sparql-features-20090702/#Commonly_used_functions

How are the chances that one of these functions will be free-text search?

Most web applications today use some kind of free-text search; most facet browsers as well as most (all?) Semantic Web search engines use free-text search to let the user specify starting points for further navigation.

Many SPARQL stores already implement free-text indexing.

Today, people have to use dirty hacks like FILTER regex(?label, "%word1%") to emulate free text search.
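
For readers unfamiliar with this workaround: SPARQL's regex() takes an XPath-style regular expression and matches anywhere in the string, so "%" is an ordinary character there rather than a SQL LIKE wildcard. The following is only a minimal sketch of that behaviour, using Python's re module as a stand-in for a SPARQL regex engine; the labels and the helper name regex_filter are made up for illustration.

```python
import re

# Hypothetical rdfs:label values, for illustration only.
labels = ["word1 appears here", "literal %word1% marker", "nothing relevant"]

def regex_filter(pattern, values, flags=""):
    """Mimic FILTER regex(?label, pattern, flags): keep the values in which
    the pattern matches anywhere in the string (unanchored search)."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    return [v for v in values if re.search(pattern, v, re_flags)]

# '%' is a literal character in a regular expression, not a wildcard,
# so this only matches labels that really contain percent signs.
like_style = regex_filter("%word1%", labels)

# A bare word already behaves like a substring search.
plain = regex_filter("word1", labels)

print(like_style)
print(plain)
```

Either way, the store has to test every candidate label against the pattern unless it recognises and optimises such expressions.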

 

I therefore think it would be great if you fostered interoperability between SPARQL stores by including free-text search in the spec.

Kind regards,

Chris


Re: Free-text search and SPARQL New Features and Rationale draft

by Lee Feigenbaum-2

Hi Chris,

Thanks for the feedback.

The Working Group did seriously & carefully consider free-text search as
a feature for this iteration of standards, but in the end decided
against it. You can see some of the discussion surrounding it in a few
places:

* proposal to work on full-text search:
http://www.w3.org/2009/sparql/wiki/Feature:FullText

* discussion on 4-21 teleconference showed decent support
http://www.w3.org/2009/sparql/meeting/2009-04-21#FullText

* discussion at the first F2F
http://www.w3.org/2009/sparql/meeting/2009-05-06#Full__2d_text_search

The end result is that while many in the group agree with you (myself
included), there was enough concern about the challenge of specifying it,
the cost of implementing it, and its priority relative to the things the
group did adopt that it ended up falling (just) short of the mark.

best,
Lee


Re: Free-text search and SPARQL New Features and Rationale draft

by Andreas Langegger

Hello,

I hope there will be a final decision that allows full-text search to be
provided optionally and published by the endpoint as part of its feature
description [1]. I think many implementations will provide Lucene or other
engines out of the box, while others (smaller ones) won't have to in order
to be REC-compliant.

Since one "full text search" is not the same as another, I can understand
that position. You can configure Lucene etc. in many ways (analyzers,
tokenizers, stemming, synonyms, case handling, ... both at index and query
time). But I wouldn't compare regex to full-text search; there is a major
difference (there is an index, and you can do fuzzy searches). I would
only standardize how "full text search" is announced in the endpoint
description. I'm sure most endpoints will provide full-text search in the
end.
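
To illustrate the difference between a regex scan and an indexed, optionally fuzzy lookup, here is a toy sketch. It is not how Lucene works internally: the tokenizer, the sample documents, and the use of Python's difflib for fuzzy matching are all assumptions chosen to keep the example small.

```python
import difflib
import re
from collections import defaultdict

def tokenize(text):
    """A deliberately simple analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, term, fuzzy=False):
    """Look a term up in the index; optionally accept near-miss tokens too."""
    term = term.lower()
    tokens = [term]
    if fuzzy:
        # Compare the term against the indexed vocabulary, not the documents.
        tokens = difflib.get_close_matches(term, list(index), n=5, cutoff=0.8)
    hits = set()
    for token in tokens:
        hits |= index.get(token, set())
    return sorted(hits)

# Hypothetical literals, for illustration only.
docs = {
    1: "SPARQL endpoint with Lucene indexing",
    2: "Regular expressions scan every literal",
    3: "Full-text indexes support fuzzy matching",
}
index = build_index(docs)

exact = search(index, "lucene")             # one dictionary lookup
fuzzy = search(index, "lucen", fuzzy=True)  # tolerates the missing letter
# The regex approach has to test every literal in turn.
scan = sorted(d for d, text in docs.items() if re.search("lucene", text, re.I))
```

Changing tokenize (stemming, synonyms, case handling) changes which queries match, which is one reason why two "full text search" implementations can return different results for the same query.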

Regards,
AndyL

[1] http://www.w3.org/2009/sparql/wiki/Feature:ServiceDescriptions

On Jul 3, 2009, at 5:22 PM, Lee Feigenbaum wrote:

> Hi Chris,
>
> Thanks for the feedback.
>
> The Working Group did seriously & carefully consider free-text  
> search as a feature for this iteration of standards, but in the end  
> decided against it. You can see some of the discussion surrounding  
> it in a few places:
>
> * proposal to work on full-text search: http://www.w3.org/2009/sparql/wiki/Feature:FullText
>
> * discussion on 4-21 teleconference showed decent support http://www.w3.org/2009/sparql/meeting/2009-04-21#FullText
>
> * discussion at the first F2F http://www.w3.org/2009/sparql/meeting/2009-05-06#Full__2d_text_search
>
> The end result is that while many in the group agree with you  
> (myself included), there was enough concern about the challenge of  
> specifying it and the cost of implementing it and the relative  
> priority with the things the group did adopt that it ended up  
> falling (just) short of the mark.
>
> best,
> Lee


http://www.langegger.at
----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
FAW - Institute for Application-oriented Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69


Re: Free-text search and SPARQL New Features and Rationale draft

by Kjetil Kjernsmo-4

Chris,

On Friday 03 July 2009 17:05:03 Chris Bizer wrote:
> How are the chances that one of these functions will be free-text search?

I am afraid they are very slim at this point. As Lee said, the WG gave it
careful consideration and it fell just outside. I was the main champion of
free-text search in the working group, and we spent quite a lot of time on it
at the face-to-face, where I defended it violently, to the extent that I
ended up attacking the OWL entailment feature (which I certainly see as
useful), as I figured the only way to get freetext in was for OWL Entailment
to go off the time-permitting list.

Now, I think that the overall progress of the working group is important, so
we will not raise a formal objection over this matter, but if the community
at large decides to cry "what were you thinking?", I will be sympathetic to
their voices. :-) Myself, I regard it as a lost battle for now.

>Today, people have to use dirty hacks like FILTER regex(?label, "%word1%")
>to emulate free text search.

Indeed. Several different approaches were discussed, including XPath/XQuery
freetext, which the group felt was overkill for us.

In an attempt to make the requirements more manageable, I suggested that we
only support the typical website "search box", i.e. a freetext search that
consists of a few words that may or may not be truncated and may or may not
be combined with AND and OR. The WG noted that these requirements could all
be met by the hacks you described above, and that rather than introducing a
possibly large and risky feature, one should instead use the freetext
indexing engine to optimize certain regexp queries. I have already posted a
feature request for this in Virtuoso:
https://sourceforge.net/tracker/?func=detail&aid=2796431&group_id=161622&atid=820577
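
As a rough sketch of how such a "search box" could be mapped onto regex filters, the snippet below builds a SPARQL FILTER from a few words. The function names, the trailing "*" as truncation marker, and the restriction to alphanumeric words are assumptions made for this example, not something the WG specified.

```python
def word_pattern(word):
    """Turn one search-box word into a regex; a trailing '*' means truncation."""
    truncated = word.endswith("*")
    stem = word[:-1] if truncated else word
    if not stem.isalnum():
        # Keep the sketch safe: no regex or string escaping is attempted.
        raise ValueError(f"unsupported search word: {word!r}")
    # The backslash is doubled because the pattern ends up inside a
    # SPARQL string literal, where "\\W" denotes the regex escape \W.
    start = r"(^|\\W)"
    end = "" if truncated else r"(\\W|$)"
    return f"{start}{stem}{end}"

def search_box_filter(words, var="?label", operator="AND"):
    """Build a SPARQL FILTER that emulates a simple search box with regex()."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not words:
        raise ValueError("at least one search word is required")
    joiner = " && " if operator == "AND" else " || "
    tests = [f'regex({var}, "{word_pattern(w)}", "i")' for w in words]
    return f"FILTER ({joiner.join(tests)})"

and_filter = search_box_filter(["sparql", "index*"])
or_filter = search_box_filter(["lucene", "virtuoso"], operator="OR")
print(and_filter)
print(or_filter)
```

A store with a freetext index could, in principle, recognise filters of this restricted shape and answer them from the index instead of scanning every literal, which is the optimization the feature request asks for.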

Also, we have noted that the main cost of migrating from one SPARQL backend to
another is the different way freetext search is dealt with in each system.
This is a problem for SPARQL.

So, this is where it stands from my perspective.

Kind regards

Kjetil Kjernsmo
--
Senior Knowledge Engineer / SPARQL F&R Editor
Mobile: +47 986 48 234
Email: kjetil.kjernsmo@...  
Web: http://www.computas.com/

Computas AS  PO Box 482, N-1327 Lysaker | Phone:+47 6783 1000 | Fax:+47 6783
1001