[ANN] Searchable Plugin 0.4.1 released

[ANN] Searchable Plugin 0.4.1 released

by Maurice Nicholson-2

Grails Searchable Plugin 0.4.1 is released!

"""
The Searchable Plugin aims to provide rich search features to Grails applications with minimum effort, and still give you power and flexibility when you need it.

It is built on the fantastic Compass Search Engine Framework and Lucene and has the same license as Grails (Apache 2).
"""

This is a maintenance release that fixes a few bugs - see JIRA for details: http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=11450&styleName=Html&version=14142  

It bundles a patched version of Compass 1.2.2 specifically for GRAILSPLUGINS-254. I will give the code to the Compass project and hope they will include it in future versions of Compass.

The next version of the plugin will be 0.5, which means upgrading to Compass 2.0. It has some excellent new features and improvements that we can take advantage of.

Compass 2.0 is natively Java 5 only, with a Retroweaver version for 1.4 JVMs. I reckon I will do the same thing and provide two versions of the plugin: Java 5 by default and a separate plugin called "searchable-jdk-14".

Thanks to everybody who has said nice things, raised issues, answered mailing list posts and blogged about the plugin :-)

Cheers,
Maurice


Re: [ANN] Searchable Plugin 0.4.1 released

by bredo

Wicked! Thanks dude, 272 fixes an issue that came up last week, one that
I was ignoring and hoping would go away magically and it has! :)


---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email



Re: [ANN] Searchable Plugin 0.4.1 released

by Sey

Thanks a bunch, Searchable is the best plugin for Grails!


Re: [ANN] Searchable Plugin 0.4.1 released

by Fox Woo

great work! thanks

2008/4/17, Seymour Cakes <seymores@...>:
Thanks a bunch, Searchable is the best plugin for Grails!




--
爱生活,爱FOX

Re: [ANN] Searchable Plugin 0.4.1 released

by Dustin Whitney

A couple of questions I didn't see answered in the documentation: Is writing to the index thread safe? (I'd imagine the answer must be yes.) And what strategies do you use for running this in a clustered environment? Put the index on a shared drive?

Thanks
Dustin

On Thu, Apr 17, 2008 at 10:27 AM, Fox Woo <foxwu718@...> wrote:
great work! thanks

2008/4/17, Seymour Cakes <seymores@...>:
Thanks a bunch, Searchable is the best plugin for Grails!




--
爱生活,爱FOX


Re: [ANN] Searchable Plugin 0.4.1 released

by dahernan

Take a look at the Compass docs:

http://www.compass-project.org/docs/1.2.2/reference/html/core-connection.html

For a cluster, the best approach may be a JDBC connection.


On 17/04/2008, Dustin Whitney <dustin.whitney@...> wrote:

> A couple of questions I didn't see in the documentation:  is writing to the
> index thread safe?  (I'd imagine the answer must be yes)  and what
> strategies do you use for using this in a clustered environment?  Put the
> index on a shared drive?
>
> Thanks
> Dustin
>
>
> On Thu, Apr 17, 2008 at 10:27 AM, Fox Woo <foxwu718@...> wrote:
> > great work! thanks
> >
> >
> > 2008/4/17, Seymour Cakes <seymores@...>:
> >
> > > Thanks a bunch, Searchable is the best plugin for Grails!
> > >
> > >
> >
> >
> >
> > --
> > 爱生活,爱FOX
>
>

Re: [ANN] Searchable Plugin 0.4.1 released

by Barzilai Spinak-2

Hi Maurice.
I'm glad to tell you that the two main bugs I had with previous
versions, now seem to be fixed! (the NPE when cascade-saving, and some
other errors with component references).

The other thing was about termFreqs, which I thought had a bug. Maybe
it's not a bug after all, but some misunderstanding on my part, or it's
not clearly explained in the docs, or it's a bug :-)

Let me explain.

When calling SomeClass.termFreqs('someTerm'), I thought the resulting
number would be "the number of occurrences of someTerm within instances
of SomeClass".

For example, if I had:
(new Album(title:'yeah yeah yeah')).save()
(new Album(title:'Just say yeah')).save()

and then I query: Album.termFreqs('yeah'), I would expect a count of 4.
However, what I seem to be getting is "the number of Album instances
that have the term 'yeah' in any of their indexable properties".

So... maybe this is the intended behaviour... maybe not... in any case,
I think that 1) it should be more explicitly explained, and 2) a *real*
term frequency, counting occurrences of the term, should be added.
For example (completely made up), say I had a Book class
which hasMany Paragraph, and I'm storing the text in the Paragraph. And
I'm doing some text analysis, wanting to know how many times a certain
term appears in the Book. I don't want the count of paragraphs that
contain that word, I want the actual number of occurrences of that word.

Thinking a little more, maybe this behaviour is a "feature" of Compass?


BarZ



Re: Re: [ANN] Searchable Plugin 0.4.1 released

by Maurice Nicholson-2

Hey Barz,

Term frequencies (SomeClass.termFreqs) give you a list of term+frequency pairs, in other words, a list of terms and their respective frequencies.

Personally I think the documentation is pretty clear:

http://grails.org/Searchable+Plugin+-+Searching#SearchablePlugin-Searching-termFreqs

and here's the first example from that section:

// print all Book term frequencies
def termFreqs = Book.termFreqs()
termFreqs.each {
    println "${it.term} occurs ${it.freq} times in the index for Book instances"
}

Anyway, I think the feature you describe makes sense, but it can be achieved now, if not in an especially optimised way, by simply hunting for the term in the term-freqs, e.g.:

Book.termFreqs().find { it.term == 'marmalade' }?.freq

The information exists in the index, so it could be exposed in a simpler fashion, but is it required? Term-freqs are an advanced topic (IMHO) and I wonder how many people will use this feature?

The other point is that the term frequency currently provided is the frequency of a term over the whole index, not just a single Book instance! Again the information is in the index on a per Lucene document (Book instance) basis, it's just a question of exposing it.

As you said these are features that make sense in Compass itself so I think they are questions for the Compass forum.

Cheers,
Maurice

On 21/04/2008, Barzilai Spinak <barcho@...> wrote:
Hi Maurice.
I'm glad to tell you that the two main bugs I had with previous
versions, now seem to be fixed! (the NPE when cascade-saving, and some
other errors with component references).

The other thing was about termFreqs, which I thought had a bug. Maybe
it's not a bug after all, but some misunderstanding on my part, or it's
not clearly explained in the docs, or it's a bug :-)

Let me explain.

When calling SomeClass.termFreqs('someTerm'), I thought the resulting
number would be "the number of occurrences of someTerm within instances
of SomeClass".

For example, if I had:
(new Album(title:'yeah yeah yeah')).save()
(new Album(title:'Just say yeah')).save()

and then I query: Album.termFreqs('yeah'), I would expect a count of 4.
However, what I seem to be getting is "the number of Album instances
that have the term 'yeah' in any of their indexable properties".

So... maybe this is the intended behaviour... maybe not... in any case,
I think that 1) it should be more explicitly explained, and 2) a *real*
term frequency, counting occurrences of the term, should be added.
For example (completely made up), say I had a Book class
which hasMany Paragraph, and I'm storing the text in the Paragraph. And
I'm doing some text analysis, wanting to know how many times a certain
term appears in the Book. I don't want the count of paragraphs that
contain that word, I want the actual number of occurrences of that word.

Thinking a little more, maybe this behaviour is a "feature" of Compass?


BarZ




Re: Re: [ANN] Searchable Plugin 0.4.1 released

by Maurice Nicholson-2

[Reposting this because it hasn't turned up on Nabble 15 hours later - problems with Nabble?]



Re: Re: [ANN] Searchable Plugin 0.4.1 released

by Barzilai Spinak-2

It got through the first time :-)

Maurice Nicholson wrote:

> [Reposting this because it hasn't turned up on Nabble 15 hours later -
> problems with Nabble?]
>
> Hey Barz,
>
> Term frequencies (SomeClass.termFreqs) give you a list of
> term+frequency pairs, in other words, a list of terms and their
> respective frequencies.
>
> Personally I think the documentation is pretty clear:
>
> http://grails.org/Searchable+Plugin+-+Searching#SearchablePlugin-Searching-termFreqs

I think we are using a different definition of term frequency. To me
it's the number of occurrences of the *term*. However, the termFreqs
method is returning the number of *documents* (instances of domain
classes, in Grails) where the term occurs, disregarding the occurrences
of the term itself.

Let me simplify my previous example. Let's imagine there's a single
instance of Paragraph in our DB/index:
p = new Paragraph(text: "Hello John, my name is John and this is my friend John")
p.save()

Now, according to my definition, the frequency of the term "John" is 3.
According to Paragraph.termFreqs(), it's 1 (because there's only one
domain object where the term John appears, disregarding the fact that it
appears 3 times)
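
The two definitions being compared here can be made concrete with a minimal sketch in plain Java. It does not use the Searchable, Compass or Lucene APIs; the class TermCounts, its methods and the naive tokenizer are hypothetical names made up for illustration. It counts the same term two ways over a list of strings standing in for indexed domain instances.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class TermCounts {

    // Naive tokenizer (an assumption for this sketch): lowercase the text and
    // split on anything that is not a letter or digit. Real analyzers do more.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // "Word count" definition: total occurrences of the term in all documents.
    static long occurrences(List<String> docs, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        return docs.stream()
                .flatMap(doc -> tokenize(doc).stream())
                .filter(wanted::equals)
                .count();
    }

    // "Document count" definition: number of documents that contain the term
    // at least once, however often it is repeated inside each one.
    static long documentCount(List<String> docs, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        return docs.stream()
                .filter(doc -> tokenize(doc).contains(wanted))
                .count();
    }

    public static void main(String[] args) {
        List<String> paragraphs = Arrays.asList(
                "Hello John, my name is John and this is my friend John");
        System.out.println("occurrences:   " + occurrences(paragraphs, "John"));
        System.out.println("documentCount: " + documentCount(paragraphs, "John"));
    }
}
```

With the single Paragraph above, occurrences gives 3 for "John" while documentCount gives 1; with the two Album titles from the earlier message the figures are 4 and 2.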

Of course, when searching, a Paragraph object/document where the term
"John" appears three times will rank higher than a Paragraph where it
appears only once. So, of course, this information is stored somewhere
in the index. (Last night I spent some time working with Luke, which is
amazing and fun :-)  )

I'm not in immediate need of this feature, I'm just being picky while I
learn :-)
I'll dig around Compass and Lucene a little more and see what I can find.


On a completely unrelated, but more important note:
   How would you describe the query performance of Compass/Lucene versus
searching in the relational database using normal GORM/HSQL?

BarZ

> Anyway, I think the feature you describe makes sense, but it can be
> achieved now, if not especially optimised, by simply hunting for the
> term in the term-freqs, eg:
>
> Book.termFreqs().find { it.term == 'marmalade' }?.freq
>
> The information exists in the index, so it could be exposed in a
> simpler fashion, but is it required? Term-freqs are an advanced topic
> (IMHO) and I wonder how many people will use this feature?
>
> The other point is that the term frequency currently provided is the
> frequency of a term over the whole index, not just a single Book
> instance! Again the information is in the index on a per Lucene
> document (Book instance) basis, it's just a question of exposing it.
>
> As you said these are features that make sense in Compass itself so I
> think they are questions for the Compass forum.
>
> Cheers,
> Maurice
>




Re: Re: [ANN] Searchable Plugin 0.4.1 released

by Ted Dunning-3

On 4/22/08 5:05 PM, "Barzilai Spinak" <barcho@...> wrote:
 
> I think we are using a different definition of term frequency. To me
> it's the number of occurrences of the *term*. However, the termFreqs
> method is returning the number of *documents* (instances of domain
> classes, in Grails) where the term occurs, disregarding the occurrences
> of the term itself.

Your definition is relatively natural, but it goes against common practice in
text retrieval.  Experiments on retrieval performance have generally borne
out the value of the "document count" definition over the "word count"
definition that you suggest.  This probably has much to do with the average
size of the documents under test interacting with the fact that you want to
weight terms based on the prevailing frequency without much contribution
from documents that are particularly related to the term.

> Of course, when searching, a Paragraph object/document where the term
> "John" appears three times will rank higher than a Paragraph where it
> appears only once. So, of course, this information is stored somewhere
> in the index.

Only indirectly.  There is a per-term weight vector stored on each document,
but the weights don't depend only on the number of occurrences of that term.
The details vary depending on how you index the document.  Some details are
available in the javadoc for Lucene's Similarity class.
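
As a rough illustration of why the raw count is only one input to ranking, here is a sketch of a classic tf-idf style weight in plain Java. The formulas (square root of the in-document count, and a logarithmic inverse document frequency) are modelled on what Lucene's default Similarity implementation has documented, but they are an assumption here and may not match your Lucene version. ClassicScoring and its method names are hypothetical, and length normalisation, boosts and query normalisation are left out.

```java
class ClassicScoring {

    // Term-frequency factor (assumed formula): grows with the square root of
    // the in-document count, so each repeat adds less than the one before.
    static double tf(int occurrencesInDoc) {
        return Math.sqrt(occurrencesInDoc);
    }

    // Inverse document frequency (assumed formula): terms that appear in
    // fewer documents get a larger weight. The +1 avoids division by zero.
    static double idf(int docsContainingTerm, int totalDocs) {
        return Math.log((double) totalDocs / (docsContainingTerm + 1)) + 1.0;
    }

    // Simplified per-term weight for one document: tf times idf only.
    static double weight(int occurrencesInDoc, int docsContainingTerm, int totalDocs) {
        return tf(occurrencesInDoc) * idf(docsContainingTerm, totalDocs);
    }

    public static void main(String[] args) {
        int totalDocs = 10;
        int docsWithJohn = 1;
        // Three occurrences outrank one, but not by a factor of three.
        System.out.println("John x3: " + weight(3, docsWithJohn, totalDocs));
        System.out.println("John x1: " + weight(1, docsWithJohn, totalDocs));
    }
}
```

Here the in-document count feeds tf and the document count feeds idf, so both numbers discussed in this thread play a part in the final weight.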

> On a completely unrelated, but more important note:
>    How would you describe the query performance of Compass/Lucene versus
> searching in the relational database using normal GORM/HSQL?

For what it does, it is vastly faster.  If you want semi-structured data,
take Lucene.  If you want the best few elements of a ranked list (ranked
according to a Lucene computable score), choose Lucene.  If you want joins,
aggregates and referential integrity pick the RDBMS.


