Ideas for simplifying solr-ruby and making it better


by goodieboy

I'm trying to gather some ideas for how solr-ruby's code can be simplified and
improved. For example, a lot of the classes just extend a base class as
a placeholder without really doing anything. Some of them extend a base
class and set one option; the request and response modules have a lot of
this going on. Another thing I think could be cleaned up, simplified, or
even made dynamic is the field mapping; it'd also be nice to permit
arbitrary/un-mapped params to be passed in. Some of the code doesn't
seem all that rubyish, and my feeling is that there are lots of places where
things could be made simpler.

Do any of you have ideas or things that you've disliked about solr-ruby? If
so, please say so! I've got all kinds of ideas I'd like to implement and
crank out, but for now I want to see what other people are thinking.

Matt

Re: Ideas for simplifying solr-ruby and making it better

by Koji Sekiguchi-2

Matt,

 > Do any of you have ideas or things that you've disliked about solr-ruby?
 > If so, please say so! I've got all kinds of ideas I'd like to implement and
 > crank out, but for now I want to see what other people are thinking.

I don't have a concrete idea for making it better, but I agree with you.
Do you have ideas? Let's discuss how to make things more rubyish.

see also:
http://wiki.apache.org/solr/solr-ruby/BrainStorming

Koji


Re: Ideas for simplifying solr-ruby and making it better

by goodieboy

Hey Koji,

Yeah I've seen that page before. I'd love to see solr-ruby get to that
point!

I wonder if starting from the top down would be a good way to approach this
discussion: talk about the public API first, then the code underneath that
supports it, then refactoring, etc. So even before discussing something like
the request/response "placeholder" classes problem, I'll just express
some of the API things that I've always wanted and/or disliked ;)

* more rich and customizable document model:

  documents.each { |doc| puts doc.class == MyCustomDocClass }

* more rich and customizable facet model:

  response.facets.each do |facet|
    facet.field
    facet.hits
  end

* set document class type dynamically within result set before iterating:

  documents.assign_doc_class do |raw_doc|
    next Models::CD if raw_doc[:format_type] == 'CD'
    Models::StandardDoc
  end

  # this would be nice because sometimes we're iterating through a result
  # set with entirely different "types" of docs.

* document field method accessors
    doc.id (or at least doc[:id])
    # instead of
    doc['id']

* pagination aware result sets (documents and facets)

    documents.total_pages # etc.
    response.facets.has_next?

* ability to pass in arbitrary query fields directly to solr without
worrying about solr-ruby raising an error

* ability to bypass query field mapping completely while querying

* flatten :facets mapping so that:

  :facets=>{:fields=>[]}
  # becomes
  :facet_fields=>[]

* ability to query a custom :query_type and NOT having to create a custom
request/response class pair
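
To make the document-model ideas above concrete, here is a minimal sketch of field method accessors and a per-document class chooser. Everything in it (StandardDoc, CD, Documents, assign_doc_class) is a hypothetical name used for illustration, not something solr-ruby provides, and the sample documents are made up.

```ruby
# Hypothetical sketch; none of these classes exist in solr-ruby.

# Wraps a raw Solr document hash and exposes its fields as methods.
class StandardDoc
  def initialize(raw)
    @raw = raw
  end

  # doc[:title] and doc['title'] both work.
  def [](field)
    @raw[field.to_s]
  end

  # doc.title falls through to the underlying hash.
  def method_missing(name, *args)
    return @raw[name.to_s] if args.empty? && @raw.key?(name.to_s)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @raw.key?(name.to_s) || super
  end
end

class CD < StandardDoc; end

# An Enumerable result set whose document class is chosen per raw document.
class Documents
  include Enumerable

  def initialize(raw_docs)
    @raw_docs = raw_docs
    @chooser = ->(_raw) { StandardDoc }
  end

  # The block receives the raw hash and returns the class to wrap it in.
  def assign_doc_class(&block)
    @chooser = block
    self
  end

  def each
    return enum_for(:each) unless block_given?
    @raw_docs.each { |raw| yield @chooser.call(raw).new(raw) }
  end
end

documents = Documents.new([
  { 'id' => '1', 'format_type' => 'CD',   'title' => 'Kind of Blue' },
  { 'id' => '2', 'format_type' => 'Book', 'title' => 'Huckleberry Finn' }
])

documents.assign_doc_class do |raw_doc|
  raw_doc['format_type'] == 'CD' ? CD : StandardDoc
end

documents.each { |doc| puts "#{doc.class}: #{doc.title}" }
```

The chooser block returns a class rather than using `return`, since `return` inside a block would try to exit the enclosing method.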

Those things are pretty easy to implement. I'd imagine that if solr-ruby had
a solid API and a simpler code base, it'd also be pretty easy to implement
some of the ORM-like features included on the wiki page, and even a more
DSL-like approach to regular querying:

response = connection.search do |q|
  q.per_page 10
  q.query 'twain'
  q.filter_query :title, 'finn'
  q.facet_fields :title, :author
  q.query_field :title, 0.5
end
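
One way a block-style DSL like the one above could be backed is a small builder object that collects raw Solr parameters. This is only a sketch under assumptions: QueryBuilder and build_search are hypothetical names, and the mapping of each DSL call to a Solr parameter (rows, q, fq, facet.field, qf) is a guess at the intent, not existing solr-ruby behavior.

```ruby
# Hypothetical sketch: QueryBuilder mirrors the method names in the
# example above and is not part of solr-ruby.
class QueryBuilder
  def initialize
    @params = { 'fq' => [], 'facet.field' => [], 'qf' => [] }
  end

  def per_page(n)
    @params['rows'] = n
  end

  def query(q)
    @params['q'] = q
  end

  def filter_query(field, value)
    @params['fq'] << "#{field}:#{value}"
  end

  def facet_fields(*fields)
    @params['facet'] = true
    @params['facet.field'].concat(fields.map(&:to_s))
  end

  def query_field(field, boost)
    @params['qf'] << "#{field}^#{boost}"
  end

  # Raw Solr parameters, with empty lists dropped and qf joined by spaces.
  def to_params
    params = @params.reject { |_, v| v.respond_to?(:empty?) && v.empty? }
    params['qf'] = params['qf'].join(' ') if params.key?('qf')
    params
  end
end

# Yields a builder to the block and returns the params it collected.
def build_search
  builder = QueryBuilder.new
  yield builder
  builder.to_params
end

params = build_search do |q|
  q.per_page 10
  q.query 'twain'
  q.filter_query :title, 'finn'
  q.facet_fields :title, :author
  q.query_field :title, 0.5
end

p params
```

A real `search` method would hand the resulting hash to whatever sends the HTTP request; the builder itself never needs to know about the connection.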

What do you think?

Matt


Re: Ideas for simplifying solr-ruby and making it better

by Jamie Orchard-Hays-3

Matt, you are the nexus of XTF and Solr. :-)




Re: Ideas for simplifying solr-ruby and making it better

by Koji Sekiguchi-2

 > I wonder if starting from the top down would be a good way to approach this
 > discussion. Like talking about the public API then talk about the code
 > underneath to support it, then refactoring etc.. So even before discussing
 > something request/response "placeholder" classes problem, I'll just express
 > some of the API things that I've always wanted and/or disliked ;)

+1. Let's start from request/response "placeholder" classes.

 > * document field method accessors
 >     doc.id (or at least doc[:id])
 >     # instead of
 >     doc['id']

+1.

 > * ability to pass in arbitrary query fields directly to solr without
 > worrying about solr-ruby raising an error

Why do you need this ability?

Other than those above, I think you've raised good points to start our
discussion, and they are interesting.
I'd like to get comments and feedback from my associate (a rubyist).

Cheers,

Koji


Re: Ideas for simplifying solr-ruby and making it better

by goodieboy

Hi Koji. Thanks for the feedback!

In regard to the arbitrary query param issue, there are a few reasons why I
brought that up. The first is that there have been times when I wanted to
pass something in to Solr that solr-ruby didn't yet support, which means
there needs to be a continuous process of field mapping integration, at
least enough to keep up with the latest Solr param spec. It probably won't
be a real problem, but it did happen to me once. Another issue is that
sometimes I feel like the mapping gets in the way. Remembering all of the
Solr params is one thing, but then when you use solr-ruby you have to
remember a whole new set. Oh, and the Solr params are shorter :)

c.query(:q=>'battery operated', :fq=>'location:Chicago', :qf=>'title^0.5',
        :fl=>'title, man, price')
# vs.
c.query(:query=>'battery operated', :filter_queries=>['location:Chicago'],
        :query_fields=>'title^0.5', :field_list=>['title', 'man', 'price'])

It'd be really nice to have the field mapping be optional, and even
better... pluggable field mapping!
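
As a rough illustration of optional, pluggable mapping: the sketch below treats a mapper as any object that responds to #call, and skips mapping entirely when none is given. FriendlyMapper and build_params are hypothetical names, and the alias table only covers the four keys from the example above.

```ruby
# Hypothetical sketch; FriendlyMapper and build_params are illustrative
# names, not part of solr-ruby.
module FriendlyMapper
  ALIASES = {
    :query          => :q,
    :filter_queries => :fq,
    :query_fields   => :qf,
    :field_list     => :fl
  }.freeze

  # Renames known keys and passes unknown keys through untouched.
  def self.call(params)
    params.each_with_object({}) do |(key, value), out|
      # Solr expects fl as a single comma-separated value.
      value = value.join(',') if key == :field_list && value.is_a?(Array)
      out[ALIASES.fetch(key, key)] = value
    end
  end
end

# mapper is anything that responds to #call; nil means "no mapping".
def build_params(params, mapper = nil)
  mapper ? mapper.call(params) : params
end

raw = build_params({ :q => 'battery operated', :fq => 'location:Chicago' })

mapped = build_params(
  { :query => 'battery operated',
    :filter_queries => ['location:Chicago'],
    :query_fields => 'title^0.5',
    :field_list => ['title', 'man', 'price'],
    :wt => 'ruby' },
  FriendlyMapper
)

p raw
p mapped
```

Because unknown keys pass straight through, a new Solr parameter can be used before any mapper knows about it.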

For the class placeholder issue... if we first start with the request
classes, we see there is a :response_format, a :content_type, and a :handler.
The rest of the data is essentially query param stuff (field mapping). To
make it really simple, the :handler could disappear; it'd just be set in the
method ('select' for a :query or :search, 'update' for a :delete, etc.). The
:response_format could be set based on the :wt value. And the :content_type
could be a preset attribute in the connection instance. So, with that, you
just provide a method that accepts a hash of params.

The current request classes in solr-ruby (Solr::Request::Dismax etc.) really
look like query field mappers to me; that seems to be what the bulk of the
code is doing. So imagine, for querying: the connection class, a simple
query method, and then something like the current Solr::Request::Standard
being thrown in as a pluggable mapper?

Not to promote inheritance :) but if Solr::Connection provided raw query ->
HTTP, you could do something like:

class MyConnection < Solr::Connection

  # the real query method accepts only raw solr params...
  def query(params)
    super(map(params))
  end

  protected

  def map(params)
    # convert my param structure to a raw solr query string...
  end

end

But there are better ways to do this in Ruby!

Matt

-- as an example, here is something I threw together a few weeks ago just to
get a feel for the minimal code needed for talking to solr. This is nothing
more than an experiment, hasn't been tested, and of course isn't a "real"
project!

lib:
http://github.com/mwmitchell/slite/tree/master/lib/slite.rb

example:
http://github.com/mwmitchell/slite/tree/master/README



Re: Ideas for simplifying solr-ruby and making it better

by Jamie Orchard-Hays-3

Matt, I think that that is a great idea. Really useful.




Re: Ideas for simplifying solr-ruby and making it better

by Erik Hatcher


On Sep 29, 2008, at 9:40 PM, Matt Mitchell wrote:

> In regard to the arbitrary query param issue. There are a few reasons why I
> brought that up. The first is that there have been times where I wanted to
> pass in something to Solr and solr-ruby hadn't yet supported it. Which means
> there needs to be a continuous process of field mapping integration, at
> least enough to keep up with the latest Solr param spec. Probably won't be
> a real problem, but it did happen to me once. Another issue is that,
> sometimes I feel like the mapping gets in the way. Remembering all of the
> Solr params is one thing, but then when you use solr-ruby you have to
> remember a whole new set. Oh and the solr params are shorter :)

Yeah, it was a bit over-designed to have aliases and be too clever with
parameter mappings from Ruby to HTTP.  I'd like to strip away all the
mappings and have solr-ruby in its most elementary API be able to
simply pass through parameters and get the raw Ruby response data
structure back.  Very quickly folks will want to build on top of that
to make things cleaner, but that is fine.

> c.query(:q=>'battery operated', :fq=>'location:Chicago', :qf=>'title^0.5',
> :fl=>'title, man, price')
> # vs.
> c.query(:query=>'battery operated', :filter_queries=>['location:Chicago'],
> :query_fields=>'title^0.5', :field_list=>['title', 'man', 'price'])
>
> It'd be really nice to have the field mapping be optional, and even
> better... pluggable field mapping!

   +1

Note that the current Solr::Request::Select is pretty close to what  
you're asking for here.

One thing I really want to handle cleanly is custom request handler
mappings - making it trivial to send a request to any handler.  It's not too
bad now, but the paired Request/Response class structure needs to go.

> The current request classes in solr-ruby (Solr::Request::Dismax etc.) really
> look like query field mappers to me, that's what the bulk of the code is
> doing it seems. So imagine for a querying... the connection class, a simple
> query method, and then something like the current Solr::Request::Standard
> being thrown in as a pluggable mapper?

+1

> Not to promote inheritance :) but if Solr::Connection provided raw query ->
> HTTP, you could do something like:
>
> class MyConnection < Solr::Connection
>
>   # the real query method accepts only raw solr params...
>   def query(params)
>     super(map(params))
>   end
>
>   protected
>
>   def map(params)
>     # convert my param structure to a raw solr query string...
>   end
>
> end
>
> But there are better ways to do this in Ruby!

Solr::Connection does provide pretty raw operations to Solr.  Look at
Solr::Connection#post.  Pass in an object that has #handler, #to_s,
and #content_type methods and you're off and running.  The #to_s
method is the key to parameter mapping.

> -- as an example, here is something I threw together a few weeks ago just to
> get a feel for the minimal code needed for talking to solr. This is nothing
> more than an experiment, hasn't been tested, and of course isn't a "real"
> project!
>
> lib:
> http://github.com/mwmitchell/slite/tree/master/lib/slite.rb
>
> example:
> http://github.com/mwmitchell/slite/tree/master/README

Cute stuff, Matt!

I think there are goodies to be mined from there for sure.

How about using #method_missing on Connection such that  
connection.whatever(:key => 'value') would call to the "whatever"  
request handler?  That'd be cool.
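
Roughly, the #method_missing idea could look like the sketch below. DynamicConnection is a hypothetical class, not solr-ruby's Connection, and it records the request instead of sending it. Conversion methods such as #to_ary are deliberately not treated as handler names, an added assumption here, so that Ruby's implicit conversions keep working.

```ruby
# Hypothetical sketch; DynamicConnection is not part of solr-ruby and
# records the request instead of sending it over HTTP.
class DynamicConnection
  attr_reader :last_request

  # Any unknown method name is treated as a request handler name.
  def method_missing(name, params = {})
    return super if name.to_s.start_with?('to_')
    @last_request = { :handler => name.to_s, :params => params }
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?('to_') || super
  end
end

connection = DynamicConnection.new
result = connection.whatever(:key => 'value')
p result
```

The trade-off is that a misspelled method name silently becomes a request to a handler that may not exist, so the error only shows up in Solr's response.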

I'm not sure I agree with creating objects beyond the eval of the Ruby  
response though.  At least not in the core of solr-ruby.  Let's let  
the response from Solr itself be the only object a client really  
needs.  Conversion to other objects can occur a layer above the inner  
core of solr-ruby, such as acts_as_solr.

Keep in mind we also have access to change Solr's response format to
suit solr-ruby's needs.  And I can see some custom solr-ruby
classes coming into play that Solr's Ruby response would emit.

        Erik


Re: Ideas for simplifying solr-ruby and making it better

by Erik Hatcher

One other big wish list item I have for solr-ruby beyond gutting it  
and simplifying it to the bare essentials, is to make it JRuby-aware.  
When running with JRuby, the SolrJ library will be used and will allow  
the use of SolrServer such that an EmbeddedSolrServer can be used.  I  
suspect we can make this all seamless somehow such that MRI works fine  
over HTTP, and JRuby gets the advantage of being able to work really  
nicely with embedded Solr.
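
Purely as a sketch of the runtime switch this would need: JRUBY_VERSION is a constant that JRuby defines, so its presence can pick the backend. HttpBackend and EmbeddedBackend are placeholder classes invented for this example; a real embedded backend would wrap SolrJ, which is not loaded or called here.

```ruby
# Hypothetical sketch: both backend classes are placeholders with the
# same interface, so callers need not know which one they got.
class HttpBackend
  def name
    :http
  end
end

class EmbeddedBackend
  def name
    :embedded
  end
end

# JRUBY_VERSION is only defined when running under JRuby.
def jruby?
  !!defined?(JRUBY_VERSION)
end

def default_backend
  jruby? ? EmbeddedBackend.new : HttpBackend.new
end

puts default_backend.name
```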

        Erik

