solr2: Onward and Upward

I've been thinking about the next major version of Solr.
Here's some brainstorming on goals/ideas:

- use a standard IOC container for externalization of configuration and plugins... Spring "springs" to mind as the obvious choice here. May want to use other Spring services such as JMX integration, and look into JMX management (more than just statistics), etc.
- support programmatic construction and manipulation of IndexSchema, etc.
- support some sort of standard RPC mechanism (Thrift, Etch, ???) so strongly typed language bindings don't have to be developed for every language (as it seems most people want). Create an IDL for common operations and then use a compiler to create the stubs for perl, python, java, etc.
- an RPC mechanism that can have multiple operations pending per socket (and maybe use NIO) would probably be good for distributed search, etc.
- allow more lower level index operations... create a new index at a given spot, merge multiple indices, etc.
- make Solr more scalable and cloud computing friendly... make it easier to create and deploy clusters/shards, as well as change the size of clusters
- remove the single-master points of failure per-shard (support or incorporate something like bailey)
- make it easier to deploy config changes (possibly use zookeeper... prob want that for cluster management anyway)
- since solr will have the data, possibly allow plugins that could do map-reduce, or other interfaces that enable things like mahout.
- support more changes w/o manual re-indexing... change the schema and have Solr re-index in the background (assuming all data is available via stored fields or elsewhere via a plugin)
- support more "realtime" search... greatly reducing or eliminating the lag between adding a document and making it searchable
- support "tagging" type of updates... quickly updating part of a document, or data associated with a document
- try to expose more lower-level Lucene functionality to better support other projects that want to embed Solr (IOC should hopefully make Solr easier to embed and customize too)

To support some of these goals, some re-architecture is probably in the cards. Caching based on the IndexReader rather than the IndexSearcher is probably one necessary change. We should also use this as an opportunity to clean some things up and improve the core architecture since this will be a major version change. But we should also

- continue to support the current main solr web interfaces for searching and update
- retain (or improve) the ease of use factor
- we should always be able to point at an existing Lucene index and do interesting things with it
- continue to focus on single-node ease of use for small web developers

As for the future of Solr 1.x, I fully expect a Solr 1.4 release as well as other 1.x releases after that.

Possible next steps:

- Have discussions on solr-dev with a subject prefix of "solr2:"
- We should avoid the temptation to start banging out code (unless it's just example code) and take some time to really leverage all of the architectural experience this larger solr-dev community brings.
- Establish a wiki section for solr2 to capture current consensus... but generally use solr-dev for ideas and establishing that consensus
- let java-dev know about this (i.e. what in Solr didn't suit their needs and how can we change that)

Onward and upward... Other thoughts & ideas?

-Yonik
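The point about caching on the IndexReader rather than the IndexSearcher might be pictured roughly as follows. This is only an illustrative sketch: `PerReaderCache`, its type parameters, and the `loader` callback are hypothetical names, not Solr or Lucene API, and the "reader" here is just any object standing in for one. The assumption is that a new searcher can reuse readers that did not change, so entries keyed by reader would survive a searcher swap.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch: cached values are grouped by the reader object they were
 * computed from, not by the searcher that happens to wrap that reader.
 */
final class PerReaderCache<R, K, V> {
    // Weak keys: once a reader is no longer referenced anywhere else,
    // its whole cache bucket becomes eligible for garbage collection.
    // (WeakHashMap compares keys with equals/hashCode; this sketch assumes
    // reader objects keep the default identity-based implementations.)
    private final Map<R, Map<K, V>> byReader =
            Collections.synchronizedMap(new WeakHashMap<>());

    /** Returns the cached value for (reader, key), computing it on first use. */
    V get(R reader, K key, Function<K, V> loader) {
        Map<K, V> perReader =
                byReader.computeIfAbsent(reader, r -> new ConcurrentHashMap<>());
        return perReader.computeIfAbsent(key, loader);
    }
}
```

With something like this, a lookup made through a new searcher over the same reader would hit the existing bucket, and only genuinely new readers would pay the loading cost.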
Re: solr2: Onward and Upward

On Aug 29, 2008, at 2:03 PM, Yonik Seeley wrote:

> I've been thinking about the next major version of Solr.
> Here's some brainstorming on goals/ideas:
> - use a standard IOC container for externalization of configuration and plugins... Spring "springs" to mind as the obvious choice here. May want to use other spring services such as JMX integration, and look into JMX management (more than just statistics), etc.
> - support programmatic construction and manipulation of IndexSchema, etc.

Definitely. I'm a firm believer we spend too much time on configuration workarounds. Also, the IOC layer definitely makes the second point here trivial. It's maybe worthwhile to at least _consider_ being able to transform 1.x configurations to 2.x, but I'm not saying we have to.

> - support some sort of standard RPC mechanism (Thrift, Etch, ???) so strongly typed language bindings don't have to be developed for every language (as it seems most people want). Create an IDL for common operations and then use a compiler to create the stubs for perl, python, java, etc.
> - an RPC mechanism that can have multiple operations pending per socket (and maybe use NIO) would probably be good for distributed search, etc.
> - allow more lower level index operations... create a new index at a given spot, merge multiple indices, etc.

+1

> - make Solr more scalable and cloud computing friendly... make it easier to create and deploy clusters/shards, as well as change the size of clusters

People are definitely pushing on the scale front; it will be good to have it baked in from the ground up.

> - remove the single-master points of failure per-shard (support or incorporate something like bailey)
> - make it easier to deploy config changes (possibly use zookeeper... prob want that for cluster management anyway)
> - since solr will have the data, possibly allow plugins that could do map-reduce, or other interfaces that enable things like mahout.

Ah, the marriage of Solr and Mahout. Words cannot express my joy. Think automatic classification and named entity recognition over large scale distributed collections, all faceted and categorized and tied together in eternal bliss. Sigh. (Dang, that guy's weird!) See also https://issues.apache.org/jira/browse/SOLR-651

I think it's also useful to think about how other NLP type tools plug in (i.e. sentence/paragraph detection, POS taggers, clustering, categorization, etc.). Solr, thanks to its pluggable output and SearchComponent/RequestHandler architecture, can actually play quite well with these things.

> - support more changes w/o manual re-indexing... change the schema and have Solr re-index in the background (assuming all data is available via stored fields or elsewhere via a plugin)

Cool

> - support more "realtime" search... greatly reducing or eliminating the lag between adding a document and making it searchable

One of the big things people often want.

> - support "tagging" type of updates... quickly updating part of a document, or data associated with a document
> - try to expose more lower-level Lucene functionality to better support other projects that want to embed Solr (IOC should hopefully make Solr easier to embed and customize too)

Yep.

> To support some of these goals, some re-architecture is probably in the cards. Caching based on the IndexReader rather than the IndexSearcher is probably one necessary change. We should also use this as an opportunity to clean some things up and improve the core architecture since this will be a major version change. But we should also
> - continue to support the current main solr web interfaces for searching and update

Definitely a huge win.

> - retain (or improve) the ease of use factor
> - we should always be able to point at an existing Lucene index and do interesting things with it

Even w/o a schema?

> - continue to focus on single-node ease of use for small web developers

Yes, this is the majority of users, I would guess, i.e. sites in the range of less than 10 million docs.

> As for the future of Solr 1.x, I fully expect a Solr 1.4 release as well as other 1.x releases after that.
>
> Possible next steps:
> - Have discussions on solr-dev with a subject prefix of "solr2:"
> - We should avoid the temptation to start banging out code (unless it's just example code) and take some time to really leverage all of the architectural experience this larger solr-dev community brings.
> - Establish a wiki section for solr2 to capture current consensus... but generally use solr-dev for ideas and establishing that consensus
> - let java-dev know about this (i.e. what in Solr didn't suit their needs and how can we change that)
>
> Onward and upward... Other thoughts & ideas?

Better support for Spans, Payloads, Term Vectors. Granted, Spans just need support via a query parser and the results written to the output, but Payloads are a bit trickier when it comes to the indexing side of things.

-Grant
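As a rough picture of the "multiple operations pending per socket" idea quoted above, one common approach is to tag each request with a correlation id and complete the matching future whenever a reply arrives, in whatever order. The sketch below is only that, a sketch: `MultiplexedClient`, the string payloads, and the `transport` callback (standing in for whatever would actually write frames to a shared socket) are all hypothetical, and nothing here reflects how Thrift, Etch, or Solr implement it.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

/** Hypothetical sketch of request/response multiplexing over one connection. */
final class MultiplexedClient {
    private final AtomicLong nextId = new AtomicLong();
    // Requests that have been sent but not yet answered, keyed by correlation id.
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    // Assumption: the transport writes an (id, payload) frame to the shared socket.
    private final BiConsumer<Long, String> transport;

    MultiplexedClient(BiConsumer<Long, String> transport) {
        this.transport = transport;
    }

    /** Sends a request without waiting for earlier requests to finish. */
    CompletableFuture<String> send(String request) {
        long id = nextId.incrementAndGet();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply);       // register before sending so a fast reply is not lost
        transport.accept(id, request);
        return reply;
    }

    /** Called by the reader side of the connection when a reply frame arrives. */
    void onResponse(long id, String payload) {
        CompletableFuture<String> reply = pending.remove(id);
        if (reply != null) {
            reply.complete(payload);  // replies may arrive in any order
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

For distributed search, a coordinator could in principle call `send` once per shard on the same connection and then combine the returned futures, rather than holding one socket per outstanding request.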
Re: solr2: Onward and Upward

On Aug 29, 2008, at 2:58 PM, Grant Ingersoll wrote:

>> Onward and upward... Other thoughts & ideas?
>
> Better support for Spans, Payloads, Term Vectors. Granted, Spans just need support via a query parser and the results written to the output, but Payloads are a bit trickier when it comes to the indexing side of things.

And let's not forget support for the tee token filter :)

Erik
Re: solr2: Onward and Upward

You guys are all nuts. I'm barely hanging on by a thread keeping up with all of the 1.3 stuff, and you're already talking about 1.4, 1.X, and 2.0 ... madness i tell you, madness!

PS: seriously, I'm going to hold off on actually reading this thread until 1.3 is shipped. it doesn't mean i'm not interested, it just means i'm interested later.

-Hoss
Re: solr2: Onward and Upward

On Fri, Aug 29, 2008 at 11:33 PM, Yonik Seeley <yonik@...> wrote:

> I've been thinking about the next major version of Solr.
> Here's some brainstorming on goals/ideas:
> - use a standard IOC container for externalization of configuration and plugins... Spring "springs" to mind as the obvious choice here. May want to use other spring services such as JMX integration, and look into JMX management (more than just statistics), etc.
> - support programmatic construction and manipulation of IndexSchema, etc.
> - support some sort of standard RPC mechanism (Thrift, Etch, ???) so strongly typed language bindings don't have to be developed for every language (as it seems most people want). Create an IDL for common operations and then use a compiler to create the stubs for perl, python, java, etc.
> - an RPC mechanism that can have multiple operations pending per socket (and maybe use NIO) would probably be good for distributed search, etc.
> - allow more lower level index operations... create a new index at a given spot, merge multiple indices, etc.
> - make Solr more scalable and cloud computing friendly... make it easier to create and deploy clusters/shards, as well as change the size of clusters
> - remove the single-master points of failure per-shard (support or incorporate something like bailey)
> - make it easier to deploy config changes (possibly use zookeeper... prob want that for cluster management anyway)
> - since solr will have the data, possibly allow plugins that could do map-reduce, or other interfaces that enable things like mahout.
> - support more changes w/o manual re-indexing... change the schema and have Solr re-index in the background (assuming all data is available via stored fields or elsewhere via a plugin)
> - support more "realtime" search... greatly reducing or eliminating the lag between adding a document and making it searchable
> - support "tagging" type of updates... quickly updating part of a document, or data associated with a document
> - try to expose more lower-level Lucene functionality to better support other projects that want to embed Solr (IOC should hopefully make Solr easier to embed and customize too)
>
> To support some of these goals, some re-architecture is probably in the cards. Caching based on the IndexReader rather than the IndexSearcher is probably one necessary change. We should also use this as an opportunity to clean some things up and improve the core architecture since this will be a major version change. But we should also
> - continue to support the current main solr web interfaces for searching and update
> - retain (or improve) the ease of use factor
> - we should always be able to point at an existing Lucene index and do interesting things with it
> - continue to focus on single-node ease of use for small web developers
>
> As for the future of Solr 1.x, I fully expect a Solr 1.4 release as well as other 1.x releases after that.
>
> Possible next steps:
> - Have discussions on solr-dev with a subject prefix of "solr2:"
> - We should avoid the temptation to start banging out code (unless it's just example code) and take some time to really leverage all of the architectural experience this larger solr-dev community brings.
> - Establish a wiki section for solr2 to capture current consensus... but generally use solr-dev for ideas and establishing that consensus
> - let java-dev know about this (i.e. what in Solr didn't suit their needs and how can we change that)
>
> Onward and upward... Other thoughts & ideas?

You're a mind reader ;-)

Noble and I have been discussing many of the same things. Prominent topics have included real-time search, eliminating the dependency on the master (a torrent-like replication, well not exactly, but close to that), map-reduce support, exposing operations through JMX (not just read-only statistics), integrating the work done on Mahout, and a cross-language binary format (using Thrift, see THRIFT-110, THRIFT-122).

Another area was "Solr should learn from its mistakes" :-) Basically, this is related to providing ways for applications to give feedback to Solr -- querylog/clickstream analysis or direct feedback for better search, more-like-this and spelling suggestions.

--
Regards,
Shalin Shekhar Mangar.
Re: solr2: Onward and Upward

On Fri, Aug 29, 2008 at 2:03 PM, Yonik Seeley <yonik@...> wrote:

> - allow more lower level index operations... create a new index at a given spot, merge multiple indices, etc.

- possibly add the ability to pull a Lucene index from HDFS (via a plugin if we don't want a hard dependency on HDFS) or from another Solr server (like the in-development Java-based replication will allow).

> - make Solr more scalable and cloud computing friendly... make it easier to create and deploy clusters/shards, as well as change the size of clusters
> - remove the single-master points of failure per-shard (support or incorporate something like bailey)
> - make it easier to deploy config changes (possibly use zookeeper... prob want that for cluster management anyway)

People interested in the scalability part should look at bailey (people got busy and discussion died down, but there's some interesting stuff).

http://sourceforge.net/mailarchive/forum.php?forum_name=bailey-developers
http://bailey.wiki.sourceforge.net/

-Yonik
Re: solr2: Onward and Upward

I think Hoss has a good point here. Solr has not shipped 1.3 yet and really needs to. A lot of the functionality mentioned would probably break any backward compatibility and/or require large rewrites of code.

For Ocean, I guess I should just state more clearly that it's really supposed to be a replacement for SQL databases, like what Google has done with GData, and not just realtime search using Lucene. There may be some issues with doing this; however, they can and should be addressed. This article by Adam Bosworth explains well how a massively scalable search database has many benefits over scaling SQL database systems:

http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=337

I see this as the clear future for most companies, even if it takes a long time for even a few companies to implement outside of Google. There are too many cost and feature advantages in using search-based databases, rather than using a mix of SQL and then doing batch-based updates later. I doubt most companies would try to do it at this point; however, one would say the same thing about SQL databases in the 1970s.

In any case, Solr is very cool and it would be great to see some of the analyzers, NumberUtils and other things go back into core Lucene at some point.

Jason

On Fri, Aug 29, 2008 at 3:13 PM, Chris Hostetter <hossman_lucene@...> wrote:
>
> You guys are all nuts. I'm barely hanging on by a thread keeping up with all of the 1.3 stuff, and you're already talking about 1.4, 1.X, and 2.0 ... madness i tell you, madness!
>
> PS: seriously, I'm going to hold off on actually reading this thread until 1.3 is shipped. it doesn't mean i'm not interested, it just means i'm interested later.
>
> -Hoss