[DIH] blocking import operation

Hi all,

currently, DIH's import operations only work asynchronously. Therefore, after submitting an import request, DIH returns immediately, while the import process (in case a large amount of data needs to be indexed) continues asynchronously behind the scenes.

So, what is the recommended way to check whether the import process has already finished? Or, better still, is there any method / workaround that will block the import operation's caller until the operation has finished?

In my application, the DIH receives some URL parameters which are used for determining the database name that is used within data-config.xml, e.g.

http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

Since only one DIH, /dataimport, is defined, but several databases need to be indexed, this command has to be issued several times, e.g.

http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

... wait until /dataimport?command=status says "Indexing completed" (but without using a loop that checks it again and again) ...

http://localhost:8983/solr/dataimport?command=full-import&dbname=bar&clean=false

A suitable solution, at least IMHO, would be to have an additional DIH parameter which determines whether the import call is blocking or non-blocking (the default). As far as I can see, this could be accomplished since Solr can execute more than one import operation at a time (it starts a new thread for each). Perhaps my question is somehow related to the discussion [1] on ParallelDataImportHandler.

Best,
Sascha

[1] http://www.lucidimagination.com/search/document/a9b26ade46466ee
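
For reference, the sequence described above (import foo, wait, import bar with clean=false) can be approximated on the client side. The following is only a minimal sketch, not a DIH feature: it does use the polling loop the post hoped to avoid, but keeps it in one place. The helper names (`http_get`, `is_busy`, `import_and_wait`, `import_all`) are made up for illustration, and it assumes that the status response contains `<str name="status">busy</str>` while an import is running; check this against the output of your Solr version.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE = "http://localhost:8983/solr/dataimport"  # handler URL taken from the post


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def is_busy(status_xml):
    """Return True if the status response reports a running import.

    Assumption: the response contains <str name="status">busy</str>
    while an import is in progress.
    """
    root = ET.fromstring(status_xml)
    for node in root.iter("str"):
        if node.get("name") == "status":
            return (node.text or "").strip() == "busy"
    return False


def import_and_wait(dbname, clean=True, fetch=http_get,
                    poll_seconds=2.0, timeout=600.0, sleep=time.sleep):
    """Start a full-import for one database and block until DIH is idle."""
    params = {"command": "full-import", "dbname": dbname}
    if not clean:
        params["clean"] = "false"  # keep documents from earlier imports
    fetch(BASE + "?" + urllib.parse.urlencode(params))

    waited = 0.0
    while is_busy(fetch(BASE + "?command=status")):
        if waited >= timeout:
            raise TimeoutError("import of %r still running after %ss" % (dbname, timeout))
        sleep(poll_seconds)
        waited += poll_seconds


def import_all(dbnames, **kwargs):
    """Import several databases one after another; only the first run cleans the index."""
    for i, dbname in enumerate(dbnames):
        import_and_wait(dbname, clean=(i == 0), **kwargs)
```

Calling `import_all(["foo", "bar"])` would issue the two import URLs from the post in order. One caveat: a status check sent immediately after the import request could in principle see an idle handler before the import has actually started, so a short initial delay may be needed in practice.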

Re: [DIH] blocking import operation

DIH imports are really long running. There is a good chance that the connection times out or breaks in between.

How about a callback?

On Tue, Nov 10, 2009 at 12:12 AM, Sascha Szott <szott@...> wrote:
> [...]
> Or still better, is there any method / workaround that will block
> the import operation's caller until the operation has finished?
> [...]

--
Noble Paul | Principal Engineer | AOL | http://aol.com

Re: [DIH] blocking import operation

Noble,

Noble Paul wrote:
> DIH imports are really long running. There is a good chance that the
> connection times out or breaks in between.

Yes, you're right, I missed that point (in my case imports take no longer than a minute).

> how about a callback?

Thanks for the hint. There was a discussion on adding a callback URL to DIH a month ago, but it seems that no issue was raised. So, up to now, it's only possible to implement an appropriate Solr EventListener. Should we open an issue for supporting callback URLs?

Best,
Sascha

Re: [DIH] blocking import operation

Yes, open an issue. This is a trivial change.

On Thu, Nov 12, 2009 at 5:08 AM, Sascha Szott <szott@...> wrote:
> [...]
> Should we open an issue for supporting callback urls?

--
Noble Paul | Principal Engineer | AOL | http://aol.com

Re: [DIH] blocking import operation

Noble Paul wrote:
> Yes , open an issue . This is a trivial change

I've opened JIRA issue SOLR-1554.

-Sascha