DIH: URLDataSource and incremental indexing

by Erik Hatcher

I'm exploring other ways of getting data into Solr via  
DataImportHandler than through a relational database, particularly the  
URLDataSource.

I see the special commands for deleting by id and query as well as the  
$hasMore/$nextUrl techniques, but I'm unclear on exactly how one would  
go about designing a data source over HTTP that worked cleanly for  
full importing and also for delta indexing.

For the sake of argument, suppose I have
/data.xml[?since=<some timestamp>][&start=X&rows=Y], which returns
documents in Solr XML (or really any basic format) that have changed
since the given timestamp (or all records if no since parameter is
provided).  The service could also return which records have been
removed since that timestamp.  Can I get there from here using
URLDataSource?
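
To make the question concrete, here is a rough sketch of the kind of
service I have in mind, written as a plain Python function rather than a
real HTTP endpoint. Everything in it is hypothetical: the record store,
the build_feed name, and the <feed>/<doc>/<deleted>/<hasMore> element
names are illustrative assumptions, not anything DIH prescribes.

```python
from datetime import datetime
from xml.sax.saxutils import escape

# Hypothetical records: (id, title, last_modified, deleted)
RECORDS = [
    ("1", "First", datetime(2009, 1, 1), False),
    ("2", "Second", datetime(2009, 2, 1), False),
    ("3", "Third", datetime(2009, 3, 1), True),
    ("4", "Fourth & last", datetime(2009, 4, 1), False),
]


def build_feed(records, since=None, start=0, rows=10):
    """Build the XML body for /data.xml[?since=...][&start=X&rows=Y].

    Without `since`, every live record is returned (full import).
    With `since`, only records modified after it are returned, and
    deleted ones are reported as <deleted> ids (delta import).
    """
    if since is None:
        # Full import: deleted records are simply left out.
        changed = [r for r in records if not r[3]]
    else:
        # Delta import: anything touched after the timestamp.
        changed = [r for r in records if r[2] > since]

    page = changed[start:start + rows]
    has_more = start + rows < len(changed)

    lines = ["<feed>"]
    for doc_id, title, _modified, deleted in page:
        if deleted:
            lines.append(f"  <deleted>{escape(doc_id)}</deleted>")
        else:
            lines.append(
                f"  <doc><id>{escape(doc_id)}</id>"
                f"<title>{escape(title)}</title></doc>"
            )
    # Tells the client whether another start/rows page should be fetched.
    lines.append(f"  <hasMore>{'true' if has_more else 'false'}</hasMore>")
    lines.append("</feed>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_feed(RECORDS, rows=2))                         # full import, page 1
    print(build_feed(RECORDS, since=datetime(2009, 2, 15)))    # delta import
```

The idea would be for the DIH side to map <deleted> onto the
delete-by-id special command and <hasMore> onto $hasMore, with the next
page's URL supplied through $nextUrl. Whether that mapping works cleanly
for both full and delta runs is exactly what I'm unsure about.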

Have folks been doing this?  If so, anyone care to share some basic  
tips/tricks/examples?

Thanks,
        Erik
