I haven't tried this myself, but it sounds like what you're looking for is
enabling remote streaming:
http://wiki.apache.org/solr/ContentStream#head-7179a128a2fdd5dde6b1af553ed41735402aadbf

As the link above shows, you should be able to enable remote streaming in
solrconfig.xml like this:

  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048" />

and then something like this might work:

  stream.url=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf

So you use stream.url instead of stream.file.
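
To make the request concrete, here is a rough Python sketch that only builds
the URL you would send to the extracting handler. It does not contact Solr.
The base URL (http://localhost:8983/solr), the /update/extract handler path,
the literal.id value, and the helper name build_extract_url are all
assumptions on my part, so adjust them to match your setup:

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_url, doc_id):
    """Build a request URL that asks Solr to fetch doc_url via stream.url.

    Hypothetical helper: the handler path and field name are assumptions.
    """
    params = {
        "stream.url": doc_url,   # remote document for Solr to fetch itself
        "literal.id": doc_id,    # assumes the unique key field is named "id"
        "commit": "true",
    }
    # urlencode percent-encodes the nested URL so it survives as one parameter.
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


request_url = build_extract_url(
    "http://localhost:8983/solr",  # assumed Solr location
    "http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf",
    "testfile-pdf",                # assumed document id
)
print(request_url)
```

The important detail is that the document URL is percent-encoded when it is
passed as the value of stream.url. Otherwise its own "/" and ":" characters
can confuse the request.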
Hope this helps.
-Jay
On Wed, Jul 8, 2009 at 7:40 AM, ahammad <ahmed.hammad@...> wrote:
>
> Hello,
>
> I can index rich documents like pdf for instance that are on the
> filesystem.
> Can we use ExtractingRequestHandler to index files that are accessible on a
> website?
>
> For example, there is a file that can be reached like so:
>
> http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf
>
> How would I go about indexing that file? I tried using the following
> combinations. I will put the errors in brackets:
>
> stream.file=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The
> filename, directory name, or volume label syntax is incorrect)
> stream.file=www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The system
> cannot find the path specified)
> stream.file=//www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The format
> of the specified network name is invalid)
> stream.file=sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot
> find the path specified)
> stream.file=//sub.myDomain.com/files/pdfdocs/testfile.pdf (The network
> path was not found)
>
> I sort of understand why I get those errors. What are the alternative
> methods of doing this? I am guessing that the stream.file attribute doesn't
> support web addresses. Is there another attribute that does?
> --
> View this message in context:
> http://www.nabble.com/Indexing--rich-documents-from-websites-using-ExtractingRequestHandler-tp24392809p24392809.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>