no results for local file crawls?

no results for local file crawls?

by John Whelan

Hello,

I'm trying to crawl the local filesystem. It appears that the crawl is successful, but later searches don't display the content. During the crawl, I see the following:

...
fetching file:///c:/test/test.txt
fetching http://www.cnn.com/
...

I know from this that it is finding the file (otherwise I would get a 404 error), and I know that the protocol-file plugin is configured (otherwise I would get a protocol not found error). My test file contains "Hello World!", but when I query on 'world', all I get is the CNN page in the results.

Does anyone have any idea what I'm doing wrong? (I've tried this with 2 different 1.1 nightly builds: one from last week and a different one from September. Also, I'm running in a Cygwin environment.)

Thanks,
John

Re: no results for local file crawls?

by John Whelan

Well, I found the sources of my problem...

For starters, it appears that directories, not individual files, must be specified as the starting-point URLs; if files are specified, they seem to be ignored. Also, when specifying directories, the traversal depth must account for the directory itself as a level. When I had my problem, I was specifying individual files and had the crawler depth set to '1'. It looks like, for local file crawling to return results for files in a directory, you need to specify the directory and set the crawler depth to >= 2. (I suppose this makes sense if you think of it as directory traversal in UNIX, although I am kind of surprised that specifying a file as a starting point does not seem to cause it to be included.)
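
The depth behaviour described above can be illustrated with a small stand-alone simulation. This is only a sketch of the general idea, assuming each depth level is one fetch round and that fetching a directory yields nothing but its listing, whose entries are queued for the next round. It is not the crawler's actual code, and the names crawl and demo are made up for illustration. It also does not explain why a file given directly as a starting point would be ignored.

```python
import os
import tempfile


def crawl(seed_dir, depth):
    """Return the paths fetched in `depth` rounds, starting from seed_dir.

    Hypothetical model: one round per depth level, breadth-first.
    """
    frontier = [seed_dir]
    fetched = []
    for _ in range(depth):
        next_frontier = []
        for path in frontier:
            fetched.append(path)
            if os.path.isdir(path):
                # Fetching a directory only produces its listing; the entries
                # are queued for the next round, like outlinks on a web page.
                for name in sorted(os.listdir(path)):
                    next_frontier.append(os.path.join(path, name))
        frontier = next_frontier
    return fetched


def demo():
    """Check whether test.txt is fetched at depth 1 and at depth 2."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "test.txt")
        with open(target, "w") as fh:
            fh.write("Hello World!")
        return target in crawl(root, 1), target in crawl(root, 2)


print(demo())
```

Under these assumptions, a depth of 1 fetches only the directory listing, so the file's content never reaches the index, while a depth of 2 fetches the file itself.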