generate, fetch - nutch commands


by Gaurang Patel


All,

I am a master's student and want to crawl the whole web for my master's project.

While trying to generate, fetch, and crawl the whole web using Nutch (I am following the steps from http://lucene.apache.org/nutch/tutorial8.html), I got confused by several Nutch terms and their usage:
1) What are the purposes of crawl_fetch and crawldb, and how do they differ? If Nutch stores all the information about URLs in crawldb, why is crawl_fetch needed?
2) What do fetch and generate do? Can anyone describe them in detail? Is there any documentation for Nutch commands such as generate and fetch?


Thanks & Regards,
Gaurang Patel