Hi Nutch Guys,
I know how to show the crawldb stats (see below). Now I want to see which URLs are
db_gone (meaning an error 404, or anything else).
How can I list the db_gone URLs?
bin/nutch readdb crawl/crawldb -stats
CrawlDb statistics start: crawl/crawldb
Statistics for CrawlDb: crawl/crawldb
TOTAL urls: 2157
retry 0: 2154
retry 5: 3
min score: 0.0
avg score: 0.018363468
max score: 3.01
status 1 (db_unfetched): 1971
status 2 (db_fetched): 158
status 3 (db_gone): 13
status 4 (db_redir_temp): 1
status 5 (db_redir_perm): 14
CrawlDb statistics: done
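
One possible approach, sketched under assumptions: `readdb` also has a `-dump <out_dir>` option that writes the CrawlDb as plain text, and that text could then be filtered for db_gone entries. The helper name `gone_urls` and the dump directory `crawl/crawldb-dump` are made up for illustration. The filter assumes each dump record starts with a `URL<TAB>Version: N` line followed by a `Status: 3 (db_gone)` style line; the exact layout may differ between Nutch versions, so check a dump file first.

```shell
# Hypothetical helper: reads CrawlDb dump text on stdin and prints
# the URL of every record whose status line says db_gone.
# Assumed record layout (may vary by Nutch version):
#   http://example.com/<TAB>Version: 7
#   Status: 3 (db_gone)
gone_urls() {
  awk -F '\t' '
    $2 ~ /^Version: /          { url = $1 }   # remember the URL of the current record
    /^Status: .*\(db_gone\)/   { print url }  # emit it when the status is db_gone
  '
}

# Intended usage (not executed here; directory name is an assumption):
#   bin/nutch readdb crawl/crawldb -dump crawl/crawldb-dump
#   cat crawl/crawldb-dump/part-* | gone_urls
```

With the stats above, this should list 13 URLs if the dump layout matches the assumption.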
thanks,
Mario
--
Mario Schröder |
http://www.finanz-checks.de
Office: +49 361 2152062
Phone: +49 34464 62301 Cell: +49 163 27 09 807
http://www.xing.com/go/invite/6035007.9c143c