Dump the CrawlDb in CSV format and look for the db_gone entries in the output:

bin/nutch readdb <crawldb> -dump <out_dir> -format csv
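Here is a minimal sketch of filtering that dump down to the db_gone URLs. It makes three assumptions. First, the CSV output is semicolon-separated, with the quoted URL in the first column and the quoted status name (for example "db_gone") in the third. Second, the dump job writes its results to part-* files under <out_dir>. Third, no URL contains a semicolon, because such a URL would shift the columns. The column layout may differ between Nutch versions, so check the header line of your dump first. The function name list_gone_urls is hypothetical.

```shell
# list_gone_urls DIR
# Print the URL of every CrawlDb entry whose status name is db_gone.
# ASSUMPTION: DIR holds the part-* files from "readdb ... -dump DIR -format csv",
# fields are separated by ';', the URL is field 1 and the status name is field 3.
list_gone_urls() {
  # Concatenate every part file written by the dump job.
  cat "$1"/part-* |
    awk -F';' '
      {
        # Strip the double quotes around the URL and status-name fields.
        gsub(/"/, "", $1)
        gsub(/"/, "", $3)
      }
      # The header line has "Status name" in field 3, so it never matches.
      $3 == "db_gone" { print $1 }
    '
}
```

For example, after dumping crawl/crawldb into a directory called crawldump, run: list_gone_urls crawldump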
schroedi wrote:
> Hi Nutch Guys,
>
> I can already show the crawldb stats. Now I want to show which URLs are
> db_gone (i.e. an error 404 or anything else).
> How can I show the db_gone URLs?
>
> bin/nutch readdb crawl/crawldb -stats
> CrawlDb statistics start: crawl/crawldb
> Statistics for CrawlDb: crawl/crawldb
> TOTAL urls: 2157
> retry 0: 2154
> retry 5: 3
> min score: 0.0
> avg score: 0.018363468
> max score: 3.01
> status 1 (db_unfetched): 1971
> status 2 (db_fetched): 158
> status 3 (db_gone): 13
> status 4 (db_redir_temp): 1
> status 5 (db_redir_perm): 14
> CrawlDb statistics: done
>
> thanks,
>
> Mario