Hey,
Can anyone tell me what could be causing the following, which happened while fetching data with bin/nutch fetch?
My AVG Antivirus detects virus threats while Nutch fetches pages from the URLs in the crawldb.
I injected the DMOZ Open Directory URLs into the crawldb. The antivirus had already
detected 4 threats within half an hour of the fetch starting.
Is there any other source besides DMOZ for a list of
URLs covering the whole web? Or is there an automatic way to keep such harmful
URLs from being fetched? Let me know ASAP.
Regards,
Gaurang