Authenticity of URLs from DMOZ

by Gaurang Patel

Hey,

Can anyone tell me what could be causing the following, which happened while fetching data with bin/nutch fetch?

My AVG Antivirus is detecting virus threats while Nutch fetches pages from the URLs in the crawldb. I injected the DMOZ Open Directory URLs into the crawldb. The antivirus had already detected 4 threats within only half an hour of starting the fetch.

Is there any other source besides DMOZ for a list of URLs covering the whole web? Or is there an automatic way to keep such harmful URLs from being fetched? Let me know ASAP.


Regards,
Gaurang