Jesse Hires wrote:
> I have a two datanode and one namenode setup. One of my datanodes is slower
> than the other, causing the fetch to run significantly longer on it. Is
> there a way to balance this out?
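Most likely the number of URLs per host is unbalanced, meaning that the
tasktracker that takes the longest was assigned a lot of URLs from a
single host. Fetch lists are partitioned by host, and requests to the
same host are spaced out for politeness, so one large host can hold up
a whole task no matter how fast the node is.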
A workaround for this is to limit the max number of URLs per host (in
nutch-site.xml) to a more reasonable number, e.g. 100 or 1000, whatever
works best for you.
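
As a rough sketch of that workaround, the override in nutch-site.xml
could look like the fragment below. The property name
generate.max.per.host is an assumption about your Nutch version (some
releases use a differently named property), so please check the
nutch-default.xml that ships with your release. The value 100 is only
an example.

```xml
<!-- conf/nutch-site.xml: cap the number of URLs per host in a fetch list -->
<!-- Assumed property name; verify it against your nutch-default.xml. -->
<configuration>
  <property>
    <name>generate.max.per.host</name>
    <!-- Example value only; -1 means no limit. Tune to your crawl. -->
    <value>100</value>
  </property>
</configuration>
```

The limit takes effect when the next fetch list is generated, so
segments that were already generated keep their old distribution.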
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com