Weighting different html text nodes - h1,h2 etc..

by Joel Halbert-2

Hi,

Would I be correct in thinking that Nutch, when indexing an HTML
document, does not weight the different text nodes (h1, h2, anchor, etc.)
differently, and instead just lumps all the text together as one? This is
the impression I get from looking at
org.apache.nutch.parse.html.HtmlParser.
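
To make the question concrete, here is a rough Python sketch of the distinction being asked about. It is not Nutch code and uses no Nutch API; the weights and the names TAG_WEIGHTS, WeightedTextCollector, weighted_chunks and lumped_text are all made up for illustration. It only contrasts "one lump of text" with "text tagged by the element it came from".

```python
from html.parser import HTMLParser

# Hypothetical per-tag weights, chosen only for illustration.
TAG_WEIGHTS = {"h1": 3.0, "h2": 2.0, "a": 1.5}


class WeightedTextCollector(HTMLParser):
    """Collects (text, weight) pairs from an HTML string.

    Each text node gets the highest weight of any tag that encloses it;
    tags without an entry in TAG_WEIGHTS count as 1.0.
    """

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.chunks = []     # collected (text, weight) pairs

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        weight = max(
            (TAG_WEIGHTS.get(t, 1.0) for t in self.open_tags),
            default=1.0,
        )
        self.chunks.append((text, weight))


def weighted_chunks(html):
    """Return the text nodes of `html` with a weight per node."""
    parser = WeightedTextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def lumped_text(html):
    """Return all text as a single string, discarding where it came from."""
    return " ".join(text for text, _ in weighted_chunks(html))


if __name__ == "__main__":
    page = "<h1>Title</h1><p>body <a href='x'>link</a></p>"
    print(lumped_text(page))
    print(weighted_chunks(page))
```

The "lumped" form is what the question assumes the parser produces; the weighted form is the kind of per-node information that would be needed to boost h1, h2 or anchor text differently at index time.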

Rgs,
Joel

