Re: Index weightings of different types of text node...h1, h2 anchor etc..

by Magnús Skúlason

Yes, that is correct. To weight those nodes differently, you could modify
the parser to store the content of the special tags (h1, h2, anchor etc.)
in a separate field, and then give that field a higher boost.
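
To illustrate the idea outside of Nutch, here is a minimal sketch in plain
Python (standard library only). It is not the Nutch HtmlParser or any Nutch
API. The names SPECIAL_TAGS, FieldSplittingParser, split_fields, FIELD_BOOSTS
and score, the field names "content" and "special", and the boost value 3.0
are all made up for the example. It copies the text found inside the special
tags into a second field and then weights matches in that field more heavily
with a toy term-count score.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags whose text is copied into the boosted field.
SPECIAL_TAGS = {"h1", "h2", "a"}

# Hypothetical per-field boosts; the values are arbitrary for illustration.
FIELD_BOOSTS = {"content": 1.0, "special": 3.0}


class FieldSplittingParser(HTMLParser):
    """Collects all text in 'content' and special-tag text also in 'special'."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # number of special tags currently open
        self.fields = {"content": [], "special": []}

    def handle_starttag(self, tag, attrs):
        if tag in SPECIAL_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SPECIAL_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Every text node goes into the main field, as before.
        self.fields["content"].append(text)
        # Text inside a special tag is stored a second time in its own field.
        if self.depth > 0:
            self.fields["special"].append(text)


def split_fields(html):
    """Parse an HTML string and return a dict of field name -> text."""
    parser = FieldSplittingParser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.fields.items()}


def score(fields, term):
    """Toy score: boosted sum of whitespace-token matches per field."""
    term = term.lower()
    return sum(
        boost * fields[name].lower().split().count(term)
        for name, boost in FIELD_BOOSTS.items()
    )
```

With this sketch, a term that appears in a heading contributes to the score
through both fields, so a document with the term in its h1 ranks above one
that only mentions it in body text. In a real index the boost would be
applied by the search engine, at index time or query time, rather than by
hand as here.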

best regards,
Magnus

On Thu, Jul 9, 2009 at 3:30 PM, Joel Halbert <joel@...> wrote:

> Hi, would I be correct in thinking that Nutch, when indexing an HTML
> document, does not weight the different text nodes (h1, h2, anchor etc.)
> differently, and instead just lumps all the text together as one? (This
> is the impression I get from looking at
> org.apache.nutch.parse.html.HtmlParser.)
>
> Rgs,
> Joel
