Index weightings of different types of text node...h1, h2 anchor etc..

by JoelGrrrr

Hi, would I be correct in thinking that Nutch, when indexing an HTML
document, does not weight the different text nodes (h1, h2, anchor, etc.)
differently, and instead just lumps all the text together as one? (This is
the impression I get from looking at
org.apache.nutch.parse.html.HtmlParser.)

Rgs,
Joel


Re: Index weightings of different types of text node...h1, h2 anchor etc..

by Magnús Skúlason

Yes, that is correct. To weight them differently, you could modify the parser to
store the content of special tags in a separate field and give that field a
higher boost.
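
A minimal sketch of that idea, using only the Java standard library, might look like the following. It is not Nutch's or Lucene's actual API: the class name FieldBoostSketch, the field names "headings" and "content", and the boost values are all hypothetical, and the regex-based extraction is only a stand-in for walking the parsed DOM inside a real parser plugin. It just shows heading text being copied into its own field and counted with a higher weight at scoring time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldBoostSketch {

    // Hypothetical boost values, chosen only for illustration.
    static final double HEADING_BOOST = 3.0;
    static final double BODY_BOOST = 1.0;

    // Matches <h1>...</h1> and <h2>...</h2>; a real parser would walk the DOM instead.
    private static final Pattern HEADING = Pattern.compile(
            "<(h1|h2)\\b[^>]*>(.*?)</\\1>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Replaces tags with spaces and collapses runs of whitespace. */
    static String stripTags(String s) {
        return TAG.matcher(s).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    /**
     * Splits a page into two fields: "headings" holds only h1/h2 text,
     * "content" holds all text lumped together (headings included).
     */
    static Map<String, String> extractFields(String html) {
        StringBuilder headings = new StringBuilder();
        Matcher m = HEADING.matcher(html);
        while (m.find()) {
            headings.append(stripTags(m.group(2))).append(' ');
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("headings", headings.toString().trim());
        fields.put("content", stripTags(html));
        return fields;
    }

    /** Counts case-insensitive whole-word occurrences of term in text. */
    static int countTerm(String text, String term) {
        String wanted = term.toLowerCase();
        int n = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.equals(wanted)) {
                n++;
            }
        }
        return n;
    }

    /** Toy score: per-field term counts, each weighted by that field's boost. */
    static double score(Map<String, String> fields, String term) {
        return HEADING_BOOST * countTerm(fields.get("headings"), term)
                + BODY_BOOST * countTerm(fields.get("content"), term);
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Nutch indexing</h1>"
                + "<p>Notes on indexing with Lucene.</p></body></html>";
        Map<String, String> fields = extractFields(html);
        System.out.println(fields);
        System.out.println("nutch:  " + score(fields, "nutch"));
        System.out.println("lucene: " + score(fields, "lucene"));
    }
}
```

With this toy scoring, a term that appears in a heading outranks one that appears only in body text. In an actual setup the boost would presumably be applied to the extra field at index or query time rather than computed by hand like this.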

best regards,
Magnus

On Thu, Jul 9, 2009 at 3:30 PM, Joel Halbert <joel@...> wrote:

> Hi, Would I be correct in thinking that Nutch, when indexing an html
> document, does not weight the different text nodes (h1, h2, anchor etc)
> differently - instead it just lumps together all text as one? (this is
> the impression I get from looking at
> org.apache.nutch.parse.html.HtmlParser)
>
> Rgs,
> Joel