synonym payload boosting


by David Ginzburg (davidginzburg@gmail.com)

Hi,
I have a field and a weighted synonym map.
I have indexed the synonyms with their weights as payloads.
Here is the relevant snippet from my filter:

public Token next(final Token reusableToken) throws IOException
        .
        .
        .
        Payload boostPayload;

        for (Synonym synonym : syns) {
            Token newTok = new Token(nToken.startOffset(), nToken.endOffset(), "SYNONYM");
            newTok.setTermBuffer(synonym.getToken().toCharArray(), 0, synonym.getToken().length());
            // set the position increment to zero
            // this tells lucene the synonym is
            // in the exact same location as the originating word
            newTok.setPositionIncrement(0);
            boostPayload = new Payload(PayloadHelper.encodeFloat(synonym.getWieght()));
            newTok.setPayload(boostPayload);

I put this filter in the index-time analyzer. This is my field definition:

<fieldType name="PersonName" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.digitaltrowel.solr.DTSynonymFactory"
            FreskoFunction="names_with_scoresPipe23Columns.txt" ignoreCase="true" expand="false"/>

    <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    <!--<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>-->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="com.digitaltrowel.solr.DTSynonymFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>-->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    <!--<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>-->
  </analyzer>
</fieldType>


My similarity class is:

public class BoostingSymilarity extends DefaultSimilarity {

    public BoostingSymilarity() {
        super();
    }

    @Override
    public float scorePayload(String field, byte[] payload, int offset, int length) {
        double weight = PayloadHelper.decodeFloat(payload, 0);
        return (float) weight;
    }

    @Override
    public float coord(int overlap, int maxoverlap) {
        return 1.0f;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;
    }

    @Override
    public float tf(float freq) {
        return 1.0f;
    }
}
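
A side note on the scorePayload method above: it receives an offset and a length, which suggests the payload may not always start at index 0 of the byte array. The sketch below is not Lucene code. It is a stand-alone illustration, using only java.nio, of decoding a 4-byte float at a given offset. It assumes the payload is a big-endian encoding of the float's bits, which is how PayloadHelper.encodeFloat appears to work; the class and method names (FloatPayloadSketch, encode, decode) are hypothetical.

```java
import java.nio.ByteBuffer;

/** Stand-alone sketch: 4-byte big-endian float payloads read at a given offset. */
final class FloatPayloadSketch {

    /** Encodes a weight as 4 big-endian bytes (assumed to mirror PayloadHelper.encodeFloat). */
    static byte[] encode(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    /** Decodes the float that starts at the given offset, rather than at index 0. */
    static float decode(byte[] buffer, int offset) {
        return ByteBuffer.wrap(buffer, offset, 4).getFloat();
    }

    public static void main(String[] args) {
        byte[] payload = encode(0.75f);

        // Simulate a larger shared buffer in which the payload starts at index 3.
        byte[] shared = new byte[16];
        System.arraycopy(payload, 0, shared, 3, payload.length);

        System.out.println(decode(shared, 3)); // reads the stored weight
        System.out.println(decode(shared, 0)); // reads unrelated bytes, not the weight
    }
}
```

If the same holds inside Lucene, passing the offset parameter to PayloadHelper.decodeFloat instead of 0 would be the safer choice.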

My problem is that the scorePayload method is never called at search time,
unlike the other methods in my similarity class.
I verified this with breakpoints.
What am I doing wrong?
I am using Solr 1.3 and am considering the payload boost support in Solr 1.4.



Re: synonym payload boosting

by Simon Willnauer

You might get an answer on the Solr list. This is the Lucene users list.

Simon

On Nov 8, 2009 2:24 PM, "David Ginzburg" <davidginzburg@...> wrote:


Re: synonym payload boosting

by Ahmet Arslan

Additionally, you need to modify your query parser to return payload-aware queries such as BoostingTermQuery, PayloadTermQuery, or PayloadNearQuery.

With these types of queries, the scorePayload method is invoked.
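
A minimal sketch of that parser change follows. It is self-contained and does not use the Lucene jars: Term, Query, TermQuery, BoostingTermQuery, and PlainParser are small stand-in stubs named after the Lucene classes they imitate, and PayloadAwareParser is a hypothetical name. In Lucene itself the hook to override is, I believe, QueryParser.getFieldQuery, but check the version you are running.

```java
/** Stand-in for org.apache.lucene.index.Term (minimal stub). */
final class Term {
    final String field;
    final String text;
    Term(String field, String text) { this.field = field; this.text = text; }
}

/** Stand-in for Lucene's Query base class. */
class Query {}

/** Stand-in for TermQuery, whose scoring does not look at payloads. */
class TermQuery extends Query {
    private final Term term;
    TermQuery(Term term) { this.term = term; }
    Term getTerm() { return term; }
}

/** Stand-in for BoostingTermQuery, the payload-aware counterpart. */
class BoostingTermQuery extends Query {
    final Term term;
    BoostingTermQuery(Term term) { this.term = term; }
}

/** Stand-in for a query parser that produces plain TermQuery objects. */
class PlainParser {
    Query getFieldQuery(String field, String text) {
        return new TermQuery(new Term(field, text));
    }
}

/** The override: swap each plain TermQuery for a payload-aware query. */
class PayloadAwareParser extends PlainParser {
    @Override
    Query getFieldQuery(String field, String text) {
        Query q = super.getFieldQuery(field, text);
        if (q instanceof TermQuery) {
            return new BoostingTermQuery(((TermQuery) q).getTerm());
        }
        return q; // leave other query types untouched
    }
}

final class ParserDemo {
    public static void main(String[] args) {
        Query q = new PayloadAwareParser().getFieldQuery("PersonName", "bob");
        System.out.println(q.getClass().getSimpleName()); // prints "BoostingTermQuery"
    }
}
```

In a Solr setup the same idea would probably have to be wired in through a custom query parser plugin, since the stock parser builds ordinary term queries.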

Hope this helps.

--- On Sun, 11/8/09, David Ginzburg <davidginzburg@...> wrote:

> From: David Ginzburg <davidginzburg@...>
> Subject: synonym payload boosting
> To: java-user@...
> Date: Sunday, November 8, 2009, 3:23 PM



