Scan problem.

Scan problem.

by Henri-Damien

Hi,
I am exploring the scanning feature a bit further.
It is really great, and in my opinion being able to use @attr 8 with a
result-set number would be of great value for displaying facets.
But I am facing some problems:
ZOOM::ResultSet and ZOOM::ScanSet objects are tied to connections.
If I want to use scan_pqf() with @attr 8=number to limit facets to the
previous result set, I have to get or guess the result-set number.
But how can I, since result sets do not have any id property
and users are likely to refine their search or run multiple searches?

One solution I see would be to create and destroy a connection right
before and right after each search. Would the Koha gang agree with that?
That way, the result-set number would always be 1.
But it seems to me that exposing the result-set number as a property of
ZOOM::ResultSet, or making ScanSets depend not only on connections but
also on ResultSets, could be an interesting solution.
Result sets seem to be there in Zebra, so there should be a way to work
with them via ZOOM.

Another question:
     number [default: 10]
           Indicates how many terms should be returned in the ScanSet.
The number actually returned may be less, if the start-point is near the
end of the index, but will not be greater.
Is there really no way to get more results? What if I wanted the 10 most
relevant results rather than the first 10? Would there be a solution?

My last question: what if I want to get all the distinct values stored
for authors? Can I get them via a scan, with something like
scan_pqf("@attr 1=1 @attr 8=1 @attr 6=3"), which would return all
authors as complete subfields for result set 1, assuming names and
surnames are in the same subfield? (The same approach would work for
call numbers, simple subjects, branches and publisher names.) Of course,
we could get some of them by scanning on one-letter terms (a, b, c, d...),
but it would be better to avoid that.

Does anyone have answers?
Is anyone interested?
What does the Koha gang think about that?

--
Henri Damien LAURENT and Paul POULAIN
Independent consultants
in free software and library science (http://www.koha-fr.org)



_______________________________________________
Koha-zebra mailing list
Koha-zebra@...
http://lists.nongnu.org/mailman/listinfo/koha-zebra

Scan problem.

by Mike Taylor-10

Henri-Damien LAURENT writes:
 > Hi,
 > I am exploring the scanning feature a bit further.
 > It is really great, and in my opinion being able to use @attr 8 with a
 > result-set number would be of great value for displaying facets.
 > But I am facing some problems:
 > ZOOM::ResultSet and ZOOM::ScanSet objects are tied to connections.
 > If I want to use scan_pqf() with @attr 8=number to limit facets to the
 > previous result set, I have to get or guess the result-set number.
 > But how can I, since result sets do not have any id property
 > and users are likely to refine their search or run multiple searches?

You can retrieve the result-set ID using:
        $rs->option("resultSetId");

(Make sure you have an up-to-date YAZ for this to work.)
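
As a rough sketch of how the retrieved ID might feed into a scan: the snippet below only builds the PQF string, so it runs without a server. The helper name facet_scan_pqf is hypothetical, the attribute values (1=21, 6=3) are borrowed from queries elsewhere in this thread, and whether Zebra accepts the ID in exactly this form is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: build a PQF scan query restricted to one result set.
#   $use_attr - Bib-1 use attribute of the index to scan (e.g. 1 or 21)
#   $rs_id    - whatever $rs->option("resultSetId") returned
#   $start    - term at which the scan should begin
sub facet_scan_pqf {
    my ($use_attr, $rs_id, $start) = @_;
    # Escape backslashes and double quotes so the term stays one PQF token.
    (my $term = $start) =~ s/(["\\])/\\$1/g;
    return sprintf('@attr 1=%d @attr 6=3 @attr 8=%s "%s"',
                   $use_attr, $rs_id, $term);
}

# With a live connection this might be used roughly as follows:
#   my $rs   = $conn->search_pqf('@attr 1=4 "perl"');
#   my $id   = $rs->option("resultSetId");
#   my $scan = $conn->scan_pqf(facet_scan_pqf(21, $id, "a"));
print facet_scan_pqf(21, 1, "a"), "\n";
```

Reading the ID back after every search avoids having to guess it when users refine or repeat searches on the same connection.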

 > One solution I see would be to create and destroy a connection right
 > before and right after each search. Would the Koha gang agree with that?

I don't see that this is either necessary or sufficient.

 > That way, ResultSet number would always be 1.

Not necessarily.

 > But it seems to me that getting resultset number as a property of
 > ZOOM::ResultSets OR making ScanSets depend not only on connections but
 > also on ResultSets could be a solution and could be interesting.

Changing the model to make ScanSet dependent on ResultSet would have
dramatic consequences and would violate the ZOOM Abstract API.  But I
guess that you don't need this if you can fetch resultSetId.

 > Another question is :
 >      number [default: 10]
 >            Indicates how many terms should be returned in the ScanSet.
 > The number actually returned may be less, if the start-point is near the
 > end of the index, but will not be greater.
 > Is there really no way to get More results ?

Well, sure: set "number" to a higher number.

 > What If I wanted only the 10 most relevant results but not the 10
 > first ? Would there be a solution ?

No, there is no relevance support in scan: it is a very literal-minded
browse of the index.

 > My last question would be : What if I want to get all the distinct
 > values stored for authors. Can I get them via a scan ?
 > something like scan_pqf("@attr 1=1 @attr 8=1 @attr 6=3") which would
 > return all authors, complete subfields, for resultset 1, assuming names
 > and surnames would be in the same subfield.

That looks about right.  Adam can comment on this from a more informed
perspective than I can.

 > (The same approach would work for call numbers, simple subjects,
 > branches and publisher names.) Of course, we could get some of them by
 > scanning on one-letter terms (a, b, c, d...), but it would be better to avoid that.

Sorry, I don't understand that question.

 _/|_ ___________________________________________________________________
/o ) \/  Mike Taylor    <mike@...>    http://www.miketaylor.org.uk
)_v__/\  ... but Doctor, surely the odds against that happening are
         astronomical!




Re: Scan problem.

by Henri-Damien

Hi,
thanks for your answer.
resultSetId is really great.
For the last question, I found a workaround:
  my $scan = $conn->scan_pqf('@attr 1=21 @attr 6=3 @attr 5=102 @attr 8=1 "[A-z0-9]"');
That way, I can get all the publishers.

From perldoc ZOOM :
 number [default: 10]
           Indicates how many terms should be returned in the ScanSet.
The number actually returned may be less, if the start-point is near the
end of the index, but will not be greater.
I tried to set number to 100 this way
  $conn->option(preferredRecordSyntax => "usmarc",number=>100);

And got only 10 results.
Am I doing something wrong?

Is there a way to order the scanned terms by hit count?
>  > My last question would be : What if I want to get all the distinct
>  > values stored for authors. Can I get them via a scan ?
>  > something like scan_pqf("@attr 1=1 @attr 8=1 @attr 6=3") which would
>  > return all authors, complete subfields, for resultset 1, assuming names
>  > and surnames would be in the same subfield.
>
> That looks about right.  Adam can comment on this from a more informed
> perspective than I can.
>  
This query was wrong because it had no search term. I have corrected it.

--
Henri Damien LAURENT and Paul POULAIN
Independent consultants
in free software and library science (http://www.koha-fr.org)




Re: Scan problem.

by Mike Taylor-10

Henri-Damien LAURENT writes:
 > the latest question I found a workaround :
 >   my $scan= $conn->scan_pqf('@attr 1=21 @attr 6=3 @attr 5=102 @attr 8=1
 > "[A-z0-9]"');
 > That way, I can get all the publishers.

Hmm.  But if you start your scan from just "a", that will yield the
same start-point.  Remember that the term in a scan "query" is not
searched for, just used as a start-point within the list of all
terms.  If you want to start from the very beginning, you should
probably use the empty search-term "", which sorts to the start.

 > From perldoc ZOOM :
 >  number [default: 10]
 >            Indicates how many terms should be returned in the ScanSet.
 > The number actually returned may be less, if the start-point is near the
 > end of the index, but will not be greater.
 > I tried to set number to 100 this way
 >   $conn->option(preferredRecordSyntax => "usmarc",number=>100);
 >
 > And got only 10 results.
 > Am I doing wrong ?

For some reason, you can't set multiple options in a single call like
this: the second and subsequent are ignored.  Use:
   $conn->option(preferredRecordSyntax => "usmarc");
   $conn->option(number=>100);
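
One way to keep this from recurring is a small wrapper that applies options one call at a time. This is only a sketch: set_options and the FakeConn stand-in are hypothetical names introduced here so the example runs without a server, and the only thing assumed of a real connection is the two-argument option($key, $value) setter shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: apply options one per call, since passing several
# key/value pairs to a single option() call was reported not to work.
# $obj is anything with an option($key, $value) setter.
sub set_options {
    my ($obj, @pairs) = @_;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        $obj->option($key => $value);
    }
    return $obj;
}

# Minimal stand-in for a connection (NOT the real ZOOM::Connection).
{
    package FakeConn;
    sub new { return bless { opts => {} }, shift }
    sub option {
        my ($self, $key, $value) = @_;
        $self->{opts}{$key} = $value if @_ > 2;   # set only when a value is given
        return $self->{opts}{$key};
    }
}

my $conn = FakeConn->new;
set_options($conn, preferredRecordSyntax => "usmarc", number => 100);
print $conn->option("number"), "\n";
```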

 > Is there a way to order "scanned" term on hit count ?

Not that I know of.  If there is a way, it's Zebra-specific, and Adam
will be the one who knows about it.

 > >  > My last question would be : What if I want to get all the distinct
 > >  > values stored for authors. Can I get them via a scan ?
 > >  > something like scan_pqf("@attr 1=1 @attr 8=1 @attr 6=3") which would
 > >  > return all authors, complete subfields, for resultset 1, assuming names
 > >  > and surnames would be in the same subfield.
 > >
 > > That looks about right.  Adam can comment on this from a more informed
 > > perspective than I can.
 > >  
 > This query was wrong because no search parameter. I corrected it.

And is it working?

 _/|_ ___________________________________________________________________
/o ) \/  Mike Taylor    <mike@...>    http://www.miketaylor.org.uk
)_v__/\  "Historically, Taunton is part of Minehead already" -- Monty
         Python's Flying Circus.




Re: Scan problem.

by Henri-Damien

Mike Taylor wrote:

> Henri-Damien LAURENT writes:
>  > the latest question I found a workaround :
>  >   my $scan= $conn->scan_pqf('@attr 1=21 @attr 6=3 @attr 5=102 @attr 8=1
>  > "[A-z0-9]"');
>  > That way, I can get all the publishers.
>
> Hmm.  But if you start your scan from just "a", that will yield the
> same start-point.  Remember that the term in a scan "query" is not
> searched for, just used as a start-point within the list of all
> terms.  If you want to start from the very beginning, you should
> probably use the empty search-term "", which sorts to the start.
>  
>  > This query was wrong because no search parameter. I corrected it.
>
> And is it working?
>  
The empty search term does not work.
What works best is "0".

>  > From perldoc ZOOM :
>  >  number [default: 10]
>  >            Indicates how many terms should be returned in the ScanSet.
>  > The number actually returned may be less, if the start-point is near the
>  > end of the index, but will not be greater.
>  > I tried to set number to 100 this way
>  >   $conn->option(preferredRecordSyntax => "usmarc",number=>100);
>  >
>  > And got only 10 results.
>  > Am I doing wrong ?
>
> For some reason, you can't set multiple options in a single call like
> this: the second and subsequent are ignored.  Use:
>    $conn->option(preferredRecordSyntax => "usmarc");
>    $conn->option(number=>100);
>  
Thanks for this very valuable information.
It works now.
>  > Is there a way to order "scanned" term on hit count ?
>
> Not that I know of.  If there is a way, it's Zebra-specific, and Adam
> will be the one who knows about it.
>  
Let us wait for his advice on that, then.

--
Henri Damien LAURENT and Paul POULAIN
Independent consultants
in free software and library science (http://www.koha-fr.org)




Re: Scan problem.

by Mike Taylor-10

Henri-Damien LAURENT writes:
 > >  > the latest question I found a workaround :
 > >  >   my $scan= $conn->scan_pqf('@attr 1=21 @attr 6=3 @attr 5=102 @attr 8=1
 > >  > "[A-z0-9]"');
 > >  > That way, I can get all the publishers.
 > >
 > > Hmm.  But if you start your scan from just "a", that will yield the
 > > same start-point.  Remember that the term in a scan "query" is not
 > > searched for, just used as a start-point within the list of all
 > > terms.  If you want to start from the very beginning, you should
 > > probably use the empty search-term "", which sorts to the start.
 > >  
 > >  > This query was wrong because no search parameter. I corrected it.
 > >
 > > And is it working?
 > >  
 > empty search term is not working.
 > What works best is "0".

Strange that "" doesn't work.  What is the diagnostic?

Terms beginning with a character that precedes "0" in ASCII order will
not be found using "0", of course.  I suggest trying " " (space),
which IIRC is the lowest-numbered printing character in the ASCII set.
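
To make the ordering point concrete, here is a small sketch using made-up terms. It assumes the index sorts in plain ASCII order as described above; a real Zebra index may sort differently depending on its character-map configuration.

```perl
use strict;
use warnings;

# Code points: space is 32 and "0" is 48, so space sorts first.
printf "%-4s %d\n", "'$_'", ord($_) for (" ", "-", "0", "A", "a");

# Made-up index terms. Those whose first character sorts before "0"
# would be skipped by a scan that starts at "0".
my @terms  = ("(anonymous)", "-- untitled", "1984", "Apress", "O'Reilly");
my @missed = grep { $_ lt "0" } @terms;
print join(", ", @missed), "\n";
```

The terms starting with "(" and "-" come out as missed, because both characters sit between space and "0" in ASCII.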

 > > For some reason, you can't set multiple options in a single call like
 > > this: the second and subsequent are ignored.  Use:
 > >    $conn->option(preferredRecordSyntax => "usmarc");
 > >    $conn->option(number=>100);
 > >  
 > Thanks for this very valuable information.
 > It works now.

Excellent.

 _/|_ ___________________________________________________________________
/o ) \/  Mike Taylor    <mike@...>    http://www.miketaylor.org.uk
)_v__/\  "Troodontids are almost certainly deinonychosaurs.  I was wrong
         about troodontids in 1994, but don't care" -- Thomas R. Holtz, Jr.




Re: Scan problem.

by Adam Dickmeiss

Henri-Damien LAURENT wrote:

> Mike Taylor wrote:
>> Henri-Damien LAURENT writes:
>>  > Is there a way to order "scanned" term on hit count ?
>>
>> Not that I know of.  If there is a way, it's Zebra-specific, and Adam
>> will be the one who knows about it.
>>  
> Let us wait for his advice on that, then.
>

Scan terms are returned in lexical order, not by frequency. We are,
however, working on this. The solution will still be "scan", but an
attribute will signal "order by frequency", a.k.a. faceted search.
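
Until such an attribute exists, one possible stopgap (not something proposed in this thread) is to fetch a batch of terms and re-sort them on the client. The sketch below uses sample data; by_frequency is a hypothetical helper name, and the commented lines assume the ScanSet size() and term() methods of ZOOM-Perl.

```perl
use strict;
use warnings;

# Sort [term, count] pairs by descending count, breaking ties alphabetically.
sub by_frequency {
    my @pairs = @_;
    return sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] } @pairs;
}

# With a live ScanSet $ss, the pairs could be collected like this:
#   my @pairs = map { [ $ss->term($_) ] } 0 .. $ss->size() - 1;
# Sample data stands in for a real scan here.
my @pairs = (["Addison-Wesley", 12], ["Apress", 3], ["O'Reilly", 40]);
printf "%-15s %d\n", @$_ for by_frequency(@pairs);
```

Note that this only ranks the terms actually fetched, so it gives a true "top N" only if the scan covered the whole index or the whole result set.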

/ Adam


