WG: CmsSearchResult (or Lucene) in OpenCms 7.5.1 delivers wrong scores ?!


WG: CmsSearchResult (or Lucene) in OpenCms 7.5.1 delivers wrong scores ?!

by ] Code Create [ Bernd Wolfsegger

Hi out there, does nobody have any idea about this issue? :-)

-----Original Message-----
From: opencms-dev-bounces@... [mailto:opencms-dev-bounces@...] On
Behalf Of ] Code Create [ Bernd Wolfsegger
Sent: Saturday, 17 October 2009 15:21
To: opencms-dev@...
Subject: [opencms-dev] CmsSearchResult (or Lucene) in OpenCms 7.5.1 delivers
wrong scores ?!

Hi Alkacon,

in OpenCms 7.5 and prior versions the maximum value shown for a score
(CmsSearchResult.getScore()) was 100 (%).
Running the same search with OpenCms 7.5.1 on my site now delivers a value of
475.

Either something has changed in how the score is meant to be used, or the
calculation of the score has a bug now.

I also discovered another strange behavior:
I searched for the term "OpenCms" on my site.
I configured the search to have 5 results per page and show 6 pagelinks.
From the first to the third result page, search.getSearchResultCount() is "32";
from the fourth to the seventh page it is "31".


Kind regards, Bernd




_______________________________________________
This mail is sent to you from the opencms-dev mailing list
To change your list options, or to unsubscribe from the list, please visit
http://lists.opencms.org/mailman/listinfo/opencms-dev



Re: WG: CmsSearchResult (or Lucene) in OpenCms 7.5.1 delivers wrong scores ?!

by Gregor Schneider

Hi Bernd,

we're running OpenCms 7.5.1 here and are observing exactly the same
behaviour.

Cheers

Gregor
--
just because you're paranoid, doesn't mean they're not after you...
gpgp-fp: 79A84FA526807026795E4209D3B3FE028B3170B2
gpgp-key available
@ http://pgpkeys.pca.dfn.de:11371
@ http://pgp.mit.edu:11371/
skype:rc46fi

Re: WG: CmsSearchResult (or Lucene) in OpenCms 7.5.1 delivers wrong scores ?!

by Florian Hopf-2

Hi Bernd,

] Code Create [ Bernd Wolfsegger wrote:
> I also discovered another strange behavior:
> I searched for the term "OpenCms" on my site.
> I configured the search to have 5 results per page and show 6 pagelinks.
> From the first to the third resultpage search.getSearchResultCount() is "32"
> from the fourth to the seventh page it is "31".
>

I guess this happens because the pagination is calculated from all
results found, but on every page the results are filtered so that only
those the current user is allowed to see are displayed.

So if you have a page size of 5 configured and the search finds 21
documents, the pagination will show 5 pages while you are on the first
page. But if the second page contains a document you do not have access
to, the pagination is recalculated and there are only 4 pages now.
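
The recalculation described above can be sketched in a few lines of Java. This is only a simplified model, not the OpenCms implementation: the class and method names are hypothetical, the position of the hidden hit is made up, and it assumes that hits are scanned only until the requested page is filled and that every hit failing the visibility check is subtracted from the raw total.

```java
import java.util.Arrays;

/** Simplified model of a hit count that shrinks while paging; all names are hypothetical. */
final class VisibleHitCountSketch {

    /**
     * Returns the result count reported on the given 1-based page.
     * Assumption: hits are scanned only until the requested page is filled, and every
     * hit that fails the visibility check on the way is subtracted from the raw total.
     */
    static int reportedCount(boolean[] visible, int page, int pageSize) {
        int end = page * pageSize;      // visible hits needed to fill pages 1..page
        int reported = visible.length;  // start from the raw, unfiltered hit count
        int seenVisible = 0;
        for (int i = 0; i < visible.length && seenVisible < end; i++) {
            if (visible[i]) {
                seenVisible++;
            } else {
                reported--;             // a hidden hit was found, so the total drops
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        boolean[] visible = new boolean[32];
        Arrays.fill(visible, true);
        visible[17] = false;            // made-up position: the 18th raw hit is hidden
        for (int page = 1; page <= 7; page++) {
            System.out.println("page " + page + ": " + reportedCount(visible, page, 5));
        }
    }
}
```

With these made-up inputs the sketch reports 32 on pages 1 to 3 and 31 from page 4 on, the same pattern as in the quoted report. A very large page size scans the whole hit list in one pass and yields the filtered total straight away.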

Regarding the scoring, I don't really know. I think I read or heard
something about changes in the score normalization in recent Lucene
versions, but I am not sure.

Hope this helps at least a little bit.

Regards
Flo



Re: WG: CmsSearchResult (or Lucene) in OpenCms 7.5.1 delivers wrong scores ?!

by Gregor Schneider

Flo,

On Tue, Oct 27, 2009 at 11:03 AM, Florian Hopf <hopf@...> wrote:
>
> I guess this is caused by the fact that the pagination is calculated
> from all results found but for every page the displayed results are
> filtered so that only those are displayed that the current user is
> allowed to see.
>

Hm, is it? Then how come the same issue happens when you're logged in
with an admin role and as such supposed to be allowed to see everything?

puzzled...

gregor

Re: WG: CmsSearchResult (or Lucene) in OpenCms 7.5.1 delivers wrong scores ?!

by Florian Hopf-2

Hi,

Gregor Schneider wrote:

> Flo,
>
> On Tue, Oct 27, 2009 at 11:03 AM, Florian Hopf <hopf@...> wrote:
>> I guess this is caused by the fact that the pagination is calculated
>> from all results found but for every page the displayed results are
>> filtered so that only those are displayed that the current user is
>> allowed to see.
>
>
> hm, is it? how about the same issue happens when you're logged in with
> an admin-role and as such supposed to be allowed to see everything?
>
> puzzled...

Hmmm ... not sure about that. Perhaps an expired or not yet released
resource?

I just had a look at the code, it's in CmsSearchIndex#search(). There's
a check if ((isInTimeRange(doc, params)) &&
(hasReadPermission(searchCms, doc))) that filters the hit list.

Maybe you can try setting the page size to 1000, just as an experiment.
This should result in a more accurate count.

But of course I could also be wrong ;)

Regards
Flo



Re: WG: CmsSearchResult (or Lucene) in OpenCms 7.5.1 delivers wrong scores ?!

by Gregor Schneider

Well, we get a value of 1742%. The search string was a word not
contained in any of the OpenCms default pages.

We set up just 10 pages for a test, and we had the above result. Of
those 10 pages, only 1 has not been published.

Actually, I don't have time to debug / change the OpenCms sources, but
given the facts above, my best guess is that the reason you suggested
for that behaviour might not really hit home...

cheers

gregor

Re: WG: CmsSearchResult (or Lucene) in OpenCms 7.5.1 delivers wrong scores ?!

by Florian Hopf-2

Hi Gregor,

Gregor Schneider wrote:

> well, we get a value of 1742%. the search-string was a word not
> contained in any of the opencms-default-pages,
>
> we set up just 10 pages for a test, and we had the above result. from
> those 10 pages only 1 has not been published.
>
> actually, i don't have time to debug / change the OpenCMS-sources, but
> given the facts above, my best guess is that the possible reason you
> reckoned for that behavior might not really strike home...
>

Sorry, I think there has been a misunderstanding. I was talking about
the search result count, not the scoring.

Regards
Flo




Re: WG: CmsSearchResult (or Lucene) in OpenCms 7.5.1 delivers wrong scores ?!

by ] Code Create [ Bernd Wolfsegger

Hi Gregor,

well, I don't think that this getScore() value is meant as a percentage value
at all any more.
It was meant as a percentage value up to OpenCms 7.5, but that changed in
OpenCms 7.5.1.
I think this has something to do with the new Lucene version in this OpenCms
release: the changed meaning in Lucene was not "ported" into the OpenCms
search implementation.

For the time being I have a workaround: I divide the "score" of each
result element by the maximum "score", i.e. that of the very first result
element, using simple caching for that first "score".
So at least I have score values that look more like percentage values ;-)
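
A minimal sketch of such a workaround in Java, under stated assumptions: the class and method names are hypothetical, results are assumed to arrive sorted by descending score so that the first score seen is the maximum, and the raw scores are assumed to be integers as returned by getScore(). How the cached value is kept across page requests (for example in the session) is not shown.

```java
/** Rescales raw scores against the first (best) score seen; all names are hypothetical. */
final class ScorePercentSketch {

    // Cached raw score of the first result; 0 means "nothing cached yet".
    private int topScore = 0;

    /** Converts a raw score into a value between 0 and 100 relative to the cached top score. */
    int toPercent(int rawScore) {
        if (topScore <= 0) {
            topScore = rawScore;        // first call: remember the best score
        }
        if (topScore <= 0) {
            return 0;                   // nothing sensible to divide by
        }
        double ratio = (double) rawScore / topScore;
        ratio = Math.max(0.0, Math.min(1.0, ratio));  // clamp in case of out-of-order scores
        return (int) Math.round(ratio * 100.0);
    }

    public static void main(String[] args) {
        ScorePercentSketch scores = new ScorePercentSketch();
        int[] rawScores = {475, 238, 95};   // made-up raw values; 475 is the figure from the report
        for (int raw : rawScores) {
            System.out.println(raw + " -> " + scores.toPercent(raw) + "%");
        }
    }
}
```

The rescaled values are only relative to the best hit of the current search, so the top result always shows 100 regardless of how well it actually matches.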

I think my score value for a specific result entry now displays the relation of
the overall found search terms against the amount of all found search terms up
to that specific result entry.

For a proper solution you would have to calculate with a weighting mechanism
etc. though.

As for the inconsistent number of calculated search results, I think this can't
have anything to do with user rights or published or not etc., since it happens
within the same search while paging through the results.

Kind regards, Bernd

-----Original Message-----
From: opencms-dev-bounces@... [mailto:opencms-dev-bounces@...] On
Behalf Of Gregor Schneider
Sent: Tuesday, 27 October 2009 13:07
To: The OpenCms mailing list
Subject: Re: [opencms-dev] WG: CmsSearchResult (or Lucene) in OpenCms 7.5.1
delivers wrong scores ?!

well, we get a value of 1742%. the search-string was a word not
contained in any of the opencms-default-pages,

we set up just 10 pages for a test, and we had the above result. from
those 10 pages only 1 has not been published.

actually, i don't have time to debug / change the OpenCMS-sources, but
given the facts above, my best guess is that the possible reason you
reckoned for that behavior might not really strike home...

cheers

gregor

Re: WG: CmsSearchResult (or Lucene) in OpenCms 7.5.1 delivers wrong scores ?!

by Gregor Schneider

hi bernd,

On Tue, Oct 27, 2009 at 1:40 PM, ] Code Create [ Bernd Wolfsegger
<bw@...> wrote:
> Hi Gregor,
>
> well I don't think that this getScore() value is meant as a percentage
> value at all.
> It was meant as a percentage value up to OpenCms 7.5 but it changed in
> OpenCms 7.5.1.
> I think this got something to do with the new Lucene version in this
> OpenCms release and that changed meaning in Lucene was not "ported" into
> the OpenCms Search implementation.
>

I see - that makes sense. So let's wait for another patch of OpenCms then
;)

cheers

Gregor