[jira] Created: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score


[jira] Created: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

by JIRA jira@apache.org

Dedupe Sharded Search Results by Shard Order or Score
-----------------------------------------------------

                 Key: SOLR-1537
                 URL: https://issues.apache.org/jira/browse/SOLR-1537
             Project: Solr
          Issue Type: Improvement
          Components: search
    Affects Versions: 1.4, 1.5
         Environment: All
            Reporter: Dennis Kubes
             Fix For: 1.4, 1.5


Allows sharded search to dedupe results by ID, based on either the order of the shards in the shards param or on score.  This makes the returned result deterministic.  If deduping by shard order, shards that appear first in the shards param take precedence over shards that appear later.  If deduping by score, higher scores beat out lower scores.  Multiple duplicates are not returned, because Solr currently permits only a single result per ID.
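
The two precedence rules described above can be sketched roughly as follows. This is only an illustration, not the code in the attached patch: the class ShardDedupeSketch, the Hit type, and the two comparators are hypothetical names, and breaking score ties by shard order is an assumption made here so that the outcome stays deterministic.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardDedupeSketch {

    /** Hypothetical stand-in for one hit returned by one shard. */
    public static final class Hit {
        public final String id;      // uniqueKey value
        public final int shardIndex; // position of the shard in the shards param (0 = first)
        public final float score;

        public Hit(String id, int shardIndex, float score) {
            this.id = id;
            this.shardIndex = shardIndex;
            this.score = score;
        }
    }

    /** Prefers the hit from the shard listed earlier in the shards param. */
    public static final Comparator<Hit> BY_SHARD_ORDER =
            Comparator.comparingInt((Hit h) -> h.shardIndex);

    /** Prefers the higher score; ties fall back to shard order (an assumption of this sketch). */
    public static final Comparator<Hit> BY_SCORE =
            Comparator.<Hit>comparingDouble(h -> h.score).reversed().thenComparing(BY_SHARD_ORDER);

    /** Keeps exactly one hit per id: the one the given comparator ranks first. */
    public static Map<String, Hit> dedupe(List<Hit> hits, Comparator<Hit> preference) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit hit : hits) {
            // merge() hands us the hit kept so far and the new candidate for the same id.
            best.merge(hit.id, hit,
                    (kept, candidate) -> preference.compare(candidate, kept) < 0 ? candidate : kept);
        }
        return best;
    }

    public static void main(String[] args) {
        Hit fromFirstShard = new Hit("doc1", 0, 0.5f);
        Hit fromSecondShard = new Hit("doc1", 1, 0.9f);
        // The second shard's response is listed first, as if it had answered faster.
        List<Hit> arrived = Arrays.asList(fromSecondShard, fromFirstShard);
        System.out.println("by shard order, winning shard: "
                + dedupe(arrived, BY_SHARD_ORDER).get("doc1").shardIndex);
        System.out.println("by score, winning shard: "
                + dedupe(arrived, BY_SCORE).get("doc1").shardIndex);
    }
}
```

Because the winner for each ID depends only on the comparator, the order in which the hits are fed to dedupe() does not change which hit survives.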

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated SOLR-1537:
-------------------------------

    Attachment: solr-dedupe-20091031.patch

Basic patch.  No unit tests.  Adds dedupe functionality for shards, based on either shard order in the shards param or on score.



[jira] Updated: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated SOLR-1537:
-------------------------------

    Attachment: solr-dedupe-20091031-2.patch

Updated patch.  Had to replace the TreeSet used for on-the-fly document queuing with a two-pass approach that uses a HashSet and the Java 5 PriorityQueue.  This was to allow comparably equal documents (i.e. documents with the same score), which a TreeSet treats as duplicates and drops.
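
To illustrate why a TreeSet does not fit here, the following is a minimal sketch and not the patch itself. The class EqualScoreQueueSketch, the Doc type, and the method names are hypothetical, and for brevity the first pass keeps the first document seen per ID rather than applying the shard-order or score rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

public class EqualScoreQueueSketch {

    /** Hypothetical document: a uniqueKey value plus its score. */
    public static final class Doc {
        public final String id;
        public final float score;

        public Doc(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Orders documents by descending score only. */
    public static final Comparator<Doc> BY_SCORE_DESC =
            (a, b) -> Float.compare(b.score, a.score);

    /** A TreeSet keeps one element per comparator value, so equal scores collapse. */
    public static int sizeWithTreeSet(List<Doc> docs) {
        TreeSet<Doc> set = new TreeSet<>(BY_SCORE_DESC);
        set.addAll(docs);
        return set.size();
    }

    /** Pass 1: a HashSet filters repeated ids. Pass 2: a PriorityQueue orders the survivors. */
    public static List<String> twoPass(List<Doc> docs) {
        Set<String> seen = new HashSet<>();
        List<Doc> unique = new ArrayList<>();
        for (Doc d : docs) {
            if (seen.add(d.id)) { // add() returns false when the id was already present
                unique.add(d);
            }
        }
        PriorityQueue<Doc> queue = new PriorityQueue<>(11, BY_SCORE_DESC);
        queue.addAll(unique); // equal scores are fine in a PriorityQueue
        List<String> ids = new ArrayList<>();
        while (!queue.isEmpty()) {
            ids.add(queue.poll().id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("A", 1.0f));
        docs.add(new Doc("B", 1.0f)); // different document, same score as A
        docs.add(new Doc("A", 1.0f)); // true duplicate of A
        docs.add(new Doc("C", 0.5f));
        System.out.println("TreeSet keeps " + sizeWithTreeSet(docs) + " docs");
        System.out.println("Two-pass keeps " + twoPass(docs).size() + " docs");
    }
}
```

Note that a PriorityQueue does not define the relative order of elements that compare equal, so a tie-breaker would still be needed if the order among same-score documents matters.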



[jira] Updated: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1537:
----------------------------------------

    Fix Version/s:     (was: 1.4)



[jira] Commented: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774053#action_12774053 ]

Otis Gospodnetic commented on SOLR-1537:
----------------------------------------

The "ID" here being the uniqueKey?  i.e. the use case is the removal of dupes when the same document is indexed in multiple shards and more than one shard returns that document in the result set?




[jira] Commented: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774122#action_12774122 ]

Dennis Kubes commented on SOLR-1537:
------------------------------------

That is correct.  A dupe occurs when more than one shard returns a value for the same uniqueKey.  Dupes are removed by uniqueKey deterministically, by either shard order or highest score.  Previously there was no way to determine which dupe would show up, because it depended on whichever shard responded first to the query broadcast to multiple shards.  In other words, the fastest-responding shard supplied the first document for a uniqueKey, and any later documents with that uniqueKey were ignored.  Which shard is fastest, though, can change between query requests.
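
As a rough illustration of the difference, the sketch below simulates two arrival orders for the same pair of shard responses. Everything in it is hypothetical (the class ArrivalOrderSketch, the Response type, and both methods); it models the behavior described in this comment rather than Solr's actual merge code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ArrivalOrderSketch {

    /** Hypothetical shard response: the shard's position in the shards param and the ids it returned. */
    public static final class Response {
        public final int shardIndex;
        public final List<String> ids;

        public Response(int shardIndex, String... ids) {
            this.shardIndex = shardIndex;
            this.ids = Arrays.asList(ids);
        }
    }

    /** Old behavior as described: whichever response arrives first claims each uniqueKey. */
    public static Map<String, Integer> firstResponseWins(List<Response> arrivals) {
        Map<String, Integer> owner = new HashMap<>();
        for (Response r : arrivals) {
            for (String id : r.ids) {
                owner.putIfAbsent(id, r.shardIndex);
            }
        }
        return owner;
    }

    /** Shard-order precedence: the lowest shard index claims the key, whatever the arrival order. */
    public static Map<String, Integer> shardOrderWins(List<Response> arrivals) {
        Map<String, Integer> owner = new HashMap<>();
        for (Response r : arrivals) {
            for (String id : r.ids) {
                owner.merge(id, r.shardIndex, Integer::min);
            }
        }
        return owner;
    }

    public static void main(String[] args) {
        Response shard0 = new Response(0, "doc1", "doc2");
        Response shard1 = new Response(1, "doc1", "doc3");
        List<Response> shard0Faster = Arrays.asList(shard0, shard1);
        List<Response> shard1Faster = Arrays.asList(shard1, shard0);
        System.out.println("first response wins: " + firstResponseWins(shard0Faster)
                + " vs " + firstResponseWins(shard1Faster));
        System.out.println("shard order wins:    " + shardOrderWins(shard0Faster)
                + " vs " + shardOrderWins(shard1Faster));
    }
}
```

With the first-response rule the owner of doc1 flips depending on which shard answered faster, while the shard-order rule yields the same mapping for both arrival orders.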



[jira] Updated: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated SOLR-1537:
-------------------------------

    Attachment: solr-dedupe-20091106-3.patch

Fixes a small issue where the numFound count was doubled.
