[jira] Created: (SOLR-1539) XPathEntityProcessor timeout when stream=true

XPathEntityProcessor timeout when stream=true
---------------------------------------------

Key: SOLR-1539
URL: https://issues.apache.org/jira/browse/SOLR-1539
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Chris Eldredge
Attachments: SOLR-1539.patch

When setting stream=true on XPathEntityProcessor a separate thread is created to read whatever Reader is being used for rows while the original thread pumps a BlockingQueue. This design allows the Reader to be read even when DIH cannot process documents as quickly as they become available in the Reader.

This design has questionable value. It adds complexity to the code with unclear benefits to the user.

At any rate, the code incorrectly uses the BlockingQueue API:

1. Arbitrarily sets a 10 second timeout and fails when this timeout elapses before a row becomes available.
2. Fails to check the return code when calling offer() to see if the item was successfully added or if the queue is full.
3. Fails to stop consuming the Reader even after an import has failed or been aborted.

The effect is that if a URL being processed pauses more than 10 seconds to think in between streaming rows, the XPathEntityProcessor fails. Setting the readTimeout and connectionTimeout attributes on the dataSource does not address this bug because XPathEntityProcessor imposes its own timeout, hard-coded to 10 seconds.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
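
For readers unfamiliar with the java.util.concurrent.BlockingQueue behavior behind points 1 and 2, here is a minimal standalone sketch. It is not taken from the XPathEntityProcessor source; the class name QueueMisuseDemo, the method names, and the row strings are hypothetical and exist only for illustration. It shows that offer() on a full queue returns false instead of blocking, and that a timed poll() on an empty queue returns null when the timeout elapses.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Solr or DIH.
public class QueueMisuseDemo {

    // offer() never blocks: on a full queue it returns false and the
    // item is silently lost if the caller ignores the return value.
    static boolean offerToFullQueue() {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.offer("row-1");        // fills the single slot
        return queue.offer("row-2"); // queue is full, so this returns false
    }

    // poll(timeout) returns null when nothing arrives in time. A null here
    // only means "no row yet"; it does not mean the source is exhausted.
    static String pollEmptyQueue(long timeoutMillis) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        try {
            return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("offer on full queue: " + offerToFullQueue());
        System.out.println("poll on empty queue: " + pollEmptyQueue(50));
    }
}
```

Treating the null from a timed poll() as a failure is what turns a slow upstream source into a failed import, regardless of the timeouts configured on the dataSource.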

[jira] Updated: (SOLR-1539) XPathEntityProcessor timeout when stream=true

[ https://issues.apache.org/jira/browse/SOLR-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Eldredge updated SOLR-1539:
---------------------------------
Attachment: SOLR-1539.patch

Patch against r831980 including test cases.

[jira] Commented: (SOLR-1539) XPathEntityProcessor timeout when stream=true

[ https://issues.apache.org/jira/browse/SOLR-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772863#action_12772863 ]

Noble Paul commented on SOLR-1539:
----------------------------------
so you wish the queue timeout to be configurable? or to make it longer?

[jira] Assigned: (SOLR-1539) XPathEntityProcessor timeout when stream=true

[ https://issues.apache.org/jira/browse/SOLR-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul reassigned SOLR-1539:
--------------------------------
Assignee: Noble Paul

[jira] Commented: (SOLR-1539) XPathEntityProcessor timeout when stream=true

[ https://issues.apache.org/jira/browse/SOLR-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772872#action_12772872 ]

Lance Norskog commented on SOLR-1539:
-------------------------------------
Why does it need a separate thread?

[jira] Commented: (SOLR-1539) XPathEntityProcessor timeout when stream=true

[ https://issues.apache.org/jira/browse/SOLR-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773080#action_12773080 ]

Chris Eldredge commented on SOLR-1539:
--------------------------------------
In reply to Noble Paul, the timeout in this code is likely an unintended side-effect of incorrectly using the BlockingQueue. This code should not have any timeout at all. My patch (attached) corrects the code so there will be no timeout in this component.

In reply to Lance Norskog, I'm not sure a separate thread provides any advantage. In theory it allows the data stream to be consumed at a different rate than documents can be processed, but once the queue limit is reached any advantage goes away. The extra thread can probably be removed, but I was trying to fix the bugs I found in the least invasive way.
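
As a rough illustration of a handoff with no timeout in the component, the following sketch uses the blocking put() and take() calls of java.util.concurrent.BlockingQueue, an end-of-stream marker, and an abort flag so the reading thread stops when the consumer gives up. This is an assumption about one reasonable shape for such a fix, not the contents of SOLR-1539.patch; BlockingHandoffSketch, transfer, END, and aborted are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; not the actual DIH implementation.
public class BlockingHandoffSketch {

    // Hypothetical end-of-stream marker, compared by identity (==) so it
    // can never collide with a real row that happens to contain "END".
    private static final String END = new String("END");

    // Moves every row from a producer thread to the calling thread through
    // a bounded queue, waiting as long as necessary on both sides.
    static List<String> transfer(List<String> rows, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        AtomicBoolean aborted = new AtomicBoolean(false);

        Thread producer = new Thread(() -> {
            try {
                for (String row : rows) {
                    if (aborted.get()) {
                        break;       // stop reading once the import is aborted
                    }
                    queue.put(row);  // blocks while the queue is full; nothing is dropped
                }
                queue.put(END);      // signal end of data explicitly
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> received = new ArrayList<>();
        try {
            while (true) {
                String row = queue.take(); // blocks until a row arrives; no timeout
                if (row == END) {
                    break;
                }
                received.add(row);
            }
        } catch (InterruptedException e) {
            aborted.set(true);       // tell the producer to stop
            producer.interrupt();    // wake it if it is blocked in put()
            Thread.currentThread().interrupt();
        }
        return received;
    }
}
```

With this shape, slow upstream responses are bounded only by whatever read timeout the underlying data source enforces, and a capacity-1 queue still delivers every row in order.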

[jira] Resolved: (SOLR-1539) XPathEntityProcessor timeout when stream=true

[ https://issues.apache.org/jira/browse/SOLR-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul resolved SOLR-1539.
------------------------------
Resolution: Fixed

committed r882852