[jira] Created: (TIKA-250) XLS parser does not extract empty sheet names

XLS parser does not extract empty sheet names
---------------------------------------------

                 Key: TIKA-250
                 URL: https://issues.apache.org/jira/browse/TIKA-250
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.4
            Reporter: Maxim Valyanskiy
            Priority: Minor

ExcelExtractor misses sheet titles if a sheet is empty. The fix is trivial; patch attached.

-- 
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (TIKA-250) XLS parser does not extract empty sheet names

[ https://issues.apache.org/jira/browse/TIKA-250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Valyanskiy updated TIKA-250:
----------------------------------

    Attachment: empty.patch
[jira] Commented: (TIKA-250) XLS parser does not extract empty sheet names

[ https://issues.apache.org/jira/browse/TIKA-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724114#action_12724114 ]

Jukka Zitting commented on TIKA-250:
------------------------------------

The currentSheet.isEmpty() conditional was added explicitly to avoid outputting empty sheets. Most Excel files out there have the three default worksheets, but in the majority of cases only the first sheet contains anything, and it's cleaner if the empty extra sheets aren't included in the output.

Are there real-world cases where the name of an empty sheet is an important part of the extracted text content? I would assume that any essential sheet contains at least some content besides the sheet name.
[jira] Commented: (TIKA-250) XLS parser does not extract empty sheet names

[ https://issues.apache.org/jira/browse/TIKA-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725112#action_12725112 ]

Maxim Valyanskiy commented on TIKA-250:
---------------------------------------

Yes, there are real cases where we need to know the names of empty sheets. For example, we ran into the following issue: in one workbook each sheet represented a branch of the company, and some sheets were empty simply because the information had not been filled in yet. When we extracted text from these files, the names of those branches were missing, so when we later searched our database for those particular names, we could not find them.
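To make the trade-off in this exchange concrete, here is a minimal Java sketch. It does not reproduce Tika's actual ExcelExtractor or the attached empty.patch: the `SheetNameExtractionSketch` class, the `Sheet` record, the `extract` method and its `skipEmpty` flag are all hypothetical stand-ins (Java 16+ for records). With `skipEmpty` set, an empty sheet is dropped entirely, roughly like the currentSheet.isEmpty() conditional described above; without it, the sheet name is still emitted even when the sheet has no cells.

```java
import java.util.List;

public class SheetNameExtractionSketch {

    // Hypothetical stand-in for a worksheet; NOT Tika's or POI's API.
    record Sheet(String name, List<String> cells) {
        boolean isEmpty() {
            return cells.isEmpty();
        }
    }

    // Emits each sheet's name on its own line, followed by its cell text
    // (one tab-indented line per cell).
    // skipEmpty = true drops empty sheets entirely, names included;
    // skipEmpty = false keeps the name of every sheet.
    static String extract(List<Sheet> sheets, boolean skipEmpty) {
        StringBuilder out = new StringBuilder();
        for (Sheet sheet : sheets) {
            if (skipEmpty && sheet.isEmpty()) {
                continue; // the empty sheet's name is lost here
            }
            out.append(sheet.name()).append('\n');
            for (String cell : sheet.cells()) {
                out.append('\t').append(cell).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical workbook: one branch filled in, one not yet.
        List<Sheet> workbook = List.of(
                new Sheet("Branch North", List.of("Revenue", "1200")),
                new Sheet("Branch South", List.of()));

        String skipped = extract(workbook, true);
        String kept = extract(workbook, false);

        System.out.println(skipped.contains("Branch South")); // false
        System.out.println(kept.contains("Branch South"));    // true
    }
}
```

Running `main` prints `false` and then `true`: only the second mode leaves the unfilled branch findable by name. The cost, as Jukka points out, is that untouched default sheets (for example "Sheet2" and "Sheet3") also show up in the extracted text.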
[jira] Resolved: (TIKA-250) XLS parser does not extract empty sheet names

[ https://issues.apache.org/jira/browse/TIKA-250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-250.
--------------------------------

    Resolution: Fixed
 Fix Version/s: 0.5
      Assignee: Jukka Zitting

Fair enough, fix committed in revision 801432. Thanks for the patch and the rationale!