[jira] Created: (TIKA-244) Missing Header/Footer text for Word'97 documents

Missing Header/Footer text for Word'97 documents
------------------------------------------------

                 Key: TIKA-244
                 URL: https://issues.apache.org/jira/browse/TIKA-244
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.3
            Reporter: Maxim Valyanskiy
         Attachments: tika-patch

Tika output lacks header/footer text for Word'97 documents. This patch fixes this problem:

diff -u -r apache-tika-0.3/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java apache-tika-0.3-modified/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
--- apache-tika-0.3/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java 2009-02-14 03:07:51.000000000 +0300
+++ apache-tika-0.3-modified/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java 2009-06-09 13:24:56.000000000 +0400
@@ -75,9 +75,14 @@
         } else if ("WordDocument".equals(name)) {
             setType(metadata, "application/msword");
             WordExtractor extractor = new WordExtractor(filesystem);
+
+            xhtml.element("p", extractor.getHeaderText());
+
             for (String paragraph : extractor.getParagraphText()) {
                 xhtml.element("p", paragraph);
             }
+
+            xhtml.element("p", extractor.getFooterText());
         } else if ("PowerPoint Document".equals(name)) {
             setType(metadata, "application/vnd.ms-powerpoint");
             PowerPointExtractor extractor =

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (TIKA-244) Missing Header/Footer text for Word'97 documents

[ https://issues.apache.org/jira/browse/TIKA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Valyanskiy updated TIKA-244:
----------------------------------

    Attachment: tika-patch
[jira] Resolved: (TIKA-244) Missing Header/Footer text for Word'97 documents

[ https://issues.apache.org/jira/browse/TIKA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-244.
--------------------------------

    Resolution: Fixed
 Fix Version/s: 0.4
      Assignee: Jukka Zitting

Thanks! Patch applied in revision 788595.

I added <div class="header"/> and <div class="footer"/> wrappers around the header and footer texts, and modified the code to only output those sections when the header or footer is non-empty.
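For illustration, here is a minimal, self-contained sketch of the output logic described in the resolution comment: header text first, then the body paragraphs, then footer text, with each of header and footer wrapped in a div and skipped when empty. It is not the code committed in revision 788595. It builds an XHTML string with a StringBuilder instead of using Tika's XHTMLContentHandler and POI's WordExtractor, and the class and method names (HeaderFooterSketch, appendSectionIfAny, render) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Tika's actual code) of the TIKA-244 behaviour:
 * header/footer text wrapped in <div class="header"> / <div class="footer">
 * and emitted only when non-empty. Plain string building stands in for
 * Tika's XHTMLContentHandler.
 */
public class HeaderFooterSketch {

    // Minimal escaping for XML element content ('&' must be replaced first).
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Hypothetical helper: append <div class="..."><p>text</p></div>
    // only when the text is non-null and non-empty.
    static void appendSectionIfAny(StringBuilder out, String cssClass, String text) {
        if (text != null && !text.isEmpty()) {
            out.append("<div class=\"").append(cssClass).append("\">")
               .append("<p>").append(escape(text)).append("</p>")
               .append("</div>");
        }
    }

    // Same order as the patch: header, body paragraphs, footer.
    static String render(String header, List<String> paragraphs, String footer) {
        StringBuilder out = new StringBuilder();
        appendSectionIfAny(out, "header", header);
        for (String paragraph : paragraphs) {
            out.append("<p>").append(escape(paragraph)).append("</p>");
        }
        appendSectionIfAny(out, "footer", footer);
        return out.toString();
    }

    public static void main(String[] args) {
        // The footer is empty, so no footer div is produced.
        System.out.println(render("Page header", Arrays.asList("Body text"), ""));
        // prints <div class="header"><p>Page header</p></div><p>Body text</p>
    }
}
```

With an empty footer, as in main, only the header div and the body paragraph are produced; the original patch would instead have emitted an empty paragraph element for the missing footer.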