[jira] Created: (TIKA-245) Support of CHM Format

Support of CHM Format
---------------------
Key: TIKA-245
URL: https://issues.apache.org/jira/browse/TIKA-245
Project: Tika
Issue Type: New Feature
Components: parser
Environment: All
Reporter: Karl Heinz Marbaise
Priority: Minor

It might be a good idea to support the Windows CHM file format. Some information is available at http://en.wikipedia.org/wiki/Microsoft_Compiled_HTML_Help#Extracting_to_HTML. The CHM format contains HTML files, which Tika can already parse, so the "only" problem is extracting the data from the CHM file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (TIKA-245) Support of CHM Format

[ https://issues.apache.org/jira/browse/TIKA-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724994#action_12724994 ]

Jukka Zitting commented on TIKA-245:
------------------------------------

See http://www.russotto.net/chm/chmformat.html for a description of the CHM format. Quick browsing didn't reveal any Java-based parser libraries that we could use to parse CHM files.
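As a rough illustration of where CHM support would start, here is a minimal Java sketch that only recognises a CHM file by its header. It assumes the layout given in the format description linked above: a 4-byte `ITSF` signature followed by little-endian DWORDs for the version and the total header length. The class `ChmHeaderSniffer` and its methods are hypothetical and are not part of Tika or any CHM library. It deliberately stops at the header: the HTML pages inside a CHM live in LZX-compressed sections, and decoding those is the real work a parser library would be needed for.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: sniffs the ITSF header at the start of a CHM file.
// Assumed layout (from the CHM format description; DWORDs are little-endian):
//   offset 0: char[4] "ITSF"
//   offset 4: DWORD   version
//   offset 8: DWORD   total header length
public final class ChmHeaderSniffer {
    private static final byte[] MAGIC = "ITSF".getBytes(StandardCharsets.US_ASCII);

    private ChmHeaderSniffer() {}

    /** True if the buffer begins with the 4-byte "ITSF" signature. */
    public static boolean isChm(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads a little-endian 32-bit field at the given offset of an ITSF header. */
    private static int readDword(byte[] header, int offset) {
        if (!isChm(header) || header.length < offset + 4) {
            throw new IllegalArgumentException("not an ITSF header, or too short");
        }
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    /** Version field at offset 4. */
    public static int version(byte[] header) {
        return readDword(header, 4);
    }

    /** Total header length field at offset 8. */
    public static int headerLength(byte[] header) {
        return readDword(header, 8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ChmHeaderSniffer.java <file.chm>");
            return;
        }
        byte[] header;
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            header = in.readNBytes(12); // signature + version + header length
        }
        if (isChm(header) && header.length >= 12) {
            System.out.println("CHM file, ITSF version " + version(header)
                    + ", header length " + headerLength(header) + " bytes");
        } else {
            System.out.println("no complete ITSF header; probably not a CHM file");
        }
    }
}
```

With Java 11 or later it can be run directly as `java ChmHeaderSniffer.java help.chm`. A Tika parser for CHM would still need a library that walks the directory and decompresses the content sections before handing the HTML to Tika's existing HTML parser.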
[jira] Commented: (TIKA-245) Support of CHM Format

[ https://issues.apache.org/jira/browse/TIKA-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763671#action_12763671 ]

Luciano Leggieri commented on TIKA-245:
---------------------------------------

Hi, I've started using Tika to parse some of my files, and unfortunately several of them are CHM. Have you tried http://sourceforge.net/projects/jchm/ to see if it works?
[jira] Commented: (TIKA-245) Support of CHM Format

[ https://issues.apache.org/jira/browse/TIKA-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763939#action_12763939 ]

Jukka Zitting commented on TIKA-245:
------------------------------------

jchm looks promising, thanks for the pointer! Is anyone interested in implementing a Tika Parser wrapper for jchm? As a starting point, it would be nice if the jchm jar were made available on Maven Central.