<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<id>tag:old.nabble.com,2006:forum-20913</id>
	<title>Nabble - Apache Tika - Development</title>
	<updated>2009-11-09T13:28:32Z</updated>
	<link rel="self" type="application/atom+xml" href="http://old.nabble.com/Apache-Tika---Development-f20913.xml" />
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Apache-Tika---Development-f20913.html" />
	<subtitle type="html">&lt;a href=&quot;http://lucene.apache.org/tika/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Apache Tika&lt;/a&gt; is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.</subtitle>
	
<entry>
	<id>tag:old.nabble.com,2006:post-26274023</id>
	<title>[jira] Commented: (TIKA-94) Speech recognition</title>
	<published>2009-11-09T13:28:32Z</published>
	<updated>2009-11-09T13:28:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12775155#action_12775155&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12775155#action_12775155&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;David Woollard commented on TIKA-94:
&lt;br&gt;------------------------------------
&lt;br&gt;&lt;br&gt;I've used Sphinx for a couple of projects, to great success, but it's probably not what you are looking for here. Fundamentally, there are two problems with general speech recognition. The first is training. Most speech recognition systems are trained in some way. Sphinx is &amp;quot;pre-trained&amp;quot; with a number of different English-language accents (mostly American), or you can retrain it yourself. The second problem is the grammar. In order to recognize a good collection of words in the English language, you need to provide a very large grammar (60,000ish words comes out to be a 200 MB+ file, probably many orders of magnitude larger than you would want in a dependency).
&lt;br&gt;&lt;br&gt;The long and short of it is that Sphinx is great out of the box if you speak with a Midwestern American accent (as I happen to) and you are dealing with a command-and-control situation where you can produce a small, tailored grammar.
&lt;br&gt;&lt;br&gt;This is something that could be done, but I would advocate that it be left outside the core parsers.
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Speech recognition
&lt;br&gt;&amp;gt; ------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-94
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-94&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-94&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: New Feature
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Like OCR for image files (TIKA-93), we could try using speech recognition to extract text content (where available) from audio (and video!) files.
&lt;br&gt;&amp;gt; The CMU Sphinx engine (&lt;a href=&quot;http://cmusphinx.sourceforge.net/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://cmusphinx.sourceforge.net/&lt;/a&gt;) looks promising and comes with a friendly license.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Commented%3A-%28TIKA-94%29-Speech-recognition-tp26274023p26274023.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26247581</id>
	<title>[jira] Commented: (TIKA-318) Upgrade nekohtml dependency from 1.9.9 to 1.9.13</title>
	<published>2009-11-07T10:59:32Z</published>
	<updated>2009-11-07T10:59:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774658#action_12774658&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774658#action_12774658&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;Benson Margulies commented on TIKA-318:
&lt;br&gt;---------------------------------------
&lt;br&gt;&lt;br&gt;Jukka took out nekohtml in favor of TagSoup, so you could close this.
&lt;br&gt;&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Upgrade nekohtml dependency from 1.9.9 to 1.9.13
&lt;br&gt;&amp;gt; ------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-318
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-318&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-318&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Task
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: packaging
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.4
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Attila Király
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; It seems Tika is still using an old version of nekohtml. It could be upgraded to 1.9.13.
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-318%29-Upgrade-nekohtml-dependency-from-1.9.9-to-1.9.13-tp26246862p26247581.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26247143</id>
	<title>[jira] Updated: (TIKA-316) Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)</title>
	<published>2009-11-07T10:03:32Z</published>
	<updated>2009-11-07T10:03:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Chris A. Mattmann updated TIKA-316:
&lt;br&gt;-----------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Component/s: cli
&lt;br&gt;&lt;br&gt;- set fix component
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)
&lt;br&gt;&amp;gt; ------------------------------------------------------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-316
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-316&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-316&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: cli
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.4, 0.5
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Environment: Windows Server 2003 SP2, JRE 1.6.0_16, tika-app, Visio 2003
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Mike Hays
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: repro-TIKA-316.vsd
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; tika-app (0.4 and 0.5 nightly) returns the following when attempting to parse a Visio 2003 file (other versions may be affected):
&lt;br&gt;&amp;gt; Exception in thread &amp;quot;main&amp;quot; org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@145e044
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:123)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:103)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:176)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:63)
&lt;br&gt;&amp;gt; Caused by: java.lang.IllegalArgumentException: Found a chunk with a negative length, which isn't allowed
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:120)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.HDGFDiagram.&amp;lt;init&amp;gt;(HDGFDiagram.java:95)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.extractor.VisioTextExtractor.&amp;lt;init&amp;gt;(VisioTextExtractor.java:52)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.extractor.VisioTextExtractor.&amp;lt;init&amp;gt;(VisioTextExtractor.java:49)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:118)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ... 3 more
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-316%29-Parsing-Visio-diagrams-with-tika-app-causes-TikaException-%28Found-a-chunk-with-a-negative-length%29-tp26186623p26247143.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26247144</id>
	<title>[jira] Updated: (TIKA-318) Upgrade nekohtml dependency from 1.9.9 to 1.9.13</title>
	<published>2009-11-07T10:03:32Z</published>
	<updated>2009-11-07T10:03:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Chris A. Mattmann updated TIKA-318:
&lt;br&gt;-----------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Component/s: packaging
&lt;br&gt;&lt;br&gt;- set fix component
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Upgrade nekohtml dependency from 1.9.9 to 1.9.13
&lt;br&gt;&amp;gt; ------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-318
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-318&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-318&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Task
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: packaging
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.4
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Attila Király
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; It seems Tika is still using an old version of nekohtml. It could be upgraded to 1.9.13.
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-318%29-Upgrade-nekohtml-dependency-from-1.9.9-to-1.9.13-tp26246862p26247144.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26247146</id>
	<title>[jira] Updated: (TIKA-298) CompositeParser.getParser() should use mimetype hierarchy when falling back</title>
	<published>2009-11-07T10:03:32Z</published>
	<updated>2009-11-07T10:03:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Chris A. Mattmann updated TIKA-298:
&lt;br&gt;-----------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Component/s: parser
&lt;br&gt;&lt;br&gt;- set fix component
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; CompositeParser.getParser() should use mimetype hierarchy when falling back
&lt;br&gt;&amp;gt; ---------------------------------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-298
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-298&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-298&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.4
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Ken Krugler
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; CompositeParser.getParser() doesn't use supertypes when falling back: if it can't get a parser for the exact mimetype, it goes straight to the fallback parser.
&lt;br&gt;&amp;gt; So, for example, if the file mimetype is application/&amp;lt;whatever&amp;gt;+xml, and no parser exists for it, then you get the default &amp;quot;do nothing&amp;quot; parser instead of the XML parser.
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-298%29-CompositeParser.getParser%28%29-should-use-mimetype-hierarchy-when-falling-back-tp25680883p26247146.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26247147</id>
	<title>[jira] Updated: (TIKA-315) Tika appears to skip over an entire section of a Microsoft Word Document</title>
	<published>2009-11-07T10:03:32Z</published>
	<updated>2009-11-07T10:03:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Chris A. Mattmann updated TIKA-315:
&lt;br&gt;-----------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Component/s: parser
&lt;br&gt;&lt;br&gt;- set fix component
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Tika appears to skip over an entire section of a Microsoft Word Document
&lt;br&gt;&amp;gt; ------------------------------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-315
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-315&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-315&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.4
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Environment: Microsoft Windows Vista 32 bit; Apache tika 0.4 release; used the -gui and the command line -t option.
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Sanjeev Rao
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: Biolink07_fromTika0.4.txt
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; I saved this MS Word file &lt;a href=&quot;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&lt;/a&gt;&amp;nbsp;to my 32-bit Vista desktop. Then I tried both the command line -t option (output attached) and the GUI option. In either case, a large section of the Word document got lost. Scrolling up from the bottom, you will find reference #4, but references 1-3 are missing. You will also find that a page or so of the original document is missing! It is not clear what about this document caused this behavior.
&lt;br&gt;&amp;gt; I am trying to use Tika to convert content from the web.
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-315%29-Tika-appears-to-skip-over-an-entire-section-of-a-Microsoft-Word-Document-tp26062742p26247147.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26246862</id>
	<title>[jira] Created: (TIKA-318) Upgrade nekohtml dependency from 1.9.9 to 1.9.13</title>
	<published>2009-11-07T09:31:32Z</published>
	<updated>2009-11-07T09:31:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">Upgrade nekohtml dependency from 1.9.9 to 1.9.13
&lt;br&gt;------------------------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Key: TIKA-318
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-318&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-318&lt;/a&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Project: Tika
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Issue Type: Task
&lt;br&gt;&amp;nbsp; &amp;nbsp; Affects Versions: 0.4
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Reporter: Attila Király
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Priority: Minor
&lt;br&gt;&lt;br&gt;&lt;br&gt;It seems Tika is still using an old version of nekohtml. It could be upgraded to 1.9.13.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-318%29-Upgrade-nekohtml-dependency-from-1.9.9-to-1.9.13-tp26246862p26246862.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26242143</id>
	<title>[jira] Updated: (TIKA-317) Annotation-based Tika configuration</title>
	<published>2009-11-06T20:59:32Z</published>
	<updated>2009-11-06T20:59:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Jukka Zitting updated TIKA-317:
&lt;br&gt;-------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Fix Version/s: &amp;nbsp; &amp;nbsp; (was: 0.5)
&lt;br&gt;&lt;br&gt;Postponing to after 0.5
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Annotation-based Tika configuration
&lt;br&gt;&amp;gt; -----------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-317
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; I'd like to simplify Tika configuration and make it easier to customize by pushing the information in tika-config.xml to Parser annotations and Java SPI service files.
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-317%29-Annotation-based-Tika-configuration-tp26241095p26242143.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26242059</id>
	<title>[jira] Resolved: (TIKA-209) Language detection is weak.</title>
	<published>2009-11-06T20:35:41Z</published>
	<updated>2009-11-06T20:35:41Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Jukka Zitting resolved TIKA-209.
&lt;br&gt;--------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Resolution: Fixed
&lt;br&gt;&lt;br&gt;I have refactored and simplified the language identifier code to better meet the needs of Tika. Most notably, I fixed the n-gram length at three characters to reduce the size of the language profile files and to make the n-gram classes simpler.
&lt;br&gt;&lt;br&gt;AutoDetectParser now automatically attempts to detect the document language and sets the Metadata.LANGUAGE property if a reasonably certain language profile match is found.
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Language detection is weak.
&lt;br&gt;&amp;gt; ---------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-209
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-209&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-209&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: languageidentifier
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.3
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Robert Newson
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; In org.apache.tika.utils.Utils, the getUTF8Reader method assigns a language determination without checking the confidence rating from ICU's CharsetDetector.
&lt;br&gt;&amp;gt; Please add a configurable threshold level (0-100):
&lt;br&gt;&amp;gt; if (language != null &amp;&amp; match.getConfidence() &amp;gt; THRESHOLD) {
&lt;br&gt;&amp;gt; &amp;nbsp; metadata.set(Metadata.CONTENT_LANGUAGE, match.getLanguage());
&lt;br&gt;&amp;gt; &amp;nbsp; metadata.set(Metadata.LANGUAGE, match.getLanguage());
&lt;br&gt;&amp;gt; }
&lt;br&gt;&amp;gt; Obviously using charset to imply language is generally weak but it would be sufficient if the confidence threshold was controlled. Today, the text &amp;quot;hello&amp;quot; is tagged as French, for example. 
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-209%29-Language-detection-is-weak.-tp22658271p26242059.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26242043</id>
	<title>[jira] Resolved: (TIKA-275) Parse context</title>
	<published>2009-11-06T20:31:43Z</published>
	<updated>2009-11-06T20:31:43Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Jukka Zitting resolved TIKA-275.
&lt;br&gt;--------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Resolution: Fixed
&lt;br&gt;&lt;br&gt;This is now done. I'll add the following change log entry as soon as svn.apache.org is back online:
&lt;br&gt;&lt;br&gt;&amp;nbsp;* A new parse context argument was added to the Parser.parse() method.
&lt;br&gt;&amp;nbsp; &amp;nbsp;This context map can be used to pass things like a delegate parser or
&lt;br&gt;&amp;nbsp; &amp;nbsp;other settings to the parsing process. The previous parse() method
&lt;br&gt;&amp;nbsp; &amp;nbsp;signature has been deprecated and will be removed in Tika 1.0. (TIKA-275)
&lt;br&gt;&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Parse context
&lt;br&gt;&amp;gt; -------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-275
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-275&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-275&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: New Feature
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; As discussed on dev@ [1], I'd like to add an extra parse context argument to the Parser.parse() method.
&lt;br&gt;&amp;gt; [1] &lt;a href=&quot;http://markmail.org/message/yfgngwhbm6ra42ty&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://markmail.org/message/yfgngwhbm6ra42ty&lt;/a&gt;&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-275%29-Parse-context-tp25406559p26242043.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26242032</id>
	<title>[jira] Commented: (TIKA-309) Mime type application/rdf+xml not correctly detected</title>
	<published>2009-11-06T20:27:41Z</published>
	<updated>2009-11-06T20:27:41Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774551#action_12774551&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774551#action_12774551&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;Chris A. Mattmann commented on TIKA-309:
&lt;br&gt;----------------------------------------
&lt;br&gt;&lt;br&gt;Hey Guys, I think we just need another line in the tika-mimetypes.xml file for this. I'll take a crack at it, if there are no objections. Thanks!
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Mime type application/rdf+xml not correctly detected
&lt;br&gt;&amp;gt; ----------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-309
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-309&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-309&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: mime
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.5
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Yuan-Fang Li
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Chris A. Mattmann
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Mime type detector using AutoDetectParser and Metadata returns &amp;quot;application/xml&amp;quot; for the URL &lt;a href=&quot;http://www.w3.org/2002/07/owl#&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.w3.org/2002/07/owl#&lt;/a&gt;, where it should be &amp;quot;application/rdf+xml&amp;quot;. The correct mime type is also suggested here: &lt;a href=&quot;http://www.w3.org/TR/owl-ref/#MIMEType&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.w3.org/TR/owl-ref/#MIMEType&lt;/a&gt;.
&lt;br&gt;&amp;gt; P.S., Tika was downloaded from svn and built with Maven last week.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-309%29-Mime-type-application-rdf%2Bxml-not-correctly-detected-tp25867121p26242032.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26242033</id>
	<title>[jira] Assigned: (TIKA-309) Mime type application/rdf+xml not correctly detected</title>
	<published>2009-11-06T20:27:41Z</published>
	<updated>2009-11-06T20:27:41Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Chris A. Mattmann reassigned TIKA-309:
&lt;br&gt;--------------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Assignee: Chris A. Mattmann &amp;nbsp;(was: Jukka Zitting)
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Mime type application/rdf+xml not correctly detected
&lt;br&gt;&amp;gt; ----------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-309
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-309&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-309&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: mime
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.5
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Yuan-Fang Li
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Chris A. Mattmann
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Mime type detector using AutoDetectParser and Metadata returns &amp;quot;application/xml&amp;quot; for the URL &lt;a href=&quot;http://www.w3.org/2002/07/owl#&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.w3.org/2002/07/owl#&lt;/a&gt;, where it should be &amp;quot;application/rdf+xml&amp;quot;. The correct mime type is also suggested here: &lt;a href=&quot;http://www.w3.org/TR/owl-ref/#MIMEType&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.w3.org/TR/owl-ref/#MIMEType&lt;/a&gt;.
&lt;br&gt;&amp;gt; P.S., Tika was downloaded from svn and built with Maven last week.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-309%29-Mime-type-application-rdf%2Bxml-not-correctly-detected-tp25867121p26242033.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26242021</id>
	<title>0.5 release</title>
	<published>2009-11-06T20:25:37Z</published>
	<updated>2009-11-06T20:25:37Z</updated>
	<author>
		<name>Mattmann, Chris A (388J)</name>
	</author>
	<content type="html">Hey Guys,
&lt;br&gt;&lt;br&gt;I'll be working over the next week to prepare the 0.5 RC. I'm looking
&lt;br&gt;forward to using the new process that Jukka put together that involves
&lt;br&gt;repository.apache.org.
&lt;br&gt;&lt;br&gt;If there are any objections to 0.5 RC going up in the next week or so,
&lt;br&gt;please speak up, otherwise, I'll continue pushing towards that as planned.
&lt;br&gt;&lt;br&gt;Thanks!
&lt;br&gt;&lt;br&gt;Cheers,
&lt;br&gt;Chris
&lt;br&gt;&lt;br&gt;++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
&lt;br&gt;Chris Mattmann, Ph.D.
&lt;br&gt;Senior Computer Scientist
&lt;br&gt;NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
&lt;br&gt;Office: 171-266B, Mailstop: 171-246
&lt;br&gt;Email: &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=26242021&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Chris.Mattmann@...&lt;/a&gt;
&lt;br&gt;WWW: &amp;nbsp; &lt;a href=&quot;http://sunset.usc.edu/~mattmann/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://sunset.usc.edu/~mattmann/&lt;/a&gt;&lt;br&gt;++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
&lt;br&gt;Adjunct Assistant Professor, Computer Science Department
&lt;br&gt;University of Southern California, Los Angeles, CA 90089 USA
&lt;br&gt;++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/0.5-release-tp26242021p26242021.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26242018</id>
	<title>[jira] Resolved: (TIKA-314) Initial support for JPEG EXIF metadata extraction</title>
	<published>2009-11-06T20:23:41Z</published>
	<updated>2009-11-06T20:23:41Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Chris A. Mattmann resolved TIKA-314.
&lt;br&gt;------------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Resolution: Fixed
&lt;br&gt;&lt;br&gt;I think this fix is fine as is -- as Jukka suggested, we may move to a different package, but if so, we can open a new issue for it at that time. I'm marking this as fixed. Thanks, Jukka, and thanks Maxim for the patch!
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Initial support for JPEG EXIF metadata extraction
&lt;br&gt;&amp;gt; -------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-314
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Maxim Valyanskiy
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: initial-support-for-jpeg-exif-extraction.patch, testJPEG_EXIF.jpg
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; This patch adds initial support for JPEG EXIF metadata extraction
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-314%29-Initial-support-for-JPEG-EXIF-metadata-extraction-tp25975018p26242018.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26242008</id>
	<title>[jira] Assigned: (TIKA-314) Initial support for JPEG EXIF metadata extraction</title>
	<published>2009-11-06T20:21:41Z</published>
	<updated>2009-11-06T20:21:41Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Chris A. Mattmann reassigned TIKA-314:
&lt;br&gt;--------------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Assignee: Jukka Zitting &amp;nbsp;(was: Chris A. Mattmann)
&lt;br&gt;&lt;br&gt;Jukka already fixed this -- I thought it was unassigned and therefore needed work but since it's likely fixed, I'll leave the assignee as Jukka.
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Initial support for JPEG EXIF metadata extraction
&lt;br&gt;&amp;gt; -------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-314
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Maxim Valyanskiy
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: initial-support-for-jpeg-exif-extraction.patch, testJPEG_EXIF.jpg
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; This patch adds initial support for JPEG EXIF metadata extraction
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-314%29-Initial-support-for-JPEG-EXIF-metadata-extraction-tp25975018p26242008.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26242009</id>
	<title>[jira] Assigned: (TIKA-314) Initial support for JPEG EXIF metadata extraction</title>
	<published>2009-11-06T20:21:41Z</published>
	<updated>2009-11-06T20:21:41Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Chris A. Mattmann reassigned TIKA-314:
&lt;br&gt;--------------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Assignee: Chris A. Mattmann
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Initial support for JPEG EXIF metadata extraction
&lt;br&gt;&amp;gt; -------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-314
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Maxim Valyanskiy
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Chris A. Mattmann
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: initial-support-for-jpeg-exif-extraction.patch, testJPEG_EXIF.jpg
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; This patch adds initial support for JPEG EXIF metadata extraction
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-314%29-Initial-support-for-JPEG-EXIF-metadata-extraction-tp25975018p26242009.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26241980</id>
	<title>[jira] Commented: (TIKA-317) Annotation-based Tika configuration</title>
	<published>2009-11-06T20:11:41Z</published>
	<updated>2009-11-06T20:11:41Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774546#action_12774546&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774546#action_12774546&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;Jukka Zitting commented on TIKA-317:
&lt;br&gt;------------------------------------
&lt;br&gt;&lt;br&gt;Re: co-locating metadata with code; Doing so makes it easier to support multiple different configuration mechanisms (default Tika config, programmatic configuration, OSGi services, IoC containers, etc.) as you don't need to duplicate the media type lists for each different way of configuring things.
&lt;br&gt;&lt;br&gt;Re: tika-config.xml vs. META-INF/services/...; The service provider mechanism [1] makes it easy to add custom parser implementations without having to maintain a separate copy of the full Tika configuration file. You could for example create a my-custom-parsers.jar file with a META-INF/services/o.a.tika.parser.Parser file that lists only your custom parser classes. When you add that jar to the classpath, Tika would then automatically pick up those parsers in addition to the standard parser classes from the tika-parsers jar.
&lt;br&gt;&lt;br&gt;[1] &lt;a href=&quot;http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html#Service%20Provider&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html#Service%20Provider&lt;/a&gt;
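&lt;br&gt;&lt;br&gt;A minimal sketch of the lookup described above, assuming a stand-in DemoParser interface rather than the real org.apache.tika.parser.Parser (every name in the sketch is hypothetical) and using java.util.ServiceLoader as one possible way to read the service files; how Tika itself would do the lookup is not specified here:&lt;pre&gt;
```java
import java.util.ServiceLoader;

public class SpiSketch {
    // Hypothetical stand-in for a parser interface; this is not the Tika API.
    public interface DemoParser {
        String name();
    }

    // Counts the DemoParser providers that ServiceLoader can see. Providers are
    // declared one class name per line in a classpath resource named
    // META-INF/services/ followed by the binary name of the interface
    // (SpiSketch$DemoParser for this nested interface).
    public static int countProviders() {
        int count = 0;
        for (DemoParser parser : ServiceLoader.load(DemoParser.class)) {
            System.out.println("found parser: " + parser.name());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // No services file ships with this sketch, so nothing is discovered.
        System.out.println("providers: " + countProviders());
    }
}
```
&lt;/pre&gt;Dropping a jar that contains both an implementation class and the matching services file onto the classpath would make the loop find that implementation without any change to the calling code.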
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Annotation-based Tika configuration
&lt;br&gt;&amp;gt; -----------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-317
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; I'd like to simplify Tika configuration and make it easier to customize by pushing the information in tika-config.xml to Parser annotations and Java SPI service files.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-317%29-Annotation-based-Tika-configuration-tp26241095p26241980.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26241894</id>
	<title>[jira] Commented: (TIKA-317) Annotation-based Tika configuration</title>
	<published>2009-11-06T19:47:41Z</published>
	<updated>2009-11-06T19:47:41Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774543#action_12774543&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774543#action_12774543&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;Chris A. Mattmann commented on TIKA-317:
&lt;br&gt;----------------------------------------
&lt;br&gt;&lt;br&gt;Thanks for the additional detail, Jukka, but I fail to see how co-locating metadata with code (as in the case of JDK annotations) is any better a mechanism than separating out such configuration into an XML file. Also, what is the difference between having the information in the tika-config.xml file versus locating (some of) that information in a META-INF/services/o.a.tika.parser.Parser file? I guess I just need to understand more, because I'm missing something.
&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Annotation-based Tika configuration
&lt;br&gt;&amp;gt; -----------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-317
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; I'd like to simplify Tika configuration and make it easier to customize by pushing the information in tika-config.xml to Parser annotations and Java SPI service files.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-317%29-Annotation-based-Tika-configuration-tp26241095p26241894.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26241864</id>
	<title>[jira] Commented: (TIKA-317) Annotation-based Tika configuration</title>
	<published>2009-11-06T19:39:41Z</published>
	<updated>2009-11-06T19:39:41Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774542#action_12774542&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774542#action_12774542&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;Jukka Zitting commented on TIKA-317:
&lt;br&gt;------------------------------------
&lt;br&gt;&lt;br&gt;As Benson mentioned, a pretty typical deployment scenario is one where you want to extend Tika with a few custom Parser classes. Currently you'd either need to maintain a custom version of the full configuration file, or do some CompositeParser magic to inject your custom parsers at runtime. Neither option is ideal.
&lt;br&gt;&lt;br&gt;Another concern of mine is that the current configuration mechanism disconnects the list of supported media types from the parser implementation class. It would be better if that list was maintained in the same Java source file instead of in the XML configuration.
&lt;br&gt;&lt;br&gt;Thinking further, there's some interest in making Tika easy to use in more dynamic environments like an OSGi container where new parser components may be added to or removed from the system at any time. A static configuration file does not work that well in such situations.
&lt;br&gt;&lt;br&gt;So my idea is to move the list of media types supported by a Parser class to a class annotation (or perhaps a getSupportedTypes() method that would work better with composite parsers) and replace the tika-config.xml file with a META-INF/services/org.apache.tika.parser.Parser file that simply lists all the Parser implementations within that jar file.
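&lt;br&gt;&lt;br&gt;As a rough sketch of the getSupportedTypes() variant, assuming a hypothetical DemoParser interface that returns plain strings (the interface, the two parser classes and the findParser helper are illustrative only and are not the Tika API):&lt;pre&gt;
```java
public class SupportedTypesSketch {
    // Hypothetical stand-in for a parser interface; this is not the Tika API.
    public interface DemoParser {
        String[] getSupportedTypes();
    }

    // Each parser declares its own media types next to its implementation.
    public static class DemoTextParser implements DemoParser {
        public String[] getSupportedTypes() {
            return new String[] {"text/plain"};
        }
    }

    public static class DemoImageParser implements DemoParser {
        public String[] getSupportedTypes() {
            return new String[] {"image/jpeg", "image/png"};
        }
    }

    // Returns the first parser that claims the given media type, or null if none does.
    public static DemoParser findParser(DemoParser[] parsers, String mediaType) {
        for (DemoParser parser : parsers) {
            for (String type : parser.getSupportedTypes()) {
                if (type.equals(mediaType)) {
                    return parser;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        DemoParser[] parsers = {new DemoTextParser(), new DemoImageParser()};
        DemoParser match = findParser(parsers, "image/png");
        System.out.println(match.getClass().getSimpleName());
    }
}
```
&lt;/pre&gt;Because the type list lives in the parser class itself, the same lookup works however the parser instances were obtained (configuration file, service file, or added at runtime).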
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Annotation-based Tika configuration
&lt;br&gt;&amp;gt; -----------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-317
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; I'd like to simplify Tika configuration and make it easier to customize by pushing the information in tika-config.xml to Parser annotations and Java SPI service files.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-317%29-Annotation-based-Tika-configuration-tp26241095p26241864.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26241402</id>
	<title>[jira] Commented: (TIKA-317) Annotation-based Tika configuration</title>
	<published>2009-11-06T17:55:41Z</published>
	<updated>2009-11-06T17:55:41Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774534#action_12774534&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774534#action_12774534&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;Benson Margulies commented on TIKA-317:
&lt;br&gt;---------------------------------------
&lt;br&gt;&lt;br&gt;I'm with Jukka. I needed to replace one processor. Having to copy and modify the xml file, and then forever maintain my mutant version as new Tika releases change the rest of the contents that I don't want to change, is not a good prospect. 
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Annotation-based Tika configuration
&lt;br&gt;&amp;gt; -----------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-317
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; I'd like to simplify Tika configuration and make it easier to customize by pushing the information in tika-config.xml to Parser annotations and Java SPI service files.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-317%29-Annotation-based-Tika-configuration-tp26241095p26241402.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26241147</id>
	<title>[jira] Commented: (TIKA-317) Annotation-based Tika configuration</title>
	<published>2009-11-06T17:05:32Z</published>
	<updated>2009-11-06T17:05:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774521#action_12774521&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12774521#action_12774521&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;Chris A. Mattmann commented on TIKA-317:
&lt;br&gt;----------------------------------------
&lt;br&gt;&lt;br&gt;Hey Jukka: could you explain how this will be simpler? I, personally, like the tika-config.xml file. Details, please :)
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Annotation-based Tika configuration
&lt;br&gt;&amp;gt; -----------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-317
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; I'd like to simplify Tika configuration and make it easier to customize by pushing the information in tika-config.xml to Parser annotations and Java SPI service files.
&lt;/div&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-317%29-Annotation-based-Tika-configuration-tp26241095p26241147.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26241095</id>
	<title>[jira] Created: (TIKA-317) Annotation-based Tika configuration</title>
	<published>2009-11-06T16:59:32Z</published>
	<updated>2009-11-06T16:59:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">Annotation-based Tika configuration
&lt;br&gt;-----------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Key: TIKA-317
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-317&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-317&lt;/a&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Project: Tika
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Issue Type: Improvement
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Components: parser
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Reporter: Jukka Zitting
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Assignee: Jukka Zitting
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Priority: Minor
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Fix For: 0.5
&lt;br&gt;&lt;br&gt;&lt;br&gt;I'd like to simplify Tika configuration and make it easier to customize by pushing the information in tika-config.xml to Parser annotations and Java SPI service files.
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-317%29-Annotation-based-Tika-configuration-tp26241095p26241095.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26206973</id>
	<title>[jira] Reopened: (TIKA-309) Mime type application/rdf+xml not correctly detected</title>
	<published>2009-11-04T16:03:32Z</published>
	<updated>2009-11-04T16:03:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Yuan-Fang Li reopened TIKA-309:
&lt;br&gt;-------------------------------
&lt;br&gt;&lt;br&gt;&lt;br&gt;This fix worked for me until yesterday. When I updated to the latest revision (829668) from svn, my test cases for the application/rdf+xml MIME type failed again for the URLs &amp;quot;&lt;a href=&quot;http://www.ai.sri.com/daml/services/owl-s/1.2/Process.owl&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ai.sri.com/daml/services/owl-s/1.2/Process.owl&lt;/a&gt;&amp;quot; and &amp;quot;&lt;a href=&quot;http://www.w3.org/2002/07/owl#&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.w3.org/2002/07/owl#&lt;/a&gt;&amp;quot;. The MIME type returned is &amp;quot;application/xml&amp;quot; for the first one and &amp;quot;text/html&amp;quot; for the second one. Hence I'm reopening this issue.
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Mime type application/rdf+xml not correctly detected
&lt;br&gt;&amp;gt; ----------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-309
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-309&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-309&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: mime
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.5
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Yuan-Fang Li
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Mime type detector using AutoDetectParser and Metadata returns &amp;quot;application/xml&amp;quot; for the URL &lt;a href=&quot;http://www.w3.org/2002/07/owl#&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.w3.org/2002/07/owl#&lt;/a&gt;, where it should be &amp;quot;application/rdf+xml&amp;quot;. The correct mime type is also suggested here: &lt;a href=&quot;http://www.w3.org/TR/owl-ref/#MIMEType&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.w3.org/TR/owl-ref/#MIMEType&lt;/a&gt;.
&lt;br&gt;&amp;gt; P.S., Tika was downloaded from svn and built with Maven last week.
&lt;/div&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-309%29-Mime-type-application-rdf%2Bxml-not-correctly-detected-tp25867121p26206973.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26197727</id>
	<title>Re: Free live video streaming of ApacheCon US 2009</title>
	<published>2009-11-04T06:55:16Z</published>
	<updated>2009-11-04T06:55:16Z</updated>
	<author>
		<name>Israel Ekpo</name>
	</author>
	<content type="html">Thanks a lot.
&lt;br&gt;&lt;br&gt;This will be very helpful to me, as I am not able to attend.
&lt;br&gt;&lt;br&gt;On Wed, Nov 4, 2009 at 8:25 AM, Michael McCandless &amp;lt;
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=26197727&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;lucene@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Team,
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; For those Lucene fanatics not in Oakland this week for ApacheCon US,
&lt;br&gt;&amp;gt; don't miss the FREE live video streaming, starting today:
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp;&lt;a href=&quot;http://streaming.linux-magazin.de/en/program-apachecon-us-2009.htm&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://streaming.linux-magazin.de/en/program-apachecon-us-2009.htm&lt;/a&gt;&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Note that there are many talks available, covering Apache Hadoop,
&lt;br&gt;&amp;gt; Apache HTTPD, Lucene, as well as the Apache Pioneer's Panel and
&lt;br&gt;&amp;gt; keynote presentations.
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Lucene's track is this Friday (NOTE these times are UTC -- use
&lt;br&gt;&amp;gt; &lt;a href=&quot;http://www.timeanddate.com&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.timeanddate.com&lt;/a&gt;&amp;nbsp;to map to your time zone):
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp;17:00 Implementing an Information Retrieval Framework for an
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; Organizational Repository, Sithu D Sudarsan
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp;18:00 Apache Mahout - Going from raw data to information
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; Isabel Drost
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp;19:15 MIME Magic with Apache Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; Jukka Zitting
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp;20:15 Keynote: How Open Source Developers Can (Still!) Save The World
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; Brian Behlendorf
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp;22:00 Building Intelligent Search Applications with the Lucene
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; Ecosystem, Ted Dunning
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp;23:00 Realtime Search
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; Jason Rutherglen
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Happy viewing,
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Mike
&lt;br&gt;&amp;gt;
&lt;/div&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;&amp;quot;Good Enough&amp;quot; is not good enough.
&lt;br&gt;To give anything less than your best is to sacrifice the gift.
&lt;br&gt;Quality First. Measure Twice. Cut Once.
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Free-live-video-streaming-of-ApacheCon-US-2009-tp26196261p26197727.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26196261</id>
	<title>Free live video streaming of ApacheCon US 2009</title>
	<published>2009-11-04T05:25:25Z</published>
	<updated>2009-11-04T05:25:25Z</updated>
	<author>
		<name>Michael McCandless-2</name>
	</author>
	<content type="html">Team,
&lt;br&gt;&lt;br&gt;For those Lucene fanatics not in Oakland this week for ApacheCon US,
&lt;br&gt;don't miss the FREE live video streaming, starting today:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &lt;a href=&quot;http://streaming.linux-magazin.de/en/program-apachecon-us-2009.htm&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://streaming.linux-magazin.de/en/program-apachecon-us-2009.htm&lt;/a&gt;&lt;br&gt;&lt;br&gt;Note that there are many talks available, covering Apache Hadoop,
&lt;br&gt;Apache HTTPD, Lucene, as well as the Apache Pioneer's Panel and
&lt;br&gt;keynote presentations.
&lt;br&gt;&lt;br&gt;Lucene's track is this Friday (NOTE these times are UTC -- use
&lt;br&gt;&lt;a href=&quot;http://www.timeanddate.com&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.timeanddate.com&lt;/a&gt;&amp;nbsp;to map to your time zone):
&lt;br&gt;&lt;br&gt;&amp;nbsp;17:00 Implementing an Information Retrieval Framework for an
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Organizational Repository, Sithu D Sudarsan
&lt;br&gt;&lt;br&gt;&amp;nbsp;18:00 Apache Mahout - Going from raw data to information
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Isabel Drost
&lt;br&gt;&lt;br&gt;&amp;nbsp;19:15 MIME Magic with Apache Tika
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Jukka Zitting
&lt;br&gt;&lt;br&gt;&amp;nbsp;20:15 Keynote: How Open Source Developers Can (Still!) Save The World
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Brian Behlendorf
&lt;br&gt;&lt;br&gt;&amp;nbsp;22:00 Building Intelligent Search Applications with the Lucene
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Ecosystem, Ted Dunning
&lt;br&gt;&lt;br&gt;&amp;nbsp;23:00 Realtime Search
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Jason Rutherglen
&lt;br&gt;&lt;br&gt;Happy viewing,
&lt;br&gt;&lt;br&gt;Mike
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Free-live-video-streaming-of-ApacheCon-US-2009-tp26196261p26196261.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26186685</id>
	<title>[jira] Updated: (TIKA-316) Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)</title>
	<published>2009-11-03T13:11:32Z</published>
	<updated>2009-11-03T13:11:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Mike Hays updated TIKA-316:
&lt;br&gt;---------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Attachment: repro-TIKA-316.vsd
&lt;br&gt;&lt;br&gt;Run either:
&lt;br&gt;java -jar tika-app-0.4.jar repro-TIKA-316.vsd
&lt;br&gt;java -jar tika-app-0.5-SNAPSHOT.jar repro-TIKA-316.vsd
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)
&lt;br&gt;&amp;gt; ------------------------------------------------------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-316
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-316&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-316&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.4, 0.5
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Environment: Windows Server 2003 SP2, JRE 1.6.0_16, tika-app, Visio 2003
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Mike Hays
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: repro-TIKA-316.vsd
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; tika-app (0.4 and 0.5 nightly) returns the following when attempting to parse a Visio 2003 file (other versions may be affected):
&lt;br&gt;&amp;gt; Exception in thread &amp;quot;main&amp;quot; org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@145e044
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:123)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:103)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:176)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:63)
&lt;br&gt;&amp;gt; Caused by: java.lang.IllegalArgumentException: Found a chunk with a negative length, which isn't allowed
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:120)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.HDGFDiagram.&amp;lt;init&amp;gt;(HDGFDiagram.java:95)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.extractor.VisioTextExtractor.&amp;lt;init&amp;gt;(VisioTextExtractor.java:52)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.extractor.VisioTextExtractor.&amp;lt;init&amp;gt;(VisioTextExtractor.java:49)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:118)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ... 3 more
&lt;/div&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-316%29-Parsing-Visio-diagrams-with-tika-app-causes-TikaException-%28Found-a-chunk-with-a-negative-length%29-tp26186623p26186685.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26186623</id>
	<title>[jira] Created: (TIKA-316) Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)</title>
	<published>2009-11-03T13:07:32Z</published>
	<updated>2009-11-03T13:07:32Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)
&lt;br&gt;------------------------------------------------------------------------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Key: TIKA-316
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-316&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-316&lt;/a&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Project: Tika
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Issue Type: Bug
&lt;br&gt;&amp;nbsp; &amp;nbsp; Affects Versions: 0.4, 0.5
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Environment: Windows Server 2003 SP2, JRE 1.6.0_16, tika-app, Visio 2003
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Reporter: Mike Hays
&lt;br&gt;&lt;br&gt;&lt;br&gt;tika-app (0.4 and 0.5 nightly) returns the following when attempting to parse a Visio 2003 file (other versions may be affected):
&lt;br&gt;&lt;br&gt;Exception in thread &amp;quot;main&amp;quot; org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@145e044
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:123)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:103)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:176)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:63)
&lt;br&gt;Caused by: java.lang.IllegalArgumentException: Found a chunk with a negative length, which isn't allowed
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:120)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.HDGFDiagram.&amp;lt;init&amp;gt;(HDGFDiagram.java:95)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.extractor.VisioTextExtractor.&amp;lt;init&amp;gt;(VisioTextExtractor.java:52)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.poi.hdgf.extractor.VisioTextExtractor.&amp;lt;init&amp;gt;(VisioTextExtractor.java:49)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:118)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ... 3 more
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-316%29-Parsing-Visio-diagrams-with-tika-app-causes-TikaException-%28Found-a-chunk-with-a-negative-length%29-tp26186623p26186623.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26062887</id>
	<title>[jira] Updated: (TIKA-315) Tika appears to skip over an entire section of a Microsoft Word Document</title>
	<published>2009-10-26T09:22:59Z</published>
	<updated>2009-10-26T09:22:59Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Sanjeev Rao updated TIKA-315:
&lt;br&gt;-----------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Description: 
&lt;br&gt;I saved this MS Word file &lt;a href=&quot;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&lt;/a&gt;&amp;nbsp;to my 32-bit Vista desktop. Then I tried both the command-line -t option (output attached) and the GUI option. In either case, a large section of the Word document was lost. Scrolling up from the bottom, you will find reference #4, but references 1-3 are missing. You will also find that a page or so of the original document is missing! It is not clear what about this document caused this behavior.
&lt;br&gt;&lt;br&gt;I am trying to use Tika to convert content from the web.
&lt;br&gt;&lt;br&gt;&amp;nbsp; was:
&lt;br&gt;Using the -gui option, I dragged and dropped this file &lt;a href=&quot;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&lt;/a&gt;&amp;nbsp;, which was saved on my 32-bit Vista desktop. The output is attached (assuming I can attach after I create this issue).
&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Environment: Microsoft Windows Vista 32 bit; Apache tika 0.4 release; used the -gui and the command line -t option. &amp;nbsp;(was: Microsoft Windows Vista 32 bit; Apache tika 0.4 release; used the -gui option.)
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Tika appears to skip over an entire section of a Microsoft Word Document
&lt;br&gt;&amp;gt; ------------------------------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-315
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-315&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-315&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.4
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Environment: Microsoft Windows Vista 32 bit; Apache tika 0.4 release; used the -gui and the command line -t option.
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Sanjeev Rao
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: Biolink07_fromTika0.4.txt
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; I saved this MS Word file &lt;a href=&quot;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&lt;/a&gt;&amp;nbsp;to my 32-bit Vista desktop. Then I tried both the command-line -t option (output attached) and the GUI option. In either case, a large section of the Word document was lost. Scrolling up from the bottom, you will find reference #4, but references 1-3 are missing. You will also find that a page or so of the original document is missing! It is not clear what about this document caused this behavior.
&lt;br&gt;&amp;gt; I am trying to use Tika to convert content from the web.
&lt;/div&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-315%29-Tika-appears-to-skip-over-an-entire-section-of-a-Microsoft-Word-Document-tp26062742p26062887.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26062782</id>
	<title>[jira] Updated: (TIKA-315) Tika appears to skip over an entire section of a Microsoft Word Document</title>
	<published>2009-10-26T09:16:59Z</published>
	<updated>2009-10-26T09:16:59Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Sanjeev Rao updated TIKA-315:
&lt;br&gt;-----------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Attachment: Biolink07_fromTika0.4.txt
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Tika appears to skip over an entire section of a Microsoft Word Document
&lt;br&gt;&amp;gt; ------------------------------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-315
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-315&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-315&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.4
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Environment: Microsoft Windows Vista 32-bit; Apache Tika 0.4 release; used the -gui option.
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Sanjeev Rao
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: Biolink07_fromTika0.4.txt
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Using the -gui option, I dragged and dropped this file, &lt;a href=&quot;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&lt;/a&gt;, which was saved on my 32-bit Vista desktop. The output is attached (assuming I can attach after I create this issue).
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-315%29-Tika-appears-to-skip-over-an-entire-section-of-a-Microsoft-Word-Document-tp26062742p26062782.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26062742</id>
	<title>[jira] Created: (TIKA-315) Tika appears to skip over an entire section of a Microsoft Word Document</title>
	<published>2009-10-26T09:14:59Z</published>
	<updated>2009-10-26T09:14:59Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">Tika appears to skip over an entire section of a Microsoft Word Document
&lt;br&gt;------------------------------------------------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Key: TIKA-315
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-315&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-315&lt;/a&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Project: Tika
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Issue Type: Bug
&lt;br&gt;&amp;nbsp; &amp;nbsp; Affects Versions: 0.4
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Environment: Microsoft Windows Vista 32-bit; Apache Tika 0.4 release; used the -gui option.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Reporter: Sanjeev Rao
&lt;br&gt;&lt;br&gt;&lt;br&gt;Using the -gui option, I dragged and dropped this file, &lt;a href=&quot;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://mingo.info-science.uiowa.edu/padmini/Papers/Biolink07.doc&lt;/a&gt;, which was saved on my 32-bit Vista desktop. The output is attached (assuming I can attach after I create this issue).
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-315%29-Tika-appears-to-skip-over-an-entire-section-of-a-Microsoft-Word-Document-tp26062742p26062742.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26044112</id>
	<title>[jira] Commented: (TIKA-314) Initial support for JPEG EXIF metadata extraction</title>
	<published>2009-10-24T16:42:59Z</published>
	<updated>2009-10-24T16:42:59Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12769736#action_12769736&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12769736#action_12769736&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;Jukka Zitting commented on TIKA-314:
&lt;br&gt;------------------------------------
&lt;br&gt;&lt;br&gt;Patch committed in revision 829467. Added missing license headers in revision 829469.
&lt;br&gt;&lt;br&gt;BTW, how about if we moved the JpegParser class to the o.a.tika.parser.image package?
&lt;br&gt;&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Initial support for JPEG EXIF metadata extraction
&lt;br&gt;&amp;gt; -------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-314
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Maxim Valyanskiy
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: initial-support-for-jpeg-exif-extraction.patch, testJPEG_EXIF.jpg
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; This patch adds initial support for JPEG EXIF metadata extraction
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-314%29-Initial-support-for-JPEG-EXIF-metadata-extraction-tp25975018p26044112.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26044018</id>
	<title>[jira] Commented: (TIKA-314) Initial support for JPEG EXIF metadata extraction</title>
	<published>2009-10-24T16:24:59Z</published>
	<updated>2009-10-24T16:24:59Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12769731#action_12769731&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12769731#action_12769731&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;Jukka Zitting commented on TIKA-314:
&lt;br&gt;------------------------------------
&lt;br&gt;&lt;br&gt;Looks good, thanks! The metadata-extractor library seems to be in the public domain, so it should be fine for us to use. The exact license info I find in the library is:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; This is public domain software - that is, you can do whatever you want
&lt;br&gt;&amp;nbsp; &amp;nbsp; with it, and include it software that is licensed under the GNU or the
&lt;br&gt;&amp;nbsp; &amp;nbsp; BSD license, or whatever other licence you choose, including proprietary
&lt;br&gt;&amp;nbsp; &amp;nbsp; closed source licenses. &amp;nbsp;I do ask that you leave this header in tact.
&lt;br&gt;&lt;br&gt;BTW, a git-format patch is fine; I'm actually using git myself too (committing via git-svn).
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Initial support for JPEG EXIF metadata extraction
&lt;br&gt;&amp;gt; -------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-314
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Maxim Valyanskiy
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: initial-support-for-jpeg-exif-extraction.patch, testJPEG_EXIF.jpg
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; This patch adds initial support for JPEG EXIF metadata extraction
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-314%29-Initial-support-for-JPEG-EXIF-metadata-extraction-tp25975018p26044018.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26042033</id>
	<title>Re: MarkUnsupportedException</title>
	<published>2009-10-24T12:02:00Z</published>
	<updated>2009-10-24T12:02:00Z</updated>
	<author>
		<name>Jukka Zitting</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;&lt;br&gt;On Sat, Oct 17, 2009 at 2:43 PM, mastcheshmi
&lt;br&gt;&amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=26042033&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;mehran.mastcheshmi@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&amp;gt; I use Tika,
&lt;br&gt;&amp;gt; and this exception occurred for all documents:
&lt;br&gt;&amp;gt; org/apache/poi/hpsf/MarkUnsupportedException
&lt;br&gt;&lt;br&gt;That seems like a bug. I guess POI expects the given document stream
&lt;br&gt;to support the mark feature, so Tika should explicitly wrap the stream
&lt;br&gt;into a java.io.BufferedInputStream if it does not already support
&lt;br&gt;marks.
&lt;br&gt;&lt;br&gt;Can you please file a bug report [1] about this? Meanwhile, as a
&lt;br&gt;workaround you can wrap the input stream yourself:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; new OfficeParser().parse(new BufferedInputStream(in), handler, metadata);
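&lt;br&gt;&lt;br&gt;A minimal sketch of that workaround, using only java.io: the class name MarkSupport and the helper ensureMarkSupported are hypothetical and not part of Tika or POI. The helper wraps a stream only when it does not already support marks, and otherwise returns it unchanged.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class MarkSupport {

    // Hypothetical helper (not a Tika or POI API): return the stream as-is
    // when it already supports mark/reset, otherwise wrap it in a
    // BufferedInputStream, which does support marks.
    public static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) {
        // A bare InputStream subclass inherits markSupported() == false.
        InputStream plain = new InputStream() {
            @Override
            public int read() {
                return -1; // always at end of stream
            }
        };
        // ByteArrayInputStream supports marks on its own.
        InputStream bytes = new ByteArrayInputStream(new byte[] {1, 2, 3});

        System.out.println(plain.markSupported());                      // false
        System.out.println(ensureMarkSupported(plain).markSupported()); // true
        System.out.println(ensureMarkSupported(bytes) == bytes);        // true
    }
}
```

&lt;br&gt;&lt;br&gt;With Tika on the classpath, the stream returned by such a helper could be passed to the parse call shown above in place of new BufferedInputStream(in).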
&lt;br&gt;&lt;br&gt;[1] &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA&lt;/a&gt;&lt;br&gt;&lt;br&gt;BR,
&lt;br&gt;&lt;br&gt;Jukka Zitting
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re%3A-MarkUnsupportedException-tp26042033p26042033.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25975076</id>
	<title>[jira] Updated: (TIKA-314) Initial support for JPEG EXIF metadata extraction</title>
	<published>2009-10-20T06:25:59Z</published>
	<updated>2009-10-20T06:25:59Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Maxim Valyanskiy updated TIKA-314:
&lt;br&gt;----------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Attachment: &amp;nbsp; &amp;nbsp; (was: 0001-initial-support-for-jpeg-exif-extraction.patch)
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Initial support for JPEG EXIF metadata extraction
&lt;br&gt;&amp;gt; -------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-314
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Maxim Valyanskiy
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: initial-support-for-jpeg-exif-extraction.patch, testJPEG_EXIF.jpg
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; This patch adds initial support for JPEG EXIF metadata extraction
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-314%29-Initial-support-for-JPEG-EXIF-metadata-extraction-tp25975018p25975076.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25975077</id>
	<title>[jira] Updated: (TIKA-314) Initial support for JPEG EXIF metadata extraction</title>
	<published>2009-10-20T06:25:59Z</published>
	<updated>2009-10-20T06:25:59Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Maxim Valyanskiy updated TIKA-314:
&lt;br&gt;----------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Attachment: initial-support-for-jpeg-exif-extraction.patch
&lt;br&gt;&lt;br&gt;Oops, I attached the patch in git format; this attachment is the same patch in unidiff format.
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Initial support for JPEG EXIF metadata extraction
&lt;br&gt;&amp;gt; -------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-314
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-314&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-314&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Maxim Valyanskiy
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.5
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: initial-support-for-jpeg-exif-extraction.patch, testJPEG_EXIF.jpg
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; This patch adds initial support for JPEG EXIF metadata extraction
&lt;/div&gt;
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/-jira--Created%3A-%28TIKA-314%29-Initial-support-for-JPEG-EXIF-metadata-extraction-tp25975018p25975077.html" />
</entry>

</feed>
