Re: Major speed improvements in package parsing

by ogjunk-tika


Nice, thanks for sharing! Did you see the same speedup after running each command several times, to rule out cold/hot cache side effects?
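
A quick way to check would be to repeat each run a few times and compare the first (possibly cold) run against the later ones. A minimal sketch, assuming a POSIX-style shell and a `date` that supports `+%s` (GNU and BSD both do); `time_runs` is a name made up for this example, not something shipped with Tika, and it only has one-second resolution:

```shell
#!/bin/sh
# Hypothetical helper: run a command several times and print the wall-clock
# seconds of each run, so a cold first run can be told apart from warm runs.
time_runs() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        "$@" > /dev/null        # discard the extracted text, keep only the timing
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}

# Usage with the jar and archive names from the quoted test:
# time_runs 5 java -jar tika-0.3-standalone.jar --text lucene-2.0.0-src.zip
# time_runs 5 java -jar tika-app-0.4-SNAPSHOT.jar --text lucene-2.0.0-src.zip
```

If the later runs of both versions show roughly the same ratio as the first, the cache is probably not what explains the difference.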

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----

> From: Jukka Zitting <jukka.zitting@...>
> To: tika-dev@...
> Sent: Wednesday, June 3, 2009 6:18:02 AM
> Subject: Major speed improvements in package parsing
>
> Hi,
>
> Inspired by TIKA-236, I ran the following ad-hoc test:
>
> $ time java -jar tika-0.3-standalone.jar --text lucene-2.0.0-src.zip > output-0.3.txt
> real    0m29.844s
> user    0m39.686s
> sys    0m0.840s
> $ time java -jar tika-app-0.4-SNAPSHOT.jar --text lucene-2.0.0-src.zip > output-0.4.txt
> real    0m12.587s
> user    0m15.911s
> sys    0m0.495s
>
> This is especially impressive as the 0.4 version is able to extract
> almost twice as much text from the archive:
>
> $ du -h output-*
> 6.8M    output-0.3.txt
> 13M    output-0.4.txt
>
> This speed increase is mostly the result of the TIKA-204 and TIKA-238
> improvements.
>
> Looking deeper at the output reveals some minor issues that I'll be
> filing bugs for. However, in general the result of the extraction
> seems pretty good.
>
> BR,
>
> Jukka Zitting
