clarification on libextractor functionality and ev

by nijil

Hi,

I am a student planning to work on the Xapian search and indexing library project. My area of work would be to replace the current content and metadata extraction mechanism, which relies on external filter programs: a filter is run for every different file format, which increases the CPU footprint. I would like to replace these external filter programs with shared libraries such as the one provided by libextractor. If it's not too much trouble, please clarify the following points:

1: Does it also support extracting the content of a file, rather than just metadata?

2: Does file format identification and extraction happen implicitly, or does libextractor run an external filter program for each file format? If the latter is the case, there is no need to replace the current Xapian system with it.

3: Are there any file formats that would be desirable to have in libextractor but are not currently supported?

4: Could you suggest any alternatives that meet the requirements above? I specifically need C/C++ libraries.

Please reply to the above queries, as it would help me tailor the project accordingly. Also, if there is an IRC channel where I could ask further questions, please mention that too.

Regards,
Nijil.Y


Re: clarification on libextractor functionality and ev

by Christian Grothoff

On Tuesday, March 29, 2011 11:15:03 pm nijil yes wrote:

> Hi,
>
> I am a student planning to work on the Xapian search and indexing library
> project. My area of work would be to replace the current content and
> metadata extraction mechanism, which relies on external filter programs:
> a filter is run for every different file format, which increases the CPU
> footprint. I would like to replace these external filter programs with
> shared libraries such as the one provided by libextractor. If it's not
> too much trouble, please clarify the following points:
>
> 1: Does it also support extracting the content of a file, rather than just
> metadata?

In principle this would be possible, but none of the plugins implemented so
far do this, and it is not the intent of the library.
 
> 2: Does file format identification and extraction happen implicitly, or
> does libextractor run an external filter program for each file format?
> If the latter is the case, there is no need to replace the current Xapian
> system with it.

File format identification happens internally; LE (libextractor) runs each
plugin, and the plugin then decides whether the given file is in a format it
handles.  Naturally, most plugins terminate quickly after a brief look at the
file header.
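
As a rough illustration of that dispatch pattern, here is a minimal sketch in
C. It is not libextractor's actual API: plugin_fn, png_plugin, pdf_plugin and
run_plugins are hypothetical names invented for this example, and a real
plugin would go on to parse the file after the header check.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical plugin signature (an assumption, not the libextractor API):
 * returns 1 if the plugin recognised the format and wrote a metadata item
 * into out, or 0 if it gave up after inspecting the header. */
typedef int (*plugin_fn)(const unsigned char *data, size_t size,
                         char *out, size_t out_size);

/* Recognises PNG by its 8-byte signature and bails out early otherwise. */
static int png_plugin(const unsigned char *data, size_t size,
                      char *out, size_t out_size)
{
    static const unsigned char magic[8] =
        { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' };
    if (size < sizeof magic || memcmp(data, magic, sizeof magic) != 0)
        return 0;
    snprintf(out, out_size, "mimetype=image/png");
    return 1;
}

/* Recognises PDF by the "%PDF-" prefix and bails out early otherwise. */
static int pdf_plugin(const unsigned char *data, size_t size,
                      char *out, size_t out_size)
{
    if (size < 5 || memcmp(data, "%PDF-", 5) != 0)
        return 0;
    snprintf(out, out_size, "mimetype=application/pdf");
    return 1;
}

static const plugin_fn plugins[] = { png_plugin, pdf_plugin };

/* Hands the same in-memory buffer to every plugin in turn, in-process.
 * No external program is started. Returns the number of plugins that
 * accepted the data; out is left empty if none did. */
int run_plugins(const unsigned char *data, size_t size,
                char *out, size_t out_size)
{
    int matched = 0;
    size_t i;

    if (out_size > 0)
        out[0] = '\0';
    for (i = 0; i < sizeof plugins / sizeof plugins[0]; i++)
        matched += plugins[i](data, size, out, out_size);
    return matched;
}
```

Calling run_plugins() on a buffer that starts with "%PDF-" leaves
"mimetype=application/pdf" in out; the PNG plugin returns after comparing
eight bytes, which is the cheap early exit described above.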
 
> 3: Are there any file formats that would be desirable to have in
> libextractor but are not currently supported?

Always ;-).  There is a TODO list in the distribution.

> 4: Could you suggest any alternatives that meet the requirements above?
> I specifically need C/C++ libraries.

libmime is somewhat related; other than that, I don't know of any C/C++
libraries doing something similar.

Happy hacking,

Christian