clarification on libextractor functionality and ev

Hi,
I am a student planning to work on the Xapian search and indexing library project. My area of work would be to replace the current content and metadata extraction mechanism, which relies on external filter programs. A filter is run for every different file format, which increases the CPU footprint. I would like to replace these external filter programs with shared libraries like the one provided with libextractor. If it's not too much trouble, please clarify the following matters:

1: Does it also support extracting the content of a file, rather than just metadata?
2: Do file format identification and extraction happen implicitly, or does libextractor run an external filter program for each file format? If the latter, there is no need to replace the current Xapian system with it.
3: Are there any file formats that would be desirable to have in libextractor but are not currently part of it?
4: Could you suggest any alternatives for the requirements I mentioned above? I would specifically require C/C++ libraries.

Please reply to the above queries, as it would help me tailor the project accordingly. Also, if there is an IRC channel where I could ask further questions, kindly mention that too.

Regards,
Nijil.Y
Re: clarification on libextractor functionality and ev

On Tuesday, March 29, 2011 11:15:03 pm nijil yes wrote:
> Hi,
>
> I am a student who is planning to work on Xapian search and indexing
> libraries project. My area of work would be to replace the current content
> and meta data mechanism, which makes use of external filter programs
> resulting in a filter being run for every different file format increasing
> the cpu footprint. I would like to replace these external filter programs
> with shared libraries like the one provided with libextractor. If it's not
> too much trouble please clarify on the following matters
>
> 1: Does it also support to extract content of the file other than just meta
> data?

In principle this would be possible, but none of the plugins that have been implemented do this, and this is not the intent of the library.

> 2: Does the file format identification and extraction happen implicitly or
> does libextractor implement a mechanism where for each file format an
> external filter program is run. If that is the case then there is no need
> to replace the current xapian system with this.

File format identification happens internally; LE (libextractor) runs each plugin, and the plugin then decides whether the given format is applicable to it. Naturally, most plugins terminate quickly after a brief look at the file header.

> 3: Are there any file formats that would be desirable to be seen as a part of
> libextractor but which currently are not a part?

Always ;-). There is a TODO list in the distribution.

> 4: Could you suggest any alternative for the above requirements I
> mentioned. I would specifically require c/c++ libraries.

libmime is somewhat related; other than that, I don't know of any C/C++ libraries doing something similar.

Happy hacking,

Christian
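
For readers following the answer to question 2, here is a minimal sketch in C of the dispatch pattern it describes: the data is offered to every plugin in turn, and each plugin bails out after a quick look at the header if the format is not its own. This is not libextractor's actual API; `plugin_fn`, `png_plugin`, `pdf_plugin`, and `run_plugins` are hypothetical names made up for illustration, and the two magic-number checks stand in for real format detection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical plugin signature (NOT the libextractor API): returns 1 if the
   plugin recognized the data and wrote a result to out, 0 otherwise. */
typedef int (*plugin_fn)(const unsigned char *data, size_t size,
                         char *out, size_t out_size);

static int png_plugin(const unsigned char *data, size_t size,
                      char *out, size_t out_size)
{
    static const unsigned char magic[8] =
        {0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'};
    /* Brief look at the header; return immediately if it is not PNG. */
    if (size < sizeof magic || memcmp(data, magic, sizeof magic) != 0)
        return 0;
    snprintf(out, out_size, "image/png");
    return 1;
}

static int pdf_plugin(const unsigned char *data, size_t size,
                      char *out, size_t out_size)
{
    /* Brief look at the header; return immediately if it is not PDF. */
    if (size < 5 || memcmp(data, "%PDF-", 5) != 0)
        return 0;
    snprintf(out, out_size, "application/pdf");
    return 1;
}

static const plugin_fn plugins[] = { png_plugin, pdf_plugin };

/* Offer the buffer to every plugin, all in-process with no external filter
   program. Returns how many plugins accepted the data. */
int run_plugins(const unsigned char *data, size_t size,
                char *out, size_t out_size)
{
    int matched = 0;
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof plugins / sizeof plugins[0]; i++)
        matched += plugins[i](data, size, out, out_size);
    return matched;
}
```

Calling `run_plugins` on a buffer that starts with `%PDF-` lets only `pdf_plugin` do any work; the PNG plugin returns after a single `memcmp`. That is why running every plugin on every file stays cheap compared with spawning one external filter per format.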