extractor metadata and XML/RDF

Hello,

I'm working on SWSE [1], a Semantic Web Search Engine. The aim is to collect arbitrary content from the Web and make the metadata available for search and query.

Extractor looks like exactly the right tool for extracting metadata from legacy formats. However, the resulting metadata are name-value pairs, which makes post-processing difficult.

Do you have (or are there efforts in that direction) a more formal way of returning metadata? I can see XML or better RDF fitting there. I'd like to add some terms from standard ontologies (such as Dublin Core and Friend of a Friend) to the output, probably using sed scripts in the beginning if there is currently nothing else available.

Regards,
Andreas.

[1] http://swse.org/

_______________________________________________
libextractor mailing list
libextractor@...
http://lists.gnu.org/mailman/listinfo/libextractor
Re: extractor metadata and XML/RDF

On Monday 09 July 2007 10:07, Andreas Harth wrote:
> Hello,
>
> I'm working on SWSE [1], a Semantic Web Search Engine. The aim
> is to collect arbitrary content from the Web and make the metadata
> available for search and query.
>
> Extractor looks like exactly the right tool for extracting metadata
> from legacy formats. However, the resulting metadata are name-value
> pairs, which makes post-processing difficult.

I don't see how it makes post-processing difficult. It is pretty much the simplest format possible. Now, certainly having data in a highly standardized format (such as dates, numbers, etc.) would help certain forms of post-processing. However, given that some of the file formats are a bit vague in how they encode the data in the first place, I don't see how it would be possible to always achieve this.

> Do you have (or are there efforts in that direction) a more formal
> way of returning metadata? I can see XML or better RDF fitting there.
> I'd like to add some terms from standard ontologies (such as Dublin
> Core and Friend of a Friend) to the output, probably using sed
> scripts in the beginning if there is currently nothing else available.

The metadata types used by LE were motivated by Dublin Core. Additional terms are added as needed by particular formats. Improvements in the set of available metadata types are welcome but should be driven by adding or modifying existing plugins to produce better terms, not by just adding terms that will never be extracted.

I am not aware of any effort to add support for RDF or XML.

Best regards,
Christian
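
For anyone who wants to try the post-processing discussed in this thread without changes to the library, here is a minimal sketch in Python instead of sed. It is not part of libextractor. It assumes the name-value pairs arrive as text lines of the form `name - value` (an assumption about the command-line output, which may differ between versions), and the `TERM_MAP` table is a hypothetical, hand-picked mapping from a few keyword names to Dublin Core element URIs. Lines that do not match, or whose name is not in the table, are skipped.

```python
import re

# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from extractor keyword names to Dublin Core terms.
# Extend or correct it for the keyword names your extractor version emits.
TERM_MAP = {
    "title": DC + "title",
    "author": DC + "creator",
    "creator": DC + "creator",
    "date": DC + "date",
    "language": DC + "language",
    "mimetype": DC + "format",
    "subject": DC + "subject",
    "description": DC + "description",
    "publisher": DC + "publisher",
}

# Assumed line format: "<name> - <value>".
LINE_RE = re.compile(r"^(?P<name>[^-]+?) - (?P<value>.*)$")


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def pairs_to_ntriples(subject_uri, lines):
    """Turn 'name - value' lines into N-Triples statements about subject_uri."""
    triples = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # header lines or anything not in the assumed format
        name = match.group("name").strip().lower()
        predicate = TERM_MAP.get(name)
        if predicate is None:
            continue  # no known Dublin Core equivalent in our table
        value = escape_literal(match.group("value"))
        triples.append('<%s> <%s> "%s" .' % (subject_uri, predicate, value))
    return triples


if __name__ == "__main__":
    sample = [
        "Keywords for file doc.pdf:",
        "title - An example document",
        "mimetype - application/pdf",
    ]
    for triple in pairs_to_ntriples("http://example.org/doc.pdf", sample):
        print(triple)
```

All values are emitted as plain string literals; normalising dates or numbers would need per-format handling, for the reason given in the reply above.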