using TextExtractor
Hi,
I would like to use the TextExtractor classes to extract text from several files, concatenate this text, and then put it into a part of a document. I have a document that links to several other documents, and those documents all have a part type that contains a file. I want to find the document that links to these documents by searching on the text that is in the files. That is why I think I have to concatenate the text from the files and put it into a part on the first document. Can you explain how I can reuse the TextExtractor? It seems that I have to supply a configuration along with the interface. Where can I find this? Is there a better way to do this?
--
Kind regards,
Bart Van den Abeele
Email: bvda@...
Helpdesk: +32 (09) 389 0560
Personal: +32 (09) 389 0564
_______________________________________________
daisy community mailing list
Professional Daisy support: http://outerthought.org/en/services/daisy/support.html
mail to: daisy@...
list information: http://lists.cocoondev.org/mailman/listinfo/daisy
Re: using TextExtractor
I don't know what configuration you are referring to.
The TextExtractor interface has only two methods:
    List<String> getMimeTypes();
    String getText(InputStream is) throws Exception
which seem pretty clear to me.
The built-in implementations use AbstractTextExtractor, which takes care of registering the text extractor.
Have a look at the code and Spring configuration files under http://svn.daisycms.org/viewsvn/daisy/trunk/daisy/services/textextraction/impl/src/
HTH,
Karel
On Mon, Aug 24, 2009 at 10:22 AM, Bart Van den Abeele <bvda@...> wrote:
> Can you explain how i can reuse the TextExtractor? It seems that i have to
> give a configuration with the interface. Where can i find this? Is there a
> better way to do this?
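For readers following along, here is a minimal sketch of what implementing those two methods could look like. The nested TextExtractor interface is only a local stand-in that copies the two signatures quoted above so the example compiles on its own; PlainTextSketch and PlainTextExtractor are hypothetical names, and the UTF-8 assumption is mine. Registration through AbstractTextExtractor and the Spring configuration is not shown, because its details are not given in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

class PlainTextSketch {

    // Local stand-in mirroring the two methods quoted above. In a real
    // project you would implement Daisy's own TextExtractor interface.
    interface TextExtractor {
        List<String> getMimeTypes();
        String getText(InputStream is) throws Exception;
    }

    // Hypothetical extractor for plain-text files (assumes UTF-8 input).
    static class PlainTextExtractor implements TextExtractor {
        @Override
        public List<String> getMimeTypes() {
            return Collections.singletonList("text/plain");
        }

        @Override
        public String getText(InputStream is) throws Exception {
            // Read the whole stream into memory, then decode it.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = is.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        TextExtractor extractor = new PlainTextExtractor();
        InputStream in = new ByteArrayInputStream(
                "hello daisy".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractor.getMimeTypes() + ": " + extractor.getText(in));
    }
}
```

An extractor written this way has no dependency on the repository, so it can be called directly from client code with any InputStream.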
setting layoutType=plain
Hi,
In my skin's document_to_html.xsl I would like to set 'daisyid'.html?layoutType=plain for specific links. These links point to documents that will appear in a popup box, where I want no left navigation and no header/footer. I see in the docs that "link transformation happens after the document styling". Is there a way to add a parameter such as ?layoutType=plain to a link?
thanks,
-chris
Re: using TextExtractor
Hi,
I got it to work. Thanks!
I see that there is no support for OpenOffice spreadsheets, .ods (vnd.sun.xml.calc), or OpenOffice presentations, .odp. Will this be added?
At the moment it works on the client side, but it is way too heavy. I would like it to run in the background on the repository server. Any ideas on how I could do this? The server already extracts the text, but I have a different case: I would like to aggregate the text of several documents and put it on another document that refers to those documents.
Example: I have a Finance document f1 which refers to Attachment documents a1 (.odt) and a2 (.ppt). I want the text of a1 and a2 concatenated and then set on document f1. The goal is to find document f1 when I search on text that is in a1 or a2.
Kind regards,
Bart Van den Abeele
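The aggregation step described in this message could be sketched roughly as follows. Everything here is illustrative: the nested TextExtractor interface is a local copy of the two-method interface discussed in this thread, Attachment is a hypothetical holder for a file's MIME type and bytes, and utf8Extractor is a placeholder that stands in for real extractors. Fetching the attachments and storing the result on f1 through the Daisy repository API is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AggregationSketch {

    // Local copy of the two-method interface discussed in this thread.
    interface TextExtractor {
        List<String> getMimeTypes();
        String getText(InputStream is) throws Exception;
    }

    // Hypothetical holder for one attachment's file data.
    static class Attachment {
        final String mimeType;
        final byte[] data;

        Attachment(String mimeType, byte[] data) {
            this.mimeType = mimeType;
            this.data = data;
        }
    }

    // Build a lookup table from MIME type to the extractor that handles it.
    static Map<String, TextExtractor> indexByMimeType(List<TextExtractor> extractors) {
        Map<String, TextExtractor> byMime = new HashMap<>();
        for (TextExtractor extractor : extractors) {
            for (String mimeType : extractor.getMimeTypes()) {
                byMime.put(mimeType, extractor);
            }
        }
        return byMime;
    }

    // Concatenate the extracted text of all attachments, one per line block.
    static String aggregate(List<Attachment> attachments,
                            Map<String, TextExtractor> byMime) throws Exception {
        StringBuilder combined = new StringBuilder();
        for (Attachment attachment : attachments) {
            TextExtractor extractor = byMime.get(attachment.mimeType);
            if (extractor == null) {
                continue; // no extractor for this type: skip the attachment
            }
            if (combined.length() > 0) {
                combined.append('\n');
            }
            combined.append(extractor.getText(new ByteArrayInputStream(attachment.data)));
        }
        return combined.toString();
    }

    // Placeholder extractor that simply decodes the bytes as UTF-8.
    static TextExtractor utf8Extractor(final String mimeType) {
        return new TextExtractor() {
            @Override
            public List<String> getMimeTypes() {
                return Arrays.asList(mimeType);
            }

            @Override
            public String getText(InputStream is) throws Exception {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int read;
                while ((read = is.read(chunk)) != -1) {
                    buffer.write(chunk, 0, read);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        Map<String, TextExtractor> byMime =
                indexByMimeType(Arrays.asList(utf8Extractor("text/plain")));
        List<Attachment> attachments = Arrays.asList(
                new Attachment("text/plain", "text of a1".getBytes(StandardCharsets.UTF_8)),
                new Attachment("text/plain", "text of a2".getBytes(StandardCharsets.UTF_8)));
        System.out.println(aggregate(attachments, byMime));
    }
}
```

Attachments with an unsupported MIME type are skipped silently in this sketch; a real job would probably want to log them instead.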
Re: using TextExtractor
On Tue, 2009-08-25 at 16:02 +0200, Bart Van den Abeele wrote:
> I got a Finance-document f1 which refers to an Attachment-document a1 (.odt) and a2 (.ppt). I want the text of a1 and a2 concatenated and then set on document f1. The goal is to find document f1 when I search on text that is in a1 or a2.
First off (from an information science point of view), you need to determine whether a1 and a2 will ONLY be part of f1 and never part of any other document. If this is not the case, then of course you have no way of doing that.
Then, from a Daisy perspective, you could create a special SubAttachment document type with a special field called ParentDocument, which is a link to the parent document. In the case of a1 and a2, ParentDocument would be the ID of f1. Then, in /<yourskin>/document-styling/html/SubAttachment.xsl, instead of displaying the current document, you retrieve the document with ID ParentDocument and display it. I don't know a very smart, Daisy-native way of doing that, but one way would be to use a client-side HTTP redirect.
HTH
Júlio.
Re: using TextExtractor
On Tue, Aug 25, 2009 at 4:02 PM, Bart Van den Abeele <bvda@...> wrote:
> I see that there is no support for open office spreadsheet .ods
> (vnd.sun.xml.calc) or open office presentation .odp. Will this be added?
Nothing of the sort is scheduled. Feel free to create a Jira issue (and, if possible, provide a patch).
AFAICT, the built-in OpenOfficeTextExtractor could work for .ods files without changes, so it would merely be a question of registering the additional MIME types. (It needs to be tested to make sure, though.)
I don't have any suggestions for the link/full-text search problem, unfortunately.
Regards,
Karel
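As an illustration of the "register additional MIME types" idea, one way to experiment without touching the built-in class is a small wrapper that delegates extraction and only widens the advertised MIME type list. This is my own sketch, not how Daisy registers extractors: the nested TextExtractor interface is a local copy of the two methods quoted earlier in the thread, AliasingExtractor is a hypothetical name, and stubOpenOfficeExtractor merely stands in for the real OpenOfficeTextExtractor, whose registered MIME types I am assuming. Whether the real extractor handles .ods content correctly still has to be tested, as noted above.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MimeTypeAliasSketch {

    // Local copy of the two-method interface quoted earlier in the thread.
    interface TextExtractor {
        List<String> getMimeTypes();
        String getText(InputStream is) throws Exception;
    }

    // Wraps an existing extractor and advertises extra MIME types for it.
    static class AliasingExtractor implements TextExtractor {
        private final TextExtractor delegate;
        private final List<String> mimeTypes;

        AliasingExtractor(TextExtractor delegate, List<String> extraMimeTypes) {
            // Start from the delegate's own types, then add any new ones once.
            List<String> all = new ArrayList<>(delegate.getMimeTypes());
            for (String mimeType : extraMimeTypes) {
                if (!all.contains(mimeType)) {
                    all.add(mimeType);
                }
            }
            this.delegate = delegate;
            this.mimeTypes = Collections.unmodifiableList(all);
        }

        @Override
        public List<String> getMimeTypes() {
            return mimeTypes;
        }

        @Override
        public String getText(InputStream is) throws Exception {
            // Extraction itself is left entirely to the wrapped extractor.
            return delegate.getText(is);
        }
    }

    // Stub standing in for the built-in OpenOffice extractor (assumption:
    // it handles the OpenDocument text MIME type).
    static TextExtractor stubOpenOfficeExtractor() {
        return new TextExtractor() {
            @Override
            public List<String> getMimeTypes() {
                return Arrays.asList("application/vnd.oasis.opendocument.text");
            }

            @Override
            public String getText(InputStream is) {
                return "stub text";
            }
        };
    }

    public static void main(String[] args) throws Exception {
        TextExtractor widened = new AliasingExtractor(
                stubOpenOfficeExtractor(),
                Arrays.asList("application/vnd.oasis.opendocument.spreadsheet"));
        System.out.println(widened.getMimeTypes());
    }
}
```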
Re: using TextExtractor
Perhaps you could use http://lucene.apache.org/tika/ to extract text
instead of your own implementation. Did you evaluate this package?
Kind regards,
Bart Van den Abeele
Re: using TextExtractor
I haven't heard of the project before. Daisy predates Tika, so we couldn't have considered it at the time the text extractors were written.
On Wed, Aug 26, 2009 at 11:04 AM, Bart Van den Abeele <bvda@...> wrote:
> Perhaps you could use http://lucene.apache.org/tika/ to extract text instead
> of your own implementation. Did you evaluate this package?