XmlUpdateRequestHandler with HTMLStripCharFilterFactory


XmlUpdateRequestHandler with HTMLStripCharFilterFactory

by aseem cheema

I am trying to post a document with the following content using SolrJ:
<center>content</center>
I need the XML/HTML tags to be ignored. Even though this works fine in
analysis.jsp, it does not work with SolrJ: the client escapes
< and > as &lt; and &gt;, and HTMLStripCharFilterFactory does not
strip those escaped tags. How can I achieve this? Any ideas will be
highly appreciated.

There is an escapedTags argument in the HTMLStripCharFilterFactory
constructor. Is there a way to get that to work?
Thanks
--
Aseem
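A minimal sketch of the escaping round trip, using only the JDK: escaping `<` and `>` is what any XML writer must do to put markup inside element text, and an XML parser decodes it again on the other side. The helpers `escapeXmlText` and `parseFieldText` and the field name `body` are hypothetical; SolrJ's own XML writer is not shown. Assuming the server decodes the update XML with an ordinary XML parser (as the JDK parser below does), the value that reaches the analysis chain contains the literal tags again.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlRoundTrip {
    // Escape the characters an XML writer must escape in element text.
    // Hypothetical helper for illustration; not SolrJ's actual code.
    public static String escapeXmlText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Parse the payload as a server-side XML parser would and return the
    // decoded text content of the root <field> element.
    public static String parseFieldText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String raw = "<center>content</center>";
        // "body" is a hypothetical field name.
        String wire = "<field name=\"body\">" + escapeXmlText(raw) + "</field>";
        System.out.println(wire);                 // escaped form on the wire
        System.out.println(parseFieldText(wire)); // decoded form seen by the server
    }
}
```

Running `main` prints the escaped payload followed by the decoded `<center>content</center>`, which fits the follow-up in this thread: the escaping on the wire is not what keeps the tags around.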

Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

by aseem cheema

Alright. It turns out that escapedTags is not for what I thought it was for.
The problem I am having with HTMLStripCharFilterFactory is that
it strips the HTML when indexing the field, but not when storing the
field. That is why what I see in analysis.jsp, which shows index-time
analysis, does not match what gets stored: the HTML is
stripped only for indexing. Makes so much sense.

Thanks to Ryan McKinley for clarifying this.
Aseem
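To illustrate the stored-versus-indexed distinction described above, here is a toy model in plain Java, not Lucene or Solr code: a naive regex stands in for HTMLStripCharFilter on the indexed side, while the stored value is returned unchanged. The class and method names are hypothetical, and the regex ignores entities, comments and the other cases the real char filter deals with. The last step in `main` shows one possible workaround, which is an assumption and not something proposed in this thread: strip the markup on the client before handing the value to SolrJ, so the stored copy is clean as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class StoredVsIndexed {
    // Naive stand-in for HTMLStripCharFilter (hypothetical): replaces
    // anything that looks like a tag with a space. No entity decoding.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    public static String stripTags(String s) {
        return TAG.matcher(s).replaceAll(" ");
    }

    // Indexed side: char filter first, then a simple whitespace tokenizer.
    public static List<String> indexedTerms(String fieldValue) {
        List<String> terms = new ArrayList<>();
        for (String t : stripTags(fieldValue).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Stored side: the value is kept exactly as received; analysis
    // (and therefore the char filter) never touches it.
    public static String storedValue(String fieldValue) {
        return fieldValue;
    }

    public static void main(String[] args) {
        String value = "<center>content</center>";
        System.out.println("stored:  " + storedValue(value));  // tags still present
        System.out.println("indexed: " + indexedTerms(value)); // tags gone
        // Hypothetical workaround: strip on the client before building the
        // document, so the stored copy has no markup either.
        String cleaned = stripTags(value).trim();
        System.out.println("stored after client-side strip: " + storedValue(cleaned));
    }
}
```

The design point is simply that the char filter belongs to the analyzer, and only the indexed terms pass through the analyzer; anything that should change the stored value has to happen before the value reaches the field.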

On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema <aseemcheema@...> wrote:

> I am trying to post a document with the following content using SolrJ:
> <center>content</center>
> I need the xml/html tags to be ignored. Even though this works fine in
> analysis.jsp, this does not work with SolrJ, as the client escapes the
> < and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
> strip those escaped tags. How can I achieve this? Any ideas will be
> highly appreciated.
>
> There is escapedTags in HTMLStripCharFilterFactory constructor. Is
> there a way to get that to work?
> Thanks
> --
> Aseem
>


