HTMLStripCharFilterFactory not working when using SolrJ java client

Hey Guys,

I have the HTMLStripCharFilterFactory char filter declared in my schema.xml for fieldType text (code below). I am using this field type for the body field of my schema. I am seeing different behavior when I use SolrJ to post a document (code below) and when I use analysis.jsp. The text I am putting in the field is <center>content</center>.

When SolrJ is used, the field gets the whole value <center>content</center>, but when analysis.jsp is used, it shows only "content" being used for the field.

What could I be doing wrong here? How do I get HTMLStripCharFilterFactory to work even when I am pushing data using SolrJ?

Your help is highly appreciated.
Thanks
--
Aseem

############# schema.xml ######################
<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

################## SolrJ Code ######################
CommonsHttpSolrServer server = new CommonsHttpSolrServer(
        "http://aseem.desktop.amazon.com:8983/solr/sharepoint");
SolrInputDocument doc = new SolrInputDocument();
UpdateRequest req = new UpdateRequest();
doc.addField("url", "http://haha.com");
// doc.addField("body", sbr.toString());  // earlier test, commented out (sbr is not defined here)
doc.addField("body", "<center>content</center>");
req.add(doc);
// Commit as part of the same request (waitFlush = false, waitSearcher = false).
req.setAction(ACTION.COMMIT, false, false);
UpdateResponse resp = req.process(server);
System.out.println(resp);
Re: HTMLStripCharFilterFactory not working when using SolrJ java client

I printed the UpdateRequest object (getXML) and the XML is:

<add><doc boost="1.0"><field name="url">http://haha.com</field><field name="body">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

I can see that the issue is that the HTML/XML < and > are replaced by &lt; and &gt;. I understand that this is required to keep them from interfering with the Solr XML document, but how do I accomplish what I want? I need the HTML in the body field stripped out.

Any help is highly appreciated.
Thanks
Aseem

On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema <aseemcheema@...> wrote:
> Hey Guys,
> I have HTMLStripCharFilterFactory char filter declared in my
> schema.xml for fieldType text (code below).
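The escaping of < and > seen in the getXML output is ordinary XML escaping, and it is reversible. The sketch below uses only the JDK's DOM and Transformer APIs (not SolrJ's own XML writer, and `XmlEscapeRoundTrip` is a hypothetical name) to build an add message shaped like the one printed above, then parse it back. Assuming the receiving side reads the message with a standard XML parser, the field text it sees is the original markup, not the escaped form.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlEscapeRoundTrip {

    // Build <add><doc><field name="body">VALUE</field></doc></add> and serialize it.
    // The serializer escapes '<' in text content as "&lt;".
    static String buildAddXml(String bodyValue) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element add = doc.createElement("add");
        Element solrDoc = doc.createElement("doc");
        Element field = doc.createElement("field");
        field.setAttribute("name", "body");
        field.setTextContent(bodyValue);
        solrDoc.appendChild(field);
        add.appendChild(solrDoc);
        doc.appendChild(add);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parse the message back and return the text of the first <field> element.
    static String readBodyField(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("field").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = buildAddXml("<center>content</center>");
        System.out.println(xml);                // escaped form, as on the wire
        System.out.println(readBodyField(xml)); // prints <center>content</center>
    }
}
```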
Re: HTMLStripCharFilterFactory not working when using SolrJ java client

The HTMLStripCharFilterFactory class has a constructor that accepts escapedTags. I believe this will solve my problem, but I am not sure how to pass it from the schema.xml file. I have tried

<charFilter class="solr.HTMLStripCharFilterFactory" escapedTags="<,>"/>

but that didn't work. Anybody?

Thanks

On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema <aseemcheema@...> wrote:
> Hey Guys,
> I have HTMLStripCharFilterFactory char filter declared in my
> schema.xml for fieldType text (code below).
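As a point of reference for the escapedTags question: in later Lucene/Solr releases, the escapedTags option of HTMLStripCharFilter and its factory appears to be documented as a set of tag *names* (for example `escapedTags="a, title"`) whose start and end tags are passed through instead of stripped. Whether the factory in the Solr version used in this thread reads that attribute at all is not established here. The sketch below is a toy, regex-based stand-in (hypothetical class `ToyEscapedTagsStripper`, not Lucene's code) that only illustrates name-based exemption; under that assumption, a value such as "<,>" names no tag and so exempts nothing.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical toy stripper with an "escaped tags" set of tag NAMES.
 * Illustrates the idea only; this is not Lucene's HTMLStripCharFilter.
 */
public class ToyEscapedTagsStripper {

    // Matches a start or end tag and captures its name, e.g. "<center>", "</b>", "<br/>".
    private static final Pattern TAG = Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static String strip(String input, Set<String> escapedTags) {
        Matcher m = TAG.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            // Keep tags whose name is in the set; replace all others with a space.
            String replacement = escapedTags.contains(name) ? m.group() : " ";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Parse a comma-separated attribute value such as "b, center" into lowercase tag names.
    static Set<String> parseTagList(String attr) {
        Set<String> names = new HashSet<>();
        for (String s : attr.split(",")) {
            String n = s.trim().toLowerCase(Locale.ROOT);
            if (!n.isEmpty()) {
                names.add(n);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String body = "<center>content</center>";
        System.out.println(strip(body, parseTagList("center")).equals(body)); // prints true
        System.out.println(strip(body, parseTagList("<,>")).trim());          // prints content
    }
}
```

Note that exempting a tag keeps it in the text, which is the opposite of stripping it.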