Can't render html entities when adding documents

There's something funky with solr-ruby's xml processing when adding documents, but I don't really know what it is yet. It can't process html entities at all, not even an html blank space "&nbsp;":

    SEVERE: org.xmlpull.v1.XmlPullParserException: could not resolve entity named 'nbsp' (position: START_TAG seen ... to participate and contribute to the Open Source Community. ... @1:1085)

Please look into it as soon as possible. acts_as_solr is using solr-ruby as its backend, so it cannot have buggy behavior.

Thanks.

--
Thiago Jackiw
acts_as_solr => http://acts-as-solr.railsfreaks.com
|
|
Re: Can't render html entities when adding documents

On 6/19/07, Thiago Jackiw <tjackiw@...> wrote:
> There's something funky with solr-ruby's xml processing when adding
> documents, but I don't really know what it is yet. It can't process
> html entities at all, not even an html blank space "&nbsp;":

nbsp is not a default XML entity.
Try replacing it with &#160;

-Yonik
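Yonik's point can be sketched in Ruby. XML predefines only amp, lt, gt, quot and apos, so a named HTML entity such as &nbsp; has to be sent as a numeric character reference (&#160; for the non-breaking space) or be escaped. The helper name and the tiny lookup table below are hypothetical and not part of solr-ruby; a real table would need to cover every entity your content uses.

```ruby
# Hypothetical helper (not part of solr-ruby): rewrite named HTML entities
# as numeric character references, which any XML parser accepts.

# Assumption: a deliberately tiny table; extend it for the entities you use.
HTML_ENTITY_CODEPOINTS = {
  'nbsp'   => 160,
  'copy'   => 169,
  'eacute' => 233
}.freeze

# The five entities XML itself predefines; these are left untouched.
XML_PREDEFINED = %w[amp lt gt quot apos].freeze

def numeric_entities(text)
  text.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |entity|
    name = Regexp.last_match(1)
    if XML_PREDEFINED.include?(name) || !HTML_ENTITY_CODEPOINTS.key?(name)
      entity # predefined or unknown: leave as-is
    else
      format('&#%d;', HTML_ENTITY_CODEPOINTS[name])
    end
  end
end

puts numeric_entities('Open&nbsp;Source &amp; caf&eacute;')
# prints: Open&#160;Source &amp; caf&#233;
```

Unlike escaping the ampersand, this keeps the character itself in the index, so a search for 'nbsp' would not be expected to match it.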
|
|
Re: Can't render html entities when adding documents

I was getting the same XmlPullParserException from Solr while using solr-ruby to index HTML.

I solved things by running text through the html_escape() method in ERB::Util before submitting to Solr.

In the console, the following generates the XmlPullParserException in Solr, which manifests itself as a Net::HTTPFatalError in solr-ruby:

    Solr::Connection.new('http://localhost:8083/solr', :autocommit => :on).add(:id => 1, :value_t => '&nbsp;')
    Net::HTTPFatalError: 500...XmlPullParserException...

But escaping the characters with html_escape (aliased as the h() method by default) works like a charm:

    include ERB::Util
    Solr::Connection.new('http://localhost:8083/solr', :autocommit => :on).add(:id => 1, :value_t => h('&nbsp;'))
    => true

Subsequently, searching for strings like 'nbsp' returns hits on those escaped entities, which may or may not be what you want:

    >> Solr::Connection.new(SOLR_URL, :autocommit => :on).query('value_t:nbsp').hits
    => [{"score"=>10.771498, "id"=>1, "value_t"=>"&nbsp;"}]

If you don't want searches for 'nbsp' to return all documents with escaped non-breaking spaces, the solution lies in defining some new fieldtype in solr/conf/schema.xml

-Aaron Suggs

On 6/19/07, Yonik Seeley <yonik@...> wrote:
> nbsp is not a default XML entity.
> Try replacing it with &#160;
>
> -Yonik
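To see why this workaround gets past the parser, here is a minimal sketch that uses only Ruby's standard ERB::Util, with no Solr connection involved. Escaping turns the ampersand itself into &amp;, so the XML parser sees the literal text "&nbsp;" rather than an entity reference it cannot resolve. The variable names are illustrative.

```ruby
require 'erb'

raw = 'Open&nbsp;Source'

# html_escape (alias: h) escapes &, <, > and quote characters.
escaped = ERB::Util.html_escape(raw)

puts escaped
# prints: Open&amp;nbsp;Source
```

The trade-off is the one noted above: the field now holds the characters "&nbsp;" as ordinary text, so a query for 'nbsp' can match it.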
|
|
Re: Can't render html entities when adding documents

On 6/19/07, Yonik Seeley <yonik@...> wrote:
> nbsp is not a default XML entity.
> Try replacing it with &#160;

Even though the current Solr behavior is correct, I'm practical over purist... if we could find a way to seed the XML parser with common HTML entities, I don't think I'd be opposed to it.

-Yonik
|
|
Re: Can't render html entities when adding documents

What's interesting is that in the previous versions of acts_as_solr (without solr-ruby) the html entities were getting indexed fine without passing through ERB's html_escape method. That's what I did as a fast fix before starting this thread.

Did anything change in Solr 1.2 with regard to xml parsing? And I guess I should try the previous version of the acts_as_solr plugin with Solr 1.2 to see if I get the same error.

--
Thiago Jackiw
acts_as_solr => http://acts-as-solr.railsfreaks.com

On 6/19/07, Aaron Suggs <aaron@...> wrote:
> I was getting the same XmlPullParserException from Solr while using
> solr-ruby to index HTML.
>
> I solved things by running text through the html_escape() method in
> ERB::Util before submitting to Solr.
>
> In the console, the following generates the XmlPullParserException in
> Solr, which manifests itself as a Net::HTTPFatalError in solr-ruby:
>
> Solr::Connection.new('http://localhost:8083/solr', :autocommit =>
> :on).add(:id => 1, :value_t => '&nbsp;')
> Net::HTTPFatalError: 500...XmlPullParserException...
>
> But escaping the characters with html_escape (aliased as the h()
> method by default) works like a charm:
>
> include ERB::Util
> Solr::Connection.new('http://localhost:8083/solr', :autocommit =>
> :on).add(:id => 1, :value_t => h('&nbsp;'))
> => true
>
> Subsequently, searching for strings like 'nbsp' returns hits on those
> escaped entities, which may or may not be what you want:
> >> Solr::Connection.new(SOLR_URL, :autocommit => :on).query('value_t:nbsp').hits
> => [{"score"=>10.771498, "id"=>1, "value_t"=>"&nbsp;"}]
>
> If you don't want searches for 'nbsp' to return all documents with
> escaped non-breaking spaces, the solution lies in defining some new
> fieldtype in solr/conf/schema.xml
>
> -Aaron Suggs
|
|
Re: Can't render html entities when adding documents

Replying to my own post, I just tried Solr 1.2 with the previous two versions of acts_as_solr and it worked great, so I'm pretty sure this is a solr-ruby issue. I'll do some more testing with the way solr-ruby adds documents to Solr.

--
Thiago Jackiw
acts_as_solr => http://acts-as-solr.railsfreaks.com

On 6/19/07, Thiago Jackiw <tjackiw@...> wrote:
> What's interesting is that in the previous versions of acts_as_solr
> (without solr-ruby) the html entities were getting indexed fine
> without passing through ERB's html_escape method. That's what I did as
> a fast fix before starting this thread.
>
> Did anything change in Solr 1.2 with regard to xml parsing? And I guess
> I should try the previous version of the acts_as_solr plugin with Solr
> 1.2 to see if I get the same error.
>
> --
> Thiago Jackiw
> acts_as_solr => http://acts-as-solr.railsfreaks.com
|
|
Re: Can't render html entities when adding documents

Thiago,

I'll have to look late this week/weekend if I get a chance then, but how did acts_as_solr create the XML passed to Solr? I think you used my original hack for that communication which used REXML, right? solr-ruby now supports both REXML and libxml2 - and I've found that libxml2 does things properly whereas REXML was screwing things up.

I suspect we can come up with a simple test case that shows where things are wacky. If you can submit one of those I'll be glad to look into this as soon as I can (this weekend at the earliest).

Erik

On Jun 20, 2007, at 2:06 AM, Thiago Jackiw wrote:
> Replying to my own post, I just tried Solr 1.2 with the previous two
> versions of acts_as_solr and it worked great, so I'm pretty
> sure this is a solr-ruby issue. I'll do some more testing with the way
> solr-ruby adds documents to Solr.
>
> --
> Thiago Jackiw
> acts_as_solr => http://acts-as-solr.railsfreaks.com
|
|
Re: Can't render html entities when adding documents

Firstly: REXML Sucks!

good grief: <http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/161603>

    Text.new("&nbsp;",false,nil,false).to_s
    => "&nbsp;"

I've added this currently failing test to server_test.rb:

    def test_entities
      @connection.add(:id => 1, :title_text => "&nbsp;")
      response = @connection.query('nbsp')
      assert_equal 1, response.total_hits
      assert_equal '1', response.hits[0]['id']
    end

This works fine with libxml, but fails with REXML because of REXML's ridiculous escape-everything-not-already-escaped policy. At the moment I'm not sure how to resolve this, and I'm not currently sure how acts_as_solr worked with REXML any differently. Thiago - can you shed any light on that?

My vote is to get rid of REXML support in solr-ruby and either require libxml-ruby to be installed or find some other lighter weight replacement.

Thoughts?

Erik

On Jun 19, 2007, at 9:55 PM, Thiago Jackiw wrote:
> There's something funky with solr-ruby's xml processing when adding
> documents, but I don't really know what it is yet. It can't process
> html entities at all, not even an html blank space "&nbsp;":
>
> SEVERE: org.xmlpull.v1.XmlPullParserException: could not resolve
> entity named 'nbsp' (position: START_TAG seen ... to participate and
> contribute to the Open Source Community. ... @1:1085)
>
> Please look into it as soon as possible, acts_as_solr is using
> solr-ruby as the backend it cannot have a buggy behavior.
>
> Thanks.
>
> --
> Thiago Jackiw
> acts_as_solr => http://acts-as-solr.railsfreaks.com
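The "escape everything not already escaped" policy can be illustrated without REXML at all. The two functions below are hypothetical stand-ins, not REXML's actual code: one escapes every ampersand, the other escapes only ampersands that do not already look like the start of an entity reference. The second policy passes &nbsp; through unchanged, and a parser with no DTD to resolve it would then be expected to reject the document.

```ruby
# Illustration only: these helpers are hypothetical, not REXML's source.

# Policy A: treat the input as plain text and escape every ampersand.
def escape_all(text)
  text.gsub('&', '&amp;')
end

# Policy B: escape an ampersand only when it does not already start
# something shaped like an entity reference (&name; or &#123;).
def escape_unless_entity(text)
  text.gsub(/&(?!#?[A-Za-z0-9]+;)/, '&amp;')
end

['&nbsp;', 'AT&T', '&amp;'].each do |input|
  puts "#{input.inspect}: all=#{escape_all(input).inspect} " \
       "unless_entity=#{escape_unless_entity(input).inspect}"
end
```

Policy B is convenient for input that is already XML-escaped, but it cannot tell an intentional HTML entity from plain text that happens to look like one.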
|
|
Re: Can't render html entities when adding documents

Shedding more light on the REXML issue, with Ruby 1.8.6 it works!

    irb(main):003:0> REXML::Text.new("&nbsp;",false,nil,false).to_s
    => "&amp;nbsp;"
    irb(main):004:0> REXML::Text.new("&",false,nil,false).to_s
    => "&amp;"

So, do we require a higher version of Ruby (I was using 1.8.4 before)? Or.. ?

Erik

On Jun 24, 2007, at 11:02 AM, Erik Hatcher wrote:
> Firstly: REXML Sucks!
>
> good grief: <http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/161603>
>
> Text.new("&nbsp;",false,nil,false).to_s
> => "&nbsp;"
>
> I've added this currently failing test to server_test.rb:
>
> def test_entities
>   @connection.add(:id => 1, :title_text => "&nbsp;")
>   response = @connection.query('nbsp')
>   assert_equal 1, response.total_hits
>   assert_equal '1', response.hits[0]['id']
> end
>
> This works fine with libxml, but fails with REXML because of
> REXML's ridiculous escape-everything-not-already-escaped policy.
> At the moment I'm not sure how to resolve this, and I'm not
> currently sure how acts_as_solr worked with REXML any differently.
> Thiago - can you shed any light on that?
>
> My vote is to get rid of REXML support in solr-ruby and either
> require libxml-ruby to be installed or find some other lighter
> weight replacement.
>
> Thoughts?
>
> Erik
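If you want to check which behavior your own Ruby has, a short script along these lines should do it. It assumes the rexml library can be loaded (it ships with Ruby, as a bundled gem in newer releases) and passes the same arguments to REXML::Text.new as the irb session above; the variable name and messages are illustrative.

```ruby
require 'rexml/document'

# Same arguments as above: respect_whitespace=false, parent=nil, raw=false.
escaped = REXML::Text.new('&nbsp;', false, nil, false).to_s

if escaped == '&amp;nbsp;'
  puts 'REXML escapes the ampersand; Solr receives the literal text &nbsp;'
else
  puts "REXML produced #{escaped.inspect}; the entity may reach Solr unescaped"
end
```

Running this before indexing is a cheap way to decide whether an extra escaping step is needed on a given installation.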