Xapian: Term too long

Xapian: Term too long

by Tero Tilus-2

sup-sync blows up like this:

/home/terotil/src/sup/lib/sup/xapian_index.rb:446:in `replace_document': InvalidArgumentError: Term too long (> 245): Lfwd: =?iso-8859-1?q?tekij=e4n_oikeudet=5d?= (ArgumentError)
x-enigmail-version: 0.92.0.0
content-type: multipart/mixed;
 boundary="------------010606010007070802040301"
x-virus-scanned: amavisd-new at cc.jyu.fi
x-spam-status: no, hits=-2.373 required=5 tests=[awl=0.226, bayes_00=-2.599
        from /home/terotil/src/sup/lib/sup/xapian_index.rb:446:in `sync_message'
        from /usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
        from /home/terotil/src/sup/lib/sup/xapian_index.rb:363:in `synchronize'
        from /home/terotil/src/sup/lib/sup/xapian_index.rb:440:in `sync_message'
        from /home/terotil/src/sup/lib/sup/xapian_index.rb:92:in `add_message'
        from /home/terotil/src/sup/bin/sup-sync:211
        ...

The relevant part of the problematic mail looks like this:

User-Agent: Debian Thunderbird 1.0.6 (X11/20050802)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: mutikainen@...
Subject: [Fwd: =?ISO-8859-1?Q?tekij=E4n_oikeudet=5D?=
X-Enigmail-Version: 0.92.0.0
Content-Type: multipart/mixed;
 boundary="------------010606010007070802040301"
X-Virus-Scanned: amavisd-new at cc.jyu.fi
X-Spam-Status: No, hits=-2.373 required=5 tests=[AWL=0.226, BAYES_00=-2.599]
X-Spam-Level:
X-Sorted: Whitelist
Content-Length: 11892

This is how I worked around it for now:

diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index ad45b0e..d3b3e25 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -443,7 +443,11 @@ EOS
         warn "docid underflow, dropping #{m.id.inspect}"
         return
       end
-      @xapian.replace_document docid, doc
+      begin
+        @xapian.replace_document docid, doc
+      rescue StandardError => err
+        warn "Failed to add message #{m.id.inspect} to Xapian index: #{err}"
+      end
     end
 
     m.labels.each { |l| LabelManager << l }

It looks like lib/sup/xapian_index.rb tries to override
Xapian::Document#add_term with a version that drops overly long
terms.  But you can't override a class's existing methods just by
including a module: methods defined in the including class take
precedence over methods from the included module.

terotil@sotka:~$ irb
> class Foo; def bar; :bar; end; end
=> nil
> module Baz; def bar; :baz; end; end
=> nil
> class Foo; include Baz; end
=> Foo
> Foo.new.bar
=> :bar
> Foo.ancestors
=> [Foo, Baz, Object, Kernel]  # Foo before Baz, methods in Foo take priority

It is still Foo#bar being called, not Baz#bar.  To override the
method you need to reopen Xapian::Document and use alias method
chaining.  Or you could use tricks like
http://coderrr.wordpress.com/2008/10/29/secure-alias-method-chaining/
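
A minimal sketch of the difference, runnable without Xapian.  FakeDocument
is a hypothetical stand-in for Xapian::Document, and the 245 limit is taken
from the error message above; the real class comes from the Xapian bindings
and may behave differently in detail.

```ruby
# Hypothetical stand-in for Xapian::Document, so the sketch runs without Xapian.
class FakeDocument
  attr_reader :terms

  def initialize
    @terms = []
  end

  # Mimics the failure from the backtrace: over-long terms raise.
  def add_term(term)
    raise ArgumentError, "Term too long (> 245)" if term.length > 245
    @terms << term
  end
end

MAX_TERM_LENGTH = 245  # assumed value, taken from the error message

# Attempt 1: include a module that defines add_term.
# FakeDocument#add_term is found first, so this version is never called.
module DocumentMethods
  def add_term(term)
    super if term.length <= MAX_TERM_LENGTH
  end
end

class FakeDocument
  include DocumentMethods
end

doc = FakeDocument.new
error_with_include = begin
  doc.add_term "x" * 300
  nil
rescue ArgumentError => e
  e.message  # the original method still raised
end

# Attempt 2: reopen the class and chain through an alias.
class FakeDocument
  alias old_add_term add_term

  def add_term(term)
    if term.length <= MAX_TERM_LENGTH
      old_add_term term
    else
      warn "dropping excessively long term (#{term.length} characters)"
    end
  end
end

doc.add_term "x" * 300   # now only warns
doc.add_term "short"     # passed through to the original method
```

After the second step the long term is dropped with a warning and only
"short" ends up in doc.terms.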

--
Tero Tilus ## 050 3635 235 ## http://tero.tilus.net/
_______________________________________________
sup-talk mailing list
sup-talk@...
http://rubyforge.org/mailman/listinfo/sup-talk

Re: Xapian: Term too long

by William Morgan-4

Reformatted excerpts from Tero Tilus's message of 2009-10-12:
> Looks like lib/sup/xapian_index.rb tries to override
> Xapian::Document#add_term with a version which is wired to ditch too
> long terms.  Only that you can't override methods just by including a
> module.  Methods of the including class override methods in included
> module.

Very good point. Thanks!
--
William <wmorgan-sup@...>

[PATCH] xapian: replace DocumentMethods module with plain monkeypatching

by Rich Lane

---
 lib/sup/xapian_index.rb |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index e1cfe65..c373c17 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -560,7 +560,32 @@ EOS
       raise "Invalid term type #{type}"
     end
   end
+end
 
 end
 
+class Xapian::Document
+  def entry
+    Marshal.load data
+  end
+
+  def entry=(x)
+    self.data = Marshal.dump x
+  end
+
+  def index_text text, prefix, weight=1
+    term_generator = Xapian::TermGenerator.new
+    term_generator.stemmer = Xapian::Stem.new(Redwood::XapianIndex::STEM_LANGUAGE)
+    term_generator.document = self
+    term_generator.index_text text, weight, prefix
+  end
+
+  alias old_add_term add_term
+  def add_term term
+    if term.length <= Redwood::XapianIndex::MAX_TERM_LENGTH
+      old_add_term term
+    else
+      warn "dropping excessively long term #{term}"
+    end
+  end
 end
--
1.6.4.2


Re: [PATCH] xapian: replace DocumentMethods module with plain monkeypatching

by Rich Lane

Disregard this one. (I thought master had already gotten my
update-message-state patch.)

Excerpts from Rich Lane's message of Tue Oct 20 01:34:37 -0400 2009:

> ---
>  lib/sup/xapian_index.rb |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
>
> diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
> index e1cfe65..c373c17 100644
> --- a/lib/sup/xapian_index.rb
> +++ b/lib/sup/xapian_index.rb
> @@ -560,7 +560,32 @@ EOS
>        raise "Invalid term type #{type}"
>      end
>    end
> +end
>  
>  end
>  
> +class Xapian::Document
> +  def entry
> +    Marshal.load data
> +  end
> +
> +  def entry=(x)
> +    self.data = Marshal.dump x
> +  end
> +
> +  def index_text text, prefix, weight=1
> +    term_generator = Xapian::TermGenerator.new
> +    term_generator.stemmer = Xapian::Stem.new(Redwood::XapianIndex::STEM_LANGUAGE)
> +    term_generator.document = self
> +    term_generator.index_text text, weight, prefix
> +  end
> +
> +  alias old_add_term add_term
> +  def add_term term
> +    if term.length <= Redwood::XapianIndex::MAX_TERM_LENGTH
> +      old_add_term term
> +    else
> +      warn "dropping excessively long term #{term}"
> +    end
> +  end
>  end

[PATCH] xapian: replace DocumentMethods module with plain monkeypatching

by Rich Lane

---
 lib/sup/xapian_index.rb |   47 ++++++++++++++++++++++-------------------------
 1 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/lib/sup/xapian_index.rb b/lib/sup/xapian_index.rb
index ad45b0e..34d67d5 100644
--- a/lib/sup/xapian_index.rb
+++ b/lib/sup/xapian_index.rb
@@ -565,35 +565,32 @@ EOS
       raise "Invalid term type #{type}"
     end
   end
+end
 
-  module DocumentMethods
-    def entry
-      Marshal.load data
-    end
-
-    def entry=(x)
-      self.data = Marshal.dump x
-    end
+end
 
-    def index_text text, prefix, weight=1
-      term_generator = Xapian::TermGenerator.new
-      term_generator.stemmer = Xapian::Stem.new(STEM_LANGUAGE)
-      term_generator.document = self
-      term_generator.index_text text, weight, prefix
-    end
+class Xapian::Document
+  def entry
+    Marshal.load data
+  end
 
-    def add_term term
-      if term.length <= MAX_TERM_LENGTH
-        super term
-      else
-        warn "dropping excessively long term #{term}"
-      end
-    end
+  def entry=(x)
+    self.data = Marshal.dump x
   end
-end
 
-end
+  def index_text text, prefix, weight=1
+    term_generator = Xapian::TermGenerator.new
+    term_generator.stemmer = Xapian::Stem.new(Redwood::XapianIndex::STEM_LANGUAGE)
+    term_generator.document = self
+    term_generator.index_text text, weight, prefix
+  end
 
-class Xapian::Document
-  include Redwood::XapianIndex::DocumentMethods
+  alias old_add_term add_term
+  def add_term term
+    if term.length <= Redwood::XapianIndex::MAX_TERM_LENGTH
+      old_add_term term
+    else
+      warn "dropping excessively long term #{term}"
+    end
+  end
 end
--
1.6.4.2
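
For readers on Ruby 2.0 or later (the backtrace earlier in this thread shows
Ruby 1.8, where this is not available), Module#prepend is another way to keep
the override in a module, because a prepended module sits in front of the
class in the lookup chain and can call super.  The sketch below is not part
of the patch; StubDocument and TermLengthGuard are hypothetical names, and
it assumes the 245 limit is counted in bytes, which is why it uses bytesize
rather than length.

```ruby
# Hypothetical stand-in for Xapian::Document, so the sketch runs without Xapian.
class StubDocument
  attr_reader :terms

  def initialize
    @terms = []
  end

  def add_term(term)
    raise ArgumentError, "Term too long (> 245)" if term.bytesize > 245
    @terms << term
  end
end

# Hypothetical guard module; prepended, so its add_term is found first.
module TermLengthGuard
  MAX_TERM_LENGTH = 245  # assumed to be a byte limit

  def add_term(term)
    if term.bytesize <= MAX_TERM_LENGTH
      super  # falls through to StubDocument#add_term
    else
      warn "dropping excessively long term #{term[0, 40]}..."
    end
  end
end

class StubDocument
  prepend TermLengthGuard
end

guarded = StubDocument.new
guarded.add_term "short"
guarded.add_term "x" * 300        # too long: dropped with a warning
guarded.add_term "\u00e4" * 200   # 200 characters but 400 bytes in UTF-8: dropped
```

With prepend, StubDocument.ancestors starts with TermLengthGuard, the reverse
of the include case discussed above.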


Re: [PATCH] xapian: replace DocumentMethods module with plain monkeypatching

by William Morgan-4

Branch xapian-bugfix, merged into next. Thanks!
--
William <wmorgan-sup@...>