Re: PDFText Plugin for PDF file scoring - not for PDF images

by Theo Van Dinter

On Sat, Jul 14, 2007 at 09:54:36AM -0300, James MacLean wrote:
> Where do I find information on hooking into post_message_parse()? Tried
> grepping in the module area with no luck :(. Certainly agree it would be
> better to get the text out and let everyone at it :).

You can ask. :)  But yes, I didn't do a good job of documenting how this
is supposed to work: you have to know about the plugin call, then hunt
around Message and Message::Node, etc.  Sorry.  Here are the basics:

First, create a plugin with a post_message_parse() method.  In there,
use $msg->find_parts() to find the parts that you're looking for
(find_parts() is pretty well documented).  Then take the data from
$part->decode(), convert it to text however you like, and hand the
result back by calling $part->set_rendered($text).

Later on, when SA looks for the text to use for body rules, URI parsing,
etc., it picks up any part that has rendered text, including the ones
your plugin set.

So here's a quick n' dirty sample that takes parts of "image/theo" and
"renders" them into "The plugin works!\n":

------------
package Mail::SpamAssassin::Plugin::RenderExample;

use Mail::SpamAssassin::Plugin;
use strict;
use warnings;

use vars qw(@ISA);
@ISA = qw(Mail::SpamAssassin::Plugin);

sub new {
  my $class = shift;
  my $mailsaobject = shift;
  $class = ref($class) || $class;
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);
  return $self;
}

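# Called by SA once the message has been parsed.
sub post_message_parse {
  my ($self, $opts) = @_;
  my $msg = $opts->{'message'};
  # The second argument (1) restricts the search to leaf parts.
  foreach my $p ( $msg->find_parts(qr!^image/theo$!, 1) ) {
    # Hand SA the text to use for this part.
    $p->set_rendered("The plugin works!\n");
  }
}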

1;
------------
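
The sample above skips the decode step, so here is a rough sketch of a variant that goes through all three calls: find_parts(), decode(), then set_rendered(). The plugin name RenderDecoded, the application/pdf match, and the _bytes_to_text() helper are assumptions made up for illustration; the helper only keeps runs of printable ASCII and stands in for whatever real converter you would use.

```perl
package Mail::SpamAssassin::Plugin::RenderDecoded;   # hypothetical plugin name

use strict;
use warnings;
use Mail::SpamAssassin::Plugin;

our @ISA = qw(Mail::SpamAssassin::Plugin);

sub new {
  my ($class, $mailsaobject) = @_;
  $class = ref($class) || $class;
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);
  return $self;
}

# Hypothetical converter: a placeholder for a real bytes-to-text step.
# It keeps runs of four or more printable ASCII characters, one per line.
sub _bytes_to_text {
  my ($data) = @_;
  my @runs = $data =~ /([\x20-\x7e]{4,})/g;
  return join("\n", @runs) . "\n";
}

sub post_message_parse {
  my ($self, $opts) = @_;
  my $msg = $opts->{'message'};

  # The second argument (1) restricts the search to leaf parts.
  foreach my $p ( $msg->find_parts(qr!^application/pdf$!, 1) ) {
    my $data = $p->decode();              # decoded body of this part
    next unless defined $data && length $data;
    $p->set_rendered(_bytes_to_text($data));
  }
}

1;
```

To try something like this, you would load it from a config file with a line such as "loadplugin Mail::SpamAssassin::Plugin::RenderDecoded /path/to/RenderDecoded.pm" (the path is a placeholder) and then run "spamassassin --lint" to check that it loads cleanly.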

--
Randomly Selected Tagline:
"I'm a programmer: I don't buy software, I write it." - Tom Christiansen

