Ocrad on Windows

Ocrad on Windows

by smontanaro

I'm one of the developers of SpamBayes (http://www.spambayes.org/).  A
frequent source of spam these days is messages with essentially no text (or
random gibberish) and one or more GIF images containing a pitch for
cheap pharmaceuticals or penny stocks.

I recently added code to use Ocrad to extract the text from these images:

    http://mail.python.org/pipermail/spambayes-dev/2006-August/003697.html
    http://mail.python.org/pipermail/spambayes-dev/2006-August/003699.html
    http://mail.python.org/pipermail/spambayes-dev/2006-August/003715.html

Even though ocrad doesn't do a great job at extracting human-readable text
from these images, it does a good enough job, and I expect it will get
better over time.  For this technique to be broadly useful in the SpamBayes
community, it will need to be available on Windows.  A couple developers
have compiled ocrad on Windows using cygwin with one small code change
("std::fprintf" -> "fprintf").  Can we distribute that executable on the
SpamBayes SF site (or convince you to do so) so that we can get Windows
users to test out my new additions?

Related to that, is there any interest in making an OCR library which can be
linked into other applications instead of requiring the program to be run?

Thanks,

--
Skip Montanaro - skip@... - http://www.mojam.com/
"On the academic side, effort is too often expended on finding precise
answers to the wrong questions." Baxter & Rennie, in "Financial Calculus"


_______________________________________________
Bug-ocrad mailing list
Bug-ocrad@...
http://lists.gnu.org/mailman/listinfo/bug-ocrad

Re: Ocrad on Windows

by Bugzilla from ant_diaz@teleline.es

skip@... wrote:
> For this technique to be broadly useful in the SpamBayes
> community, it will need to be available on Windows.  A couple developers
> have compiled ocrad on Windows using cygwin with one small code change
> ("std::fprintf" -> "fprintf").  Can we distribute that executable on the
> SpamBayes SF site (or convince you to do so) so that we can get Windows
> users to test out my new additions?

Of course you may distribute the executable, as long as you also
distribute the modified source as required by the GPL.

I have never tried ocrad with spam images, but I suppose they are
created to be seen on a monitor, and perhaps the text size is too small
for ocrad. Did you try to enlarge images with the --scale option?


> Related to that, is there any interest in making an OCR library which can be
> linked into other applications instead of requiring the program to be run?

Short answer: no. Ocrad is currently too experimental to support a
consistent library interface based on it.


Best regards,
Antonio.


Re: Ocrad on Windows

by smontanaro


    Antonio> skip@... wrote:
    >> For this technique to be broadly useful in the SpamBayes community,
    >> it will need to be available on Windows.  A couple developers have
    >> compiled ocrad on Windows using cygwin with one small code change
    >> ("std::fprintf" -> "fprintf").  Can we distribute that executable on
    >> the SpamBayes SF site (or convince you to do so) so that we can get
    >> Windows users to test out my new additions?

    Antonio> Of course you may distribute the executable, as long as you
    Antonio> also distribute the modified source as required by the GPL.

Cool.  Once we're set up I'll send you a pointer as a courtesy.  We have no
intention of forking Ocrad, we just want to make it available so Windows
users can help us test recent changes to SpamBayes' scoring.

    Antonio> I have never tried ocrad with spam images, but I suppose they
    Antonio> are created to be seen on a monitor, and perhaps the text size
    Antonio> is too small for ocrad. Did you try to enlarge images with the
    Antonio> --scale option?

So far I've only run it with no command line args.  A quick check with one
image I have lying about suggests that scaling up by two or three should
help recognition a bit.  I'll do some more tests with it.  Thanks for the
suggestion.

    >> Related to that, is there any interest in making an OCR library which
    >> can be linked into other applications instead of requiring the
    >> program to be run?

    Antonio> Short answer: no. Ocrad is currently too experimental to
    Antonio> support a consistent library interface based on it.

Not a problem.  We're in the early stages of our endeavour as well.

Skip


Re: Ocrad on Windows

by smontanaro


    Antonio> Of course you may distribute the executable, as long as you
    Antonio> also distribute the modified source as required by the GPL.

Done.  I added a release named "ocrad-cygwin" here:

    http://sourceforge.net/project/showfiles.php?group_id=61702

It's a simple zip file containing Ocrad 0.15, the compiled ocrad.exe file
and a patch for the source.

    Antonio> I have never tried ocrad with spam images, but I suppose they
    Antonio> are created to be seen on a monitor, and perhaps the text size
    Antonio> is too small for ocrad. Did you try to enlarge images with the
    Antonio> --scale option?

Scaling by a factor of two helped quite a bit.  Thanks again for the
suggestion.
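
For anyone who wants to try the same thing, here is a minimal sketch of
calling ocrad with the --scale option from Python.  It is not the actual
SpamBayes code.  The function names (build_ocrad_command, ocr_image) are
hypothetical, and it assumes that an "ocrad" executable is on the PATH and
that the image has already been converted from GIF to a PNM file, since
ocrad reads pbm/pgm/ppm input.

```python
import subprocess


def build_ocrad_command(image_path, scale=2, ocrad_exe="ocrad"):
    """Return the argument list for one ocrad run (hypothetical helper).

    Assumes image_path names a PNM file and that ocrad_exe is on the PATH.
    """
    if not isinstance(scale, int) or scale < 1:
        raise ValueError("scale must be a positive integer")
    cmd = [ocrad_exe]
    if scale > 1:
        # Enlarge small, screen-sized text before recognition.
        cmd.append("--scale=%d" % scale)
    cmd.append(image_path)
    return cmd


def ocr_image(image_path, scale=2, ocrad_exe="ocrad"):
    """Run ocrad and return whatever text it writes to standard output."""
    result = subprocess.run(
        build_ocrad_command(image_path, scale, ocrad_exe),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ocrad exits non-zero
    )
    return result.stdout
```

Calling ocr_image("spam.pnm", scale=2) would run "ocrad --scale=2 spam.pnm"
and return the recognized text.  Keeping the command construction separate
from the subprocess call makes it easy to compare scale factors without
touching the rest of the code.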

Skip

