Feature Request for less noisy output

Here we deal with plenty of documents with lots of white space. A usual headache is patches (lines) of streaky noise in between valid text lines.

I have now started to write a filter: I run ocrad with -x, extract the line numbers from the exported file, and store the numbers of the lines with low height. Then I split the OCR-ed text into its lines and purge those lines from the text.

Cumbersome, I thought. And then I had the impression that this might be done much more easily within ocrad, with an option somewhat like

    ocrad -h <height>

that simply suppresses the output of lines with a height of <height> or lower.

In my humble opinion, the file created with -x should still show everything. But the text output would be much cleaner of dirt and dust if any 'line' with a height below a certain threshold were simply dropped when using this option. I am well aware that this would also drop a straight horizontal line, but that would not matter in our case. We fight much more with patches and dots of dirt on the scanner surface, which usually spoil the inter-line white space by adding dots, dashes and underscores to the text.

Uwe

_______________________________________________
Bug-ocrad mailing list
Bug-ocrad@...
http://lists.gnu.org/mailman/listinfo/bug-ocrad
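The external filter described in this message could look roughly like the Python sketch below. It is only an illustration, not the poster's actual script. It assumes that the file written by -x marks each text line with a header of the form "line N chars C height H", that the page holds a single text block (so line numbers do not restart), and that the n-th line of the text output corresponds to the n-th line in the exported file. The function names low_lines and purge are hypothetical.

```python
import re

# Assumed shape of a line header in the file written by `ocrad -x`:
#   "line 3 chars 17 height 24"
LINE_HEADER = re.compile(r"^line\s+(\d+)\s+chars\s+\d+\s+height\s+(\d+)\s*$")


def low_lines(orf_text, max_noise_height):
    """Return the 1-based numbers of lines whose height is
    max_noise_height or lower (hypothetical helper)."""
    low = set()
    for row in orf_text.splitlines():
        match = LINE_HEADER.match(row)
        if match and int(match.group(2)) <= max_noise_height:
            low.add(int(match.group(1)))
    return low


def purge(ocr_text, low):
    """Drop the text lines whose 1-based position is in `low`.
    Assumes a one-to-one mapping between text lines and exported lines."""
    kept = [
        text
        for number, text in enumerate(ocr_text.splitlines(), start=1)
        if number not in low
    ]
    return "\n".join(kept)
```

With the exported file and the text output read into strings, the call would be something like purge(text, low_lines(orf, 5)). If the one-to-one mapping between the two files does not hold for a given page, the line numbers would have to be matched up some other way.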
|
|
Re: Feature Request for less noisy output

I wrote a little processing script which you might find useful.
See http://www.gbnetwork.co.uk/mailscanner/gbpgmdiff/

On Fri, 2007-05-25 at 15:50, Uwe Dippel wrote:
> Here we have to do with plenty documents with lots of white space.
> A usual headache are patches (lines) of streaky noises, in between valid
> text lines.
> I have now started to write a filter; using -x and then extracting the line
> numbers from there, and store the line numbers with low height. Then I split
> the OCR-ed text into its lines to purge those lines from the text.
> Cumbersome, I thought. And then I had the impression that this might be done
> much easier within ocrad; with an option, somewhat like
> ocrad -h <height>
> that simply suppresses the output of lines with a height of <height> or
> lower.
> In my humble opinion, the file created with -x should still show everything.
> But text output would be much cleaner from dirt and dust, when any 'line' of
> a height below a certain threshold is simply dropped when using this option.
> I am well aware, that this would as well drop a straight, horizontal line,
> though, but would not matter in our case. We fight much more with patches
> and dots of dirt on the scanner surface, that usually screw up inter-line
> white-space; adding dots, dashes and underscores into the text.
>
> Uwe
|
|
Re: Feature Request for less noisy output

Hello Uwe,

I think I can implement this as a filter in ocrad. Something like "--filter line_height=MIN[,MAX]", but it will surely need some reorganization of the code because all currently implemented filters are character filters, not line filters. Also the current filters affect the '-x' output.

Regards,
Antonio.
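To make the proposed argument syntax concrete, here is a small Python sketch of how a "line_height=MIN[,MAX]" value might be parsed and applied to a line's height. It is not ocrad code and says nothing about how ocrad would implement the option. Both bounds are assumed to be inclusive, and the function names parse_line_height and keep_line are hypothetical.

```python
def parse_line_height(spec):
    """Parse 'line_height=MIN[,MAX]' into (min_height, max_height or None).
    Hypothetical helper; the option is only a proposal in this thread."""
    name, sep, value = spec.partition("=")
    if name != "line_height" or not sep:
        raise ValueError("expected line_height=MIN[,MAX]")
    parts = value.split(",")
    if len(parts) not in (1, 2):
        raise ValueError("expected one or two comma-separated heights")
    min_height = int(parts[0])
    max_height = int(parts[1]) if len(parts) == 2 else None
    if min_height < 0 or (max_height is not None and max_height < min_height):
        raise ValueError("heights must satisfy 0 <= MIN <= MAX")
    return min_height, max_height


def keep_line(height, min_height, max_height=None):
    """True if a line of this height passes the filter.
    Assumes both bounds are inclusive."""
    if height < min_height:
        return False
    return max_height is None or height <= max_height
```

For example, parse_line_height("line_height=8,40") gives (8, 40), and a 3-pixel streak of dirt would then be rejected by keep_line(3, 8, 40).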
|
|
Re: Feature Request for less noisy output

Antonio Diaz Diaz wrote:
> I think I can implement this as a filter in ocrad. Something like
> "--filter line_height=MIN[,MAX]", but it will surely need some
> reorganization of the code because all currently implemented filters are
> character filters, not line filters.

Yes, that would be useful here. Thanks for looking into it.

> Also the current filters affect the
> '-x' output.

I am not so sure about what I wrote with respect to this. What do you, and what do the others, think? Should -x output all characters or only those above the limit?

Uwe