FYI - same books - new files

Greetings!

To those who downloaded the May 2008 OCR version of the OS/VS2 MVS Overview manual, this is to let you know that the page images have been re-OCRed with what I think is a better end result, so you may feel like downloading the newer file.

To those who downloaded the April 2007 OCR version of the OS/360 PL/I LRM, this is to let you know that a new PDF has been cut with JPEG image compression, which reduces the 25 million bytes to 6.75 million without any other noticeable changes, so you may deem a new download also opportune.

Both of these PDFs have been linearized for fast web view, and can be downloaded from http://www.prycroft6.com.au/misc/

Cheers,
Greg
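As a quick check of the figures quoted above, here is a minimal Python sketch of the arithmetic. The two byte counts come from the post; the 1 Mbit/s link speed is purely an assumption for illustration, and the function names are made up for this example.

```python
# Byte counts quoted in the post for the OS/360 PL/I LRM PDF.
ORIGINAL_BYTES = 25_000_000
RECOMPRESSED_BYTES = 6_750_000

# Assumption for illustration only (not from the post): a 1 Mbit/s link.
ASSUMED_LINK_BITS_PER_SECOND = 1_000_000


def size_ratio(new_bytes, old_bytes):
    """Return the new size as a fraction of the old size."""
    return new_bytes / old_bytes


def download_seconds(size_bytes, bits_per_second):
    """Return the idealized transfer time, ignoring protocol overhead."""
    return size_bytes * 8 / bits_per_second


ratio = size_ratio(RECOMPRESSED_BYTES, ORIGINAL_BYTES)
print(f"New file is {ratio:.0%} of the original size")
print(f"Old download: {download_seconds(ORIGINAL_BYTES, ASSUMED_LINK_BITS_PER_SECOND):.0f} s")
print(f"New download: {download_seconds(RECOMPRESSED_BYTES, ASSUMED_LINK_BITS_PER_SECOND):.0f} s")
```

The ratio works out to a little over a quarter of the original size. The download times only matter for a full download; with a linearized PDF the first page can be shown before the rest of the file arrives.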
Re: FYI - same books - new files

> Greg Price wrote:
> <snip>
> To those who downloaded the April 2007 OCR version of
> the OS/360 PL/I LRM, this is to let you know that a new
> PDF has been cut with JPEG image compression, which
> reduces the 25 million bytes to 6.75 million without any
> other noticeable changes, so you may deem a new download
> also opportune.
> Both of these PDFs have been linearized for fast web view,
> and can be downloaded from
> http://www.prycroft6.com.au/misc/

There were 2 reasons DjVu was chosen for softlib.org: size of files and fast web view. You mention both these reasons above, but concerning PDF files.

My scans of manuals are at 600 DPI and I am curious as to the comparison between DjVu and the method you mention above. Do you have a reference? Want a test TIFF bundle of a manual?

Phil
Re: Re: FYI - same books - new files

On Mon, 24 Aug 2009 01:46:54 -0000, halfmeg wrote:
>There were 2 reasons DjVu was chosen for softlib.org, size of files and fast web view. You mention both these reasons above, but concerning PDF files.
>My scans of manuals are at 600 DPI and I am curious as to the comparison between DjVu and the method you mention above. Do you have a reference? Want a test Tiff bundle of a manual?
>Phil

Hi Phil,

I may have waffled on about this before, so apologies if this is all a repeat.

I am interested in having searchable manuals which look (on the screen) much like the original. Ideally some magic piece of software would digest the scanned images of the original pages, and be able to reproduce them in character form in the correct fonts and matching page layout. Using the text with appropriate display attributes means that the data can be stored in much less space than pictures of each page could be.

The first manual I wanted searchable was the PL/I LRM, so I could easily find out which bits of the current PL/I were in the OS/360 version and which bits weren't. Although I could OCR the PL/I LRM, the re-rendering of the recognized text by the package I used was so unimpressive that I decided the only usable form was to keep the original page images for the display and hide the text words behind them. Hence the original 25 million byte searchable version.

If you just clicked on the link, the whole 25 million bytes had to be downloaded before the first page was shown. Yuk. But the plan was that interested folks could "Save linked file as...".

With a new release of the OCR package I can now save the same book in about a quarter of the space and it looks the same to me, and the first page is now shown quickly if accessed off the web. So, I decided to issue an FYI, although I still expect downloads for subsequent local reference rather than using the web each time the book is to be accessed.

The 370 manuals seem to render much better than the 360 manuals, so the Overview manuals are not "pictures over text" but just the actual text, meaning that images (other than non-text figures) are not stored in the PDF.

I also wanted a system codes manual that was searchable. I got it from your DjVu hosting and used that DLL (under Cygwin?) to get the TIFFs, which I then OCRed. The result is the one on my site, which is again just text and not bitmaps (in the main).

I do agree that the DjVu accessing of manuals is faster than accessing PDFs (i.e. smaller files), but I don't think the softlib.org books are searchable by text content, hence my modest effort at OCRing.

If the scan quality is not good or the original book had flaws, I have to spend time editing bitmaps to reconstruct the text characters, or to make the pictures black if the scans are in greyscale. Quite a tedious process really.

I don't know which manuals are the most important to have as searchable. I have Supervisor SPL and maybe even System Messages on my plate. After that I could trawl softlib to see what I think might be good to get the TIFFs for. Of course, smaller manuals take less time to process than larger ones, but then usually the need for text searching is proportionally smaller.

And then I have a ton of paper books which aren't even scanned yet...

So, what's the most important manual to have in a searchable form?

Cheers,
Greg
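For readers wondering what "page image for display, text words hidden behind it" can look like inside a PDF, here is a minimal Python sketch that builds one page's content stream. It is only an illustration under stated assumptions, not a description of what the OCR package used above actually emits. It relies on the PDF text rendering mode 3 (text neither filled nor stroked, so invisible but still searchable). The resource names `Im0` and `F1`, the function names, and the word positions are hypothetical; a real file would also need the page, font, and image objects around this stream.

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def page_content_stream(image_name, page_width, page_height, words):
    """Build a content stream: invisible OCR words, then the page image.

    words is a list of (text, x, y, font_size) tuples in page units.
    image_name and the font name F1 are hypothetical resource names that
    would have to exist in the page's resource dictionary.
    """
    parts = ["BT", "3 Tr"]  # 3 Tr selects invisible text rendering
    for text, x, y, font_size in words:
        parts.append(f"/F1 {font_size} Tf")
        parts.append(f"1 0 0 1 {x} {y} Tm")  # position this word
        parts.append(f"({escape_pdf_string(text)}) Tj")
    parts.append("ET")
    # Scale the unit-square image to cover the whole page and paint it.
    parts.append(f"q {page_width} 0 0 {page_height} 0 0 cm /{image_name} Do Q")
    return "\n".join(parts)


# Made-up word positions on a 612 x 792 point page.
print(page_content_stream("Im0", 612, 792, [("PL/I (F)", 72, 700, 10)]))
```

A viewer shows only the scanned image, while a text search matches the hidden words and highlights the region where they sit. The cost is the one noted above: every page still carries a bitmap, so file size depends mostly on how the images are compressed.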
Re: FYI - same books - new files

> Greg Price wrote:
> <snip>
> I may have waffled on about this before, so apologies
> if this is all a repeat.

A refresher helps, at least for me it does.

> I am interested in having searchable manuals which look
> (on the screen) much like the original. Ideally some magic
> piece of software would digest the scanned images of the
> original pages, and be able to reproduce them in character
> form in the correct fonts and matching page layout.
> Using the text with appropriate display attributes means
> that the data can be stored in much less space than
> pictures of each page could be.
> The first manual I wanted searchable was the PL/I LRM
> so I could easily find out which bits of the current PL/I
> were in the OS/360 version and which bits weren't.
> Although I could OCR the PL/I LRM the re-rendering
> of the recognized text by the package I used was so
> unimpressive that I decided the only usable form was
> to keep the original page images for the display and
> hide the text words behind them.
> Hence the original 25 million byte searchable version.
> If you just clicked on the link the whole 25 million
> bytes had to be downloaded before the first page
> was shown. Yuk. But the plan was that interested
> folks could "Save linked file as...".
> With a new release of the OCR package I can now
> save the same book in about a quarter of the space
> and it looks the same to me, and the first page is
> now shown quickly if accessed off the web. So, I
> decided to issue an FYI although I still expect downloads
> for subsequent local reference rather than using the
> web each time the book is to be accessed.
> The 370 manuals seem to render much better than
> the 360 manuals, so the Overview manuals are not
> "pictures over text" but just the actual text meaning
> that images (other than non-text figures) are not
> stored in the PDF.
> I also wanted a system codes manual that was searchable.
> I got it from your DjVu hosting and used that DLL (under
> Cygwin?) to get the TIFFs which I then OCRed. The
> result is the one on my site which is again just text
> and not bitmaps (in the main).
> I do agree that the DjVu accessing of manuals is
> faster than accessing PDFs (i.e. smaller files), but I
> don't think the softlib.org books are searchable by
> text content, hence my modest effort at OCRing.

There is a method, which I haven't mastered yet, to use ABBYY FineReader, which I purchased, to OCR the scans and provide DjVu with searchable text files.

> If the scan quality is not good or the original book
> had flaws I have to spend time editing bitmaps
> to reconstruct the text characters, or to make the
> pictures black if the scans are in greyscale. Quite
> a tedious process really.
> I don't know which manuals are the most important
> to have as searchable. I have Supervisor SPL and
> maybe even System Messages on my plate. After
> that I could trawl softlib to see what I think might
> be good to get the TIFFs for. Of course, smaller
> manuals take less time to process than larger ones,
> but then usually the need for text searching is
> proportionally smaller.

Let me know, on list or off, which TIFFs you might want to mess with. The round trip from TIFF to DjVu and back to TIFF does introduce a difference, as the encoding is in 'lossy' mode.

> And then I have a ton of paper books which aren't
> even scanned yet...
> So, what's the most important manual to have in
> a searchable form?

Unknown currently.

Phil