Re: Digitisation equipment

On Wed, Aug 26, 2009 at 6:27 PM, Andrew Turvey<andrewrturvey@...> wrote:
> We had a discussion at a recent Wikimedia UK board meeting about potentially
> buying some digitisation equipment which could be used to generate content
> for the Wikimedia projects. This recent email to the EN-WP list sparked my
> interest.
>
> Does anyone have any experience with equipment like this, and could you
> recommend anything? Any idea what the price range and quality typically is?
>
> Also, is anyone else in the Wikimedia community currently doing this?

For digitizing what?

Archive.org digitizes books using a pair of Canon 1Ds (? perhaps it was a 5D? In any case the 5DII would be sufficient now) on a custom stand, with a hacked-up copy of gphoto2 to actuate the cameras. Turn the page, click a button... It avoids the stress on the books that a flatbed scanner would add, and is faster to boot.

I'm not sure how they're dealing with curvature (I think they may just lay a glass plate on the pages), but it would be easy enough to solve using a laser pointer with a pattern-generating holographic grating, a second exposure to capture the page distortion, and some fairly simple software processing after the fact.

_______________________________________________
Commons-l mailing list
Commons-l@...
https://lists.wikimedia.org/mailman/listinfo/commons-l
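The "turn the page, click a button" step described above could be sketched as a thin wrapper around the stock gphoto2 command-line tool. This is only an illustration, not Archive.org's modified gphoto2: the port strings, the file naming scheme and the helper names `capture_command` and `shoot_spread` are assumptions made for the example. The options `--port`, `--capture-image-and-download` and `--filename` are standard gphoto2 options.

```python
import subprocess


def capture_command(port, side, spread):
    """Build the gphoto2 command line that fires one camera.

    port   -- gphoto2 port string (hypothetical values are used below;
              `gphoto2 --auto-detect` lists the real ones)
    side   -- "left" or "right", used only in the file name
    spread -- running number of the current opening of the book
    """
    return [
        "gphoto2",
        "--port", port,
        "--capture-image-and-download",
        "--filename", f"{side}_{spread:04d}.jpg",
    ]


def shoot_spread(ports, spread, run=subprocess.run):
    """Trigger every camera once for the current opening.

    `run` is injectable so the logic can be exercised without cameras.
    """
    for side, port in ports.items():
        run(capture_command(port, side, spread), check=True)
```

In use, one would call `shoot_spread({"left": "usb:001,004", "right": "usb:001,005"}, n)` after each page turn, incrementing `n`. The two port strings are placeholders.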
Re: Digitisation equipment

Hi!

Just tried this on Sunday... A 6 MP digital SLR camera plus a tripod is good enough to make images of book pages for OCR. Speed is about 10-15 seconds per page. However, if you want good-quality images, you should use a regular scanner, adjust the page position, and press the book down.

Eugene.

On Wed, Aug 26, 2009 at 3:27 PM, Andrew Turvey<andrewrturvey@...> wrote:
> We had a discussion at a recent Wikimedia UK board meeting about potentially
> buying some digitisation equipment which could be used to generate content
> for the Wikimedia projects. This recent email to the EN-WP list sparked my
> interest.
>
> Does anyone have any experience with equipment like this, and could you
> recommend anything? Any idea what the price range and quality typically is?
>
> Also, is anyone else in the Wikimedia community currently doing this?
>
> Thanks,
>
> ---- Forwarded Message -----
> From: "Steve Bennett" <stevagewp@...>
> To: "English Wikipedia" <wikien-l@...>
> Sent: Sunday, 23 August, 2009 10:55:32 GMT +00:00 GMT Britain, Ireland,
> Portugal
> Subject: Re: [WikiEN-l] Wikipedia reaches 3 millionth article
>
> On Wed, Aug 19, 2009 at 11:15 PM, David Gerard<dgerard@...> wrote:
>> I believe they have machines to turn pages, and something to figure
>> out the distorted photo of the book and render it how it would look as
>> a flat page.
>
> Yeah, there are videos of these machines. The book sits open, the
> scanner comes down and scans both open pages at once. As it goes up
> again, it sucks on one page, causing it to flip over. Then repeat.
>
> Oh, look, here you go:
> http://www.youtube.com/watch?v=hlOQuuLYavY
>
> And while we're at it:
> http://en.wikipedia.org/wiki/Book_scanning
>
> Steve
>
> _______________________________________________
> WikiEN-l mailing list
> WikiEN-l@...
> To unsubscribe from this mailing list, visit:
> https://lists.wikimedia.org/mailman/listinfo/wikien-l
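To put the 10-15 seconds per page quoted above into perspective, the figure can be converted into pages per hour and hours per book. The 500-page book is an assumption chosen purely for illustration, and the estimate ignores setup time, breaks and reshoots.

```python
def pages_per_hour(seconds_per_page):
    """Throughput implied by a given time per page."""
    return 3600 / seconds_per_page


def hours_for_book(pages, seconds_per_page):
    """Pure shooting time for a book, ignoring setup and retakes."""
    return pages * seconds_per_page / 3600


# 500 pages is a hypothetical book length, not a figure from the thread.
for seconds in (10, 15):
    print(f"{seconds} s/page: {pages_per_hour(seconds):.0f} pages/hour, "
          f"{hours_for_book(500, seconds):.1f} h for a 500-page book")
# 10 s/page: 360 pages/hour, 1.4 h for a 500-page book
# 15 s/page: 240 pages/hour, 2.1 h for a 500-page book
```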
Re: Digitisation equipment

On Thu, Aug 27, 2009 at 8:27 AM, Andrew Turvey<andrewrturvey@...> wrote:
> We had a discussion at a recent Wikimedia UK board meeting about potentially
> buying some digitisation equipment which could be used to generate content
> for the Wikimedia projects. This recent email to the EN-WP list sparked my
> interest.
>
> Does anyone have any experience with equipment like this, and could you
> recommend anything? Any idea what the price range and quality typically is?
>
> Also, is anyone else in the Wikimedia community currently doing this?

This came up on the Australian Wikimedia list.

http://lists.wikimedia.org/pipermail/wikimediaau-l/2009-August/002606.html

I think it is terribly inefficient for Wikimedians to start mass scanning projects while we have so few people engaging in transcription projects. Libraries have scanned millions of books, and there are no signs that they are going to stop. Commons and Wikisource should be mining and transcribing these books which are already scanned.

http://lists.wikimedia.org/pipermail/wikimediaau-l/2009-August/002611.html

--
John Vandenberg
Re: Digitisation equipment

Andrew Turvey, 27/08/2009 16:20:
> The concept we were thinking about is linking with municipal archives, saying - we'll scan your records for you if you release them to us copyright-free afterwards. Not sure if it's a runner at the moment, which is why I'm asking the question to see what others have done.

Well, yes. I think that 100,000 € is the basic investment (see IFLA 2009).

> Could you tell me more about the "transcription" tasks? Have we got access to any resources that are awaiting transcription?

Do you mean apart from the 1,593,519 texts on http://www.archive.org/details/texts alone?

Nemo
Re: Digitisation equipment

On Fri, Aug 28, 2009 at 12:20 AM, Andrew Turvey<andrewrturvey@...> wrote:
> Yes, I take your point. However, much of the scanned material is subject to copyright, and the people who've invested in the scanning are often keen to get a return on their investment and not release it to us! The concept we were thinking about is linking with municipal archives, saying - we'll scan your records for you if you release them to us copyright-free afterwards. Not sure if it's a runner at the moment, which is why I'm asking the question to see what others have done.

Organise for these records to be donated to a commons-friendly library or archive, and let them do what they are good at. We are good at tasks that require lots of people.

There is a distinct lack of digitised works in languages other than English, and I can understand Wikimedia chapters taking a leading role in those countries.

> Could you tell me more about the "transcription" tasks? Have we got access to any resources that are awaiting transcription?

Wikisource is the transcription project; there is an abundance of tasks, and not enough people. See my email:

>> http://lists.wikimedia.org/pipermail/wikimediaau-l/2009-August/002611.html

If you want to talk to a local, Charles Matthews is the most active Brit that I can think of quickly.

http://en.wikisource.org/wiki/Special:Contributions/Charles_Matthews

Here are two very important texts which WMUK could push to completion:

http://en.wikisource.org/wiki/Index:The_copyright_act,_1911,_annotated.djvu
http://en.wikisource.org/wiki/Index:A_treatise_upon_the_law_of_copyright.djvu

As far as I know, there is no complete etext of the original 1911 Copyright Act.

http://en.wikipedia.org/wiki/Copyright_Act_1911

--
John Vandenberg
Re: Digitisation equipment

Andrew Turvey wrote:
> We had a discussion at a recent Wikimedia UK board meeting about
> potentially buying some digitisation equipment which could be
> used to generate content for the Wikimedia projects. This recent
> email to the EN-WP list sparked my interest.
>
> Does anyone have any experience with equipment like this, and
> could you recommend anything? Any idea what the price range and
> quality typically is?
>
> Also, is anyone else in the Wikimedia community currently doing
> this?

I'm on the board of Wikimedia Sverige (Sweden), and also the founder (in 1992) of Project Runeberg, the Scandinavian offspring of Project Gutenberg. The Swedish-language Wikisource isn't doing much, because Project Runeberg still does a lot of book scanning. Its archive now contains images of 550,000 book pages, corresponding to nearly 30 linear metres of shelving.

Book digitization is a matter of using the right tools for each job. Much depends on the kind of book and the kind of labour. If you use unpaid volunteers, you can afford slower equipment. If you need to pay your staff, any equipment that speeds up the work will quickly pay for itself, including those very expensive "professional" book scanners. On the scale of Google Book Search, aiming to digitize millions of books, it pays off to let Google engineers work on developing even faster equipment, just like Google develops its own Linux-based storage architecture.

It's hard to measure the usefulness of a digitized book, since Wikisource (and Project Runeberg) don't have any income. (And neither has Google Book Search, I believe.) If your success is measured in how much money you spend (as some charities have it), it is very easy to invest a lot of money without much result. The worst you can do is to spend a lot to digitize something that is already available for free download. Look around first.

You should start by thinking about what you want to achieve. Is there some book, or genre of books, that would be really useful for Wikipedia to have on Wikisource? Anything British that all those American projects haven't already covered? For us from non-English speaking countries, it's far easier. Very little has been digitized, so there is a lot to do.

The most useful thing is to digitize an old encyclopedia, just like that 11th edition of Encyclopaedia Britannica (from 1911). Now, encyclopedias are common items in used bookstores or online auctions. You can buy 20 volumes for 200 euro, or even cheaper. At that price, the best investment is a paper cutter (or ask a print shop to help you) and a two-sided (duplex) sheet-feeding scanner, such as the Canon DR-2050C or Fujitsu ScanSnap S510.

http://www.youtube.com/watch?v=1oH3mQZLpL8

OCR software might be included with the scanner. Or you can buy FineReader (www.finereader.com) for 160 euro. The total investment would be less than 1000 euro (scanner + OCR software + 20-volume encyclopedia + asking a print shop to cut the spines). After this, you only need hours of volunteer work.

That's how I digitized the "New Student's Reference Work" (from 1914, 5 volumes, some 2500 pages) for Wikisource in 2005, only to show that Wikisource could be used that way. Here are some old pictures (with Swedish text, from 2001):

http://runeberg.org/admin/snuff.html

These scanners and that OCR software are not open source products, but neither is my digital camera, and I use that to produce free pictures for Wikimedia Commons. I know there have been many attempts to make free OCR software, but is it any good?

In fact, if you have books where you can't afford to cut the spine, maybe some rare thing that you only find in a library, a 10 megapixel digital camera is very useful. You need to experiment a little with tripod stands and good lights. You only need to open the book at 90 degrees to get a good view of a page, which is much friendlier to the book than an old flatbed scanner (and faster).

If you have two cameras, you can shoot left pages with one, and right pages with the other. That's in fact how the fastest modern "book scanners" work. Google builds their own, and so can you. Some radical ideas are found on http://bkrpr.org/

Again, a pair of digital cameras is a total investment of less than 1000 euro. That's a good starting range. You can achieve a lot, and learn even more, in very little time.

What can you do with 2000 euro? Buy one sheet-feeding set, and one pair of digital cameras. Let two teams compete against each other. Write a report for next year's Wikimania. Have great fun!

--
Lars Aronsson (lars@...)
Aronsson Datateknik - http://aronsson.se

Project Runeberg - free Nordic literature - http://runeberg.org/
Wikimedia Sverige - stöd fri kunskap (support free knowledge) - http://wikimedia.se/
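With the two-camera setup described above, each camera ends up with its own sequence of files, which has to be merged back into reading order before OCR. A minimal sketch of that merge follows, assuming a left-to-right book, exactly one shot per camera per opening, and file lists already in shooting order. The function names and the four-digit naming scheme are illustrative choices, not anything Project Runeberg is known to use.

```python
def interleave_pages(left_shots, right_shots):
    """Merge per-camera shot lists into reading order.

    Opening n contributes left_shots[n] followed by right_shots[n].
    """
    if len(left_shots) != len(right_shots):
        raise ValueError("each opening needs one left and one right shot")
    ordered = []
    for left, right in zip(left_shots, right_shots):
        ordered.extend([left, right])
    return ordered


def page_names(ordered, first_page=1):
    """Map each source file to a sequential page file name."""
    return {shot: f"{first_page + i:04d}.jpg"
            for i, shot in enumerate(ordered)}
```

The length check is there because a single missed exposure on one camera would otherwise silently shift every later page on that side.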
Re: Digitisation equipment

Gregory Maxwell wrote:

> For digitizing what?

Exactly, that's the first question.

> Archive.org digitizes books using a pair of canon 1Ds (? perhaps
> it was a 5D? In any case the 5DII would be sufficient now) on a
> custom stand with a hacked up copy of gphoto2 to actuate the
> cameras.

That's Brewster Kahle doing things many years ago (2002? 2003?). Today, a much cheaper low-end digital SLR, or even a compact camera, will give you the needed 10 or so megapixels. But again, if you need to pay your staff, a ten times more expensive camera might easily pay for itself in increased speed or increased shutter lifespan.

> I'm not sure how they're dealing with curvature (I think they
> just may lay a glass plate on the pages), but it would be easy
> enough to solve using a laser pointer with a pattern generating
> holographic grating and a second exposure to capture the page
> distortion and some fairly simple software processing after the
> fact.

The Internet Archive apparently uses a fixed glass, and lowers the book cradle to turn pages, http://aipengineering.com/scribe/

Other designs have a fixed book cradle and lift the glass, e.g. the Atiz DIY, http://diy.atiz.com/

I thought the Internet Archive design was very clever, since it keeps a fixed distance from lens to book surface (beneath the glass), until I saw the bkrpr.org design, where you just lift everything. That's a design for 2009! I haven't tried to build one myself yet.

----

However, you can capture lots of books (those that can be opened fully) with a single camera, laying the book flat on a table with a glass on top. That's just like a flatbed scanner (but much faster) turned upside down.

In January 2008, I used a 10 megapixel Canon EOS 400D (Digital Rebel XTi) with a 50 mm lens to shoot this, lying flat on a table under a glass, http://runeberg.org/stridfin/0226.html

On that webpage, the image is reduced to 120 dpi (1.2 megapixel), but the original is 300 dpi (7.5 megapixel). The map shown is reused in http://en.wikipedia.org/wiki/Battle_of_Alavus

That's an example of how one specialized book can be very useful for a limited Wikiproject. This book was published in 1909 for the 100th anniversary of the Finnish War (1808-1809), and digitized in 2008 for the 200th anniversary.

--
Lars Aronsson (lars@...)
Aronsson Datateknik - http://aronsson.se

Project Runeberg - free Nordic literature - http://runeberg.org/
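The dpi and megapixel figures quoted above are consistent with each other, because pixel count scales with the square of the resolution: (120/300)² × 7.5 ≈ 1.2. A short sketch of that relationship follows. The 3888 × 2592 sensor size used in the usage note is an assumption about the EOS 400D, not a figure from the thread.

```python
def megapixels(width_in, height_in, dpi):
    """Pixels needed to cover an area (in inches) at a given resolution."""
    return (width_in * dpi) * (height_in * dpi) / 1e6


def rescaled_megapixels(mp, old_dpi, new_dpi):
    """Pixel count after resampling the same area to a new resolution."""
    return mp * (new_dpi / old_dpi) ** 2


def coverage_inches(px_wide, px_high, dpi):
    """Largest area a sensor can capture at a given resolution."""
    return px_wide / dpi, px_high / dpi


# The 7.5 megapixel original at 300 dpi, reduced to 120 dpi:
print(round(rescaled_megapixels(7.5, 300, 120), 2))  # 1.2
```

Under the assumed sensor size, `coverage_inches(3888, 2592, 300)` gives about 13 × 8.6 inches, which suggests why a 10 megapixel camera is enough for most book pages at 300 dpi.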
Re: Digitisation equipment

I love the fact that you can achieve such high-quality results with relatively cheap equipment. For many archives, I think it is probably easier to motivate people to scan pages manually than it is for us, but chapters and individual Wikimedians could probably be of much help with all the technical aspects, uploading material to Commons / Wikisource, getting the word out to other people, etc.

-- Hay

On Sat, Aug 29, 2009 at 6:10 AM, Lars Aronsson<lars@...> wrote:
> [...]