Digitisation equipment

by Andrew Turvey

We had a discussion at a recent Wikimedia UK board meeting about potentially buying some digitisation equipment which could be used to generate content for the Wikimedia projects. This recent email to the EN-WP list sparked my interest.

Does anyone have any experience with equipment like this, and could you recommend anything? Any idea what the price range and quality typically is?

Also, is anyone else in the Wikimedia community currently doing this?

Thanks,

---- Forwarded Message -----
From: "Steve Bennett" <stevagewp@...>
To: "English Wikipedia" <wikien-l@...>
Sent: Sunday, 23 August, 2009 10:55:32 GMT +00:00 GMT Britain, Ireland, Portugal
Subject: Re: [WikiEN-l] Wikipedia reaches 3 millionth article

On Wed, Aug 19, 2009 at 11:15 PM, David Gerard<dgerard@...> wrote:
> I believe they have machines to turn pages, and something to figure
> out the distorted photo of the book and render it how it would look as
> a flat page.

Yeah, there are videos of these machines. The book sits open, the
scanner comes down and scans both open pages at once. As it goes up
again, it sucks on one page, causing it to flip over. Then repeat.

Oh, look, here you go:
http://www.youtube.com/watch?v=hlOQuuLYavY

And while we're at it:
http://en.wikipedia.org/wiki/Book_scanning

Steve

_______________________________________________
WikiEN-l mailing list
WikiEN-l@...
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l

_______________________________________________
Commons-l mailing list
Commons-l@...
https://lists.wikimedia.org/mailman/listinfo/commons-l

Re: Digitisation equipment

by Gregory Maxwell

On Wed, Aug 26, 2009 at 6:27 PM, Andrew
Turvey<andrewrturvey@...> wrote:
> We had a discussion at a recent Wikimedia UK board meeting about potentially
> buying some digitisation equipment which could be used to generate content
> for the Wikimedia projects. This recent email to the EN-WP list sparked my
> interest.
>
> Does anyone have any experience with equipment like this, and could you
> recommend anything? Any idea what the price range and quality typically is?
>
> Also, is anyone else in the Wikimedia community currently doing this?

For digitizing what?

Archive.org digitizes books using a pair of Canon 1Ds (? perhaps it
was a 5D? In any case the 5DII would be sufficient now) on a custom
stand with a hacked-up copy of gphoto2 to actuate the cameras.

Turn the page, click a button...   It avoids the stress on the books
that a flatbed scanner would add and is faster to boot.
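
As a rough illustration of the "turn the page, click a button" loop, here is a minimal Python sketch that triggers two cameras through the stock gphoto2 command-line tool. It is not Archive.org's software (their modified gphoto2 is not described here). The function names, the file naming scheme and the USB port strings are assumptions made for illustration; real port strings would come from `gphoto2 --auto-detect`.

```python
import subprocess
from typing import List


def build_capture_command(port: str, filename: str) -> List[str]:
    """Build the gphoto2 command that fires one camera and downloads the frame."""
    return [
        "gphoto2",
        "--port", port,                  # which camera to address
        "--capture-image-and-download",  # trigger the shutter, then fetch the file
        "--filename", filename,          # where to store the downloaded image
    ]


def capture_spread(left_port: str, right_port: str, spread: int) -> List[str]:
    """Photograph one open spread with both cameras and return the filenames."""
    filenames = []
    for side, port in (("left", left_port), ("right", right_port)):
        filename = f"{side}_{spread:04d}.jpg"
        # check=True raises CalledProcessError if gphoto2 reports a failure.
        subprocess.run(build_capture_command(port, filename), check=True)
        filenames.append(filename)
    return filenames
```

Calling `capture_spread("usb:001,004", "usb:001,005", 1)` after each page turn would, under these assumptions, leave `left_0001.jpg` and `right_0001.jpg` in the working directory.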

I'm not sure how they're dealing with curvature (I think they just may
lay a glass plate on the pages), but it would be easy enough to solve
using a laser pointer with a pattern generating holographic grating
and a second exposure to capture the page distortion and some fairly
simple software processing after the fact.

Re: Digitisation equipment

by Eugene Zelenko

Hi!

Just tried this Sunday...

A 6MP digital SLR camera on a tripod is good enough to photograph book
pages for OCR. Speed is ~ 10-15 seconds per page.
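
To put the per-page figure in perspective, a small Python sketch can turn it into a time estimate for a whole book. The 300-page book is a hypothetical example, not a figure from this thread, and the function name is an assumption.

```python
def scanning_minutes(pages: int, seconds_per_page: float) -> float:
    """Estimate the total capture time in minutes for a book."""
    return pages * seconds_per_page / 60


# Hypothetical 300-page book at the quoted 10-15 seconds per page.
fastest = scanning_minutes(300, 10)
slowest = scanning_minutes(300, 15)
print(f"{fastest:.0f} to {slowest:.0f} minutes")  # prints "50 to 75 minutes"
```

The estimate covers capture only; cropping, OCR and proofreading are not included.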

However, if you want good quality images, you should use a regular
scanner, adjust the page position, and press the book flat.

Eugene.

Re: Digitisation equipment

by John Vandenberg

On Thu, Aug 27, 2009 at 8:27 AM, Andrew
Turvey<andrewrturvey@...> wrote:
> We had a discussion at a recent Wikimedia UK board meeting about potentially
> buying some digitisation equipment which could be used to generate content
> for the Wikimedia projects. This recent email to the EN-WP list sparked my
> interest.
>
> Does anyone have any experience with equipment like this, and could you
> recommend anything? Any idea what the price range and quality typically is?
>
> Also, is anyone else in the Wikimedia community currently doing this?

This came up on the Australian Wikimedia list.

http://lists.wikimedia.org/pipermail/wikimediaau-l/2009-August/002606.html

I think it is terribly inefficient for Wikimedians to start mass
scanning projects while we have so few people engaging in
transcription projects.  Libraries have scanned millions of books, and
there are no signs that they are going to stop.  Commons and Wikisource
should be mining and transcribing these books which are already
scanned.

http://lists.wikimedia.org/pipermail/wikimediaau-l/2009-August/002611.html

--
John Vandenberg

Re: Digitisation equipment

by Andrew Turvey

Yes, I take your point. However, much of the scanned material is subject to copyright, and the people who've invested in the scanning are often keen to get a return on their investment and not release it to us! The concept we were thinking about is linking with municipal archives, saying - we'll scan your records for you if you release them to us copyright-free afterwards. Not sure if it's a runner at the moment, which is why I'm asking the question to see what others have done.

Could you tell me more about the "transcription" tasks? Have we got access to any resources that are awaiting transcription?

Thanks

Re: Digitisation equipment

by Federico Leva (Nemo)

Andrew Turvey, 27/08/2009 16:20:
> The concept we were thinking about is linking with municipal
> archives, saying - we'll scan your records for you if you release them
> to us copyright-free afterwards. Not sure if it's a runner at the
> moment, which is why I'm asking the question to see what others have done.

Well, yes. I think that 100,000 € is the basic investment (see IFLA 2009).

> Could you tell me more about the "transcription" tasks? Have we got access to any resources that are awaiting transcription?

Do you mean apart from the 1,593,519 texts on
http://www.archive.org/details/texts alone?

Nemo

Re: Digitisation equipment

by John Vandenberg

On Fri, Aug 28, 2009 at 12:20 AM, Andrew
Turvey<andrewrturvey@...> wrote:
> Yes, I take your point. However, much of the scanned material is subject to copyright, and the people who've invested in the scanning are often keen to get a return on their investment and not release it to us! The concept we were thinking about is linking with municipal archives, saying - we'll scan your records for you if you release them to us copyright-free afterwards. Not sure if it's a runner at the moment, which is why I'm asking the question to see what others have done.

Organise for these records to be donated to a commons-friendly library
or archive, and let them do what they are good at.  We are good at
tasks that require lots of people.

There is a distinct lack of digitised works in languages other than
English, and I can understand Wikimedia chapters taking a leading role
in those countries.

> Could you tell me more about the "transcription" tasks? Have we got access to any resources that are awaiting transcription?

Wikisource is the transcription project; there is an abundance of
tasks, and not enough people.
See my email:

>> http://lists.wikimedia.org/pipermail/wikimediaau-l/2009-August/002611.html

If you want to talk to a local, Charles Matthews is the most active
Brit that I can think of quickly.

http://en.wikisource.org/wiki/Special:Contributions/Charles_Matthews

Here are two very important texts which WMUK could push to completion:

http://en.wikisource.org/wiki/Index:The_copyright_act,_1911,_annotated.djvu
http://en.wikisource.org/wiki/Index:A_treatise_upon_the_law_of_copyright.djvu

As far as I know, there is no complete etext of the original 1911 Copyright Act.

http://en.wikipedia.org/wiki/Copyright_Act_1911

--
John Vandenberg

Re: Digitisation equipment

by Lars Aronsson

Andrew Turvey wrote:

> We had a discussion at a recent Wikimedia UK board meeting about
> potentially buying some digitisation equipment which could be
> used to generate content for the Wikimedia projects. This recent
> email to the EN-WP list sparked my interest.
>
> Does anyone have any experience with equipment like this, and
> could you recommend anything? Any idea what the price range and
> quality typically is?
>
> Also, is anyone else in the Wikimedia community currently doing
> this?

I'm on the board of Wikimedia Sverige (Sweden), and also the
founder (in 1992) of Project Runeberg, the Scandinavian offspring
of Project Gutenberg.  The Swedish-language Wikisource isn't doing
much, because Project Runeberg still does a lot of book scanning.
Its archive now contains images of 550,000 book pages,
corresponding to nearly 30 linear metres of shelving.

Book digitization is a matter of using the right tools for each
job.  Much depends on the kind of book and the kind of labour. If
you use unpaid volunteers, you can afford slower equipment. If you
need to pay your staff, any equipment that speeds up the work will
quickly pay its own cost, including those very expensive
"professional" book scanners.  On the scale of Google Book Search,
aiming to digitize millions of books, it pays off to let Google
engineers work on developing even faster equipment, just like
Google develops its own Linux-based storage architecture.

It's hard to measure the usefulness of a digitized book, since
Wikisource (and Project Runeberg) don't have any income.  (And
neither does Google Book Search, I believe.) If your success is
measured in how much money you spend (as some charities have it),
it is very easy to invest a lot of money, without much result.
The worst you can do is to spend a lot to digitize something that
is already available for free download.  Look around first.

You should start by thinking about what you want to achieve.  Is
there some book, or genre of books, that would be really useful
for Wikipedia to have on Wikisource?  Anything British that all
those American projects haven't already covered?  For us from
non-English speaking countries, it's far easier.  Very little has
been digitized, so there is a lot to do.  The most useful thing is
to digitize an old encyclopedia, just like that 11th edition of
Encyclopaedia Britannica (from 1911).

Now, encyclopedias are common items in used bookstores or online
auctions.  You can buy 20 volumes for 200 euro, or even cheaper.
At that price, the best investment is a paper cutter (or ask a
print shop to help you) and a two-sided (duplex) sheet-feeding
scanner, such as the Canon DR-2050C or Fujitsu Scansnap s510.
http://www.youtube.com/watch?v=1oH3mQZLpL8

OCR software might be included with the scanner. Or you can buy
www.finereader.com for 160 euro.

The total investment would be less than 1000 euro (scanner + OCR
software + 20 volume encyclopedia + ask a print shop to cut the
spines).  After this, you only need hours of volunteer work.

That's how I digitized the "New Student's Reference Work" (from
1914, 5 volumes, some 2500 pages) for Wikisource in 2005, only to
show that Wikisource could be used that way.

Here are some old pictures (with Swedish text, from 2001),
http://runeberg.org/admin/snuff.html

These scanners and that OCR software are not open source products,
but neither is my digital camera, and I use that to produce free
pictures for Wikimedia Commons.  I know there have been many
attempts to make free OCR software, but is it any good?

In fact, if you have books where you can't afford to cut the
spine, maybe some rare thing that you only find in a library, a 10
megapixel digital camera is very useful. You need to experiment a
little with tripod stands and good lights.  You only need to open
the book at 90 degrees, to get a good view of a page, which is
much friendlier to the book than an old flatbed scanner (and
faster).  If you have two cameras, you can shoot left pages with
one, and right pages with the other.  That's in fact how the
fastest modern "book scanners" work.  Google builds their own, and
so can you.  Some radical ideas are found on http://bkrpr.org/
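
With one camera per side, the two sets of shots have to be merged back into reading order afterwards. A minimal Python sketch of that step follows, assuming each camera's images are already listed in shooting order and that every spread was photographed by both cameras. The function name and filenames are hypothetical.

```python
from typing import List


def interleave_pages(left_shots: List[str], right_shots: List[str]) -> List[str]:
    """Merge left-page and right-page images into reading order."""
    if len(left_shots) != len(right_shots):
        raise ValueError("each spread needs one left and one right shot")
    ordered = []
    for left, right in zip(left_shots, right_shots):
        # Within a spread, the left page is read before the right page.
        ordered.extend([left, right])
    return ordered


pages = interleave_pages(
    ["left_0001.jpg", "left_0002.jpg"],
    ["right_0001.jpg", "right_0002.jpg"],
)
print(pages)
# ['left_0001.jpg', 'right_0001.jpg', 'left_0002.jpg', 'right_0002.jpg']
```

Raising an error on mismatched counts is a deliberate choice here: a skipped shot on one camera would otherwise silently shift every later page.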

Again, a pair of digital cameras is a total investment of less
than 1000 euro.  That's a good starting range.  You can achieve a
lot, and learn even more, in very little time.

What can you do with 2000 euro?  Buy one sheet-feeding set, and
one pair of digital cameras.  Let two teams compete against each
other. Write a report for next year's Wikimania. Have great fun!


--
  Lars Aronsson (lars@...)
  Aronsson Datateknik - http://aronsson.se

  Project Runeberg - free Nordic literature - http://runeberg.org/

  Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/

Re: Digitisation equipment

by Lars Aronsson

Gregory Maxwell wrote:

> For digitizing what?

Exactly, that's the first question.

> Archive.org digitizes books using a pair of Canon 1Ds (? perhaps
> it was a 5D? In any case the 5DII would be sufficient now) on a
> custom stand with a hacked-up copy of gphoto2 to actuate the
> cameras.

That's Brewster Kahle doing things many years ago (2002? 2003?).
Today, a much cheaper low-end digital SLR, or even compact cameras
will give you the needed 10 or so megapixels.  But again, if you
need to pay your staff, a ten times more expensive camera might
easily pay its own cost in increased speed, or increased shutter
lifespan.

> I'm not sure how they're dealing with curvature (I think they
> just may lay a glass plate on the pages), but it would be easy
> enough to solve using a laser pointer with a pattern generating
> holographic grating and a second exposure to capture the page
> distortion and some fairly simple software processing after the
> fact.

The Internet Archive apparently uses a fixed glass, and lowers the
book cradle to turn pages, http://aipengineering.com/scribe/

Other designs have a fixed book cradle and lift the glass, e.g.
the Atiz DIY, http://diy.atiz.com/

I thought the Internet Archive design was very clever, since it
keeps a fixed distance from lens to book surface (beneath the
glass), until I saw the bkrpr.org design, where you just lift
everything. That's a design for 2009! I haven't tried to build one
myself yet.

----

However, you can capture lots of books (that can be opened fully)
with a single camera, laying the book flat on a table with a glass
on top.  That's just like a flatbed scanner (but much faster)
turned upside down.

In January 2008, I used a 10 megapixel Canon EOS 400D (Digital
Rebel XTi) with a 50 mm lens to shoot this, laying flat on a table
under a glass, http://runeberg.org/stridfin/0226.html

On that webpage, the image is reduced to 120 dpi (1.2 megapixel),
but the original is 300 dpi (7.5 megapixel).  The map shown is
reused in http://en.wikipedia.org/wiki/Battle_of_Alavus
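
The two megapixel figures are consistent with each other because the pixel count scales with the square of the linear resolution. A short Python check, with a hypothetical function name:

```python
def megapixels_at_dpi(original_megapixels: float,
                      original_dpi: float,
                      target_dpi: float) -> float:
    """Scale a pixel count to a new resolution for the same physical page."""
    scale = target_dpi / original_dpi      # linear scale factor, here 0.4
    return original_megapixels * scale ** 2  # area scales with the square


print(round(megapixels_at_dpi(7.5, 300, 120), 2))  # prints 1.2
```

Going from 300 dpi to 120 dpi keeps 40% of the pixels in each direction, so 16% of the total, and 16% of 7.5 megapixels is 1.2 megapixels.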

That's an example of how one specialized book can be very useful
for a limited Wikiproject. This book was published in 1909 for the
100th anniversary of the Finnish War (1808-1809), and digitized in
2008 for the 200th anniversary.



--
  Lars Aronsson (lars@...)
  Aronsson Datateknik - http://aronsson.se

  Project Runeberg - free Nordic literature - http://runeberg.org/

Re: Digitisation equipment

by Hay (Husky)

I love the fact you can achieve such high-quality results with
relatively cheap equipment. For many archives I think it is probably
easier to motivate their own people to scan pages manually than it is
for us, but chapters and individual Wikimedians could probably be of
much help with all the technical aspects, uploading stuff to Commons /
Wikisource, getting the word out to other people, etc.

-- Hay
