Wednesday, September 26, 2007

Stop spam! Read books! Woo hoo!

I think most everyone on the web these days is familiar with the captcha, a mechanism for proving that you are a human. The word captcha is actually an acronym for "completely automated public Turing test to tell computers and humans apart", and captchas take the form of tests that require you to read text in an image that has been distorted in a way that fools OCR but doesn't fool humans. You know, these things:

Captchas are a good idea, even if they can be annoying (particularly when one is difficult even for a human to read).

Enter recaptcha.net. The creators of the captcha have teamed up with the Internet Archive to digitize public domain books. If you have a website, you can sign up to use captchas provided by recaptcha.net. That way, when your users verify their humanness, they are also typing in a couple of words from an actual book in the Internet Archive. A few thousand users later, and you've digitized the whole book, two words at a time. To prevent errors, they have each word digitized more than once by different people.
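
For the curious, here is a rough sketch of how that "more than once by different people" check might work. It is only my guess at the idea, not recaptcha.net's actual code: the function name `consensus_word` and the `min_votes` threshold are made up for illustration.

```python
from collections import Counter


def consensus_word(transcriptions, min_votes=2):
    """Return the word most people typed, or None if too few of them agree.

    Hypothetical helper: the name and the min_votes threshold are
    assumptions for this sketch, not anything recaptcha.net documents.
    """
    # Normalize so that "Morning " and "morning" count as the same answer,
    # and drop answers that are empty after trimming.
    normalized = [t.strip().lower() for t in transcriptions if t.strip()]
    if not normalized:
        return None

    # Pick the most common answer and accept it only with enough votes.
    word, votes = Counter(normalized).most_common(1)[0]
    return word if votes >= min_votes else None


# Three different people typed the same scanned word.
print(consensus_word(["morning", "Morning ", "mourning"]))
```

A word is accepted only once enough people independently type the same thing, so a single typo (or a single joker) can't sneak a wrong word into the book. Anyway, back to the recaptcha widget itself.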

It looks like this:


That is a pretty good idea. Pretty, pretty good.

2 comments:

Torben B said...

Yeah... this is a pretty good idea. Great post.

Scott Abbott said...

I had no idea that's what those are about. Thanks.