Categories
bot captcha douglas kastle ocr recaptcha

reCAPTCHA – Class

It is not often you see a service that solves two problems at the same time. I like the simplicity of this one.

CAPTCHAs have become an unfortunate necessity of the internet, used by certain websites to block bot attacks and to try to guarantee that the thing on the other end of the internet connection is a human. A CAPTCHA usually takes the form of an image or a warped word that a computer can’t read and that only a human (it is assumed) can correctly decipher. This requires a back-end image generator that knows the correct answer and can let a blog comment or website registration continue once the CAPTCHA has been solved. It is an imperfect solution to an internet scourge.

“About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day.”
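The numbers in that quote are easy to sanity-check. Here is a quick back-of-the-envelope sketch; the constant and function names are my own, and the figures are just the ones quoted above:

```python
# Figures taken from the quote above; the names are illustrative only.
CAPTCHAS_PER_DAY = 60_000_000
SECONDS_PER_CAPTCHA = 10


def hours_spent_per_day(captchas=CAPTCHAS_PER_DAY, seconds_each=SECONDS_PER_CAPTCHA):
    """Total human time spent solving CAPTCHAs in one day, in hours."""
    return captchas * seconds_each / 3600


print(round(hours_spent_per_day()))  # 166667
```

So at ten seconds apiece the total comes to roughly 167,000 hours a day, which fits the quoted "more than 150,000 hours".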

Well, if we have to live with it, can’t we somehow use it? A crowd called reCAPTCHA have come up with an interesting technique. Somewhere in the world loads of old documents are being scanned, and one problem to overcome is that OCR tools are unable to read certain scanned words. Those words are ready-made CAPTCHAs.

My first thought on the matter was: if the OCR tools can’t correctly read the garbled word, how would the system know that the answer returned by a human user is correct? Simple: supply two words, one known and one unknown. The user doesn’t know which of the two is the known one, so if the user correctly enters the known word, the answer for the unknown word is assumed to be correct as well.
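To make the two-word trick concrete, here is a minimal sketch of how I imagine the checking could work. This is not reCAPTCHA's actual code or API: the class, method names and image ids are all hypothetical, and the votes_needed option (requiring several users to agree before a reading is accepted) is my own assumption on top of what is described above. With the default of 1, a single correct control answer is enough to trust the unknown word.

```python
from collections import Counter, defaultdict


def normalise(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


class TwoWordChallenge:
    """Hypothetical sketch: pair a known control word with an unknown word."""

    def __init__(self, known_answers, votes_needed=1):
        self.known_answers = known_answers   # control image id -> correct text
        self.votes_needed = votes_needed     # assumption: agreement threshold
        self.votes = defaultdict(Counter)    # unknown image id -> guess counts

    def submit(self, control_id, control_guess, unknown_id, unknown_guess):
        """Return True if the user passes; record their guess for the unknown word."""
        expected = normalise(self.known_answers[control_id])
        if normalise(control_guess) != expected:
            return False  # failed the control word, so ignore the other guess
        self.votes[unknown_id][normalise(unknown_guess)] += 1
        return True

    def accepted_reading(self, unknown_id):
        """Return the most common guess once it has enough votes, else None."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        word, count = counts.most_common(1)[0]
        return word if count >= self.votes_needed else None


challenge = TwoWordChallenge({"known-1": "morning"})
challenge.submit("known-1", "Morning", "scan-42", "upon")
print(challenge.accepted_reading("scan-42"))  # upon
```

The user never learns which image is the control, so the only way to pass reliably is to make an honest attempt at both words.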

There doesn’t appear to be support in Blogger yet, but I’ll be keeping an eye on this one.

2 replies on “reCAPTCHA – Class”

“Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known.”

I dunno, that’s twice the work, two words for every Captcha. Suppose it’s translating Rumpelstiltskin at the time? I’d be really pissed off 🙂

Hmmm… very good point. My take is that the cognitive aspect is the biggest hit; the actual typing is pretty small. One way or another, it is a time hit in the first place.
