Categories
bot captcha douglas kastle ocr recaptcha

reCAPTCHA – Class

It is not often you see a service that solves two problems at the same time. I like the simplicity of this one.

CAPTCHAs have become an unfortunate necessity of the internet. Certain websites use them to block bot attacks and to try to guarantee that the thing on the other end of the connection is a human. A CAPTCHA usually takes the form of a distorted image of a word that a computer can’t read and only a human (it is assumed) can correctly decipher. This requires a back-end image generator that knows the correct answer and lets a blog comment or website registration continue once the CAPTCHA has been solved. It is an imperfect solution to an internet scourge.
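To make the back-end part concrete, here is a minimal sketch of the bookkeeping a CAPTCHA server has to do: remember the correct answer for each challenge it hands out and check the user's reply against it. This is an illustration only. The names `new_challenge`, `verify` and `_pending` are hypothetical, the image rendering is left out entirely, and a real service would also expire stale challenges.

```python
import secrets
import string

# Hypothetical in-memory store: challenge token -> expected answer.
# A real service would render `word` as a distorted image and expire old tokens.
_pending: dict[str, str] = {}

def new_challenge(length: int = 6) -> tuple[str, str]:
    """Create a challenge and return (token, word); only an image of `word` would be shown."""
    word = "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))
    token = secrets.token_hex(8)
    _pending[token] = word
    return token, word

def verify(token: str, answer: str) -> bool:
    """Check an answer once; the token is discarded on any attempt so it cannot be replayed."""
    expected = _pending.pop(token, None)
    return expected is not None and answer.strip().lower() == expected
```

The comment form or registration page would call `verify` with the token it received alongside the user's typed answer, and only let the submission through when it returns `True`.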

“About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day.”
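The numbers in that quote are easy to sanity-check: 60 million puzzles at roughly ten seconds each works out to about 166,667 hours a day, which is indeed "more than 150,000 hours".

```python
# Figures taken from the quote above.
captchas_per_day = 60_000_000
seconds_per_captcha = 10

hours_per_day = captchas_per_day * seconds_per_captcha / 3600
print(f"{hours_per_day:,.0f} hours per day")  # prints "166,667 hours per day"
```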

Well, if we have to live with them, can’t we somehow use them? An outfit called reCAPTCHA has come up with an interesting technique. Somewhere in the world, loads of old documents are being scanned, and one problem that has to be overcome is OCR tools being unable to read certain scanned words. Those unreadable words are ready-made CAPTCHAs.

My first thought on the matter was: if the OCR tools can’t correctly read the garbled word, how would the system know whether the answer a human user types in is correct? Simple: supply two words, one known and one unknown. The user doesn’t know which of the two is the known word, so if they enter the known word correctly, their answer for the unknown word is assumed to be correct as well.
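The two-word trick can be sketched in a few lines. This is a minimal illustration of the scheme as described above, not reCAPTCHA’s actual implementation: `Challenge`, `new_challenge`, `check` and the scanned-word ID are hypothetical names, and the display order is simply randomised so the user can’t tell which word is the control.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Challenge:
    control_answer: str  # word whose correct reading is already known
    unknown_id: str      # hypothetical ID of a scanned word the OCR could not read
    control_first: bool  # display order, hidden from the user

def new_challenge(control_answer: str, unknown_id: str) -> Challenge:
    # Randomise the order so the user cannot tell which word is the control.
    return Challenge(control_answer, unknown_id, random.random() < 0.5)

def check(ch: Challenge, answers: Tuple[str, str]) -> Tuple[bool, Optional[str]]:
    """Return (passed, reading_of_unknown).

    The user passes if the control word is typed correctly; in that case the
    other answer is accepted as the reading of the unknown word.
    """
    first, second = (a.strip().lower() for a in answers)
    control, unknown = (first, second) if ch.control_first else (second, first)
    if control == ch.control_answer.lower():
        return True, unknown
    return False, None
```

A failed control word tells the system nothing about the unknown word, so in that case the sketch discards both answers.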

There doesn’t appear to be support for it in Blogger yet, but I’ll be keeping an eye on this one.