Anti-spam methods: securing the future of books
It has to be said that one of the most annoying features of surfing the web is the online security check that demands you copy squiggly, blurred letters and numbers, often forming apparently nonsensical words, to prove you are human when doing a spot of online shopping, social networking or the like. But your answers are now being put to good use: these checks are also digitising books.
Invented by Luis von Ahn in 2000, the Completely Automated Public Turing test to tell Computers and Humans Apart (Captcha) software, which produces those distorted images of words and numbers, is used by more than 350,000 websites to stop computer programs from attacking them with spam.
In 2007, von Ahn calculated that people all over the world were typing 200 million Captchas every day, spending about 10 seconds on each form. Multiply 10 seconds by 200 million, and web surfers were wasting more than 500,000 hours on these frustrating security codes every day.
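The arithmetic behind that estimate can be checked in a few lines. This is only a sketch using the two figures quoted above; the function name `daily_hours` is made up for illustration and is not part of any Captcha software.

```python
# Figures quoted in the article.
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10

SECONDS_PER_HOUR = 3600


def daily_hours(captchas_per_day, seconds_each):
    """Total human time spent per day, converted from seconds to hours."""
    return captchas_per_day * seconds_each / SECONDS_PER_HOUR


hours = daily_hours(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # prints "555,556 hours per day"
```

The exact result is a little over 555,000 hours, which is consistent with the rounded figure of roughly half a million hours a day.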
He decided to put these hours to good use and devised ReCaptcha, a system that uses each human-typed response as both a security check and a means of digitising books one word at a time. This software differed from the Captcha process in that forms now showed one randomly generated word paired with a photo of a word taken from the pages of an old book, newspaper or journal that needed digitising.
Usually, to digitise documents, hard-copy texts are scanned and then run through a program that transcribes every word into a digital format (a process known as optical character recognition). But when the pages of a document are very old, the typeface faded or the paper yellowed and torn, the computer struggles to read them and needs human help. This is where the second picture on the ReCaptcha form comes in.
To make sure answers are accurate, the ReCaptcha system logs a person’s second response only if they get the first word right. It then collates the second responses from a number of forms and stores the most popular answer, as this is the most likely to be correct.
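The two-step check described above can be sketched roughly as follows. This is an illustration of the idea, not ReCaptcha's actual code: the function names, the case-insensitive comparison and the `min_votes` threshold are all assumptions made for the example.

```python
from collections import Counter


def record_response(votes, control_answer, typed_control, typed_unknown):
    """Log the guess for the unknown word only if the known word was typed correctly.

    Returns True if the person passed the security check.
    Ignoring case and surrounding spaces is an assumption of this sketch.
    """
    passed = typed_control.strip().lower() == control_answer.strip().lower()
    if passed:
        votes[typed_unknown.strip().lower()] += 1
    return passed


def best_transcription(votes, min_votes=3):
    """Return the most popular guess, or None if too few people agree yet.

    min_votes is a hypothetical threshold; the article does not say how many
    forms are compared before an answer is stored.
    """
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


# Four people see the known word "morning" next to the same scanned word.
votes = Counter()
record_response(votes, "morning", "morning", "upon")
record_response(votes, "morning", "mornin", "open")   # first word wrong, guess discarded
record_response(votes, "morning", "morning", "upan")
record_response(votes, "morning", "Morning", "upon")
record_response(votes, "morning", "morning", "upon")

print(best_transcription(votes))  # prints "upon"
```

Discarding guesses from anyone who fails the known word keeps spam programs and careless typists from influencing the stored answer, and the vote count smooths out honest typing mistakes.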
Google bought the ReCaptcha software in 2009, and its transcription technology is now used exclusively for the Google Books project, Google’s attempt to transcribe every book in the world. ReCaptcha is available for websites to use for free. To find out more, visit www.google.com/recaptcha/learnmore
So next time you’re shouting at a web page because it requires some word guessing, take a deep breath and remind yourself that those 10 seconds of your time are serving the greater good.
Picture credit: © Google