Fighting spam and recapturing books with reCAPTCHA

A CAPTCHA is an anti-spam test used to work out whether a request has been made by a human or a spambot. CAPTCHAs no longer seem to be as popular as they once were, as other spam identification techniques have emerged, but a considerable number of websites still use them.

Some common examples of CAPTCHAs.

CAPTCHAs can be really annoying, hence their decline in recent years. Take a look at the different CAPTCHAs in the image above. If you had spent 30 seconds filling in a feedback form, would you be willing to try to decipher one of them, or would you just abandon the feedback?

The top left image could be ZYPEB, but it could just as easily be 2tPF8. If you get it wrong, you will usually be forced to do another, which could be just as difficult.

The BBC recently reported that the National Federation of the Blind has criticised CAPTCHAs for being so restrictive for the visually impaired. Many CAPTCHAs do offer an auditory version, but if you check out the BBC article (which has an example of an auditory CAPTCHA), you will see that they are nearly impossible to understand.

reCAPTCHA

Luis von Ahn is a computer scientist who was instrumental in developing the CAPTCHA back in the late 1990s and early 2000s. According to an article in the Canadian magazine The Walrus, when CAPTCHAs started to become popular, Luis von Ahn “realized that he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles.”

An example of a reCAPTCHA CAPTCHA.

To try to ensure that this time was not wasted, von Ahn set about developing a way to put it to better use; it was at this point that reCAPTCHA was born.

reCAPTCHA is different to most CAPTCHAs because it uses two words. One word is generated by a computer, whilst the other is taken from an old book, journal, or newspaper article.

Recapturing Literature

As I mentioned, reCAPTCHA shows you two words. One of the images is there to prevent spam and to confirm the accuracy of your reading; you must get this one right, or you will be presented with another. The other image is designed to help piece together text from old literature, so that books, newspapers and journals can be digitised.

reCAPTCHA presents the same word to a variety of users and then uses the average response to work out what the word actually says, which helps to stop abuse. In a 2007 quality test, a standard computer text reader (also known as OCR, optical character recognition) identified 83.5% of words correctly, a reasonably high proportion; however, the accuracy of human interpretation via reCAPTCHA was an astonishing 99.1%!
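One way to picture how answers from many users could be combined into a single reading is a simple majority vote. The sketch below is only an illustration of that idea, not reCAPTCHA's actual method: the function name consensus and the thresholds min_votes and min_share are made up for this example.

```python
from collections import Counter


def consensus(responses, min_votes=3, min_share=0.6):
    """Return the agreed reading of a word, or None if there is no clear winner.

    min_votes and min_share are illustrative assumptions, not values
    taken from reCAPTCHA itself.
    """
    # Normalise so that "Morning" and " morning " count as the same answer.
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough independent readings yet
    word, votes = Counter(cleaned).most_common(1)[0]
    # Accept the top answer only if a clear share of users gave it.
    return word if votes / len(cleaned) >= min_share else None


print(consensus(["morning", "Morning", "moming", "morning"]))  # clear majority
print(consensus(["upon", "npon"]))  # too few readings to decide
```

Requiring several matching answers before accepting a word is also what makes abuse harder: a single user typing nonsense cannot decide the outcome alone.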

According to an entry in the journal Science, in 2007 reCAPTCHA was present on over 40,000 websites, and users had interpreted over 440 million words! Google claim that today around 200 million CAPTCHAs are solved each day.

If each CAPTCHA took 10 seconds to solve, those 440 million words would represent around 139 years (or 4.4 billion seconds) of brain time wasted; I am starting to see what Mr von Ahn meant! To put the 440 million words into perspective, the complete works of Shakespeare run to around 900,000 words, or 0.9 million.
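For anyone who wants to check the sums, here is a quick back-of-the-envelope calculation. It follows the same simplification as the paragraph above (one word per 10-second CAPTCHA), and the 365.25-day year is my own assumption for the conversion.

```python
# Figures quoted in the post.
words_interpreted = 440_000_000
seconds_per_captcha = 10        # the post's rough estimate, one word per CAPTCHA
shakespeare_words = 900_000     # approximate size of the complete works

# Assumption: an average year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

total_seconds = words_interpreted * seconds_per_captcha
years = total_seconds / SECONDS_PER_YEAR
shakespeare_equivalents = words_interpreted / shakespeare_words

print(f"{total_seconds:,} seconds is about {years:.0f} years")
print(f"Roughly {shakespeare_equivalents:.0f} times the complete works of Shakespeare")
```

So the 440 million words amount to the complete works of Shakespeare nearly 490 times over.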

Whilst the progress of reCAPTCHA seems pretty impressive, it is a tiny step on the path to total digitisation. According to this BBC article, von Ahn said at the time:

“There’s still about 100 million books to be digitised, which at the current rate will take us about 400 years to complete”

Google

In 2009 Google acquired reCAPTCHA. The search giant claimed that it wanted to “teach computers to read”, hence the acquisition.

Many speculate that Google’s ultimate aim is to index the world, and reCAPTCHA will help it to accelerate this process. That said, if that is its goal, it is still a very long way off.

We won’t be implementing a CAPTCHA on Technology Bloggers any time soon, however next time you have to fill one in, do spare a thought for the [free] work you might be doing for literature, for history and for Google.

Parenting in the Age of Digital Technology

Last month Northwestern University in the USA published a national survey entitled Parenting in the Age of Digital Technology. The report is available for free download through the Parenting CC Portal, but here I would like to take a quick look at some of the findings and the questions they raise, and see if we can provoke some debate.

Multiple Screen Viewing

The study explores how parents are incorporating new digital technologies (iPads, smartphones) as well as older media platforms (TV, video games, and computers) into their family lives and parenting practices, and it gives an idea of how parents use and view this technology.

We should point out that this is a US based survey.

The 10 key findings can be summarised as follows:

1 While new media technologies have become widespread, a majority of parents do not think they have made parenting any easier.

2 Parents use media and technology as a tool for managing daily life, but books, toys, and other activities are used more often.

3 Parents still turn to family and friends for parenting advice far more often than to new media sources like websites, blogs, and social networks.

4 Parents do not report having many family conflicts or concerns about their children’s media use.

5 There is still a big gap between higher- and lower-income families in terms of access to new mobile devices.

6 Parents are less likely to turn to media or technology as an educational tool for their children than to other activities.

7 Parents assess video games more negatively than television, computers, and mobile devices.

8 For each type of technology included in the survey, a majority of parents believe these devices have a negative impact on children’s physical activity, the most substantial negative outcome attributed to technology in this study.

9 Many parents report using media technology with their children, but this “joint media engagement” drops off markedly for children who are six or older.

10 Parents are creating vastly different types of media environments for their children to grow up in, and, not surprisingly, the choices they make are strongly related to their own media use.

Some other interesting points arise. For example, 40% of families are described as media heavy and spend more than 11 hours a day in front of a screen. Half of all families surveyed have three or more TVs in the house, and 40% of 6 to 8 year olds have a TV in their bedroom. 70% of parents state that having mobile devices has not made parenting easier, with 40% stating that the devices have a negative effect on their children’s social skills.

The conclusions are in some ways surprising, though. The authors present evidence that parents are still more likely to resort to traditional forms of entertainment as rewards and punishments, and that they are convinced enough of the educational possibilities offered by so-called new media not to worry too much about its negative effects.

An interesting read if you have half an hour, but comments and debate about the summary above would also be educational.

Asleep at the Wheel?

Anyone who has ever driven a long distance will know the feeling of “zoning out”. You lose focus on the road and stare blankly ahead, your reaction time lengthens, and some people even fall asleep.

In the UK it is estimated that about 20% of accidents are caused by people nodding off at the wheel, but a breakthrough at the University of Leicester might help to put an end to this problem.

Researchers have been working on a system that combines high speed eye tracking and EEG technology, with one application being to alert drivers who show signs of drowsiness.

These forms of technology have traditionally been difficult to marry together. EEG has been around for decades, and anyone with epilepsy will have had experience of it. The EEG system involves wearing a kind of cap with electrodes attached that measure neurone activity in the brain. Once a cumbersome affair, this can now be carried out using a lightweight headset, a far cry from the rubber cap manually fitted with sensors and cables that I grew up with.

A contemporary EEG

The eye monitoring technology involves infrared cameras measuring how LED light reflects from the user’s eyes, monitoring where the user is looking, how often they blink, and other signs of distraction and sleepiness.

The researchers at Leicester have made the breakthrough of devising a way to use these different measurements together, something that has not been possible in the past.

Applications go much further than saving lives, however. The developers point to uses for people who cannot use their arms, as they could control machinery using their eyes and thoughts. Even more importantly for some, the technology could be used to control video games, so that a player would no longer have to use a console of any sort but could communicate through measurements of where their eyes are looking and of the patterns in their brain.

More information is available here.