captcha | Gydle

A few days ago, I wrote a post about smartphones and traffic jams. Pretty soon, thanks to billions of anonymous bits of data flying all over the place, you won’t have to get stuck on the freeway or wait in line at Disneyland ever again. I have since learned that the smartphone stuff was just the tip of the crowdsourcing iceberg. Down below the penguins and their migratory traffic jams, the iceberg is gargantuan. It boggles the mind. We’re like the Titanic, most of us. Blissfully unaware.

Way back in 1999, UC Berkeley started tapping into the unused processing power of home computers all over the world to scan radio waves for evidence of extraterrestrial life. Currently, more than 3 million people are involved in the SETI@home project, doing their little bit to locate aliens. CERN also uses volunteer-based distributed computing to crunch its enormous particle physics data sets, a task that would constipate even the biggest, baddest supercomputer in the world. The rise of the internet has been nirvana-like for people in charge of iterative tasks like these that can be distributed, calculated remotely, and then reassembled into some meaningful form at the other end. All that computer power for free! You don’t have to put an expensive IBM petaflop supercomputer into your project budget; you just have to convince a lot of people (a crowd!) that volunteering the unused processing power on their PCs is a cool thing to do.
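For the curious, here is a toy sketch of that distribute, calculate, reassemble pattern. It is not how SETI@home or CERN actually do it; the function names, the chunk size, and the "find the strongest signal" task are all made up for illustration, and local threads stand in for volunteers' PCs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(samples, unit_size):
    """Cut one big data set into small, independent chunks."""
    return [samples[i:i + unit_size] for i in range(0, len(samples), unit_size)]


def volunteer_compute(unit):
    """Stand-in for the work one volunteer's idle PC would do.

    Hypothetical task: report the strongest reading in its chunk.
    """
    return max(unit)


def run_project(samples, unit_size=4, volunteers=3):
    units = split_into_work_units(samples, unit_size)
    # Threads play the role of remote machines here; a real project
    # would ship each unit over the network and wait for results.
    with ThreadPoolExecutor(max_workers=volunteers) as pool:
        partial_results = list(pool.map(volunteer_compute, units))
    # Reassemble: combine the per-unit answers into one final answer.
    return max(partial_results)


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
strongest = run_project(signal)
print(strongest)
```

The trick only works because each chunk can be processed without knowing anything about the others, which is exactly what makes a task a good candidate for a crowd of PCs.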

I could get my mind around this. The next step should have been obvious to me. But it happened so subtly it didn’t even register. Until my smartphone epiphany, that is.

Computers are great for a lot of things. Give a network of computers a number, an equation, a set of parameters, and they crunch away happily. A symbolic smorgasbord. A banquet of bits. A few more dimensions? Ten thousand more iterations? Not a problem. Computers don’t sleep, have sex, demand respect, drink coffee, or take sick leave. They won’t join a union, have babies or demand yearly pay increases. And when those computers all belong to other people, you don’t even have to pay for them. How great is that?

But unfortunately there are quite a few things computers can’t do very well. They can’t identify faces, emotions, words and images from partial or incomplete samples. They don’t get when something’s funny. They’re unable to recognize and capture beauty in music, art, dance or photography.

The human brain – even the most average, run-of-the-mill, McDonald’s-eating, MTV-watching brain – can do things no computer on the planet can do, and all on about 20 watts, less power than a dim lightbulb. Such a terrible waste! All those brains out there, operating at a fraction of their potential. All that valuable processing power squandered on mundane tasks like navel gazing and Facebook.

What if you could find a way to connect all those brains, get them to communicate and interact without having to be in physical proximity? Get them to join a global network of some kind? Now that would be a truly formidable source of information processing, a machine that could handle just about any problem. Uh, wait a sec…

And that, my friends, is the essence of crowdsourcing.

I have seen the future and it is the iceberg.

In case you still don’t get it, here are a few examples.

Wikipedia. The classic. Why hire a bunch of encyclopedia writers when the world is full of experts who will write copy for free? And in case they make mistakes, there are more experts out there who will correct their copy, also for free. Wikipedia is a crowd of know-it-alls. And the information it contains is being constantly refined.

iStockphoto. Why pay a photographer hundreds of dollars for the rights to use a photo when the world is full of excellent amateur photographers who would be ecstatic to sell their shots online for a dollar?

Citizen science. Why pay for those grad students when the world is full of geeks that will do your gruntwork for free? Identifying protein folding patterns was turned into a game called Foldit, in which volunteers outperform computers consistently. In Galaxy Zoo people look at images of outer space and classify the galaxies they see in them. You build up a “reputation” for how accurate you are. Several novel structures have been discovered this way, such as a “weird green thing” (Dave’s description) called Hanny’s Voorwerp. Herbaria@home taps into the British armchair naturalist crowd to document the vast numbers of plant specimens held in the UK’s herbaria. “Documenting large herbarium collections is an extremely labour-intensive task and most museum collections are woefully under-funded.” You get the picture.

Corporate R&D. Why pay for an expensive R&D department if you can avoid it? Companies like Boeing, DuPont, Procter & Gamble and Eli Lilly post their most intractable scientific problems on a website called InnoCentive, which anyone can join for a small fee. The companies (“seekers”) pay anywhere from $10,000 to $100,000 per solution. More than 30% of the problems have reportedly been cracked, “which is 30 percent more than would have been solved using a traditional, in-house approach,” said InnoCentive’s chief scientific officer Jill Panetta. And just think of all the money they’ve saved not having to shell out for health insurance benefits!

CAPTCHA. I like this one the most. Say it aloud. Isn’t it clever? It stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart. Whenever you are asked to type squiggly, hard-to-read words in order to verify that you are a human being and not an evil, webtrolling, spamming bot, you are solving a CAPTCHA. It turns out that a lot of the words in these images are from old documents that have been scanned for archiving. Computers parse the images, turning them into digital text, which takes up a lot less room. But often the computers can’t decipher the images. From the reCAPTCHA website:

“About 200 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into ‘reading’ books.”
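A quick back-of-the-envelope check on those numbers, using only the two figures in the quote (200 million CAPTCHAs, ten seconds each). The variable names are mine, and the ten seconds is the quote's rough estimate, not a measurement.

```python
# Figures taken straight from the reCAPTCHA quote above.
captchas_per_day = 200_000_000
seconds_per_captcha = 10  # "roughly ten seconds", so treat the result as approximate

total_seconds = captchas_per_day * seconds_per_captcha
total_hours = total_seconds / 3600  # 3600 seconds in an hour

print(f"{total_hours:,.0f} hours per day")  # prints "555,556 hours per day"
```

So if anything, "more than 150,000 hours" is on the modest side: taken at face value, the two quoted figures work out to roughly 555,000 hours a day.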

When you are asked to solve a two-word CAPTCHA like the one in the picture, the first word is already known and the second word is one that a computer hasn’t been able to read. If you get the first one right, then they figure you’ve got the second one right, too. The image is sent out to lots of people to statistically verify that your answer is correct. I bet you didn’t know you were being crowdsourced when you typed in those words. I sure didn’t.
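Here is a minimal sketch of that idea, as I understand it from the description above. It is not reCAPTCHA's actual code: the function names, the image id, and the thresholds (at least 3 votes, at least 60% agreement) are assumptions picked purely for illustration.

```python
from collections import Counter, defaultdict

# For each unreadable image, tally the answers typed by people who
# got the known (control) word right.
votes = defaultdict(Counter)


def normalize(text):
    """Ignore case and stray spaces when comparing typed words."""
    return text.strip().lower()


def submit(control_answer, typed_control, unknown_id, typed_unknown):
    """Return True if the user passes as human.

    The guess for the unknown word only counts as a vote if the
    control word was typed correctly.
    """
    if normalize(typed_control) != normalize(control_answer):
        return False
    votes[unknown_id][normalize(typed_unknown)] += 1
    return True


def accepted_reading(unknown_id, min_votes=3, min_share=0.6):
    """Return the agreed-upon word, or None if the crowd hasn't settled yet.

    min_votes and min_share are hypothetical thresholds.
    """
    counts = votes[unknown_id]
    total = sum(counts.values())
    if total < min_votes:
        return None
    word, n = counts.most_common(1)[0]
    return word if n / total >= min_share else None


# Three people pass the control word "morning"; one fails it.
submit("morning", "morning", "scan-17", "upon")
submit("morning", "Morning ", "scan-17", "upon")
submit("morning", "mourning", "scan-17", "xxxx")  # fails, vote ignored
submit("morning", "morning", "scan-17", "upan")

print(accepted_reading("scan-17"))  # two of three valid votes agree on "upon"
```

The control word does double duty: it keeps the bots out, and it filters the votes so that only answers from people who were demonstrably paying attention get counted.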

Combine spam protection with 150,000 hours of free human optical recognition processing per day. That has got to be the ultimate win-win.

Crowdsourcing takes impossible tasks and turns them into games, contests, and time-fillers for millions of under-occupied neocortexes around the world. I once read an article about an eccentric genius who claimed he wanted to build “the game layer on top of the world.” I thought, “huh?” Now I think he might be onto something. Where is it written that we have to spend our waking hours doing things we think are boring and unsatisfying? Life’s a game. Join the crowd.

My brother Dave (who started all this) said, “You could become wealthy if you could figure out how to use crowdsoursing for tranbslation.”

“Or spelling,” I replied.


Crowdsourcing part II