Crowdsourcing Babel

In my second post on crowdsourcing, my brother Dave made this comment (spelling mistakes corrected): “You could become wealthy if you could figure out how to use crowdsourcing for translation.”

Well, it’s happening! I just found out today that a group led by CAPTCHA inventor and Carnegie Mellon prof Luis von Ahn is crowdsourcing people to translate stuff under the guise of an online language course called Duolingo. (After this I’ll stop posting on crowdsourcing. For a little while. I did say it was a big iceberg, didn’t I?)

Here’s how it will work. Say there’s a website in English that they want to translate into Spanish. They take the text from the website, break it down into sentences, and use these sentences as exercises in a free online English course for Spanish speakers.  A person taking the English course would read the sentence, and then enter what she thinks it means (in Spanish) on her computer. That’s effectively an English-to-Spanish translation. (I’m not sure it’s the best way to learn a language — but then that’s not their goal, is it?) If you get enough people to “translate” that same sentence, you can do either a statistical analysis to find the most common translation, or get people to vote on the best translation.
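For the programmers in the crowd, the “most common translation” idea can be sketched in a few lines of Python. This is only my guess at the simplest possible version, not how Duolingo actually does it: the function names and the sample Spanish sentences are invented for illustration, and I’m assuming that answers differing only in capitalization or spacing should count as the same translation.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical answers match."""
    return " ".join(text.lower().split())


def most_common_translation(submissions: list[str]) -> tuple[str, float]:
    """Return the most frequent normalized answer and its share of all answers."""
    if not submissions:
        raise ValueError("need at least one submission")
    counts = Counter(normalize(s) for s in submissions)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(submissions)


# Hypothetical learner answers for the English sentence
# "The cat sleeps on the sofa." (made-up data, for illustration only).
submissions = [
    "El gato duerme en el sofá.",
    "el gato duerme en el sofá.",
    "El gato  duerme en el sofá. ",
    "El gato está durmiendo en el sofá.",
    "La gata duerme en el sillón.",
]

best, share = most_common_translation(submissions)
print(f"{best} ({share:.0%} of answers)")
```

The share of answers is a rough confidence signal: if no translation gets much more than a sliver of the total, that is probably the moment to fall back on the other option and ask people to vote.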

I hope they’re not planning to do translations the other way around — let the Spanish person do translations into English. One of the cardinal rules of translating is that you always translate into your mother tongue. You should never attempt to translate into a language you’re not totally fluent in. There are too many expressions, turns of phrase, and words that just don’t “go” together.

That brings up another potential hurdle: unlike the reCAPTCHA crowdsourcing (those squiggly words in boxes that prove you are a person and not a spam-monster), this one requires people to string words together into sentences. Just like gut bacteria, writing ability varies wildly from person to person. Understanding separate words doesn’t mean you have a clue as to how they should go together.

So that means the clincher is going to be getting enough people involved to even out all the failed attempts. The language course will be free, which is a start. But if it isn’t also fun and cute and motivating, it’ll tank for sure.

I have signed up for the Beta version. Initially, they will only be offering English, German and Spanish, the languages the developers know personally. (They’re not franco-phobic as far as I know.) I’ll be signing up for Spanish. I tried to learn German in order to help Luc get through 9th grade and my head almost exploded.

That brings up another question: why are there always beta versions? Why is it that we never get to sign up for an alpha version? Or is the alpha version the one that exists inside the inventor’s head?

And another question: how will they choose the texts to translate? To generate income, I can imagine they’d set up a translating business and then use this language course to do the work. But the texts might not all be that practical for the language learner. Tips for preventing cholera. Machine tool specifications. The LL Bean catalog. Never mind. It’ll all come in handy sometime. You never know when you might be in need of a barn jacket. I’m not sure Dave was right, that you could get rich doing this, but I hope they do.

I hope I’ll be able to rack up points or something. That’s not quite as motivating as money or jelly bellies, but would be a better use of my time than trying to beat my high score in Scramble on Facebook. I might actually learn something in the process.

Here’s a video of von Ahn talking about CAPTCHA and Duolingo at a TEDx conference at CMU:


Photo Credit: Alice Hutchinson via Compfight cc

