
Interview with Ghassan Haddad

How Facebook Localizes via Crowdsourcing

Thomas Cloer was responsible for the news on computerwoche.de for many years.
He also makes sure, pretty much around the clock on Twitter, that nobody misses anything earth-shattering, hangs out elsewhere on the social web (on Facebook again, too), and blogs at teezeh.de. Apple-leaning, and polymorphously perverse when it comes to smartphones.
Ahead of the Localization World Berlin trade show, we had the opportunity to put a few questions to Ghassan Haddad, Director of Internationalization at the social network Facebook.

Facebook has had the now 56 localized language versions of its website created by its own community ("crowdsourcing"). A driving force behind this was and is the linguist Ghassan Haddad. His short bio for Localization World* speaks volumes:

Ghassan Haddad, Director of I18n, Facebook

"Ghassan Haddad, the director of internationalization at Facebook, is deeply immersed in defining and implementing the company's crowdsourcing model and living the localization dream. Prior to Facebook, he was director of software engineering and localization at PayPal where he was responsible for enabling PayPal as a payment solution in almost 200 countries, 30-plus currencies, and 15-plus languages. He has over 20 years of experience in language research and technology, management and software development. He is one of the first computational linguists to develop an English > Arabic machine translation system and has held several middle and upper management positions at Intergraph, Berlitz, eTranslate, PayPal and Facebook. Ghassan holds a Ph.D. in linguistics from the University of Illinois at Urbana-Champaign."

Ghassan kindly answered a few questions for us during his flight to Berlin. The e-mail interview follows in its original English:

CW: Who came up with the idea/concept of crowdsourcing the Facebook l10n -- was it the community itself or someone from inside the company (maybe yourself)?

With Facebook Translations, anyone can take part in localizing the social network.

Haddad: Back in 2007 we started thinking of a model (for translation) that allows us to convert our site into any language that our users want and to do that pretty quickly. We felt that the conventional model of purely employing professional translation agencies was not practical enough to allow us to accomplish our goals. In addition, the peculiar nature of the Facebook site made it quite necessary that the people that translated Facebook be very familiar with it. Crowd-sourcing seemed like a natural way to go and our users responded very quickly and with good quality.

How big of an effort was it to create the Translation app then that you've been and are using for new languages? What were the development challenges, and did Facebook write everything from scratch or were there some tools already in place which could serve as a foundation?

Haddad: The translation application was written from scratch by a few Facebook engineers. At a very high level, the main areas that we focused (and continue to iterate) on are speed and scalability of the application, especially since we did not initially know the extent of participation by our users; creating a set of features that makes the application easy, accurate, and fun while minimizing mistakes and sabotage; and finally, dealing with the complexities of the different languages. Developing this application was not a one-time effort: even today we continue to enhance it in order to maintain and improve the languages that we have launched so far, as well as to help us launch additional languages in the future.

Just to get an idea of how big the translation effort at Facebook is: How many characters / words / sentences are there currently in the main English language version of Facebook?

Haddad: The main user interface comprises around 38,000 "strings" which are approximately equal to 230,000 words. Other content assets probably add another ~150,000 words. Naturally, the volume changes as we add (or remove) features from the site.

English is widely known as a pretty short and straightforward language. How do you deal with other languages that take up more screen real estate -- did you have to adapt page elements like menus or even change the page layout here and there?

Haddad: There is no single strategy that solves the issue of text expansion, layout, and other design variations required to accommodate different languages. We try to allow more space than required in English in order to fit translated text, which can grow an average of 30-40% in most European languages. In addition, we use technologies (whenever possible) that auto-expand to fit longer text. We do not make any significant changes to the layout of the page, except when dealing with right-to-left languages. For those, at the most basic level, we use special cascading style sheets to create a mirror image of the English page.
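
To illustrate the mirror-image idea Haddad mentions, here is a minimal sketch of how a left-to-right style rule could be flipped for a right-to-left page. It is not Facebook's implementation; the helper name `mirror_css` and the sample `.sidebar` rule are assumptions made up for this example.

```python
import re

# Hypothetical helper for illustration only, not Facebook's actual tooling.
# It swaps the directional keywords in a CSS rule so that an LTR rule
# becomes its RTL mirror image.
_SWAP = {"left": "right", "right": "left", "ltr": "rtl", "rtl": "ltr"}
_TOKEN = re.compile(r"\b(left|right|ltr|rtl)\b")


def mirror_css(css: str) -> str:
    """Return the CSS text with left/right and ltr/rtl keywords exchanged."""
    return _TOKEN.sub(lambda m: _SWAP[m.group(1)], css)


# Made-up sample rule for the English (LTR) page.
ltr_rule = ".sidebar { float: left; margin-right: 10px; direction: ltr; }"
rtl_rule = mirror_css(ltr_rule)
print(rtl_rule)
```

A keyword swap like this only covers the "most basic level" of mirroring. Four-value shorthands such as `margin: 1px 2px 3px 4px`, background positions, and directional images would all need separate handling.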

How did/does Facebook manage the QA of the crowdsourced translations -- do you leave that to the community as well or rely on professional expertise?

Haddad: All non-beta languages that we have released undergo some verification by the top community translators and are later QA’d by professional translators working for a translation vendor. The amount of effort we put into QA can be extensive or light depending on the language itself (how critical and how complex) and the level of participation by the community. One of the by-products of the QA activity is a set of enhancements that we end up building into the translations application in order to improve the quality of future translations.

How long does an average translation to a) a big/popular and b) a smaller/less popular language take and how many contributors does it typically involve? How many Facebook members from a language area do you consider a minimum before considering a crowdsourced translation there?

Haddad: The amount of time it takes to translate a language varies greatly. However, for popular languages like German, French and Spanish, it took about 1-2 weeks of translation by the community and another couple of weeks of QA. Less popular languages can be surprisingly fast or very slow depending on how passionate the community is. For example, Welsh, which is spoken by approximately 700k users worldwide, was translated just as fast as the more popular languages.

The number of contributors also varies greatly. Over 230,000 people have actively contributed to translations (in all languages). Turkish has more than 30,000 contributors, French more than 15,000, and German more than 5,000. Even minor languages average over 1,000 contributors each.

We would actually like to include any and all languages that our users would like to translate. Initially, we based our decision to include a language on three criteria: desire by our users who wrote us requesting that we include their language, the number of speakers of a given language, and the internet penetration in the countries where the language is spoken. In a few cases (such as Indic languages), we also considered the level of English language education in the culture. We did not use a specific number of users as a benchmark.

Also, I understand that you're an expert in RTL (Right-To-Left) languages. How big of a challenge is it to implement RTL or even BiDi in web applications like Facebook? Is it technically feasible today (e.g. can older browsers like IE6 do the magic?) or will we have to wait for HTML 5 and other upcoming technologies before this works properly?

Haddad: We’ve spent a good amount of effort to work on RTL languages and what we have today, although still not perfect, is comparable to what you see on every web site, even those by established software companies like Microsoft. Strict RTL in some respects is easier to deal with than BiDi: most of the difficult issues related to RTL stem from the bi-directional nature of the text. The problem is actually not restricted to browsers, but also extends to operating systems and their inability to accurately determine when the text is RTL or LTR. I pointed out some of the challenges we dealt with in my blog but would be happy to answer specific questions if you’d like.
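
The difficulty of deciding whether a piece of text is RTL or LTR can be shown with a small sketch. It uses a simple "first strong character" heuristic built on Python's standard `unicodedata.bidirectional`. The function name `base_direction` and the LTR fallback are assumptions for this example, and this is neither Facebook's method nor a full implementation of the Unicode Bidirectional Algorithm.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Guess the base direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi_class == "L":  # Latin and other left-to-right letters
            return "ltr"
    # Assumption: fall back to LTR when the text has no strong character
    # (for example, only digits or punctuation).
    return "ltr"


print(base_direction("مرحبا Facebook"))
print(base_direction("Facebook مرحبا"))
print(base_direction("2010"))
```

The first two strings contain the same words, yet the heuristic classifies one as RTL and the other as LTR depending only on which word comes first. That is the kind of ambiguity that makes mixed-direction text harder than strict RTL.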

Do you have any hints or recommendations for other companies considering translation crowdsourcing? Which hurdles and roadblocks should they expect, and how can they get around them?

Haddad: I am actually conducting two sessions to discuss this issue in particular at the Localization World conference in Berlin. In a nutshell, crowd-sourcing can be a very effective way to do translation. Facebook’s success in using this method to launch 56 languages (the 57th being US English) has led many companies to try out this model. Its success is greatly dependent on the size of the community, how passionate it is about seeing a site (or product) in their language, the incentives (not financial!) you offer contributors, the technology you use to facilitate the process, and the people (even if it is one person) to manage it.

*Ghassan Haddad will speak tomorrow from 2 to 3 p.m. at Localization World, together with Danica Brinton of Linden Lab, on the topic "Powered by the Crowd".