We recently translated Luma (web, iOS, and backend services) into 40 languages. We started the translation project at the beginning of the year, and since then we've seen more than a 4x increase in the number of non-English events on Luma.

In this post, I'll share the technical details of how we made Luma translation aware, and then how we used AI to scale translation so that we can ship high-quality translations every day across 40 languages.

About a year ago, we started to see more Luma events where the content was written in a language other than English. We would see pockets of these events start to grow — tech events in Japan, social clubs in the Netherlands — but because the majority of Luma was still in English, hosts and guests would often get confused. The core group of hosts would stay onboarded to Luma, but the product wouldn't spread as well as it does in English-speaking countries.

Since adding translations, growth has been much stronger in non-English-speaking communities. Nearly 20% of our events today are advertised in a language other than English.

So let's get started.

Becoming Translation Aware

When you start building apps, you'll create something like this:
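
Something along these lines (a minimal sketch; the component body is illustrative):

```tsx
// A page component with hard-coded English text.
export function WelcomePage() {
  return (
    <div className="welcome-page">
      <h1>Welcome to Luma</h1>
    </div>
  );
}
```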


And then you'll create a button that looks like this:
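
Again a sketch, with a hypothetical label:

```tsx
// A button with a hard-coded English label.
export function CreateEventButton() {
  return <button className="btn-primary">Create Event</button>;
}
```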


This works well, and it's really easy to iterate on in your code. If you're playing around with your app and find yourself wanting to change something, you can easily find the relevant file by searching for a string you see in the UI. For example, I would search for "Welcome to" and quickly find the WelcomePage component. Then if I want to update something like the color of the text, I can do it right there.

But when you want to add translation, things become more complicated. You can't have <h1>Welcome to Luma</h1> because that header should be different depending on the language of the viewer. There are actually many things that should change based on the language and many places that we will need to update to be translation aware.

Here's a non-exhaustive list of places where we want to be translation aware:

  • Anywhere we display UI
  • Showing toasts or messages — this often uses a different system than just rendering UI
  • Dates, number formatting, and currencies
  • Error messages coming from the server
  • Transactional emails
  • Marketing emails
  • In-app notifications
  • SMS messages
  • WhatsApp messages
  • Marketing copy

So let's see how we can update our apps and backend to be translation aware.

Web / React

I chose FormatJS / react-intl as our JavaScript i18n library. I also considered react-i18next, but I preferred FormatJS because it's simpler and doesn't require globals.

FormatJS gives you two entry points to make your text translation aware — a component <FormattedMessage /> and a function formatMessage(...). I thought these names were a bit long, so I created aliases for them in our codebase called <T /> and t(...), where t stands for translate.

Using these new components, we can rewrite the components above to:
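
A sketch of the rewritten page (the import path and description text are illustrative):

```tsx
import { T } from '@/lib/i18n'; // our alias for FormattedMessage (illustrative path)

export function WelcomePage() {
  return (
    <div className="welcome-page">
      <h1>
        <T
          defaultMessage="Welcome to Luma"
          description="Header shown on the welcome page"
        />
      </h1>
    </div>
  );
}
```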


and for the button:
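
Again a sketch, with a hypothetical label:

```tsx
import { T } from '@/lib/i18n'; // illustrative path

export function CreateEventButton() {
  return (
    <button className="btn-primary">
      <T defaultMessage="Create Event" description="Button that starts event creation" />
    </button>
  );
}
```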


So you can see that both of these components are now translation aware. Unfortunately, this also makes them a lot longer... There are things we do to make the translation code less verbose, but this is a drawback of translation: it's going to make your code more verbose. I recommend adding translation only when you know it's worth it.

As a quick aside, we can also handle variables in translations. If we want to welcome a user by name, we can add a variable to the welcome text.
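
Roughly like this (the component wrapper is illustrative):

```tsx
import { T } from '@/lib/i18n'; // illustrative path

// FormatJS interpolates {name} in the message using the values prop.
export function WelcomeHeader({ name }: { name: string }) {
  return (
    <T
      defaultMessage="Welcome to Luma, {name}!"
      description="Greeting that includes the viewer's name"
      values={{ name }}
    />
  );
}
```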


But let's get back to the simple case and break down how the translation pipeline works. We have the following component:
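
That is, something like this (the id shown is the same placeholder used later in this post; in real code it's a generated hash):

```tsx
<T
  id="abc123" // hash of defaultMessage + description, generated automatically
  defaultMessage="Welcome to Luma"
  description="Header shown on the welcome page"
/>
```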


We generate a unique ID for every string in our app. We've set up the ID to be a hash of the defaultMessage (which is what we show in English) and the description (which gives the translator more context about where the string is used but isn't actually used in our app).

So whenever the defaultMessage or description changes, we will generate a new ID and will need to generate a new translation. These IDs are generated by an ESLint plugin, so whenever we save a file in our IDE, the IDs are recalculated.

This hash-based ID format is much nicer than the semantic-key format, where you have to define a key like welcome_to_luma_dashboard and then define the English message in a different file. With hash-based IDs, I don't have the overhead of coming up with a new semantic key for each string and keeping it organized — I just rely on the hash function to do everything for me.

That being said, there are times when it's useful to reuse the same string consistently across your app. For that, we have a library of commonly used strings that we call CT, which we can import anywhere across the app or backend.
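
A sketch of what CT looks like (the entries here are illustrative):

```tsx
import { defineMessage } from 'react-intl';

// CT holds commonly used strings so the same translation is reused everywhere.
export const CT = {
  cancel: defineMessage({
    defaultMessage: 'Cancel',
    description: 'Generic cancel button',
  }),
  save: defineMessage({
    defaultMessage: 'Save',
    description: 'Generic save button',
  }),
};

// Usage: <T {...CT.cancel} /> in a component, or t(CT.cancel) with a t function.
```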


Now that we have components that are translation aware — how do they know what language to use?

We have a LocalizationProvider at the top of our app that figures out the locale for the user and then applies it across the app. We look at, in order (a simplified sketch follows the list):

  1. the locale query parameter — this allows you to force a locale for a given page by putting in ?locale=xx — you can actually try this on Luma right now by going to lu.ma?stay&locale=es
  2. the user's saved locale if they are signed in
  3. the accept-language header set by the browser
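
Here's what that resolution order boils down to (the helper names are illustrative, not our exact code):

```ts
// Pick the first language tag from an Accept-Language header. Real code should
// negotiate against the locales we actually support.
function parseAcceptLanguage(header: string | null): string | undefined {
  return header?.split(',')[0]?.split(';')[0]?.trim() || undefined;
}

// Resolve the viewer's locale in priority order.
function resolveLocale(req: Request, user?: { locale?: string }): string {
  const queryLocale = new URL(req.url).searchParams.get('locale');
  return (
    queryLocale ??                                              // 1. ?locale=xx query parameter
    user?.locale ??                                             // 2. the signed-in user's saved locale
    parseAcceptLanguage(req.headers.get('accept-language')) ??  // 3. browser preference
    'en'                                                        // fall back to English
  );
}
```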

Notice that I said a locale and not a language. A locale is a code like en (English), es (Spanish), or es-ES (Spanish as used in Spain) that specifies the language and, optionally, the country of the user. The locale tells you which language the user wants to see, along with other defaults like preferred currency and date formatting. For some languages, like English, Spanish, Portuguese, and Chinese, we have multiple translations for the same language: English in the US is a bit different from English in the UK or Singapore, and Portuguese is very different between Portugal and Brazil. The locale also captures formatting preferences like currency: in Singapore users expect SGD, while in Australia they expect AUD.

The LocalizationProvider pulls in the translation file for the given language, which maps each ID to the translated string. For example, a Spanish translation file could look like:
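
Something like this (the IDs are placeholders):

```json
{
  "abc123": "Bienvenido a Luma",
  "def456": "Crear evento"
}
```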


Then when rendering <T id="abc123" />, the component will look at the translation file and see that it should render Bienvenido a Luma. If the ID is missing, then we fall back to using the default English message.

FormatJS also supports number formatting, pluralization, and other features that we use extensively, but I won't go into detail about those here.

So now that we have our app framework set up, we'll need to generate the translation files. To do this, we use the FormatJS CLI to look through all of our files and extract all IDs, default messages, and descriptions into a file. Then we can hand that to a translator (or an AI) and get back a translated file.

Here is a simplified version of the script we use to extract all of the messages from our codebase.
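
Conceptually it boils down to a call to the FormatJS extractor (the glob and output path here are illustrative):

```bash
#!/usr/bin/env bash
# Extract every defaultMessage + description into one JSON file.
# The ID interpolation pattern mirrors our eslint plugin: the ID is a hash
# of the defaultMessage and description.
npx formatjs extract "src/**/*.{ts,tsx}" \
  --out-file lang/en.json \
  --id-interpolation-pattern '[sha512:contenthash:base64:6]'
```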


Our locales folder ends up with one translation file per language, and they are all updated every day. We'll talk about how that works later. First, let's talk about the server.

Node — API / Background Tasks

So we've translated the frontend of our website. But we often get content or error messages from our server. How should we display those?

There are two options:

  1. The client will need to figure out how to take an ID or English message from the server and then translate it to the user's language.
  2. The client will send the language of the user to the server and the server will translate the content before sending it to the client. So the client can just render messages without translating them.

We actually do a mix of both, but for the vast majority of cases we stick with the second option: we translate on the server so that the client receives already-translated content. You can see an example of this in the cover image picker when creating an event; all of the image categories are translated on the server.

We write our server in JavaScript, so we're able to share the same FormatJS packages and content that we use on the web.

On the web, we use <LocalizationProvider /> at the root of our app. On the server, there isn't a shared root, so we created a getT(locale) function that returns a t translation function for the specified locale. When we want to translate something, we look at the locale from the request (or from the user in the DB) and get a t function we can use.
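
A sketch of what getT can look like (the locales path and loading details are illustrative):

```ts
import { createIntl, createIntlCache } from '@formatjs/intl';
import fs from 'node:fs';

const cache = createIntlCache();

// Return a t(...) function bound to the given locale.
export function getT(locale: string) {
  const messages = JSON.parse(fs.readFileSync(`locales/${locale}.json`, 'utf8'));
  const intl = createIntl({ locale, defaultLocale: 'en', messages }, cache);
  return intl.formatMessage;
}

// Usage: const t = getT(user.locale); then t(someMessageDescriptor, values).
```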

We also use our server to send emails, SMS messages, and WhatsApp notifications, and we use the same translation framework for all of them. When we send an SMS, we look up the user's locale and translate the message into their language. It works pretty seamlessly!

iOS

Wow, we had to do a lot to make JavaScript translation aware. What framework are we going to use for iOS? That's a trick question: we won't need a framework at all.

On iOS, SwiftUI is natively translation aware, which is super convenient. If you're building a SwiftUI app, you'll create views like this:
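
Something like this (the view name is illustrative):

```swift
import SwiftUI

struct WelcomeView: View {
    var body: some View {
        Text("Welcome to Luma")
            .font(.title)
    }
}
```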


If you haven't seen SwiftUI before, it's actually very similar to React. The Text component is similar to a div in React, and styling is done by adding modifiers (like .font) rather than classes or properties. Once we've written that view, Xcode is smart enough to analyze our SwiftUI code and automatically pull out Welcome to Luma as a string that needs to be translated. We can even add a comment for the translator like so:
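
(The comment text here is illustrative.)

```swift
Text("Welcome to Luma", comment: "Header shown on the home screen")
    .font(.title)
```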


You'll notice this looks pretty similar to the FormatJS code above, but there's no ID parameter. Xcode keys translations by their default value, so if we have two Text components with the string Welcome to Luma, they will be translated as the same string, even if they have different comments.

When you build your app, Xcode will automatically extract your strings into an .xcstrings file. You can hand this file to a translator, ask them to translate to their language, and then import it back into Xcode.

Here's what a .xcstrings file looks like with one translation key which is translated to Spanish.
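
Roughly like this, trimmed to the relevant fields:

```json
{
  "sourceLanguage" : "en",
  "strings" : {
    "Welcome to Luma" : {
      "comment" : "Header shown on the home screen",
      "localizations" : {
        "es" : {
          "stringUnit" : {
            "state" : "translated",
            "value" : "Bienvenido a Luma"
          }
        }
      }
    }
  },
  "version" : "1.0"
}
```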


Xcode provides tools to export and import translations, but the basic goal is to end up with a .xcstrings file with all of the translations filled out. As long as we can produce that file, the iOS app is translation aware.

Creating the Translations

We've spent a lot of time talking about how the apps and backend will display translated content, but there's a big missing piece...

How do we actually translate the content of the app?

Take the "Welcome to Luma" message. We'll need to translate that into every language our app supports and then store it somewhere for the code to pick up.

Hiring Translators and Using Lokalise (Don't Do This)

This entire section is about a mistake we made. So if you're just interested in figuring out how to set up translations for your app, you can skip this section.

We had our first code ready for translation at the end of 2023. I Googled around and found that the "correct" way to translate is to hire translators, give them access to a translation management system (TMS), and then integrate the TMS into the rest of your app.

There are a lot of TMS options, but like most enterprise software, they are all a serious pain to use. You'll need to integrate with the TMS so that when you have new code to translate, you upload the content to the TMS, and once the translators have translated it, you pull the translations back out of the TMS and into your app. (TMS vendors also try to lock you in by giving you tools to dynamically request translations from your app, but this is a bad idea for many reasons.)

The least bad option we tried was Lokalise, but it's still pretty bad. For example, it doesn't support the .xcstrings file format, which Apple released over a year ago. And many times you'll click on a notification from Lokalise and need to fully refresh the page a few times before the content shows up. It's just middle-of-the-road enterprise software.

But we still used Lokalise for a few months! At the beginning of 2024, I tried various AI translation tools, including the AI translation provided by Lokalise, and they were all pretty bad.

So we hired over 10 translators through a mix of our Luma network, Upwork, Twitter, and LinkedIn, and gave them access to Lokalise. They were tasked with updating the translations as we had new content to translate. This generated a ton of chaos. First of all, it's really hard to hire a translator for a language you don't speak because you can't assess their translation skills. I was calling friends who spoke different languages and begging them to review translators.

Then there's the challenge of managing a big team. The translators were spread out across the world and had questions about different content. They would ask questions as comments in Lokalise, but Lokalise was difficult to navigate, so they wouldn't get quick answers.

Additionally, even if you are good at translating, you may not be able to easily parse some of the more technical content like our plural string format {count, plural, one {1 Response} other {# Responses}}. Translators consistently made syntax errors while translating this kind of content.

Our average lead time for translation was about a week. So if we merged in a new feature, we would upload those strings to Lokalise and then wait a week to get translated content. This meant a significant part of the site would not be translated at any given time.

To avoid incurring this lead time, we were hesitant to make small changes to strings in the app. For example, we would hesitate to update Welcome to Luma to Welcome to Luma! because it would generate more work for the translators.

In summary, using Lokalise and managing translators was very difficult. But it seemed like the best solution at the time because the AI workflows we tried were not good enough.

Creating an AI Pipeline

We first tried dedicated AI translation services, thinking they might have some secret sauce that would make their translations better than plain ChatGPT.

These services did not produce good translations. There were many times an ambiguous word in our app (Event is a good example) was translated to a totally incorrect word. The AI didn't have enough context about Luma to know that an Event is a conference / party / happy hour, not an analytics event, a point in time, or a step in a process. So we created an AI pipeline that adds more context to every translation. When we run translations today, we call a translate(...) function on every string in our app and give it the following context:

  1. A base prompt — We share information about what Luma is and the tone we aim for in our marketing copy. We also share some output formatting guidelines.
  2. A project-based prompt — iOS and web have different translation formats; for example, plurals on web are handled with ICU syntax, while iOS has its own plural format.
  3. A language-based prompt — We give tips about the target language here, including a glossary and other language-specific guidance. If we see that a given language has issues, we can usually update this prompt to fix them.
  4. Information about the string we want to translate — Each string has its English message and an optional comment.

For each string, we concatenate the above into a prompt and then run a Claude 3.5 Sonnet completion to generate the translation. We use the Tool Use feature with a function called translate — this helps us reliably get the translation in a format that we expect.
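
Here's a condensed sketch of that call using the Anthropic SDK's tool use. The prompt variables stand in for the context described above; the real implementation also handles retries, batching, and validation.

```ts
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

// Sketch of a single translation call. basePrompt, projectPrompt, and
// languagePrompt are the pieces of context described in the list above.
async function translate(opts: {
  english: string;
  comment?: string;
  targetLocale: string;
  basePrompt: string;
  projectPrompt: string;
  languagePrompt: string;
}): Promise<string> {
  const response = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20240620',
    max_tokens: 1024,
    system: [opts.basePrompt, opts.projectPrompt, opts.languagePrompt].join('\n\n'),
    // Forcing a `translate` tool call gives us the output in a predictable shape.
    tools: [
      {
        name: 'translate',
        description: 'Return the translated string',
        input_schema: {
          type: 'object',
          properties: { translation: { type: 'string' } },
          required: ['translation'],
        },
      },
    ],
    tool_choice: { type: 'tool', name: 'translate' },
    messages: [
      {
        role: 'user',
        content:
          `Translate the following string into ${opts.targetLocale}.\n\n` +
          `String: ${opts.english}\n` +
          `Context: ${opts.comment ?? 'none provided'}`,
      },
    ],
  });

  const toolUse: any = response.content.find((block) => block.type === 'tool_use');
  return toolUse.input.translation;
}
```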

When we receive the translation, we store it in a translated_string table in our database.

From our testing, we've found that using this pipeline with GPT-4o or Claude 3.5 Sonnet creates translations that are at least on par with the translators that we hired. Today we're using Claude 3.5 Sonnet as it consistently outperforms GPT-4o on translation quality.

It is still incredibly hard to judge the output quality of translations, but across our team we have people who speak about 6 different languages fluently, so we benchmark on those languages and hope that the AI generalizes well across languages.

Daily Translations

So now we've updated our app to be translation aware, and we have a script that will translate content and insert the translated strings into our DB.

Let's look at how we keep our app translations up to date. Every night, we run a batch translation job using GitHub Actions. The translation job does the following steps (a trimmed-down workflow sketch follows the list):

  1. Extract all of the strings from web and iOS and add them to the translated_string table in our database (if they don't already exist).
  2. Pull all untranslated strings from the translated_string table and run them through the AI pipeline above for every language we support. After this step, the translated_string table should contain every string in our app, translated into every language.
  3. Write the translated strings to files in our code. For iOS we update the .xcstrings files, and for JavaScript we have a locales folder with a file of translations for each language.
  4. Open a GitHub PR with the translated files. We have one translation PR every night, and we can easily approve and merge it to release those translations.
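
The workflow could look roughly like this (the schedule, script names, and secrets are illustrative, not our exact setup):

```yaml
# .github/workflows/nightly-translations.yml (illustrative)
name: Nightly translations
on:
  schedule:
    - cron: '0 6 * * *' # once a night
  workflow_dispatch:

jobs:
  translate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # Steps 1–3: extract strings, translate anything missing, write locale files.
      - run: node scripts/run-translations.js
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
      # Step 4: open the nightly PR with whatever changed.
      - run: |
          git config user.name "translation-bot"
          git config user.email "translation-bot@users.noreply.github.com"
          git checkout -b translations/nightly-$(date +%F)
          git add -A
          git commit -m "Nightly translations" || exit 0
          git push -u origin HEAD
          gh pr create --fill
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```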

This entire process means that our app has good translations in 40 languages that are updated every day. When we release a new feature, we simply merge it into master with the assurance that it will be translated by tomorrow morning.

Note: we do this nightly rather than with every PR because we don't want to deal with the merge conflicts and noise that would come with tens of additional PRs every day.

Summary

We made our apps translation aware (across web, iOS, emails, SMS, etc.) and increased non-English usage 4x in six months. We have a pipeline that ships high-quality translations to production every day. And we don't have to worry about it or futz around with enterprise translation software; it just works!

I was the primary person working on this project and was able to ship it while building a lot of other features for Luma at the same time. If you are interested in creating simple, robust solutions, you should work with us! Send me an email 😃.

And if you have questions about this post or translation in general, you can ask them on Twitter and tag me.