21 March 2015

Angular and i18n - The new world

We all know that internationalization is important when it comes to bigger apps, or just the ones that are simply used across countries. Angular itself comes with very poor i18n support, which is why the community has built their own solutions to extend the framework’s functionalities to their needs. However, there’s finally a first-class solution evolving that will be baked right into the core.

In fact, I was honoured to give a talk on that topic at this year’s ng-conf with Chirayu Krishnappa and you can watch the recording of it right here. This article is a detailed write-up based on the talk and I hope it will answer all the questions it raised.

Understanding the process of i18n

When I started working on angular-translate two years ago, i18n to me, was really just about making it possible that the user of an application is able to change the locale through the user interface. So what is needed? Well, we have our application, we replace all strings with an abstraction that takes care of displaying the actual messages of a certain locale later at runtime, tokenize all our existing messages and write them into JSON so we can easily add new messages - Done!

Even if this approach works quite well (that’s the one angular-translate uses and it’s used in a lot Angular apps), it turned out that it also comes with it’s pitfalls. In addition to that, there are even some things left out when it comes to the whole i18n process.

You wonder what these things are? Just ask yourself: Who defines the tokens for each message that needs to be translated? In which file format does your translator receive your message bundles? What if your translator needs context for your messages in order to translate them properly, how do you provide it? And once the translator translated all your messages, how do you get them back into your existing application?

Right, all of a sudden we realise, that there’s much more required than just translating messages from one locale to another. In order to make i18n as a process a bit more clear, here’s a graphic that visualises a good i18n solution that covers all the problems we’re facing with existing solutions.

i18n as a process visualised

Let’s go through this step by step:

Developers write HTML templates - This should be super natural. In a good i18n solution, we as developers just write our HTML templates as usual, either manually or maybe dynamically generated by a server. In an Angular app those could be our routing views or templates for custom directives. Also important to notice: we should be able to write our templates in any language that we prefer, it shouldn’t be required to introduce tokens for every single message that needs to be translated. In fact, a simple HTML template that just contains plain text, is a valid template that can be translated.
Messages are extracted and bundled - In order to translate all of our application messages, we somehow need to extract them from our existing templates, bundle them together and hand them over to our translator in an expected file format (e.g. PO or XLIFF). The message extraction as well as the file format transformation should be done by automated tooling. We should be able to re-extract message any time as we’re developing new features in our application.
Translator translates messages - Once all messages are bundled as common translation file formats, the translator can take those files, import them in translation software of choice and use it’s graphical user interface to translate all messages to a different locale. The software then again, exports new translation files containing the new locale, which are handed over to us so we can process them.
Template/JSON generation - At this point, we need to decide in what way we want to implement i18n. We either want to generate new templates for each new locale so we can serve them accordingly from a web server, or we decide to use a client-side solution that e.g. consumes a certain JSON structure with the locale information and takes care of the locale change in the web browser. We might even need both solutions combined depending on our use case. However, whatever we decide to do, what we always need is tooling that allows us to do all these things.

As you can see, there’s really much more involved when it comes to i18n. What we also notice in this graphic, is that it’s an iterative process. No matter for which way we decide to get the new locale back into our application, we need to repeat that process over and over again as new features or changes happen in our application.

Okay cool, now we know what i18n is all about, but what is it with the new i18n solution that comes to the Angular core?

The new i18n story in Angular

Internationalization support in Angular has been very poor so far. You might know that there’s an ngLocale module you need to include, which is used by a couple components, like ngPluralize, date and currency filter to name a few, and that’s pretty much it. As we’ve already discussed, there’s so much more that comes into play when internationalizing an application, which is why there’s finally a new solution evolving that will bring first-class i18n support to the Angular framework.

Here’s a what the new solution will cover:

Tooling - As we’ve learned earlier, there’s a lot of tooling required in order to implement a smooth i18n experience for both, developers and translators. We work on all the tools that are needed to realise that process, which includes message extraction, file transformation, template generation and more. We’ll also provide APIs so you can write your own plugins to extend the pipeline to your personal needs.
Plural and Gender select - Pluralization and gender selection is a rather isolated topic that can and has to be done whether you want to support your apps in other languages or not. With the new i18n solution we’ll extend the existing string interpolation and support ICU Messageformat inside curlies in any template.
HTML Annotations - In order to get high quality translations, we need to provide our translators with context, descriptions and meanings. There’ll be a new syntax to annotate existing templates with meta information that will be extracted together with all messages. This syntax is and will always be backwards compatible, in fact, we’ll be able to use it in any website or application without breaking anything.
Server and Client - The new i18n solution will work for both scenarios, generate templates on the server side and dynamic locale evaluation on the client side. This is a very powerful fact, since we don’t force anybody to use a certain strategy.
Pseudotranslation - Since the process of translating messages from one locale to another can take very long and only once we get those translations back we actually see and realise that our application UI is completely broken with the new locale, the new solution will support something called pseudotranslation. With pseudotranslation we can generate gibberish languages in order to test our application user interfaces as soon as possible.

Oh my, that’s a lot right? Yes indeed it is! But that just confirms that there’s a lot of processing and tooling involved when it comes to i18n. As we can see, the planned solution solves all the problems described above, which is awesome.

Let’s take a closer look at some of those topics to get an idea of how we plan to implement them.

HTML Annotations

One of the most important things when it comes to i18n, is to provide our translators context along the messages that get extracted from our templates. Imagine we have a simple template as the following one:

<p>Limit is 255 characters.</p>

There’s nothing special about this template. In fact, a simple HTML template that just contains plain text is content to be extracted by the message extraction tool. However, if this particular message gets extracted and handed to our translator, how does he or she ever know what this message is about? In the end, all the translator gets is this:

Limit is 255 characters.

What is the “Limit” we are talking about? Also, “characters” can have different meaning depending on the context of the message. In case this example doesn’t make it clear enough, just think about the word “crane”. Without context it could be the bird, or the machine, right?

In order to get high quality translations back, we need to provide enough information to the translator and this is where HTML Annotations come in. The message extraction tool knows about annotatoins via an i18n attribute which we can use to annotate existing templates with meta information.

Coming back to our example, what is the information we want to provide to the translator? Let’s say it’s a message to show the maximum length of characters allowed in a comment. With the i18n attribute we can add this information to our message:

<p i18n="Label to show the maximum length of characters
   allowed in comments.">Limit is 255 characters.</p>

Now when the message gets extracted, the description is extracted as well which ends up at our translator’s software and would look something like this (Note that translator’s usually have a proper graphical user interface):

Limit is 255 characters.

DESCRIPTION: Label to show the maximum length of 
             characters allowed in comments.

Now it makes much more sense to our translator and he or she can actually reason about the given messages. The cool thing about the annotation is, that it doesn’t require a DOM element with an i18n attribute. There’s another comment based syntax annotation we can use if we don’t have that surrounding DOM element.

<!--i18n: Label to show the maximum length of
          characters allowed in comments.-->
Limit is 255 characters.
<!--/i18n-->

Another use case were messages are extracted and meta information needed is when HTML attributes are used. Just take a look at this simple snippet:

<input placeholder="Firstname Lastname">

placeholder is a standard HTML attribute for input elements. For attributes like this, our new tools will provide our translator with a default description in case none is provided.

Firstname Lastname

DESCRIPTION: HTML input control placeholder text

However, in case we do want to be more specific, all we need to do is to introduce a new i18n-[ATTRNAME] attribute that gets the description:

<input 
  placeholder="Firstname Lastname"
  i18n-placeholder="Text input placeholder for the full
                    name of a friend.">

Yes, this works with all attributes! It doesn’t matter if you have a directive that introduces it’s own attribute APIs or even a Web Component, as long as you prefix your attribute with i18n- the message extractor will take care of extracting the message together with it’s description.

Placeholders

Even if what we’ve seen so far is pretty powerful, you might have noticed that we’ve only seen simple text messages that get extracted. But usually, in an Angular application, we have interpolation directives in our messages. Let’s say we replace the limit value in our example with something dynamic that is interpolated later at runtime.

<p i18n="Label to show the maximum length of characters
   allowed in comments.">Limit is {{count}} characters.</p>

What will the translator see now? Well, it turns out that the Message Extraction tool will extract those messages as so called Placeholders. Our translator would see something like this:

Limit is EXPRESSION characters.

DESCRIPTION: Label to show the maximum length of 
             characters allowed in comments.

PLACEHOLDERS

- EXPRESSION
    Description: Angular Expression

A placeholder cannot be changed by a translator. However, they are able to move them around inside the message in case it’s needed for the locale messages are translated to. We also notice that we get some meta information on what the placeholder EXPRESSION is about.

Of course, our translator probably doesn’t know about Angular at all and even EXPRESSION doesn’t really tell anything. In such cases we can explicitly define the name of a placeholder that ends up in a message and we’re even able to provide and example to give even more information.

Something like this would be more useful to the translator:

Limit is MAXCHARS characters.

DESCRIPTION: Label to show the maximum length of 
             characters allowed in comments.

PLACEHOLDERS

- MAXCHARS
    Description: Angular Expression
    Example: 255

The annotation syntax to override default placeholder names and add examples is still in discussion.

Pluralization and Gender

Using HTML annotations, we’re able to provide our translators with meta information about our messages that live in our templates. This technique is actually already everything we’d need to kick off the process of getting high quality translations.

However, there’s one topic that we haven’t talked about yet, which is Pruralization and Gender selection. Pluralization and Gender selection is rather decoupled from the actual i18n process, even if it can be part of it though. But we don’t have to internationalize our applications in order to implement plural and gender selection in the first place. The new i18n solution will come with a nicer and easier to use story for plural and gender, that works seamlessly with the rest of the process.

Wait, don’t we have ngPluralize for this? Yes we do and it will always be there. It turns out that there are some issues with ngPluralize though. For instance, how would you implement pluralization with that directive if you want to pass a pluralized message into an attribute? Right, it’s not possible. Also, with different locales, different plural categories might be needed and added by our translator, since they vary dramatically across languages.

With the new i18n solution, Angular’s interpolation syntax gets extended with the ICU Messageformat syntax. There are many Messageformat implementations for different programming languages and libraries out there (even angular-translate supports it) and in fact, it’s used by a lot people.

Alright, but how can we pluralize declaratively with that support? Well, take a look at the following snippet.

{{numMessages, plural,
     =0 { You have no new messages }
     =1 { You have one new message }
  other { You have # new messages }
}}

It’s that easy. If you’re familiar with Messageformat then you know this syntax already. We’re able to overload a simple Angular interpolation by defining the selection type (plural or gender) as second parameter and a “list” of categories to match the selection based on the given expression numMessages. The # symbol is a placeholder for value that the expression returns.

Why is that cool? We can use that syntax inside HTML attributes, we can use all interpolations inside selection messages (incl. filters), and even more important, our translators are likely to know this syntax. Because what ends up in their software, once extracted, is a message that looks something like this:

{numMessages, plural,
     =0 { You have no new messages }
     =1 { You have one new message }
  other { You have # new messages }
}

Yeap. It’s pretty much the same, except that they see single curlies instead of double curlies. Now, if a language has different categories than 0, 1 and other, maybe few, many and more, translators can easily at those and it gets especially easy if their software comes up with corresponding suggestions based on the locale.

Just to give you an idea what the gender selection syntax looks like, here’s an example.

{{friendGender, gender,
    male { Invite him }
  female { Invite her }
   other { Invite them }
}}

We can nest plural and gender messages as well, which makes this feature super powerful.

Angular core pre-compilation step

You might wonder how things like HTML annotations and pluralization and gender syntax are supported if we happen to choose the client side solution, right? In order to make all those nice features available to our Angular apps, without requiring a server, Angular performs a pre-compilation step which will manipulate the DOM before the actual compilation kicks in. This happens once initially so messages won’t be translated over and over again.

The pre-compilation also takes care of rewriting our plural and gender messages in order to make them work with the categories of a different locale. The same happens with all other messages that need to be translated as well, which means we can still just write our templates in one locale.

Okay, awesome! When can we use it?

This article detailed the overall picture of the new i18n solution that comes to the Angular core. While it’s planned to bring all mentioned features and tools to both versions of Angular, we need take one step at a time. Angular 1.4 will ship with support for Messageformat syntax inside Angular interpolations, so we can start using proper pluralization and gender selection throughout our applications right away.

Prototypes for tooling are already developed and production ready implementations are in the making, which will enable us to use and extract HTML annotations.

Work on support for Angular 2.0.0 is likely to start a few weeks after Angular 1.4 is out.

And of course, contributions are always welcome!

Written by

Pascal Precht