i18n Resource File Formats: RESJSON files

In our last blog post on localization resource file formats, we outlined the RESW and RESX formats. Newer Microsoft products also use a JSON-based format, especially in their mobile frameworks. Here is a look at this format and some of its characteristics for app localization.

RESJSON files are used by Windows “Metro” style applications developed for Windows 8. They are saved in JSON (JavaScript Object Notation) format and contain strings that are often used for localizing the application’s user interface. Lingohub has supported this format since earlier this year (see our news article). If you’re developing apps for Windows Mobile, Windows RT or Windows 8, you will most likely be working with these file types.

Developers typically create one RESJSON file (e.g., resources.resjson) per locale and place it in a folder named after that locale: /en-US/, /fr-FR/, /ja-JP/, etc. Each resources.resjson file contains the strings localized for the language of its folder.
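As a sketch of that layout, assume a hypothetical root folder (for example strings/) with one subfolder per locale, each holding a resources.resjson file. The Python below collects all catalogs into a dictionary keyed by folder name. The function name load_locale_folders is our own illustration, not part of any Windows or Lingohub API.

```python
import json
from pathlib import Path
from typing import Dict


def load_locale_folders(root: Path, filename: str = "resources.resjson") -> Dict[str, Dict[str, str]]:
    """Return {locale: {key: value}} for every <root>/<locale>/<filename> that exists."""
    catalogs: Dict[str, Dict[str, str]] = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        resource_file = folder / filename
        if resource_file.is_file():
            # "utf-8-sig" also accepts a leading byte-order mark, which some editors write
            with resource_file.open(encoding="utf-8-sig") as fh:
                catalogs[folder.name] = json.load(fh)
    return catalogs


# Hypothetical layout: strings/en-US/resources.resjson, strings/fr-FR/resources.resjson, ...
# catalogs = load_locale_folders(Path("strings"))
```

Folders without a resource file (images, for instance) are simply skipped; the folder name is taken as the locale as-is.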

The RESJSON resource file format follows the standard JSON syntax:

  • the whole content is enclosed in braces ( { } ); entries may sit on separate lines, but a string value cannot contain a literal line break (write it as \n)
  • key-value pairs are delimited with colons ( : )
  • keys and values are surrounded by double quotes ( " )
  • key-value pairs are comma separated
  • placeholder syntax: {name}, where “name” can be any combination of non-white-space characters
  • a key of the form “_somekey.comment”, where “somekey” is an existing key, marks a comment that belongs to the “somekey” key-value pair. The position of the comment within the file does not matter.
  • we use UTF-8 encoding for RESJSON resource file exports by default, but we also support other encodings our users might prefer.
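Because every translation should keep the same placeholders as its source string, a simple consistency check is useful. The Python sketch below is our own illustration, not a Lingohub feature; it assumes placeholders are written exactly as {name} and skips the “_key.comment” entries, which are descriptions rather than translatable text.

```python
import re
from typing import Dict, Set, Tuple

# {name}: "name" is one or more non-white-space characters (braces excluded)
PLACEHOLDER = re.compile(r"\{([^\s{}]+)\}")


def placeholders(text: str) -> Set[str]:
    return set(PLACEHOLDER.findall(text))


def placeholder_mismatches(source: Dict[str, str], target: Dict[str, str]) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Report keys whose translation uses a different set of {name} placeholders than the source."""
    problems = {}
    for key, text in source.items():
        if key.startswith("_") and key.endswith(".comment"):
            continue  # "_key.comment" entries are descriptions, not translatable strings
        if key in target:
            expected, found = placeholders(text), placeholders(target[key])
            if expected != found:
                problems[key] = (expected, found)
    return problems


en_us = {"welcome": "Hello {userName}!", "_welcome.comment": "Shown after sign-in"}
de_de = {"welcome": "Hallo {username}!"}
print(placeholder_mismatches(en_us, de_de))  # {'welcome': ({'userName'}, {'username'})}
```

Placeholder names are compared case-sensitively here, which is why the German string above is flagged.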

RESJSON resource file format example:
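Below is a minimal Python sketch built around a hypothetical resources.resjson for the en-US folder. It loads the file with the standard json module and pairs every “_key.comment” entry with the key it describes, regardless of where the comment sits in the file. The helper name split_strings_and_comments is our own.

```python
import json
from typing import Dict, Tuple

# A hypothetical en-US/resources.resjson
SAMPLE = """{
    "greeting": "Hello {userName}!",
    "_greeting.comment": "Shown on the start page after sign-in.",
    "farewell": "Goodbye",
    "_farewell.comment": "Shown when the user signs out."
}"""


def split_strings_and_comments(resjson_text: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate translatable strings from "_key.comment" entries."""
    data = json.loads(resjson_text)
    strings, comments = {}, {}
    for key, value in data.items():
        if key.startswith("_") and key.endswith(".comment"):
            # "_greeting.comment" belongs to "greeting", wherever it appears in the file
            comments[key[1:-len(".comment")]] = value
        else:
            strings[key] = value
    return strings, comments


strings, comments = split_strings_and_comments(SAMPLE)
print(strings["greeting"])   # Hello {userName}!
print(comments["greeting"])  # Shown on the start page after sign-in.
```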

We’re curious to hear about your experiences localizing apps for Microsoft’s new platforms. I hope this overview is helpful. We’re continuing our blog series on resource file formats, so also check out some of the previous posts on localizing for other platforms.

i18n Resource File Formats: RESX and RESW files

Next up in our blog series on localization resource file formats are Windows-related formats. RESX files are used by programs developed with Microsoft’s .NET Framework. They store objects and strings for a program in an XML format and can contain both plain text and binary data, the latter encoded as text within the XML tags.

RESW files are used by Microsoft Windows and Silverlight applications and contain strings that are used to localize the application for different languages and contexts. They are often used with XAML applications (such as Expression), which move the user interface strings out into resource files. Let’s take a closer look at how these files are formatted:

Syntax of the RESW and RESX resource file format:

  • documents start with <?xml version="1.0" encoding="ENCODING"?>, where ENCODING is the desired encoding
  • key-value pairs are nested within a <root> element and have this form:

<data name="key" xml:space="preserve"><value>value</value></data>

  • place-holder syntax is: {name}, where “name” can be a combination of non-white-space characters
  • XML comments (<!-- ... -->) preceding a key-value pair are treated as translation descriptions belonging to that pair, and can contain LingoChecks
  • we use UTF-8 encoding for RESW and RESX resource file exports by default, but we also support other encodings our users might prefer.
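To show the element structure described above, here is a minimal Python sketch that writes key-value pairs in that form using the standard library's ElementTree. The function name build_resw is hypothetical, and files generated by Visual Studio also contain a schema and resheader elements, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# ElementTree serializes this namespaced attribute as xml:space
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def build_resw(entries: Dict[str, str]) -> str:
    """Render key-value pairs as <data name="key" xml:space="preserve"><value>value</value></data>."""
    root = ET.Element("root")
    for key, text in entries.items():
        data = ET.SubElement(root, "data", {"name": key, XML_SPACE: "preserve"})
        ET.SubElement(data, "value").text = text  # &, < and > are escaped automatically
    declaration = '<?xml version="1.0" encoding="utf-8"?>\n'
    return declaration + ET.tostring(root, encoding="unicode")


print(build_resw({"Greeting": "Hello {userName} & welcome!"}))
```

The returned text declares UTF-8, so it should be written to disk with that encoding.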

Example of the RESX/RESW resource file format:
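The following minimal Python sketch reads a hypothetical Resources.resw with the standard library's ElementTree. It keeps XML comments in the tree so that each one can be attached, as a description, to the data element that follows it, mirroring the comment rule listed above. The function name read_resx is our own; other elements such as resheader are skipped.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Hypothetical Resources.resw: the XML comment describes the entry right after it
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<root>
  <!-- Greeting on the start page; {userName} is the account name -->
  <data name="Greeting" xml:space="preserve">
    <value>Hello {userName}!</value>
  </data>
  <data name="Farewell" xml:space="preserve">
    <value>Goodbye</value>
  </data>
</root>
"""


def read_resx(xml_text: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each data name to (value, description from the comment directly before it)."""
    # insert_comments=True keeps <!-- ... --> nodes in the tree (Python 3.8+)
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    parser.feed(xml_text.encode("utf-8"))
    root = parser.close()
    entries: Dict[str, Tuple[str, Optional[str]]] = {}
    pending: Optional[str] = None
    for node in root:
        if node.tag is ET.Comment:
            pending = (node.text or "").strip()
            continue
        if node.tag == "data":
            entries[node.get("name")] = (node.findtext("value", default=""), pending)
        pending = None  # a comment only describes the element that follows it
    return entries


for name, (value, description) in read_resx(SAMPLE).items():
    print(name, "=", value, "|", description)
```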

Thanks for reading. Let us know if you have questions on using RESW or RESX resource files for your localization projects. We also support newer formats for Microsoft projects targeting Windows 8 or RT (see our blog entry). Our series on localization file formats will continue; we’ve previously covered .ini, .strings and .properties files, for example. I am looking forward to your comments.

i18n Resource File Formats: YAML files

We are continuing our series on localization resource file formats, this time with a closer look at one of the most popular formats, especially since the rise of Ruby/Rails in web development. For those not so familiar with YAML, the Wikipedia article provides a very good basic introduction.

YAML is a human-readable data serialization format. Its syntax was designed to be easily mapped to data types common to most high-level languages (lists, associative arrays and scalars). YAML is what we use for localizing Lingohub in our Ruby on Rails application. It has proven very efficient in production, especially in conjunction with the GitHub integration that Lingohub supports natively.

Unlike some other formats, YAML has a well-defined standard. Let me outline some key features below, followed by an example. As always, feel free to send us questions on these file types or others that you want to use to localize your apps.

Key features of YAML resource file format:

  • key-value pairs are delimited with a colon ( : ), which must be followed by a space
  • values can be surrounded by quotes
  • indentation defines how keys are nested, so it must be correct and consistent; YAML allows only spaces, not tabs, for indentation
  • comments start with a hash sign ( # ), and are ignored by the parser
  • in the Lingohub context, all comment lines directly preceding a key-value pair (with no blank lines in between) are treated as translation descriptions or LingoCheck rules belonging to that line.
  • place-holder syntax is: %{name}, where “name” can consist of multiple non-white-space characters
  • we use UTF-8 encoding for YAML resource file exports by default, but we also support other encodings our users might prefer.
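The %{name} placeholder style is the one used by Ruby on Rails’ I18n. A minimal Python sketch of filling such placeholders follows; the helper name interpolate is hypothetical, and the sketch does not reproduce every feature of the Rails library (it simply raises KeyError when a value is missing).

```python
import re

# %{name}: "name" is one or more non-white-space characters other than "}"
PLACEHOLDER = re.compile(r"%\{([^\s}]+)\}")


def interpolate(template: str, **values: str) -> str:
    """Replace each %{name} with values[name]; a missing value raises KeyError."""
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


print(interpolate("Hello %{user_name}, you have %{count} new messages", user_name="Ada", count="3"))
# Hello Ada, you have 3 new messages
```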

An example of a YAML localization resource file:
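As an illustration, assume a hypothetical config/locales/en.yml in the Rails style. The Python sketch below does not parse YAML values (a full parser such as PyYAML would do that); it only demonstrates the comment rule listed above, attaching comment lines that directly precede a key, with no blank line in between, to that key’s dotted path. It assumes simple block mappings indented with spaces, without lists or quoted keys.

```python
from typing import Dict, List, Tuple

# Hypothetical config/locales/en.yml
SAMPLE = """\
en:
  # Shown on the start page after sign-in
  greeting: "Hello %{user_name}!"
  # Internal note: a blank line follows, so this is not attached to "farewell"

  farewell: Goodbye
  messages:
    # Inbox counter; %{count} is a number
    new: "You have %{count} new messages"
"""


def comments_by_key(yaml_text: str) -> Dict[str, str]:
    """Attach comment lines that directly precede a key (no blank line between) to its dotted path."""
    path: List[Tuple[int, str]] = []  # (indentation, key) of the enclosing mappings
    pending: List[str] = []
    result: Dict[str, str] = {}
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if not stripped:
            pending = []  # a blank line ends the comment block
            continue
        if stripped.startswith("#"):
            pending.append(stripped[1:].strip())
            continue
        indent = len(line) - len(line.lstrip(" "))
        key = stripped.split(":", 1)[0].strip()
        while path and path[-1][0] >= indent:
            path.pop()  # leave siblings and deeper levels
        path.append((indent, key))
        if pending:
            result[".".join(k for _, k in path)] = " ".join(pending)
            pending = []
    return result


print(comments_by_key(SAMPLE))
```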

We will continue our series on localization resource file formats with some other popular file types soon. Stay tuned, and feel free to post questions below.

How locales turn the Internet into a global village

"Love in less-common languages" by Quinn Dombrowski (http://www.flickr.com/photos/quinnanya/). License: CC-BY-SA 2.0 (attribution, share-alike)

What makes the Internet global? Multilingualism: the technical possibility of reaching each user in his or her native language on websites, in apps and on their devices, independent of the software development process that works behind the scenes. Successful localization has far-reaching implications. Recently, Google rolled out its email client Gmail in Cherokee, making it possibly the first software giant to deliver a global app in a Native American language. Localization means that your product can be rolled out worldwide, multiplying your potential customer base and reaching deep into market niches that you would otherwise barely touch.

In order for the web to be truly global and local at the same time, we need so-called locales, which you might recognize by their underscore notation, e.g. en_US (US English), de_AT (Austrian German) or pt_BR (Brazilian Portuguese): the first part is the language, the second part the region modifier. The point is not only to make a website, program or mobile app available in English, German or Portuguese, but also to account for regional varieties, which can affect the set of special characters, the vocabulary, place names and so on. Locales are the standardized labeling system software developers can use to create truly multilingual projects.
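The language_REGION convention is easy to handle in code. The Python sketch below, with hypothetical helper names, splits such a tag into its two parts and converts it to the hyphenated spelling (en-US) used by BCP 47 language tags and by the hreflang attribute. It only covers the two-part form shown above, not script or variant subtags.

```python
from typing import Optional, Tuple


def parse_locale(tag: str) -> Tuple[str, Optional[str]]:
    """Split "pt_BR" (or "pt-BR") into ("pt", "BR"); a tag without a region gives ("pt", None)."""
    language, _, region = tag.replace("-", "_").partition("_")
    return language.lower(), (region.upper() or None)


def to_hyphenated(tag: str) -> str:
    """Turn an underscore locale such as en_US into the hyphenated form en-US."""
    language, region = parse_locale(tag)
    return f"{language}-{region}" if region else language


for tag in ("en_US", "de_AT", "pt_BR", "fr"):
    print(tag, parse_locale(tag), to_hyphenated(tag))
```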

Technically, each system is different, but what most projects have in common (or should be prepared for) is what Microsoft calls world-readiness: every piece of text in your software is a variable that pulls the corresponding text for the desired locale, for example from a resource file. Interface, visuals and content are kept separate, and the content is available in one file per language. If you plan cross-platform roll-outs, Lingohub can provide the same content in various formats with a click, because multilingualism is one of two aspects that should complement each other, the other being a cross-platform strategy. More on that in a future blog post.
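To make world-readiness concrete, here is a minimal Python sketch of a key-based lookup. The catalogs, the function name t and the fallback order (exact locale, then its language, then a default) are our own assumptions for illustration; real frameworks define their own resolution rules.

```python
from typing import Dict, List

# Hypothetical catalogs, as they might be loaded from one resource file per locale
CATALOGS: Dict[str, Dict[str, str]] = {
    "en": {"checkout": "Checkout", "color": "Color"},
    "en_GB": {"color": "Colour"},
    "de": {"checkout": "Zur Kasse"},
}
DEFAULT_LOCALE = "en"


def fallback_chain(locale: str) -> List[str]:
    """de_AT -> ["de_AT", "de", "en"]: exact locale, then its language, then the default."""
    chain = [locale]
    for candidate in (locale.split("_", 1)[0], DEFAULT_LOCALE):
        if candidate not in chain:
            chain.append(candidate)
    return chain


def t(key: str, locale: str) -> str:
    """Resolve a UI string by key instead of hard-coding the text in the interface."""
    for candidate in fallback_chain(locale):
        text = CATALOGS.get(candidate, {}).get(key)
        if text is not None:
            return text
    return key  # last resort: show the key so a missing translation is easy to spot


print(t("color", "en_GB"), t("checkout", "de_AT"), t("color", "de_AT"))  # Colour Zur Kasse Color
```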

How to avoid duplicate content SEO punishment with hreflang

When you intend to reach an international audience with your web presence, there are basically two ways to go about the task. One is a 1:1 translation of all your content into other languages. It is the seemingly less laborious process and makes sense only if the same content applies to all markets without major modifications, as may be the case for instructional content. The other is an actual adaptation of your product content for each target language market, which requires more work and a native-language writing staff to create the unique content. The advantage of the second option is that it is true localization: content directly relevant to individual markets, their unique requirements and nuances, written by people who are familiar with what matters to your customers and their culture.

The first option, even though it keeps your content offering clearly comparable across languages, creates an SEO dilemma: some search engines will classify your different language sites as duplicates. According to a recent article in “Website Boosting”, one way to counteract this is to use the hreflang attribute as suggested by Google. That way, your product.com, product.es and product.co.jp websites (to take an example) will no longer be treated as three duplicate sites competing with each other in search rankings, but as one site in different locales, each working as a landing page depending on who accesses it. Search engines are informed about the connected content, and users can be pointed to a specific site depending on their browser’s language settings. The article also advises using country-code top-level domains, as they signal a language preference to the customer. However, it is advisable to take such steps carefully and consult experts to make sure your website is not put at risk with any of the common search engines.

The hreflang attribute signals different locale versions of similar content URLs to the search engine. Aside from full 1:1 translations of your content, it is also recommended when there are only minimal differences between internationalized content pages, or when your site is partially localized and you want to smooth out the user experience as much as possible. Using the attribute becomes especially relevant if you have multiple sites in the same language for different markets (serving .co.uk, .com.au and .au, for example), or if you’re combining Google’s canonical attribute with hreflang. Here’s an example of applying the hreflang attribute to signal two versions of your website for the Canadian market, one in French and one in English:

<link rel="alternate" hreflang="en-ca" href="http://en.yoursite.ca" />
<link rel="alternate" hreflang="fr-ca" href="http://fr.yoursite.ca" />
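When the set of locale versions is kept in one place, these link elements can be generated rather than written by hand. Below is a minimal Python sketch with a hypothetical helper name; it escapes attribute values with html.escape. As we understand Google’s guidelines, every variant page should carry the full set of alternates, including a link to itself, so the same output would go into the head of both Canadian pages.

```python
from html import escape
from typing import Dict


def hreflang_links(alternates: Dict[str, str]) -> str:
    """Render one <link rel="alternate"> element per locale version of a page."""
    return "\n".join(
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in alternates.items()
    )


print(hreflang_links({
    "en-ca": "http://en.yoursite.ca",
    "fr-ca": "http://fr.yoursite.ca",
}))
```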

When it comes to multilingual content, many shy away from going the extra mile to give their customers an authentic native-language interaction. We argue that you should definitely approach customers in their own language rather than assume they speak enough English to use your product. To get there, a quick analysis of your product development process will reveal whether the localization segment of the product cycle can be smoothed out from a technical, financial or managerial angle. Usually, rolling out in more languages is less of a headache than many expect. Paying attention to the technical setup of your website or app can save you additional trouble on the way to going global.