Do you speak UTF-8?

“@mperham: You have a problem, your data is in latin1 so you think: ‘I’ll convert to UTF8!’ Now you have � problems.” cc @kingshy_g

Every one of us coders who has dealt with encodings has felt that pain, haven’t you?

During my career as a developer I was quite lucky: I seldom had to deal with encodings.
And when I had to, well, it wasn’t my fault. The provider of the data had chosen that exotic encoding (not knowing any better). But I was the one in charge of solving the problem!

Actually, for me the whole encoding issue feels like a never-ending Y2K bug.
We have the proper encodings nowadays, but we as computer people have not managed to bring this topic to a close.

When reading different resource file formats, Lingohub has to deal with this subject:

  • Java resource bundles are stored in ISO 8859-1, with all other characters written as \uXXXX escapes (UTF-16 code units)
  • iOS strings files are stored in UTF-16 (sometimes you have to guess the byte order: little or big endian)
  • XML: encoding="UTF-8". Good idea! But this can be a lie (thanks to copy/paste)
  • Some other formats have no defined encoding and no metadata that tells you which one was used, so your application simply has to know which encoding to expect
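
To make the list above a bit more concrete, here is a minimal Python sketch of three of these chores: decoding UTF-16 with or without a byte order mark, expanding \uXXXX escapes in a Java properties value, and checking whether bytes really decode in the encoding they claim. The function names are made up for this post and this is not Lingohub’s actual code. The zero-byte heuristic for BOM-less UTF-16 is an assumption that only holds for text made mostly of ASCII-range characters, and the escape handling ignores escaped backslashes.

```python
import codecs
import re


def decode_utf16_guessing_endianness(data: bytes) -> str:
    """Decode UTF-16 bytes, using the BOM if present, otherwise guessing."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the BOM, picks the byte order
        # and strips the BOM from the result.
        return data.decode("utf-16")
    # No BOM (assumed heuristic): ASCII-range characters carry a zero high
    # byte, which sits at even offsets in big endian and odd offsets in
    # little endian.
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)
    codec = "utf-16-be" if even_zeros > odd_zeros else "utf-16-le"
    return data.decode(codec)


def decode_java_properties_value(raw: bytes) -> str:
    """Decode an ISO 8859-1 properties value and expand \\uXXXX escapes."""
    text = raw.decode("iso-8859-1")
    expanded = re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )
    # Escapes are UTF-16 code units, so characters outside the BMP arrive as
    # two surrogates. A round trip through UTF-16 joins them into one
    # character (and raises UnicodeDecodeError on an unpaired surrogate).
    return expanded.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


def declared_encoding_is_plausible(data: bytes, declared: str) -> bool:
    """Return False if the bytes cannot be decoded as the declared encoding.

    Only meaningful for strict encodings such as UTF-8; single-byte
    encodings like ISO 8859-1 accept any byte sequence.
    """
    try:
        data.decode(declared)
        return True
    except (UnicodeDecodeError, LookupError):
        return False
```

A check like `declared_encoding_is_plausible(xml_bytes, "utf-8")` can only catch the lie, not tell you the truth: when it returns False you still have to find out which encoding the file really uses.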

OK, OK. This was just a rant and won’t give you any solutions.
I will stop here for today and turn this topic into a series of posts that give you some ideas of how we solved some of our issues in the encoding domain.

Do you speak Japanese?

Reasons for running multilingual websites

As economic globalization has led to a global market, companies need to attract people from all over the world. Finding customers online may sound easy, but it’s not enough to simply offer products and services online. Internationally successful companies need to consider many other factors, too. One of them is communicating with clients in their native language.

If you communicate in your customer’s mother tongue, they will remain on your website for twice as long and are four times more likely to purchase from you!

Of course, millions of websites are in English, and more than 20% of Internet users are English-speaking. But the dominance of the English language on the web is ebbing, as there are billions of people on earth who speak a different language.

Top 10 languages

The best way to win your client’s favor is to communicate in their native language. The figure below illustrates the top 10 languages used on the Internet. English (27%) is followed by Chinese (23%) and Spanish (8%) in the list of the 10 languages that appear most frequently online.

Did you know about Japanese?

What may surprise you is that Spanish is followed closely by Japanese: there are nearly 100 million Japanese-speaking Internet users, who represent 5.0% of all Internet users in the world, and 78% of Japanese speakers use the Internet.

Talking to Europe

The latest Eurobarometer survey provides another reason to develop multilingual websites: 9 out of 10 Internet users in the European Union said that, given a choice of languages, they would always visit a website in their own language. Only a third of them actively use a foreign language, and only 20% of Internet users in the EU would buy products from a website in a foreign language.