Falsehoods programmers believe about languages (localization)

RSS Bot@lemmy.bestiver.se · 1 month ago

Lvxferre [he/him]@mander.xyz · 1 month ago

Those points are all sensible and sane. All of them.

A few additional notes:

Sentences in all languages can be templated as easily as in English: {user} is in {location} etc.

Not even English can be templated this way; it breaks as soon as the inserted value is a noun and you need to choose between a/an or pluralise it.
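A minimal Python sketch of that failure, using a hypothetical `found_message` template: even the obvious fix for ⟨a⟩/⟨an⟩, picking by the first letter, goes wrong, because English chooses the article by sound rather than spelling. Pluralisation isn't handled at all.

```python
def naive_article(noun: str) -> str:
    # Spelling heuristic: "an" before a written vowel, "a" otherwise.
    # English actually chooses by the sound of the next word, not its first letter.
    return "an" if noun and noun[0].lower() in "aeiou" else "a"


def found_message(user: str, noun: str) -> str:
    # Hypothetical template; it also ignores pluralisation entirely.
    return f"{user} found {naive_article(noun)} {noun}"


print(found_message("Ana", "apple"))       # Ana found an apple (correct)
print(found_message("Ana", "hour"))        # Ana found a hour (should be "an hour")
print(found_message("Ana", "university"))  # Ana found an university (should be "a university")
```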

But by far the main issues are morphological case and agreement. Failing to account for both leads, in many languages, to sentences that sound as ridiculous as *him gives these book to I, except with nouns instead of just pronouns. Yes, it sounds that bad.
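To make the case/agreement problem concrete, here is a minimal sketch in German, where the definite article changes with the noun's gender and with the case its slot requires. The noun table, the templates and `render` are all hypothetical; real projects usually rely on message-format tools (ICU MessageFormat's `select`, Project Fluent's selectors) plus forms supplied by translators, not on code like this.

```python
# Hypothetical lexicon: grammatical gender of each noun.
GENDER = {"Hund": "masc", "Katze": "fem", "Buch": "neut"}

# German singular definite articles by (gender, case).
ARTICLES = {
    ("masc", "nom"): "der", ("masc", "acc"): "den", ("masc", "dat"): "dem",
    ("fem", "nom"): "die", ("fem", "acc"): "die", ("fem", "dat"): "der",
    ("neut", "nom"): "das", ("neut", "acc"): "das", ("neut", "dat"): "dem",
}

# Each hypothetical template records which case its {x} slot requires.
TEMPLATES = {
    "sleeps": ("{x} schläft.", "nom"),
    "sees": ("Ich sehe {x}.", "acc"),
    "gives_book": ("Ich gebe {x} das Buch.", "dat"),
}


def render(key: str, noun: str) -> str:
    text, case = TEMPLATES[key]
    phrase = f"{ARTICLES[(GENDER[noun], case)]} {noun}"
    sentence = text.format(x=phrase)
    # Capitalise the first letter: the article is lower-case when it opens the sentence.
    return sentence[0].upper() + sentence[1:]


for key in TEMPLATES:
    print(render(key, "Hund"))
# Der Hund schläft.
# Ich sehe den Hund.
# Ich gebe dem Hund das Buch.
```

Even this only covers the article: adjectives inflect too, plurals bring their own forms, and some masculine nouns change as well (den Menschen).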

Words that are short in English are short in other languages too.

The same applies to long words. Empty space might not be as big a deal as text overflowing its boundaries, but it’s still annoying.

For any text in any language, its translation into any other language is approximately as long as the original.

It gets worse: translations are often lengthier than the original, as the translator tries to preserve nuances through additional words. Take that into account and make the layout flexible enough to allow some verbosity.
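One cheap safeguard, sketched in Python: check translations against the source length before they reach the UI. The string table, the `too_long` helper and the 1.5 factor are hypothetical choices for illustration; the actual fix is still a layout that can grow.

```python
# Hypothetical UI strings with their German translations.
STRINGS = {
    "Save": "Speichern",
    "Settings": "Einstellungen",
    "Cancel": "Abbrechen",
}


def too_long(strings: dict[str, str], factor: float = 1.5) -> list[str]:
    """Return source strings whose translation is more than `factor` times as long.

    len() counts code points, which only approximates rendered width.
    """
    return [src for src, tgt in strings.items() if len(tgt) > factor * len(src)]


print(too_long(STRINGS))  # ['Save', 'Settings']
```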

For every lower-case character, there is exactly one (language-independent) upper-case character, and vice versa.

I see someone handled Turkish ⟨i İ ı I⟩. Or German ⟨ß⟩; technically there’s ⟨ẞ⟩, but it’s often more sensible to capitalise it as ⟨SS⟩ instead.
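Python’s built-in `str.upper()` and `str.lower()` apply Unicode’s default, language-independent case mappings, which makes the problem easy to see. The `turkish_upper` and `turkish_lower` helpers are a hypothetical minimal sketch, not a complete Turkish casing implementation.

```python
def turkish_upper(text: str) -> str:
    # Map the dotted/dotless pair explicitly, then let upper() handle the rest.
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text: str) -> str:
    return text.replace("I", "ı").replace("İ", "i").lower()


print("ß".upper())                # SS: one letter becomes two
print("ß".upper().lower())        # ss: the round trip does not restore ß
print("ẞ".lower())                # ß
print("istanbul".upper())         # ISTANBUL: wrong for Turkish
print(turkish_upper("istanbul"))  # İSTANBUL
print(turkish_lower("ISPARTA"))   # ısparta
print(len("İ".lower()))           # 2: "i" followed by U+0307 COMBINING DOT ABOVE
```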

The lower-case/upper-case distinction exists in all languages.

Related: Arabic. A letter can have up to four forms (initial, medial, final, isolated), but no lower/upper-case distinction.
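A small Python illustration using the standard `unicodedata` module, assuming ordinary Unicode text: Arabic letters are stored in their base form and carry no case, while the contextual form is normally chosen by the text-shaping engine at render time. Unicode also keeps legacy “presentation form” code points for each position, and NFKC normalisation folds them back to the base letter.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the form stored in ordinary text
# Legacy presentation-form code points for the same letter in each position.
FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

# "Ll" = lower-case letter; "Lo" = "other" letter, i.e. no case at all.
print(unicodedata.category("a"), unicodedata.category(BEH))  # Ll Lo
print(BEH.upper() == BEH.lower())  # True

for position, char in FORMS.items():
    # NFKC folds each presentation form back to the base letter; every line ends in True.
    print(position, unicodedata.normalize("NFKC", char) == BEH)
```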

All languages have words for exactly the same things as English.

The opposite is also false. For example, mindlessly translating English “be” as either Spanish “ser” or “estar” only leads to ridiculous sentences.

There is always only one correct way to spell anything.

And the correct way often depends on the situation; refer to English ⟨a⟩/⟨an⟩, Spanish ⟨y⟩/⟨e⟩ (both “and”), etc.
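The Spanish ⟨y⟩/⟨e⟩ alternation, sketched as a hypothetical `spanish_and` helper. It is a simplified spelling-based rule: “e” before a word starting with the vowel sound /i/ (written i-, í-, hi-, hí-), but “y” before hie-, where that sound is a glide (agua y hielo). Treat it as an illustration; edge cases are not covered.

```python
def spanish_and(left: str, right: str) -> str:
    """Join two words with Spanish "and", choosing between "y" and "e".

    Simplified rule: "e" before a word that starts with the vowel sound /i/,
    except before "hie-", where the /i/ is a glide and "y" is kept.
    """
    word = right.lower()
    i_sound = word.startswith(("i", "í", "hi", "hí"))
    glide = word.startswith("hie")
    conj = "e" if i_sound and not glide else "y"
    return f"{left} {conj} {right}"


print(spanish_and("padre", "hijo"))      # padre e hijo
print(spanish_and("Fernando", "Isabel"))  # Fernando e Isabel
print(spanish_and("agua", "hielo"))      # agua y hielo
print(spanish_and("pan", "vino"))        # pan y vino
```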

Numbers, when written out in digits, are formatted and punctuated the same way in all languages.

The same applies to currency. Plenty of languages out there use “400 X$” instead of “X$ 400”, even among those written in the Latin alphabet.
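A minimal sketch of why one hard-coded format can’t work. The `CONVENTIONS` table and `format_amount` are hypothetical and simplified: real locale data (CLDR) adds details this ignores, such as non-breaking spaces and negative-number patterns, and libraries such as Babel expose that data properly. Floats are used only for brevity; money should normally be handled with `decimal.Decimal`.

```python
# Hypothetical, simplified conventions: (grouping separator, decimal separator, pattern).
CONVENTIONS = {
    "en-US": (",", ".", "{sym}{num}"),
    "de-DE": (".", ",", "{num} {sym}"),
    "pt-BR": (".", ",", "{sym} {num}"),
}


def format_amount(amount: float, locale: str, symbol: str) -> str:
    group, decimal, pattern = CONVENTIONS[locale]
    # Start from Python's locale-independent "1,234.56" form, then swap the separators.
    whole, frac = f"{amount:,.2f}".split(".")
    num = whole.replace(",", group) + decimal + frac
    return pattern.format(sym=symbol, num=num)


print(format_amount(1234.56, "en-US", "$"))   # $1,234.56
print(format_amount(1234.56, "de-DE", "€"))   # 1.234,56 €
print(format_amount(1234.56, "pt-BR", "R$"))  # R$ 1.234,56
```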

Geolocation is an accurate way to predict the user’s language. // Country flags are accurate and appropriate symbols for languages. // Every country has exactly one “national” language. // Every language is the “national” language of exactly one country.

You have no idea how many bones I have to pick with nationalists (YES) who insistently disregard those four points.
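Instead of guessing from geolocation, a common starting point is the preference list the browser already sends in the HTTP `Accept-Language` header (RFC 9110), with an explicit in-app setting overriding it. Below is a minimal, hypothetical parser for that header, sketched under the assumption that tags only need ordering by their `q` weights; it does no tag validation or wildcard matching.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return language tags from an Accept-Language header, most preferred first."""
    prefs = []
    for index, item in enumerate(header.split(",")):
        tag, *params = (part.strip() for part in item.split(";"))
        if not tag:
            continue
        q = 1.0  # default weight when no q= parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:  # q=0 means "not acceptable"
            prefs.append((q, index, tag))
    # Higher q first; ties keep the order in which the client listed them.
    prefs.sort(key=lambda p: (-p[0], p[1]))
    return [tag for _, _, tag in prefs]


print(parse_accept_language("fr-CA,fr;q=0.9,en;q=0.8"))  # ['fr-CA', 'fr', 'en']
```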

Xavienth@lemmygrad.ml · 1 month ago

Canada breaks all four of the last assumptions in the article, and the one about exactly one correct spelling.

Lvxferre [he/him]@mander.xyz · 1 month ago

Most countries break those four. All those hard-coded associations that we make between country and language are at best generalisations/stereotypes, at worst superstition.