Upload
others
View
12
Download
0
Embed Size (px)
Citation preview
Internationalization and localization
J.Serrat
102759 Software Design
June 18, 2014
Index
1 Introduction
2 Graphics
3 Encodings
4 Internationalization in Java
5 Internationalization in Android
References
Books
Java internationalization. A. Deitsch, D. Czar-necki. O’Reilly, 2001.
Usability and internationalization of informationtechnology. N. Aykin editor, LEA 2005. Chap-ters 1 overview, 2 what to localize, 6 icons.
Developing international software. Dr. Interna-tional group. Microsoft Press, 2003.
References
Tutorials, slides, web pages :
Software localization. Edutech wiki entry athttp://edutechwiki.unige.ch/
Internationalization in Java. Oracle tutorial athttp://docs.oracle.com/javase/tutorial/i18n/.
Localization tutorial in Android developers site.http://developer.android.com/guide/topics/resource.
Slides on localization in Android.http://www.coreservlets.com/android-tutorial/. See alsoslides on handling screen rotations.
Slides and comments of MIT course 6.813 User interfacedesign and evaluation, lecture 21.
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Internationalization and localization
Internationalization, i18n
Process of designing and implementing a software application sothat it can be very easily adapted to various languages and regionswithout engineering changes to the programming logic.
Easily: No need to recompile. Only one set of source code, binary,help, installer, data. . . is produced to support all the places inwhich software will be installed.
Localization, l10n
Once a piece of software has been internationalized, is the processof adapting it for a specific language and region, that is, to alocale.
5 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Internationalization and localization
Are they different ?
Internationalization is the adaptation of products for potentialuse virtually everywhere.
Localization is the addition of special features for use in aspecific locale.
Internationalization is done once per product.
localization is done once for each combination of product andlocale.
6 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Internationalization and localization
Locale
Collection of features of the user’s environment that is dependenton language, country and region, and cultural conventions.
What kind of features ? Obviously language, plus
Language related: regional version, script (not only alphabet),writing direction, text sorting, plurals . . .
Cultural conventions : format of date and time, calendar,numbers (decimal separator), addresses, names, person titles,currency, physical units (weight, length, size), meaning ofcolors and gestures, icons. . .
Legal issues : country borders, names of seas and places . . .
7 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Internationalization and localization
Wikipedia entry “Date format by country”
8 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Internationalization and localization
Localization example
Java internationalization. A. Deitsch, D. Czarnecki. O’Reilly, 2001.
9 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Internationalization and localization
Localization example: MS-Word
10 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Why
People from other countries are used to the [American |Western] way and they like to be [American | Western]ized.This may seem offensive at some places. Cultural differencesdo exist and must be taken into account.
The world speaks English.Only 8% to 10% of the world’s population speaks English astheir primary language.
My users can read English, so why should I localize mysoftware ?To avoid confusion and increase usability. Today, more andmore web sites are providing content in multiple languages,and more and more software products are being designed withcultural differences in mind.
11 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Why
Target users are my country’s nationals
Google Play is available in more than 190 countries.While the exact breakdown varies by application category,in most cases more than 50 percent of applicationinstallations are downloaded from countries outside theUnited States on devices whose language is set tonon-English.
Japan and South Korea represent the two largestconsumers of applications outside the United States.
(...) Anecdotally, it has been found by manydevelopers that bad translations are considered worsethan no translation.
Professional Android 4 Application Development, Reto Meier, John
Wiley 2012.
12 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Why
Users will ask for localized software if they need it.They expect that the software works correctly in theirlanguage using their local conventions.“Internationalization isnot a feature”: it is an essential part of the software design.
13 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Why i18n is difficult ?
Can not rely on automatic translation. Meaning depends oncontext.
Can’t necessarily rely on bilingual members of your design team,either. They may be reasonably fluent in the other language, butnot sufficiently immersed in the culture or national standards.
Actually, there is an industry of software localization.
“You are not the user” is especially true in internationalization.
14 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Why i18n is difficult ?
Hotel signs
“Please leave your values at the desk.”— France“You are invited to take advantage of the chambermaid.”— Japan“Ladies are requested not to have children at the bar.”— NorwayJava internationalization. A. Deitsch, D. Czarnecki. O’Reilly, 2001.
15 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Why i18n is difficult ?
I18n is not only about translating (possibly large amounts of) textwithin a context. Also
text format, size and reading direction
images and icons
the UI layout itself . . .
Many potential locales: just main Western languages plus CJK(Chinese, Japanese, Korean) may account for 10 versions of the UI.
16 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Graphics
Generic rules for images and icons :
Acceptable and understandableworldwide: less will have to be localizedfor. Draw on imagery of medicine, maths,transportation, sports, travel and trafficsigns, consumer goods, business office.
Remove text from graphics or it will needto be translated.
Avoid political, religious and genderstereotype references.
Avoid referring to culture-specific heroes,symbols, puns, and idioms.
17 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Graphics
Take into account different colormeanings in different cultures.
Avoid images of people, body parts andgestures, they are subject to differentinterpretations even sexual or offensive.
Show the best know version of an object.
If there is no universal symbol for aconcept, mix several versions.
Make reading direction explicit orunimportant.
18 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Scripts
Script
A collection of characters for displaying written text. One scriptcan be used for several different languages
Several languages may share a common script: Spanish, French,English are written in Latin script.
Some languages require multiple scripts : Japanese requires atleast hiragana, katakana syllabaries and the kanji ideographsimported from China.
19 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Japanese
Japanese writings are a mix of
2000 Kanji ideographic characters inherited from Chinese ornewly created
Hiragana and Katakana syllabaries, each with 46 symbols forthe same sounds. Katakana is used to write foreign loanwords, incorporated into Japanese.
Latin numerals and characters
20 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Japanese
21 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Korean
Based on the Hangeul alphabet. Left to right, with spaces andsame punctuation symbols. However, syllables can be made of 2, 3or 4 characters.
22 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Others
The former scripts were described just to give an idea of theircomplexity and variety. But there are many others: Arabic,Hebrew, Indic (Devanagari, Tamil, Malayalam), Thai, Cyrillic,Greek . . .
Main features :
Types of script: alphabetic, syllabic, ideographic plus mixtures
Context-dependent Glyph shaping : position within a word,ligatures with other characters
Text direction: left to right, right to left, vertical, bidirectional
Punctuation: words can be separated or not, differentpunctuation symbols
23 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Scripts
Character
The smallest components of a writing system or script that havesemantic value. Independent of the specific style or shape that acharacter might have once rendered or displayed.Ideographic character: represents a word (a meaning) or a syllable,usually in Asian languages.
Glyph
The actual shape of a character, represented as a greyscale orbinary image, scaled graphics vector . . .
For example, a Roman “a” a and a Computer Modern boldface“a” a are two different glyphs representing the same underlyingcharacter.
24 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Scripts
Fonts
Character sets are ordered collections of abstract references tocharacters and do not specify their exact visual appearance. Fontsare collections of glyphs of a specific style and appearance for acertain script or character set.
Examples: Helvetica, Times Roman, Courier . . .
25 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Scripts
Face
A certain style determining shape details and relative charactersizes for rendering several fonts.
Examples for Latin fonts: italics, bold, sans serif, small caps . . .
But also for other scripts
26 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Encoding
Encoding or Character set
A method or system of assigning numeric values to characters.
For example, ASCII, Latin-1 or ISO 8859-1, Unicode, UTF-8,Windows 1252.
One same script can be encoded differently. Some encodings arerestricted to a unique or few scripts.
27 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
ASCII
The code set that most programmers know is the AmericanStandard Code for Information Interchange (ASCII).
7-bit code set containing :
control characters (NUL, CR, etc.)
upper and lowercase English letters (A-Z and a-z)
European digits 0 . . . 9
punctuation and other symbols # $ % & \ { } ( ) [ ] + - * /? @ : ; , . ‘ ’ etc.
Not even sufficient for English (no £). No characters withdiacritics (accents) to support Spanish, French, Catalan etc.
28 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
ISO 8859
The International Standards Organization (ISO) created a series of8-bit code sets to handle a much larger selection of languages byextending ASCII filling in the free slots 1 byte offers.
The ISO 8859 Character Set includes language-specific groupings:
ISO 8859-1 or Latin 1, Western Europe (French, German, Spanish, . . . )default for the Web
ISO 8859-2 or Latin 2, Central/Eastern Europe (Hungarian, Polish,Romanian, Croatian, ...)
ISO 8859-5 Cyrillic
ISO 8859-6 Arabic
ISO 8859-7 Greek
ISO 8859-8 Hebrew
ISO 8859-9 or Latin 5, Turkish
ISO 8859-11 Thai
ISO 8859-13 or Latin 7, Baltic Rim (Estonian, Latvian, Lithuanian)
ISO 8859-14 or Latin 8, Celtic (Gaelic and Welsh) 29 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Unicode
Fixed-width, 16-bit character encoding scheme.
Characters from all the world’s major scripts are uniformlysupported. It is possible to combine Arabic, French, Japanese, andRussian characters all in the same string.
Unicode provides 65,536 unique slots. Actually, the Standardhas a back door that cannibalizes 2048 slots to make room for anadditional 1,048,576 characters. Together (. . . ) allows a total of1,112,064 characters.
The total number of individual markings that humans havemade since the beginning of time is estimated at roughly 500,000,so we should not need to upgrade to a new encoding schemeanytime soon.
Java internationalization. A. Deitsch, D. Czarnecki. O’Reilly, 2001.
30 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
UTF-8
To send and receive Unicode data from systems (web browsers,email clients) that can only deal with 7 or 8-bit data, you need touse encoding scheme different from the default 2 bytes/character(Universal Character Set, UCS-2).
The Unicode Standard supports several encoding schemes knownas Unicode transformation formats (UTF). UTF-8 has become themost common encoding scheme for Unicode data.
31 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
UTF-8
UTF-8
Unicode encoding scheme of 8-bit variable length encoding: one,two, or three bytes could represent a Unicode code point.
32 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Resource bundles
Java has several classes to makeinternationalization andlocalization possible and easy.A first example :
Catalan
Spanish
Korean
Armenian
33 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Resource bundles
Common solution: separate what varies from what stays the same.In this case,
the text in the labels is moved to an special text file, theresource bundle
each locale supported by the application will have its ownresource bundle, named after language and region, likeMessagesBundle fr CA.properties
the specific locale to use will be selected at run-time
34 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Resource bundles
1 Design the completeinterface with theactual labels in yourlanguage. Here withSWT +www.cloudgarden.com’sJigloo plugin.
35 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Resource bundles
1 The label texts go tothe source code.
36 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Resource bundles
2 Create a file with pairs (key, text)MessagesBundle.properties. Itwill be the default labels text.
Here key is the dialog class name.
Note: non Ascii characters arerepresented by their hexadecimalUnicode.
37 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Resource bundles
3 Add variableResourceBundle.
38 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Resource bundles
4 Replace constantstrings of labels bykey in the bundle.
Now the text of thedialog has beeninternationalized.
Eclipse can do this foryou: Source →Externalize strings.
39 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Resource bundles
5 Write a new MessageBundle filefor each supported locale inUTF-8 encoding. Usestandard suffix<language> <region>.
MessagesBundle ko KR.properties
40 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Resource bundles
6 Convert no Asciicharacters to Unicodecodes \uxxxx withJDK programnative2Ascii :n−→ \u00f1.
MessagesBundle ko KR.properties
41 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Resource bundles
7 Add Locale variablesand select one.
42 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Date and Time
Date and time are formed byseveral elements: year, month,day of month, day of week, hour,minute, second . . . plus delimitersymbols (\, −, space, dot etc.)Depending on the locale theformat and values of elementsand separators change.In addition, each locale typicallyhas several acceptable severalacceptable formats for one samedate or time.
43 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Date and Time
The DateFormat class has several static factory methods thatallow instantiation of a formatter object of time, date or time anddate for the default locale or a given locale.
DateFormat df1 = DateFormat.getDateInstance(DateFormat.FULL,
Locale.FRANCE);
GregorianCalendar cal = new GregorianCalendar(1965,
Calendar.JUNE, 16);
System.out.println(df1.format(cal.getTime()));
Locale.setDefault(new Locale("de","DE"));
DateFormat df2 = DateFormat.getTimeInstance(DateFormat.MEDIUM);
System.out.println(df2.format(new Date()));
mercredi 16 juin 196514:03 Uhr MEZ
44 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Date versus Calendar
Date is a simpler class and is mainly there for backwardcompatibility reasons. If you need to set particular datesor do date arithmetic, use a Calendar. Calendars alsohandle localization.
Java Date versus Calendar, Stack Overflow
When manipulating dates in Java, do not use methods ofjava.util.Date, such as Date.getMonth() orDate.getYear(), to build a String. These functionshave been deprecated since JDK 1.1 a because requireyou to handle all date formats for different localesyourself.
45 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Numbers
To display a number, it must be converted to a String andformatted according to local conventions. Locale-specific issues:
Group and decimalseparators
Numeric shapes(Arabic, European,Indic, Thai, Chinese,etc.)
Negative numbers
Currency symbols andplacement
Percentage symbolsand placement
123 456 fr123.456 de123,456 en US345 987,246 fr345.987,246 de345,987.246 en US
46 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Numbers: groups and decimals
Integer quantity = new Integer(123456);
Double amount = new Double(345987.246);
Locale loc = new Locale("de","AT"); // Austria
NumberFormat numberFormatter;
numberFormatter = NumberFormat.getNumberInstance(loc);
quantityOut = numberFormatter.format(quantity);
amountOut = numberFormatter.format(amount);
System.out.println(quantityOut );
System.out.println(amountOut);
47 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Numbers: currency
Double money = new Double(9876543.21);
// Locale loc = new Locale("fr","FR",
"EURO");
// Locale loc = new Locale("de","DE",
"EURO");
Locale loc = new Locale("en","US");
Currency cur =
Currency.getInstance(loc);
NumberFormat cf =
NumberFormat.getCurrencyInstance(loc);
System.out.println(
loc.getDisplayName() + ", " +
cur.getDisplayName() + ": " +
cf.format(money));
French (France), Euro:9 876 543,21 e
German (Germany), Euro:9.876.543,21 e
English (United States),US Dollar:$9,876,543.21
48 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Compound messages
If all the interface text strings are static, simply externalizing theminto a ResourceBundle is sufficient.
But there may be pieces of variable data that we need to insertinto the middle of text before being shown :
Flight number 4106 will begin boarding from gate 2 at 10:05 p.m.Last login: Fri Sep 24 09:51:37 from calvinYou are visitor #9,238 to this web site since January 1, 1999Volume Serial Number is 1F6E-16F1
49 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Compound messages
Problem 1: order
int numMistakes;
Locale loc = new Locale("en","US");
ResourceBundle Messages = new ResourceBundle.getBundle(loc);
System.out.println(Messages.getString("Spelling.ThereWere")
+ numMistakes
+ Messages.getString("Spelling.Mistakes")
+ Messages.getString("Spelling.File")
+ filename);
There were 3 spelling mistakes in file report.doc.
50 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Compound messages
Problem 1: order
int numMistakes;
Locale loc = new Locale("de","AT"); // Austria
ResourceBundle Messages = new ResourceBundle.getBundle(loc);
System.out.println(Messages.getString("Spelling.ThereWere")
+ numMistakes
+ Messages.getString("Spelling.Mistakes")
+ Messages.getString("Spelling.File")
+ filename);
Datei report.doc enthalt 3 Rechtschreibfehler.
51 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Compound messages
Problem 2: concordance
Locale loc = new Locale("en","US");
ResourceBundle Messages = new ResourceBundle.getBundle(loc);
System.out.println(
Messages.getString("Device.What")
+ Messages.getString("Device.State"));
The printer is enabled. L’imprimante est activee.The modem is enabled. Le modem est active.The network is enabled. Le reseau est active.
52 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Compound messages
The MessageFormat class provides a way of inserting argumentsinto a string, independent of the order in which the argumentsappear to the user.
# MessageBundle_en_US.properties
planet = Mars
template = At {2,time,short} on {2,date,long}, \
we detected {1,number,integer} spaceships on \
the planet {0}.
# MessageBundle_de_DE.properties
planet = Mars
template = Um {2,time,short} am {2,date,long} \
haben wir {1,number,integer} \
Raumschiffe auf dem Planeten {0} entdeckt.
53 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Compound messages
At 9:15 on January 10, 2013, we detected 7 spaceships on planet Mars.
Locale loc = new Locale("en","US");
ResourceBundle Messages =
ResourceBundle.getBundle("MessageBundle", loc);
Object[] messageArguments = {
Messages.getString("planet"),
new Integer(7),
new Date()
};
MessageFormat formatter = new MessageFormat("");
formatter.setLocale(currentLocale);
formatter.applyPattern(Messages.getString("template"));
System.output.println(formatter.format(messageArguments));
54 / 55
Introduction Graphics Encodings Internationalization in Java Internationalization in Android
Internationalization in Android
Slides on localization in Android fromhttp://www.coreservlets.com/android-tutorial/.
Also, slides on Handling Screen Rotations and Other App Restartsat the same place.
55 / 55