Sophia Antipolis, September 2006
Multilinguality, localization and internationalization
Miruna Bădescu
Finsiel Romania
How it all started…
Until recently, most computers used character sets with a maximum of 256 characters (ANSI).
The first 128 (ASCII) contain:
- numbers
- letters a-z and A-Z
- punctuation marks
The second set of 128 varies by region:
- In the English-speaking world it contains more punctuation marks, currency symbols (e.g. £) and accented letters (á, é, ñ, ç, ô)
- In places like Egypt, Greece and Russia it contains characters taken from the corresponding alphabet: Arabic, Greek, Cyrillic
Code, encoding
Character code: a sequence of bits that a computer uses to represent a character
Encoding: the rule describing how a sequence of bytes is transformed into characters
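As a rough illustration of these two definitions, the Python sketch below prints the character code of one character and the bytes that two encodings produce for it. The sample character and the two encodings are chosen here for illustration; they do not come from the slides.

```python
# Character code vs. encoding, illustrated with the letter "é".
ch = "\u00e9"  # é

code = ord(ch)                 # the number assigned to the character
latin1 = ch.encode("latin-1")  # Latin-1 stores it in one byte
utf8 = ch.encode("utf-8")      # UTF-8 stores it in two bytes

print(hex(code))  # 0xe9
print(latin1)     # b'\xe9'
print(utf8)       # b'\xc3\xa9'

# Decoding applies the same rule in the other direction.
roundtrip = utf8.decode("utf-8")
print(roundtrip == ch)  # True
```

The character code stays the same in both cases; only the byte representation changes with the encoding.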
Problem
These encoding systems also conflict with one another. Two encodings:
- can use the same number for two different characters
- can use different numbers for the same character
Data can become incomprehensible when transferred from one place to another
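A small sketch of this conflict, assuming three legacy 8-bit encodings picked for illustration (Latin-1, ISO 8859-7 for Greek, ISO 8859-5 for Cyrillic): the same byte decodes to a different character in each, and text written in one encoding becomes unreadable when read with another.

```python
# One byte, three legacy encodings (encoding choice is illustrative).
ENCODINGS = ("latin-1", "iso8859_7", "iso8859_5")
raw = b"\xe9"

decoded = {name: raw.decode(name) for name in ENCODINGS}
for name, text in decoded.items():
    print(name, text)

# Text saved as UTF-8 but read as Latin-1 comes out garbled.
garbled = "\u00e9".encode("utf-8").decode("latin-1")
print(garbled)  # Ã©
```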
Solution
Moving to a system that assigns a unique number to each character in each language of the world
The Unicode standard provides a unique number for every character:
- no matter what the platform
- no matter what the program
- no matter what the language
Unicode (as defined by the Unicode Consortium) has become a universal standard and is kept synchronized with ISO/IEC 10646, which describes the 'Universal Multiple-Octet Coded Character Set' (UCS)
Unicode
The Unicode repertoire can be encoded in more than one way: UTF-8, UTF-16, UTF-32
UTF-8 encodes:
- ASCII characters on 1 byte
- other characters on 2 to 4 bytes (the original design allowed up to 6 bytes; RFC 3629 restricted it to 4)
Incorporating it into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets
Enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering
Allows data to be transported through many different systems without corruption.
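To make the difference between the three encoding forms concrete, here is a minimal Python sketch that counts the bytes each one needs for a few sample characters. The sample characters are chosen here for illustration, and the little-endian variants are used only so that no byte-order mark is added to the count. Under the current definition of UTF-8 (RFC 3629) a character takes 1 to 4 bytes.

```python
# Bytes needed per character in UTF-8, UTF-16 and UTF-32.
FORMS = ("utf-8", "utf-16-le", "utf-32-le")
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, emoji


def byte_lengths(ch):
    """Return the encoded length of one character in each encoding form."""
    return {form: len(ch.encode(form)) for form in FORMS}


for ch in samples:
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

ASCII text costs one byte per character in UTF-8, which is one reason it is a common choice for web pages.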
I18n
Internationalization (I18n): modification of an application so that it can handle multiple languages, countries, etc.:
- Display content (web pages, files) in the end user’s language
- Display messages around the site in the user’s language (e.g. “Home”, “Search”, error messages)
- Input characters in the end user’s language
- Print out the correct characters
- Handle dates, numbers and sorting of words using the rules of that language
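Displaying site messages in the user's language usually comes down to a lookup in a message catalog. The sketch below is one possible shape for it, not the mechanism used by the product in these slides: the catalog contents, the `translate` function and the fallback to English are all assumptions made for the example.

```python
# Hypothetical message catalog: language code -> message id -> text.
MESSAGES = {
    "en": {"home": "Home", "search": "Search", "not_found": "Page not found"},
    "fr": {"home": "Accueil", "search": "Rechercher"},
}
DEFAULT_LANG = "en"  # assumed fallback language


def translate(msg_id, lang):
    """Return the message in `lang`, falling back to the default language.

    If the message is unknown everywhere, the id itself is returned so the
    page still renders something.
    """
    catalog = MESSAGES.get(lang, {})
    if msg_id in catalog:
        return catalog[msg_id]
    return MESSAGES[DEFAULT_LANG].get(msg_id, msg_id)


print(translate("home", "fr"))       # Accueil
print(translate("not_found", "fr"))  # Page not found (not translated yet)
```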
L10n
Localization (L10n): taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language)
Means to change the language on a Web site:
- User selection
- Detecting the browser settings
- Automatically, based on the user’s profile
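The three means of choosing a language can be combined into one decision. The sketch below assumes a priority order (explicit selection, then profile, then browser settings) and a hypothetical list of supported languages; neither is stated in the slides. Browser settings are read from the HTTP `Accept-Language` header, whose `q` values express the user's preference order.

```python
SUPPORTED = ("en", "fr", "ro")  # hypothetical site languages
DEFAULT_LANG = "en"


def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            weighted.append((q, tag))
    # sort() is stable, so equal weights keep their header order
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for _, tag in weighted]


def choose_language(user_choice=None, profile_lang=None, accept_language=""):
    """Pick the first supported language among the available hints."""
    candidates = [user_choice, profile_lang] + parse_accept_language(accept_language)
    for tag in candidates:
        if not tag:
            continue
        primary = tag.lower().split("-")[0]  # "fr-CA" -> "fr"
        if primary in SUPPORTED:
            return primary
    return DEFAULT_LANG


print(choose_language(accept_language="de, ro;q=0.8, fr;q=0.9"))  # fr
```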
Translation issues:
- Identifying untranslated or outdated translations of terms and phrases
- Different roles for translators and content managers
- Offering an interface for content translation
Example of XLIFF translation file coming from the translation service
XLIFF: XML Localization Interchange File Format
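The example file shown on the original slide is not reproduced here. As a stand-in, the sketch below parses a small hand-written XLIFF document with Python's standard library and lists the units that still lack a translation. The sample content is invented for illustration, and it assumes the XLIFF 1.2 namespace; a file in another XLIFF version would need a different namespace URI.

```python
import xml.etree.ElementTree as ET

# Invented sample, assuming XLIFF 1.2.
SAMPLE_XLIFF = """<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="messages" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="home">
        <source>Home</source>
        <target>Accueil</target>
      </trans-unit>
      <trans-unit id="search">
        <source>Search</source>
        <target/>
      </trans-unit>
    </body>
  </file>
</xliff>
"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_translations(xliff_text):
    """Map each trans-unit id to its target text (None if untranslated)."""
    root = ET.fromstring(xliff_text.encode("utf-8"))
    translations = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        text = target.text if target is not None else None
        translations[unit.get("id")] = text or None
    return translations


translations = read_translations(SAMPLE_XLIFF)
untranslated = [uid for uid, text in translations.items() if text is None]
print(translations)  # {'home': 'Accueil', 'search': None}
print(untranslated)  # ['search']
```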
Sorting in the same language
Strings must be sorted according to that language’s sorting rules
Complex characters, ignorable characters and exceptional words must be considered
Normally done in two steps:
- primary sorting
  - uppercase and lowercase characters are equivalent
  - diacritical marks are ignored
  - ignorable characters are not considered
- secondary sorting
  - differences between uppercase and lowercase count
  - characters with diacritical marks are ranked individually
  - ignorable characters influence the sorting
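The two steps can be approximated with a two-part sort key. This is a simplified sketch, not a full collation algorithm such as the one implemented by ICU: the set of ignorable characters is an assumption, and the secondary step simply compares code points, which is cruder than real language rules.

```python
import unicodedata

IGNORABLE = "-' "  # assumed set of ignorable characters


def primary_key(word):
    """Case-insensitive key without diacritics or ignorable characters."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = [
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in IGNORABLE
    ]
    return "".join(kept).casefold()


def secondary_key(word):
    """Tie-breaker: case, diacritics and ignorable characters now count."""
    return unicodedata.normalize("NFC", word)


def sort_words(words):
    # Tuples compare element by element: primary first, then secondary.
    return sorted(words, key=lambda w: (primary_key(w), secondary_key(w)))


words = ["côte", "Cote", "coop", "coté", "co-op", "cote"]
print(sort_words(words))
# ['co-op', 'coop', 'Cote', 'cote', 'coté', 'côte']
```

At the primary level "côte", "Cote", "coté" and "cote" are all equal, so their final order is decided only by the secondary key.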
Sorting in different languages
Approaches
1. All strings in the same language should be sorted according to that language’s rules
   Sorting is also governed by order among languages or among groups of languages
   e.g. English, German, French = Roman group
2. Sort using the sorting rules that are associated with the language chosen by the end user or the site language
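The second approach means the same list can come out in a different order depending on the selected language. The sketch below hand-codes two simplified rule sets to show the effect: German dictionary order, where umlauts sort like their base letters, and Swedish, where å, ä, ö are separate letters placed after z. The function names and the `SORT_KEYS` table are hypothetical; a real system would normally rely on a collation library rather than tables like these.

```python
import unicodedata


def german_key(word):
    """Simplified German dictionary order: ä, ö, ü sort like a, o, u."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word):
    """Simplified Swedish order: å, ä, ö are letters that follow z."""
    composed = unicodedata.normalize("NFC", word.casefold())
    # Characters outside the alphabet sort after all letters.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in composed]


# Hypothetical registry: language code -> sort key function.
SORT_KEYS = {"de": german_key, "sv": swedish_key}


def sort_for_language(words, lang):
    return sorted(words, key=SORT_KEYS[lang])


words = ["zon", "äng", "arm"]
print(sort_for_language(words, "de"))  # ['äng', 'arm', 'zon']
print(sort_for_language(words, "sv"))  # ['arm', 'zon', 'äng']
```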
Features
All pages are encoded in UTF-8: all characters of the world are supported
Default language set at startup: English
What aspects are multilingual?
- Graphical user interface: translation from the administrative area (one-by-one, .po, XLIFF)
- Content: individual translation for each item on edit
- Glossaries and thesauri: translation from Zope’s Management Interface
- Syndication (RDF channels): depends on the selected language
- Searches: multiple selection by the user