Software Internationalization and Localization: Basic Concepts
Doug Kunz


Overview of basic concepts involved in producing software that supports users in multiple languages or countries.




Outline
- Introduction
- Localization examples
- Design and development impact


Why does internationalization matter?

- Web’s global reach: potential global user base
- Support foreign-language speakers within our borders
- Ever-increasing numbers of international business transactions


Definitions
- Internationalization (i18n): the practice of writing software that can easily be extended to support users from multiple cultural and linguistic backgrounds
- Localization (L10n): the process of taking internationalized software and producing a version tailored to users from a particular cultural and linguistic background


Language Tags – IETF BCP 47
- A “language tag” or “locale” describes a common language and culture shared by a group of users, often at a national level.
- Documented by IETF “Best Current Practice” 47: http://www.ietf.org/rfc/bcp/bcp47.txt
  - Refers to underlying RFCs (these can change over time, but the BCP number does not)
- Typically represented by an identifier describing a combination of:
  - 2-3 letter language code (ISO 639, parts 1 or 2)
  - 2 letter country code (ISO 3166)
  - Optional extensions for dialect, writing system

Examples:
  en      English          zh              Chinese (macrolanguage)
  en-US   US English       zh-cmn          Mandarin Chinese
  en-GB   UK English       zh-cmn-TW       Mandarin Chinese as spoken in Taiwan
  es-US   US Spanish       zh-cmn-Hans-CN  Mandarin Chinese written with the Simplified system, as used in China

ISO = International Organization for Standardization; IETF = Internet Engineering Task Force
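The subtag structure described above can be sketched in a few lines of Python. This is a simplified illustration, not a full BCP 47 parser: the `LanguageTag` class and `parse_language_tag` function are hypothetical names, and variants, extensions and private-use subtags are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageTag:
    language: str                   # 2-3 letter language code
    extlang: Optional[str] = None   # 3-letter extended language, e.g. "cmn"
    script: Optional[str] = None    # 4-letter writing system, e.g. "Hans"
    region: Optional[str] = None    # 2-letter country code (or 3 digits)


def _letters(s: str, n: int) -> bool:
    """True if s is exactly n ASCII letters."""
    return len(s) == n and s.isascii() and s.isalpha()


def parse_language_tag(tag: str) -> LanguageTag:
    """Split a tag such as 'zh-cmn-Hans-CN' into its leading subtags."""
    parts = tag.split("-")
    language = parts[0]
    if not (_letters(language, 2) or _letters(language, 3)):
        raise ValueError(f"unsupported language subtag: {language!r}")
    result = LanguageTag(language.lower())
    rest = parts[1:]
    if rest and _letters(rest[0], 3):
        result.extlang = rest.pop(0).lower()
    if rest and _letters(rest[0], 4):
        result.script = rest.pop(0).title()
    if rest and (_letters(rest[0], 2) or (len(rest[0]) == 3 and rest[0].isdigit())):
        result.region = rest.pop(0).upper()
    return result  # any remaining subtags are ignored in this sketch
```

For example, `parse_language_tag("zh-cmn-Hans-CN")` separates the language, dialect, writing system and country parts of the last example in the table.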


What to localize (a non-exhaustive listing)


Writing System
- Direction of scan (Left-to-Right vs. tfeL-ot-thgiR)
- Character set (various alphabets, syllabaries and logographies)


Display captions
- Regional variations within a language
  - Spelling variations, e.g. US “color” vs. UK “colour”
  - Terminology variations (“lift” vs. “elevator”, “Español” vs. “Castellano”)
- Language variations (“Login” vs. “Conéctese” vs. “Anmelden” vs. “Connessione”)


Display layouts

US English

Caption 1 nnnnn Caption 2 nnnnn

German

BigGermanTranslationOfCaption1 nnnnn

BigGermanTranslationOfCaption2 nnnnn

Arabic

nnnnn 2noitpaC nnnnn 1noitpaC


Print layouts
- US Letter paper (8 ½ by 11 inches) vs. A4 paper (210×297 mm)


Units of Measure
- “British Engineering” (Imperial) system – U.S.A., Liberia and Myanmar
  - Feet/inches/miles
  - Pounds, stone or slugs
  - Fahrenheit
- SI (Système International) – rest of world
  - Meters/centimeters/kilometers
  - Kilograms
  - Celsius or Kelvin
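As a rough sketch of what converting between the two systems involves, the Python functions below convert SI values for display in the other system. The function names are hypothetical; the conversion factors are the standard exact definitions (1 mile = 1.609344 km, 1 pound = 0.45359237 kg).

```python
KM_PER_MILE = 1.609344      # exact by definition
KG_PER_POUND = 0.45359237   # exact by definition


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def km_to_miles(km: float) -> float:
    """Convert a distance from kilometers to miles."""
    return km / KM_PER_MILE


def kg_to_pounds(kg: float) -> float:
    """Convert a mass from kilograms to pounds."""
    return kg / KG_PER_POUND
```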


Formats: Numbers
- Decimal separator: character varies
  - 1,000 (US) “one thousand”
  - 1,000 (most of Europe) “one”
- Readability delimiters: placement and character vary
  - 1,000,000 (US)
  - 10,00,000 (“10 lakh” India/Pakistan/Sri Lanka)
  - 1.000.000 (Germany)
  - 1 000 000 (France)
  - 100,0000 (China)
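The grouping variations above differ in only two ways: the separator character and the size of each digit group. A minimal Python sketch, assuming a hypothetical `group_digits` helper whose parameters are chosen for illustration:

```python
def group_digits(n: int, sep: str = ",", first: int = 3, rest: int = 3) -> str:
    """Insert readability delimiters into an integer.

    first: size of the rightmost digit group
    rest:  size of every group to the left of it
    """
    digits = str(abs(n))
    groups = [digits[-first:]]
    digits = digits[:-first]
    while digits:
        groups.append(digits[-rest:])
        digits = digits[:-rest]
    sign = "-" if n < 0 else ""
    return sign + sep.join(reversed(groups))


print(group_digits(1000000))                     # 1,000,000 (US)
print(group_digits(1000000, rest=2))             # 10,00,000 (India)
print(group_digits(1000000, sep="."))            # 1.000.000 (Germany)
print(group_digits(1000000, sep=" "))            # 1 000 000 (France)
print(group_digits(1000000, first=4, rest=4))    # 100,0000 (China)
```

In production code these rules would normally come from locale data rather than hand-written parameters.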


Formats: Contact Info
- Phone numbers
  - (415) 644-3912 within US
  - +1 415 6443912 outside US
- Postal codes (a few examples)
  - US Zip Codes: 99999 or 99999-9999
  - Canadian Postal Codes: A9A 9A9
  - UK Postal Codes (generally): A9 9AA, A99 9AA, A9A 9AA, AA9 9AA, AA99 9AA, AA9A 9AA
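The postal code shapes listed above translate directly into regular expressions. The sketch below encodes only those shapes (A = letter, 9 = digit); it does not check whether a code actually exists, and the `POSTCODE_PATTERNS` table and `matches_postcode_shape` function are hypothetical names.

```python
import re

# One pattern per country, mirroring the shapes listed on the slide.
POSTCODE_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "CA": re.compile(r"[A-Z]\d[A-Z] \d[A-Z]\d"),
    # Covers A9, A99, A9A, AA9, AA99 and AA9A, each followed by " 9AA".
    "GB": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}"),
}


def matches_postcode_shape(code: str, country: str) -> bool:
    """Return True if code has the general shape used in the given country."""
    pattern = POSTCODE_PATTERNS.get(country)
    if pattern is None:
        return True  # unknown country: accept rather than reject valid input
    return pattern.fullmatch(code.strip().upper()) is not None
```

Accepting codes for unlisted countries is a deliberate choice here: rejecting input the software does not understand is a common localization bug.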


Formats: Contact Info
Address layout examples

  Line1            Line1            Line1
  Line2 etc.       Line2 etc.       Line2 etc.
  City PostCode    PostCode City    City Region PostCode
  Country          Country          Country
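One way to handle the three layouts is a template per layout for the locality line. The Python sketch below is illustrative only: the layout keys and the `format_address` function are hypothetical, and a real system would map each country to one of the layouts.

```python
# Template for the city/postcode line of each layout shown above.
LOCALITY_TEMPLATES = {
    "city_postcode": "{city} {postcode}",
    "postcode_city": "{postcode} {city}",
    "city_region_postcode": "{city} {region} {postcode}",
}


def format_address(lines, city, postcode, country, layout, region=""):
    """Assemble a postal address using the locality order of the given layout."""
    locality = LOCALITY_TEMPLATES[layout].format(
        city=city, region=region, postcode=postcode
    )
    return "\n".join([*lines, locality, country])


print(format_address(["Line1", "Line2"], "City", "PostCode", "Country",
                     layout="city_region_postcode", region="Region"))
```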


Formats: Dates and Times
- Dates
  - Commonly, formats differ within a calendar system: does 01/06/2006 mean “January 6, 2006” or “June 1, 2006”?
  - Less commonly, dates differ across calendar systems:
    - 22 May 2006 (Gregorian)
    - 9 May 2006 (Julian)
    - 24 Iyyar 5766, before sunset (Hebrew)
    - 23 or 24 Rabi`-ul-Akhir 1427, before sunset (Islamic)
- Times: 5:00pm vs. 17.00
- Time zones: 22 May 2006 12:00pm (UTC+14) = 21 May 2006 10:00am (UTC-12)
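Both the date ambiguity and the time zone example can be checked with Python's standard `datetime` module. This is a small sketch using fixed UTC offsets; the variable names are chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# The same string parsed with month-first and day-first conventions.
text = "01/06/2006"
month_first = datetime.strptime(text, "%m/%d/%Y")  # 6 January 2006
day_first = datetime.strptime(text, "%d/%m/%Y")    # 1 June 2006

# Noon on 22 May 2006 at UTC+14, expressed at UTC-12.
utc_plus_14 = timezone(timedelta(hours=14))
utc_minus_12 = timezone(timedelta(hours=-12))
noon = datetime(2006, 5, 22, 12, 0, tzinfo=utc_plus_14)
same_instant = noon.astimezone(utc_minus_12)

print(month_first.date(), day_first.date())
print(noon.isoformat(), "=", same_instant.isoformat())
```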


Design/Development Impacts and Techniques


Know your user
Collect information in the user profile, such as:
- “Preferred language”: store as a language tag containing the least possible amount of information (subtags) needed to localize the experience for that particular user (e.g. “en-US” is better than “en-Latn-US”)
- Time zone
- Preferred units of measure
- Preferred currency


User Interface vs. Data Locales
- User interface locale
  - The captioning, formats and layout needed to present data to the current user
- Data locale
  - Locale to which a business object belongs; may be distinct from the current user’s locale. Example: a purchase order has comment text written in French, although the current user is English-speaking
  - Typically the locale of the user who created the object


Resource Extraction
- A “resource” is a screen artifact (text, image, etc.) which contains localized information. For example, a field caption written in US English would be a resource.
- Place text captions in a separate file for translation
- Images
  - Where possible, implement buttons as text with a background image, to avoid producing locale-specific images
  - When text *must* be included in an image:
    - “ALT” text should be placed in a separate file, and should match image text (if any) for ease of translation
    - Image “path” should be locale-specific, e.g. medem.com/images/en_us/next_button.gif
- Sometimes screen shots help translation services by providing context
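Once captions live outside the code, the application needs a way to look them up by language tag. The Python sketch below shows one common approach, falling back from the most specific tag to a default. The `CAPTIONS` table, `DEFAULT_LOCALE` and function names are hypothetical; in practice the table would be loaded from the separate translation files rather than written inline.

```python
# Hypothetical caption catalog, keyed by language tag then caption key.
CAPTIONS = {
    "en": {"login": "Login", "color": "Color"},
    "en-GB": {"color": "Colour"},
    "de": {"login": "Anmelden"},
}
DEFAULT_LOCALE = "en"


def fallback_chain(tag: str) -> list:
    """'en-GB' -> ['en-GB', 'en']; the default locale is tried last."""
    parts = tag.split("-")
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if DEFAULT_LOCALE not in chain:
        chain.append(DEFAULT_LOCALE)
    return chain


def caption(key: str, tag: str) -> str:
    """Return the most specific caption available for the given tag."""
    for locale in fallback_chain(tag):
        text = CAPTIONS.get(locale, {}).get(key)
        if text is not None:
            return text
    raise KeyError(key)
```

With this arrangement a regional file such as `en-GB` only needs to contain the captions that differ from the base language.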


Layouts
- Technique 1: Produce a general layout that will work for most languages
  - Where needed, make language-specific “override layouts”
- Technique 2: “Least common denominator” layouts that will always work
  - Example: restrict print layouts to 210mm by 279mm, which works on both US Letter and A4
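The 210mm by 279mm figure is the overlap of the two paper sizes: the narrower width (A4) and the shorter height (US Letter). A short Python check, with hypothetical names:

```python
MM_PER_INCH = 25.4

# (width, height) in millimeters
PAPER_SIZES_MM = {
    "US Letter": (8.5 * MM_PER_INCH, 11 * MM_PER_INCH),  # 215.9 x 279.4
    "A4": (210.0, 297.0),
}


def common_print_area(sizes):
    """Largest rectangle that fits on every paper size in the mapping."""
    width = min(w for w, _ in sizes.values())
    height = min(h for _, h in sizes.values())
    return width, height


width, height = common_print_area(PAPER_SIZES_MM)
print(f"{width:.1f} mm x {height:.1f} mm")
```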


“Store globally, display locally”
- Pick a reasonable standard format for storage in your database (e.g. ISO 8601 “2006-05-24T18:15:00Z”)
- Translate for display based on the user’s locale (5/24/06 11:15am Pacific Daylight Time)
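A minimal Python sketch of the idea: the stored value stays in UTC, and conversion happens only at display time. Pacific Daylight Time is modeled here as a fixed UTC-7 offset to keep the example self-contained; a real application would more likely use a time zone database such as the standard `zoneinfo` module. The `display_us_style` function is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

STORED = "2006-05-24T18:15:00Z"               # ISO 8601, UTC, as stored
PDT = timezone(timedelta(hours=-7), "PDT")    # assumption: fixed offset


def display_us_style(stored: str, tz: timezone) -> str:
    """Convert a stored UTC timestamp to a US-style local date and time."""
    # Replace the 'Z' suffix so older Python versions can parse it.
    utc_time = datetime.fromisoformat(stored.replace("Z", "+00:00"))
    local = utc_time.astimezone(tz)
    hour12 = local.hour % 12 or 12
    suffix = "am" if local.hour < 12 else "pm"
    return (f"{local.month}/{local.day}/{local.year % 100:02d} "
            f"{hour12}:{local.minute:02d}{suffix}")


print(display_us_style(STORED, PDT))
```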


Flexible storage design
- Explicit rate/unit storage
  - Bad: Column “Height”
  - Bad: Column “Height_inches”
  - Good: Column “Height” and Column “Height_Units”
  - Good: Column “Price” and Column “Currency”
- Globally appropriate data type
  - Bad: Column “ZipCode” Integer(5)
  - Good: Column “PostCode” Varchar2 (10)
- Globally appropriate name
  - Bad: Column “State”
  - Better: Column “Region”
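The sketch below illustrates these column choices using Python's built-in `sqlite3` module. The `contact` table, its column names and the sample rows are assumptions made for the example, and SQLite's `TEXT` type stands in for the `Varchar2` type named above.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table that follows the 'good' column choices."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE contact (
            name         TEXT NOT NULL,
            region       TEXT,   -- not "State"
            post_code    TEXT,   -- text, so "02134" and "SW1A 1AA" both fit
            height       REAL,
            height_units TEXT    -- unit stored explicitly beside the value
        )
        """
    )
    conn.executemany(
        "INSERT INTO contact VALUES (?, ?, ?, ?, ?)",
        [
            ("A", "MA", "02134", 70.0, "in"),
            ("B", None, "SW1A 1AA", 178.0, "cm"),
        ],
    )
    return conn


conn = build_demo_db()
rows = conn.execute(
    "SELECT post_code, height, height_units FROM contact ORDER BY name"
).fetchall()
print(rows)
```

A text post code keeps leading zeros and letters intact, which a 5-digit integer column cannot do.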


Appropriate character encoding
- US-ASCII (American Standard Code for Information Interchange)
  - 7 bits / character
  - English only: diacritics not supported (ü, è, ç, etc.)
- ISO-8859-1 (“Latin 1”)
  - 1 byte (8 bits) / character
  - Superset of US-ASCII
  - Western European languages
  - Default encoding for “text/*” media types in HTTP/1.1 (RFC 2616)
  - Basis of the set of characters allowed in HTML 3.2 documents
- UTF-8
  - 1 to 4 bytes / character (1 to 3 bytes for characters in the Basic Multilingual Plane)
  - Backward compatible with US-ASCII; not byte-compatible with ISO-8859-1, whose characters above 0x7F take 2 bytes in UTF-8
  - Encodes Unicode (all character sets, including extinct languages)
  - Unicode is the basis of the set of characters allowed in HTML 4.0 documents
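The differences between the three encodings are easy to observe with Python's built-in string encoding. This is a small sketch; the helper names are hypothetical.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes needed to store one character in UTF-8."""
    return len(ch.encode("utf-8"))


def encodable(text: str, encoding: str) -> bool:
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for ch in ["A", "ü", "中", "😀"]:
    print(repr(ch), utf8_length(ch), "byte(s) in UTF-8")

print(encodable("ü", "ascii"))     # US-ASCII has no diacritics
print(encodable("ü", "latin-1"))   # ISO-8859-1 covers Western European letters
print(encodable("中", "latin-1"))  # but not Chinese characters
```

Note that “ü” is the single byte 0xFC in ISO-8859-1 but the two bytes 0xC3 0xBC in UTF-8, so text must be decoded with the encoding it was written in.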


For More Information
- International Telecommunication Union (ITU): http://www.itu.int/
- Universal Postal Union (UPU): http://www.upu.int/
- International Organization for Standardization (ISO): http://www.iso.org/
- UTF-8: http://en.wikipedia.org/wiki/UTF-8