Internationalizing JavaScript Applications

Norbert Lindenberg

© Norbert Lindenberg 2012. All rights reserved.

ECMAScript

• Language Specification

• Developed by Ecma TC 39

• Language syntax and semantics

• Core API: Object, String, Array, RegExp, ...

• Edition 5.1 is current

• Edition 6 expected December 2013

ECMAScript

• Internationalization API Specification

• Developed by Ecma TC 39 + experts

• Collation, number, date & time formatting

• Started fall 2010

• Specification stable

• Implementations and test suite in progress

• Approval expected December 2012

JavaScript Environments

• Web browsers: with DOM, XHR

• Servers: Node

• Platforms: Firefox OS, Windows 8-style UI (formerly Metro), PhoneGap

• Libraries: jQuery, Dojo, YUI, GWT, and many more

Collation (Sorting)

• Old: String.prototype.localeCompare

  • Only string argument

• New: Intl.Collator

  • locales

  • options

• Fixed: String.prototype.localeCompare

  • With locales and options arguments
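
As an illustration of the old and new entry points, here is a minimal sketch. It assumes an engine that implements Intl.Collator with German locale data; the word list is invented for the example.

```javascript
// Sample data, invented for this sketch.
const words = ["Zebra", "Äpfel", "Apfel", "zebra"];

// Plain sort(): UTF-16 code unit order, so "Äpfel" lands after every ASCII word.
const byCodeUnit = [...words].sort();

// New: a reusable collator created from a locale.
const collator = new Intl.Collator("de");
const byLocale = [...words].sort(collator.compare);

// Fixed: localeCompare accepts the same locales (and options) arguments.
const viaLocaleCompare = [...words].sort((a, b) => a.localeCompare(b, "de"));

console.log(byCodeUnit); // ["Apfel", "Zebra", "zebra", "Äpfel"]
console.log(byLocale);
console.log(viaLocaleCompare);
```

Creating one collator and reusing its compare function is usually the better choice when sorting long lists, since the locale negotiation happens only once.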

Locales

• BCP 47 language tags

  • Language, script, country codes

  • “es”, “en-AU”, “zh-Hans-CN”

• Unicode locale extension

  • “de-u-co-phonebk”

• Preference lists

  • [“mr”, “hi”, “en-IN”]

Locale Negotiation

• BCP 47 Lookup

  • [“es-GT”, “es-MX”] → “es-GT”, “es”, “es-MX”

• Best fit

  • implementation defined

  • [“es-GT”, “es-MX”] → “es-GT”, “es-MX”, “es”

• Unicode extension handled separately
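
Negotiation can be observed from script. The sketch below assumes an engine that implements the Internationalization API; which tags come back depends on the locale data the engine ships, so no output is shown.

```javascript
// Preference list from the slide, most preferred first.
const requested = ["es-GT", "es-MX"];

// supportedLocalesOf returns the requested tags the engine can serve,
// using the chosen matching algorithm.
const viaLookup = Intl.Collator.supportedLocalesOf(requested, { localeMatcher: "lookup" });
const viaBestFit = Intl.Collator.supportedLocalesOf(requested, { localeMatcher: "best fit" });

// The constructor negotiates once; resolvedOptions() reports the locale it picked.
const resolved = new Intl.Collator(requested, { localeMatcher: "lookup" }).resolvedOptions().locale;

console.log(viaLookup, viaBestFit, resolved);
```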

Collator Extensions

• co: collation – phonebook, pinyin, ...

• kf: case first – upper, lower

• kn: numeric sorting

• kk: use normalization
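
Extension keys travel inside the language tag itself. A small sketch, assuming an engine with German collation data; the sample strings are invented, and whether a given key is honored can vary by implementation.

```javascript
// "co" picks a collation variant; phonebook order is a German tailoring.
const phonebook = new Intl.Collator("de-u-co-phonebk");
console.log(phonebook.resolvedOptions().collation);

// "kn" turns on numeric sorting, so runs of digits compare by value.
const numeric = new Intl.Collator("en-u-kn-true");
const sorted = ["item10", "item9", "item1"].sort(numeric.compare);
console.log(sorted); // ["item1", "item9", "item10"]

// "kf" asks for uppercase or lowercase letters to sort first.
const upperFirst = new Intl.Collator("en-u-kf-upper");
console.log(["a", "A"].sort(upperFirst.compare));
```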

Collator Options

• localeMatcher: lookup, best fit

• usage: sort, search

• sensitivity: base, accent, case, variant

• ignorePunctuation

• numeric, normalization, caseFirst
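
A few of these options in action, as a sketch that assumes an engine implementing Intl.Collator; the strings being compared are made up.

```javascript
// sensitivity "base": only base letters count, so case and accents are ignored.
const base = new Intl.Collator("en", { sensitivity: "base" });
console.log(base.compare("resume", "Résumé")); // 0

// sensitivity "accent": accents count, case does not.
const accent = new Intl.Collator("en", { sensitivity: "accent" });
console.log(accent.compare("a", "A")); // 0
console.log(accent.compare("a", "á") !== 0); // true

// numeric: the options counterpart of the "kn" extension key.
const numeric = new Intl.Collator("en", { numeric: true });
console.log(numeric.compare("file2", "file10") < 0); // true

// usage "search" with ignorePunctuation, for matching rather than ordering.
const search = new Intl.Collator("en", {
  usage: "search",
  sensitivity: "base",
  ignorePunctuation: true,
});
console.log(search.compare("co-op", "coop"));
```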

Non-ECMAScript

• Nothing good found (some for Latin only)

• Collation is hard

• Knowledge of full Unicode character set

• Big tables

Number Formatting

• Old: Number.prototype.toLocaleString

  • No arguments

• New: Intl.NumberFormat

  • locales

  • options

• Fixed: Number.prototype.toLocaleString

  • With locales and options arguments
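
A minimal sketch of both entry points, assuming an engine that ships English and German locale data; the number is arbitrary.

```javascript
const amount = 1234567.891;

// New: a reusable formatter per locale.
const english = new Intl.NumberFormat("en-US").format(amount);
const german = new Intl.NumberFormat("de-DE").format(amount);

// Fixed: toLocaleString takes the same locales and options arguments.
const rounded = amount.toLocaleString("en-US", { maximumFractionDigits: 1 });

console.log(english); // "1,234,567.891"
console.log(german);  // "1.234.567,891"
console.log(rounded); // "1,234,567.9"
```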

NumberFormat Extensions

• nu: numbering system
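
For example, the "nu" key can request Thai digits. This sketch assumes an engine with Thai locale data.

```javascript
// "nu-thai" asks for the Thai numbering system.
const thai = new Intl.NumberFormat("th-TH-u-nu-thai");

console.log(thai.resolvedOptions().numberingSystem); // "thai"
console.log(thai.format(9)); // "๙"
```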

NumberFormat Options

• localeMatcher: lookup, best fit

• style: decimal, currency, percent

• currency: ISO 4217 currency code

• currencyDisplay: symbol, code, name

• minimum/maximum digits: integer, fraction, and significant digits

• useGrouping
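
The options can be combined as in the following sketch, which assumes an engine implementing Intl.NumberFormat with English locale data; the amounts are arbitrary.

```javascript
// style "currency" requires a currency code; the symbol is the default display.
const euros = new Intl.NumberFormat("en-US", { style: "currency", currency: "EUR" });
const euroText = euros.format(1234.5);

// currencyDisplay "code" shows the ISO 4217 code instead of the symbol.
const euroCode = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "EUR",
  currencyDisplay: "code",
}).format(1234.5);

// style "percent" multiplies by 100 and rounds to whole percents by default.
const percentText = new Intl.NumberFormat("en-US", { style: "percent" }).format(0.256);

// Digit settings and grouping.
const fixedText = new Intl.NumberFormat("en-US", {
  minimumFractionDigits: 2,
  maximumFractionDigits: 2,
  useGrouping: false,
}).format(1234.5);

console.log(euroText);    // "€1,234.50"
console.log(euroCode);
console.log(percentText); // "26%"
console.log(fixedText);   // "1234.50"
```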

Non-ECMAScript

                      ¤   %   ๙   #   ,   ⚑
Globalize             +   +   -   +   -   250+
Dojo                  +   +   -   +   -   30+
Closure               +   +   +   +   +   300+
Windows 8-style UI    +   +   +   +   +   100s
iLib                  +   +   -   +   -   10+

¤: currency formatting. %: percent formatting. ๙: numbering systems. #: digit settings. ,: grouping separator option. ⚑: supported locales.

Date and Time Formatting

• Old: Date.prototype.toLocaleString, toLocaleDateString, toLocaleTimeString

  • No arguments

• New: Intl.DateTimeFormat

  • locales

  • options

• Fixed: Date.prototype.toLocaleString, toLocaleDateString, toLocaleTimeString

  • With locales and options arguments
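
A minimal sketch of the new and fixed entry points, assuming an engine with English and German locale data. The date is arbitrary, and the time zone is pinned to UTC so the result does not depend on the machine running it.

```javascript
const date = new Date(Date.UTC(2012, 11, 20)); // 20 December 2012, 00:00 UTC

// New: a reusable formatter per locale.
const us = new Intl.DateTimeFormat("en-US", { timeZone: "UTC" }).format(date);
const de = new Intl.DateTimeFormat("de-DE", { timeZone: "UTC" }).format(date);

// Fixed: the toLocale...String methods take the same arguments.
const viaDate = date.toLocaleDateString("en-US", { timeZone: "UTC" });

console.log(us);      // "12/20/2012"
console.log(de);      // "20.12.2012"
console.log(viaDate); // "12/20/2012"
```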

DateTimeFormat Extensions

• ca: calendar

• nu: numbering system
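
Both keys go into the language tag. The sketch below assumes an engine with Japanese calendar and Arabic-Indic digit data; the formatted strings vary between implementations, so only the resolved settings are annotated.

```javascript
const date = new Date(Date.UTC(2012, 11, 20));

// "ca-japanese" requests the Japanese imperial calendar.
const japanese = new Intl.DateTimeFormat("ja-JP-u-ca-japanese", { timeZone: "UTC" });
console.log(japanese.resolvedOptions().calendar); // "japanese"
console.log(japanese.format(date));

// "nu-arab" requests Arabic-Indic digits.
const arabicDigits = new Intl.DateTimeFormat("en-u-nu-arab", { timeZone: "UTC" });
console.log(arabicDigits.resolvedOptions().numberingSystem); // "arab"
console.log(arabicDigits.format(date));
```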

DateTimeFormat Options

• localeMatcher: lookup, best fit

• timeZone: UTC

• hour12

• weekday, era, year, month, day, hour, minute, second, timeZoneName: components

• formatMatcher: basic, best fit
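
The component options describe which fields to show, not a pattern; the locale decides order and punctuation. A sketch assuming an engine with English locale data and an arbitrary date:

```javascript
const date = new Date(Date.UTC(2012, 11, 20, 14, 30)); // 20 December 2012, 14:30 UTC

// Request components; the locale supplies order and separators.
const longDate = new Intl.DateTimeFormat("en-US", {
  weekday: "long",
  year: "numeric",
  month: "long",
  day: "numeric",
  timeZone: "UTC",
}).format(date);

// hour12: false forces a 24-hour clock.
const time24 = new Intl.DateTimeFormat("en-US", {
  hour: "2-digit",
  minute: "2-digit",
  hour12: false,
  timeZone: "UTC",
}).format(date);

console.log(longDate); // "Thursday, December 20, 2012"
console.log(time24);   // "14:30"
```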

Non-ECMAScript

                      ca   tz   ๙   ⚑
Globalize             5+   +    -   250+
Dojo                  4    -    -   30+
Closure               +    +    +   300+
Windows 8-style UI    ?    -    ?   ?
iLib                  3    +    -   10+
YUI                   -    -    -   50+

ca: calendars. tz: time zones. ๙: numbering systems. ⚑: supported locales.

Message Construction

• Substitution

• {user} went to {city}.

• {user}さんは{city}へ行きました。
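
Named placeholders let translators reorder the arguments freely. A minimal sketch of such substitution; the helper name substitute and the sample values are hypothetical, not part of any library.

```javascript
// Hypothetical helper: replaces each {name} with values[name];
// unknown placeholders are left untouched.
function substitute(pattern, values) {
  return pattern.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match
  );
}

const english = substitute("{user} went to {city}.", { user: "Ana", city: "Lisbon" });
const japanese = substitute("{user}さんは{city}へ行きました。", { user: "Ana", city: "Lisbon" });

console.log(english);  // "Ana went to Lisbon."
console.log(japanese); // "AnaさんはLisbonへ行きました。"
```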

Message Construction

• Plurals

• {user} est allé à {city}.

• {user1} et {user2} sont allés à {city}.

• 1-6 forms depending on language

• {number, plural {one {...} few {...} many {...}}}
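
Current engines expose the plural categories through Intl.PluralRules, which was added to the Internationalization API after these slides were written. The sketch below assumes an engine with English and Polish plural data; the message table is invented.

```javascript
// English has two categories for cardinal numbers.
const english = new Intl.PluralRules("en");
console.log(english.select(1)); // "one"
console.log(english.select(2)); // "other"

// Polish distinguishes more forms.
const polish = new Intl.PluralRules("pl");
console.log([1, 2, 5].map((n) => polish.select(n))); // ["one", "few", "many"]

// Picking a message by category (hypothetical message table).
const fileMessages = { one: "{n} file", other: "{n} files" };
function describeFiles(n) {
  return fileMessages[english.select(n)].replace("{n}", String(n));
}
console.log(describeFiles(1)); // "1 file"
console.log(describeFiles(3)); // "3 files"
```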

Message Construction

• Gender

• {user} est allé à {city}.

• {user} est allée à {city}.

• 1-4 forms depending on language

• {gender, select {female {...} male {...} unknown {...}}}

Message Construction

{gender, select {
    female {num, plural {
        one {{user1} est allée à {city}.}
        other {{user1} et {user2} sont allées à {city}.}}}
    male {num, plural {
        one {{user1} est allé à {city}.}
        other {{user1} et {user2} sont allés à {city}.}}}
}}
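
The same two-level selection can be sketched in plain JavaScript without a message format parser. The helper formatVisit and the sample names are hypothetical, and the plural choice is simplified to the two categories used above.

```javascript
// Message table mirroring the nested select/plural structure.
const messages = {
  female: {
    one: "{user1} est allée à {city}.",
    other: "{user1} et {user2} sont allées à {city}.",
  },
  male: {
    one: "{user1} est allé à {city}.",
    other: "{user1} et {user2} sont allés à {city}.",
  },
};

// Hypothetical helper: select by gender, then by plural category, then substitute.
function formatVisit(gender, num, values) {
  const byGender = messages[gender];
  if (!byGender) throw new RangeError("unsupported gender: " + gender);
  // Simplification: exactly one person is "one", anything else is "other".
  const pattern = byGender[num === 1 ? "one" : "other"];
  return pattern.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match
  );
}

const single = formatVisit("female", 1, { user1: "Marie", city: "Lyon" });
const pair = formatVisit("male", 2, { user1: "Paul", user2: "Luc", city: "Nice" });

console.log(single); // "Marie est allée à Lyon."
console.log(pair);   // "Paul et Luc sont allés à Nice."
```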

Message Construction

• Google has MessageFormat for Closure environment

• Alex Sexton provided standalone version

Occupy Wall Street. By @tanlines.

Supplementary Characters

• Characters above U+FFFF

• Emoji, rare CJK, ancient scripts, musical symbols, ...

• 2 units in UTF-16
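
The two code units are visible from script. A short sketch using U+1F600 GRINNING FACE as the sample character; the helper toCodePoint is written for this example.

```javascript
// U+1F600 written as its two UTF-16 code units (a surrogate pair).
const grinning = "\uD83D\uDE00";

console.log(grinning.length); // 2
console.log(grinning.charCodeAt(0).toString(16)); // "d83d"
console.log(grinning.charCodeAt(1).toString(16)); // "de00"

// Combine a high and a low surrogate back into the code point.
function toCodePoint(high, low) {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}
console.log(toCodePoint(0xd83d, 0xde00).toString(16)); // "1f600"
```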

Today: UCS-2 or UTF-16?

UCS-2:

• Regular expressions

• String comparison

• Case conversion

UTF-16:

• Source text conversion

• URI handling

• DOM, text input, text rendering, XMLHttpRequest, libraries, apps
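
The split shows up when the same supplementary character passes through different parts of the language. A sketch, again using U+1F600 as the sample:

```javascript
const grinning = "\uD83D\uDE00"; // U+1F600 as a surrogate pair

// UCS-2 behavior: without extra flags, "." matches a single code unit.
const oneDot = /^.$/.test(grinning);
const twoDots = /^..$/.test(grinning);
console.log(oneDot);  // false
console.log(twoDots); // true

// UTF-16 behavior: URI encoding joins the pair into one four-byte UTF-8 sequence.
const encoded = encodeURIComponent(grinning);
console.log(encoded); // "%F0%9F%98%80"

// A lone surrogate is rejected by URI encoding.
let rejected = false;
try {
  encodeURIComponent("\uD83D");
} catch (e) {
  rejected = e instanceof URIError;
}
console.log(rejected); // true
```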

ECMAScript 6: UTF-16

• New Unicode mode in regular expressions

• Case conversion for full Unicode

• Full Unicode in identifiers

• String accessors for code points

• But: no change to low-level string comparison
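
A sketch of these features as they were eventually standardized, assuming an engine that implements ECMAScript 6; the sample characters are arbitrary.

```javascript
const grinning = "\u{1F600}"; // code point escape, new in ES6

// Unicode mode in regular expressions: "." matches a whole code point.
const matchesOne = /^.$/u.test(grinning);
console.log(matchesOne); // true

// String accessors for code points.
const hex = grinning.codePointAt(0).toString(16);
console.log(hex); // "1f600"
console.log(String.fromCodePoint(0x1f600) === grinning); // true
console.log(Array.from(grinning).length); // 1 (iteration is by code point)

// Case conversion for supplementary characters (Deseret small to capital long I).
const upper = "\u{10428}".toUpperCase();
console.log(upper === "\u{10400}"); // true

// Unchanged: < and > still compare code units, so the emoji (first unit 0xD83D)
// sorts before U+FF5E although its code point is larger.
const emojiFirst = grinning < "\uFF5E";
console.log(emojiFirst); // true
```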

Rendering

• Emoji on Mac/iOS are rendered with color font

• On Mac, only Safari supports this font

• Not Firefox, Chrome, Opera

• Fonts for other supplementary characters supported in all modern browsers

Regular Expressions

• RegExp in ES5 has little Unicode support

• No support for Unicode character properties

• No support for supplementary characters
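
Within ES5, supplementary characters have to be spelled out as surrogate pairs by hand. The patterns below are a sketch of that workaround, not taken from any particular library; the Emoticons block (U+1F600 to U+1F64F) serves as the example range.

```javascript
// One code point: either a surrogate pair or a single non-surrogate code unit.
const codePoint = /[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\uD800-\uDFFF]/g;

const text = "a\uD83D\uDE00b"; // "a", U+1F600, "b"
const points = text.match(codePoint);
console.log(points.length);         // 3
console.log(text.split("").length); // 4

// A character class for U+1F600..U+1F64F: high surrogate D83D, low DE00..DE4F.
const emoticon = /\uD83D[\uDE00-\uDE4F]/;
console.log(emoticon.test(text));  // true
console.log(emoticon.test("abc")); // false
```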

Regular Expressions

• CSet (inimino): Character classes with supplementary characters

• XRegExp (Steven Levithan and Mathias Bynens): Unicode categories and properties with supplementary characters

Unicode Normalization

• Makes strings equal that users perceive as equal (more or less)

• ä = a ¨

• ự = ự

• 김 = ㄱ ㅣ ㅁ
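
The equivalences can be checked with String.prototype.normalize, which was standardized in ECMAScript 6, after these slides. The sketch assumes an engine that provides it. Note that canonical decomposition of a Hangul syllable yields conjoining jamo (U+1100 and following), not the compatibility jamo typically used for display.

```javascript
// ä: precomposed versus base letter plus combining diaeresis.
const precomposed = "\u00E4";
const decomposed = "a\u0308";
console.log(precomposed === decomposed); // false
console.log(precomposed.normalize("NFC") === decomposed.normalize("NFC")); // true

// ự (U+1EF1) decomposes into u + combining horn + combining dot below.
console.log("\u1EF1".normalize("NFD").length); // 3

// 김 (U+AE40) decomposes into three conjoining jamo.
const jamo = Array.from("\uAE40".normalize("NFD"), (c) => c.codePointAt(0).toString(16));
console.log(jamo); // ["1100", "1175", "11b7"]
```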

Unicode Normalization

• ECMAScript “assumes” normalization happens where needed

• Reality: applications have to do it

• Libraries available, but not up to date:

• unorm (Matsuza)

• Richard Ishida’s normalizer

北京大学.中国

Internationalized Domain Names

• Unicode at user interface

• ASCII under the hood

• 北京大学.中国 = xn--1lq90ic7fzpc.xn--fiqs8s

• Main steps:

• normalization (as discussed)

• Punycode (Mathias Bynens maintains the latest library, punycode.js)
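
Where the WHATWG URL class is available (current browsers and Node.js), the conversion can be observed without a separate library. This is a sketch under that assumption, not the approach described on the slide.

```javascript
// The URL parser converts an internationalized host name to its ASCII form.
const url = new URL("http://北京大学.中国/");
const asciiHost = url.hostname;

console.log(asciiHost); // ASCII form: every label starts with "xn--"
console.log(asciiHost.split(".").every((label) => label.startsWith("xn--"))); // true
```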

Summary

• ECMAScript Internationalization API provides core functionality

• Please review and provide feedback

• http://norbertlindenberg.com/2012/06/ecmascript-internationalization-api/

• Libraries provide more internationalization support than you may think