DEV-10: Supporting Multiple Languages In Your Application
Salvador Viñals, Consultant Product Manager
© 2006 Progress Software Corporation
Agenda
International support with OpenEdge® 10
OpenEdge internationalization update
• GB18030
• Sorting and Collations
• Unicode Normalization
• Default word-break tables and double-byte
For more information, go to…
Summary
This presentation includes annotations with additional, complementary information
Code-Pages and Unicode
Code-pages
• Many code-pages
• Max 255 characters each (single-byte code-pages)
• Each with a regionally limited repertoire of characters
Unicode
• Uni code = One
• Uni code = Universal
• Virtually all the world's characters
• Distinguishes characters by script, but not by language
UTF-8, UTF-16, UTF-32
• Unicode encoding forms (8-, 16-, and 32-bit code units)
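To make the difference between a code-page and the Unicode encoding forms concrete, here is a minimal sketch in Python (not ABL; it only uses Python's built-in codecs). The sample character and the variable names are illustrative assumptions, not part of the presentation.

```python
# One character, encoded in the three Unicode encoding forms
# and in a single-byte code-page (Windows-1252).
text = "è"  # U+00E8, Latin small letter e with grave

utf8 = text.encode("utf-8")        # variable length, 8-bit code units
utf16 = text.encode("utf-16-be")   # 16-bit code units, big-endian, no BOM
utf32 = text.encode("utf-32-be")   # 32-bit code units, big-endian, no BOM
cp1252 = text.encode("cp1252")     # one byte in the Western European code-page

for label, data in [("UTF-8", utf8), ("UTF-16", utf16),
                    ("UTF-32", utf32), ("1252", cp1252)]:
    print(f"{label:7s} {data.hex()}")

# A code-page has a limited repertoire: a Chinese character
# cannot be represented in Windows-1252 at all.
try:
    "中".encode("cp1252")
    fits_in_1252 = True
except UnicodeEncodeError:
    fits_in_1252 = False
```

The same character occupies one byte in 1252, two bytes in UTF-8 and UTF-16, and four bytes in UTF-32, while the Chinese character can only be stored in a Unicode encoding (or in a Chinese code-page).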
OpenEdge Products
OpenEdge 10 products support UTF-8 (Unicode)
• Database (Personal, Workgroup, Enterprise)
• Application Servers [AppServer, WebSpeed] (Basic, Enterprise)
• GUI Clients (Client Networking, WebClient) and Batch Client
Exceptions
• Character Client and DataServers: Use code-pages instead
Code-pages and Unicode can interoperate
International readiness
Configurations
[Diagram: code-page support by component]
• OpenEdge Application Servers (AppServer™, WebSpeed®): UTF-8 or code-pages
• OE Batch Client: UTF-8 or code-pages
• OpenEdge RDBMS: UTF-8 or code-pages
• Oracle, MS SQL, ODBC: UTF-8
• OpenEdge DataServers: code-pages
• Web Service Client
• GUI client: UTF-8 or code-pages
• Character client: code-pages
• SQL Clients: UTF-8
Translation Products
Translation Manager (TranMan)
Visual Translator (VisTran)
Products life cycle
• Progress V9 – Functionally Stable
• OpenEdge 10 – Active
TranMan and VisTran run on Windows only, but they can be used to manage translations of both ChUI and GUI applications.
Support for GB18030 Code Page
Chinese code page
Required for all new software sold in mainland China
Why is this code page unique?
• Does not fit into lead-byte / trail-byte model
• It has 1-, 2-, and 4-byte characters
• Cannot tell from lead-byte if there are 2 or 4 bytes in the character
Supported by making conversions of the GB18030 code page to and from UTF-8
• Requires cpinternal to be UTF-8
– No cpinternal for GB18030
• Reading and writing a file in GB18030
– Converts to/from UTF-8
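The properties listed above can be observed with any GB18030 codec. The following is a small sketch using Python's built-in `gb18030` codec (an illustration only, not how OpenEdge implements the conversion); the sample characters and names are assumptions chosen for the example.

```python
# GB18030 characters are 1, 2, or 4 bytes long.
samples = ["A", "中", "\U00020000"]  # ASCII, common Han, supplementary-plane Han
lengths = {ch: len(ch.encode("gb18030")) for ch in samples}

# The lead byte alone does not reveal the length: 0x81 starts both a
# 2-byte and a 4-byte character. The second byte decides (0x30-0x39
# means a 4-byte sequence).
two_byte_char = bytes([0x81, 0x40]).decode("gb18030")
four_byte_char = bytes([0x81, 0x30, 0x81, 0x30]).decode("gb18030")

# Round trip in the spirit of the slide: GB18030 on disk, UTF-8 in memory.
file_bytes = "中文".encode("gb18030")               # what a GB18030 file contains
in_memory = file_bytes.decode("gb18030")            # converted on read
utf8_bytes = in_memory.encode("utf-8")              # internal UTF-8 form
written_back = utf8_bytes.decode("utf-8").encode("gb18030")  # converted on write
```

Because the conversion is lossless in both directions, working internally in UTF-8 and converting only at the file boundary loses no data.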
Linguistic Sorting
Unicode sorting for UTF-8
Language-sensitive collations
Tailor app to expectations of locale
• Language
• Location (country, region, etc.)
Easy to use
• Functions just like any other collation for ABL, OpenEdge Database, or SQL users
• Prior to 10.0B, UTF-8 collation was a binary sort
The goal …
Catalan, català (ca, cat)
-- Catalan alphabet:
--
-- Aa (Àà), Bb, Cc (Çç), Dd,
-- Ee (Éé, Èè), Ff, Gg, Hh,
-- Ii (Íí, Ïï), Jj, [Kk], Ll, Mm, Nn,
-- Oo (Óó, Òò), Pp, Qq, Rr, Ss, Tt,
-- Uu (Úú, Üü), Vv, [Ww], Xx, [Yy], Zz
--
-- L·L is ordered as L+L.
--
& LL << l·l <<< L·l <<< L·L

Finnish, suomi (fi, fin)
-- Finnish alphabet:
--
-- Aa, Bb, [Cc], Dd, Ee, Ff, Gg, Hh,
-- Ii, Jj, Kk, Ll, Mm, Nn, Oo, Pp,
-- [Qq], Rr, Ss (Šš), Tt, Uu, Vv [Ww],
-- [Xx], Yy [Üü], Zz (Žž), [Åå], Ää
-- [Ææ], Öö [Øø]
--
& V << w <<< W
& Y << ü <<< Ü
& Z < å <<< Å < ä <<< Ä << æ <<< Æ < ö <<< Ö << ø <<< Ø

French, français (fr, fra)
-- French alphabet:
--
-- Aa (Àà, Ââ), (Ææ), Bb, Cc (Çç), Dd,
-- Ee (Éé, Èè, Êê, Ëë), Ff, Gg, Hh,
-- Ii (Îî, Ïï), Jj, [Kk], Ll, Mm,
-- Nn (Ññ), Oo (Ôô), (Œœ), Pp, Qq, Rr,
-- Ss, Tt, Uu (Ùù, Ûû), Vv, [Ww], Xx,
-- Yy (Ÿÿ), Zz
--
-- The ligatures Æ and Œ are ordered
-- as A+E and O+E respectively.
--
[accentorder backward]

Unicode 4.1 Default Collation Order, ISO/IEC 14651
-- Unicode default latin alphabet:
--
-- Aa, Bb, Cc, Dd, Ee, Əə, Ff, Gg, Hh,
-- Ii, ı, Jj, Kk, Ll, Mm, Nn, Ŋŋ, Oo,
-- Pp, Qq, ĸ, Rr, Ss, Tt, Ŧŧ, Uu, Vv,
-- Ww, Xx, Yy, Zz, Þþ
--
-- Unicode default greek alphabet:
--
-- Αα, Ββ, Γγ, Δδ, Εε, Ζζ, Ηη, Θθ, Ιι,
-- Κκ, Λλ, Μμ, Νν, Ξξ, Οο, Ππ, Ρρ, Σσς,
-- Ττ, Υυ, Φφ, Χχ, Ψψ, Ωω
--
-- Unicode default cyrillic alphabet:
--
-- Аа, Әә, Бб, Вв, Гг, Ғғ, Дд, Ђђ, Ѓѓ,
-- Ее, Єє, Жж, Җҗ, Зз, Ѕѕ, Ии, Іі, Її,
-- Йй, Јј, Кк, Ққ, Ҝҝ, Лл, Љљ, Мм, Нн,
-- Ңң, Њњ, Оо, Өө, Пп, Рр, Сс, Тт, Ћћ,
-- Ќќ, Уу, Ўў, Үү, Ұұ, Фф, Хх, Ҳҳ, Һһ,
-- Цц, Чч, Ҹҹ, Џџ, Шш, Щщ, Ъъ, Ыы, Ьь,
-- Ээ, Юю, Яя
--
Some collation examples Latin alphabet
OpenEdge Database meta-schema
• Table _DB-collate
– Already used for single-byte sort weights
– New functionality used for summary information
• Table _Collation
– Added in 10.0A in preparation
– Can hold any amount of collation data
Internals
ABL Usage
• Reference collation by name
– For example "ICU-fr" for French
Specify using
• -cpcoll <table name>
– Identifies the collation table to use with the code page in memory at session startup
– <table name> is the collation table in convmap.cp or the name of the ICU collation
• ABL statements
– COMPARE
– COLLATE
COMPARE and COLLATE new strengths supported
• 10.0A strengths: CASE-INSENSITIVE, CASE-SENSITIVE, CAPS and RAW
Added strengths
• PRIMARY
• SECONDARY = CASE-INSENSITIVE
• TERTIARY = CASE-SENSITIVE
• QUATERNARY
/* French collation */
DISPLAY "ICU-fr = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-fr")).

/* Spanish collation */
DISPLAY "ICU-es = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-es")).

ICU-fr = yes
ICU-es = no
Output of above statements
Sort order depends on selected collation
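Why the two collations disagree can be sketched with a toy model in Python. This is not the ICU algorithm and not how OpenEdge evaluates COMPARE; it is a simplified two-level comparison, assuming that base letters are compared first and accents second, and that French compares accents from the end of the word (the "accentorder backward" rule) while Spanish compares them from the start. The function names and the use of code points as accent weights are assumptions made for the illustration.

```python
import unicodedata

def sort_key(word, backward_accents):
    """Toy two-level key: base letters first, then accents."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    primary = []    # base letters, compared first
    secondary = []  # combining marks attached to each base letter
    for ch in decomposed:
        if unicodedata.combining(ch) and secondary:
            secondary[-1] += (ord(ch),)
        else:
            primary.append(ch)
            secondary.append(())
    if backward_accents:
        # French rule: the LAST accent difference in the word decides.
        secondary.reverse()
    return (primary, secondary)

def less_than(a, b, backward_accents):
    return sort_key(a, backward_accents) < sort_key(b, backward_accents)

french_like = less_than("côte", "coté", backward_accents=True)
spanish_like = less_than("côte", "coté", backward_accents=False)
print("backward accents:", french_like)   # True
print("forward accents: ", spanish_like)  # False
```

Both words share the base letters c-o-t-e, so only the accents decide. Scanning from the end, the unaccented final "e" of "côte" sorts before "é"; scanning from the start, the unaccented "o" of "coté" sorts before "ô". That matches the yes/no results shown for ICU-fr and ICU-es.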
OpenEdge uses collations for
• The -cpcoll startup parameter
• The database collation
• The collation of a database CLOB column
• An argument to the COMPARE function or COLLATE option of the BY phrase
Once a collation is specified for the database in the _Collation table, it cannot be modified
Once the collation is written to the _Collation table, it is the only collation with that name that can be used by that database
Back up the database before using an ICU collation (strongly recommended)
Rules
The following examples assume
• UTF-8 database with "basic" collation
• Names: beet, carrot, çedilla, entry, école, trust, zoom

FOR EACH words WHERE name < "t":
    DISPLAY name.
END.

Output result:
beet
carrot
entry
Example 1 of 4
FOR EACH words WHERE name >= "t":
    DISPLAY name.
END.

Output result:
trust
zoom
école
çedilla
Example 2 of 4
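Examples 1 and 2 can be reproduced outside OpenEdge with a short Python sketch, under the assumption that the "basic" collation on a UTF-8 database compares these names the way a plain byte comparison of their UTF-8 encodings would. The helper name is hypothetical.

```python
names = ["beet", "carrot", "çedilla", "entry", "école", "trust", "zoom"]

def utf8_less(a, b):
    """Binary comparison of the UTF-8 bytes, no linguistic rules."""
    return a.encode("utf-8") < b.encode("utf-8")

# Equivalent of: WHERE name < "t"
below_t = [n for n in names if utf8_less(n, "t")]

# Equivalent of: WHERE name >= "t"
t_or_above = [n for n in names if not utf8_less(n, "t")]

print(below_t)
print(t_or_above)
```

In UTF-8, "ç" and "é" start with the byte 0xC3, which is greater than every ASCII letter, so "çedilla" and "école" land after "zoom" instead of next to "c" and "e". This is the behavior the ICU collations are meant to correct.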
FOR EACH words WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en"):
    DISPLAY name.
END.

Output result:
beet
carrot
entry
école
çedilla

Before, without COMPARE:
beet
carrot
entry

Example 3 of 4
FOR EACH words WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en")
    BY COLLATE(name, "case-insensitive", "ICU-en"):
    DISPLAY name.
END.

Output result:
beet
carrot
çedilla
école
entry

Before, without BY COLLATE:
beet
carrot
entry
école
çedilla

Example 4 of 4
OpenEdge supports ICU collations in the icui18n library for supported OpenEdge languages
ICU-ja__HQ = Japanese Hiragana Quaternary
One additional collation is supported: Japanese Hiragana Quaternary as case-sensitive
• Uses the QUATERNARY strength as the CASE-SENSITIVE strength
Supported Collations
ICU Collations Available
ICU-UCA               UCA (default Unicode Collation Algorithm)
ICU-ar                Arabic
ICU-be                Belarusian
ICU-bg                Bulgarian
ICU-ca                Catalan
ICU-cs                Czech
ICU-da                Danish
ICU-de__PHONEBOOK     German phonebook
ICU-el                Greek
ICU-en_BE             English Belgium
ICU-eo                Esperanto
ICU-es                Spanish
ICU-es__TRADITIONAL   Spanish traditional
ICU-et                Estonian
ICU-fa                Persian
ICU-fi                Finnish
ICU-fr                French
ICU-gu                Gujarati
ICU-he                Hebrew
ICU-hi                Hindi
ICU-hi__DIRECT        Hindi direct
ICU-hr                Croatian
ICU-hu                Hungarian
ICU-is                Icelandic
ICU-ja                Japanese
ICU-ko                Korean
ICU-kn                Kannada
ICU-lt                Lithuanian
ICU-lv                Latvian
ICU-mk                Macedonian
ICU-mr                Marathi
ICU-mt                Maltese
ICU-nb                Norwegian Bokmål
ICU-nn                Norwegian Nynorsk
ICU-pl                Polish
ICU-ro                Romanian
ICU-ru                Russian
ICU-sh                Serbo-Croatian
ICU-sk                Slovak
ICU-sl                Slovenian
ICU-sq                Albanian
ICU-sr                Serbian
ICU-sv                Swedish
ICU-ta                Tamil
ICU-te                Telugu
ICU-th                Thai
ICU-tr                Turkish
ICU-uk                Ukrainian
ICU-vi                Vietnamese
ICU-zh                Chinese
ICU-zh__PINYIN        Chinese Pinyin
ICU-zh_HK             Chinese Hong Kong
ICU-zh_MO             Chinese Macau
ICU-zh_TW             Chinese Taiwan
Collations Gotchas
If database, clients, and servers use different collations (-cpcoll), indexed and non-indexed queries may return different results
If a client needs a different collation than the database, you can use COMPARE and COLLATE on the client
• Performance impact with large result sets
Configuration Gotchas
Database code-page is 1252 on Windows server
OpenEdge install startup.pf setting is:
• -cpinternal 1252 -cpstream 1252
French Windows client with
• a default Windows code page of 1252, and
• a DOS system code page of ibm850
DOS Character Client starts without specifying -cpinternal and -cpstream
• so it uses 1252 from startup.pf
Typical character client configuration, 1/2
User enters “è” (Hex 8A in ibm850)
Since the session is started with -cpinternal 1252, OpenEdge doesn't convert when writing to the database
• The entered value is written to the database as 8A, when it should be E8 (1252)
Solution: start the Character Client with -cpinternal and -cpstream set to ibm850
Typical character client configuration, 2/2
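The byte values in this scenario can be checked with Python's built-in `cp850` and `cp1252` codecs. This is only a sketch of the mismatch, not of OpenEdge internals; the variable names are illustrative.

```python
typed = "è"

# The DOS console delivers the keystroke in the system code page, ibm850.
console_bytes = typed.encode("cp850")            # hex 8A

# Misconfigured session (-cpinternal 1252): no conversion is applied,
# so the raw console byte is stored in the 1252 database as-is.
stored_without_conversion = console_bytes
seen_by_1252_clients = stored_without_conversion.decode("cp1252")

# Correctly configured session (-cpinternal ibm850): the value is
# converted from ibm850 to the database code page when written.
stored_with_conversion = console_bytes.decode("cp850").encode("cp1252")  # hex E8

print(console_bytes.hex(), repr(seen_by_1252_clients),
      stored_with_conversion.hex())
```

In the misconfigured case every 1252 client reads the stored byte 8A back as "Š" rather than "è", which is why the character client must declare the code page it actually runs in.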
Unicode Normalization
Unicode has different ways of expressing the same characters
Decomposed
• Á = (U+0041, Latin Capital Letter A) + (U+0301, Combining Acute Accent ´)
Composed
• Á = (U+00C1, Latin Capital Letter A with Acute)
What is Normalization?
XML (and other W3C entities) expects data in “NFC” form
Normalizing first is the best way to convert from Unicode to other code pages
Useful when doing tasks such as making comparisons
Why Normalization?
NFC = Canonical Decomposition, followed by Canonical Composition
NORMALIZE
• Returns either CHAR or LONGCHAR
– Matches the source string
• CHAR variable must be UTF-8
• LONGCHAR variable can be any form of Unicode
– UTF-8, UTF-16, UTF-32
result-string = NORMALIZE(source-string, normalization-mode)
NORMALIZE Language Function
Normalization Modes Supported
NFD: Canonical Decomposition
NFC: Canonical Decomposition, followed by Canonical Composition (default)
NFKD: Compatibility Decomposition
NFKC: Compatibility Decomposition, followed by Canonical Composition
None: No change to source string. Turns off normalization when normalization-mode is a variable
Normalization modes from ICU library
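The four forms behave the same in any Unicode library, so their effect can be illustrated with Python's standard `unicodedata` module. This is a sketch of the forms themselves, not of the ABL NORMALIZE function; the sample strings are assumptions chosen for the example.

```python
import unicodedata

composed = "\u00c1"      # Á as one code point
decomposed = "A\u0301"   # A followed by a combining acute accent

# The two spellings look identical but do not compare equal as raw strings.
equal_before = composed == decomposed

nfd = unicodedata.normalize("NFD", composed)     # canonical decomposition
nfc = unicodedata.normalize("NFC", decomposed)   # decomposition, then composition

# After normalizing both sides to the same form, the comparison succeeds.
equal_after = (unicodedata.normalize("NFC", composed)
               == unicodedata.normalize("NFC", decomposed))

# Compatibility forms also replace characters such as the "ﬁ" ligature
# (U+FB01) with their plain equivalents; canonical forms leave it alone.
ligature = "\ufb01"
nfkc = unicodedata.normalize("NFKC", ligature)
nfc_ligature = unicodedata.normalize("NFC", ligature)

print(equal_before, equal_after, [hex(ord(c)) for c in nfd], nfkc)
```

Normalizing both operands to one form before comparing or converting is what makes the composed and decomposed spellings interchangeable.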
Unicode Normalization Forms
• Recommended for understanding normalization forms used with the NORMALIZE function
• http://www.unicode.org/unicode/reports/tr15/
International Components for Unicode (ICU) libraries & globalization, in-depth information
• http://icu.sourceforge.net/userguide/intro.html
Additional information
Default Word-Break Tables
Prior to 10.1A
• User had to configure word-break tables for use with double-byte and UTF-8 databases
Default word-break tables added for:
• Double-byte databases
• UTF-8 databases
These are available 'out of the box'
• Either in product or for download
Simplifies accessing non-single-byte databases
10.1A simplifies implementing double-byte databases
10.1A provides 10 compiled files
• See the list below
• Ranging from proword.245 to proword.254
Located in subdirectory with corresponding empty databases
• Subdirectory prolang/<language>
Compiled, available out of the box
Available as part of the Supplemental PROMSGS package
Available for download
• Japanese SHIFT-JIS                  proword.253
• Japanese EUCJIS                     proword.250
• Korean CP949                        proword.248
• Korean KSC5601                      proword.252
• Chinese (simplified) CP936          proword.247
• Chinese (simplified) GB2312         proword.251
• Chinese (traditional) CP950         proword.249
• Chinese (traditional) BIG-5         proword.246
• Chinese (traditional) CP950-HKSCS   proword.245
• UTF-8                               proword.254
What if you are already using a proword file in the range 245 – 254?
• Copy the file to proword.<nnn>
– Where <nnn> is less than 240
• Apply the word rule to the database
– No index-build is required for this change
Remember, apply the change in all tiers (Client, Server, Database) to prevent corruption!
For More Information, go to…
Expand to New Countries Business Empowerment Program
• Contact your Account Manager
Product documentation
• OpenEdge Development: Internationalizing Applications
• OpenEdge Development: Visual Translator
• OpenEdge Development: Translation Manager
Visit PSDN for white papers and presentations, for example:
• "Understanding Internationalization" web seminar
Training and Professional Services – www.progress.com
In Summary
Use UTF-8
GB18030
Linguistic Sorting and Collations
• Use ICU-*
Unicode Normalization
Default word-break tables and double-byte
Expand to New Countries Business Empowerment Program
Questions?
Thank you for your time