39
Universal Acceptance Steering Group Introduction to Universal Acceptance Mark Svancarek and Luisa Villa

Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

  • Upload
    others

  • View
    14

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

U n i v e r s a l A c c e p t a n c e S t e e r i n g G r o u p

IntroductiontoUniversalAcceptanceMarkSvancarekandLuisaVilla

Page 2: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 2

AboutThisDocumentPurpose

TheInternet’stechnologies,includingitsnamingcomponents,areundercontinualevolutionandchange.In

recentyears,agreatnumberofnewTLDswithASCIIcharactersandIDNtop-leveldomainshavebeenreleased

by ICANN.Examples include.nyc,.hsbc,.eco,and.. However, the response to the change in the naming

landscapehasnotbeen fast enough.Manyapplicationsand

services are not being updated to manage new TLDs. This

affectstheuserexperience.Forexample:

• Validemailaddressesarenotbeingaccepted

• Domain names are mistakenly treated as search

termsintheaddressbarofthebrowser.

Unlesssoftwarerecognizesandcanprocessthenewdomains,astateknownasUniversalAcceptance,itwillnotbepossibletoprovideaconsistentandpositiveexperienceforInternetusers.Thisdocument,therefore,

providesabroadintroductiontoUniversalAcceptancetoassistinthedevelopmentofUniversalAcceptance-

readysoftware.

TargetAudience

• SoftwareDevelopers

• ChiefTechnicalOfficers(CTOs)

• Thetechnicalcommunityingeneral

DocumentStructure

Part1

BaselineconceptsofUniversalAcceptancesuchaswhatisaDomainNameandtheDomain

NameSystem,ASCIIandUnicode,Punycode,EmailAddressInternationalization,andother

basicconcepts.

Part2 The fivecriteriaofUniversalAcceptanceaswellas thegoodpractices foreachof thesecriteria. Also contains user scenarios and nonconformance practices to UniversalAcceptance,technicalrequirementsandcurrentchallenges.

Part3 Advanced topics such as right-to-left scripts, the Bidi algorithm,Normalization and Case

Folding.

Part4 Containstheglossaryandusefulonlineresources.

Needmoreinformation?

The UASG and the community are available to provide advice to software developers and

implementersonwhatisneeded.

• Contactustoshareyourideasandsuggestionsonthetopicatinfo@uasg.tech

• JointheUniversalAcceptancediscussionathttp://tinyurl.com/ua-discuss

• Tolearnmoreabouttheeffort,visithttp://www.icann.org/universalacceptance

Manyapplicationsandservicesarenotbeingupdatedto

managethesenewTLDs.Thisaffectstheuserexperience.

Page 3: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 3

Contents

Introduction.......................................................................................................................................................5

ABriefHistoryofDomainNameInternationalization...................................................................................5

TheNeedforUniversalAcceptance...............................................................................................................5

Part1:BaselineConceptsofUniversalAcceptance...........................................................................................6

DomainName............................................................................................................................................6

DomainNameSystem(DNS).....................................................................................................................6

TopLevelDomains(TLDs)..........................................................................................................................6

GenericTopLevelDomains(gTLDs)..........................................................................................................7

CharacterSetsandScripts.........................................................................................................................7

ASCIIandUnicode.....................................................................................................................................7

InternationalizedDomainNames(IDNs)andPunycode...........................................................................8

Email..........................................................................................................................................................9

AddressesandEmailAddressInternationalization(EAI)...........................................................................9

DynamicLinkGeneration(Linkification)..................................................................................................10

Part2:UniversalAcceptanceinAction............................................................................................................11

FiveCriteriaofUniversalAcceptance..........................................................................................................11

UserScenarios.............................................................................................................................................12

NonconformancetoUniversalAcceptancePractices..................................................................................14

TechnicalRequirementsforUAReadiness......................................................................................................15

HighlevelRequirements..............................................................................................................................15

DeveloperConsiderations............................................................................................................................15

AGuidingPrincipleforAchievingUniversalAcceptance:Postel’sLaw...................................................16

GoodPracticesforDevelopingandUpdatingSoftwaretoAchieveUA-Readiness.................................16

AuthoritativeSourcesforDomainNames...................................................................................................22

DNSRootZone.........................................................................................................................................22

PublicSuffixList.......................................................................................................................................22

OtherChallenges..............................................................................................................................................23

General........................................................................................................................................................23

IDN-StyleEmailandWhyItIsNottheSameasEAI.....................................................................................23

LinkificationandItsChallenges....................................................................................................................24

Part3:AdvancedTopics...................................................................................................................................26

ComplexScripts...........................................................................................................................................26

RighttoLeftLanguagesandUnicodeConformance...............................................................................26

Page 4: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 4

TheBidiAlgorithm...................................................................................................................................26

TheBidiRuleforDomainNames.............................................................................................................27

Joiners......................................................................................................................................................27

HomoglyphandConfusinglySimilarCharacters......................................................................................28

NormalizationandCaseFolding..................................................................................................................29

Normalization..........................................................................................................................................29

CaseFolding.............................................................................................................................................30

Part4:GlossaryandOtherResources..............................................................................................................32

Glossary.......................................................................................................................................................32

RFCs..............................................................................................................................................................34

KeyStandards..............................................................................................................................................36

OnlineResources.........................................................................................................................................37

Acknowledgements..........................................................................................................................................39

Page 5: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 5

IntroductionABriefHistoryofDomainNameInternationalizationIn the 1970s, the characters available for registering domain names were limited to a subset of ASCIIcharacters(lettersa-z,digits0-9andthehyphen“-“).Sincetheearliest.comregistration,symbolics.com,in

1985, the number and characteristics of domain names have expanded to reflect the needs of the ever-

increasingglobaluseoftheInternetasacommunalresource.Today,themajorityofInternetusersarenon-

English speakers. However, the dominant language used on the Internet is English. To help with theinternationalization of the Internet, in 2003, the Internet

EngineeringTaskForce(IETF)startedreleasingstandardsproviding

technical guidelines for the deployment of InternationalizedDomainNames(IDN)throughatranslationmechanismtosupport

non-ASCII representations of domain names in geographically

diverselocalscripts(e.g., - . , ua-test. ,etc.).

The Board of Directors of the Internet Corporation for Assigned

NamesandNumbers (ICANN)approvedtheprocessto introduce

new IDN Country Code Top Domain Names (ccTLDs) in October

2009,withthefirstIDNccTLDsaddedtotherootzoneinMay2010.

InJune2011,theBoardapprovedandauthorizedthelaunchofthe

new Generic Top Level Domain (gTLD) program, which included

newASCII aswell as IDNTLDs. The first batchof TLDs from this

programwasadded to the root zone in2013.Theadditionof IDNccTLDsandnewTLDshasdramatically

increasedthepaceatwhichTLDsareaddedtotherootzone.

AdecadeaftertheIETFreleaseditsIDN-relatedguidelines,andthankstotheICANNNewTLDProgram,more

thanonethousandnewTLDshavenowbeenreleased.Inspiteofalltheseefforts,however,muchsoftware

and many applications are still not Universal Acceptance-ready. This causes problems to Internet users,

includingthosewhoselanguagesarewritteninscriptsthatincludenon-ASCIIcharacters.

TheNeedforUniversalAcceptanceTokeeppacewiththisnewTLDworld,newsoftwaremustbebuiltandoldsoftwareandapplicationsmustbe

updated.ThestateofsuccessfullycomplyingwiththisnewworldofTLDsiscalledUniversalAcceptance.

UniversalAcceptanceisthestatewhereallvaliddomainnamesandemailaddressesareaccepted,validated,stored,processedanddisplayedcorrectlyandconsistentlybyallInternet-enabledapplications,devicesandsystems. Inotherwords, every validwebaddress resolves to theexpectedwebsiteandevery validemail

addressdeliversmailtotheexpecteddestination.Duetotherapidlychangingdomainnamelandscape,many

systemsdonotrecognizeorappropriatelyprocessnewdomainnames,primarilybecausetheymaybeina

non-ASCIIformat,becausethesoftwareisnotawareofthenewlyreleasedTLD,orbecausethelengthoftheir

TLDvariesinlength.Thesameistrueforemailaddressesthatincorporatethesenewextensions.

TheUniversalAcceptanceSteeringGroup(UASG),acommunity-led,industry-wideinitiativethatissupported

by ICANN, isworkingoncreatingawareness, identifyingandresolvingproblemsassociatedwithUniversal

AcceptanceofDomainNamestohelpensureaconsistentandpositiveexperienceforInternetusersglobally.

UniversalAcceptanceisthestatewhereallvaliddomain

namesandemailaddressesareaccepted,validated,stored,processedanddisplayed

correctlyandconsistentlybyallInternet-enabledapplications,

devicesandsystems.

Page 6: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 6

Part1:BaselineConceptsofUniversalAcceptanceThissectioncontainsanoverviewofthebasictermsandconceptsnecessarytounderstandbeforereading

themoreadvancedsectionsofthisdocument.

DomainNameAdomainnameisadottedtextstringusedasahuman-friendlytechnicalidentifierforcomputers

andnetworksontheInternet.Forexample:

www.domain.tld

Howtoreadadomainname:

• EachdotrepresentsalevelintheDomainNameSystem(DNS)hierarchy.

• ATop-LevelDomain(TLD)isoftencalledthesuffixattheendofadomainname.

• Theindividualwordsorcharactersbetweenthedotsarecalledlabels.Forthoselanguagesorscriptsthatarewrittenfromlefttoright(LTR),1thelabelfurthestrightrepresentsthetop-leveldomain.

• Thesecondlabelfromtheendrepresentsthesecond-leveldomain.

• Any labels thatcomebefore thesecond-leveldomainareconsideredsubdomainsof thesecond-leveldomain(sometimescalledthird-leveldomains).

DomainNameSystem(DNS)EachresourceontheInternetisassignedanaddresstobeusedbytheInternetProtocol(IP).Since

IP addresses are difficult to remember, the DNS provides amapping between IP addresses and

human-readable domain names. Servers collectively providing a public DNS exist at well-known

addressesontheInternet.

TopLevelDomains(TLDs)Humanreadabledomainnamesaremanagedbyorganizationsknownasregistries.Whenadomain

name is registered, it consists ofmultiple text strings representingmultiple domain levels, each

separatedbya“.”character.InLTRscripts,theright-mostdomainlevelisthetop-leveldomain(TLD).

SomeTLDsaredelegated to specific countriesor territories.ThesearecalledCountryCodeTLDs(ccTDs).

1Languagesorscriptswrittenfromrighttoleft(RTL)willbediscussedlaterinthisdocument.

Page 7: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 7

GenericTopLevelDomains(gTLDs)Starting in 2013, ICANN (the organization responsible for the creation and maintenance of TLD

assignments) has approved the creation of a large number of new TLDs. These new TLDs can

represent brands, communities of interest, geographic communities (cities, regions) and more

genericconcepts.Collectively,allofthesenewTLDsareknownasGenericTopLevelDomains(gTLDs).

CharacterSetsandScriptsLanguagesarewrittenusingwritingsystems.Mostwritingsystemsuseonescript,whichisasetof

graphiccharactersusedforthewrittenformofoneormorelanguages.Asmallnumberofwriting

systemsemploymorethanonescriptatthesametime.Thesecharactersorscriptscanberecognized

byhumans.However,theyarenotusefultocomputers. Instead,acomputerneedsascripttobe

encodedinawaythatitcanprocess(forexample,toresolveawebaddress).Themechanismforthis

iscalledacharactermappingorcodedcharacterset(CCS),oracodepage.2Acharactermapping

associatescharacterswithspecificnumbers.Manydifferentcodepageshavebeencreatedovertime

fordifferentpurposes,butforthistopicwewillfocusononlytwo:ASCIIandUnicode.

ASCIIandUnicodeIntheexamplesofTLDsabove,allofthetextstringsarerepresentedusingtheLatincharacterset.

ThischaractersetisincludedintheAmericanStandardCodeforInformationInterchange(ASCII,or

US-ASCII) character-encoding scheme. ASCII is an older encoding scheme andwas based on the

Englishlanguage.Forhistoricalreasons,itbecamethestandardcharacterencodingschemeonthe

Internet.ASCIIusesonly7bitspercharacter,whichlimitsthesetto128characters,notallofwhich

canbeusedindomainnames.DomainnamesarelimitedtothecharactersA-Z,thenumbers0-9,

andhyphen“-“.

2TherearesubtletiestothetermsthatarenotdirectlyrelevanttothetopicofUniversalAcceptance.Ifyou

are interested in more information about the terminology, a useful starting point is:

https://tools.ietf.org/html/rfc6365

ExamplesofcommonTLDs ExamplesofccTLDs ExamplesofnewgTLDs

.com

.gov

.info

.org

China=.cn Germany=.deUnitedStates=.us

.app

.lawyer

.shopping

.panasonic

.osaka

Page 8: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 8

ASCII-ISO8859-1(Latin-1)Table3

BecausemostwritingsystemsdonotusetheLatincharacterset,alternateencodingshavealsobeen

adopted.Unicode,alsoknownastheUniversalCodedCharacterSet(UCS),iscapableofencodingmorethan1millioncharacters.EachoftheseUnicodecharactersisacalledacodepoint.Themost

commonformofUnicodeiscalledUniversalCodedCharacterSetTransformFormat8-bit(UTF-8).

ToseeallUnicodecharactercodecharts,goto:http://unicode.org/charts

InternationalizedDomainNames(IDNs)andPunycodeTheuseofUnicodeenablesdomainnamestocontainnon-ASCIIcharacters.Asnotedearlierinthis

document,domainnamesthatusenon-ASCIIcharactersarecalledInternationalizedDomainNames

(IDNs).4Theinternationalizedportionofadomainnamecanbeinanylevel–notjusttheTLDbut

alsotheotherlabels.

SincetheDNSitselfpreviouslyonlyusedASCII,5itwasnecessarytocreateanadditionalencodingto

allownon-ASCIIUnicodecodepointstobeconvertedintoASCIIstrings,andviceversa.Thealgorithm

thatimplementsthisUnicode-to-ASCIIencodingiscalledPunycode;theoutputstringsarecalledA-Labels.A-LabelscanbedistinguishedfromanordinaryASCIIlabelbecausetheyalwaysstartwiththe

followingfourcharacters:

• xn--

ThesecharactersarecalledtheACEprefix.6

ThePunycode transformation is reversible: it can transformfromUnicode toanA-Labelandalso

fromanA-labelbacktoUnicode(knownasaU-Label).

TheonlyRFC-defined7useofthePunycodealgorithmis forexpressing internationalizeddomains.

However, rather than implement Unicode, some developers choose to apply Punycode to other

scenarios.

3Source:CaliforniaStateUniversity.1997.ASCII-ISO8859-1(Latin-1)TablewithHTMLEntityNames.http://web.calstatela.edu/faculty/jchen13/Docs/CS120/Lectures/ASCIITable_with_HTML_Entity_Names.ht

m4Notethatnoteverynon-ASCIIcharacterisanIDN.

5Forcurrentstatus,seehttp://tools.ietf.org/html/rfc6055#section-3

6ASCIICompatibleEncoding(ACE)prefixisusedtodistinguishPunycode-encodedlabelsfromordinaryASCII

labels.7RFC:RequestforComments.SeetheGlossaryofterminPart4ofthisdocumentformoreinformation.

Page 9: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 9

Email

AddressesandEmailAddressInternationalization(EAI)Emailaddressescontaintwoparts:

1. Alocalpart(theusername,beforethe“@” character)

2. Adomain(afterthe“@”character)

ThedomainpartcancontainanyTLD,includinganewTLD.BothportionsmaybeUnicodeU-labels.

NOTE:Anadditionalformat,IDN-StyleEmailAddresses,willbediscussedbelow.

EmailAddressInternationalization(EAI)requirestheuseofUnicodeinallpartsoftheemailaddress.

EachoftheexamplesabovecouldbeexpressedasEAI,andthisisthepreferredformat.

Examplesof(imaginary)IDNs

example. (Punycode encoding = example.xn--q9jyb4c)

.info (Punycode encoding = xn--uesx7b.info)

. � (Punycode encoding = xn--q9jyb4c.xn--uesx7b)

Tolearnmore,seetheIDNFAQ:http://unicode.org/faq/idn.html

Examplesof(imaginary)EmailAddressesincludingIDNs

user@example.

user@ .info

@example.lawyer

(UsesinternationalizedTLD)

(Usesinternationalized2ndleveldomain)

(UsesinternationalizedusernameandnewgTLD)

Page 10: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 10

DynamicLinkGeneration(Linkification)Modernsoftware,suchaspopularwordprocessingorspreadsheetapplications,sometimesallowsa

usertocreateahyperlinksimplybytypinginastringthatlookslikeawebaddress,emailaddressor

networkpath.Forexample,typing“www.icann.org”intoanemailmessagemayresultinaclickable

linktohttp://www.icann.org beingautomaticallycreatediftheapptreats“www.”asaspecial

prefixor“.org”asaspecialsuffix.

Linkificationshouldworkconsistentlyforallwell-formedwebaddresses,emailaddressesornetwork

paths.

Page 11: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 11

Part2:UniversalAcceptanceinActionFiveCriteriaofUniversalAcceptanceAs described in the section, Universal Acceptance is the state where all valid domain names and email

addressesareaccepted,validated,stored,processedanddisplayedcorrectlyandconsistentlybyallInternet-enabledapplications,devicesandsystems.Thesefivecriteriaaredescribedbelow.

1.Accept8

Acceptingoccurswheneveranemailaddressoradomainnameisreceivedasastringofcharactersfromauserinterface,fromafile,orfromanAPIusedbyasoftwareapplicationoronlineservice.

Applicationsandservicesallowdomainnamesandemailaddressestobe:

• Enteredintouserinterfaces,AND/OR • ReceivedfromotherapplicationsandservicesviaAPIs

2.Validate9

Validationmayoccur inmanyplaceswheneveranemailaddressoradomainnameiseitherreceivedoremittedasastringofcharactersbyanapplicationoronlineservice.

Validationisintendedtoensurethattheenteredinformationiseithervalidorat

least definitely not invalid. In other words, validation ensures the syntax

correctnessofthegiveninformation.

Fordomainnamesandemailaddresses,manyprogrammershavebeenusingsome

heuristics (for example, checking that a TLD has the “correct” number of

characters,orthatthecharactersarefromtheASCIIcharacterset).However,these

heuristicsarenolongerapplicablebecausetheInternetischanging:

• DomainnamesandemailaddressescannowincludeUnicode(non-ASCII)

characters

• ThelistofTLDsisgrowing

• ATLDcanbeupto63characterslong

3.Store

The Storage process occurswhenever an email address or a domain name isstoredasastringofcharactersinadatabaseorfileusedbyasoftwareapplicationoronlineservice.

Applications and services might require long-term and/or transient storage of

domainnamesandemailaddresses.Regardlessofthelifetimeofthedata,itmust

bestoredin:

• RFC-definedformats,OR

• AlternateformatsthatcanbeeasilytranslatedtoandfromRFC-defined

formats(thisismuchlessdesirable)

8AcceptingistreatedasdistinctfromValidatinginthisdocument.Inpractice,theabilitiesmayoverlap.

9AcceptingandProcessingaretreatedasdistinctfromValidatinginthisdocument.Inpractice,theabilities

mayoverlap.

Page 12: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 12

Although RFCs require the use ofUTF-8, other formatsmay be encountered in

legacycode.Seethe“GoodPractices”sectionbelow.

4.Process10

Processingoccurswheneveranemailaddressoradomainnameisusedbyanapplicationorservicetoperformanactivity(forexample,searchingorsortingalist),ortransformedintoanalternateformat(suchasstoringASCIIasUnicode).

Processingmeansusingdomainnamesandemailstringsinafeature.Additional

validationmayoccurduringprocessing.Thereisnolimittothenumberofways

thatdomainnamesandemailaddressescouldbeprocessed(examples:“Identify

allthepeopleassociatedwithNewZealandbecausetheyhaveanamewitha.nzccTLD”; “Identify all the pharmacists because they have a

[email protected] email address”; “Identify firewalls that might filter

DNSrequeststhatdon’tapplytotheirpolicies”).

5.Display

TheDisplay process occurs whenever an email address or a domain name isrenderedwithinauserinterface.

Displayingdomainnamesandemailaddressesisusuallystraightforwardwhenthe

scriptsusedaresupportedintheunderlyingoperatingsystemandwhenthestrings

are stored in Unicode. If these conditions are not met, application-specific

transformationsmayberequired.

UserScenariosThe examples and definitions above may give the impression that Universal Acceptance is only about

computersystemsandonlineservices.Thereality,however, is that it’salsoabout thepeopleusing those

systemsandservices.

BelowaresomeexamplesofactivitiesthatrequireUniversalAcceptance:

RegisteringanewTLD

An organization adopts a “brand” TLD to offer its customers a differentiated

customerexperiencebyprovidingemailaddressesintheformat,[email protected].

UniversalAcceptancemeans:

• Web apps accept these new “@example.brand” email addresses as

validastheywouldwithTLDssuchas.com,.net,.org.

AccessingagTLD Auseraccessesawebsite,whosedomainnamecontainsanewTLD,bytypingan

addressintoabrowserorclickingalinkinadocument.

UniversalAcceptancemeans:

• EventhoughtheTLDisnew,anybrowsertheuserwishestousedisplays

the web address in its native form and accesses the site as the user

10ProcessingistreatedasdistinctfromValidatinginthisdocument.Inpractice,theabilitiesmayoverlap.

Page 13: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 13

expects.ThebrowserdoesnotdisplayPunycodedtexttotheuserunless

itbenefitstheuserinsomeway.

UsinganemailaddresscontaininganewgTLDasanonlineidentity

AuseracquiresanemailaddresswiththedomainportionusinganewgTLD,and

usesthisemailaddressastheiridentityforaccessingtheirbankandairlineloyalty

accounts.

UniversalAcceptancemeans:

• Eventhoughthedomainusedintheemailaddress isnew,thebankor

airline siteaccepts theaddressexactlyas if itwereanestablishedTLD

suchas.bizor.eu.

AccessinganIDN

AuseraccessesanIDNURL,bytypinganaddressintoabrowserorclickingalink

inadocument.

UniversalAcceptancemeans:

• Evenifthedomainnamecontainscharactersdifferentthanthelanguage

settingsontheuser’scomputer,anybrowsertheuserwishestousewill

displaythewebaddressasexpectedandaccessthesitesuccessfully.

Usinganinternationalizedemailaddressforemail

A user has acquiredmultiple email addresses, some are internationalized (e.g.

īnfo@ - . ).

UniversalAcceptancemeans:

• Theusercansendtoandreceivefromanyemailaddressandusingany

emailclient.

Usinganinternationalizedemailaddressasanonlineidentity

AuseracquiresanEAIemailaddress,andusesthisemailaddressastheiridentity

foraccessingtheirbankandairlineloyaltyaccounts.

UniversalAcceptancemeans:

• ThebankorairlinesiteacceptstheEAIidentityexactlyasifitwereany

otheremailidentity.

DynamicallycreatingaHyperlinkinanApplication

Ausertypesawebaddressintoadocumentoremailmessage.

UniversalAcceptancemeans:

• Therulesusedbytheapplicationtoautomaticallygenerateahyperlink

arethesameeveniftheaddressisanEAIorcontainsanewTLD.

DevelopinganApplication

Adeveloperwritesanappthataccesseswebresources.

UniversalAcceptancemeans:

• ThetoolsusedbythedevelopersincludelibrariesthatenableUniversal

AcceptancebysupportingUnicode,IDNsandEAI.

Page 14: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 14

NonconformancetoUniversalAcceptancePracticesThefollowingareconsideredtobepoorpractice:

✖DisplayingPunycodedtexttotheuserwithoutacorrespondinguserbenefit.

Forexample,toshowthemappingbetweenaU-labelandaA-label.

✖RequiringausertoenterPunycodedtextwhensigningupforanewemailaddressorrequiringauser

toenterPunycodedtextwhensigningupforanewhosteddomain.

✖Validatingthesyntaxofdomainnameoremailaddressusingoutofdatecriteriaornon-authoritative

onlinedomainnameresources.

✖ UsinganoutdatedlistofTLDseventhoughnewTLDsareregularlybeingadded.

✖ ExposinginternaluseofPunycodedtexttousers.

Forexample,convertingfromEAItoanIDN-styleemailaddresswhenreplyingtoanEAIuser.

✖ Treatingsomedomainnamesassearchtermsratherthanasdomainnamesbecausetheapplication

doesnotrecognizethemassuch.

✖ SettingspamblockerstoautomaticallyblockentireTLDs.

Page 15: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 15

TechnicalRequirementsforUAReadinessHighlevelRequirementsAnapplicationorservicethatsupportsuniversalacceptance(UA):

1. Supportsalldomainnamesregardlessoflengthorcharacterset.

SeeRFC5892.

2. Allowsmultiplecharactersetsthatarevalidfordomainnamesandemailaddresses.

Thatis,permitsUnicodecodepoints.

3. CancorrectlyrenderallcodepointsinUnicodestrings.

SeeRFC3490.

4. Cancorrectlyrenderright-to-left(RTL)stringssuchasthoseinArabicandHebrew.

ForinformationaboutRTLscripts,seeRFC5893.

5. CancommunicatedatabetweenapplicationsandservicesinformatsthatsupportUnicodeandareconvertibleto/fromUTF-8.

ForinformationaboutUTF-8,seeRFC3629.

6. OfferspublicAPIsthatsupportUnicode&UTF-8.

7. OffersprivateAPIsthatsupportUnicode&UTF-8.

PrivateAPIsapplyonlytointer-servicecallsbythesamevendor.

8. StoresuserdatainformatsthatsupportUnicodeandisconvertibleto/fromUTF-8.

Suchconversionswouldbevisibleonlytotheproduct/serviceowner.

9. Supports all domain name strings in the authoritative ICANN TLD list and the community-servedPublicSuffixListregardlessoflengthorcharacterset.

Seehttps://newgtlds.icann.org/en/program-status/delegated-strings.

10. Cansendemailtoandreceivefromrecipientsregardlessofdomainnameorcharacterset.

SeeRFC6530.

11. TreatsEAIaddressesthesamewayastheirPunycodedequivalents(IDNemailformat).

DeveloperConsiderationsSincemanyexistingsoftwaresystemscontainhardcodedassumptionsaboutdomainsandemailaddresses,

codechangesmayberequiredtorecognizeIDNsandnewTLDs.Thissectiondiscusseshowdeveloperscan

incorporatecodechangesthatwillenableUniversalAcceptanceofallnewTLDs.

Page 16: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 16

AGuidingPrincipleforAchievingUniversalAcceptance:Postel’sLawIn RFC 793, Jon Postel formulated the Robustness Principle, now known as Postel's Law, as animplementation guideline for the then-new TCP. In computing, the Robustness Principle is a general

designguidelineforsoftware:

"Beconservativeinwhatyoudo,beliberalinwhatyouacceptfromothers."

Inotherwords,beconservativeinwhatyousendandbeliberalinwhatyouaccept.Thisisalsoagood

approach when dealing with the vagaries of Universal Acceptance currently implemented in the

ecosystem.

GoodPracticesforDevelopingandUpdatingSoftwaretoAchieveUA-Readiness

Accept

AlwaysofferUnicodeequivalents.

Usersshouldbeallowed,butnotrequired,toenterASCIICompatibleEncoded(or“Punycoded”)

text in place of its Unicode equivalent. However, Unicode should be shown by default, with

Punycodedtextonlyshowntotheuseronlywhenitprovidesabenefit.

! Don’tgenerateIDN-Styleemailaddresses,butdobeabletohandlethemifpresentedbysomeone

else’ssoftware.

Anyuserinterfaceelementrequiringausertotypeadomainnameoremailaddressmustsupport

Unicode,labelsupto63characters,anddomainnamestringsupto253characters.

• SeeRFC1035.

Validate

Validateonlytotheminimumextendnecessary.

Validateonly if it is required for theoperationof theapplicationor service. This is themost

reliablewaytoensurethatallvaliddomainnamesareacceptedintoyoursystems.

✔ Recognizethatsyntacticallycorrectinputsmaynotrepresentdomainnamesoremailaddresses

currentlyinuseontheInternet.

!

Ifyoumustvalidate,considerthefollowing:

• VerifytheTLDportionofadomainnameagainstanauthoritativetable.Examplesofsome

authoritativetablesthatyoucanuseare:

o http://www.internic.net/domain/root.zone

o http://www.dns.icann.org/services/authoritative-dns/index.html

o http://data.iana.org/TLD/tlds-alpha-by-domain.txt

Seealso:https://www.icann.org/en/system/files/files/sac-070-en.pdf

• QuerythedomainnameagainsttheDNS

o ConsiderusingtheGETDNSAPI(http://getdnsapi.net/)

• Requirerepeatedentryofanemailaddresstoprecludetypingerrors

Page 17: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 17

• ValidatethecharactersinlabelsonlytotheextentofdeterminingthattheU-labeldoes

notcontain"DISALLOWED"codepointsorcodepointsthatareunassignedinitsversion

ofUnicode

o SeeRFC5892

• Limitvalidationoflabelsitselftoasmallnumberofwhole-labelrulesdefinedintheRFCs

o SeeRFC5894

• IfastringresemblingadomainnamecontainstheArabicfullstopcharacter“۔”(U+06D4),or the ideographic full stop character “ ” (U+002E), convert it to the full stop “.”

(U+002E).

• Doensurethattheproductorfeaturehandlesnumberscorrectly

o For example: ASCII numerals and Asian ideographic number representations

shouldallbetreatedasnumbers

Store

✔ ApplicationsandservicesshouldsupporttheappropriateUnicodestandards.

InformationshouldbestoredintheUTF-8(UnicodeTransformationFormat)wheneverpossible.

SomesystemsmayrequiresupportforUTF-16aswell,butgenerallyUTF-8ispreferred.UTF-7and

UTF-32shouldbeavoided.

!

Consider all end-to-end scenarios before converting A-Labels (Punycode) to U-Labels and vice

versawhenstoring.

ItmaybedesirabletomaintainonlyU-Labelsinafileordatabase,becauseitsimplifiessearching

and sorting.However, conversionmayhave implicationswhen interoperatingwitholder, non-

Unicode-enabledapplicationsandservices.Considerstoringandindexingbothformats.

Clearlymarkemailaddressesanddomainnamesduringstorageforeasieraccess.

Instanceswhereemailaddressesanddomainnameshavebeenfiledunderthe“author”fieldofa

documentor“contactinfo”inalogfilehaveledtothelossoftheoriginaladdress.

✔ Ifyoudon’tstoreinUnicode,youmustbeabletomatchstringsinmultipleformats.

Forexample,asearchforexample. shouldalsofindexample.xn--q9jyb4c.

Process

✔ EnsureallserverresponseshaveUnicodespecifiedinthecontenttype.

SpecifyUnicodeinthewebserverhttpheaderanddirectlyinawebfile.

• EverywebfileshouldincludetheUTF-8charset

• Itisimportanttoensurethattheencodingisspecifiedoneveryresponse

Page 18: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 18

!

Consider all end-to-end scenarios before converting A-Labels (Punycode) to U-Labels and vice

versaduringprocessing.

ItmaybedesirabletomaintainonlyU-Labelsinafileordatabaseasitsimplifiessearchingand

sorting. However, conversion may have implications when interoperating with older, non-

Unicode-enabledapplicationsandservices.Considerstoringinbothformats.

✔ Ensure that the product or feature handles sort order, searches, and collation according to

locale/languagespecifications,andthatitaddressesmultilanguagesearchingandsorting.

Don’tuseURL-encodingfordomainnames:

• example. iscorrect

• example.%E3%81%BF%E3%82%93%E3%81%AA isnotcorrect

SincetheUnicodestandardiscontinuallyexpanding,codepointsnotdefinedwhentheapplication

orservicewascreatedshouldbecheckedtoensuretheywillnot“break”theuserexperience.

Missing fonts in the underlying operating system may result in non-displayable characters

(frequentlythe“o”characterisusedtorepresentthese),butthissituationshouldnotresultina

fatalcrash.

✔ UsesupportedUnicode-enabledAPIs.

Use the latest Internationalized Domain Names in Applications (IDNA) Protocol and Tables

documentsforIDNs:

• RFC5891

• RFC5892

✔ ProcessinUTF-8formatwhereverpossible.

Upgradeapplicationsandservers/servicestogether.

IftheserverisUnicodeandclientisnon-Unicode,orviceversa,thedatawillneedtobeconvertedtoeachcodepageeverytimethedatatravelsbetweenserverandclient.

✔ Performcodereviewstoavoidbufferoverflowattacks.

Whendoingcharactertransformation,textstringsmaygroworshrinksubstantially.

Display

DisplayallUnicodecodepointsthataresupportedbytheunderlyingoperatingsystem.

Ifanapplicationmaintainsitsownfontsets,comprehensiveUnicodesupportshouldbeofferedto

thecollectionoffontsavailablefromtheoperatingsystem.

✔ Whendevelopinganapporaserviceconsiderthelanguagessupportedandmakesureoperating

systemsandapplicationscoverthoselanguages.

✔ Convertnon-UnicodedatatoUnicodebeforedisplay.

Page 19: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 19

For example, the end user should see “example. ” as opposed to “example.xn--q9jyb4c”. (ThisconversionisanexampleofUA-readyprocessing).

✔ DisplayUnicodebydefault.

DisplayPunycodedtexttotheuseronlywhenitprovidesabenefit.

!

Beawarethatmixed-scriptaddresseswillbecomemorecommon.

• Some Unicode characters may look the same to the human eye, but different to

computers

• Don’t assume that mixed-script strings are intended for malicious purposes, such as

phishing

• Iftheuserinterfacecallsthestringstotheuser’sattention,besurethatitdoessoina

waywhichisnotprejudicialtousersofnon-Latinscripts

LearnmoreaboutUnicodeSecurityConsiderationsat:http://unicode.org/reports/tr36

✔ UseUnicodeIDNACompatibilityProcessinginordertomatchuserexpectations.

Tolearnmore,goto:http://unicode.org/reports/tr46

✔ Beawareofunassignedanddisallowedcharactersfordomainnames.

• SeeRFC5892

Unicode

✔ UsesupportedUnicode-enabledAPIs.

Don’tbuildyourownAPIsfor:

• Stringformatconversions

• Determiningwhichscriptcomprisesastring

• Determiningifastringcontainsamixofscripts

• Unicodenormalization/decomposition

Don’tuseUTF-7orUTF-32.

• UTF-7 is generallynotusedasanative representationwithinapplicationsas it is very

awkwardtoprocess.DespiteitssizeadvantageoverthecombinationofUTF-8witheither

quoted-printableorbase64,theInternetMailConsortiumrecommendsagainstitsuse.

• ThemaindisadvantageofUTF-32isthatitisspaceinefficient,usingfourbytespercode

point.Non-BMPcharactersaresorareinmosttexts[citationneeded],theymayaswell

beconsiderednon-existentforsizingissues,makingUTF-32uptotwicethesizeofUTF-

16anduptofourtimesthesizeofUTF-8.

✔ UseUnicodeincookiessotheycanbereadcorrectlybyapplications.

✔ UseIDNA2008ProtocolandTablesdocuments:

• RFC5891

Page 20: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 20

• RFC5892

✖ Don’tuseIDNA2003;innearlyallcasesithasbeensupersededbyIDNA2008.

✖ DonotautomaticallyassumethatexternalAPIscanconsumedatathathasbeenNFKC11converted.

!

MaintainIDNAandUnicodetablesthatareconsistentwithregardtoversions.

For example, unless the application actually executes the classification rules in the Tables

document (RFC 5892), its IDNA tables must be derived from the version of Unicode that is

supportedmoregenerallyonthesystem.Aswithregistration,thetablesdonotneedtoreflect

thelatestversionofUnicode,buttheymustbeconsistent.

! ValidatethecharactersinlabelsonlytotheextentofdeterminingthattheU-labeldoesnotcontain

“DISALLOWED”12codepointsorcodepointsthatareunassignedinitsversionofUnicode.

Limitvalidationoflabelsitselftoasmallnumberofwhole-labelrules:

• Noleadingcombiningmarks

• Bidirectionalconditionsaremetifright-to-leftcharactersappear

• Any contextual rules that are associated with joiner characters (and CONTEXTJ13

charactersmoregenerally)aretested

!

Don’tuseUTF-16exceptwhereitisexplicitlyrequired(asincertainWindowsAPIs).

WhenusingUTF-16,notethat16bitscanonlycontaintherangeofcharactersfrom0x0to0xFFFF,

andadditionalcomplexityisusedtostorevaluesabovethisrange(0x10000to0x10FFFF).Thisis

doneusingpairsofcodeunitsknownassurrogates.Ifhandlingofsurrogatepairsisnotthoroughly

tested,itmayleadtotrickybugsandpotentialsecurityholes.

Linkification

✔ Ifastringresemblingadomainnamecontainstheideographicfullstopcharacter“ ”(U+3002),

doacceptitandtransformitto“.”.

General

✔ Useauthoritativeresourcestovalidatedomainnames.

Donotmakeheuristicassumptions,suchas“allTLDsare2,3,4,or6charactersinlength”.

11NFKC(NormalizationFormCompatibilityComposition):Charactersaredecomposedbycompatibility,then

recomposedbycanonicalequivalence.See:http://unicode.org/reports/tr15

12DISALLOWED:CodepointsthatshouldnotbeincludedinIDNs.See:https://tools.ietf.org/html/rfc5892

13CONTEXTJ:ContextualRuleforJoincontrols.See:https://tools.ietf.org/html/rfc5892

Page 21: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 21

Ensurethattheproductorfeaturehandlesnumberscorrectly.

Forexample,ASCIInumeralsandAsianideographicnumberrepresentationsshouldallbetreated

asnumbers.

!

Lookformailaddressesinunexpectedplaces:

• Artist/author/photographer/copyrightmetadata

• Fontmetadata

• DNScontactrecords

• Binaryversioninformation

• Supportinformation

• OEMcontactinformation

• Registration,feedback,andotherforms

! LookforpotentialIRI

14pathsinunexpectedplaces:

• Single-labelmachinenamesregardlessofloadedsystemcodepage

• Fully-qualifiedmachinenamesregardlessofloadedsystemcodepage

✔ UseGB18030(China)forChineselanguagesupport15inadditiontoUTF-8.

!

Restrictthecodepointsallowedwhengeneratingnewdomainnamesandemailaddresses:

All products that use email addressesmust accept internationalized email addresses, allowing

characters>U+007f.Thatis,nocharacters>U+007faredisallowed.However,anapporservice

neednotallowallofthesecharacterswhenausercreatesanewIDNoremailaddress.Useonly

thislistofallowedcharactersforIDNs:http://unicode.org/reports/tr36/idn-chars.txt

PreventingcertainIDNsoremailaddressesfrombeingcreatedinthefirstplacecanmitigatesome

likelysecurityandaccessibilityconcerns.(NOTE:Postel’sLawwouldstillrequiresoftwaretoaccept

suchstringsifpresented.)

!

BeawarethatUniversalAcceptancecannotalwaysbemeasuredthroughautomatedtestcases

alone.

Forexample,testinghowanapporprotocolhandlesnetworkresourcemaynotalwaysbepossible

and sometimes it is best to verify the compliance through functional spec review and design

review.

!

Don’tautomaticallyassumethatbecauseacomponentdoesnotdirectlycallname-resolutionAPIs,

ordirectlyuseemailaddresses,itdoesnotmeanthattheydonotaffectit.

Understandhownetworknamesareobtainedbythecomponent; it isnotalwaysthroughuser

interaction.Thefollowingaresomeexamplesonhowthecomponentcangetanetworkname:

• Grouppolicy

• LDAPquery

• Configurationfiles

14IRI:InternationalizedResourceIdentifiers.See:https://www.ietf.org/rfc/rfc3987.txt

15GB18030-2000 is a Chinese government standard that specifies an extended codepage for use in the

Chinesemarket.See:http://icu-project.org/docs/papers/unicode-gb18030-faq.html

Page 22: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 22

• WindowsRegistry

• Transferredto/fromanothercomponent/feature

Performcodereviewstoavoidbufferoverflowattacks.

• InUnicode,stringsmayexpandincasing:Fluß→FLUSS→fluss

• Whendoingcharacterconversion,textmaygroworshrinksubstantially

AuthoritativeSourcesforDomainNamesDNSRootZoneTherearea fewoptions for theauthoritative listofTLDs.The firstoption is theDNSrootzone itself. It is

DNSSEC-signed,sothelistisproperlyauthenticated.Youcanobtaintherootzonefromanyofthefollowing

links:

• http://www.internic.net/domain/root.zone

• http://www.dns.icann.org/services/authoritative-dns/index.html

• http://data.iana.org/TLD/tlds-alpha-by-domain.txt

PublicSuffixListThePublicSuffixList (PSL),managedbyvolunteersof theMozillaFoundation,providesanaccurate listof

domainnamesuffixes.ThislistisasetofDNSnamesorwildcardsconcatenatedwithdotsandencodedusing

UTF-8.IfyouneedtousethePSLasanauthoritativesourcefordomainnames,yoursoftwaremustregularly

receivePSLupdates.DonotbakestaticcopiesofthePSLintoyoursoftwarewithnoupdatemechanism.You

canusethelinkbelowtomakeyourappdownloadanupdatedlistperiodically.Thelistgetsupdatedonceper

dayfromGithub:

• https://publicsuffix.org/list/public_suffix_list.dat

Page 23: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 23

OtherChallengesGeneral

VariableencodingofIDNs

Insomeapplications,IDNsareencoded:

• InPunycode,asperIDNA,ifthenameisidentifiedasanInternetname,

BUT

• InUTF-8, if thenameis identifiedasanameonthe localareanetwork

(“intranet”)

Mechanismtodetectandconvertcharsets

Someolderemailapplicationswereencodedinalocalcodepageanddidnothave

a set mechanism for detecting and converting charset as needed. This was

especiallytruefortheemailheader(TO,CC,BCC,Subject).

Failuretohandlenon-DNSprotocols

SomeapplicationsthatdoIDNA(forexample,IE7+)breakfornon-DNSprotocols. Thiscouldaffectaccessingresourcesusingnon-DNSprotocols.

Mechanismtomanagemultipleemailaddressesintoasingleuseridentity

Whenauserisaliasingmultipleemailaddressesitmaybetrickytomanagethese

addressesasasingleuseridentity.

Email programs can direct traffic to such aliases to the samemailbox, but the

applicationmaystillperceivetheseemailstopertaintodifferentidentities.

Tipforsoftwaredevelopers

Whenallowingausertogenerateadomainnameoremailaddress,consideravoidingtheuseof

visuallyconfusingcharacterstopreventhomographattacks.Useonlythislistofallowedcharacters

forIDN:http://unicode.org/reports/tr36/idn-chars.txt

IDN-StyleEmailandWhyItIsNottheSameasEAIEAI isdefinedasusingUnicodeonly;A-Labels (Punycode)arenotallowed.Nevertheless,developershave

sometimesadaptedemailsoftwareandservicestohandleIDN-Styleemailaddressesratherthanmakeafull

conversiontoUnicode.

BecauseIDNscanbePunycodeencoded,someexistingsoftwareallowstheIDNportionofanemailaddress

toberepresented inASCIIorUnicode.Forexample,somesoftwarewill treatthesetwo“IDN-Styleemail”

addressesequivalentlyforallpurposes(sending,receiving,andsearching):

However,somesoftwarewillnotrobustlytreattheseaddressesasequivalent,eventhougharebothvalid,

because there isno requirement for software toprocessanA-label (i.e. “xn--q9jyb4c”) into itsU-labelequivalent (i.e. “ ”) before comparing. This can result in unpredictable user experience. The user

NotallsoftwarewilltreatthesetwoIDN-Styleemailsasfunctionallyequivalent

user@example. = [email protected]

Page 24: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 24

experience may become especially confusing if some software converts U-labels into A-labels for

“compatibility”;asmessagesarereplied-toorforwarded,theaddresseswhicharevisiblydifferenttoauser,

orwhichfailtosearchandsortasexpected,mayincrease.

Intheexamplebelow,somesoftwaremayattempttoconverteventhelocalpartoftheemailaddressusing

Punycode,creatingsomethingthatlookslikeanA-LABELinthelocalpartoftheaddress.Thisisnotallowed

under the existingRFCs, and is very likely to result in failures to receive email by certain systems and to

generatesearchingandsortingdifficultiesasexplainedabove.

RobustUA-readysoftwareandservicesmaybeabletohandleandtreatalltheseformatsidentically,even

thosewhich are not RFC-compliant.Nevertheless,UA-ready software should not generate true EAI email

addressesonly.

LinkificationandItsChallengesModernsoftwaresometimesallowsausertoautomaticallycreateahyperlinksimplybytypinginastringthat

lookslikeawebaddress,emailnameornetworkpath.Forexample,typing“www.icann.org”intoanemail

message may result in a clickable link to http://www.icann.org being automatically created if the

applicationtreats“www.”asaspecialprefixor“.org”asaspecialsuffix.

Linkificationshouldworkconsistentlyforallwell-formedwebaddresses,emailnamesornetworkpaths.

Linkificationistheactionwhereanapplicationacceptsastringanddynamicallydetermineswhetheritshould

createahyperlinktoanInternetLocation(URL)oranemailaddress(mailto:)

Linkificationusesalgorithmsandrulescreatedbysoftwaredeveloperstodeterminewhether astringshould

be deemed a link – or not. Related to this is how people can identify a string as a domain name.While

browsers,emailclientsandwordprocessorsareobviousplaces,therearemanymoreapplicationsthatmake

thesedecisions.

Good Practice Recommendations

1. Attempttolinkifybasedonexplicitprotocolprefixes(e.g.“http://”,ftp://”,“mailto:”)butonlycomplete

theactioniftherestofthestringiswellformed

ExampleString ExpectedBehavior/Result

example.com Nolinkificationbecauseprotocolisabsentandnotinferred.

http://example.com Createhyperlinkbecauseprotocolisexplicit

http:example.com Nolinkificationbecauseofbadsyntax(missing//)

NeverconvertthelocalpartofanemailaddressusingPunycode

✔ 戶@example.

[email protected]

Page 25: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 25

ExampleString ExpectedBehavior/Result

http://example.a NolinkificationbecauseICANNPoliciesrequireTLDtobeatleasttwo

characters. NB: This syntax could be supported within an internal

network.

http://example..ab Nolinkificationbecauseofbadsyntax(consecutivedots)

http:// - . Createhyperlinkbecauseprotocolisexplicit.

2. Attempttolinkifybasedonimplicitprotocolprefixes(e.g.“www”infers“http://www”)

ExampleString ExpectedBehavior/Result

www.example.com Createhyperlinkbecauseprotocolisimplied16

[email protected] Createmailto:[email protected].

3. MaptheIdeographicFullStop“”(U+3002)toFullStop“.”(U+002E)(e.g.http:// comàhttp://

.com)ifstringisotherwisewellformed.

4. IfTLDsareusedasa‘specialsuffix’todeterminelinkability,thenallTLDsmustbeincluded.Alistof

validTLDsshouldbeupdateddynamicallyonafrequentbasis.

16Note:itmightbethecasethattheactualwebsiterequiresthatenduserstypehttps://insteadofhttp://.If

thisisthecase,thenthehyperlinkmaynotresolveormayreturnanerrorpage.

Page 26: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 26

Part3:AdvancedTopicsComplexScriptsThedetailsofcomplexscriptsmaynotbeofinteresttothosewhoarenotdeveloperscreatingtheirownstring

parsing libraries. Nevertheless, a summary is included here to ensure that all readers have sufficient

awarenesstorecognizecodebugsrelatedtothesescriptswhenencounteredinuserexperiences.

RighttoLeftLanguagesandUnicodeConformanceMostscriptsdisplaycharactersfromlefttorightwhentextispresentedinhorizontallines.However,

therearealso several scripts, suchasArabicorHebrew,where theorderingofhorizontal text in

displayisfromrighttoleft.Thetextcanalsobebidirectional(lefttoright–righttoleft)whenaright-

to-leftscriptusesdigitsthatarewrittenfromlefttorightorwhen itusesembeddedwordsfrom

Englishorotherscripts.

Challengesandambiguitiescanoccurwhenthehorizontaldirectionofthetext isnotuniform.To

solvethisissue,thereisanalgorithmtodeterminethedirectionalityforbidirectionalUnicodetext.

Thereisasetofrulesthatshouldbeappliedbytheapplicationtoproducethecorrectorderatthe

timeofdisplaywhicharedescribedbytheUnicodeBidirectionalAlgorithm.Wegenerallyreferto

thisasthe“Bidialgorithm”.

TheBidiAlgorithmTheBidialgorithmdescribeshowsoftwareshouldprocesstextthatcontainsbothleft-to-right(LTR)

and right-to-left (RTL) sequences of characters. Thebase direction17 assigned to the phrasewilldeterminetheorderinwhichtextisdisplayed.

Toknowifasequenceis left-to-rightorright-to-left,eachcharacter inUnicodehasanassociated

directionalproperty.Mostlettersarestronglytyped(strongcharacters)asLTR(left-to-right).Lettersfromright-to-leftscriptsarestronglytypedasRTL(right-to-left).Asequenceofstrongly-typedRTL

characterswillbedisplayedfromrighttoleft.Thisisindependentofthesurroundingbasedirection.

Forexample:

(LTR)example-مثال(RTL).

Textwithdifferentdirectionalitycanbemixedinline.Insuchcases,theBidialgorithmproducesa

separatedirectionalrunoutofeachsequenceofcontiguouscharacterswiththesamedirectionality.

SpacesandpunctuationarenotstronglytypedaseitherLTRorRTLinUnicodebecausetheymaybe

used in either type of script. They are therefore classified asneutral orweak characters.Weak

charactersarethosewithvaguedirectionality.Examplesofthistypeofcharacterinclude:

• Europeandigits

• EasternArabic-Indicdigits

• Arithmeticsymbols,andcurrencysymbols

• Punctuationsymbolsthatarecommontomanyscripts,suchasthecolon,comma,full-stop,

andtheno-break-space

Thedirectionalityofneutralcharactersisindeterminatewithoutcontext.Someexamplesinclude:

17InHTMLthebasedirectioniseitherinheritedfromthedefaultdirectionofthedocument,whichisleft-to-

right,orexplicitlysetbythenearestparentelementthatusesthedirattribute.

Page 27: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 27

• Tabs

• Paragraphseparators

• Mostotherwhitespacecharacters

Whenaneutralcharacterisbetweentwostronglytypedcharactersthathavethesamedirectional

type, it will also assume that directionality. For example, a neutral character between two RTLcharacters will be treated as a RTL character itself, and will have the effect of extending the

directionalrun:

نطاق.مثال •

Evenifthereareseveralneutralcharactersbetweenthetwostronglytypedcharacters,theywillall

betreatedinthesameway.

When a space or punctuation falls between two strongly typed characters that have different

directionality, the neutral character (or characters) will be treated as if they have the same

directionalityastheprevailingbasedirection.Forexample:

• example. مثال

Unlessadirectionaloverrideispresentnumbersarealwaysencoded(andentered)big-endian18,andthenumeralsrenderedLTR.Theweakdirectionalityonlyappliestotheplacementofthenumberin

itsentirety.

ToseetheBidialgorithmindetail,goto:http://unicode.org/reports/tr9/tr9-11.html

TheBidiRuleforDomainNamesABididomainnameisonethatcontainsatleastoneRTLlabel.Thereisarulethatdeterminesthe

conditionstobemetforthelabelsinBididomainnames.ThisrulecanbefoundonSection2ofRFC

5893:https://tools.ietf.org/html/rfc5893

JoinersSomelanguagesusealphabeticscriptsinwhichsinglephonemesarewrittenusingtwocharacters

calledadigraph.Inotherwords,adigraphisagroupoftwosuccessivelettersthatrepresentasinglesound(orphoneme).

Somedigraphsarefullyjoinedasligatures.Inwritingandtypography,aligaturehappenswheretwoormoregraphemesorlettersarejoinedasasingleglyph.Anexampleistheampersandcharacter

(&),whichevolvedfromtheadjoinedLatinletterseandt(“et”means“and”).

18“Big-endianandlittle-endianaretermsthatdescribetheorderinwhichasequenceofbytesarestoredin

computermemory.Big-endianisanorderinwhichthe‘bigend’(mostsignificantvalueinthesequence)is

storedfirst(attheloweststorageaddress).Little-endianisanorderinwhichthe‘littleend’(leastsignificant

valueinthesequence)isstoredfirst.”

Source:http://searchnetworking.techtarget.com/definition/big-endian-and-little-endian

ExamplesofdiagraphsinEnglish

ch(asinchurch)ph(phone)

th(then)th(think)

sh(shoe)

Page 28: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 28

Ifligaturesanddigraphshavethesameinterpretationinalllanguagesthatuseagivenscript,Unicode

normalizationgenerallyresolvesthedifferencesandmakesthemmatch.Whentheyhavedifferent

interpretations,matchingmustusealternativemethods,likelychosenattheregistrylevel,orusers

mustbeeducatedtounderstandthatmatchingwillnotoccur.Anexampleofdifferentinterpretation

canbefoundinSection4.3ofRFC5894:https://tools.ietf.org/html/rfc5894

TheUnicodeConsortiumliststwomainstrategiestodeterminethejoiningbehaviorofaparticular

characterafterapplyingtheBidialgorithm:

• “Whenshaping,an implementationcan referback to theoriginalbacking store to see if

therewereadjacentZWNJorZWJ19characters.

• Alternatively,theimplementationcanreplaceZWJandZWNJbyanout-of-bandcharacter

property associated with those adjacent characters, so that the information does not

interferewiththeBidialgorithmandtheinformationispreservedacrossrearrangementof

thosecharacters.OncetheBidialgorithmhasbeenapplied,thatout-of-bandinformation

canthenbeusedforpropershaping.”20

Intheabsenceofcarebyregistriesabouthowstringsthatcouldhavedifferentinterpretationsunder

IDNA2003andthecurrentspecificationarehandled,itispossiblethatthedifferencescouldbeused

asacomponentofname-matchingorname-confusionattacks.Suchcareisthereforeappropriate.

Tolearnmoreaboutjoiners,seeSection4.3ofRFC5894:https://tools.ietf.org/html/rfc5894

HomoglyphandConfusinglySimilarCharactersHomoglyphsarecharactersthat,duetosimilaritiesinsizeandshape,mightappearidenticalatfirst

glance.

Topreventconfusinglylookingdomainnamesbeingregistered,registriescanusethe“homoglyph

bundling”procedure.21

HomoglyphbundlingiswhenyouregisteranIDNandtheregistrationsystemautomaticallybundles

all the homoglyphs of that name (if there are any). Thismeans that several domain names are

bundledatonetime,andnoneoftheotherdomainnamesinthatbundlecanberegistered.

Homoglyphbundlingisagoodpracticeforregistriestoavoidpossiblephishingpracticesthatintend

totricktheuserwithvisuallyconfusingcharacters.

TolearnmoreaboutUnicodesecuritymechanismsforconfusabledetection,goto:

19TolearnmoreaboutZWNJ/ZWJ,goto:http://www.unicode.org/L2/L2005/05307-zwj-zwnj.pdf

20Source:MarkDavis,AharonLanin,AndrewGlass.2015.Unicode.http://unicode.org/reports/tr9

21https://www.icann.org/resources/pages/idn-guidelines-2011-09-02-en

Examplesofhomoglyphs

Cyrilliccharactera

Latincharactera

=

=

Unicodenumber0430

Unicodenumber0061

Page 29: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 29

• http://www.unicode.org/reports/tr39/#Confusable_Detection

Toseealistofhomoglyphs,goto:

• http://homoglyphs.net

Tolearnmoreaboutconfusinglysimilarcharactersandgoodpractice,see:

• M3AAWGUnicodeAbuseOverviewandTutorial

https://www.m3aawg.org/sites/default/files/m3aawg-unicode-tutorial-2016-02.pdf

• M3AAWGBestPracticesforUnicodeAbusePrevention

https://www.m3aawg.org/sites/default/files/m3aawg-unicode-best-practices-2016-02.pdf

NormalizationandCaseFoldingNormalization

UnicodeNormalizationhelpstodeterminewhetheranytwoUnicodestringsareequivalenttoeach

other. Some characters can be represented inUnicode by several code sequences. This is called

Unicodeequivalence.Unicodeprovidestwotypesofequivalences:

• Canonical(NFD)

• Compatibility(NFK)

Sequencesrepresentingthesamecharacterarecalledcanonicallyequivalent.Thesesequenceshavethesameappearanceandmeaningwhenprintedordisplayed.Forexample:

Compatibility equivalents are sequences which can have different appearances, but in some

contextsthesamemeaning.Itisaweakertypeofequivalencebetweencharactersorsequencesof

characters.

Examplesofcanonicallyequivalentcharacters

U+006E (Latin lowercase “n”) followed by U+0303 (the

combiningtilde“◌̃”)

= ñ

U+00F1(lowercaseletter“ñ”oftheSpanishalphabet) = ñ

Examplesofcompatibilityequivalentcharacters

U+FB00(thetypographicligature“ff”) = ff

U+0066U+0066(twoLatin“f”letters) = ff

Page 30: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 30

In the example above, the code point U+FB00 is defined to be compatible, but not canonically

equivalent to the sequence U+0066 U+0066. Sequences that are canonically equivalent are also

compatible,buttheoppositeisnotnecessarilytrue.

To avoid interoperability problems arising from the use of canonically equivalent, yet different,

charactersequences,theW3CrecommendsusingNormalizationFormC22forallcontent.

To see a list of all characters that may change in any of the Normalization Forms, go to:

http://www.unicode.org/charts/normalization

Someotherpointstonote:

• OnlystringsNOTtransformedbyNFKC23arevalid.

• WhentwoapplicationsshareUnicodedata,butnormalizethemdifferently,errorsanddata

losscanoccur.

• NormalizationFormsmustremainstableovertime.Inotherwords,astringmustremain

normalizedunderallfutureversionsofUnicode(backwardcompatibility).

Tipforsoftwaredevelopers

Don’tnormalizebyconvertingtouppercase,orignoringnon-spacingcharacters,becausethismay

alsomakesorting,datacopy,dataimportandexport,dataretrievalbyclientapplicationsrather

difficultandmayresultindatalossorcorruption.

TolearnmoreaboutNormalizationFormsgoto:http://www.unicode.org/reports/tr15

CaseFoldingCasefoldingistheprocessofmakingtwotexts,whichdifferincasebutareotherwise“thesame”,

identical.Mapping[a-z]to[A-Z]worksformostsimpleASCII-onlytextdocuments.However,itbegins

tobreakdownwithlanguagesthatuseadditionalcharacters.

UnicodedefinesthedefaultcasefoldmappingforeachUnicodecodepoint.Therearecommonandfullcasefoldmappings:

• Commonfoldmappingsarethosethathaveasimple,straightforwardmappingtoasingle

matching(mainlylowercase)codepoint

• FullfoldmappingsarethosethatwouldnormallyrequiremorethanoneUnicodecharacter

One importantconsideration,accordingto theW3C,24 iswhether thevaluesarerestrictedto the

ASCIIsubsetofUnicodeorifthevocabularypermitstheuseofcharacters(suchasaccentsonLatin

lettersorabroadrangeofUnicodeincludingnon-Latinscripts)thatpotentiallyhavemorecomplex

casefoldingrequirements.25

22NFC:CanonicalDecomposition,followedbyCanonicalComposition.

23NFKC:CompatibilityDecomposition,followedbyCanonicalComposition.

24W3C:TheWorldWideWebConsortium(W3C)isaninternationalcommunitywhereMemberorganizations,

afull-timestaffandthepublicworktogethertodevelopWebstandards.See:https://www.w3.org

25 Source: A Phillips. 2015. Character Model for the World Wide Web: String Matching and Searching.

https://www.w3.org/TR/charmod-norm

Page 31: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 31

Tipforsoftwaredevelopers

✔ ConsiderUnicodeNormalizationinadditiontocasefolding.

TolearnmoreaboutUnicodenormalization,see:

• http://www.w3.org/TR/charmod-norm

• http://unicode.org/reports/tr15

Forrecommendationsaboutcasefolding,goto:

• https://www.w3.org/International/wiki/Case_folding

Page 32: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 32

Part4:GlossaryandOtherResourcesGlossary

A-label The ASCII-compatible encoded (ACE) representation of an internationalized

domainname,e.g.how it is transmitted internallywithin theDNSprotocol.A-

labelsalwayscommencewiththeprefix“xn--”.ContrastwithU-label.

ACEprefix ASCIICompatibleEncodingPrefix.

ASCIICharacters AmericanStandardCodeforInformationInterchange.Thesearecharactersfrom

thebasicLatinalphabettogetherwiththeEuropean-Arabicdigits.Thesearealso

includedinthebroaderrangeof"Unicodecharacters"thatprovidesthebasisfor

IDNs.

API AnApplicationProgramming Interface (API) is a setof routines,protocols, and

tools for building software and applications. An API may be for a web based

system,operatingsystem,ordatabasesystem,anditprovidesfacilitiestodevelop

applicationsforthatsystemusingagivenprogramminglanguage.

Codespace Rangethatdefinethelowerandupperboundsforanencoding.

CodePoints Acodepointorcodepositionisanyofthenumericalvaluesthatmakeupthecode

space. They are used to distinguish both, the number from an encoding as a

sequence of bits, and the abstract character from a particular graphical

representation(glyph).

DNSRootZone TherootzoneisthecentraldirectoryfortheDNS,whichisakeycomponentin

translatingreadablehostnamesintonumericIPaddresses.

EAI EmailAddress Internationalization is anemail address that requires theuseof

Unicodeinallpartsoftheemailaddress.

IANA InternetAssignedNumbersAuthority.Itsfunctionsinclude:

• MaintenanceoftheregistryoftechnicalInternetprotocolparameters

• Administrationofcertain responsibilitiesassociatedwith InternetDNS

rootzone

• AllocationofInternetnumberingresources

ICANN The Internet Corporation for Assigned Names and Numbers (ICANN) is an

internationally organized, non-profit corporation that has responsibility for

Internet Protocol (IP) address space allocation, protocol identifier assignment,

generic (gTLD) and country code (ccTLD) Top-Level Domain name system

management,androotserversystemmanagementfunctions.

IDN InternationalizedDomainNames.IDNsaredomainnamesthatincludecharacters

usedinthelocalrepresentationoflanguagesthatarenotwrittenwiththetwenty-

sixlettersofthebasicLatinalphabet“a-z”,thenumbers0-9,andthehyphen“-“.

IDNA InternationalizedDomainNamesinApplications.

Page 33: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 33

IDNccTLD Country Code Top-level Domain that includes characters used in the local

representationoflanguagesthatarenorwrittenwiththetwenty-sixlettersofthe

basicLatinalphabet“a-z”.Examples:

• .рф(Russia)

(Egypt).صر •

(ArabiaSaudi).السعودیة •

IETF The Internet Engineering Task Force (IETF) is a large open international

communityofnetworkdesigners,operators,vendors,andresearchersconcerned

withtheevolutionoftheInternetarchitectureandthesmoothoperationofthe

Internet. It is open to any interested individual. The IETF develops Internet

StandardsandinparticularthestandardsrelatedtotheInternetProtocolSuite

(TCP/IP).

Language Themethodofhumancommunication,eitherspokenorwritten,consistingofthe

useofwordsinastructuredandconventionalway.

Punycode ItisanalgorithmtorepresentUnicodewiththelimitedcharactersubsetofASCII

supportedbytheDomainNameSystem.Punycodeisintendedfortheencoding

of labels in the Internationalized Domain Names in Applications (IDNA)

framework.

Registrar Anorganizationwheredomainnamesareregisteredbyusers.Theregistrarkeeps

records of the contact information and submits the technical information to a

centraldirectoryknownasthe“registry”.

Registry Theauthoritative,masterdatabaseofalldomainnamesregistered ineachTop

LevelDomain.

RFC A Request for Comments (RFC) is a formal document from the Internet

Engineering Task Force (IETF) that is the result of committee drafting and

subsequentreviewbyinterestedparties.

Script Thecollectionoflettersorcharactersusedinwriting,representingthesoundsof

alanguage.

Second-leveldomainname

IntheDomainNameSystem(DNS)hierarchy,asecond-leveldomain(SLDor2LD)

is a domain that is directly below a top-level domain (TLD). For example, in

example.com,exampleisthesecond-leveldomainofthe.comTLD.

U-label A"U-label" isan IDNA-valid stringofUnicodecharacters includingat leastone

non-ASCIIcharacter.ConversionsbetweenU-labelsandA-labelsareperformed

accordingtothePunycodespecification[RFC3492].

UA-ready SoftwareorUA-Readiness

Universal Acceptance Ready Software. It is a software that has the ability to

Accept,Store,Process,ValidateandDisplayallTopLevelDomainsequallyandall

IDNs,hyperlinkandemailaddressesequally.

Unicode Auniversalcharacterencodingstandard.Itdefinesthewayindividualcharacters

arerepresentedintextfiles,webpages,andothertypesofdocuments.Unicode

wasdesignedtosupportcharactersfromalllanguagesaroundtheworld.Itcan

Page 34: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 34

support roughly 1,000,000 characters and supports up to 4 bytes for each

character.See:http://unicode.org

UTF UnicodeTransformationFormat.ItisawayoftransformingUnicodecodepoints

intoastreamofbytes.UTF-8isthepreferredUTFforhandlingIDNandEAI.UTF-

8convertsUnicodeto8-bitbytes.

M3AAWG TheMessaging,Malware andMobile Anti-AbuseWorkingGroup (M3AAWG) is

where the industry comes together to work against botnets, malware, spam,

viruses, DoS attacks and other online exploitation. See:

https://www.m3aawg.org/

W3C TheWorldWideWebConsortium (W3C) is an international communitywhere

Memberorganizations,afull-timestaff,andthepublicworktogethertodevelop

Webstandards.See:https://www.w3.org/

ZWJ Zero-WidthJoinerisnon-printingcharacterusedinthecomputerizedtypesetting

ofsomecomplexscriptssuchastheArabicscriptoranyIndicscript.Whenplaced

betweentwocharactersthatwouldotherwisenotbeconnected,aZWJcauses

themtobeprintedintheirconnectedforms.

ZWNJ Zero-WidthNon-Joinerisanon-printingcharacterusedinthecomputerizationof

writingsystemsthatmakeuseofligatures.Whenplacedbetweentwocharacters

thatwouldotherwisebe connected intoa ligature, aZWNJcauses them tobe

printedintheirfinalandinitialforms,respectively.Thisisalsoaneffectofaspace

character, but a ZWNJ is used when it is desirable to keep the words closer

togetherortoconnectawordwithitsmorpheme.

ForacompleteICANNglossary,goto:https://www.icann.org/resources/pages/glossary-2014-02-03-en

RFCsPUNYCODERFCs

RFC3492 Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names inApplications(IDNA)

RFC3492describesPunycodeas:

"a simple and efficient transfer encoding syntax designed for use withInternationalizedDomainNamesinApplications(IDNA)"

PunycodetransformsuniquelyandreversiblyaUnicodestringintoanASCIIstring.ThisRFC

definesageneralalgorithmcalledBootstring.Thisalgorithmallowsastringofbasiccode

pointstouniquelyrepresentanystringofcodepointsdrawnfromalargerset.

https://tools.ietf.org/html/rfc3492

IDNRFCs

Page 35: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 35

RFC5890

Internationalized Domain Names for Applications (IDNA): Definitions and DocumentFramework

ThisRFCdescribestheusagecontextandprotocolforarevisionofInternationalizedDomain

NamesforApplications(IDNA).

https://tools.ietf.org/html/rfc5890

RFC5891 InternationalizedDomainNamesinApplications(IDNA)Protocol

This RFC specifies the protocol mechanism, called Internationalized Domain Names in

Applications (IDNA), for registering and looking up IDNs in a way that does not require

changestotheDNSitself.

https://tools.ietf.org/html/rfc5891

RFC5892 TheUnicodePointsandInternationalizedDomainNamesforApplications(IDNA)

TheRFC5892specifiesrulesfordecidingwhetheracodepoint,consideredinisolationorin

context,isacandidateforinclusioninanInternationalizedDomainName(IDN).

https://tools.ietf.org/html/rfc5892

RFC5893 Right-to-leftscriptsforInternationalizedDomainNamesforApplications(IDNA)

This RFC provides a new Bidi rule for Internationalized Domain Names for Applications

(IDNA)labels,fortheuseofright-to-leftscriptsinInternationalizedDomainNames.

https://tools.ietf.org/html/rfc5893

RFC5894 InternationalizedDomainNames forApplications (IDNA): Background, Explanation andRationale

Thisinformationaldocumentprovidesanoverviewofarevisedsystemtodealwithnewer

versionsofUnicodeandprovidesexplanatorymaterialforitscomponents.

https://tools.ietf.org/html/rfc5894

RFC5895 MappingCharactersforInternationalizedDomainNamesinApplications(IDNA)2008

ThisRFCdescribestheactionsthatcanbetakenbyanimplementationbetweenreceiving

userinputandpassingpermittedcodepointstothenewIDNAprotocol(2008).Itdescribes

anoperationthatistobeappliedtouserinputinordertopreparethatuserinputforusein

an “on the network” protocol. It also includes a general implementation procedure for

mapping.

https://tools.ietf.org/html/rfc5895

EAIRFCs

RFC6530 OverviewandFrameworkforInternationalizedEmail

This standard introduces a series of specifications that definemechanisms and protocol

extensions needed to fully support internationalized email addresses. This document

describes how the various elements of email internationalization fit together and the

Page 36: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 36

relationshipsamongtheprimaryspecificationsassociatedwithmessagetransport,header

formats,andhandling.

https://tools.ietf.org/html/rfc6530

RFC6531 SMTPExtensionforInternationalizedEmail

ThedocumentdefinesaSimpleMailTransferProtocolextensionsoserverscanadvertise

the ability to accept and process internationalized email addresses and internationalized

emailheaders.

https://tools.ietf.org/html/rfc6531

RFC6532 InternationalizedEmailHeaders

ThisdocumentspecifiesanenhancementtotheInternetMessageFormatandtoMIMEthat

allows use of Unicode inmail addresses andmost header field content. This document

specifiesanenhancement to the InternetMessageFormat (RFC5322)and toMIME that

permitsthedirectuseofUTF-8,ratherthanonlyASCII,inheaderfieldvalues,includingmail

addresses. A new media type, message/global, is defined for messages that use this

extended format.This specificationalso lifts theMIME restrictiononhavingnon-identity

content-transfer-encodings on any subtype of the message top-level type so that

message/globalpartscanbesafelytransmittedacrossexistingmailinfrastructure.

https://tools.ietf.org/html/rfc6532

RFC6533 InternationalizedDeliveryStatusandDispositionNotifications

Thisspecificationaddsanewaddresstypeforinternationalemailaddressessoanoriginal

recipient address with non-ASCII characters can be correctly preserved even after

downgrading. This also provides updated content returnmedia types for delivery status

notificationsandmessagedispositionnotificationstosupportuseofthenewaddresstype.

https://tools.ietf.org/html/rfc6533

KeyStandardsISO10646(Unicode)

Toprovideacommontechnicalbasis for theprocessingofelectronic information in

various languages, the International Organization for Standardization (ISO) has

developedaninternationalcodingstandardcalledISO10646.TheISO10646provides

a unified standard for the coding of characters in allmajor languages in theworld

includingtraditionalandsimplifiedChinesecharacters.Thislargecharactersetiscalled

theUniversalCharacterSet(UCS).ThesamesetofcharactersisdefinedbytheUnicode

standard,whichfurtherdefinesadditionalcharacterpropertiesandotherapplication

detailsofgreatinteresttoimplementers.

UnicodeisacharactercodingsystemdesignedbytheUnicodeConsortiumtosupport

theinterchange,processinganddisplayofthewrittentextsofallmajorlanguagesin

theworld. ISO 10646 andUnicode define several encoding forms of their common

repertoire:UTF-8,UCS-2,UTF-16,UCS-4andUTF-32.

Page 37: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 37

http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumb

er=63182

GB18030(China)

GB18030-2000isaChinesegovernmentstandardthatspecifiesanextendedcodepage

foruseintheChinesemarketinadditiontoUTF-8.Theinternalprocessingcodeforthe

characterrepertoirecanandshouldbeUnicode;however,thestandardstipulatesthat

softwareprovidersmustguaranteeasuccessfulround-tripbetweenGB18030andthe

internalprocessingcode.AllproductscurrentlysoldortobesoldinChinamustplan

the code page migration to support GB18030 without exception. GB18030 is a

“mandatorystandard”andtheChinesegovernmentregulatesthecertificationprocess

toreinforceGB18030deployment.

http://icu-project.org/docs/papers/unicode-gb18030-faq.html

UnicodeTechnicalStandard#46:UnicodeIDNACompatibilityProcessing

Thisspecificationdefinesamappingconsistentwiththenormativerequirementsofthe

IDNA2008protocol,andwhichisascompatibleaspossiblewithIDNA2003.Forclient

software, this provides behavior that is themost consistentwith user expectations

aboutthehandlingofdomainnameswithexistingdata.

http://unicode.org/reports/tr46/

OnlineResourcesAPIs WindowsAPIs

https://www.msdn.microsoft.com/enus/library/windows/desktop/ff818516%28v=vs.

85%29.aspx

SharePointAPIs

https://msdn.microsoft.com/en-us/library/office/jj860569.aspx

PublicSuffixList

https://publicsuffix.org/list/public_suffix_list.dat

ICANNAuthoritativeTLDlist

http://data.iana.org/TLD/tlds-alpha-by-domain.txt

AndroidAPIs

http://developer.android.com/guide/index.html

MACIOSAPIs

https://developer.apple.com/library/mac/navigation

.NetFramework

https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx

UnicodeSecurity

UnicodeSecurityconsiderations

http://www.unicode.org/reports/tr36

Unicodesecuritymechanisms

http://www.unicode.org/reports/tr39

Page 38: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 38

Unicodecharactergroupings

Unicodecodeplanes

http://en.wikipedia.org/wiki/Mapping_of_Unicode_character_planes

OverviewofGB18030

http://en.wikipedia.org/wiki/GB_18030

AuthoritativemappingtablebetweenBG18038-2000andUnicode

http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-

2000.xml

Unicodenormalization

https://en.wikipedia.org/wiki/Unicode_equivalence

Unicodeexploits

Section3.1,“UTF-8Exploits”inUnicodeTechnicalReport#36

http://unicode.org/reports/tr36/#UTF-8_Exploit

M3AAWGBestPracticesforUnicodeAbusePrevention

https://www.m3aawg.org/sites/default/files/m3aawg-unicode-best-practices-2016-

02.pdf

M3AAWGUnicodeAbuseOverviewandTutorial

https://www.m3aawg.org/sites/default/files/m3aawg-unicode-tutorial-2016-02.pdf

Seealso:

http://www.unicode.org

Miscellaneous URIs

http://tools.ietf.org/html/rfc3986

TheDomainNameSystem:ANon-TechnicalExplanation–WhyUniversalResolvability

IsImportant

http://www.internic.net/faqs/authoritative-dns.html

ICANNglossary

https://www.icann.org/resources/pages/glossary-2014-02-03-en

Page 39: Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007) Version 8 -5 May 2016 5 Introduction A Brief History of Domain Name Internationalization

IntroductiontoUniversalAcceptance(UASG007)

Version8-5May2016 39

AcknowledgementsTheauthorsgratefully acknowledge the followingpeople for their contributionsand collaborationon this

document:

EleezaAgopian

GwenCarlson

EdmonChung

SamanthaDickinson

DonHollander

ChantalLebrument

AntoniettaMangiacotti

RichardMerdinger

RamMohan

DavidMorrison

CarolynNguyen

MichaelD.Palage

KurtPritz

AndréSchappo

ZhengSong

LarsSteffen

AndrewSullivan

DennisTan

WinnieYu