Swaran Lata, Director and HoD
Technology Development for Indian Languages Programme (TDIL)
Dept of Information Technology , Govt. of India
Organization of Presentation
• India – cultural diversity
• Linguistic Diversity in India
• Present Knowledge Society and Indian Scenario
• ICT scenario in India
• Internet penetration – Haves & Have-nots
• Mind-set – Still an inhibition
• Bridging the gap – Service delivery – reaching the citizens' doorsteps
• Localization – Key enabler
• Challenges and Issues
• TDIL's efforts
• National Roll Out Plan – A big step forward
• Localization of Applications
• Putting Standards in place
• Collaboration and Hand-holding
India – A civilization more than 5000 years old
• Vast ancient knowledge base
• Diverse culture and heritage – probably one of the most spectacular in the world
• One of the largest economies in the present world
• Rapid strides in information and communications technology
• Yet .. a widening divide in terms of knowledge amongst various strata of citizens
Linguistic Diversity in India
• According to Census 2001, India has 122 major languages and 2371 dialects.
• Out of the 122 languages, 22 are constitutionally recognized.
• Linguistic diversity is very rich and wide in India
• One language – many scripts
• Many languages – one script
• Content differs culturally by region even when different languages use the same script.
• The same language can also differ widely across countries
Though the script is the same (Devanagari), content varies between Hindi and Marathi, reflecting cultural and linguistic differences
[Figure: sample content in Marathi and in Hindi]
Present ICT scenario in India
• Despite a reputation as an emerging technology powerhouse, India’s scores on the 2009 Connectivity Scorecard are poor in the vital consumer and business segments.
• These poor scores should not be surprising, since many of the individual metrics that we utilise are effectively measuring “penetration rates.”
• This means that India is judged as a whole, and not by the pockets of ICT excellence that it undoubtedly possesses.
• India scores especially low on broadband and Internet penetration rates.
• Broadband penetration in India is below 2 percent of households, compared to 20 percent of households or more in Turkey, Chile, and Mexico.
• On the consumer usage front, India is not a strong performer in terms of Internet usage, with below 10 percent of the population regularly using the Internet. The country is hampered by a relatively low literacy rate.
Mind-set: Still favouring English as medium of excellence
• English and Hindi serve as link languages
• English learning is viewed as a passport to better economic and social prospects. Even people from low-income strata now consider this.
• Due to the surge in ICT and ICT-enabled services in recent times, English has now become the 2nd highest medium of instruction from school level
• Study by the National University of Educational Planning and Administration (NUEPA): in Sarva Shiksha Abhiyan, the number of students opting for English grew by 150% between 2003-08, while the corresponding figure for Hindi is only 32%
• Example: Uttar Pradesh, West Bengal and .. now use English as the medium of instruction for schools and colleges
Primary school students in English-medium schools (in lakhs)
State      2005-06   2007-08   Growth (%)
Haryana       0.19      1.56        721
WB            0.29      2.31        704
Punjab        0.93      2.78        197
UP            0.12      0.37        193
India        52.00    153.70        196
Result: Though Hindi (ranked 3rd) and Bengali (ranked 8th) are among the top 10 languages spoken across the world, no Indian language is among the top 10 languages used on the Internet.
Minuscule Internet usage in Indian Languages
Confinement of Knowledge
Low usage of knowledge sources and applications
Language constitutes the foundation of communication and is fundamental to cultural and historical heritage.
Increasingly, knowledge and information are key determinants of wealth creation, social transformation and human development.
Language is the primary vector for communicating knowledge and traditions, thus the opportunity to use one’s language on global information networks such as the Internet will determine the extent to which one can participate in the emerging knowledge society.
Thousands of languages worldwide are absent from Internet content and there are no tools for creating or translating information into these excluded tongues.
Huge sections of the world’s population are thus prevented from enjoying the benefits of technological advances and obtaining information essential to their wellbeing and development.
UNESCO’s VISION for Multilingualism in Cyberspace
An uneven growth
The Indian software export industry is growing its global presence at a very fast pace
However, the root is not expanding its base within the country
Fallout: Domestic requirements in Indian languages are not being addressed within the country
Result: Non-availability of information and knowledge to a vast section of citizens
Expanding Software Export
Low penetration in Indian Market
Requirements :
Reaching out to the doorsteps of citizens, offering better services for wider dissemination of knowledge.
Localization of software solutions, content and services as per local requirements.
Common Services Centre –Its objectives
CSC is a strategic cornerstone of the National e-Governance Plan (NeGP) – Front end service Interface for major G2C services
CSC is one of the three infrastructure pillars of e-governance which the government is committed to building, to ensure “anytime anywhere” web enabled delivery of government services.
• To provide e-governance services.
• 100,000 CSCs for 600,000 village clusters
• To cater to service needs of major rural areas
• Being implemented in PPP model
Local Language Interface – Not a desirable but An essential Component
The success of CSC hinges upon effective delivery of the G2C applications to rural masses
Since most of the citizens communicate in their local languages – Local Language Interface to G2C solutions at CSC is essential
Hosting of content in local languages helps citizens to interact in a better way in today’s knowledge society
Thus , Local Language Interface is “Not a desirable but An essential Component”
NeGP – Mission Mode Projects
[Figure: NeGP Mission Mode Projects. Labels shown: Land Records, Road Transport, Police, Land Registration, Treasuries, Commercial Taxes, Agriculture, Gram Panchayats, Municipalities, Employment Exchanges, Civil Supplies, Education, Income Tax, Passport, Visa, MCA21, Insurance, Banking, National ID, Central Excise, Pensions, GIS, e-Posts, Common Service Centres, Gateway, e-Procure, e-Office, eBiz, EDI, e-Courts, India Portal, Core Policies]
Initiatives already taken to enable G2C applications such as Land Records , Civil Supplies and Municipal applications with Indian Language Interface
Localization Requirements for Service Delivery Applications
• To ensure seamless access to services, a language component/localization interface is required at:
  • Storage level – server end
  • Data exchange – traffic (language tags need to be properly embedded)
  • Display & rendering
• Language interface for differently-abled citizens, for more inclusive societal benefits
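The three levels above (storage, data exchange, display) can be sketched as follows. This is a minimal illustration, not an e-governance specification: the JSON record layout and the field names `lang` and `text` are assumptions made for the example, and the Hindi string is sample data.

```python
import json

def make_payload(text, lang_tag):
    """Wrap text with its language tag and serialise to UTF-8 bytes.

    The {"lang": ..., "text": ...} layout is a hypothetical record format
    chosen for this sketch.
    """
    record = {"lang": lang_tag, "text": text}
    # ensure_ascii=False keeps Indic characters as-is instead of \uXXXX escapes
    return json.dumps(record, ensure_ascii=False).encode("utf-8")

def read_payload(raw):
    """Decode the UTF-8 bytes and return (language tag, text) for display."""
    record = json.loads(raw.decode("utf-8"))
    return record["lang"], record["text"]

# Storage / exchange: bytes in a single, declared encoding
raw = make_payload("भूमि अभिलेख", "hi")  # sample text: "land records" in Hindi

# Display: the receiver knows both the text and its language
lang, text = read_payload(raw)
print(lang, text)
```

Carrying the language tag next to the text lets the receiving end choose fonts, rendering rules and speech output without guessing the language from the script.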
Web based applications
Dynamic & Static websites with search & Cross Lingual access
Operating systems
Tools
Office Suites
Handheld devices
Mobile Devices
Stand alone applications
Globalization of IT
Internationalization (I18N): Process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for re-design.
Localization (L10N): Taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language) where it will be used and sold.
Globalization & Localization
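The split between the two terms can be shown with a small sketch. The code is internationalized once (no user-visible string is hard-coded), and each new language is then a localization step that only adds data. The catalog contents, the keys and the function name `translate` are hypothetical, chosen for illustration.

```python
# Localization: per-locale resources. Adding a language means adding an entry
# here, with no change to the program logic. Strings are sample data.
CATALOGS = {
    "en": {"greeting": "Welcome", "land_records": "Land Records"},
    "hi": {"greeting": "स्वागत है", "land_records": "भूमि अभिलेख"},
}
DEFAULT_LOCALE = "en"

def translate(key, locale):
    """Internationalized lookup: return the string for `key` in `locale`.

    Falls back to the default locale, and finally to the key itself,
    so a missing translation never breaks the interface.
    """
    catalog = CATALOGS.get(locale, {})
    if key in catalog:
        return catalog[key]
    return CATALOGS[DEFAULT_LOCALE].get(key, key)

print(translate("land_records", "hi"))
print(translate("land_records", "mr"))  # no Marathi catalog yet: English fallback
```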
Key Enablers of Localization
• Locale Data Repository
• Linguistic Resources
• Standards
• Certification
• Localization Tools
• Training
• Awareness
• Technologies
The Tree of Localization Complexities

• Presentation of dates, times, numbers, lists, and other values
• Collation and sorting
• Alternate calendars, which may include holidays, work rules, weekday/weekend
• Currency
• Tax or regulatory regime

• Machine Translation
• Optical Character Recognition
• Speech Technologies
• Cross Lingual Information Retrieval

• Project Management
• Translation Memory
• Translation Tools
• Natural language tools for text processing: parsing, spell checking, grammar checking etc.
• Automatic Testing Tools

• Encoding Standards
• Multimodal input device standards
• Fonts & Rendering Engines
• Transliteration & Translation

• Guidelines
• Best Practices
• Case Studies
• Consultancy
• Showcasing of Tools & Technologies

• Parallel Corpora
• Speech Corpora
• Lexical resources
• Ontologies
• Dictionaries
• Thesaurus
• Reference Terminologies

• Certified Localization professionals
• PG Specialization in Localization
• PhD Programmes

• Minimizing time lag
• Benchmarking w.r.t. English version
• Political sensitivity
• Pricing issues

• Testing methodologies
• Metrics for Linguistic Testing
• Certification by Government for linguistic compliance
Complexities
Globalization and Localization Issues
Language Issues
Language issues result from differences among the world's languages in display, alphabets, grammar, and syntactical rules.
• Bidirectional scripts
• Capitalization, Uppercasing and Lowercasing
• Code Pages
• Complex Script Awareness
• Fonts
• Input Method Editors
• Keyboards
• Line and Word Breaks
• Mirroring Awareness
• Unicode
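Two of the issues listed above, complex script awareness and bidirectional scripts, can be made concrete with Python's standard `unicodedata` module. This is a small sketch on sample text; the helper name `describe` is an assumption for the example.

```python
import unicodedata

def describe(text):
    """Return (code point, general category, bidi class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.bidirectional(ch))
        for ch in text
    ]

# The word "Hindi" in Devanagari, written with escapes so the exact
# code points are visible: HA, vowel sign I, NA, virama, DA, vowel sign II
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

rows = describe(word)
marks = [row for row in rows if row[1].startswith("M")]  # combining marks

print(len(word))   # 6 code points
print(len(marks))  # 3 of them are marks attached to a base letter

# Bidirectional class: Devanagari letters are left-to-right ("L"),
# while an Arabic-script letter such as ALEF is right-to-left ("AL")
print(unicodedata.bidirectional("\u0939"), unicodedata.bidirectional("\u0627"))
```

A rendering engine has to reorder and join such marks with their base letters, which is why counting code points is not the same as counting what a reader sees as characters.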
Formatting Issues
From the user's perspective, formatting issues are the primary source of discrepancies when working with applications originally written for another language or culture/locale. Developers should use the National Language Support (NLS) APIs in Windows or the System.Globalization namespace to handle most of these issues automatically.
• Addresses
• Currency
• Dates
• Numerals
• Paper Sizes
• Telephone Numbers
• Time
• Units of Measure
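As an illustration of the numeral and currency items above, here is a hand-rolled sketch of Indian digit grouping (thousands, then lakhs and crores in groups of two) and of substituting Devanagari digits. In production a platform API such as the ones named above would normally do this; the function names here are assumptions for the example.

```python
def format_indian(n):
    """Group an integer Indian-style: last three digits, then groups of two."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return sign + ",".join(groups + [tail])

# Map ASCII digits 0-9 to Devanagari digits U+0966..U+096F
DEVANAGARI_DIGITS = {ord(str(i)): chr(0x0966 + i) for i in range(10)}

def to_devanagari_digits(text):
    """Replace ASCII digits with Devanagari digits, leaving separators alone."""
    return text.translate(DEVANAGARI_DIGITS)

# 153.70 lakh, the all-India figure from the table earlier in the deck
print(format_indian(15370000))                        # 1,53,70,000
print(to_devanagari_digits(format_indian(15370000)))
```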
Localization- Tool for increasing Financial Sustainability
• Training of local youth in Localized Content Creation
• Working with Self Help Groups to up-lift their business
• Identify Dynamically changing Local Content which helps in their local professions
• E-Tutor
• Entertainment during non-official hours
TDIL’s Efforts
• More than a decade’s sustained and major national initiative
• Leading to development and consolidation of various language tools, resources and components
• Continuous and untiring representation in various international and national standards bodies: ISO, UNICODE, W3C, IETF, ELRA and BIS
• Represented and included 22 Indian languages in UNICODE
• First time in India to launch consortium-mode projects in the technology-intensive areas of Machine Translation, Cross-lingual Information Access, Text to Speech etc., to develop state-of-the-art technologies in Indian languages
• Promotes futuristic research in Language Technology
National Roll-Out Plan – A Big Step Forward
• CDs containing software tools and fonts for all 22 officially recognized languages released in the public domain for free use
• Contain fonts, localized Open Office, keyboard drivers, e-mail clients and Firefox browsers in Indian languages
• Freely downloadable from the Indian Language Data Centre – http://www.ildc.gov.in
• Already crossed ~41 lakh downloads and 7.0 lakh shipments
• NASSCOM may take an active role in proliferating the benefits of these language CDs
• These free CDs would also benefit NGOs and CSC operators in developing and promoting local language content.
Putting Standards in place
UNICODE
• UNICODE – default text encoding standard, compatible with ISO 10646
• Seamless data storage and search if data is stored in UNICODE
• All 22 officially recognized Indian languages, including Vedic Sanskrit, represented in UNICODE
• Declared as the text encoding standard for all e-governance applications
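One practical detail behind "seamless search" is that the same visible text can be stored as different code point sequences. A hedged sketch using Python's standard `unicodedata` module: normalizing both the stored data and the query to one form (NFC is assumed here) makes them comparable. The helper name `normalize_for_search` is hypothetical.

```python
import unicodedata

def normalize_for_search(text):
    """Bring text to one canonical form (NFC) before storing or comparing."""
    return unicodedata.normalize("NFC", text)

precomposed = "\u0958"      # DEVANAGARI LETTER QA as a single code point
sequence = "\u0915\u093c"   # KA followed by the NUKTA sign: looks the same

print(precomposed == sequence)  # False: a raw comparison misses the match
print(normalize_for_search(precomposed) == normalize_for_search(sequence))  # True
```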
Extracting Knowledge from our vast ancient knowledge base
UNICODE Encoding for Vedic Sanskrit , Grantha scripts : Key towards computerization of knowledge base
Capturing Region Specific Requirements : Common Locale Data Repository (CLDR)
• The Unicode CLDR provides key building blocks for software to support the world's languages.
• CLDR is by far the largest and most extensive standard repository of locale data.
• This data is used by a wide spectrum of companies for their software internationalization and localization: adapting software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; etc.
• Locale data for Indian languages is being revised
• CLDR data for six languages (Hindi, Nepali, Bengali, Assamese, Malayalam and Gujarati) has been finalized.
• Other languages are in process
All region-specific requirements have been captured and put in the Hindi locale repository
Example of CLDR: Hindi
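To show how software typically consumes such a repository, here is a minimal sketch of locale lookup with inheritance: a request for a specific locale falls back to its parent language and finally to a root entry. The table below is hand-written for illustration and is not actual CLDR data; the keys and function names are assumptions.

```python
# Hypothetical, hand-made locale table (NOT real CLDR content)
LOCALE_DATA = {
    "root":  {"decimal_separator": "."},
    "hi":    {"language_name": "हिन्दी"},
    "hi_IN": {"currency_code": "INR"},
}

def fallback_chain(locale):
    """'hi_IN' -> ['hi_IN', 'hi', 'root']: most specific entry first."""
    parts = locale.split("_")
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    chain.append("root")
    return chain

def lookup(locale, key):
    """Return the first value found for `key` along the fallback chain."""
    for name in fallback_chain(locale):
        data = LOCALE_DATA.get(name, {})
        if key in data:
            return data[key]
    raise KeyError(key)

print(fallback_chain("hi_IN"))
print(lookup("hi_IN", "currency_code"))      # found in hi_IN
print(lookup("hi_IN", "decimal_separator"))  # inherited from root
```

With this arrangement a regional entry only has to record what differs from its parent, which keeps the per-language data small.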
Putting Standards in place… Contd. W3C
W3C
• The World Wide Web Consortium (W3C) develops web standards for interoperable web solutions across platforms, devices and access methodologies
• Ensures interoperability across major browsers: IE, Firefox, Opera etc.
• Work has already started to represent all Indian languages in W3C standards.
• Desirable: pro-active participation by industry and industry bodies like NASSCOM
• Keyboard Layouts
• Open Type Fonts .. Sakal Bharti Fonts
• Locale Data
• Language Tag (for language negotiation on the Internet)
• Domain Names in Indian Languages
• IT Terminology
… and Standards for major Linguistic Resources and Tools
Putting Standards in place…Contd.
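The language tag item above refers to language negotiation: a browser sends its preferred languages in the HTTP `Accept-Language` header and the server picks the best one it can serve. The following is a simplified sketch; it ignores the `*` wildcard and other edge cases, and the function names and the supported-language list are assumptions for the example.

```python
def parse_accept_language(header):
    """Parse 'mr-IN, hi;q=0.8' into [(tag, quality)], best quality first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # default when no q-value is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    # sort is stable, so equal q-values keep the order the client sent
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return prefs

def negotiate(header, supported, default="en"):
    """Pick the first preferred language the server supports."""
    by_lower = {tag.lower(): tag for tag in supported}
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # 'bn-in' can be served by 'bn'
        if primary in by_lower:
            return by_lower[primary]
    return default

print(negotiate("mr-IN, hi;q=0.8, en;q=0.5", ["hi", "en", "bn"]))  # hi
```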
Collaboration and Hand Holding
• Collaborative efforts are required for wider proliferation and sustained initiatives.
• Govt., industry bodies and academia need to join hands to address the challenges of local language computing and to bring services closer to the doorsteps of millions of citizens in their own languages