Digitization of Documentary Heritage Collections in Indic Language
Comparative Study of Five Major Digital Library Initiatives in India

Dr. Anup Kumar Das
Jawaharlal Nehru University (JNU)
New Delhi, India
http://www.anupkumardas.blogspot.in/

Presented in the International Conference on the Memory of the World in the Digital Age: Digitization and Preservation, 26-28 September 2012, Vancouver, British Columbia, Canada


Outline

• Introduction
• Indicative Multilingual DL Initiatives in India
• Digital Library of India (DLI) project
• IGNCA maintained Digital Libraries
• National Mission for Manuscripts
• DL Initiatives with Single Indic Language Contents
• Challenges Ahead
• Examining Semantic Web Principles
• Conclusion


Introduction

• Article 6 of the UNESCO Universal Declaration on Cultural Diversity “Towards Access for All to Cultural Diversity”

• Mandates of Networked Knowledge Societies.
• DL as a vehicle for widely disseminating documentary heritage.
• Indian DL initiatives aim at producing a vast amount of multilingual, multicultural digitized content pertaining to different forms of recorded human knowledge, ranging from rare manuscripts to current literature.


• Culturally diverse content in multilingual DLs fosters intercultural understanding and dialogue, a building block for inclusive knowledge societies.

• When establishing a digital library with a large collection, collaboration is inevitable.

• Indian DL initiatives achieved multi-stakeholder participation through increased international, regional, national and local collaboration.

• Providing metadata in Indic languages is one of the major challenges for DLs with Indic language content.



Indicative Multi-/ Bi-lingual DL Initiatives in India

Name of the Initiative | Implementing Agency | Funding Agency | Website
Digital Library of India (DLI) | Indian Institute of Science; IIIT Hyderabad; C-DAC | MCIT and others | http://www.new1.dli.ernet.in; http://www.new.dli.ernet.in; http://dli.cdacnoida.in
Kalasampada: Digital Library Resources for Indian Cultural Heritage (DL-RICH) | IGNCA | MCIT | http://www.ignca.nic.in/dlrich.html
National Databank on Indian Art and Culture (NDBIAC) | IGNCA | MCIT | http://ignca.nic.in/ndb_0001.htm
Kritisampada: National Database of Manuscripts | National Mission for Manuscripts, IGNCA | Ministry of Culture | http://www.namami.org/pdatabase.aspx
Panjab Digital Library (PDL) | Panjab Digital Library | Nanakshahi Trust and others | http://www.panjabdigilib.org/
Digital Repository of WBPLN (DR-WBLLN) | West Bengal Public Library Network (WBPLN), CDAC Kolkata | Directorate of Library Services, West Bengal | http://dspace.wbpublibnet.gov.in/dspace/


Indicative Multi-/ Bi-lingual DL Initiatives in India (continued)

Name of the Initiative | Implementing Agency | Funding Agency | Website
Open Access to Oriya Books – Project OaOb | National Institute of Technology, Rourkela | NITR; Srujanika, Bhubaneswar; Pragati Utkal Sangh R | http://oaob.nitrkl.ac.in
Archives of Indian Labour (AIL) | V. V. Giri National Labour Institute & Association of Indian Labour Historians | Ministry of Labour | http://www.indialabourarchives.org
Muktabodha Digital Library | Muktabodha Indological Research Institute | Donations from Individuals & Trusts | http://muktalib5.org/digital_library.htm
Traditional Knowledge Digital Library (TKDL) | Council of Scientific and Industrial Research (CSIR) | Department of Ayurveda, Yoga… (AYUSH) | http://www.tkdl.res.in
National Science Digital Library | NISCAIR, India | Council of Scientific and Industrial Research (CSIR) | http://nsdl.niscair.res.in
Vigyan Prasar Digital Library | Vigyan Prasar, India | Department of Science and Technology | http://www.vigyanprasar.gov.in/digilib/


Digital Library of India

• A partner project of the Universal Digital Library (UDL), or Million Books Project (MBP).
• Initiated in India in 2002 as a spin-off of the Universal Digital Library project.
• 355,000+ documents; the top six languages are, in order, English, Sanskrit, Hindi, Telugu, Bengali and Urdu, together covering about 91.3% of books in the major DLI site http://www.new1.dli.ernet.in.
• Became a testbed for Indian language technologies, facilitating the development of OCR (optical character recognition), TTS (text-to-speech) and related software for Indian language computing.
• Challenge 1: Indic language content is not OCR-ed.
• Challenge 2: Metadata is not available in Indic languages for Indic language documents.
• Challenge 3: Documents are downloaded page by page in image, HTML or TXT format; a whole document cannot be downloaded in a single click, e.g. as a PDF file.
• Challenge 4: Broken links and unavailable pages, signs of aging.


Multi-stakeholder Participation

o Principal Coordinator (International) – Carnegie Mellon University
o Principal Coordinator (National) – Indian Institute of Science (IISc), Bangalore
o Research Coordinator (National) – International Institute of Information Technology (IIIT), Hyderabad
o Infrastructure Agency – ERNET Society
o Funding Agencies – MCIT, NSF, PSA
o Software and Hardware Solutions – Industrial Partners
o Operational Agencies
  – Regional Mega Scanning Centres (RMSCs)
  – Scanning Centres
  – Source Libraries


Participation in Content Generation: Digital Library of India

• Cultural Institutions (e.g. Salarjung Museum)
• Religious Institutions (e.g. Tirumala Tirupati Devasthanam)
• Government Agencies (e.g. Rashtrapati Bhavan)
• Industrial Agencies (e.g. Thrinaina Informatics Ltd.)
• Research Agencies (e.g. CDAC-Noida)
• Academic Institutions (e.g. Anna University)


Content Generation Process

[Diagram: Coordination]


IGNCA-maintained Digital Libraries

• Partially open access multilingual and multimedia digital content
  – Kalasampada: Digital Library Resources for Indian Cultural Heritage (DL-RICH)
  – Cultural Heritage Digital Library in Hindi (CHDLH)
  – National Databank on Indian Art and Culture
  – National Digital Library of Manuscripts
• Supported by DIT, MCIT; Ministry of Culture
  – Content Development and IT Localisation Network (COILNET) Programme
  – Technology Development for Indian Languages (TDIL) Programme
  – National Mission for Manuscripts


Collaborative Digital Libraries on Indian Cultural Heritage: IGNCA’s Partner Institutions

• Manuscript Libraries (e.g. Allama Iqbal Library)
• Government Agencies (e.g. Asiatic Society)
• Academic Institutions (e.g. Visva-Bharati)
• Museums (e.g. National Museum)
• Oriental Institutions (e.g. Oriental Research Library)
• National Mission for Manuscripts
• Archaeological Survey of India


National Mission for Manuscripts

• Launched in February 2003 by the Ministry of Tourism and Culture, Government of India.

• An ambitious five-year project with the specific objectives of locating, documenting, conserving and disseminating the knowledge content of India's manuscripts.

• Established a network of 47 Manuscript Resource Centres (MRCs), 32 Manuscript Conservation Centres (MCCs), 32 Manuscript Partner Centres (MPCs) and more than 200 Manuscript Conservation Partner Centres (MCPCs) across the country.

• NMM identified 45 collections as Manuscript Treasures of India (MTIs). These are unique and rare collections of manuscripts.

• Five MTIs have already been inscribed on the Memory of the World Register.

• Out of 6 inscriptions from India, 5 inscriptions are from MTIs.

• The National Digital Manuscripts Library will provide full-text access to all MTIs, including those inscribed on the Memory of the World Register (MoWR).

• Kritisampada: The National Database of Manuscripts provides access to metadata on the manuscript collections of NMM partners.


DL Initiatives with Single Indic Language Contents

Name of Digital Library | Organization | Focused Indic Language | Metadata in Indic Language? | S/W used
Digital Repository of W.B. Public Library Network | West Bengal State Central Library & CDAC Kolkata | Bengali | Yes, Partial* | DSpace
Panjab Digital Library | Panjab Digital Library; Nanakshahi | Punjabi | No* | -
Open Access to Oriya Books – Project OaOb | National Institute of Technology, Rourkela; Srujanika, BBS | Oriya | No* | EPrints
Digital Repository of VPM | Vidya Prasarak Mandal, Thane | Marathi | Yes, Partial* | DSpace
ASI Digital Library | Archaeological Survey of India; IGNCA New Delhi | English and Sanskrit | No* | -
E-Gyankosh | Indira Gandhi National Open University, New Delhi | English and Hindi | No* | DSpace

* Metadata available mostly as English (Roman-script) transliteration.
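The “Metadata in Indic Language?” column above could, in principle, be audited record by record. The sketch below is a hypothetical helper, not a feature of DSpace, EPrints or any listed repository: it labels a metadata value by the standard Unicode blocks of nine Indic scripts and treats Latin letters (including IAST-style diacritic letters) as Roman-script transliteration. The function names and labels are assumptions made for illustration.

```python
from collections import Counter

# Standard Unicode block ranges for nine Indic scripts.
INDIC_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Oriya": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def script_of(ch: str) -> str:
    """Indic block name of a character, 'Latin' for Roman letters
    (Basic Latin to Latin Extended-B, plus Latin Extended Additional,
    which holds letters such as ṛ and ṣ), otherwise 'Other'."""
    cp = ord(ch)
    for name, (lo, hi) in INDIC_BLOCKS.items():
        if lo <= cp <= hi:
            return name
    if ch.isalpha() and (cp <= 0x024F or 0x1E00 <= cp <= 0x1EFF):
        return "Latin"
    return "Other"


def indic_scripts(text: str) -> list:
    """Sorted names of the Indic scripts occurring in a metadata value."""
    return sorted({script_of(ch) for ch in text} & set(INDIC_BLOCKS))


def classify_field(text: str) -> str:
    """Hypothetical audit label for one metadata value."""
    counts = Counter(script_of(ch) for ch in text)  # missing keys count as 0
    has_indic = any(counts[name] for name in INDIC_BLOCKS)
    has_latin = counts["Latin"] > 0
    if has_indic and has_latin:
        return "Mixed"
    if has_indic:
        return "Indic script"
    if has_latin:
        return "Roman script"
    return "No letters"
```

Run over every title field of a repository, a “Yes, Partial*” collection would be expected to show a mix of “Indic script” and “Roman script” labels, while a “No*” collection would show mostly “Roman script”.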


Challenges Ahead

• Lack of a national practice for establishing principles of interoperability, cross-search, metadata harvesting, etc.

• Enabling harvesting of metadata from South Asian digital libraries
  – Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) can be adopted
  – Other similar harvesting methods can be applied

• Standardization of transliterated metadata or metadata with diacritical marks

• Stock-taking of South Asian documentary heritage collections available worldwide

• Innovation in DL development is needed to integrate features of interactive Web 2.0 (such as user interaction and content sharing), Multimedia, and M-Science (accessibility using mobile devices).
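Two of the challenges above, OAI-PMH harvesting and standardizing metadata with diacritical marks, meet in practice: harvested Dublin Core titles may spell the same transliterated word with precomposed or decomposed diacritics. The following is a minimal sketch using only the Python standard library. The OAI-PMH verbs, `metadataPrefix=oai_dc` and the namespaces follow the OAI-PMH 2.0 specification; the repository URL, identifiers, token and sample response are invented, and normalizing to Unicode NFC is one assumed choice of standard form, not a practice documented by the DLs above.

```python
import unicodedata
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text: str):
    """Return [(identifier, [NFC-normalized dc:title, ...])] and the resumption token."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [
            unicodedata.normalize("NFC", (t.text or "").strip())
            for t in rec.findall(".//dc:title", NS)
        ]
        records.append((ident, titles))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # empty token element means the list is complete


# Trimmed, invented response (real ones also carry responseDate and request).
# Record 1 uses precomposed ā (U+0101); record 2 uses a + combining macron (U+0304).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:example.org:1</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>Mah\u0101bh\u0101rata</dc:title>
    </oai_dc:dc>
   </metadata>
  </record>
  <record>
   <header><identifier>oai:example.org:2</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>Maha\u0304bha\u0304rata</dc:title>
    </oai_dc:dc>
   </metadata>
  </record>
  <resumptionToken>hypothetical-token-1</resumptionToken>
 </ListRecords>
</OAI-PMH>"""
```

After `parse_records(SAMPLE)`, both records carry the identical title string, so a cross-search index built on the harvested metadata would match them; a harvester would keep calling `list_records_url` with each returned token until none is left.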


Examining Semantic Web Principles

• Indic language metadata – providing metadata in all major Indian languages for a full-text document
• Whether an ontology-based structure is followed (RT, BT, NT…)
  – Standard vocabulary/ structured subject headings/ subject thesaurus vs. user-generated keywords
• Whether a permanent link is available for a document or a dynamic link is generated
  – Rate of link failure or dead links (links to full-text contents, images, etc.)
• Whether contents can be accessed using handheld devices
• Whether text-to-speech (TTS) can be applied
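The RT/BT/NT relations mentioned above are the related, broader and narrower term links of a subject thesaurus. The sketch below is a hypothetical illustration, not the data model of any listed DL; the class, its methods and the sample terms are invented. It shows two things a structured vocabulary offers over user-generated keywords: BT and NT are kept reciprocal automatically, and a search term can be expanded to its narrower and related terms.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal subject thesaurus with reciprocal BT/NT and symmetric RT links."""

    def __init__(self) -> None:
        self.broader = defaultdict(set)   # term -> its broader terms (BT)
        self.narrower = defaultdict(set)  # term -> its narrower terms (NT)
        self.related = defaultdict(set)   # term -> its related terms (RT)

    def add_bt(self, term: str, broader_term: str) -> None:
        # Recording a BT also records the reciprocal NT.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_rt(self, a: str, b: str) -> None:
        # RT links are symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def expand(self, term: str, depth: int = 1) -> list:
        """The term, its narrower terms down to `depth` levels, and its own related terms."""
        result = {term}
        frontier = {term}
        for _ in range(depth):
            frontier = set().union(*(self.narrower[t] for t in frontier)) - result
            result |= frontier
        result |= self.related[term]
        return sorted(result)


# Invented sample vocabulary, for illustration only.
thes = Thesaurus()
thes.add_bt("Manuscripts", "Documentary heritage")
thes.add_bt("Palm-leaf manuscripts", "Manuscripts")
thes.add_bt("Birch-bark manuscripts", "Manuscripts")
thes.add_rt("Manuscripts", "Conservation")
```

Here `thes.expand("Manuscripts")` returns the term itself, its two narrower terms and the related term “Conservation”, so a search for one controlled heading can also retrieve documents indexed under more specific headings, which free keywords cannot do.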


Conclusion

• Helped bridge the digital divide in the country by making Indian language documents freely available to the masses.
• Helped push content localization efforts.
• ‘Lean backward’ to digitize important documentary heritage collections.
• ‘Lean forward’ to include born-digital content in multilingual open access (OA) repositories.
• National DLs to include rare and out-of-print books and manuscripts in all Indian languages.
• Metadata harvesters for these DLs.


Acknowledgement

• UNESCO, UBC and JNU for travel and technical support

Thank You

Any Questions?
http://www.anupkumardas.blogspot.in/