INDEPTH Network

iSHARE – Let’s Share to Learn More

INDEPTH Data Sharing Initiatives

By Team iSHARE

Presentation Agenda

• Data Sharing Initiatives

• Data Sharing with INDEPTH – History, Purpose, Initiatives

• Concept of the Data Repository

• Data Extraction Methodology

• The ETL process

• The Application and the Process

• Dynamic Reports

• The Framework

• Current Limitations and Challenges

• Future plans

• Q&A

Data Sharing Initiatives

• INDEPTH Data System (IDS)

– Efforts so far led by Prof. Abraham Kobus Herbst

– If funded, would lead to a Standard Data Management System (OpenDSS) plus a web-based repository

– This would greatly enhance cross-site data analysis

• Data Documentation Initiative (DDI)

– Documenting data within INDEPTH sites using standard machine-readable formats

• Data Sharing on the web within INDEPTH Sites (iSHARE)

Data Sharing History

• Growing call within the funding community and the scientific community for data to be shared

• Some individual INDEPTH sites (Agincourt, Africa Centre) had already started taking steps towards sharing data documentation and/or actual data

• In 2007, three INDEPTH HDSS sites in Asia (Vadu, India; Kanchanaburi, Thailand; and Wosera, Papua New Guinea) came together to share their data on a web-based repository, with funding from the INDEPTH Secretariat and technical support from I2IT

Why Data Sharing?

• To encourage INDEPTH sites to share their data with the broader scientific community

• To help bring about transparency in scientific inquiry and also allow for verification and refinement of findings, more economically and effectively

• To encourage collaboration with other institutions and communities

iSHARE Initiative

• iSHARE – INDEPTH Sharing and Accessing Repository

• Funding from the Hewlett Foundation for expansion - to include three African sites

• In response to a call from the Secretariat, Agincourt and Dikgale from South Africa and Magu from Tanzania joined this initiative, bringing the total to six HDSS sites on the platform

• All participating sites submitted draft data to be used for development of the repository

• New website (http://www.indepth-ishare.org) launched in beta in October 2009, with the final launch planned for February 2010

Concept of Data Repository

• Standardized and Harmonized dataset

• Collect data from participating HDSS sites (Push / Pull Extraction)

• Clean and transform datasets to standard format

• Upload data to centralized database

• Data Repository created!

• Repeat cycle for addition of more datasets

Standardized Dataset

• Five tables

– Base table: one record for each individual under observation

– PregnancyOutcome: one record for each pregnancy experienced by a woman under observation

– Deaths: one record for each death that occurs under observation

– In migrations: one record for each in-migration into a location under observation

– Out migrations: one record for each out-migration from a location under observation

Potential Uses of the Dataset

• Basic demographic rate and statistic calculations; can characterize the populations from each site

– Person-years calculations

• Assessing vital registry systems within the sites

– Birth registration

– Death registration

• Other analyses of

– Education

– Occupation

– Reason for migration
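
As a rough illustration of the person-years calculation listed above, the sketch below sums each individual's time under observation and derives a crude rate from it. The function names, the (entry, exit) pair layout and the sample dates are hypothetical and are not taken from the iSHARE dataset specification.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # average year length, a common convention


def person_years(episodes):
    """Sum time under observation, in years, over (entry, exit) date pairs."""
    total_days = 0
    for entry, exit_ in episodes:
        if exit_ < entry:
            raise ValueError("exit date precedes entry date")
        total_days += (exit_ - entry).days
    return total_days / DAYS_PER_YEAR


def crude_rate_per_1000(events, pyears):
    """Number of events per 1,000 person-years of observation."""
    return 1000 * events / pyears


# Hypothetical observation episodes for three individuals.
episodes = [
    (date(2005, 1, 1), date(2007, 1, 1)),
    (date(2006, 7, 1), date(2007, 1, 1)),
    (date(2005, 1, 1), date(2005, 7, 1)),
]
py = person_years(episodes)
```

The same denominator can then be reused for birth, death or migration rates by passing the relevant event count to crude_rate_per_1000.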

Dataset Structure

• Individual level

• PID uniquely identifies the individual

• Event tables link to Individuals

• EID uniquely identifies an event

• Events are linked to the households (locations) where they occur, identified by HID

• Social groups are simplified to individuals living at the same location (HID)

• Pregnancies are linked to the mother. Live-born children are linked to the mother in the Individuals (base) table
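
One possible way to picture the linkage between the PID, EID and HID identifiers is the small in-memory SQLite sketch below. Only those three identifiers come from the slides; the table names, the mother_pid column and the sample rows are assumptions made for illustration, not the actual iSHARE schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the PID links

# Base table: one row per individual; mother_pid is a hypothetical column
# linking a live-born child to the mother.
conn.execute(
    "CREATE TABLE individuals ("
    " pid TEXT PRIMARY KEY,"
    " hid TEXT NOT NULL,"
    " mother_pid TEXT REFERENCES individuals(pid))"
)

# Four event tables: each event has its own EID, points to an individual
# (PID) and records the location (HID) where it occurred.
EVENT_TABLES = ["pregnancy_outcomes", "deaths", "in_migrations", "out_migrations"]
for table in EVENT_TABLES:
    conn.execute(
        f"CREATE TABLE {table} ("
        " eid TEXT PRIMARY KEY,"
        " pid TEXT NOT NULL REFERENCES individuals(pid),"
        " hid TEXT NOT NULL)"
    )

# Hypothetical sample rows: a mother and her child living at location H1.
conn.execute("INSERT INTO individuals VALUES ('P1', 'H1', NULL)")
conn.execute("INSERT INTO individuals VALUES ('P2', 'H1', 'P1')")
conn.execute("INSERT INTO pregnancy_outcomes VALUES ('E1', 'P1', 'H1')")

# A social group is simply everyone sharing the same HID.
household = [
    row[0]
    for row in conn.execute("SELECT pid FROM individuals WHERE hid = 'H1' ORDER BY pid")
]

# An event that refers to an unknown individual is rejected.
try:
    conn.execute("INSERT INTO deaths VALUES ('E2', 'P9', 'H1')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```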

The ETL Process

(Flowchart)

• Start

• Data extraction

• Store the data in dummy tables in Excel/MySQL format

• Remove errors in the data

• Enforce data standards (e.g., ICD codes)

• Validation and integrity test

– Test passed: load anonymized data into the iSHARE database using FTP

– Test failed: insert data into the Error table

• More data in future?

– Yes: repeat the cycle

– No: stop
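
The flow above might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the iSHARE implementation: the field names (pid, sex, name), the validation rules and the helper functions are hypothetical, and plain Python lists stand in for the iSHARE database and the Error table (the FTP transfer is not shown).

```python
ALLOWED_SEX = {"M", "F"}  # hypothetical coding standard


def clean(record):
    """Remove simple errors: strip stray whitespace from text fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def validate(record):
    """Return a list of problems; an empty list means the test passed."""
    problems = []
    if not record.get("pid"):
        problems.append("missing pid")
    if record.get("sex") not in ALLOWED_SEX:
        problems.append("unknown sex code")
    return problems


def anonymize(record):
    """Drop directly identifying fields before loading."""
    return {k: v for k, v in record.items() if k != "name"}


def run_etl(extracted):
    """Route each extracted record to the repository or to the error table."""
    loaded, errors = [], []
    for raw in extracted:
        record = clean(raw)
        problems = validate(record)
        if problems:
            errors.append({"record": record, "problems": problems})
        else:
            loaded.append(anonymize(record))
    return loaded, errors


# Hypothetical extracted rows: one valid, one failing both checks.
loaded, errors = run_etl([
    {"pid": " P1 ", "sex": "F", "name": "A"},
    {"pid": "", "sex": "X", "name": "B"},
])
```

Keeping failed records together with their list of problems mirrors the Error table step, so that a site can correct and resubmit them in a later cycle.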

Data Extraction Methodology

• Sites send data as per the standardized dataset requirements (Push Method)

– Sites send data in csv, xls, mdb, frm, script and other formats over FTP or email

• Sites upload data to a specified location; the application accesses that location to populate the repository (Pull Method)

The Application

(Architecture diagram)

• Contributing sites: Wosera HDSS, Kanchanaburi HDSS, Vadu HDSS, Agincourt HDSS, Dikgale HDSS, Magu HDSS

• ETL operations load the site data into iSHAREDB

• The iSHARE web server sits between iSHAREDB and the clients

The Process

(Flowchart)

• Start

• User registration

• Login and password generated

• Send download request

• Committee member accepts or rejects the download request

• Is the download request accepted?

– Accepted: download data

– Rejected: stop
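
The request-and-approval flow can be modelled as a small state machine. The sketch below is only an illustration; the class, method and status names are hypothetical and do not describe the actual iSHARE application code.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


class DownloadRequest:
    """A registered user's request to download a dataset."""

    def __init__(self, user, dataset):
        self.user = user
        self.dataset = dataset
        self.status = Status.PENDING

    def review(self, accept):
        """Record the committee member's decision; allowed only once."""
        if self.status is not Status.PENDING:
            raise ValueError("request has already been reviewed")
        self.status = Status.ACCEPTED if accept else Status.REJECTED

    def can_download(self):
        """Data may be downloaded only after the request is accepted."""
        return self.status is Status.ACCEPTED


# Hypothetical usage: one accepted and one rejected request.
approved = DownloadRequest("researcher_a", "migrations")
approved.review(accept=True)

declined = DownloadRequest("researcher_b", "deaths")
declined.review(accept=False)
```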

The Framework

• User Interface Layer

• Application Layer: Registration, Login, Download Request, Approval, Feedback, Reporting

• Database Layer: Site 1, Site 2, Site 3, Site n

Dynamic Reports

• Reports are generated on the fly, providing real-time data for faster analysis

• Data is loaded dynamically from the database

• iSHARE dynamic reports provide:

– Customizable reports as per user needs

– Sophisticated actionable information without exposing internal complex data structures

• Example: Migration Reports

– By year

– By site

– Drill down to gender and generate bar, line and pie charts for better visual simplicity
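
A drill-down of this kind amounts to grouping the same records by different keys on request. The sketch below shows one way to do that with the Python standard library; the records, field names and the count_by helper are invented for illustration and are not iSHARE data or code.

```python
from collections import Counter

# Hypothetical migration records; not real iSHARE data.
migrations = [
    {"site": "Vadu", "year": 2006, "sex": "F"},
    {"site": "Vadu", "year": 2006, "sex": "M"},
    {"site": "Vadu", "year": 2007, "sex": "F"},
    {"site": "Magu", "year": 2007, "sex": "F"},
]


def count_by(records, *keys):
    """Count records grouped by the given keys (one tuple per group)."""
    return Counter(tuple(r[k] for k in keys) for r in records)


by_year = count_by(migrations, "year")                 # report by year
by_site = count_by(migrations, "site")                 # report by site
by_site_gender = count_by(migrations, "site", "sex")   # drill down to gender
```

The resulting counts could then be passed to any charting library to draw the bar, line or pie charts.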

Current Limitations

• Finding errors in datasets is a manual process, although cleaning is automated

• Pull method of data extraction not yet implemented – a framework for this has to be developed

Challenges

• Re-coding existing data into agreed categories/standards comes at significant cost and requires funding

• Conflicting conditionality imposed by different parent institutions and funding agencies

• Cost of maintaining the repository as versions and contributing sites increase

• Defining policies for research data in repositories

• Misuse of downloaded data

Thank You

http://www.indepth-ishare.org