Research Data Planning ...for the Sciences
MSGR UpSkills Program
Lyle Winton
08 Sept 2010

This presentation is a mixture of technical concepts (IT) and research context.

Please ask questions as we go! And there’ll be time for questions.

Outline (this will take 1.5 hrs)

• Warm Up
• Overview
• Data Management Systems & Tools
• < DIY break out groups >
• Data Security
• Planning – Obligations

Records (a basis)

Personal Records:
• Identity records: birth cert, drivers license, passport...
• Personal/Family records
    “Essential” – insurance policies, health records, court documents
    “Non-essential” – contacts, education certs, employment
• Legal records: copies of will, power of attorney
• Property records
    “Essential” – contracts, lease, titles, vehicle rego
    “Non-essential” – warranties, receipts, inventories
• Financial records ...
• Photos ...

“[A record is] information created, received and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business.” – AS ISO 15489

Are you organised? What do you do with digital records? Do you scan in documents?

Quick Hypothetical

For my imaginary research project I have…

Data:
1. ABS statistics data
2. group collected database
3. my experimental data
4. my working notes (references, lab notebook..)

Questions: Who’s responsible? ▼ Who can I copy or share with? ▼ Obligations?

Example answers:
• ABS: ABS maintains ▼ share with my group ▼ can use provided I attribute them, properly cite and don't redistribute?
• Group Collected: me & group & Uni? ▼ copy/share only with permission ▼ I can use without attribution, don't redistribute, under a group ethics contract
• My Experiment: me ▼ confirm with supervisor, I’d like to allow open access ▼ department and group require a copy! conditions for open access?
• Working notes: me & Uni, watch for external IP ▼ group only, Uni with permission? ▼ periodically signed by supervisor, department keeps original, I may keep a private copy

Overview – a defn of “Research Data”

Research Data (uni definition)
    laboratory notebooks; field notebooks; primary research data (hardcopy or in computer); questionnaires; audiotapes; videotapes; models; photographs; films; test responses; slides; artefacts; specimens; samples

Research Records (uni definition)
    Includes correspondence (electronic mail and paper-based correspondence); project files; grant applications; ethics applications; technical reports; research reports; master lists; signed consent forms; and information sheets for research participants

Administrative Records (Research Office, Central Records)
    Includes contracts and agreements, patents, licences, grants, intellectual property and trademarks, policies, ethics, research project files, reports, publications

What is often included as “Research Data”:
= data + records + copies (physical & digital)
= stuff you used and/or created

Overview – Why?

Why is research important?
    Research Sector:
    • Duh!
    • Creating new knowledge
    • Scholarship and publication
    Professional Sector:
    • Developing your skills
    • Develops business practice and leadership
    • eg. market research, evidence-led teaching, translational medicine and nursing, use of new technology

Why data and records?
    Research Sector:
    • Help you produce scholarly works (papers, presentations etc.)
    • Evidence: may need to prove your findings and legitimacy of your work
    Professional Sector:
    • Publications and industry leadership
    • Evidence: may need to prove your skills, practice, decisions

Other reasons for managing or keeping it...
• Has ongoing value throughout your life
• At your fingertips access (being organised)
• Owning legal copies while you have access

Overview – Research directions

Research Data is increasing in size
• Protein crystallography sets – 100’s GB
• Gene sequencing – TB/day
• High-energy physics – 10’s PB/year
• Astronomy – 100’s PB/day ?

Research Collaborations are increasing
• Human Genome project: 113 people on main paper; 5 primary & 15 contributing orgs; started 1990 – ended 2003
• Belle collaboration: ~370 people; 14 countries, 60 institutions; started 1994, still going
• ATLAS collaboration @ LHC CERN: ~2500 people; 37 countries, 169 institutions; started 1994 and still 10+ years to go

Research Data is increasingly digital
• Wonderful opportunities for reuse, sharing, collaboration, analysis
• This is “eResearch”!

Overview – eResearch

eResearch = Data intensive science??? …that’s a bit misleading.
Alternate defn: “Research in the digital age.”

Most research…
• isn’t about “big data”
• isn’t limited by data processing speed
• is limited by us!

Most of you will use IT services…
• that you don’t own/control
• that are outside Uni (Google, PubMed, databases)

So your research and data could be…
• sourced from anywhere
• stored in many places
• sent to everywhere… a mess?

We've all got stuff (digital + not), we need to manage it professionally, be digitally wise.

Data is increasingly digital... that’s great!
Fast, easier access, better for sharing, but different problems. Some examples...

Data loss... some facts:
• while microfilm and non-acidic paper can last for 100+ years
• magnetic media lasts 10+ years
• optical media lasts 20+ years (with proper handling)
• 2-10% of hard drives fail every year
• software & hardware can outdate quickly

Other issues:
• ease of copying
• security and internet
• preserving meaning over time
• keep digital & physical linked over time (much info is still only hardcopy)

Proper Planning & Management is needed!!! Being “digitally wise”...
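To put the "2-10% of hard drives fail every year" figure in perspective, here is a rough worked calculation. It is only a sketch under stated assumptions: the failure rate is treated as constant and independent from year to year, failed drives are not replaced, and the 5-year horizon is chosen simply because it matches the minimum retention period mentioned later in this presentation. The function names are made up for illustration.

```python
def prob_at_least_one_failure(annual_failure_rate: float, years: int) -> float:
    """Chance that one drive fails at least once within `years`.

    Assumes a constant, independent failure rate each year (a simplification).
    """
    if not 0.0 <= annual_failure_rate <= 1.0:
        raise ValueError("annual_failure_rate must be between 0 and 1")
    if years < 0:
        raise ValueError("years must not be negative")
    survives_every_year = (1.0 - annual_failure_rate) ** years
    return 1.0 - survives_every_year


def prob_all_copies_lost(annual_failure_rate: float, years: int, copies: int) -> float:
    """Chance that every one of `copies` independent drives fails within `years`.

    Assumes failed drives are never replaced, which is pessimistic.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return prob_at_least_one_failure(annual_failure_rate, years) ** copies


if __name__ == "__main__":
    for rate in (0.02, 0.10):  # the 2-10% range quoted on the slide
        for copies in (1, 2, 3):
            risk = prob_all_copies_lost(rate, years=5, copies=copies)
            print(f"rate={rate:.0%} copies={copies} -> risk of losing all copies: {risk:.2%}")
```

Under these assumptions a single drive has roughly a 10% to 41% chance of failing at some point in 5 years, which is why a single copy on a single disk is not a plan.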

Burroughs 1977 – B 9495 Magnetic Tape Subsystem

Overview – Data Planning & Managing

3 considerations:

Your obligations
• legal, ethical, funding requirements, uni, department, group policies
• BE AWARE!

Your project/study
• making your research work
• creating a data management system
• contributing to global research community
• BEST TO PLAN!

Your career
• being a professional researcher
• data – your assets and records
• MANAGE YOUR DATA NOW, FOR LATER ON!

Overview – Summary

You will generate, collect some research data (all sorts of stuff) that you need to…
• keep,
• keep securely,
• be able to find it again,
• be able to understand it again
…for a long period of time. (years)

Overview – Principles:
• Document your Data Management Plan/System
• Be aware of your discipline practice and policy and obligations
• Understand what data to keep
• Keep good records and metadata

It’s not about having a complicated IT system – it’s about being consistent, effective:
• In 1 week or 5 years can I find what I’m looking for?
• Can I track my results and conclusions back to the source? (back through the analysis, data, tools/software/instruments, samples, and experimental conditions)

Research Integrity!!!

1419/04/23

Simplified Archives Perspective (NA)

3 groups of activities in a records management cycle+1 for active data management (actually using the data)

1519/04/23

Create,Capture,Describe

(Use, Transform, Update)

Store, Secure, Preserve

Keep,Transfer,Destroy

refer to: National Archives, website on record keeping

Simplified Research Data Life Cycle

• Create, Capture, Describe
• (Use, Transform, Update)
• Store, Secure, Preserve
• Keep, Transfer, Destroy

Contents:
• Context – “Metadata”: Process, Method, Apparatus, Constraints, Notes...
• Data – Digital Data & Physical Stuff
• Records – Ethics, Permissions, Reports, Funds, Correspondence...

Diagram labels:
• Working Data
• rapidly changing, intermediate products
• high value, reusable, compliance
• published, outcomes, evidence
• lodge somewhere
• Your “Storage”
• Your “Archive” (stuff you should keep)

A simple Data Man. System
• Find secure places to keep physical & digital Records + Data (filing cabinet, department shared drive) – backups essential
• Where and when should there be checks on your data (sanity checks, quality control, standards)
• File your data and records into logical divisions, say activities, projects, or pieces of work
    eg. /DeptShare/johnsmith/Records/ProteinABC Investigation
    Don’t break things down too much, it makes things harder to find!
• Have a consistent file naming convention:
    perhaps: ActivityOrContents-LocationOrPerson-CreateDate-Id-Description.ext
    eg. “ProteinABC-LJW-20100409-0001 Raw data from instrument.dat”
    Don’t rely on filesystem date-stamps or folder names.
• Keep good metadata (notes, records) on how you captured your data, particularly for physical records
    Descriptions of collections or files – structured text files are good enough, eg. FileOrCollectionName-metadata.txt
    On other things, entities that are not files – structured text files or spreadsheets
    Have a good labeling/ID/coding system
    Perhaps keep a registry (spreadsheet will do; IDs, names, location, basic metadata)
• Find the right balance in digitising physical stuff (easy and quick)
    Digital is easy to keep/transfer/search if stored properly. However, digitising/scanning everything can be time consuming and without good descriptions may not be useful.
    Link digital notes/metadata to physical stuff (IDs, names, labels, codes, location)
    Have some basic digital representations or notes of important physical stuff
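The file naming convention on this slide can be enforced with a few lines of script, so names are built and checked the same way every time. The sketch below is one possible reading of the convention, not a prescribed tool: it follows the slide's example (hyphens between the first four fields, a space before the description), accepts a hyphen before the description too, and assumes a 4-digit sequence Id and a YYYYMMDD date. The names `build_name`, `parse_name` and `DataFileName` are hypothetical.

```python
import re
from datetime import date
from typing import NamedTuple


class DataFileName(NamedTuple):
    activity: str
    person: str
    created: date
    seq: int
    description: str
    ext: str


# Activity-Person-YYYYMMDD-NNNN, then a space or hyphen, description, ".ext"
_PATTERN = re.compile(
    r"^(?P<activity>[^-]+)-(?P<person>[^-]+)-(?P<created>\d{8})-(?P<seq>\d{4})"
    r"[ -](?P<description>.+)\.(?P<ext>[^.]+)$"
)


def build_name(activity: str, person: str, created: date, seq: int,
               description: str, ext: str) -> str:
    """Build a file name in the style of the slide's example."""
    for label, value in (("activity", activity), ("person", person)):
        if not value or "-" in value:
            raise ValueError(f"{label} must be non-empty and contain no hyphen")
    if not 0 <= seq <= 9999:
        raise ValueError("seq must fit in four digits")
    return f"{activity}-{person}-{created:%Y%m%d}-{seq:04d} {description}.{ext}"


def parse_name(filename: str) -> DataFileName:
    """Split a conforming file name back into its fields, or raise ValueError."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not in the agreed naming convention: {filename!r}")
    stamp = match["created"]
    created = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))
    return DataFileName(match["activity"], match["person"], created,
                        int(match["seq"]), match["description"], match["ext"])


if __name__ == "__main__":
    name = build_name("ProteinABC", "LJW", date(2010, 4, 9), 1,
                      "Raw data from instrument", "dat")
    print(name)
    print(parse_name(name))
```

Because the creation date lives in the name itself, it survives copying between disks, which is the point of not relying on filesystem date-stamps.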

Data Management Tools

Basic building blocks for your “System”…

Storage
• Physical space (for physical assets and hardware)
• Hard Disks
• Digital Tapes
• CDs, DVDs
• Safe and Reliable?
    Lockable rooms or cabinets? Secure computers?
    Redundant Disks? (eg. RAID, if one fails does the system still work?)
    Onsite and Offsite backups? (all of these fail often)

File Systems, Operating Systems (where you put services/applications and can access Directories & Files from)
• Metadata (file description) features – don’t rely on these
• Some Search features – Windows Indexing/Search, Mac “Spotlight”, Linux “Beagle”, Google Desktop

Versioning Systems (possibly useful)
• Manual / Automatic
• store each version of a changing file/collection/database
• eg. CVS, SVN, some document repositories
• If not – regular copies, archival backups, database dumps, audit trails and access logs

Data Management Tools

Databases (DB):
• Relational DB Management Systems (SQL or RDBMS)
• XML Databases
• RDF Triple Stores
• many others, google for “NoSQL Databases”

Some come with customisable interfaces…
• Entry forms
• Search forms
• Lists, Views, Reports

Examples:
• Microsoft Access, FileMaker Pro, Oracle, IBM DB2 (commercial)
• PostgreSQL, MySQL (with OpenOffice “Base” interface)
• eXist, Protege (non-SQL examples, less mature)
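As a small illustration of what a relational database adds over a spreadsheet, the sketch below keeps a registry of items (IDs, names, location, basic metadata, as suggested for the simple system earlier) and searches it. It uses SQLite through Python's built-in `sqlite3` module, which is not one of the products listed on the slide; it is swapped in here only because it needs no server. The table name, column names and sample entries are all hypothetical.

```python
import sqlite3


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a registry database; ':memory:' keeps it temporary."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS registry (
            item_id  TEXT PRIMARY KEY,
            name     TEXT NOT NULL,
            kind     TEXT NOT NULL CHECK (kind IN ('digital', 'physical')),
            location TEXT NOT NULL,
            notes    TEXT NOT NULL DEFAULT ''
        )
        """
    )
    return conn


def register_item(conn: sqlite3.Connection, item_id: str, name: str,
                  kind: str, location: str, notes: str = "") -> None:
    """Add one item; a duplicate ID is rejected by the primary key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO registry (item_id, name, kind, location, notes) "
            "VALUES (?, ?, ?, ?, ?)",
            (item_id, name, kind, location, notes),
        )


def find_items(conn: sqlite3.Connection, text: str) -> list:
    """Return (item_id, name, kind, location) rows whose name or notes mention `text`."""
    pattern = f"%{text}%"
    cursor = conn.execute(
        "SELECT item_id, name, kind, location FROM registry "
        "WHERE name LIKE ? OR notes LIKE ? ORDER BY item_id",
        (pattern, pattern),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    registry = open_registry()
    register_item(registry, "S-0001", "ProteinABC crystal sample", "physical",
                  "Freezer 2, shelf B")
    register_item(registry, "D-0001", "ProteinABC raw instrument data", "digital",
                  "/DeptShare/johnsmith/Records/ProteinABC Investigation")
    for row in find_items(registry, "ProteinABC"):
        print(row)
```

The same registry works for physical and digital items, which helps keep the two linked over time.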

Data Management Tools

Where to get Storage?
• Check with your research group/project.
• Check with your department.
• Otherwise:
    Desktop hardware is OK, but you must secure it and back it up
    UniMelb, ITS Research Services (local!) http://www.its.unimelb.edu.au/
    ARCS.org.au Data Fabric (national, 25GB free) http://www.arcs.org.au/ (Consider it beta, give it a try. Backed up?)

Where to get Database hosting?
• Again, check with your research group/project, department.
• ARCS.org.au Data Services (national, 5GB free?)

Data Management Tools

Repositories (advanced tools)

Content Repositories
• documents (word, pdf, html)
• media (images, movies)

Data Repositories
• Collections, Datasets with Metadata

Should come with…
• Internet Access or Web Interfaces (user friendly)
• Multiple metadata types support, customisable
• Complex collection and record structure
• Workflow configuration (who does what when)
• Authentication and Access control

Examples:
• Mediaflux (commercial)
• dSpace, Fedora (Fez), EPrints, Magnolia, MDID, SRB (Storage Resource Broker)

Can take a lot of time setting up, managing... I recommend not running one yourself, if necessary seek help!

Data Management Tools

More advanced products…
• Laboratory Information Management Systems (LIMS)
• Laboratory People Management Systems
• Electronic Laboratory Notebook Systems (ELN)
• & many research specific systems

If you don’t have one don’t worry…
• A well documented plan and system
• Clear procedures for data collection, QC, storage
• Good records, file names, identifiers/codes

Data Management Tools

Federations and Registries (advanced tools)

Tying together multiple databases/repositories/filesystems
• eg. Distributed resources in a collaboration
• Research Networks (ASSDA, BioGrid, GRHANITE, APN, BlueNet, PARADISEC)

Should facilitate…
• Access to data located anywhere via identifier & services
• Search across multiple locations
• Authentication and Access control

Some tools already do this…
• DataVerse
• SRB (Storage Resource Broker)
• AFS (Andrew File System)
• Grid Middleware
• OAI-PMH Repositories
• Databases? IBM DB2 Information Integrator

Data Management Tools

Collaborative Tools
• Communication, Knowledge Bases, Online Workspaces can reference and sometimes store data!!!
• Can also be useful places to document data, maybe even your System.
• Choose your favourite collaborative site. eg. Wikis, Blogs etc

Where to get access to collab tools:
• Check with your research group, dept.
• Sakai@Melbourne https://sakai.unimelb.edu.au/
• ARCS.org.au will offer some
• Lots of free stuff online... Google, Ning, Blogger... but check terms, conditions, policies

Free Tools (I use)…
• Zotero (reference material) (EndNote is Uni default)
• EVO & Skype (video/tele communication) http://evo.arcs.org.au/
• Sakai@Melbourne (project workspace) https://sakai.unimelb.edu.au/
• Google docs (collaborative editing)
• Google groups (email list)
• Google Desktop (file and email search)
• jEdit – text file editor (private notes)
• local disk + Cobian Backup (private project records)
• research data storage, a tricky one…
    use local stuff in preference, ask around
    DropBox? online 2 GB for private/sharing
• too many others to list, heaps on the web…
    See Digital Research Tools (DiRT) wiki for a huge list http://digitalresearchtools.pbworks.com/

Check with your supervisor, see Info Skills classes on EndNote, UpSkills 22 Sept on VC

Break into groups!!!

1. Grab a handout, work through it

2. Pick a research project (yours or imaginary)

3. Think about what you have (collected, created, physical, digital) and what needs to be easily identified/found. (what are your key records)

4. And the rest...

5. If time, 1 person from each group to present it!

Planning – Your Project

Advice on your System ...

• Best way to get started –> start writing it down, discussing the important questions with your supervisor (a “Data Management Plan”, DMP)
• Identify key data in your context, stuff to keep, important records (your Data Model)
    Tracking backwards from conclusions, to data, to origins?
• Who is responsible for each? Who can have access? What obligations do you have?
    Register with the department
    Keep for 5+ years after publication
    Funding agencies?
• Describe the lifecycle of data elements
    How will I describe/register the data? Checking, Validation?
    Storing it where, backups?
    How do you use, transform, update? What about storing analysis and results?
    Long term Retention or Transfer to where? Destruction?

http://www.esrc.unimelb.edu.au/dmp

A few more things…

More on being digitally wise!!! (esp. data security)

More on your obligations / responsibilities / research conduct

Data Security

2 aspects to security
• Safety from damage or loss – How important is the data to you?
• Safety from incorrect use – What are the possible consequences?

Safety from damage (unintended and intentional)…
• What’s acceptable loss (safety can cost, use up time)
• Backups (data, software, system)
    How often (hourly, daily, weekly, monthly, manually, automated)?
    How many and where (onsite, offsite, both, multiple)?
    Departmental storage? Probably backed up already!
• Disaster Recovery
    Quality hardware, multiple/spare servers, spare disk drives, Operating System and Applications image backups
    (talk with someone technical, your local IT guys)
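A backup is only useful if it still matches the original. One simple check, sketched below, is to compute a checksum for every file in the working folder and in the backup copy, then report what is missing, changed or extra. This is an illustration rather than a recommended product: SHA-256 is just one common checksum choice, the function names are hypothetical, and a real backup tool (or your department's storage) may already do this for you.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum one file, reading it in chunks so large data files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its checksum."""
    return {
        item.relative_to(root).as_posix(): sha256_of(item)
        for item in sorted(root.rglob("*"))
        if item.is_file()
    }


def compare_to_backup(original: Path, backup: Path) -> dict[str, list[str]]:
    """Report files that are missing from, changed in, or extra in the backup."""
    source = build_manifest(original)
    copy = build_manifest(backup)
    return {
        "missing": sorted(set(source) - set(copy)),
        "changed": sorted(name for name in source.keys() & copy.keys()
                          if source[name] != copy[name]),
        "extra": sorted(set(copy) - set(source)),
    }
```

A call such as `compare_to_backup(Path("work"), Path("backup"))`, with folder names of your own, returns three lists; all three being empty means the backup matches the original at the time of the check.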

Data Security

Safety from incorrect use (unintended and malicious)…

PCI DSS - a recommendation (Payment Card Industry Data Security Standard)
• eg. http://www.nacubo.org/x9813.xml
• 12 requirements that are good practice (first 10 are the basics)

10 IT basics…
• Firewall servers
• Do not use default usernames/passwords
• Physically protect stored data (lock up servers, disk, tape, source material)
• Use encrypted transmission over internet (VPN, SSL, SSH, GridFTP, S/MIME email)
• Update antivirus/antimalware software regularly
• Use secure and trusted applications
• Restrict access to sensitive data (tighter control, or put it somewhere else)
• Assign unique IDs for each user
• Record and monitor all access to data

Plus some good practice…
• Don’t retain sensitive data
• Or encrypt sensitive information

Planning - Obligations

University of Melbourne Research Policy
• methods and results open to scrutiny
• research data and records are accurate
• data and records sufficient for verification (authenticity and validity of conclusions)
• data retained in a durable and appropriately referenced form
    for at least 5 years from any publication
    minimum of 15 years for clinical trials
    minimum of 7 years for adult psychological files (for minors 7 years after reaching 18)
    or longer if external/funding/regulatory/archival requirements
• research units & departments have formally documented procedures for retention

University “Policy on Management of Research Data and Records”, sections 6 & 13

Department should have a data register. (a central register is being built)
• Who is responsible.
• Where data is kept.
• When due for destruction/review.
• Who’s responsible for changing the data register, updating, relocating data? You?
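The data register fields listed above (who is responsible, where data is kept, when it is due for review) can be sketched as a small record with the review date worked out from the minimum retention periods on this slide. This is an illustration only, not the University's register format: the field names and category labels are hypothetical, the 15-year and 7-year periods are assumed here to run from the publication date (the slide does not say), and external, funding or regulatory requirements may demand longer.

```python
from dataclasses import dataclass
from datetime import date

# Minimum retention periods (years) as quoted on the slide; category names are made up.
MIN_RETENTION_YEARS = {
    "general": 5,
    "clinical_trial": 15,
    "adult_psychological": 7,
}


def add_years(start: date, years: int) -> date:
    """Same day `years` later; 29 Feb falls back to 28 Feb in non-leap years."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def minor_psychological_review(date_of_birth: date) -> date:
    """For minors: 7 years after reaching 18."""
    return add_years(date_of_birth, 18 + 7)


@dataclass
class RegisterEntry:
    title: str
    responsible: str   # who is responsible
    location: str      # where the data is kept
    category: str      # a key of MIN_RETENTION_YEARS
    published: date    # assumed starting point for the retention period

    def earliest_review(self) -> date:
        """Earliest date the data is due for destruction/review."""
        return add_years(self.published, MIN_RETENTION_YEARS[self.category])


if __name__ == "__main__":
    entry = RegisterEntry(
        title="ProteinABC raw instrument data",
        responsible="johnsmith",
        location="/DeptShare/johnsmith/Records/ProteinABC Investigation",
        category="general",
        published=date(2010, 9, 8),
    )
    print(entry.title, "- review no earlier than", entry.earliest_review())
```

Treat the computed date as a floor, not a deadline: check the policy itself and any funding agreement before destroying anything.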

Planning - Obligations

You are a part of Australia, Uni, Department...
• Be aware of your responsibilities as researchers
• Different for each Discipline and Context
• Do some reading, then some talking with your colleagues

A reading list:
See http://www.eresearch.unimelb.edu.au/activities/research_data_management_for_researchers – References – “Basic reading for new research students...”
(short URL is http://bit.ly/cEzSOs )
◄ ESSENTIAL!

Planning - Obligations

The policy addresses WASTED TIME / EFFORT / CAREER…

You need to properly…
• Collect research data
• Manage research data
• Archive research data
…or risk your data, wasting years of effort.

US study of hundreds of charges of “research misconduct”
• 40% could have been avoided by better data management!

UniMelb had ~20 cases of research misconduct in 2008.
• Most involved students.
• Most were about getting proper attribution.
• All needed good records to prove their case!

“Student submits her PhD thesis for examination then leaves country taking the data with her. An examiner questions the integrity of the research data. A reanalysis of the data and original questionnaire is required.”

“Participant in a research project lodges a claim for compensation, alleging that he was not adequately informed about the effects of the study, does not recall giving consent. Where are the records?”

Planning - Obligations

The reading list covers:

Policy
• What policy applies to me?
• University Policy, Group/Department Policies, National Policy

Ethics, Consent & Legislation
• Do I need ethics approval?
• Are there legal issues surrounding my research?

Intellectual Property
• Am I using material created/owned by someone else? Commercial or under contract?
• What do I want done with creations or publications? Restricted. Open access. Attribution.

Funding Requirements
• Your group and department will have active research grants, existing data.
• Australian Code for the Responsible Conduct of Research http://www.nhmrc.gov.au/publications/synopses/r39syn.htm
• ARC, NHMRC, ...

People who can help: Supervisor and Colleagues; Dept. Research Manager; Melbourne Research Office

Recap – Principles:
• Document your Data Management Plan/System
• Be aware of your discipline practice and policy and obligations
• Understand what data to keep
• Keep good records and metadata

It’s not about having a complicated IT system – it’s about being consistent.
• In 1 week or 5 years can I find what I’m looking for?
• Can I track my results and conclusions back to the source? (back through the analysis, data, tools/software/instruments, samples, and experimental conditions)

Research Integrity!!!

Any Questions?

Go to the following site for guidelines, essential links:
http://www.eresearch.unimelb.edu.au/activities/research_data_management_for_researchers
http://bit.ly/cEzSOs (shorter URL)

Copyright (c) 2010, VeRSI Consortium, Lyle Winton
This work is licensed under a Creative Commons Attribution 2.5 Australia License. To view a copy of this license visit: http://creativecommons.org/licenses/by/2.5/au/