Research Data Management and Librarians

This presentation was delivered at the Elsevier Library Connect Seminar on 6 October 2014 in Johannesburg, 7 October 2014 in Durban, and 9 October 2014 in Cape Town. It gives an overview of the potential roles that librarians can play in research data management.

1. Research Data Management and Librarians
Presentation at Elsevier Library Connect Seminar, 6 October 2014, Johannesburg; 7 October 2014, Durban; and 9 October 2014, Cape Town
By Johann van Wyk (University of Pretoria)

2. Introduction
Internationally, research data is increasingly recognised as a vital resource whose value needs to be preserved for future research. This places a huge responsibility on Higher Education Institutions to ensure that their research data is managed in such a manner that they are protected from substantial reputational, financial and legal risks in the future. Librarians have a unique skill set to help these institutions navigate this complex environment. This presentation will highlight a number of potential roles librarians could play.

3. Research Data Management: A (Brave) Complex New World
  • Messy
  • Complex
  • Small data
  • Various formats
  • Various devices
  • Various versions
  • Sensitive data

4. What is meant by Research Data?
Research data, unlike other types of information, is collected, observed, created or generated for purposes of analysis to produce original research results.

5. What is research data management?
  • The process of controlling the information generated during a research project.
  • Managing data is an integral part of the research process. How data is managed depends on the types of data involved, how data is collected and stored, and how it is used throughout the research lifecycle.

6. Why Manage Research Data?
By managing research data you will:
  • Meet funding body grant requirements, e.g. NSF, NIH;
  • Meet publisher requirements;
  • Ensure research integrity and replication;
  • Ensure research data and records are accurate, complete, authentic and reliable;
  • Increase your research efficiency;
  • Save time and resources in the long run;
  • Enhance data security and minimise the risk of data loss;
  • Prevent duplication of effort by enabling others to use your data;
  • Comply with practices conducted in industry and commerce; and
  • Protect your institution from reputational, financial and legal risk.

7. Designing Data Management Plans
[Lifecycle stage: Creating Data]
A Data Management Plan is a formal document that outlines what you will do with your data during and after you complete your research (The University of Virginia Library, 2014).
Data Management Planning Tools:
  • Data Management Planning Tool (DMPTool) (California Curation Center of the California Digital Library)
  • DMPonline tool (Digital Curation Centre, UK)
Librarians can play an advisory role.

8. Data Capture/Collection
[Lifecycle stage: Creating Data]
The action or process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes (Responsible conduct of research, n.d.; The Oxford Dictionary, 2014).
Examples of data collection methods: observations, textual or visual analysis, interviews, focus group interviews, surveys, tracking, experiments, case studies, literature reviews, questionnaires, data from sensors, model outputs, scenarios, etc.
Librarians can play their traditional role of information searching, training and consultation.

9. Data Storage and Backup
[Lifecycle stages: Creating Data, Processing Data, Analysing Data]
Data storage is the process of preserving data files in a secure location which can be accessed readily (Research Data Services, University of Wisconsin-Madison, 2014).
Data backup is the process of preserving additional copies of your data in a separate physical location from data files in storage.
Librarians can advise researchers on file naming conventions.

10. Metadata Creation
[Lifecycle stages: Creating Data, Processing Data, Analysing Data, Preserving Data]
Metadata is searchable, standardised and structured information that describes a dataset and explains the aim, origin, time references, geographic location, creating author, access conditions and terms of use of a dataset (Corti et al., 2014: 38; USGS Data Management Website, 2014).
Examples:
  • Dublin Core Metadata Element Set
  • ISO 19115:2003(E) Geographic Information Metadata
  • PREMIS
Librarians, especially cataloguers, have the skill set to assist with metadata creation and to advise.

11. Data Cleansing, Verification & Validation
[Lifecycle stages: Processing Data, Analysing Data]
  • Data cleansing refers to identifying incomplete, incorrect, inaccurate, irrelevant, etc. parts of the data and then replacing, modifying, or deleting this dirty data (Wikipedia).
  • Data verification is the process of evaluating the completeness, correctness, and compliance of a dataset with required procedures to ensure that the data is what it purports to be. This can be done by persons who are less familiar with the data, for example librarians (Martin and Ballard, 2010: 8-9; US EPA, 2002: 7).
  • Data validation is the process of determining whether data quality goals have been achieved and the reasons for any deviations. Validation checks that the data makes sense (Martin and Ballard, 2010: 8; US EPA, 2002: 15).

12. Data Anonymisation
[Lifecycle stages: Processing Data, Analysing Data]
Data anonymisation is the process of de-identifying sensitive data, while preserving its format and data type (Raghunathan, 2013: 4).
Anonymisation techniques, examples: generalisation, suppression, permutation, perturbation, substitution, shuffling, number and date variance, nulling-out (Charles, 2012; Cormode and Srivastava, 2009; Raghunathan, 2013: 172-182; Simpson, n.d.; Vinogradov and Pastsyak, 2012: 163).

13. Data Interpretation & Analysis
[Lifecycle stage: Analysing Data]
Data interpretation and analysis is the process of assigning meaning to the gathered information and ascertaining the conclusions, significance, and implications of the findings (Analyzing and Interpreting Data, n.d.).

14. Data Publishing
[Lifecycle stage: Analysing Data]
  • Data publishing: the process of making the research data underpinning the findings published in peer-reviewed articles available to readers and reviewers in an appropriate repository, or as supplementary materials to a journal publication (Corti et al., 2014: 197; Marques, 2013).
  • Data journals: a more recent development has been the appearance of data journals. These journals publish data papers that describe a dataset and indicate in which repository the dataset is available (Corti et al., 2014: 7-8).
Librarians can be involved in creating and managing a data repository, and can give training and advice.

15. Examples of Data Repository Software

16. Registry of Research Data Repositories
  • The Registry of Research Data Repositories is a global registry of research data repositories that covers research data repositories from different academic disciplines.
  • It presents repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions.
  • It can be used as a tool for the easy identification of appropriate data repositories to store research data.

17. Data Journals
  • A list of Data Journals available at
  • Example of data journal at Elsevier: Data in Brief

18. Data Visualisation
[Lifecycle stage: Analysing Data]
Data visualisation is the visual representation of data, and is used to enable people to both understand and communicate information through graphical and schematic avenues (Friendly, 2009: 2; Schnell and Shetterley, 2013: 3).
[Image from Xiaoru Yuan's presentation at the CODATA Workshop on 12 June 2014]

19. Data Archiving
[Lifecycle stage: Preserving Data]
Data archiving can be described as the process of retention and storage of valuable data (data that will be essential for future reference) for long-term preservation, so that the data will be protected from risk (i.e. loss or corruption) and will be accessible for future use (Rouse, 2010).

20. Data Preservation
[Lifecycle stage: Preserving Data]
Data preservation is the process of providing enough representation information, context, metadata, fixity, etc. to the data so that anyone other than the original data creator can use and interpret the data (Ruth Duerr, National Snow and Ice Data Center, as cited by Choudhury, 2014).
The librarian can assist researchers in preparing data for long-term preservation by advising on metadata standards.

21. Linking Data to Research Outputs
[Lifecycle stage: Preserving Data]
This is the process of connecting the underlying data relating to a specific research output, e.g. journal article, thesis, etc., to the research output itself. This can be done by adding a digital object identifier (DOI) to the dataset and including this in the metadata of the research output, or by citing the dataset (Callaghan et al., 2013).
The librarian can assist researchers through training and consultation on DOIs and data citation methods.

22. Data Sharing
[Lifecycle stage: Giving Access to Data]
  • Sharing data is the process of opening up access to research data and making it available to other researchers (Corti et al., 2014: 2).
  • Data sharing provides opportunities for other researchers to review, confirm or challenge research findings (Data sharing and implementation guide, n.d.).

23. Data Sharing Methods
The method for sharing data will depend on a variety of factors, including size and complexity of the dataset, sensitivity of the data collected, and anticipated number of requests for data sharing.
Researchers could:
(1) Take responsibility for sharing data themselves, or
(2) Use a data archive, or
(3) Use a combination of these methods.

24. Data Repurposing/Reuse
[Lifecycle stage: Re-using Data]
  • This is the process where secondary data (data that have been captured and analysed by other researchers) can be re-analysed, reworked or reused for new analyses, and compared with contemporary data (Corti et al., 2014: 169).
  • This process also enables research where the required data may be expensive, difficult or impossible to collect, e.g. large-scale surveys or historic data (Corti et al., 2014: 169).

25. Data Citation
[Lifecycle stage: Re-using Data]
Data citation is the process of referencing (attributing and acknowledging) reused data in a similar fashion as traditional sources o