
Digital Preservation and Stewardship Committee

Research Data Working Group

Interim Report and

Proposal for the

Atlantic Research Data Repository (ARDR)

   

Michael Beazley Suzanne van den Hoogen

Karen Keiller Mark Leggott Mike Nason 

Maggie Neilson Kathryn Reddy 

 May 12, 2015 

 


Report Contents

EXECUTIVE SUMMARY

BACKGROUND
    History of the DPSC
    Overview of National/Regional/Institutional Cyberinfrastructure

SURVEY
    Introduction
    Results
        Collection Services
        User Services
        Access Services
        Preservation Services
    Gaps and Requirements
    Summary and Discussion

RECOMMENDATIONS AND NEXT STEPS
    Introduction
    Survey Recommendations
    Research Data Management Infrastructure
        Research Data Management Planning Tool
        Regional Research Data Storage Service
        CAUL/CBUA RDM Team
        Governance and Administration
        Sustainability

APPENDICES
    Appendix A: Acronyms and Glossary
    Appendix B: Bibliography
    Appendix C: CAUL/CBUA Research Data Management Survey Instrument
    Appendix D: CAUL/CBUA Survey Infographic


 

EXECUTIVE SUMMARY

The interim report of the CAUL/CBUA Research Data Working Group reflects the discussions of the Working Group for the last 18 months, which centered around the emerging interest in supporting research data management. A key part of this effort was the completion of a research data management survey of CAUL/CBUA members, providing a baseline for the discussion of possible next steps for the consortium and its members. This was augmented with updates on efforts of national initiatives and reports of activities from member institutions. These discussions led to a number of recommendations, which are summarized below and explored in more detail later in the document.

1. Research Data Management Survey
    a. That the survey results be posted in full on the CAUL/CBUA website.
    b. That the survey be completed on an annual or biennial basis.
    c. That the survey questions be evaluated and updated as necessary for each survey cycle.
    d. Employ a survey tool (such as Survey Monkey) which can provide a long-term framework for maintaining the survey.
    e. Offer a response category labeled “in progress” to allow institutions to indicate services that are in development, but not yet fully realized.
    f. Provide more opportunities for comments within the survey, to allow respondents to clarify or elaborate on answers to specific questions.
2. Research Data Management Planning Tool
    a. CAUL/CBUA endorse an RDMP tool, with an eye to the RDMP tool currently under development at the University of Alberta, CARL and Compute Canada.
    b. Locally deployed tools be able to export plans to the national repository, contributing to a national inventory of RDMPs.
3. Regional Research Data Storage Service
    a. CAUL/CBUA support the creation of the ARDR service, which would function on an “all-in” cost-sharing model and would provide data storage and services to all member institutions.
    b. The ARDR system provide a two-tiered service model, offering a basic service package as well as a value-added service package.
    c. The hardware infrastructure (i.e. servers, storage drives, etc.) be hosted in four locations: UPEI, UNB, Dalhousie, and MUN.


 

4. CAUL/CBUA RDM Team
    a. Establish a CAUL/CBUA Research Data Management Team.
    b. Establish a support network akin to the current Data Liberation Initiative (DLI) model.
5. ARDR Governance and Administration
    a. Adopt a model policy that CAUL/CBUA would endorse for the oversight of ARDR, and individual institutions would use as the basis for local policies.
    b. CAUL/CBUA institutions adopt a minimal RDM preservation commitment that says: “The University will steward the data for [Project X] for as long as is needed.”
    c. ARDR services and policies be crafted with an awareness of national RDM initiatives.
6. ARDR Sustainability
    a. Propose an initial 3-year financial investment from all CAUL/CBUA members to get the project started.
    b. CAUL/CBUA look for grant opportunities (ACOA, TC3+, etc.) to assist with the start-up costs.


BACKGROUND

History of the DPSC

Committee Formation

The CAUL/CBUA Digital Preservation and Stewardship Committee was first proposed at a Board of Directors meeting in October 2009. The terms of reference for the Committee were developed in early 2010 through meetings with Mark Leggott, Donna Bourne-Tyson, Lynne Murphy, Tanja Harrison, and Bruno Gnassi. The initial membership of the DPSC included: Donna Bourne-Tyson (Chair), Tanja Harrison, Karen Keiller, Gillian Byrne, Mark Leggott, Dawn Hooper, Alain Roberge, and Slavko Manojlovich.

Priorities

In May 2012, the DPSC recommended adopting a working-group approach to focus on specific activities, as forwarded by the Directors. At that time the following priorities were approved:

1. Develop and adopt principles, guidelines and an infrastructure capable of sustaining digital/data preservation and stewardship.

2. Develop initiatives to test and advance these principles and guidelines and build on infrastructure and capacities that may already exist as regards to digital/data preservation and stewardship.

3. Organize training and promote cooperation amongst CAUL/CBUA members to advance digital/data preservation and stewardship.

4. Foster a culture within CAUL/CBUA committed to the sound management and advancement of digital/data preservation and stewardship.

5. Liaise with other regional, national, and international bodies on behalf of CAUL/CBUA.

First Digital Preservation Survey

In November 2012, the DPSC compiled a list of priorities. One of the initial priorities was to distribute a survey “to assess the state of CAUL/CBUA members' current digital preservation (DP) activities” and further facilitate the work of the DPSC. The results of this survey were presented by Marc Truitt (DPSC Chair) at the May 2013 Board of Directors meeting. Some of the core findings included the following:

● For some members, responsibility for digital preservation exists outside of the library's core area of responsibility. Examples include the following: academic computing units (or IT departments), archives, and records management.

● Few institutions have written digital preservation policies and procedures.

● Common file types among institutions appear to be audiovisual files, PDFs, digital image files, word-processing files, licensed e-journal files, and institutional records.

● Dealing with obsolete external media is a common experience among members, especially with regard to reading files from obsolete media such as 5.25-inch disks. It was noted that smaller institutions might benefit from the sharing of expertise and knowledge of other libraries that have experience with file migration and playback. This is an opportunity for CAUL/CBUA to play a role in facilitating the sharing of information between institutions on how playback devices might be acquired, set up, and used appropriately.

● Online and disk-based (external media) appear to be common backup solutions employed by reporting institutions.

● At least one institution is interested in achieving the status of Trusted Digital Repository, either by its own efforts or through a CAUL/CBUA-sponsored initiative.

● Dalhousie, Memorial, UNB, and UPEI have established institutional programs for digital preservation.

● Given the diverse states of CAUL/CBUA members' individual digital preservation activities, there is a place for CAUL/CBUA to advance preservation activities regionally.

Establishment of Working Groups

In late 2013, Mark Leggott assumed the role of Chair for the DPSC. Membership included Lou Duggan, Mark Leggott, Erik Moore, Karen Keiller, Nicole Dixon, Creighton Barrett, David Mawhinney, and Roger Gillis. Discussions led to the development of five working groups, each with a lead from the DPSC, whose responsibility was to advance the goals of the group, develop group membership, and encourage a culture of learning through working group discussions.

The initial working groups and leads were as follows:

1. Digitization and Preservation Policies: Nicole Dixon (CBU)
2. Processing of Obsolete Media: David Mawhinney (MtA)
3. Research Data Working Group (Formerly Regional Cloud Storage Pilot): Mark Leggott (UPEI)
4. Regional TDR/TRAC Framework: Creighton Barrett (DAL) and Erik Moore (UNB)
5. Digitization/Preservation of Government Documents: Roger Gillis (MSVU)

The Digitization and Preservation Policies Working Group has since been integrated with the Regional TDR/TRAC Framework Working Group. The Processing of Obsolete Media Working Group released a report in 2013 and has since been disbanded. In early 2015 the issue of the preservation of Government Documents was also assumed by the CAUL/CBUA Collections Committee, so areas of overlap will be considered as this group advances. As a result, there are currently two active working groups in the DPSC.

Research Data Working Group

In February 2014, the Research Data Working Group began meeting on a regular basis under the following terms:

1. Document existing services and resources at CAUL/CBUA member institutions directed towards the stewardship of research data, and consider the development of a common instrument for determining faculty and graduate student interests around research data stewardship.

2. Review options and make recommendations for providing research data stewardship services for members, including expertise, storage, processing, training, and including the role of Liaison Librarians in this effort.

3. Ensure that recommendations facilitate institutional, regional, national, and international efforts to steward research data and that they intersect with existing and emerging mandates at all levels.

4. Provide information regarding opportunities for funding, including possible partnerships with CARL, CRKN, RDC (Research Data Canada) and other national and international initiatives.

This Interim Report is the outcome of these meetings and includes the working group’s recommendations on responding to the need for research data management.


Overview of National/Regional/Institutional Cyberinfrastructure

Note: This is a brief summary of the current state of Canadian research data stewardship.

The Canadian landscape for research data stewardship is in its infancy in comparison with other countries. In many respects, the U.S., UK, and Australia define the current state of the art in research data management. In Canada, individual institutions such as Simon Fraser University, the University of British Columbia, the University of Alberta, and the University of PEI are starting to develop policy-driven approaches to research data management. These efforts are an attempt to lay the groundwork for institutional services that represent best practices, in anticipation of the introduction of a research data management mandate for researchers receiving Tri-Council funding. This effort has been highlighted by Canada's Action Plan on Open Government 2014-16¹, which calls for the development and adoption of policies, guidelines and tools to support effective stewardship of scientific data.

In addition to these efforts there are two substantial national initiatives underway:

1. Leadership Council for National Infrastructure, Pilot Project
    ● A multi-jurisdictional effort with primary oversight by Research Data Canada (now funded by CANARIE) and participation from CUCCIO, CARL, Compute Canada and a number of nationally-funded research projects, including CBRAIN, CADC, and IPY.
    ● The Pilot Project group is working to develop an approach to building national cyberinfrastructure for the stewardship of research outputs.

2. CARL Portage Project
    ● A CARL-initiated project designed to identify current institutional best practices and a pilot framework in which interested parties can determine service and resource options for a national approach. This effort was originally referred to as the CARL ARC project, based on the first meeting location in Ottawa. As the discussion proceeded the project assumed the name Portage.

Early in 2015, the CARL ARC project team submitted a document to CARL Directors, including a set of recommendations for a national RDM support service and a business model for sustaining the initiative. The recommendations included in this document reflect the Working Group’s assumptions that the CARL Portage project will proceed as outlined in this proposal, and that a regional CAUL/CBUA approach to RDM needs to consider the broader national efforts and ways to best integrate with those efforts. [Note: A copy of the CARL-approved Portage proposal will be either included in this report or distributed to CAUL/CBUA Directors when it is available.]

1 Government of Canada. (2014). Canada’s Action Plan on Open Government 2014-16. http://open.canada.ca/en/content/canadas-action-plan-open-government-2014-16

Excerpt from the Portage Brief²:

The aim of the Portage network is to pool and expand existing expertise, services and infrastructure so that all academic researchers in Canada will have access to the support they need for research data management. The Portage network will have two major components:

1. A library-based distributed centre of expertise for research data management; and
2. A national preservation and discovery system for research data that will evolve and expand over time.

Distributed Centre of Expertise

RDM requires specialized knowledge and expertise, which many researchers do not have. The Portage centre of expertise will provide access to a comprehensive set of resources that point users to the most up-to-date, relevant and trusted sources about RDM. In addition, Portage will host a national web-based tool, to launch in early 2015 that will assist Canadian researchers in developing data management plans. Portage will also act as a forum for sharing expertise across the country in order to build institutional capacity. Areas of expertise will include: privacy, security, and confidentiality; skills and training; data management plans, data discovery, data curation and preservation.

National Preservation and Discovery System

Advice and support for researchers must be accompanied by viable technical solutions. To that end, Portage has also been working on a project to connect the various infrastructure and service components needed for a national preservation and discovery network. The project is being undertaken in close collaboration with Compute Canada, Research Data Canada, and some of the domain data centres to ensure that it will be both inclusive and interoperable.

The project will soon begin to ingest data into two sites that will provide long term preservation services. Once any problems have been addressed and workflows have been stabilized, the network will expand to include other repositories. The ultimate aim is to enable all interested universities to participate, whether or not they have their own local infrastructure, by coordinating shared repositories and services under a cost model that recognizes varying institutional investments and needs.

2 CARL. (2014, December 22). Portage: Supporting Canadian innovation through shared expertise and stewardship of research data. Retrieved from http://www.carl-abrc.ca/uploads/SCC/Portage-External-2-Dec-22-2014.pdf


 

SURVEY

Introduction

This section addresses the first priority for the RDWG: identifying gaps and requirements in research data stewardship within the CAUL/CBUA member institutions. In March 2014, DPSC RDWG members determined a survey would be the best tool for collecting reliable data and staff input from across the CAUL/CBUA community. A survey distributed by CARL was adapted with permission and distributed to Library Directors on April 13, 2014. CAUL/CBUA Directors were also asked to forward the survey to the appropriate staff within their institutions. Libraries that had previously responded to the CARL survey were encouraged to forward their results directly to the DPSC Chair, Mark Leggott. The survey focused on four aspects of research data services, with ten questions in each section:

● Collection Services
● User Services
● Access Services
● Preservation Services

Individual institutions were given four weeks to respond, and a reminder was sent to encourage participation. The survey met our goal of obtaining regional representation, with 16 out of 17 institutions responding. The DPSC RDWG administered the survey and compiled the data. Following a meeting on June 3, 2014 at the annual Atlantic Provinces Library Association (APLA) Conference, a need to confirm survey results with respondents became apparent, and respondents were contacted for clarification where needed. Following the February 2015 Directors meeting, members were given additional time to update their survey responses. The survey proved to be an effective tool, revealing discrepancies within the CAUL/CBUA environment related to research data stewardship and preservation.


 

Results

Survey results indicate that there are significant gaps in research data stewardship across the consortium.³

Collection Services

Collection services were described in the survey as “activities that specifically support the development, acquisition, management, description, and discovery of a collection of research data files”. Some key observations from the data are as follows:

● 68.8% of CAUL/CBUA respondents have a subscription with data providers such as DLI, DMTI, or ICPSR.

● 25% of respondents (DAL, MUN, SMU, and UNBSJ) supplement their data services with documentation such as user’s guides, data dictionaries, and variable lists.

● 25% of institutions (DAL, MUN, SMU, and UPEI) have metadata librarians or other specialists who advise on standards for content and technical metadata.

● DAL, MUN and UPEI are leaders in collection services and data. They are the only institutions polled that:
    ○ have a dedicated budget to purchase data files outside of subscription services;
    ○ maintain a collection of data files from local researchers;
    ○ maintain the infrastructure to manage local data file collection; and
    ○ produce standards-based metadata for research data.

● DAL is the only institution that is a member of a standards body for research data or metadata.

● DAL and UPEI are the only institutions with a written Collection Policy for research data.

User Services

User services were described in the survey as “activities that focus on supporting user communities by identifying their data needs, assisting them in preparing data management plans, selecting metadata standards and best practices, identifying existing data sources, and retrieving, manipulating, and transforming data”. Some key observations from the data are as follows:

● 68.8% of CAUL/CBUA respondents provide data reference services to help users find and select research data.

● 62.5% of CAUL/CBUA respondents advise and/or provide instruction on how to cite data sources.

3 CAUL/CBUA members requesting access to the detailed survey results should contact [email protected].


 

● 43.8% of CAUL/CBUA respondents promote a culture of data sharing and data reuse at their institutions through handouts, teaching, or participating in GIS Day or Open Access Week.

● 37.5% of CAUL/CBUA respondents offer services to reformat data for users to facilitate their use of data (e.g. converting files from SPSS to Microsoft Excel).

● 37.5% of CAUL/CBUA respondents offer services to transform data files for users (e.g. extracting data subsets, merging data files, or creating new variables).

● Two institutions, DAL and StFX, maintain a website that lists online research data management resources.

● DAL and UPEI maintain data curation profiles of their user communities.

● DAL and UPEI also:
    ○ provide research data management training for faculty and/or graduate students;
    ○ recommend or provide instruction on the use of online tools for research data management (e.g. Manta, DMPTool, etc.); and
    ○ assist researchers with preparing Data Management Plans.

Access Services

Access services were described in the survey as “activities dealing with support needed to provide users with access to data collections and resources, including data platforms, data linkage, data retrieval, and data tools”. Some key observations are as follows:

● Over half (56.25%) of CAUL/CBUA respondents provide access to metadata discovery tools beyond their OPAC (e.g. Nesstar, Dataverse, or MarkLogic servers).

● 43.75% provide access to online data access tools such as an FTP or a Dataverse server.

● 37.50% provide access to software for analyzing and visualizing research data.

● 37.50% of CAUL/CBUA respondents support a local website that describes data and contains links for downloading data.

● 31.25% provide access to online subsetting tools (e.g. Nesstar or SDA server).

● 31.25% of CAUL/CBUA respondents support a secure data enclave to provide research access to sensitive data.

● UPEI and MUN provide access and/or support to data cleaning, processing, or format translation tools (e.g. DataWrangler, Stat Transfer, or Google Refine).

● DAL is the only institution that links out to DataCite Canada.

● DAL is also the only institution that subscribes to the Data Citation Index through the Web of Knowledge platform.

● There are no CAUL/CBUA institutions currently connecting local research data files with their OPAC.
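
The percentages quoted in the survey results can be read back as institution counts. The short Python sketch below assumes every question was answered by all 16 responding institutions; that base and the function name `share` are illustrative assumptions, not part of the survey instrument.

```python
from fractions import Fraction

# Assumption: each percentage is a share of the 16 responding institutions.
RESPONDENTS = 16


def share(count, total=RESPONDENTS):
    """Return the percentage of respondents that `count` institutions represent."""
    return float(Fraction(count, total) * 100)


# Print the percentage for each count of institutions that appears in the results.
for count in (4, 5, 6, 7, 9, 10, 11):
    print(f"{count:>2} of {RESPONDENTS} institutions = {share(count):.2f}%")
```

Under that assumption, the 25% figure corresponds to the four institutions named in the Collection Services results, and 68.75% (reported as 68.8%) corresponds to 11 institutions.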


 

Preservation Services

Preservation services were described in the survey as “activities describing services to support the mid-term and long-term preservation of research data”. Key observations for preservation services are as follows:

● Four institutions (DAL, King’s College, MSVU, and UPEI) offer advice and help for researchers to locate an appropriate repository for their research data.

● DAL, UPEI, and MSVU assist researchers with the selection of appropriate data and metadata standards for data preservation.

● UPEI and King’s College are the only institutions to provide researchers with tools to submit their data and metadata for long-term preservation.

● UPEI and DAL are the only institutions to support and maintain preservation storage and management systems for the long-term preservation of data.

● UPEI is the sole institution to offer:
    ○ a research records retention policy that addresses the preservation and protection of research data assets;
    ○ support for a staging repository for researchers to deposit data for short-term storage and subsequent long-term deposit;
    ○ support and preparation of research data archival information packages for long-term preservation.

● None of the institutions polled maintain a registry of acceptable or recommended file types for research data and metadata.

● Additionally, none of the institutions polled have a formal data deposit agreement form for researchers to sign when they submit their data.


Gaps and Requirements

The results make clear that many CAUL/CBUA institutions have considerable room for improvement in research data management. The majority of institutions do provide access to research data through subscriptions, but even DAL and UPEI, with 90% and 73% affirmative responses respectively, have room to grow. The most obvious gaps are in technical services: preservation, data management plans, and secure places to store content all appear as clear gaps within the region. The results show that many of the responding institutions offer something in the way of subscription services, reference help, and literature or guides of some sort to aid researchers; however, the survey does little to address the fact that some of the CAUL/CBUA institutions may actually see little utility in providing research data services of any kind. Indeed, in their comments section, NSCAD noted:

“The nature of research data in visual arts is an area still being figured out: As of yet, the NSCAD Library does not have any services to support research data collection but it is a subject I will discuss with our institution’s Library Committee.”

Similarly, Université Sainte-Anne noted that they would contribute where resources allowed, but that their size prevented them from taking much of a step forward at this juncture.


Summary and Discussion

Research data stewardship is a fairly new field. Many schools in CAUL/CBUA are just beginning to gain momentum with general scholarly communications initiatives like Institutional Repositories. It is not surprising, then, that many institutions do not yet have well-developed research data services. The adapted CARL survey reveals that the region has a lot of room to grow. With the arrival of the Tri-Council Open Access policies⁴ it is a particularly relevant time to investigate how CAUL/CBUA members can best respond to this national mandate. This committee recommends a collaborative, regional response that intersects with national activities and that promotes and supports local efforts.

This survey strongly suggests that DAL and UPEI have important roles to play as regional leaders in research data stewardship. Their experience in research data management and preservation should, at least, provide some guidance for those institutions beginning to look at these issues.

It would be prudent, however, to note that the survey itself may not offer the best measures of awareness or action on the topic of research data stewardship. For example, many of the questions from both the “access” and “user” sections of the survey relate to providing links to existing data repositories, databases, and other such services. It is debatable whether linking to a resource is the same as providing access. Dataverse, for example, might not warrant a link on a library website if a liaison or scholarly communications librarian gets this information to researchers through other methods. We recognize that the survey may have missed additional ways that institutions are approaching issues surrounding research data stewardship.

It is worth noting that results for Dalhousie are more current, by as much as two years, than those of other institutions. Their responses were updated in February 2015. In order to best reflect the ongoing efforts of the CAUL/CBUA members, it would be beneficial to find a way to maintain a dynamic reflection of members' services in this area.

4 Government of Canada. (2015, February 27). Tri-Agency Open Access Policy on Publications. Retrieved from http://www.science.gc.ca/default.asp?lang=En&n=F6765465-1


 

RECOMMENDATIONS AND NEXT STEPS

Introduction

This section provides recommendations for the establishment and expansion of research data management services within the CAUL/CBUA consortium. The proposed expansion involves the establishment of an Atlantic Research Data Repository (ARDR), which would be jointly hosted by four CAUL/CBUA institutions, one in each province. The proposed repository would be made available to all CAUL/CBUA members. Specific recommendations regarding infrastructure and funding are outlined below. This section also contains recommendations regarding the distribution of the Research Data Stewardship Survey data, and the continuation of the survey.


 

Survey Recommendations

The survey was intended to provide a current picture of research data stewardship services offered by each CAUL/CBUA institution. The survey results can provide valuable information to each institution, aiding in goal setting and providing a sense of the regional capabilities for data stewardship. For this reason, we recommend that the survey results be posted in full on the CAUL/CBUA website. As data stewardship becomes increasingly necessary, it will be useful to track our regional and institutional capabilities in the coming years. Thus, we recommend that the survey be completed on an annual or biennial basis. The survey in its current form is not ideal, so we recommend that the survey questions be evaluated and updated as necessary for each survey cycle. While this may make some data points difficult to track year over year, it will allow us to eliminate or modify questions that become irrelevant as regional and technological capabilities grow. For the next cycle of the survey, we recommend employing a survey tool (such as Survey Monkey) which can provide a long term framework for maintaining the survey. A survey tool will allow for a better respondent experience, and facilitate results analysis. We also recommend offering a response category labeled “in progress” to allow institutions to indicate services that are in development, but not yet fully realized. Lastly, it would be useful to provide more opportunities for comments within the survey, to allow respondents to clarify or elaborate on answers to specific questions.

Research Data Management Infrastructure

CAUL/CBUA libraries have a great opportunity to assist researchers with data management; however, this assistance will require a robust infrastructure. The following sections outline the necessary components of such an infrastructure: an RDMP tool, data repository services (data storage, metadata creation, and search interface), a CAUL/CBUA RDM team, appropriate governance, and a funding model. The development of this project should dovetail with similar endeavors like Portage, benefiting from national infrastructure efforts.

Research Data Management Planning Tool

The TC3+ will soon mandate that all grant applicants provide a research data management plan (RDMP). We feel that our libraries should offer support in the creation of these plans. Tools for creating RDMPs are in development at a number of institutions in North America. We recommend that CAUL/CBUA endorse one of these tools, with an eye to the RDMP tool currently under development by the University of Alberta, CARL, and Compute Canada. Individual CAUL/CBUA institutions may also wish to deploy local instances of a similar tool in order to better reflect institutional practices. In these cases, we recommend that locally deployed tools be able to export plans to the national repository, contributing to a national inventory of RDMPs.

Providing an RDMP tool to our researchers will require resources (funding, staffing, etc.). As it develops, the Portage model may help to inform some of these resource decisions and may even provide some funding for the ARDR initiative, depending on how CAUL/CBUA decides to participate in this project. If CAUL/CBUA proceeds with a project similar to ARDR, then CAUL/CBUA libraries and staff would develop strong regional expertise, benefiting not only the region but national efforts as well.

Regional Research Data Storage Service

Whether or not CAUL/CBUA decides to provide a regional data storage solution will depend upon a desire to actively and collectively steer data management practices. National projects such as Portage are on the horizon, and some individual CAUL/CBUA institutions are providing strong RDM services. The establishment of an Atlantic Research Data Repository would allow regional capabilities to grow more quickly, and would help distribute the financial burden of these services across all member institutions.

Data storage is a key requirement of an RDM service, whether providing secure working storage or long-term preservation storage. Individual research projects will vary widely in the amount and type of storage they require, including the potential for encrypted storage for data containing private information, terabyte- or petabyte-scale storage, efficient data analysis tools, and more. Requirements common to all research projects include: secure and redundant storage that can ensure data integrity and preservation for the short and long term; descriptive and administrative metadata; and data publishing services.

One cost-effective approach to a sustainable data storage service would be for CAUL/CBUA institutions to collaborate on the provision and delivery of storage services. We recommend that CAUL/CBUA support the creation of the ARDR service, which would function on an “all-in” cost-sharing model and would provide data storage and services to all member institutions. The ARDR system would provide a two-tiered service model, offering a basic service package as well as a value-added service package. Member institutions would be free to choose the package that best suits their needs. The basic service package would include:

● creation and storage of a final dataset, defined by the Principal Investigator as one needing an appropriate level of accessibility (e.g. is part of a publication, or needs to be shared with collaborators);

● creation and vetting of a Dublin Core record for the dataset;

● creation of a VIVO/CASRAI (or similar) record to identify the institutional/researcher/funder context;

● minting of a DOI for the dataset;

● synchronization/integration of the dataset with national and/or domain-specific repositories;

● support for library staff and researchers in all aspects of research data management.
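To make the Dublin Core item in the basic package more concrete, the following is a minimal sketch of what a simple Dublin Core description of a deposited dataset might look like when serialized as XML. The wrapper element name `record`, the helper `build_dc_record`, and every field value are hypothetical placeholders chosen for illustration; the report does not specify a record layout, and the identifier shown is not a minted DOI.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return a <record> element with one dc:* child per (name, value) pair.

    The <record> wrapper is an assumption made for this sketch only.
    """
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Hypothetical values; none of these describe a real ARDR deposit.
example_fields = [
    ("title", "Example coastal erosion survey dataset"),
    ("creator", "Researcher, Example"),
    ("publisher", "Example member institution"),
    ("date", "2015-05-12"),
    ("type", "Dataset"),
    ("identifier", "doi:10.XXXX/example"),  # placeholder, not a minted DOI
    ("rights", "CC0"),
]

record = build_dc_record(example_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

In practice, the vetting step described above would presumably check such a record against whatever application profile the ARDR adopts before the dataset is published.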

The value-added service would include:

● bulk storage of active data;

● access to a Dropbox-style service (but one with provincially provisioned storage) that allows a researcher to easily synchronize active data from a local desktop or system to a central managed service;

● other services deemed useful to researchers and consistent with member institutions’ practices.

The value-added service would come at an additional cost to institutions that choose this option. For additional data storage, for example, costs would vary depending on the amount of storage required.

The RDWG proposes that the hardware infrastructure (i.e. servers, storage drives, etc.) be hosted in four locations: UPEI, UNB, Dalhousie, and MUN. These four sites would serve as regional nodes and would provide back-ups for one another, with all data synchronized between sites for added redundancy. Given the costs of setting up and maintaining this hardware, our proposed funding model would funnel a significant portion of the member fees to the four host institutions.
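One common way to confirm that synchronized copies remain identical is to compare checksums across the nodes. The sketch below illustrates that idea only; it is not a description of how the ARDR would be implemented. The function names, the in-memory byte strings standing in for stored files, and the majority-vote rule are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a replica's contents as hex."""
    return hashlib.sha256(data).hexdigest()


def find_divergent_nodes(replicas: dict[str, bytes]) -> list[str]:
    """Return the nodes whose copy differs from the most common checksum.

    Assumes a clear majority of nodes hold the correct copy.
    """
    digests = {node: sha256_hex(blob) for node, blob in replicas.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(node for node, d in digests.items() if d != majority_digest)


# Hypothetical replicas of one dataset file held at the four proposed nodes.
original = b"site,year,value\nA,2014,3.2\n"
replicas = {
    "UPEI": original,
    "UNB": b"site,year,value\nA,2014,3.9\n",  # simulated corruption
    "Dalhousie": original,
    "MUN": original,
}

divergent = find_divergent_nodes(replicas)
print(divergent)  # → ['UNB']
```

A node flagged in this way could then be repaired from any of the agreeing copies, which is the practical benefit of holding the data at more than two sites.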

CAUL/CBUA RDM Team

A vital component of the ARDR proposal is the development and sharing of expertise in the area of research data management. CAUL/CBUA members are already cultivating this expertise in their own institutions, but experts within separate institutions can benefit from formalized information sharing within CAUL/CBUA. We recommend the establishment of a CAUL/CBUA Research Data Management Team. This team would consist primarily of the CAUL/CBUA Manager and one or more additional CAUL/CBUA employees with technical expertise in data management. Funding for the additional position(s) could come from CAUL/CBUA membership fees, individual research projects, or external subsidies from larger research data management programs like Portage.

This team would offer RDM support to member institutions and would be accessible to faculty and staff at all of them. The RDM Team would provide annual training workshops in matters relating to RDM, and would offer ongoing support via a mechanism such as instant messaging or a ticketing system similar to those commonly used in IT support. The RDM Team would report to the CAUL/CBUA Directors via the DPSC.

In addition to the CAUL/CBUA RDM Team, we recommend establishing a support network akin to the current Data Liberation Initiative (DLI) model. Each member institution would designate an RDM representative who would liaise with the RDM Team as well as with representatives from other member institutions. The RDM Team would provide a mailing list where RDM representatives could share information, post questions, and offer advice to one another.

ARDR Governance and Administration

The ARDR will require a strong RDM policy framework to guide implementation and development. This framework will have to account for institutional autonomy, but CAUL/CBUA could offer model policies for institutions to consider as they develop their internal services. All of the following policy proposals are aspirational, and full implementation will take several years. In developing model policy, CAUL/CBUA should examine and adapt policies in place in other research networks. Specifically, we recommend adopting a model policy that CAUL/CBUA would endorse for the oversight of ARDR, and that individual institutions would use as the basis for local policies. The example used here is based on the University of Edinburgh’s RDM policy.⁵

5 University of Edinburgh, The. (2015, February 5). Research data management policy. Retrieved from http://www.ed.ac.uk/schools-departments/information-services/about/policies-and-regulations/research-data-policy

The aforementioned policy contains the following ten guiding points, slightly modified here to better suit CAUL/CBUA:

1. Research data will be managed to the highest standards throughout the data lifecycle as part of the institution’s commitment to research excellence.

2. Responsibility for research data management through a sound research data management plan during any research project or programme lies primarily with Principal Investigators (PIs).

3. All new research proposals [from date of adoption] must include research data management plans or protocols that explicitly address data capture, management, integrity, confidentiality, retention, sharing, and publication.

4. CAUL/CBUA will arrange training, support, advice, and where appropriate guidelines and templates for research data management and research data management plans.

5. CAUL/CBUA will facilitate the provision of mechanisms and services for storage, backup, registration, deposit and retention of research data assets in support of current and future access, during and after completion of research projects.

6. Any data which are retained elsewhere, for example in an international data service or domain repository, should be registered with the institution.

7. Research data management plans must ensure that research data are available for access and re-use where appropriate and under appropriate safeguards. If possible, data should be made accessible with a statement like this:

a. To the extent possible under law, the authors have waived all copyright and related or neighbouring rights to this data. CC0/Open Data.

8. The legitimate interests of the subjects of research data must be protected.

9. Research data of future historical interest, and all research data that represent records of a member institution, including data that substantiate research findings, will be offered and assessed for deposit and retention in an appropriate national or international data service or domain repository, or a member institution’s repository.

10. Exclusive rights to reuse or publish research data should not be handed over to commercial publishers or agents without retaining the rights to make the data openly available for re-use, unless this is a condition of funding.

Another important policy statement, not included in the model policy above, pertains to the duration of data retention. We recommend that CAUL/CBUA institutions adopt a minimal RDM preservation commitment that says: “The University will steward the data for [Project X] for as long as is needed”. This statement will allow for flexible retention timelines that can be adjusted in consultation with the PI. Given the active discourse around RDM in Canada and further afield, we recommend that all ARDR services and policies be crafted with an awareness of national RDM initiatives. A well-designed ARDR should be poised to integrate with CARL’s Portage and the Compute Canada cyberinfrastructure.

ARDR Sustainability

Should the CAUL/CBUA Directors decide to pursue the ARDR project, a sustainable funding model will have to be established. We propose an initial 3-year financial investment from all CAUL/CBUA members to get the project started. After the initial 3-year period, research grant money could be leveraged to provide ongoing funding. According to the Association of Atlantic Universities (AAU), Atlantic universities generate $500 million in research funding annually, approximately 75% of which (about $375 million) is TC3+ funding. If we were to assume a TC3+ mandate for research data stewardship, even 1% of the TC3+ grant money ($3,750,000) could keep the ARDR sustainably funded, negating the need for further financial contributions from CAUL/CBUA members. Despite this potential funding base, we recommend that CAUL/CBUA look for grant opportunities (ACOA, TC3+, etc.) to assist with the start-up costs. While not prescriptive, a basic funding proposal is included below to suggest one possible approach to funding ARDR in the near term.

1) Cost Model
   a) CAUL/CBUA RDM staff person
      i) Annual requirements: FT equivalent resource ($90,000), travel funds ($10,000)
      ii) One-time costs: Laptop and software ($5,000)
   b) Infrastructure
      i) Support for centralized ARDR repository: $60,000
   c) Other: $40,000
   d) Total Annual: $200,000
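The arithmetic behind the cost model and the sustainability estimate can be checked with a short calculation. This is a sketch using only the figures stated in this report; the variable names are illustrative, and treating the one-time laptop and software cost as falling in the first year is an assumption rather than something the proposal states.

```python
# Annual cost items from the cost model (dollars).
ANNUAL_COSTS = {
    "staff_fte": 90_000,
    "travel": 10_000,
    "infrastructure": 60_000,
    "other": 40_000,
}
# One-time item; assumed here to be incurred in year one.
ONE_TIME_COSTS = {"laptop_and_software": 5_000}

annual_total = sum(ANNUAL_COSTS.values())
first_year_total = annual_total + sum(ONE_TIME_COSTS.values())

# Sustainability estimate: 1% of the TC3+ share of regional research funding.
regional_research_funding = 500_000_000
tc3_share = regional_research_funding * 75 // 100
one_percent_of_tc3 = tc3_share // 100

print(annual_total)        # → 200000
print(first_year_total)    # → 205000
print(one_percent_of_tc3)  # → 3750000
```

On these figures, 1% of TC3+ funding would be many times the stated annual cost, which is the basis for the claim that grant-derived funding could sustain the service after the initial three years.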

2) Funding Model
   a) Core Services (multiple options)
      i) Using current CAUL/CBUA membership model, member fees for the initial 3 years would be as follows:

| Institution | Fee |
|---|---|
| Acadia University | $10,600 |
| Atlantic School of Theology | $200 |
| Cape Breton University | $7,200 |
| Dalhousie University | $44,100 |
| Holland College | $5,200 |
| Memorial University of Newfoundland | $42,120 |
| Mount Allison University | $6,380 |
| Mount Saint Vincent University | $7,160 |
| NSCAD University | $1,860 |
| NSCC | $6,380 |
| Saint Mary’s University | $17,120 |
| St. Francis Xavier University | $11,400 |
| Université de Moncton | $13,160 |
| Université Sainte-Anne | $1,000 |
| University of King’s College | $3,000 |
| University of New Brunswick Fredericton | $18,860 |
| University of New Brunswick Saint John | $5,580 |
| University of Prince Edward Island | $10,160 |

      ii) Tiered membership model (variable cost)
      iii) Equal costs per institution all-in ($11,000 per year for 3 years)
      iv) Equal costs per institution opt-in (variable cost)

b) Revenue Model

This model assumes that those institutions providing the shared services (hosting hardware/software infrastructure and providing the staff to maintain said infrastructure) would split the ARDR infrastructure funds to help offset the costs of local services. Using a 4-institution model in this example, the revenue sharing would provide $15,000 annually to each of the 4 host institutions (the $60,000 infrastructure allocation divided four ways).
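The funding options above can be compared against the $200,000 annual cost with a short calculation. The sketch below uses only the fees and amounts listed in this section; the variable names are illustrative, and the assumption that all 18 listed institutions participate in the equal-cost option is ours.

```python
# Proposed annual fees under the current membership model (dollars).
MEMBER_FEES = {
    "Acadia University": 10_600,
    "Atlantic School of Theology": 200,
    "Cape Breton University": 7_200,
    "Dalhousie University": 44_100,
    "Holland College": 5_200,
    "Memorial University of Newfoundland": 42_120,
    "Mount Allison University": 6_380,
    "Mount Saint Vincent University": 7_160,
    "NSCAD University": 1_860,
    "NSCC": 6_380,
    "Saint Mary’s University": 17_120,
    "St. Francis Xavier University": 11_400,
    "Université de Moncton": 13_160,
    "Université Sainte-Anne": 1_000,
    "University of King’s College": 3_000,
    "University of New Brunswick Fredericton": 18_860,
    "University of New Brunswick Saint John": 5_580,
    "University of Prince Edward Island": 10_160,
}

ANNUAL_COST = 200_000
INFRASTRUCTURE_FUNDS = 60_000
HOST_INSTITUTIONS = 4

membership_total = sum(MEMBER_FEES.values())
equal_share_total = 11_000 * len(MEMBER_FEES)  # assumes all listed members join
per_host_revenue = INFRASTRUCTURE_FUNDS // HOST_INSTITUTIONS

print(membership_total)                 # → 211480
print(membership_total - ANNUAL_COST)   # → 11480
print(equal_share_total)                # → 198000
print(per_host_revenue)                 # → 15000
```

On these numbers the membership-based fees slightly exceed the stated annual cost, while the equal-cost option at $11,000 per institution comes in slightly under it.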

A firm commitment to this ARDR vision from CAUL/CBUA members will allow CAUL/CBUA to play a significant and active role in the guidance and development of research data management in the region and at a national level. A robust ARDR could become an important component of Portage and provide a Canadian model for regional RDM. The proposed 3-year investment will help member institutions develop the necessary expertise to stand at the forefront of research data management and allow Atlantic Canadian researchers to continue to excel at national and international levels.

APPENDIX A: Glossary and Acronyms

Access Services: Activities dealing with support needed to provide users with access to data collections and resources, including data platforms, data linkage, data retrieval, and data tools.

Aggregate data: The organization of statistics into a data structure, to store in a database or in a data file.

ARC: Advanced Research Computing. A project initiated by the Canadian Association of Research Libraries (CARL), aiming to continue work on research data management in Canada.

Born Analog: Information that was originally created in a non-digital format and has been digitized.

Born Digital: A digital object that has never had an analog form.

CADC: Canadian Astronomy Data Centre

CARL: Canadian Association of Research Libraries

CASRAI: Consortia Advancing Standards in Research Administration Information

CBRAIN: Web-based software that allows neuroimaging researchers to perform analyses on data by connecting to High-Performance Computing facilities.

CFI: Canada Foundation for Innovation

CIHR: Canadian Institutes of Health Research

Collection Services: Activities that specifically support the development, acquisition, management, description, and discovery of a collection of research data files.

Compute Canada: Compute Canada deploys state-of-the-art advanced research computing (ARC) systems, storage and software in partnership with regional organizations ACENET, Calcul Quebec, Compute Ontario and WestGrid.

CRKN: Canadian Research Knowledge Network

CUCCIO: Canadian University Council of Chief Information Officers

Dark Archive: An archive that does not grant public access.

Data: Facts, ideas, or discrete pieces of information, especially when in the form originally collected and unanalyzed.

Data Management Plan: A data management plan is a formal document that outlines what you will do with your data during and after a research project.⁶

DC/QDC: Dublin Core / Qualified Dublin Core. A standard for metadata description.

Digital object: A representation of information in digital form.

Digital Preservation: The series of management policies and activities necessary to ensure the enduring usability, authenticity, discoverability and accessibility of content over the very long term. The key goals of digital preservation include usability, authenticity, discoverability, and accessibility.⁷

6 DMP Tool, Data Management General Guidance. Web. Accessed March 2015. https://dmptool.org/dm_guidance

7 Portico. Web. Accessed March 2015. http://www.portico.org/digital-preservation/glossary

DILC: Digital Infrastructure Leadership Council. DILC acts as a forum to discuss, develop, and coordinate Canada’s digital infrastructure networks.

DLI: Data Liberation Initiative.

DMP: Data Management Plan.

DMTI: DMTI Spatial. A provider of digital mapping data, location based data, geocoding, routing and GIS software.

DPSC: Digital Preservation and Stewardship Committee. A committee of the Council of Atlantic University Libraries.

DRM: Digital Rights Management

GIS: Geographic Information Systems (or Science). Software that allows the visualization and analysis of spatially referenced data.

GIS Day: A grassroots educational event promoting and showcasing the uses of GIS.

ICPSR: Inter-university Consortium for Political and Social Research

IDSE: Integrated Digital Scholarship Ecosystem. A CRKN initiative aiming to provide guidelines for digital scholarship in a Canadian research context.

IPY: International Polar Year Project

ISO: International Organization for Standardization

LCDI: Leadership Council for Digital Infrastructure

Member Institutions: Also referred to as Members. The post-secondary libraries belonging to the Council of Atlantic University Libraries.

Metadata: Data that describes information about digital objects.

Descriptive/Bibliographic Metadata: Information used to search and locate an object, such as title, author, subjects, keywords, and publisher.

Technical Metadata: Information about aspects of the object related to its file format or the original software used to create the file.

Administrative Metadata: Information needed to help manage the digital object, such as copyright and preservation information.

Structural Metadata: Information on how the digital object is organized, including the pages, chapters, and indexes.

Methodology: The procedure(s) used to collect information or research, which explains the scope of the study, including factors such as sample selection, data sources, and disclosure.

METS: Metadata Encoding and Transmission Standard. A framework for describing metadata.

Migration: Process of changing a file format.

NSERC: Natural Sciences and Engineering Research Council

OAIS: Open Archival Information System. Archival framework developed by the Consultative Committee for Space Data Systems (CCSDS).

Obsolete Format/Technology: Hardware or software that is no longer widely used.

OPAC: Online Public Access Catalogue

Open Archival Information System (OAIS): An archive that meets a set of responsibilities, as defined in the OAIS Reference Model.

Preservation Description Information (PDI): The information necessary for preservation purposes, such as provenance, reference, context, and access rights information.

Preservation Services: Activities describing services to support the mid-term and long-term preservation of research data.

Provenance Information: Information that documents the history of an object, including its origin, changes that may have occurred over the course of its life cycle, and current custody. Digital Provenance is information regarding the origin of a digital object.

RDC: Research Data Canada

RDM: Research Data Management

RDMPT: Research Data Management Planning Tool

RDWG: Research Data Working Group. A Working Group under the CAUL/CBUA DPSC.

Research Data Stewardship: The management and care of research data.

Render: To process a digital object, in order to view, listen to, or interact with the content.

Repository: An area designated for storing and maintaining items. Digital repositories house digital objects.

Research Data Management Planning: A plan outlining the storage, maintenance, management, and policies relating to research data.

SOA: Service Oriented Architecture

SPSS: A statistical analysis package with data management functions.

SSHRC: Social Sciences and Humanities Research Council

Succession Plan: A procedure outlining how and when to transfer the management, ownership and/or control of holdings.

TC3+: Tri-Agency Council. Comprised of the Social Sciences and Humanities Research Council (SSHRC), the Natural Sciences and Engineering Research Council (NSERC), and the Canadian Institutes of Health Research (CIHR), in collaboration with the Canada Foundation for Innovation (CFI) and with Genome Canada.

TDR: Trustworthy Digital Repository

TRAC: Trustworthy Repositories Audit and Certification

User Services: Activities that focus on supporting user communities by identifying their data needs, assisting them in preparing data management plans, selecting metadata standards and best practices, identifying existing data sources, and retrieving, manipulating, and transforming data.

VIVO: An open-source software system, a network of investigators and institutions, and an open information representation model for scholarship. VIVO builds on work done over the past nine years at Cornell University and supports the discovery of researchers by representing data about them and their activities, including publications, awards, presentations and partners. Support for researchers using VIVO is often provided by librarians at research institutions.⁸

Workflow: The formalization of the process metadata, which includes a description of the researcher's method. It identifies the data inputs, transformations, and analytical steps used to achieve the final data output.⁹

8 University of Nebraska Medical Center. Data Management. Web. Accessed March 2015. http://unmc.libguides.com/content.php?pid=525776&sid=4325759

9 US Geological Survey. (2015, January 16). USGS Data Management. Retrieved from http://www.usgs.gov/datamanagement/describe/capture.php

APPENDIX B: Bibliography

Baker, K. S., & Yarmey, L. (2009, October 15). Data stewardship: Environmental data curation and a web-of-repositories. International Journal of Digital Curation, 4, 2, 12-27. DOI: 10.2218/ijdc.v4i2.90

Berman, F. (2008, December 01). Got data?: A guide to data preservation in the information age. Communications of the ACM, 51, 12, 50-56. DOI: 10.1145/1409360.1409376

CARL. (2014, December 22). Portage: Supporting Canadian innovation through shared expertise and stewardship of research data. Retrieved from http://www.carl-abrc.ca/uploads/SCC/Portage-External-2-Dec-22-2014.pdf

CARL. (2013, December 12). CARL’s response to the consultation document Capitalizing on Big Data: Toward a Policy Framework for Advancing Digital Scholarship in Canada. Retrieved from http://www.carl-abrc.ca/uploads/SCC/CARL%20Big%20Data%20Consultation%20Response%20Dec%2012%202013.pdf

Delserone, L. M. (2009, March 27). At the watershed: Preparing for research data management and stewardship at the University of Minnesota Libraries. Library Trends, 57, 2, 202-210. DOI: 10.1353/lib.0.0032

Government of Canada. (2015, February 27). Tri-Agency Open Access Policy on Publications. Retrieved from http://www.science.gc.ca/default.asp?lang=En&n=F6765465-1

Government of Canada. (2014). Canada’s Action Plan on Open Government 2014-16. Retrieved from http://open.canada.ca/en/content/canadas-action-plan-open-government-2014-16

Hedstrom, M. (1998, January 01). Digital preservation: A time bomb for digital libraries. Computers and the Humanities, 31, 3, 189-202. Retrieved from http://www.uky.edu/~kiernan/DL/hedstrom.html

Heery, R., & Anderson, S. (2005). Digital repositories review. University of Bath. Retrieved from http://opus.bath.ac.uk/23566/2/digital-repositories-review-2005.pdf

Research Data Strategy Working Group. (2008). Stewardship of research data in Canada: A gap analysis. Retrieved from http://rds-sdr.cisti-icist.nrc-cnrc.gc.ca/eng/reports/2008_gap_analysis.html

Rosenbaum, S. (2010, October). Data governance and stewardship: Designing data stewardship entities and advancing data access. Health Services Research, 45, 5, 1442-55. DOI: 10.1111/j.1475-6773.2010.01140.x

Shearer, K., & Canadian Association of Research Libraries. (2009). Research data: Unseen opportunities. Ottawa, Ont: Canadian Association of Research Libraries. Retrieved from http://carl-abrc.ca/uploads/pdfs/data_mgt_toolkit.pdf

SSHRC, NSERC, CIHR, and CFI. (2013, October 16). Capitalizing on big data: Toward a policy framework for advancing digital scholarship in Canada. Retrieved from http://www.sshrc-crsh.gc.ca/about-au_sujet/publications/digital_scholarship_consultation_e.pdf

University of Edinburgh, The. (2015, February 5). Research data management policy. Retrieved from http://www.ed.ac.uk/schools-departments/information-services/about/policies-and-regulations/research-data-policy

APPENDIX C: CAUL/CBUA Research Data Management Survey Instrument

Objective

The purpose of this exercise is to benchmark your Library’s current involvement in data management services. This is not a comprehensive list of activities that a Library could undertake in providing data management services, but rather a sample of items that we feel is representative of such activities. This survey is shamelessly copied from the CARL instrument with permission. If your institution responded to the CARL survey, please feel free to forward that response rather than filling this out again, as they are essentially the same document.

Please read each of the ten items under the four service areas and highlight the items that your Library currently carries out or supports. When completed, add up the number of highlighted items within each service area and then determine the total for all service areas. Use the Score Sheet below to record these sums. Please also feel free to add comments to any item where appropriate. We are especially interested in details that can help us start an inventory of the services and resources CAUL members provide for research data management.
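The scoring procedure described above amounts to counting highlighted items per service area and summing the four counts. A minimal sketch of that tally follows; the function name, the data layout (sets of highlighted item numbers keyed by service area), and the sample responses are all hypothetical and are not taken from any institution's actual submission.

```python
SERVICE_AREAS = ("Collection", "User", "Access", "Preservation")
ITEMS_PER_AREA = 10  # the instrument lists ten items per service area


def score_sheet(highlighted: dict[str, set[int]]) -> dict[str, int]:
    """Count highlighted items per service area and add an overall total."""
    scores = {}
    for area in SERVICE_AREAS:
        items = highlighted.get(area, set())
        invalid = sorted(i for i in items if not 1 <= i <= ITEMS_PER_AREA)
        if invalid:
            raise ValueError(f"{area}: item numbers out of range: {invalid}")
        scores[area] = len(items)
    scores["Total"] = sum(scores[area] for area in SERVICE_AREAS)
    return scores


# Hypothetical responses from one library (item numbers highlighted per area).
sample_response = {
    "Collection": {2, 4, 6},
    "User": {3, 5, 7, 8},
    "Access": {2, 6},
    "Preservation": {3},
}

scores = score_sheet(sample_response)
print(scores)
# → {'Collection': 3, 'User': 4, 'Access': 2, 'Preservation': 1, 'Total': 10}
```

Keeping the highlighted item numbers, rather than only the sums, would also make it easier to compare individual services across institutions or survey cycles.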

The DPSC Research Data Working Group will discuss the results at their meeting at APLA in June. In order to facilitate that review we would ask that you respond by filling out this document by May 16. If you have any questions please contact Mark Leggott - 902-566-0460, [email protected].

A glossary is provided to clarify the meaning of terms used.

Collection Services

These are activities that specifically support the development, acquisition, management, description, and discovery of a collection of research data files.

1. Do you have a collection policy for research data?

2. Do you have subscriptions with data providers (e.g. DLI, DMTI, or ICPSR)?

3. Do you have a dedicated budget to purchase data files outside of subscription services?

4. Do you maintain a collection of data files from local researchers?

5. Do you catalogue local research data files in your OPAC?

6. Do you maintain infrastructure to manage a local data file collection (e.g. a digital assets management system)?

7. Do you produce data documentation to enhance your data collection (e.g. user’s guides, data dictionaries, variable lists, etc.)?

8. Do you have metadata librarians or specialists who advise on standards for content and technical metadata?

9. Do you produce standards-based metadata for research data?

10. Is your department or unit a member of a standards body for research data or metadata?

User Services

These are activities that focus on supporting user communities by identifying their data needs, assisting them in preparing data management plans, selecting metadata standards and best practices, identifying existing data sources, and retrieving, manipulating, and transforming data.

1. Do you collect and maintain data curation profiles of your user communities?

2. Do you conduct activities that promote a culture of data sharing and data reuse at your institution, e.g., through handouts, teaching, or participating in GIS Day or Open Access Week?

3. Do you provide research data management training for faculty or graduate students?

4. Do you recommend or provide instruction on the use of online tools for research data management (e.g. Mantra, DMPTool, etc.)?

5. Do you assist researchers with preparing Data Management Plans?

6. Do you maintain a website that lists online research data management resources?

7. Do you advise on or provide instruction on how to cite data sources?

8. Do you provide data reference services to help users find and select research data?

9. Do you reformat data for users to facilitate their use of data (e.g. convert from SPSS to Excel)?

10. Do you transform data files for users (e.g. extract data subsets, merge data files, or create new variables)?

Access Services

This set of activities deals with the support needed to provide users with access to data collections and resources, including data platforms, data linkage, data retrieval, and data tools.

1. Do your OPAC records provide links to local research data files?
2. Do you support a local website that describes data and contains links for downloading data?
3. Do you provide a link to DataCite Canada from your local website?
4. Do you subscribe to the Data Citation Index through the Web of Knowledge platform?
5. Do you provide metadata discovery tools beyond an OPAC (e.g. Nesstar, DataVerse, or MarkLogic server)?
6. Do you provide online data access tools (e.g. FTP or DataVerse server)?
7. Do you provide access to online data subsetting tools (e.g. Nesstar or SDA server)?
8. Do you provide access and support to data cleaning, processing, or format translation tools (e.g. DataWrangler, Stat Transfer, or Google Refine)?
9. Do you provide access to software for analyzing and visualizing research data?
10. Do you support a secure data enclave to provide research access to sensitive data?

Preservation Services

These activities describe services to support the mid-term and long-term preservation of research data.

1. Does the University have a research records retention policy that addresses the preservation and protection of research data assets?
2. Does the library have a mandate to preserve research data?
3. Do you advise on or help researchers locate an appropriate repository for their research data?
4. Does the library have a formal data deposit agreement form for researchers to sign when they submit their data?
5. Do you assist researchers with the selection of appropriate data and metadata standards for the preservation of data?
6. Do you maintain a registry of acceptable or recommended file types for research data and metadata?
7. Does your library support a staging repository for researchers to deposit data for short-term keeping and subsequent long-term deposit?
8. Does your library provide researchers with tools to submit their data and metadata for long-term preservation?
9. Does your library support and prepare research data archival information packages for long-term preservation?
10. Does your library support and maintain preservation storage and management systems for the long-term preservation of data?

Score Sheet

Function                  Number of Activities Circled
Collection Services
User Services
Access Services
Preservation Services
TOTAL

General Comments

Please record any general comments you wish us to consider here.

Survey Glossary

Archival Information Packages: A concept from the OAIS Reference Model describing the package of digital objects organized, documented, and managed in a long-term preservation environment.

Data Citation Index: A Thomson Reuters database linking data files in repositories with published literature that cites data.

DataCite Canada: An online data registry service provided by the National Science Library of the National Research Council to assign digital object identifiers (DOIs), i.e. persistent and unique identifiers, to data files.

Data Curation Profiles: Narrative-based methodology for describing research data from individual or team research projects. The Purdue Data Curation Profile is one approach to documenting researchers' data management activities, their data holdings and their data management practices. The Digital Curation Centre Data Asset Framework is another method.

Data Deposit Agreement: A document specifying the terms and responsibilities of the researcher depositing her or his data with a repository and the terms and responsibilities of the repository in disseminating the data.

Data Dictionary: Supporting data documentation for a data file that identifies variable names and labels, origin of the variable, values and labeling, missing data codes, record layout, and other related information.

Data Enclave: A secure facility for analyzing sensitive data. Services commonly associated with a data enclave include restricted and authenticated access to the facility and disclosure approval for analysis results to be removed from the facility.

Data Management Plans: A formal document outlining how a researcher plans to handle her or his data both during research and after a project is completed.

Data Subsets: Customized extractions from a complete data file consisting of selected cases (observations) or variables.

DataVerse: A virtual data collection system for managing and retrieving data files.

DataWrangler: A software product for interactively cleaning and transforming data.

DLI: Data Liberation Initiative, a subscription program between Statistics Canada and post-secondary institutions providing access to all standard data products and spatial data.

DMPTool: An online tool developed by the California Digital Library to produce data management plans.

DMTI: A spatial data collection.

Excel: Microsoft’s spreadsheet program distributed as part of the Microsoft Office Suite.

FTP: A file transfer service based on the file transfer protocol. SFTP (secure file transfer protocol) has tended to replace earlier FTP services.

GIS: Geographic Information Systems (or Science). Software that allows the visualization and analysis of spatially referenced data.

GIS Day: A grassroots educational event promoting and showcasing the uses of GIS.

Google Refine: A Google cloud-based tool for editing messy data, transforming it to other formats, and providing access to the data through web services.

ICPSR: Inter-university Consortium for Political and Social Research, a large membership-based data repository for social science data.

Mantra: An online instructional course developed and maintained by EDINA at the University of Edinburgh based on best practices in research data management from three disciplines: social science, clinical psychology, and geoscience.

OPAC: Online Public Access Catalogue.

Open Access Week: A grassroots educational event promoting Open Access publishing.

MarkLogic: A commercial database system capable of indexing structured, semi-structured, and unstructured digital content.

Metadata: Descriptive information about other digital objects. Some metadata are based on a standard, while other metadata are based on local convention.

Nesstar: A Web-based service for data discovery and dissemination.

Sensitive Data: A data file containing information that could easily disclose the identity or location of an observation within a data file. For example, the names, street addresses, or phone numbers of individuals in a file make the data sensitive. The location of nesting grounds for an endangered species in a data file is also sensitive data.

SDA: A set of Web-based programs for the documentation and analysis of survey data.

SPSS: A statistical analysis package with data management functions.

Staging Repository: A service for organizing and submitting research data for a period of time to provide immediate access to the data. A staging repository may work with a data repository supporting long-term preservation services and have arrangements to structure the data it holds for submission to the long-term preservation repository.

Stat Transfer: A program for converting data between the formats of popular software systems.

User Communities: The groups of data users sharing a common background, such as discipline, data source (e.g., all Census users), or authorization category (e.g., all graduate students).

User’s Guide: Supporting data documentation that provides a description of the study or program under which the data were produced, the study design, sampling methodology, data collection and editing process, weighting procedures, and other related information.


Appendix D: CAUL/CBUA Survey Infographic


CAUL/CBUA RESEARCH DATA STEWARDSHIP SERVICES SURVEY: RESULTS

The CAUL/CBUA Digital Preservation Stewardship Committee (DPSC) Research Data Working Group (RDWG) was created in Winter 2014 for the purpose of conducting research related to Research Data Stewardship in the Atlantic Region. DPSC RDWG members determined that a survey would be the best tool for collecting reliable data and staff input from across the CAUL/CBUA community. A survey distributed by CARL was adapted with permission and distributed to University Directors on April 13, 2014.

16/17 Institutions
Responded to the survey.

CAUL/CBUA Members

The Survey

The survey itself focused primarily on four aspects of research data stewardship services, with 10 questions in each section:

Collection Services
“Activities that specifically support the development, acquisition, management, description, and discovery of a collection of research data files”.

User Services
“Activities that focus on supporting user communities by identifying their data needs, assisting them in preparing data management plans, selecting metadata standards and best practices, identifying existing data sources, and retrieving, manipulating, and transforming data”.

Access Services
“Activities dealing with support needed to provide users with access to data collections and resources, including data platforms, data linkage, data retrieval, and data tools”.

Preservation Services
“Activities describing services to support the mid-term and long-term preservation of research data”.

Results at a Glance

Results for types of services offered at CAUL/CBUA institutions were quite varied. Most institutions were able to provide at least token Collections, User, or Access services. However, even the highest scoring institution in the survey, UPEI, answered in the negative for 28% of the questions. Very few institutions are well served in terms of Preservation services. Some institutions responded as having zero services in place at all, and only one failed to respond in general.

The following graph is a representation of the total percentage of "yes" answers on the survey against the total number of questions for each institution that had at least one affirmative response.

[Graph: % of affirmative answers for each section of the survey. Series: Collections Services, User Services, Access Services, Preservation Services, Total. Institutions: ACA, MSVU, NSCC, DAL, SMU, STFX, UdeM, UNB, UNBSJ, UPEI, MUN, AST, UKC. Axis: 0 to 90.]
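As an illustration only, the percentages plotted in the graph could be derived from a completed score sheet along the following lines. This is a minimal sketch, assuming each institution's tally is recorded as the number of "yes" answers per section; the function name, the section labels, and the sample tallies are hypothetical and do not reproduce any institution's actual responses.

```python
QUESTIONS_PER_SECTION = 10  # the survey asks 10 questions in each section
SECTIONS = ("Collection", "User", "Access", "Preservation")


def affirmative_percentages(yes_counts):
    """Return the percentage of "yes" answers per section and overall."""
    for section in SECTIONS:
        if not 0 <= yes_counts[section] <= QUESTIONS_PER_SECTION:
            raise ValueError(
                f"{section}: count must be between 0 and {QUESTIONS_PER_SECTION}"
            )
    percentages = {
        section: 100 * yes_counts[section] / QUESTIONS_PER_SECTION
        for section in SECTIONS
    }
    total_yes = sum(yes_counts[section] for section in SECTIONS)
    percentages["Total"] = 100 * total_yes / (QUESTIONS_PER_SECTION * len(SECTIONS))
    return percentages


# Made-up tallies for an imaginary institution, NOT actual survey results.
example = {"Collection": 3, "User": 5, "Access": 2, "Preservation": 0}
print(affirmative_percentages(example))
```

The "Total" value is the share of affirmative answers across all 40 questions, which matches the description of the graph as "yes" answers measured against the total number of questions.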

Clearly, Preservation Services are far and away the most lacking amongst the CAUL/CBUA membership. It is also certainly worth noting that the institution with the highest research data stewardship ratings also develops software that is particularly good at meeting preservation needs.

Every institution has room for improvement. 

Dal, generally, is the regional leader in research data stewardship efforts.

Some Additional Observations

3 Institutions
CBU, USA, and NSCAD reported no activities related to research data stewardship.

"The nature of research data in visual arts is an area still being figured out".

- NSCAD Survey Comment

5 Institutions
Reported minimal activities related to research data stewardship. (ACA, AST, NSCC, StFX, UKC)

2 Institutions
Recorded affirmative answers for > 50% of the survey. (DAL, UPEI)

3 Institutions
Have mandates for the preservation of research data. (DAL, UPEI, MSVU)

Collection Services Gaps

While three of the schools (DAL, MUN, and UPEI) answered in the affirmative for 70% of Collections questions, the rest of the region lagged behind significantly. Taking out the schools with no current activities in Research Data Stewardship, the average response for Collections Services was only 1 affirmative answer out of 10.

70% of Institutions
Have subscriptions to data providers such as DLI, DMTI, or ICPSR.

UPEI
UPEI is the only CAUL/CBUA institution with a written Collection Policy for research data.

DAL
DAL is the only CAUL/CBUA institution with membership in a standards body for research data or metadata.

0%
Of CAUL/CBUA member institutions are cataloguing local research data files in their OPAC.

User Services Gaps

User services were fairly well represented across the CAUL/CBUA group, with only four institutions falling short of providing any specific services. Most institutions are making at least a gesture or two in this field, be it through Open Access Week events, reference help, or helping users to find research data.

Most of the obvious gaps fall directly in the field of Data Management Plans or specific technical assistance in modifying or converting specific types of data files.

2 Institutions
UPEI and Dal are the only institutions to provide research data management training for faculty and/or graduate students.

2 Institutions
UPEI and Dal are also the only institutions recommending or providing instruction on the use of online tools for research data management (e.g. Mantra, DMPTool, etc.).

2 Institutions
UPEI and Dal are also the only institutions assisting researchers with preparing Data Management Plans.

Access Services Gaps

Gaps in Access Services are a little more spread out than in User and Collections Services. It's notable that UPEI and DAL were much closer to other member institutions in this particular category.

It's worth noting that most of the questions about Access Services concerned direct linking or access to a service that exists elsewhere. A negative answer doesn't mean that librarians or faculty at these institutions cannot access this material; it really only speaks to space for linking on a website.

7 Institutions
Have no current Access Services as represented in the survey. That's nearly half of the respondents!

5 Institutions
Provide both software for analysing and visualizing research data and also support a secure data enclave for storing sensitive data.

Consolation Prize
At least 9 institutions, more than half, either link out to some existing resources or provide specific access to some research data discovery tools.

Preservation Services Gaps

Preservation is the space with the most room to grow, but is also likely the most resource and infrastructure-dependent service. It is vital to note that research data preservation is a fairly new problem for libraries. This may speak to the absence of services in this section.

Only 5 Institutions...
... offer anything in the way of Preservation Services. This includes preservation policies, helping researchers find appropriate repositories, application of standards to metadata, tools for submitting data, archival packages, and preservation storage/management systems.

That means...
... there are 12 CAUL/CBUA institutions with no research data Preservation Services.

UPEI AND DAL
UPEI and Dal are the only institutions offering extensive Preservation Services.

Summary

Research data stewardship is a fairly new field. Many schools in the CAUL/CBUA group are only just starting to get momentum with general scholarly communications initiatives like Institutional Repositories, so it is not overly surprising that many do not yet have well-developed research data services. That said, the adapted CARL survey suggests that the region has a lot of room to grow. With the eventual arrival of Tri-Council Open Access policies that lean increasingly towards access to research data, it is a particularly relevant time to be investigating how CAUL/CBUA members can best respond with either their immediate resources or a collaborative, regional effort.

This survey strongly suggests that DAL and UPEI have important roles to play as regional leaders in research data stewardship. Their experience in this work should, at least, provide some guidance for those institutions only just beginning to look at the issues.