

Researcher Requirements Report

Virtual Infrastructure with Database as a Service (VIDaaS) Project

vidaas.oucs.ox.ac.uk

Author: Meriel Patrick

Affiliation: Oxford University Computing Services


Project Document Cover Sheet

Project Information

Project Acronym: VIDaaS
Project Title: Virtual Infrastructure with Database as a Service
Start Date: 1 April 2011
End Date: 31 March 2012
Lead Institution: University of Oxford
Project Director: Prof. Paul Jeffreys
Project Manager & contact details: Dr. James A J Wilson, Oxford University Computing Services, 13 Banbury Road, Oxford, OX2 6NN. Tel. 01865 613489. Email: [email protected]
Partner Institutions: N/A
Project Web URL: http://vidaas.oucs.ox.ac.uk/
Programme Name (and number): UMF Shared Services and the Cloud
Programme Manager: John Milner

Document Name

Document Title: Researcher Requirements Report
Author(s) & project role: Meriel Patrick (Project Analyst)
Date: 26 September 2011
Filename: VIDaaS Researcher Requirements Report.pdf
URL: http://vidaas.oucs.ox.ac.uk/docs/VIDaaS%20Researcher%20Requirements%20Report.pdf
Access: Project and JISC internal / General dissemination

Document History

Version | Date | Comments
1.0 | 26/09/11 | Report signed off by VIDaaS Steering Group

VIDaaS Project Researcher Requirements Report, Version 1.0


Contents

1. Executive Summary
2. Introduction
3. Methodology
  3.1. Interviews
  3.2. Survey
4. Interviewee and Survey Respondent Profiles
  4.1. Interviewees
  4.2. Survey Respondents
5. Current Practices
  5.1. Dataset Details
    5.1.1. Interviewees
    5.1.2. Survey Respondents
  5.2. Software Used
    5.2.1. Interviewees
    5.2.2. Survey Respondents
  5.3. Data Sharing and Publication
    5.3.1. Interviewees
    5.3.2. Survey Respondents
6. Database as a Service (DaaS)
  6.1. Interest in the DaaS
    6.1.1. Interviewees
    6.1.2. Survey Respondents
  6.2. User Requirements for the DaaS
    6.2.1. Survey Respondents – Software Feature Rankings
    6.2.2. Interviewees and Survey Respondents – Feature Requests
    6.2.3. Interviewees – Views on Training
7. Comparison with Sudamih Findings
8. Conclusions and Recommendations
Appendix A: Index of Interviewees
Appendix B: Interview Question Template
Appendix C: Survey Questionnaires


1. Executive Summary

This document reports the findings of the VIDaaS Project’s requirements gathering exercise, conducted in the summer of 2011. The exercise involved a series of interviews with Oxford researchers working with structured data who were based in the Social Sciences Division, the Medical Sciences Division, and the Mathematical, Physical and Life Sciences Division, plus a broader online survey for both researchers and IT support staff. Those canvassed showed a substantial amount of interest in the DaaS (Database as a Service).

Dataset Details

The majority of researchers working with structured data use some combination of textual and numerical information. Working with images is also reasonably common, and a substantial minority use audio and video material. A smaller proportion of researchers work with GIS data.

Size of datasets varies enormously – from a few hundred megabytes to a few hundred gigabytes. However, smaller datasets are in the majority, with more than half the researchers canvassed working with data collections of under 10GB. There were no obvious correlations between dataset size and discipline, though a larger sample size would be needed to draw firm conclusions on this point.

Researchers are reasonably evenly divided between those who keep their data on a personal hard drive or other local storage, and those who use network-attached storage (though IT support staff are significantly more likely to use the latter). Use of cloud-based storage is currently rare.

While a substantial number of researchers working with structured data use relational databases, flat file formats are also very common. IT support staff were substantially more likely to report use of relational databases than researchers. Spreadsheets and statistical analysis packages are both popular means of storing and analysing flat file data, with the latter being particularly important for social scientists. Humanities researchers, however, are more likely to use XML documents to manage their data.

Data Sharing

Researchers in general seem to have a rather lukewarm attitude towards making data publicly available; while many are in principle happy to do so once they have finished their own work, few seem to regard it as a high priority. Data publication often seems to happen only because it is expected – either to support a research publication, or because funding bodies require it. Nevertheless, a substantial proportion of researchers do believe that their data would be of value or interest to others, and there are some projects which exist specifically to make a particular body of data more widely available.

The DaaS

Researchers generally reacted positively to the DaaS. However, discussions about potential uses for the service indicated that it is imperative to catch people at the right point in the research cycle: researchers are more likely to consider using a new service for a project they have not yet embarked on than for one that is already underway. In disciplinary terms, the warmest response was received from humanists and social scientists.


Major factors affecting how likely researchers would be to use the DaaS include cost and functionality. It was also deemed vital that the DaaS be straightforward and intuitive to use. Other key user requirements include:

• Automated backup
• The ability to import or export data in a range of formats
• The ability to view and present data in different ways
• The ability to set different permissions levels
• The ability to make data publicly available via the Web
• Automated versioning

A substantial proportion of researchers indicated that it would be helpful to be able to share data with colleagues, including those outside Oxford. Similarly, the idea of having a straightforward way of publishing datasets – including subsets of data to accompany research publications – was also appealing to some. Researchers would like published datasets to be easily citable: that is, with a persistent URL or DOI.

There was a certain amount of interest in the prospect of using the DaaS to find out about other researchers’ work, but this was not as popular as other aspects of the service.

Training

Researchers were generally happy to teach themselves to use a piece of software (although some said that they would like more training in general database theory). To enable users to learn to use the DaaS independently, it will therefore be important to provide clear online documentation and guides to performing particular tasks. Some researchers indicated that they felt face-to-face training was useful – but that courses needed to be short and focused.

Comparison with Sudamih Project

The findings of the VIDaaS and Sudamih requirements gathering exercises were broadly in agreement with each other, although displaying some minor disciplinary differences.

Conclusions and Recommendations

1. Development work on the DaaS should continue in line with the prioritized user requirements list compiled as a result of this exercise.
2. User testing should be employed to ensure requirements (and in particular the requirement that the service be straightforward to use) have been met.
3. The DaaS should be accompanied by clear, focused documentation, support material, and training.
4. Publicity and training materials for the DaaS should strike a balance between emphasizing those features that make the service distinctive, and giving an accurate overview of its functionality as a whole.
5. The service should be promoted most extensively to the groups of researchers who have shown themselves to be most likely to be interested in using it (namely, humanists and social scientists at the beginning of a research project), and to IT and other support staff who advise researchers.


2. Introduction

The VIDaaS (Virtual Infrastructure with Database as a Service) Project, based at Oxford University Computing Services, has two main aims:

• To develop an online service that will enable researchers to build, edit, search, and share databases online
• To develop a virtual infrastructure which will enable the database service to function within a cloud computing environment

The project runs until March 2012, and is funded by JISC and HEFCE under the University Modernisation Fund.¹

VIDaaS is the successor to the Sudamih Project,² in which a pilot version of the database service (currently known as the DaaS³) was developed. The VIDaaS Project aims to expand the DaaS’s functionality, and to develop it into a full production service. The DaaS was initially designed with humanities researchers in mind, and an additional aim of the VIDaaS Project is to broaden the scope of the service to make it relevant and useful to academics working in other disciplines.

Between May and July 2011, we conducted a substantial requirements gathering exercise to improve and refine our understanding of user requirements for the DaaS. This document reports the findings of that exercise. The process described here builds on the requirements gathering conducted as part of the Sudamih Project.⁴

3. Methodology

The VIDaaS requirements gathering process had two main phases: the collection of qualitative information through a series of interviews with researchers, and the collection of quantitative data via an online survey.

Both interviews and survey were designed to explore two main areas: researchers’ current projects (including details of the datasets they use), and their potential interest in and user requirements for the DaaS. As the DaaS is designed to be a service that will allow researchers to share and publish data, questions gauging researchers’ attitudes to data sharing were also included.

3.1. Interviews

In June and July 2011, we interviewed nine University of Oxford researchers currently working with structured data. Potential interviewees were identified via personal profiles on the University website or elsewhere online, and were then emailed to ask if they would be willing to participate.

¹ For more information about the UMF, see http://www.jisc.ac.uk/whatwedo/programmes/umf.aspx

² For more details, see the Sudamih Project website: http://sudamih.oucs.ox.ac.uk/

³ This is an interim name, standing for Database as a Service: a service name will be selected in due course.

⁴ See the Sudamih Researcher Requirements Report: http://sudamih.oucs.ox.ac.uk/docs/Sudamih%20Researcher%20Requirements%20Report.pdf


Most interviews lasted around an hour (though a couple were somewhat shorter than this, and one a little longer), and followed a semi-structured pattern. A list of questions compiled with input from other VIDaaS team members formed the basis for the interviews, but was not always followed rigidly, to allow the conversation to develop naturally, and to provide space for the researchers to expand on particular points of interest. A copy of the final interview question template is provided in Appendix B.

3.2. Survey

In the second half of July 2011, we conducted an online survey. To gain a wider range of perspectives, two versions of this were provided: one aimed at researchers, and the other at IT support staff. The survey was hosted by BOS,⁵ and was publicized via the project’s blog, relevant mailing lists,⁶ and targeted emails to University of Oxford researchers and research facilitators. The majority of the questions were multiple choice, but a few free text fields were also provided to permit additional comments. A copy of the survey is provided in Appendix C.

In total, 62 responses were received (although one of these appeared to be spam and was excluded from the analysis), approximately two thirds of which were from researchers, and one third from IT support staff.

4. Interviewee and Survey Respondent Profiles

4.1. Interviewees

The requirements gathering process for the Sudamih Project⁷ involved extensive interviewing of humanities researchers; therefore, to round out the picture, the subjects for the VIDaaS Project interviews were drawn from Oxford’s three non-humanities academic divisions. A decision was taken to focus particularly on social science researchers, as these were deemed the group to whom the DaaS was most likely to be relevant. We attempted to recruit interviewees from a wide range of subject areas and career stages, although the final breakdown was inevitably dependent on which researchers responded to the invitation email.

University of Oxford Division | Graduate students | Early career/postdoctoral researchers | Mid-career/senior researchers | Total
Social Sciences | 2 | 2 | 1 | 5
Mathematical, Physical and Life Sciences | 1 | - | 1 | 2
Medical Sciences | - | 1 | 1 | 2
Total | 3 | 3 | 3 | 9

Table 1: Breakdown of interviewees by division and career stage

⁵ Bristol Online Surveys: https://www.survey.bris.ac.uk/

⁶ These included Digital.Humanities@Oxford, the UCISA digest email, JISC research data management lists, and the ARMA and IASSIST mailing lists.

⁷ See the Sudamih Researcher Requirements Report: http://sudamih.oucs.ox.ac.uk/docs/Sudamih%20Researcher%20Requirements%20Report.pdf


A full anonymized index of interviewees is provided in Appendix A. Throughout this report, numbers in square brackets refer to the interview in which a point was raised or a comment was made.

One notable feature of the process was the comparative difficulty of finding interviewees from the Medical Sciences Division. In the other two divisions, approximately half of the researchers approached were happy to be interviewed; in Medical Sciences, only one in ten responded to the invitation email. While we cannot be certain of the reasons for this, it may suggest that researchers in this area felt the DaaS was less likely to be relevant to them than those working in other disciplines.

4.2. Survey Respondents

Approximately three quarters of the researchers who responded were working in either the humanities (32% of the total) or the social sciences (44%). Only a handful of researchers from the hard sciences completed the survey, of whom just two were from the medical or life sciences, mirroring the difficulty in recruiting medical sciences interviewees noted above.

A substantial proportion of the IT support staff who responded also worked on or advised projects in the social sciences (50%) or humanities (45%). However, medical and life sciences were significantly better represented here, with 35% working in this area, compared to 25% for maths and physical sciences, and 15% for creative arts. About a third (35%) of the IT support staff worked on or advised projects in multiple subject areas.

A little under half the researchers (44%) were mid-career or senior; 29% were postdoctoral or early career researchers, and 20% were graduate students. They were reasonably evenly divided between those working mostly alone (44%) and those who were part of a research group (56%).

The largest single group of IT support staff (40%) was those whose job involved IT support or advice targeted at researchers, but not focused on one particular project. A further 15% provided IT support for one project, and 20% more general IT support within their institution. (The remaining 25% were mostly engaged in a range of data-related activities.)

Subject area | Graduate students | Early career/postdoctoral researchers | Mid-career/senior researchers | Other | Total
Humanities | 4 | 4 | 5 | 0 | 13
Creative arts | 0 | 0 | 0 | 0 | 0
Social sciences | 3 | 7 | 6 | 2 | 18
Maths or physical sciences | 1 | 1 | 2 | 0 | 4
Medical or life sciences | 0 | 0 | 2 | 0 | 2
Other | 0 | 0 | 3 | 1 | 4
Total | 8 | 12 | 18 | 3 | 41

Table 2: Breakdown of researcher survey respondents by subject area and career stage
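
The percentages quoted for researcher respondents can be reproduced from the counts in Table 2. The following is a minimal illustrative sketch in Python, not part of the original analysis: the variable and function names are hypothetical, and it assumes each percentage was calculated against the 41 researcher respondents and rounded to the nearest whole number.

```python
# Counts transcribed from Table 2 (researcher survey respondents).
by_subject = {
    "Humanities": 13,
    "Creative arts": 0,
    "Social sciences": 18,
    "Maths or physical sciences": 4,
    "Medical or life sciences": 2,
    "Other": 4,
}
by_career_stage = {
    "Graduate students": 8,
    "Early career/postdoctoral researchers": 12,
    "Mid-career/senior researchers": 18,
    "Other": 3,
}


def percentages(counts):
    """Return each count as a whole-number percentage of the overall total.

    Assumption: figures are rounded to the nearest whole percent.
    """
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


subject_pct = percentages(by_subject)
career_pct = percentages(by_career_stage)

for label, pct in subject_pct.items():
    print(f"{label}: {pct}%")
for label, pct in career_pct.items():
    print(f"{label}: {pct}%")
```

Under those assumptions, the humanities and social sciences shares come out at 32% and 44%, and the career-stage shares at 20%, 29%, and 44%, matching the figures given in the text.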


Subject area | Number of IT support staff | Percentage of total
Humanities | 9 | 45%
Creative arts | 3 | 15%
Social sciences | 10 | 50%
Maths or physical sciences | 5 | 25%
Medical or life sciences | 7 | 35%
Other | 1 | 5%

Table 3: Breakdown of IT support staff survey respondents by subject area of projects worked on

Roughly half the total respondents were from the University of Oxford (49%), and the other half from elsewhere (51%). (However, a larger proportion (70%) of the IT support staff were from outside Oxford. This was probably due to the nature of the mailing lists used to publicize the survey – while there are a number of national and international mailing lists targeted at IT staff, it is harder to reach large groups of researchers outside one’s own institution.)

In the sections that follow, the survey responses from researchers and IT support staff have been aggregated where there were no substantial differences between the answers of the two groups. When there were significant differences (and of course where different questions were asked), the two are treated separately.

5. Current Practices

5.1. Dataset Details

5.1.1. Interviewees

The researchers interviewed were working on a wide range of different data types. These included:

• GIS data [1, 2]
• Aggregated statistics [3, 5]
• Governmental data [4]
• Survey responses [5, 9]
• Sensor observations [6]
• Protein structure information [7]
• Patient data [8, 9]

Most datasets consisted of a combination of textual and numerical information; some also included images.

The size of datasets also varied widely. Estimates of total size ranged from 200MB to over 800GB. Smaller datasets were in the majority, however, with more than half of the interviewees working with data collections of under 10GB. There was no apparent correlation between subject area and dataset size: both the smallest and the largest datasets were divided reasonably evenly across the three divisions.


Even among the more technologically proficient interviewees, knowledge of the technical details of datasets was often hazy. Only one interviewee knew the precise size of the data collection without having to check, or was able to venture an informed guess at input and output figures (that is, the quantity of data being pumped in and out of it within a given period of time) for the database. Most interviewees had little idea what the ultimate size of their dataset would be (and in several cases this question was not really applicable, as the researchers were engaged in ongoing work rather than a discrete project). None of the interviewees working with relational databases could say precisely how many tables the database had. This suggests that on a day-to-day basis, there is little need for most researchers to have this sort of information readily available. One senior researcher [7] commented that she tended only to think about the size of datasets when selecting appropriate storage media, and this may well be a common pattern.

The interviewees were reasonably evenly split between those who worked mostly with data stored on their personal hard drive (though perhaps still using networked storage for back-up, or to obtain an initial copy of the raw data) and those whose data was hosted on local shared storage or an institutionally-provided server – frequently to facilitate sharing with colleagues. One interviewee who worked with large datasets habitually moved older data to tapes for long term storage; she commented that this meant a lot of effort was involved in accessing previous datasets – it would be useful to have an archive of all past data that could be easily consulted and shared [7].

5.1.2. Survey Respondents

As with the interviewees, textual (67%) and numerical (70%) data were the most common types regularly used by the survey respondents. Images were used by a little under half (46%), and audio or video material by 30%. 16% worked with GIS data.

Datasets were, on the whole, relatively small. Almost half (46%) were under 1GB, and another 26% between 1GB and 10GB.⁸ Only 5% of respondents said their dataset was larger than 100GB. 11% (all from the researcher respondent group) did not know the size of their data collection. Once again, there were no obvious correlations between dataset size and subject area – although given the comparatively small number of respondents from the sciences, more data would be needed to draw firm conclusions on this point.

When asked about the anticipated final size of their dataset, well over a third (43%) said they did not expect their dataset to get significantly larger than it was at present (this includes 5% – all from the IT support group – whose projects had already concluded). Just under another third (30%) said their dataset would grow, but not to more than double its current size. Those who forecast a larger growth were typically those who had smaller datasets to begin with: no respondent with over 100GB of data expected that their dataset would more than double in size.

Data storage practices varied noticeably between the two groups surveyed, with – perhaps unsurprisingly – IT support staff reporting much greater use of networked storage. Only 15% of datasets worked on by this group were stored on personal hard drives or other local storage such as DVDs, compared to 49% for researchers. A tenth of each group used network attached storage managed by the research project. About a quarter of researchers (24%) used departmental servers, compared to a third (35%) of IT support staff. Another third (35%) of the IT support staff used centrally-provided institutional storage, while this was true for only 7% of researchers.

Use of the cloud was rare: only one researcher (and no IT support staff) reported this as the main method of data hosting. One other made some use of the Grubba online database service’s cloud storage, along with institutional servers.

⁸ As many IT support staff work on multiple projects, this group was asked to answer for a typical project they had recently worked on.

5.2. Software Used

5.2.1. Interviewees

Four interviewees used relational databases to store and analyse their data. Another four worked predominantly with data in flat file format, and used a combination of spreadsheets (usually Excel) and statistical analysis packages (most commonly Stata and SPSS). Three of these four were social scientists, and one other social scientist (who had a custom-built relational database system) also reported making significant use of statistical analysis software. The final interviewee’s data was stored in plain text format.

Some researchers also used specialist software tools to meet the specific demands of their subject area. For example, the archaeology doctoral student [2] made extensive use of GIS software, and the senior chemistry researcher [7] used a biomolecular simulation package.

5.2.2. Survey Respondents

We asked respondents to indicate which types of software or other tools they commonly used to store and analyse their structured data (or, for IT support staff, the software or tools that were used in the projects they worked on). On average, researchers reported making use of two to three types of software, and IT support staff around five.

Spreadsheet programs were popular (used by 51% of researchers, and 75% of IT support staff), as were statistical analysis programs (used by 44% and 70% respectively). Also widely used were plain text files, word processing programs, and XML documents (all used by around a third of the researchers, and about half the IT support staff).

A little over half of all respondents (56%) used some kind of relational database. However, there was a more substantial difference here between the two groups: while only 40% of researchers used this sort of system, 90% of IT support staff selected it. Respondents were asked to indicate whether they used Microsoft Access, FileMaker Pro, another widely available package, or a custom-built system: Access, custom-built systems, and other packages were roughly equal in popularity, with FileMaker used by a significantly smaller number.

A second question asked which type of software or tool was used most. The answers broadly followed the same pattern as the previous question. Among researchers, the most popular choices were statistical analysis packages (27%), relational databases (17%), spreadsheets (15%), and XML documents (12%). For IT support staff, relational databases and statistical analysis packages were in joint first place (30% each), closely followed by spreadsheets (25%).


There were some notable disciplinary differences here. As with the interviewees, a large proportion of social scientists made use of spreadsheets and/or statistical analysis packages – and all but one of the researchers who used statistical packages most were from this subject area. Among humanities scholars, XML documents proved to be the most popular method for dealing with structured data: 62% made some use of them, and 23% used them more than any other type of software.

5.3. Data Sharing and Publication

5.3.1. Interviewees

Two interviewees [1, 3] conducted their research predominantly alone, and did not share the working version of their dataset with others. Three others did a substantial amount of solo work, but stored at least some of their data in such a way as to permit colleagues to access it. Two of these [4, 8] used local shared storage, although one of the two [4] commented that in practice her colleagues rarely (if ever) made use of the shared data; the third [5] used Dropbox to share selected portions of his data (approximately 30% of the total) with collaborators. The other four interviewees [2, 6, 7, 9] were all working on collaborative research projects that required multiple users to have access to the data.

Where data was shared, this was typically with between one and three other people, though in two research projects [6, 9] the groups were much larger. It should also be noted, however, that while specific portions of a data collection are often only shared with a small number of people, each researcher may have multiple such sharing groups. For example, the researcher using Dropbox estimated his total number of collaborators as between eight and ten, although individual datasets were not typically shared with more than three people. Similarly, another researcher [7] shared data both with a research group in Oxford, and with collaborators at other institutions.

As far as can be judged from a sample of this size, sharing of working data seemed to be more common among scientists: all four of the scientists interviewed shared data with colleagues. The two projects with larger research groups were also both in the sciences.

When asked whether they had future plans to share their data, the interviewees gave mixed responses. Four of the researchers (all social scientists) were working with data that was already publicly available [1, 3, 4, 5]. However, at least one was planning to deposit copies of the analysed and manipulated data in an archive at the end of the project (and another noted this was something he should be better at doing, although in practice he didn’t usually get round to it).

The fifth social scientist was working on a project designed to make a large body of previously

unpublished material publicly available [2].

One scientist told us that in her area there were some central databanks in which certain types of

information would be deposited [7]. Otherwise, data sharing tended to happen informally: you

might email a colleague and ask to see some of their data (although you would normally only do this

with someone you knew, or if the data was quite old – more than three years post-publication).

Another scientist said there were no plans to make the whole dataset public, but that portions

associated with specific research publications would be released [6].

Both medical scientists were working with confidential information, and did not have personal

control over whether the data should be published or not [8, 9]. However, one of the two had

recently worked on a project to create an online compendium of anonymized datasets, and the

project the other worked for has made aggregate information (for example, numbers of cases)

available on the Web.

5.3.2. Survey Respondents

The respondents exhibited mixed views towards sharing data with people other than their own

immediate collaborators. Although 59% of researchers said that making use of publicly available

datasets (shared by other researchers or organizations) was important for their work, not much

more than two thirds of this number (41%) said they were happy to make their own research data

available once they had completed the work they intended to do and published the results. About a

third (34%) had previously published data, while roughly half (49%) said they intended to make all or

most of the data from their current project available in the future.

Just under half the researchers (46%) said they would be happy to share data privately with

colleagues (by, for example, emailing them a file), and a slightly larger number (51%) had actually

done so in the past.

In some cases, data sharing was restricted by factors beyond the researchers’ control. 41% said that

confidentiality or intellectual property restrictions made it hard to share at least a substantial

portion of their data, and 65% of IT support staff said this was at least occasionally true of the

researchers they worked with. For 42% of researchers, the decision about whether to make the data

available did not rest with them. Unsurprisingly, this was more likely to be true of respondents who

were part of a project team.

A fifth of the researchers said that they would like to make their data publicly available, but didn’t

currently have a straightforward means of doing so.

Only a relatively small proportion – 17% – reported that their funding body required them to make

their data available. When asked a similar question, 30% of IT support staff said that this was

frequently true for the researchers they worked with, and another 40% that it was at least occasionally

the case. (It is possible that the discrepancy between the two groups arises because projects with

significant input from IT support staff are more likely to be those funded by specific grants, leading

to funding agencies taking a greater interest in what happens to the data.)

Despite the somewhat lukewarm attitudes to data sharing, most researchers felt their data was

likely to be useful to other people. Just under half (48%) said that all or most of their data was of

potential value or interest to other researchers in higher education, and just over another third

(38%) said this was true of a substantial portion of their data. Almost two thirds (63%) said at least a

substantial portion would interest people outside the HE community, although a much smaller

number (21%) believed it had commercial value.

6. Database as a Service (DaaS)

6.1. Interest in the DaaS

6.1.1. Interviewees

When the DaaS was described to them, the interviewees by and large reacted positively. Only one of

the nine said he couldn’t see any real application for the service in his own research. However, in

most cases, interest in the DaaS was more hypothetical than actual: a number of researchers said

that the DaaS sounded as though it might be useful for projects they were thinking of undertaking in

the future, or that it would have been helpful for projects they had worked on in the past. This

mirrors the finding of the Sudamih Project requirements gathering exercise that researchers who

were some way into a project generally already had a well-established system and were not eager to

make major changes to their working practices, but were far more willing to consider the DaaS for

projects which had not yet got underway.

Two researchers did suggest possible uses for the DaaS as part of their current research projects,

although both were for relatively minor aspects of the work.

A doctoral student noted that the research group he worked for had initially planned to

produce a relational database of their data to enable colleagues with little experience of

working with the raw data to run simple queries. However, this had not yet happened, as

the amount of work that would be involved in setting it up would outweigh the benefits of

doing so. If the DaaS could offer a straightforward way of doing this, it might be of interest

[6].

A postdoctoral researcher told us that her project had inherited a dataset that was originally

compiled as a relational database, but now exists only as a flat file. A user-friendly database

service might provide a way of reconstructing the database, which would allow them to do

more with the data [4].

It is noticeable that both these interviewees stressed that for them to be interested in using it, the

DaaS would have to make accomplishing the task both easy and reasonably quick. In both cases the

suggested use for the service was something that they would have liked to be able to do, but which

was by no means essential to the success of the research project, and therefore would not merit any

major investment of time and effort in learning to use a new system.

However, there were two proposed features of the DaaS that produced a rather more enthusiastic

response: the possibility of multiple researchers being able to access the same database, and a

straightforward means of publishing datasets online.

Footnote 9: See Sudamih Researcher Requirements Report, http://sudamih.oucs.ox.ac.uk/docs/Sudamih%20Researcher%20Requirements%20Report.pdf, p. 24.

Data Sharing

Several researchers said it would be useful to be able to share datasets with colleagues, especially those located outside Oxford, and often outside the UK. At present, there are few straightforward ways of doing this securely.

Methods currently employed include emailing files [4] and sharing data via an anonymous

FTP connection (although the researcher who had done this commented that the IT staff

were often reluctant to set these up [7]), but this only provides collaborators with a

duplicate copy, rather than allowing multiple parties to work on (and edit or annotate) the

same file.

One doctoral student also noted that some of his colleagues who are working with third-

party data often have difficulty sharing files in a way that meets the security requirements

imposed by the data providers [1].

On a related note, one researcher also said that it would be very helpful to be able to access

her own data remotely – when she is away from Oxford working with collaborators, for

example [7].

Data Publication

A number of interviewees indicated that it would be helpful to have a straightforward way of making datasets available online.

In particular, several researchers were interested in being able to publish a particular subset of their data – to accompany a journal article and allow other people to verify their findings, for example [1, 3, 4, 6, 7].

o “You’re often required to do this somewhere, and providing the data in a suitable format can be quite annoying.” [1]

o “It would be helpful if it were possible to publish a particular subset of the data, or a particular layout, rather than the whole database. In some cases, you might want to be able to work on the details with a small group of peers, and then publish a neat version – with comments fields hidden, for example.” [3]

o “My supervisor doesn’t want the whole dataset to be made publicly available as it is. However, he is very keen that whenever research papers based on the data are published, relevant portions of the data that support the findings are also published.” [6]

A number of interviewees mentioned the importance of providing stable and long-term access to datasets, and suggested this might be more easily provided via a central service.

o “You really need a persistent URL, and researchers don’t want to have to worry about this themselves – they don’t want to have to keep an old machine running or to keep setting up redirects to ensure that the original URL still works.” [1]

o “We would really prefer not to have the responsibility of ensuring that the website the data is made available on remains accessible for the long term.” [4]

o “I’ve sometimes had the experience of following up references to datasets from publications, and finding that the data owner has moved institutions and the data is no longer available from the original URL.” [7]

The importance of datasets being citable was also mentioned.

o “From the point of view of personal academic reputation, there’s no point in putting a lot of work into something if you won’t get any credit for it [...]. It would be nice if the online datasets had DOIs, or something similar.” [3]

A service like this would also provide a convenient way of making available information that

is too bulky to be easily presented in print.

o “My thesis included a large number of data tables in an appendix – these ended up

being in small print and not easy to read.” [3]

o “Some collaborators recently sent me a draft of a paper, which had some

supplementary material that would ultimately be provided alongside the paper –

probably on the journal website. I started to print this out, and then realized that it

included 234 pages of tables!” [7]

However, it should be noted that the probable usefulness of such a feature varies from discipline to

discipline.

Two senior researchers noted that in at least some parts of their fields, publication of

supporting datasets was usually handled by the journals themselves [5, 7].

One of these two also observed that if he was going to publish full datasets, it would make

more sense to do this through something like the UK Data Archive: they have the resources

to deal with large data deposits, and other researchers would be more likely to find the data

this way [5].

In some areas, such as archaeology, publication of full datasets is not common: articles will

usually just be accompanied by a couple of photographs. However, the archaeology doctoral

student we spoke to commented that “It would be amazing” if more researchers did choose

to share their data [2].

Making Dataset Details Available

In addition to the possibilities for data sharing and publication, DaaS users will be invited to add information about their project to a central system that will allow other users to see what they’re working on – as, for example, a means of identifying potential collaborators, or enquiring about potential data re-use.

When asked about this, the majority of interviewees said they would in principle be happy to make details of their own research available in this way. A couple, however, noted that the decision did not rest entirely with them: in particular the two medical sciences researchers, who were both working with confidential data [8, 9]. One doctoral student said he did not have any objections himself, but that his supervisor might be concerned: because the data he is working with is all publicly available, drawing attention to his project in this way might increase the risk that someone else would imitate his research strategy and then publish before him [1].

Opinions about the usefulness of such a service for finding out about other researchers’ work were more mixed. Some felt it would definitely be useful, while others were uncertain that it would tell them anything that wasn’t already available from other sources.

Among the positive comments were:

“We’re very interested in having a public face for what we’re doing, especially considering how important impact has become. [...] There are lots of project websites out there that few people know about – some sort of central service might make it easier.” [4]

“Being able to find out what other researchers are working on would be very useful. I talked

to someone recently who’s spent the last year working on something related to my project,

and I’ve only just found out about it.” [4]

A doctoral student noted that as far as he was aware, there was currently no systematic

means of finding out about new datasets, other than trawling people’s websites to see what

they’re working on [6].

“I could imagine this being really useful if it covered researchers outside Oxford. Within

Oxford, I’d probably already have a good idea what people were working on.” [7]

“It might be useful for people working in a new area: they could see who else had worked on

that and perhaps learn from them.” [8]

More cautious voices included:

“The difficulty with this sort of service is that it would depend on people actually making use

of it, and in practice I suspect most people wouldn’t bother.” [3]

“I probably wouldn’t make much use of this myself: I already feel I’m drowning in

information sources, rather than needing to find more.” [3]

“The people whose names would appear would often be the usual suspects [i.e. people you

already know are working in that area]. And where they aren’t, it wouldn’t always be

appropriate for me to start asking them lots of questions.” [5]

“This probably wouldn’t be relevant to me.” [9]

6.1.2. Survey Respondents

When asked if the DaaS was something they could envisage a use for in their own research (or, for IT

support staff, in the projects they work or advise on), just over half (56%) of the survey respondents

said they could. Another 31% gave the slightly more cautious reply that ‘It depends’.

When the latter group were asked what it depended on, some clear themes emerged. The single

most frequently mentioned factor was cost. Functionality was also mentioned by a number of

respondents: in some cases in general terms, and in others with reference to a specific feature the

service would need to offer to be of interest. These have been treated as user requirements, and are

discussed in Section 6.2.2 below. Other concerns included how easy it would be to learn to use the

system, data security, and whether it would be possible to persuade colleagues to use the system.

Figure 1 is a word cloud generated from the responses to this question.

Figure 1: Wordle.net word cloud generated from free text responses regarding factors

affecting likelihood of using the DaaS

A second question asked about the likelihood of using the DaaS to publish research data. A third of

respondents (33%) said they could definitely envisage themselves doing this, and just over another

third (38%) said it was possible. A further 13% were unsure; the remainder were evenly divided

between those who were happy with their existing ways of publishing data, and those who did not

expect to publish data at all.

Those who answered the previous question positively were also invited to say whether they would

be most likely to publish complete datasets, or specific subsets of their data (e.g. to support research

publications). This was an optional question, and relatively few people responded, but of those who

did, about a third said they would publish specific subsets, and another third both complete datasets

and subsets. The rest were equally divided between those who said they would publish just

complete datasets, and those who were unsure.

A free text question invited respondents to suggest ways in which the DaaS might save time,

improve research, increase efficiency, or otherwise make life easier. The answers here also indicated

a significant amount of interest in the prospect of using the service to share data.

“It would be most useful for enabling multiple users (like research assistants) to access the

database at the same time.”

“Could greatly simplify process of supporting collaboration between researchers working

across institutions / countries without unmanageable overhead of security and backup

issues.”

“Ease of sharing a very significant benefit to researchers who have neither expertise nor

resources to provide [this] by other means.”

“Having a secure and fairly straightforward means by which to share data with selected

collaborators around the world would be extremely useful.”

“Allow[ing] easy publishing and sharing of log type information and calibration information

for research datasets, even if research data itself not ingestable in this form.”

“By encouraging users to make datasets publicly available, it would also give them incentive

to be more rigorous in their experiments and data labelling.”

Secondly, researchers and IT support staff both liked the idea of a service that would provide a more

straightforward method of creating a database than existing solutions.

“When I can't afford developer time to build a robust database, I could turn to this to avoid

using more questionable methods (e.g. spreadsheets; Access).”

“It's usually quite difficult to explain to researchers the specifics of database design, so if

they can learn a manageable front-end themselves, that would help.”

“DaaS is something we could provide [to] many of the people we assist who are primarily

asking for database-backed websites, or database-backed management of project data.”

One respondent also commented on the role the DaaS might play in funding applications.

“I can potentially see DaaS being used as part of an NSF required Data Management Plan for

grant proposals.”

The final question in this section asked if respondents would be interested in finding out more about

the DaaS test user group. Just under half (44%) said they would, indicating that a substantial number

felt the DaaS was of at least some relevance to their work.

6.2. User Requirements for the DaaS

6.2.1. Survey Respondents – Software Feature Rankings

Survey respondents were presented with a list of possible software features, and asked to indicate

which of these they would find useful for managing data. The responses are summarized in Table 4.

| Ratings for each suggested feature | Essential | Useful but not essential | Not relevant | Unsure what this means |
|---|---|---|---|---|
| Automated backup | 71% | 26% | 3% | 0% |
| Automated versioning | 31% | 57% | 10% | 2% |
| The ability to import or export data in a range of formats | 62% | 29% | 8% | 0% |
| The ability to view or present data in different ways (e.g. by creating customized forms or reports) | 51% | 34% | 13% | 2% |
| The ability to plot results on maps | 30% | 36% | 33% | 2% |
| The ability to enter and search data in XML formats | 25% | 34% | 30% | 11% |
| The ability for multiple users to access and edit the same database | 44% | 30% | 26% | 0% |
| The ability to set different permission levels | 48% | 33% | 20% | 0% |
| The ability to make data publicly available via the Web | 41% | 38% | 21% | 0% |
| Document-oriented database functionality | 16% | 43% | 16% | 25% |
| A mail merge function | 2% | 28% | 54% | 16% |

Table 4: Overall ratings (researchers and IT support staff) for each feature

As one might anticipate given the popularity of XML documents in the humanities that was noted

above, humanities researchers were significantly more likely to rate the ability to enter and search

data in XML formats as essential than any other group. The same is true of document-oriented

database functionality (although 50% of the social science researcher respondents rated this feature

as useful).

Humanists were also slightly more likely to deem the features relating to data sharing to be essential

(that is, the abilities for multiple users to access and edit the same database, to set different

permission levels, and to make data publicly available via the Web). Social scientists, on the other

hand, tended to regard these as useful, but not essential.

For social science researchers, the most important feature was the ability to import or export data in

a range of formats: all the respondents in this group rated this as at least useful, with 61% regarding

it as essential.

Both researchers and IT support staff rated automated backup and import/export functions very

highly. However, there were some significant differences between the two groups of respondents in

terms of their next few priorities. These are listed in Table 5.

| Rank | Researchers | IT Support Staff |
|---|---|---|
| 1 | Automated backup | Automated backup |
| 2 | The ability to import or export data in a range of formats | The ability to import or export data in a range of formats |
| 3 | The ability to view or present data in different ways (e.g. by creating customized forms or reports) | The ability to set different permission levels |
| 4 | Automated versioning | The ability for multiple users to access and edit the same database |
| 5 | The ability to make data publicly available via the Web | The ability to make data publicly available via the Web |
| 5 | The ability to set different permission levels | The ability to view or present data in different ways (e.g. by creating customized forms or reports) |

Table 5: Highest priority data management software features for each respondent group

These rankings were derived by assigning each suggested feature a weighted score (two points were

awarded if a feature was deemed essential, and one point if it was deemed merely useful).

Coincidentally, both groups had a tie for fifth place.
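The weighting described above can be illustrated with a short Python sketch. It rests on two assumptions that should be kept in mind. First, it uses the overall percentages from Table 4, because the report does not list per-group figures, so it will not reproduce the separate researcher and IT support staff rankings in Table 5. Second, it applies the weights to percentages rather than raw response counts, which gives the same ordering only if every feature was rated by the same number of respondents. The names `RATINGS`, `weighted_score`, and `rank_features` are illustrative and do not come from the project.

```python
# Overall percentages from Table 4: (essential, useful but not essential).
RATINGS = {
    "Automated backup": (71, 26),
    "Automated versioning": (31, 57),
    "Import or export data in a range of formats": (62, 29),
    "View or present data in different ways": (51, 34),
    "Plot results on maps": (30, 36),
    "Enter and search data in XML formats": (25, 34),
    "Multiple users access and edit the same database": (44, 30),
    "Set different permission levels": (48, 33),
    "Make data publicly available via the Web": (41, 38),
    "Document-oriented database functionality": (16, 43),
    "A mail merge function": (2, 28),
}


def weighted_score(essential, useful):
    """Two points for each 'essential' rating, one for each 'useful' rating."""
    return 2 * essential + useful


def rank_features(ratings):
    """Return (feature, score) pairs sorted from highest to lowest score."""
    scored = [(name, weighted_score(*pair)) for name, pair in ratings.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Print the overall ranking implied by the Table 4 percentages.
for position, (name, score) in enumerate(rank_features(RATINGS), start=1):
    print(position, name, score)
```

Under these assumptions, automated backup scores 168 and import/export scores 153, which agrees with both groups placing these two features first and second.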

It is slightly puzzling to note that both groups apparently regarded the ability to set different

permission levels as more important than the ability for multiple users to access and edit the

same database – despite the fact that the former only becomes relevant in situations where the

latter is implemented. This may have resulted from some confusion about exactly what the question

was asking, or it is possible that some respondents understood the question about permission levels

conditionally – that is, to be asking how important they would deem this if multiple users could

access the database.

6.2.2. Interviewees and Survey Respondents – Feature Requests

In addition to the feature rankings listed above, survey respondents were given the opportunity to

list any additional features that would make a service like the DaaS attractive to them (and in the

case of IT support staff, what might make them inclined to recommend it to researchers). Just over a

quarter of researchers and just over half the IT support staff chose to complete this question. A

handful of additional user requirements were also mentioned in free text questions elsewhere in the

survey (such as the question discussed in Section 6.1.2 above). The interviewees were also asked

what features they would like to see in the DaaS.

For ease of reference, the comments and quotations in this section are colour-coded:

Black text: interviewees

Purple text: researcher survey respondents

Blue text: IT support staff survey respondents

(While they are presented together here for convenience, it should perhaps be borne in mind that

the comments from the interviews and from the survey responses are not directly comparable:

interviewees were simply asked to list useful features, whereas survey respondents were specifically

asked to name features other than those they had ranked in the earlier question.)

A key consideration was ease of use. There were two main factors here. First, this would be

necessary to make the system attractive to researchers with limited technical expertise. Secondly,

even technically proficient users are wary of investing a lot of time and effort in becoming familiar

with a new system, especially if it’s not immediately clear how useful it will be to them.

“It needs to have an interface that is intuitive and efficient – so it’s easy to work out how to

do things, and isn’t too time consuming to do them.” [3]

o The importance of an intuitive interface was also stressed by interviewees 4, 6, 7,

and 9.

“I’d have to look into it a bit more. There’s a cost to learning how to use a new program, and

for it to be useful, my colleagues would also need to use the same system.” [4]

“I find that a major barrier to getting things done is getting distracted; there aren’t enough

hours in the day to investigate everything that might possibly be useful.” [5]

“[I’d like] a system that makes it clear what you’re doing – so it’s hard to overwrite or delete

things accidentally.” [7]

Factors affecting survey respondents’ likelihood of using the service:

o “My data is already quite simple. The service would have to be equally simple.”

o “Default basic set up (easy out-of-the-box use).”

o “Whether it would be easy to use and whether I'd have time to learn to use it.”

o “Ease of use for researchers without significant technical experience.”

On a similar note, when asked about the training resources they would like to see available, a

couple of interviewees responded that to be attractive, the service would need to be sufficiently

intuitive that little or no training was needed.

“To be slightly controversial, unless I can see how to use it when I open it, I probably won’t

be too interested.” [4]

“If a service is well designed, you shouldn’t really need additional documentation or

training.” [6]

The issue of sustainability was also raised:

“You don’t want to put your data into a service and then find it’s no longer accessible, or

that a commitment to keep something available online hasn’t been honoured.” [4]

“We would like the service to be stable and durable.”

As noted in Sections 6.1.1 and 6.1.2 above, there was substantial interest in the possibility of using

the service for data sharing. Specific feature requests included:

The ability to share data securely with collaborators at other universities and in other

countries [1, 4, 7, plus several survey respondents]

A system that can be used to make datasets available to a wider group without needing

them to have specific software or technical expertise [3]

Several requests related to the ability to choose to share only a limited portion of the data:

o The ability to easily extract a particular subset of the data for sharing [9]

o Capability to truly do collaborative work on defined datasets

o [The ability to] make a subset of the data available in an aggregated form to one group of people, while another group have full access to the raw data

A number of interviewees and survey respondents noted that if multiple people were accessing the

same dataset, it was important to be able to set different permission levels. Specific requests

included:

Control over who can edit the data – perhaps including a mechanism where people can

make changes, but these have to be agreed or checked by another user [3]

Access controls – you need to be able to control who can edit the data, while perhaps

allowing others to view it but not change anything [4]

“I would like to make [a database of resources that is] searchable by other labs in our unit,

but in a way that they cannot see where things are kept so we can control what is taken.”

One researcher survey respondent also requested the ability for multiple users to access the

database simultaneously. This would presumably require mechanisms to prevent users from

inadvertently overwriting each other’s changes.

Backup and versioning were also mentioned by a number of people. The latter was of interest for a

number of different reasons:

One postdoctoral researcher noted that the ability to see how a dataset looked at a

particular point in the past was helpful in many research projects, not just as a way of

rectifying mistakes, but because it can be useful to be able to track how the information has

changed over time. [3]

A senior researcher said that the security of knowing you can always revert to an earlier

version if anything goes wrong helps people to be more confident in using the software:

they are more likely to experiment if they aren’t worried about losing data. [7]

An IT support staff survey respondent requested “Not just version control (e.g. rollbacks) but

notification of conflicts (e.g. like Subversion) so a researcher knows they are overwriting a

colleague's changes (on re-import, or DB updates).”
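
The kind of conflict notification requested here is commonly achieved with a version check at save time (often called optimistic locking). The following is a minimal Python sketch of that general technique, not a description of how the DaaS works; the `Record` class, its fields, and `ConflictError` are hypothetical names introduced purely for illustration.

```python
class ConflictError(Exception):
    """Raised when a save would overwrite someone else's newer change."""


class Record:
    """A single database entry with a version counter (hypothetical)."""

    def __init__(self, value):
        self.value = value
        self.version = 1
        self.last_editor = None

    def save(self, new_value, based_on_version, editor):
        # Refuse the write if the record has changed since this editor read it.
        if based_on_version != self.version:
            raise ConflictError(
                f"{self.last_editor} changed this record after you opened it"
            )
        self.value = new_value
        self.version += 1
        self.last_editor = editor
```

In this sketch, two users who open the same record both see version 1. The first save succeeds and moves the record to version 2; the second save, still based on version 1, is rejected with a message rather than silently overwriting the colleague's change.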

As also noted above, a substantial proportion of researchers were interested in using the DaaS for

data publishing. Specific feature requests included¹⁰:

A straightforward way of publishing a particular subset or layout of one’s data – e.g. that

associated with a journal article [1, 3, 4, 6, 7]

Persistent URLs or DOIs for published datasets, so they’re citable and remain available long

term [1, 3, 4, 7]

The ability to supply explanatory notes or documentation alongside each individual dataset

[4]

Knowledge of who, if anyone, uses the data

And relatedly:

One researcher survey respondent also stressed the need for the use of the appropriate

metadata standards for the field or discipline.

Good data security was a concern for a number of people, especially when data was being shared.

A doctoral student noted that security would need to satisfy the requirements of data

providers who impose strict NDAs and confidentiality agreements [1].

“Security (e.g. appropriate authentication) is absolutely crucial.”

One IT support staff survey respondent unfortunately saw this as an insurmountable barrier:

“We would not be able to use a service like this due to security and confidentiality concerns;

I do not see a way that this could be overcome.”

As many researchers have pre-existing datasets, the ability to import data was also essential. Being

able to export data was similarly important, both to have the option of using it in other programs,

and so that researchers did not have to worry about getting locked in to the system. Specific

requests included:

The ability to import data from Excel [3]

The ability to import layouts as well as data from FileMaker

XML document import either using simple mappings to relational fields, or true XMLDB

import using something like eXist

¹⁰ For ease of reference, this section includes some user requirements already mentioned in Section 6.1.1 above.

Once their data had been imported, researchers wanted to be sure that the system would have

sufficient processing power to handle it.

One doctoral researcher reported that he had previously had problems running complex

operations on his dataset (which was a few GB in size): the system froze, or took hours or

even days to complete the process [1].

A senior researcher commented that it was very frustrating if a system was “so cumbersome

that it becomes slow” [7].

A related point was made by a postdoctoral researcher, who said that although her datasets were

not large in terms of file size, they could often be quite unwieldy and difficult to view on screen.

Some way of keeping track of the work done would therefore be very helpful – for example

a way of cataloguing the graphs or data visualizations that have been created, to make it

easy to locate them again later [4].

Regarding working with their data, researchers expressed a desire for flexibility. Specific requests

included:

Flexible data formats – you need to be able to choose the format that works for your

project, not have one imposed on you [2]

o In particular, the ability to handle archaeological find numbers (and so on) that

include a mixture of letters, numbers, dots, and hyphens [2]

The ability to create custom forms for data entry [2]

Flexible layouts – the ability to present the data (or subsets of it) in different ways, for

example to rearrange columns, or filter out certain information [3]

Flexibility in terms of the type of data that can be entered – including the ability to enter

large amounts of text in a single field [3]

Reporting tools – the ability to save or print off a customizable summary of the data or a

subset of it [9]

High levels of customisability (as project needs grow)

However, some also noted that there are occasions when it is equally important to be able to

restrict the format of data that can be entered into specified fields. This might be needed:

To prevent people from entering a range rather than a single figure [3].

To force the user to select from a limited number of options (e.g. via a drop-down list) [9].

To standardize date formats or other information [6, 9].

To sift out entries that don’t make sense within the project in question – e.g. ensuring that

the date of death is always later than the date of birth [9].
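
As an illustration only, the four kinds of restriction listed above could be expressed as simple validation rules. The Python sketch below is not taken from the DaaS; the field names (`age`, `status`, `born`, `died`), the permitted status values, and the choice of the YYYY-MM-DD date format are all assumptions made for the example, and every field is assumed to arrive as text.

```python
import re
from datetime import datetime

# Hypothetical set of permitted values, standing in for a drop-down list.
ALLOWED_STATUS = {"alive", "deceased", "unknown"}


def parse_standard_date(text):
    """Accept dates in one standard form only (YYYY-MM-DD)."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []

    # A single figure, not a range such as "30-35".
    if not re.fullmatch(r"\d+", record.get("age", "")):
        problems.append("age must be a single whole number")

    # A value chosen from a limited number of options.
    if record.get("status") not in ALLOWED_STATUS:
        problems.append("status must be one of the permitted options")

    # Standardized dates, followed by a cross-field sanity check.
    try:
        born = parse_standard_date(record["born"])
        died = parse_standard_date(record["died"])
    except (KeyError, ValueError):
        problems.append("dates must be given as YYYY-MM-DD")
    else:
        if died <= born:
            problems.append("date of death must be later than date of birth")

    return problems
```

For simplicity the sketch treats both dates as required; a real project would presumably allow the date of death to be left blank. Returning a list of problems, rather than stopping at the first one, lets a data entry form report every issue with a record at once.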

A good search function was also mentioned.

“It would be useful to have features that allow users with little technical knowledge to run

queries easily – e.g. combo boxes and a simple query wizard.” [6]

“It would need to have good search facilities.” [7]

“For XML imports, [we’d need] XPath and full XQuery search interfaces.”

Researchers and IT support staff also had a range of requests for other features. These included:

Good graphing and visualization tools – e.g. the ability to create multi-dimension graphs [4]

Visualization of datasets in tables and graphs

The ability to cross reference between entries in the database [3]

The ability to include audio and video snippets in addition to textual information [4]

The ability to connect specific points on drawings or photographs to database entries (e.g. to

link information about a pot discovered in an archaeological dig to the place where it was

found) [2]

A tool that allows you to create digital drawings which can go straight into the database [2]

The ability to connect drawings or photographs to each other – to show how the areas

depicted relate to each other, for example [2]

“We need to be able to link to maps of the many different storage areas in a relational way.”

A version of the system designed for use on mobile devices such as the iPad, so it’s possible

to enter data while you’re doing fieldwork [2]

The ability to parse data more easily – i.e. to transform non-standardized information into

something you can actually work with and analyse [1]

“It would be great if sufficient storage space were available to create an easily accessible

archive of earlier datasets.” [7]

Mail merge functionality [9]

Data manipulation, either through scripts (in the style of PHP, R, etc.) or through a data

mining interface (e.g. Cognos)

Summary info, data inconsistency checks, and some basic textual analysis

Native database functionality

“Being able to work with no or only very slow network access is vital for me – e.g. during

remote fieldwork.”

Customization and scripting capabilities for the web-based front-end of databases created

Being able to generate linked data as RDF

Platform interoperability

Some researchers told us that it would be very helpful if the database service was able to interact

with other software and tools:

An archaeology graduate student observed that standard database packages (such as

Microsoft Access and FileMaker Pro) are usually unsuitable for doing work in her field, as

they cannot interact with the GIS tools that archaeologists use – the Harris matrix, for

example. If the DaaS were able to do this, that would make it very attractive [2].

A sociology doctoral student observed that it would be really helpful to have an easy way of

bridging the gap between the database and analysis programs such as Stata, SPSS, or R, and

between the database and visualization tools [1].

This interviewee did most of his analysis using Python, and also commented that it would be

useful to have an easy interface with this [1].

A desire for documentation and other accompanying materials was also expressed:

Comprehensive, cross-platform documentation [1]

Tools/training resources to ease the process of designing the database (and designing it

well!) at the beginning of the project [9]

'Out-of-the-box' support for the University of Oxford's data access policies (in the form of a

short and reassuring document for managers and some extremely detailed documentation

for IT staff)

Finally, there were comments relating to financial issues:

Cost is the major stumbling block; if it was free at point of use for individual researchers, and

paid for by funders for funded research projects and institutions, that would be a major

benefit to research.

6.2.3. Interviewees – Views on Training

In addition to asking about user requirements for the database service itself, we also talked briefly to

the interviewees about training. We asked what training in data management or data handling they

had personally received, how happy they were with existing provision, and what sort of training

resources they would like to see offered alongside the DaaS.

Training – General

Most of the interviewees were largely self-taught. Some had attended short courses (lasting between a few hours and a few days) to learn to use a particular piece of software: in particular, it was moderately common for those using statistical analysis packages to have received training (a senior lecturer told us this was standard for graduate students in the social sciences). Two interviewees had done more extended training (a Master’s degree in IT [6], and a longer course on Access [9]). However, it was most common for people to learn to use a new piece of software by experimenting with it, reading documentation or online resources, and perhaps asking colleagues or friends for advice.

“If I have a project that needs me to learn something new in order to do it, I’ll do that – but I’m unlikely to pick it up by just doing exercises.” [1]

“You generally learn how to use things like simulation software on the job, from the people you’re working with [...] new people joining a lab tend to learn very quickly, as you can’t really do anything without it.” [7]

In general, the researchers we spoke to were reasonably happy with the software training that was provided. However, several people commented that it would be useful to have more training available on the general principles of data management and database design [1, 2, 3, 5, 9].

“Ideally, training would be modular – you’d start with general training in the theory of working with databases, which would be relevant whatever program you were using, and then you’d add specific training on how to get the particular package you’re using to do what you want it to do.” [2]

“It would be very helpful to have a short course – maybe a lunchtime session – about why

people should consider databases and what they’re helpful for. The idea of a database is

abstract, and without filling it with life, it’s quite difficult to convey its attractiveness.” [3]

“On previous projects, it’s been glaringly obvious that the scientists didn’t really know about

databases. They had done the best they could, [...] but they weren’t aware that database

design was important and that they should have got proper advice at the beginning.” [9]

Some interviewees commented that ideally, they would like to be able to consult someone on a one-

to-one basis for help in designing a database for their particular project [2, 3, 9].

Training Requirements for the DaaS

As noted in Section 6.2.2 above, there was a general feeling that for the service to be attractive, it should be sufficiently intuitive to be usable with little or no training. The majority of interviewees were happy to figure out how to do things using help files and documentation – as long as these provided a sufficiently clear guide [1, 3, 4, 5, 7]. It was also commented that it was important for accompanying materials to be searchable, and to be presented in a way that meant it was easy to find information on the specific task you were interested in.

Some interviewees also said they liked online tutorials – either demonstrations of how to use the software, or specific interactive examples you can work through, with a view to learning how to apply particular techniques to your own material [4, 5, 7, 8].

A couple of researchers mentioned print resources, in addition to or instead of electronic ones.

“I actually rather like to have a hard copy of training material (as long as it’s not too long!), rather than just an online resource: it’s much easier to flick through the paper version and find what you’re after.” [7]

There were mixed views about face-to-face courses. Some researchers actively preferred this sort of training, while others were less enthusiastic.

“Face-to-face is always better [...] though it needs to be run regularly, so that new starters and people who’ve changed roles have an opportunity to learn.” [8]

“Nothing can replace face-to-face courses – because they commit you to going and doing the training and setting aside specific time for it.” [3]

“Face-to-face training has the advantage of allowing you to talk to other people.” [9]

“I have been on courses in the past – but I’m not certain how much help they’ve actually been.” [1]

It was acknowledged that not all researchers work in the same way.

“A combination of face-to-face courses and online material would be good – different things suit different people.” [9]

A postdoctoral researcher who usually preferred to work things out for herself commented that she had a colleague who would much rather go on a course to learn the basics. [4]

Similarly, a senior researcher who was not a great fan of face-to-face training himself observed that “A lot of students like courses.” [5]

Some researchers cited lack of time as a reason for preferring online resources.

“My working time is already pretty full with research: taking time off to do a lot of training

isn’t a high priority.” [4]

“I find it increasingly difficult to fit in more courses; the ratio of time and effort to benefit,

and the risk of them turning out not to be that useful makes them less attractive.” [5]

Although it was also noted that online resources have their drawbacks.

“Online courses are more flexible and can be done at any time, but in practice this means

you often don’t do them at all.” [3]

Opinions also varied about whether face-to-face courses were more useful for learning the basics of

a system, or for more advanced features.

“For technical questions that arise later on, I’m happy to use documentation, but for getting

started and getting basic ideas, face-to-face is useful.” [3]

“I’m happy to use help menus and online tutorials, but if that isn’t enough to get me at least

a little way, I probably won’t bother going further – though I might then go for more training

in the subtleties of a program, or when I run up against a specific question about how to do

something.” [4]

However, there was a general consensus that for face-to-face training to be really useful, it should

be short (perhaps half a day or so), and focused on the issues most likely to interest researchers.

“I could see myself going on a course that covered uploading data for publication – if there’s

an API that could be demonstrated in an hour or two, that’s fine, but I don’t want to spend a

week on it.” [1]

“It might be useful to have a short (say three hour) information session that would give

people the basics, tell them what else they might need to know about, and point them

towards further resources that they could follow up.” [7]

7. Comparison with Sudamih Findings

While it should be borne in mind that the two requirements gathering exercises were not wholly

equivalent (for example, as the Sudamih Project was somewhat broader in scope, the interviewees

included researchers who were not working with structured data; the Project also produced a much

larger body of qualitative data, and had no quantitative element), some useful comparisons may be

drawn.

In broad terms, the requirements of the researchers canvassed in the two exercises were similar.

Both groups expressed a desire for an interface that was straightforward to learn, while offering

sufficient flexibility to accommodate a wide range of projects.

The prospect of being able to share data with colleagues and to publish it online also appealed to

both groups. However, the VIDaaS interviews brought to light two specific considerations that were

not prominent during the Sudamih Project: first, the importance of being able to share data with

colleagues outside Oxford, and outside the UK, and secondly, the desire to be able to publish a

specific subset of a dataset rather than the whole thing.

In the Sudamih requirements gathering process, a substantial number of researchers said it was

essential to their work to be able to use diacritics and non-standard character sets. This did not

appear to be a significant issue for any of the VIDaaS interviewees, suggesting that this is chiefly of

relevance to humanities scholars. The VIDaaS interviewees, on the other hand, seemed more

concerned with data security than the Sudamih researchers had been, at least partly because several

of them were working with confidential or otherwise restricted information.

Both groups showed a modest amount of interest in the idea of a service that allowed them to find

out about the datasets other researchers were working on, although other aspects of the service

excited more enthusiasm. Finally, the views of the two groups on training were very similar.

8. Conclusions and Recommendations

Both the interviews and the survey responses indicate that a substantial portion of researchers

regard the DaaS as something that could have a positive impact on their research.

Although researchers described a wide range of user requirements, there were a number of

common themes. These included a desire for automated backup, the ability to import and export

data, and flexibility. One of the most frequently mentioned requests was that the system should

have an intuitive interface that could be used without much training.

Recommendation 1: DaaS development should proceed in line with the prioritized list of

requirements compiled as a result of this exercise.

Recommendation 2: As the service develops, it should be subjected to regular user testing

to ensure that it meets requirements, and in particular the requirement to be

straightforward and intuitive to use.

Recommendation 3: The DaaS should be accompanied by documentation and support

material that provide a clear and easily navigable guide to performing a range of common

tasks.

Recommendation 4: Any face-to-face training courses that are provided should be short and

focused.

When considering making use of a new tool or service (especially one without a proven track

record), researchers naturally wish to know what it will offer them that is not provided by the

resources they have used previously. The requirements gathering exercise highlighted two features

that seemed to catch researchers’ attention: the ability to share data securely with collaborators

(including those outside Oxford), and a straightforward way of publishing particular subsets of their

data to accompany research publications.

Recommendation 5: These features should be emphasized in publicity and training material

for the DaaS.

However, researchers also want to be sure that the new service will offer them the functionality

they have come to rely on in the tools and services they are currently using.

Recommendation 6: To avoid giving users a misleading or simply inadequate impression of

the DaaS’s purpose and capabilities, it will be important to ensure that core functionality and

other features are not neglected.

In general terms, many researchers seem rather ambivalent about data publication. Most have many

calls on their time, and unless it is required of them, it is usually low on the list of priorities.

Recommendation 7: To encourage use of the DaaS for data sharing, training materials

should cover not just how to publish data, but why data publication is worth considering.

In some cases, it seems that researchers may be choosing not to make data public because the effort

of doing so outweighs the potential benefits; if the DaaS can make the process more

straightforward, this might encourage more data publication.

When promoting the DaaS, it is important to catch people at the right point in the research cycle.

Unless there are serious deficiencies in their existing strategy, researchers are usually reluctant to

make major changes to their data management methods or tools once a project is underway. On the

other hand, they are far more willing to give serious consideration to a new system for a project that

is still in the planning stages, or which has only just begun.

Recommendation 8: The DaaS should be publicized extensively to researchers who are likely

to be at the beginning of a research project – for example, new doctoral students and new

post-docs.

Recommendation 9: Particular efforts should be made to ensure awareness of the DaaS

among research facilitators and other support staff who are likely to advise researchers

during the planning stages of a project.

Recommendation 10: While it is perhaps not where the focus of efforts should lie, it will

nevertheless be worth also taking advantage of any opportunities to promote the DaaS

more widely, so that researchers are more likely to be aware of it at the point when it

becomes relevant to them.

Humanists and social scientists have shown most interest in the DaaS thus far. However, although a

comparatively small number of hard science researchers responded to the survey invitation, IT

support staff working with researchers in this area were reasonably well represented, and seemed

as interested in the service as those working in other disciplines.

Recommendation 11: For maximum effect, publicity aimed at researchers should be

concentrated on the humanities and the social sciences.

Recommendation 12: Researchers in other disciplines should be reached indirectly, via the

IT support personnel who advise them.

Although there is still a significant amount of work to be done for the DaaS to satisfy the

requirements discussed in this report, this exercise seems to indicate that by facilitating

improvement of data management and curation practices and increased data sharing, the DaaS is a

service with the potential to be of considerable value to individual researchers, and beyond that, to

institutions and the research community at large.

Appendix A: Index of Interviewees

ID  Academic Division  Researcher Details
1   Social Sciences    Doctoral student in sociology
2   Social Sciences    Doctoral student in archaeology
3   Social Sciences    Postdoctoral researcher in sociology
4   Social Sciences    Postdoctoral researcher in politics
5   Social Sciences    Senior researcher in political sociology
6   MPLS               Doctoral student in engineering science and zoology
7   MPLS               Senior researcher in chemistry
8   Medical Sciences   Early career researcher in health economics
9   Medical Sciences   Researcher in cancer epidemiology

Throughout this report, numbers in square brackets indicate in which interview a point was raised or

a comment was made.

Appendix B: Interview Question Template

Introduction:

We’re conducting this interview as part of the VIDaaS Project – VIDaaS standing for ‘Virtual Infrastructure with

Database as a Service’. One major aim of the project is to develop a software tool that enables people to build,

edit, search, and share databases online. A pilot version of the tool already exists – this was developed as part

of the earlier SUDAMIH (Supporting Data Management Infrastructure for the Humanities) Project, and we’re

now aiming to improve and expand this into a full service.

We’d like to gain a better understanding of some of the ways that researchers are using structured data and

databases in the course of their research, and of what they’d be looking for in a database service – what’s

essential, what additional features would be useful, and what they’d like to be able to do that isn’t currently

easily possible.

If time permits, we’d also like to talk briefly about sharing and publishing data, and about the sort of training it

would be useful to have available.

We’ll keep the interview to no more than 60 minutes.

We’d like to record the interview, if that’s OK with you. The recording will only be used by project team

members – it won’t be made public. However, we would like your permission to use anonymized quotations in

project reports and other documents. Is that OK with you? [Respondent asked to sign consent form.]

Interview:

Could you start by telling us a little bit about the research you are engaged in?

What’s your area of research? Do you work in a project team or as an individual?

What sort of use do you make of structured data? What software tools do you use?

[How do you store your data? Is it backed up anywhere?]

If the data is stored in a database, how big is this? How much further do you expect it to

have grown by the end of the project?

o How many tables does it have?

o How many people use it?

o Do you have a sense of what the input and output are (that is, how much data (in

bytes) is being pumped in and out of it each day)?

Are there things you’d like to be able to do with your data but can’t, because the software

doesn’t permit it (or because it would be too difficult or time consuming)?

The DaaS

The DaaS will be an online service that allows researchers to create, use, and share

databases. It will also be possible to import existing databases into the DaaS.

o Is this a service you could envisage a use for in your own research?

o What would such a service need to offer to make it attractive?

Data sharing

Who owns the data generated by your research project? If you were to move to a different

institution, what would happen to it?

Is your data accessible to anyone other than those working on your project? Do you have

plans to make it available in the future? (If not, what are the reasons for that decision?)

As well as permitting the sharing of datasets among colleagues, the DaaS will provide users

with a straightforward way of making their data publicly available online. Is this a feature

you could see yourself using?

Do you ever make use of other people’s datasets? How do you go about finding out about

and accessing these?

DaaS users will have the option of adding information about their project to a central system

that allows other users to see what they’re working on.

o Would you be happy making details of your own research available in this way?

o Is this something you might make use of to find out what other researchers are

working on?

Thinking now about training…

Have you received any training in either general database design or the use of specific

database software?

[How much IT support is available for the more technical aspects of your work? Are you

happy with the current provision, or are there areas in which you’d like more help to be

available?]

Given your own experience and that of your colleagues/students, do you see a need for

more training in either generic data skills (e.g. selecting the appropriate software tools, the

principles of database design, etc.) or in the use of specific pieces of software?

Other than the program’s own help files, what resources would you like to see being made

available for a new service like the DaaS? Face-to-face training courses? Online tutorials?

Video demos?

What advice would you give to graduate students or new researchers just starting data-

driven research? Is there anything you wish you’d learnt earlier?

Appendix C: Survey Questionnaires

Researcher Survey

The VIDaaS (Virtual Infrastructure with Database as a Service) Project, based at Oxford University Computing

Services, is in the process of developing an online database service for researchers. Consequently, we are

interested in finding out about researchers’ current use of structured data (that is, the sort of data which

might be stored in tables, spreadsheets, or databases), and the features they would find useful in a database

service.

If you are conducting academic research involving structured data, we would like to hear from you, and

would be very grateful if you would complete the survey below. We estimate that the survey will take

around 10-15 minutes to complete. All respondents will be entered into a draw for a £100 Amazon voucher.

If you are involved in providing IT support to researchers working with structured data, please complete the

alternative version of this survey, which can be found here [link].

Section 1: About your project

1. What sort of data do you regularly work with? (Please select all that apply.)

Textual

Numerical

Images

GIS data

Audio and/or video

Other (please specify)

2. What software or other tools do you use to store and analyse your structured data? (Please select all that

apply.)

Plain text files

A word processing program such as Microsoft Word

A spreadsheet program such as Microsoft Excel

Microsoft Access

FileMaker Pro

Another widely available database package

A custom-built relational database system

Other custom-built software

A statistical analysis package such as SPSS or Stata

XML documents

Other (please specify)

2.a. If you use another widely available database package, what is it called? (Optional)

3. Which of these do you use most to store and analyse your structured data?

Plain text files

Tables in a word processing program such as Microsoft Word

A spreadsheet program such as Microsoft Excel

Microsoft Access

FileMaker Pro

Another widely available database package

A custom-built relational database system

Other custom-built software

A statistical analysis package such as SPSS or Stata

XML documents

Other (please specify)

4. Approximately how big is the dataset you’re currently working with? (If you use multiple datasets regularly, please estimate the approximate total size of the data collection.)

Under 1GB

Between 1GB and 10GB

Between 10GB and 100GB

Over 100GB

I don’t know

5. By the end of your current project, is it most likely that your dataset will:

Not be significantly bigger than it is at present?

Have grown, but not to more than double its current size?

Be more than double but less than ten times its current size?

Be more than ten times its current size?

I don’t know

6. Where do you host your data at present?

On local storage formats, such as DVDs, memory sticks, your computer’s hard-drive, etc.

On network-attached storage managed by your research project

On a departmental server

On centrally-provided storage (e.g. on a server provided by central university IT support services)

In the cloud

I don’t know

Other (please specify)

6.a. If you store your data in the cloud, please specify who provides your cloud storage:


Section 2: Data sharing

7. Thinking now about data sharing: which of the following statements are true of you and your research project? (Please select all that apply.)

NB. In the statements below, ‘publicly available’ describes datasets that are generally accessible by members of the research community – for example, those published on a website or deposited in an archive.

Making use of publicly available datasets (provided by other researchers or organizations) is important for my research

I have made my own research data publicly available in the past

I intend to make all or most of the data from my current research project publicly available in the future

I do not intend to publish the complete dataset from my current project, but have already made or expect to make limited subsets of it publicly available – e.g. to accompany research publications

I do not currently plan to make any data from my current research project publicly available

I would like to make my data publicly available, but don’t currently have a straightforward means of doing so

My funding body requires me to make my data publicly available

I have shared data privately with colleagues (e.g. by emailing a file) in the past

I would be happy to share data privately with colleagues (e.g. by emailing a file) if asked

I am in principle happy to make my data publicly available at any point in a research project

I am in principle happy to make my data publicly available once I have completed the work I intend to do with it and published the results

I generally prefer not to share my research data

None of the above

8. Please select the appropriate option for each of the following statements:

Response options: True of all or most of my data; True of a substantial portion of my data; True of little or none of my data

a. My data is of potential value or interest to other researchers in higher education

b. My data is of potential value or interest outside the HE community

c. My data has potential commercial value

d. There would be little or no value in sharing my data: other researchers are unlikely to find it useful

e. There would be little or no value in sharing my data: it’s already publicly available from other sources

f. Confidentiality or intellectual property restrictions make it hard to share the data I am working with

g. The decision about whether to make the data I am working with available does not rest with me

Section 3: Data management software features

9. Please think about the software packages you use (or might use in the future) to manage your data. Which of the following features are (or would be) most important or useful for your work?

Response options: Essential; Useful, but not essential; Not relevant to my work; Unsure what this means

a. Automated backup

b. Automated versioning (that is, the ability to return to previous versions of the dataset)

c. The ability to import or export data in a range of formats

d. The ability to view or present data in different ways (e.g. by creating customized forms or reports)

e. The ability to plot results on maps

f. The ability to enter and search data in XML formats

g. The ability for multiple users to access and edit the same database

h. The ability to set different permission levels (that is, to control the extent to which different users can add, edit, or delete information)

i. The ability to make data publicly available via the Web

j. Document-oriented database functionality

k. A mail merge function

Section 4: The DaaS

10. The DaaS (Database as a Service) will be an online service that allows researchers to create databases from scratch, import existing databases, work with the data via a clear Web interface, and (if desired) share access to the database with other members of a project team or research group. It will be centrally supported, enabling integration with back-up systems and timely upgrades. It has not yet been decided how the service will be funded. Is this a service you could envisage a use for in your own research?

Yes

No

It depends

10.a. On what, most importantly?


11. The DaaS will also provide researchers with a straightforward way of publishing their data online. Is this a feature you could envisage using to publish your own datasets?

Yes, definitely

Yes, possibly

Unsure

No – I am happy with existing ways of publishing my data

No – I don’t expect to publish my data

11.a. If you answered ‘Yes’, what would you be most likely to publish?

Complete datasets for use by other researchers (or other interested parties)

Specific subsets of my data – e.g. to support research publications

Both complete datasets and specific subsets

Unsure

Not applicable

12. (Optional) Other than those listed above, are there any particular features that would make a service like the DaaS attractive to you?

13. Can you envisage any ways in which a service like the DaaS could help you save time, improve your research, increase efficiency, or otherwise make life easier?

14. We are currently recruiting a group of test users for the DaaS. Would you be interested in finding out more about this?

Yes

No

Section 5: Finally, a few questions about you:

Are you:

A graduate student?

A postdoctoral/early career researcher?

A mid-career or senior researcher?

Other (please specify)

Which subject area are you currently working in?

Humanities

Creative arts

Social sciences

Maths or physical sciences

Medical or life sciences

Do you:

Mostly work alone?

Work as part of a project team?

Other (please specify)

Your name (optional)

Your institution

Your email address (optional)

Email addresses will be used only to contact the winner of the prize draw and those who have expressed an interest in the DaaS test user group.

Thank you for taking the time to complete this survey.


IT Support Staff Survey

The VIDaaS (Virtual Infrastructure with Database as a Service) Project, based at Oxford University Computing

Services, is in the process of developing an online database service for researchers. Consequently, we are

interested in finding out about researchers’ current use of structured data (that is, the sort of data which

might be stored in tables, spreadsheets, or databases), and the features that would be useful in a database

service of this sort.

If you are involved in providing IT support for researchers working with structured data (either by providing

advice, or through direct involvement in designing, building, and working with databases), we would like to

hear from you, and would be very grateful if you would complete the survey below. We estimate that the

survey will take around 10-15 minutes to complete. All respondents will be entered into a draw for a £100

Amazon voucher.

If you are a researcher using structured data, please complete the alternative version of this survey, which

can be found here [link].

Section 1: About the projects you work on

1. What sort of data do you (or the researchers you advise) regularly work with? (Please select all that apply.)

Textual

Numerical

Images

GIS data

Audio and/or video

Other (please specify)

2. To which subject area(s) do the research projects you work on or advise belong? (Please select all that

apply.)

Humanities

Creative arts

Social sciences

Maths or physical sciences

Medical or life sciences

Other (please specify)

3. What software or other tools are commonly used to store and analyse structured data in the projects you

advise or work on? (Please select all that apply.)

Plain text files

A word processing program such as Microsoft Word

A spreadsheet program such as Microsoft Excel

Microsoft Access

FileMaker Pro

Another widely available database package

A custom-built relational database system

Other custom-built software

A statistical analysis package such as SPSS or Stata

XML documents

Other (please specify)

3.a. If another widely available database package is used, what is it called? (Optional)

4. Which of these would you say is most commonly used to store and analyse data in the projects you advise

or work on?

Plain text files

Tables in a word processing program such as Microsoft Word

A spreadsheet program such as Microsoft Excel

Microsoft Access

FileMaker Pro

Another widely available database package

A custom-built relational database system

Other custom-built software

A statistical analysis package such as SPSS or Stata

XML documents

Unknown

Other (please specify)

5. Please think about a recent project you have worked on (or are still working on) or advised. Approximately

how big is/was the dataset in question?

(If you are working on multiple projects, please select a project you consider reasonably typical and answer

for that: the idea of this question is to get a snapshot of a sample of current or recent research databases.)

Under 1GB

Between 1GB and 10GB

Between 10GB and 100GB

Over 100GB

Unknown

6. If the project is ongoing, is it most likely that by the time the project concludes, the dataset will:

Not be significantly bigger than it is at present?

Have grown, but not to more than double its current size?

Be more than double but less than ten times its current size?

Be more than ten times its current size?

Unknown

Not applicable – the project has already concluded


7. Still thinking about the same project, where is the data hosted?

On local storage formats, such as DVDs, memory sticks, your computer’s hard-drive, etc.

On network-attached storage managed by members of the research project team

On a departmental server

On centrally-provided storage (e.g. on a server provided by central university IT support services)

In the cloud

Unknown

Other (please specify)

7.a. If the data is stored in the cloud, please specify who provides the cloud storage:

Section 2: Data sharing

8. Thinking now about data sharing: in your experience, how often are the following statements true of the researchers you advise or work with?

NB. In the statements below, ‘publicly available’ describes datasets that are generally accessible by members of the research community – for example, those published on a website or deposited in an archive.

Response options: Frequently true; Occasionally true; Rarely or never true; Unknown

a. Researchers want to be able to make their research data publicly available while the project is still ongoing

b. Researchers want to be able to make their research data publicly available at the end of the project

c. Researchers don't want to publish their whole dataset, but do want to make limited subsets of it publicly available - e.g. to accompany research publications

d. Researchers prefer not to make any of their data publicly available

e. Researchers' funding bodies require them to make their data publicly available

f. Researchers would like to make their data publicly available, but don't currently have a straightforward means of doing so

g. Confidentiality or intellectual property restrictions make it hard for researchers to share their data

Section 3: Data management software features

9. Please think about the software packages used (or which might be used in the future) to manage data in the projects you advise or work on. Which of the following features are (or would be) important or useful?

Response options: Essential; Useful, but not essential; Not relevant to my work; Unsure what this means

a. Automated backup

b. Automated versioning (that is, the ability to return to previous versions of the dataset)

c. The ability to import or export data in a range of formats

d. The ability to view or present data in different ways (e.g. by creating customized forms or reports)

e. The ability to plot results on maps

f. The ability to enter and search data in XML formats

g. The ability for multiple users to access and edit the same database

h. The ability to set different permission levels (that is, to control the extent to which different users can add, edit, or delete information)

i. The ability to make data publicly available via the Web

j. Document-oriented database functionality

k. A mail merge function

Section 4: The DaaS

10. The DaaS (Database as a Service) will be an online service that allows researchers to create databases from scratch, import existing databases, work with the data via a clear Web interface, and (if desired) share access to the database with other members of a project team or research group. It will be centrally supported, enabling integration with back-up systems and timely upgrades. It has not yet been decided how the service will be funded.

Is this a service you could envisage a use for in the projects you advise or work on?

Yes

No

It depends

10.a. On what, most importantly?


11. The DaaS will also provide researchers with a straightforward way of publishing their data online. Is this a feature you could envisage the projects you advise or work on making use of?

Yes, definitely

Yes, possibly

Unsure

No – data would probably continue being published by other means

No – data publication isn’t relevant for the projects I work on

11.a. If you answered 'Yes', what would you expect researchers to be most likely to publish? (Optional)

Complete datasets for use by other researchers (or other interested parties)

Specific subsets of my data – e.g. to support research publications

Both complete datasets and specific subsets

Not applicable

Unknown

12. Other than those already mentioned, are there any particular features that would make a service like the DaaS attractive to you or to the researchers you work with? What might make you inclined to recommend it to researchers in preference to existing solutions? (Optional)

13. Can you envisage any ways in which a service like the DaaS could help you or the researchers you work with to save time, improve the quality of research, increase efficiency, or otherwise make life easier? (Optional)

14. We are currently recruiting a group of test users for the DaaS. Would you be interested in finding out more about this?

Yes

No

Section 5: Finally, a few questions about you:

15. Which of the following best describes your role?

General IT support within a higher education institution, including some advice and/or support for researchers

IT support and/or advice specifically targeted at researchers, but not focused on one specific project

IT support for one specific research project

Other (please specify)

Your name (optional)

Your institution

Your email address (optional)

Email addresses will be used only to contact the winner of the prize draw and those who have expressed an interest in the DaaS test user group.

Thank you for taking the time to complete this survey.
