Researcher Requirements Report
Virtual Infrastructure with Database as a Service (VIDaaS) Project
vidaas.oucs.ox.ac.uk
Author: Meriel Patrick
Affiliation: Oxford University Computing Services
Project Document Cover Sheet
Project Information
Project Acronym: VIDaaS
Project Title: Virtual Infrastructure with Database as a Service
Start Date: 1 April 2011
End Date: 31 March 2012
Lead Institution: University of Oxford
Project Director: Prof. Paul Jeffreys
Project Manager & contact details: Dr. James A J Wilson, Oxford University Computing Services, 13 Banbury Road, Oxford, OX2 6NN. Tel. 01865 613489. Email: [email protected]
Partner Institutions: N/A
Project Web URL: http://vidaas.oucs.ox.ac.uk/
Programme Name (and number): UMF Shared Services and the Cloud
Programme Manager: John Milner
Document Name
Document Title: Researcher Requirements Report
Author(s) & project role: Meriel Patrick (Project Analyst)
Date: 26 September 2011
Filename: VIDaaS Researcher Requirements Report.pdf
URL: http://vidaas.oucs.ox.ac.uk/docs/VIDaaS%20Researcher%20Requirements%20Report.pdf
Access: Project and JISC internal / General dissemination
Document History
Version | Date | Comments
1.0 | 26/09/11 | Report signed off by VIDaaS Steering Group
VIDaaS Project Researcher Requirements Report, Version 1.0
Contents
1. Executive Summary
2. Introduction
3. Methodology
3.1. Interviews
3.2. Survey
4. Interviewee and Survey Respondent Profiles
4.1. Interviewees
4.2. Survey Respondents
5. Current Practices
5.1. Dataset Details
5.1.1. Interviewees
5.1.2. Survey Respondents
5.2. Software Used
5.2.1. Interviewees
5.2.2. Survey Respondents
5.3. Data Sharing and Publication
5.3.1. Interviewees
5.3.2. Survey Respondents
6. Database as a Service (DaaS)
6.1. Interest in the DaaS
6.1.1. Interviewees
6.1.2. Survey Respondents
6.2. User Requirements for the DaaS
6.2.1. Survey Respondents – Software Feature Rankings
6.2.2. Interviewees and Survey Respondents – Feature Requests
6.2.3. Interviewees – Views on Training
7. Comparison with Sudamih Findings
8. Conclusions and Recommendations
Appendix A: Index of Interviewees
Appendix B: Interview Question Template
Appendix C: Survey Questionnaires
1. Executive Summary
This document reports the findings of the VIDaaS Project’s requirements gathering exercise,
conducted in the summer of 2011. The exercise involved a series of interviews with Oxford
researchers working with structured data who were based in the Social Sciences Division, the
Medical Sciences Division, and the Mathematical, Physical and Life Sciences Division, plus a broader
online survey for both researchers and IT support staff. Those canvassed showed a substantial
amount of interest in the DaaS (Database as a Service).
Dataset Details
The majority of researchers working with structured data use some combination of textual and
numerical information. Working with images is also reasonably common, and a substantial minority
use audio and video material. A smaller proportion of researchers work with GIS data.
Size of datasets varies enormously – from a few hundred megabytes to a few hundred gigabytes.
However, smaller datasets are in the majority, with more than half the researchers canvassed
working with data collections of under 10GB. There were no obvious correlations between dataset
size and discipline, though a larger sample size would be needed to draw firm conclusions on this
point.
Researchers are reasonably evenly divided between those who keep their data on a personal hard
drive or other local storage, and those who use network-attached storage (though IT support staff
are significantly more likely to use the latter). Use of cloud-based storage is currently rare.
While a substantial number of researchers working with structured data use relational databases,
flat file formats are also very common. IT support staff were substantially more likely to report use
of relational databases than researchers. Spreadsheets and statistical analysis packages are both
popular means of storing and analysing flat file data, with the latter being particularly important for
social scientists. Humanities researchers, however, are more likely to use XML documents to manage
their data.
Data Sharing
Researchers in general seem to have a rather lukewarm attitude towards making data publicly
available; while many are in principle happy to do so once they have finished their own work, few
seem to regard it as a high priority. Data publication often seems to happen only because it is
expected – either to support a research publication, or because funding bodies require it.
Nevertheless, a substantial proportion of researchers do believe that their data would be of value or
interest to others, and there are some projects which exist specifically to make a particular body of
data more widely available.
The DaaS
Researchers generally reacted positively to the DaaS. However, discussions about potential uses for
the service indicated that it is imperative to catch people at the right point in the research cycle:
researchers are more likely to consider using a new service for a project they have not yet embarked
on than for one that is already underway. In disciplinary terms, the warmest response was received
from humanists and social scientists.
Major factors affecting how likely researchers would be to use the DaaS include cost and
functionality. It was also deemed vital that the DaaS be straightforward and intuitive to use. Other
key user requirements include:
- Automated backup
- The ability to import or export data in a range of formats
- The ability to view and present data in different ways
- The ability to set different permissions levels
- The ability to make data publicly available via the Web
- Automated versioning
A substantial proportion of researchers indicated that it would be helpful to be able to share data
with colleagues, including those outside Oxford. Similarly, the idea of having a straightforward way
of publishing datasets – including subsets of data to accompany research publications – was also
appealing to some. Researchers would like published datasets to be easily citable: that is, with a
persistent URL or DOI.
There was a certain amount of interest in the prospect of using the DaaS to find out about other
researchers’ work, but this was not as popular as other aspects of the service.
Training
Researchers were generally happy to teach themselves to use a piece of software (although some
said that they would like more training in general database theory). To enable users to learn to use
the DaaS independently, it will therefore be important to provide clear online documentation and
guides to performing particular tasks. Some researchers indicated that they felt face-to-face training
was useful – but that courses needed to be short and focused.
Comparison with Sudamih Project
The findings of the VIDaaS and Sudamih requirements gathering exercises were broadly in
agreement with each other, although they displayed some minor disciplinary differences.
Conclusions and Recommendations
1. Development work on the DaaS should continue in line with the prioritized user
requirements list compiled as a result of this exercise.
2. User testing should be employed to ensure requirements (and in particular the requirement
that the service be straightforward to use) have been met.
3. The DaaS should be accompanied by clear, focused documentation, support material, and
training.
4. Publicity and training materials for the DaaS should strike a balance between emphasizing
those features that make the service distinctive, and giving an accurate overview of its
functionality as a whole.
5. The service should be promoted most extensively to the groups of researchers who have
shown themselves to be most likely to be interested in using it (namely, humanists and
social scientists at the beginning of a research project), and to IT and other support staff who
advise researchers.
2. Introduction
The VIDaaS (Virtual Infrastructure with Database as a Service) Project, based at Oxford University
Computing Services, has two main aims:
- To develop an online service that will enable researchers to build, edit, search, and share
databases online
- To develop a virtual infrastructure which will enable the database service to function within
a cloud computing environment
The project runs until March 2012, and is funded by JISC and HEFCE under the University
Modernisation Fund.[1]
VIDaaS is the successor to the Sudamih Project,[2] in which a pilot version of the database service
(currently known as the DaaS[3]) was developed. The VIDaaS Project aims to expand the DaaS’s
functionality, and to develop it into a full production service. The DaaS was initially designed with
humanities researchers in mind, and an additional aim of the VIDaaS Project is to broaden the scope
of the service to make it relevant and useful to academics working in other disciplines.
Between May and July 2011, we conducted a substantial requirements gathering exercise to
improve and refine our understanding of user requirements for the DaaS. This document reports the
findings of that exercise. The process described here builds on the requirements gathering
conducted as part of the Sudamih Project.[4]
3. Methodology
The VIDaaS requirements gathering process had two main phases: the collection of qualitative
information through a series of interviews with researchers, and the collection of quantitative data
via an online survey.
Both interviews and survey were designed to explore two main areas: researchers’ current projects
(including details of the datasets they use), and their potential interest in and user requirements for
the DaaS. As the DaaS is designed to be a service that will allow researchers to share and publish
data, questions gauging researchers’ attitudes to data sharing were also included.
3.1. Interviews
In June and July 2011, we interviewed nine University of Oxford researchers currently working with
structured data. Potential interviewees were identified via personal profiles on the University
website or elsewhere online, and were then emailed to ask if they would be willing to participate.
[1] For more information about the UMF, see http://www.jisc.ac.uk/whatwedo/programmes/umf.aspx
[2] For more details, see the Sudamih Project website: http://sudamih.oucs.ox.ac.uk/
[3] This is an interim name, standing for Database as a Service: a service name will be selected in due course.
[4] See the Sudamih Researcher Requirements Report: http://sudamih.oucs.ox.ac.uk/docs/Sudamih%20Researcher%20Requirements%20Report.pdf
Most interviews lasted around an hour (though a couple were somewhat shorter than this, and one
a little longer), and followed a semi-structured pattern. A list of questions compiled with input from
other VIDaaS team members formed the basis for the interviews, but was not always followed
rigidly, to allow the conversation to develop naturally, and to provide space for the researchers to
expand on particular points of interest. A copy of the final interview question template is provided in
Appendix B.
3.2. Survey
In the second half of July 2011, we conducted an online survey. To gain a wider range of
perspectives, two versions of this were provided: one aimed at researchers, and the other at IT
support staff. The survey was hosted by BOS,[5] and was publicized via the project’s blog, relevant
mailing lists,[6] and targeted emails to University of Oxford researchers and research facilitators. The
majority of the questions were multiple choice, but a few free text fields were also provided to
permit additional comments. A copy of the survey is provided in Appendix C.
62 responses were received in total (although one of these appeared to be spam and was excluded
from the analysis), approximately two thirds of which were from researchers, and one third from IT
support staff.
4. Interviewee and Survey Respondent Profiles
4.1. Interviewees
The requirements gathering process for the Sudamih Project[7] involved extensive interviewing of
humanities researchers; therefore, to round out the picture, the subjects for the VIDaaS Project
interviews were drawn from Oxford’s three non-humanities academic divisions. A decision was
taken to focus particularly on social science researchers, as these were deemed the group to whom
the DaaS was most likely to be relevant. We attempted to recruit interviewees from a wide range of
subject areas and career stages, although the final breakdown was inevitably dependent on which
researchers responded to the invitation email.
University of Oxford Division | Graduate students | Early career/postdoctoral researchers | Mid-career/senior researchers | Total
Social Sciences | 2 | 2 | 1 | 5
Mathematical, Physical and Life Sciences | 1 | - | 1 | 2
Medical Sciences | - | 1 | 1 | 2
Total | 3 | 3 | 3 | 9
Table 1: Breakdown of interviewees by division and career stage
[5] Bristol Online Surveys: https://www.survey.bris.ac.uk/
[6] These included the UCISA digest email, JISC research data management lists, Digital.Humanities@Oxford, and the ARMA and IASSIST mailing lists.
[7] See the Sudamih Researcher Requirements Report: http://sudamih.oucs.ox.ac.uk/docs/Sudamih%20Researcher%20Requirements%20Report.pdf
A full anonymized index of interviewees is provided in Appendix A. Throughout this report, numbers
in square brackets refer to the interview in which a point was raised or a comment was made.
One notable feature of the process was the comparative difficulty of finding interviewees from the
Medical Sciences Division. In the other two divisions, approximately half of the researchers
approached were happy to be interviewed; in Medical Sciences, only one in ten responded to the
invitation email. While we cannot be certain of the reasons for this, it may suggest that
researchers in this area felt the DaaS was less likely to be relevant to them than did those working in
other disciplines.
4.2. Survey Respondents
Approximately three quarters of the researchers who responded were working in either the
humanities (32% of the total) or the social sciences (44%). Only a handful of researchers from the
hard sciences completed the survey, of whom just two were from the medical or life sciences,
mirroring the difficulty in recruiting medical sciences interviewees noted above.
A substantial proportion of the IT support staff who responded also worked on or advised projects in
the social sciences (50%) or humanities (45%). However, medical and life sciences were significantly
better represented here, with 35% working in this area, compared to 25% for maths and physical
sciences, and 15% for creative arts. About a third (35%) of the IT support staff worked on or advised
projects in multiple subject areas.
A little under half the researchers (44%) were mid-career or senior; 29% were postdoctoral or early
career researchers, and 20% were graduate students. They were reasonably evenly divided between
those working mostly alone (44%) and those who were part of a research group (56%).
The largest single group of IT support staff (40%) consisted of those whose job involved IT support or
advice targeted at researchers, but not focused on one particular project. A further 15% provided IT
support for one project, and 20% provided more general IT support within their institution. (The
remaining 25% were mostly engaged in a range of data-related activities.)
Subject area | Graduate students | Early career/postdoctoral researchers | Mid-career/senior researchers | Other | Total
Humanities | 4 | 4 | 5 | 0 | 13
Creative arts | 0 | 0 | 0 | 0 | 0
Social sciences | 3 | 7 | 6 | 2 | 18
Maths or physical sciences | 1 | 1 | 2 | 0 | 4
Medical or life sciences | 0 | 0 | 2 | 0 | 2
Other | 0 | 0 | 3 | 1 | 4
Total | 8 | 12 | 18 | 3 | 41
Table 2: Breakdown of researcher survey respondents by subject area and career stage
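Because the percentages quoted for the researcher survey are derived from the counts in Table 2, they can be recomputed as a consistency check. The Python sketch below is illustrative rather than part of the project's own analysis: the counts are transcribed by hand from the table, the helper names (`grand_total`, `subject_shares`, `career_shares`) are hypothetical, and rounding to the nearest whole percentage is an assumption about the convention used in this report.

```python
# Hypothetical cross-check of the shares quoted for the researcher survey.
# Counts are transcribed by hand from Table 2; columns follow the table's
# career-stage order: graduate, early career/postdoc, mid-career/senior, other.
TABLE_2 = {
    "Humanities": (4, 4, 5, 0),
    "Creative arts": (0, 0, 0, 0),
    "Social sciences": (3, 7, 6, 2),
    "Maths or physical sciences": (1, 1, 2, 0),
    "Medical or life sciences": (0, 0, 2, 0),
    "Other": (0, 0, 3, 1),
}
CAREER_STAGES = (
    "Graduate students",
    "Early career/postdoctoral",
    "Mid-career/senior",
    "Other",
)


def grand_total(table: dict[str, tuple[int, ...]]) -> int:
    """Total number of respondents across all rows and columns."""
    return sum(sum(row) for row in table.values())


def subject_shares(table: dict[str, tuple[int, ...]]) -> dict[str, int]:
    """Row totals as whole-number percentages of all respondents (assumed rounding)."""
    total = grand_total(table)
    return {subject: round(100 * sum(row) / total) for subject, row in table.items()}


def career_shares(table: dict[str, tuple[int, ...]]) -> dict[str, int]:
    """Column totals as whole-number percentages of all respondents (assumed rounding)."""
    total = grand_total(table)
    columns = zip(*table.values())  # transpose rows into career-stage columns
    return {
        stage: round(100 * sum(column) / total)
        for stage, column in zip(CAREER_STAGES, columns)
    }


if __name__ == "__main__":
    print("Respondents:", grand_total(TABLE_2))
    print("By subject (%):", subject_shares(TABLE_2))
    print("By career stage (%):", career_shares(TABLE_2))
```

Run on these counts, the sketch reproduces the figures given in the text: 41 respondents, 32% in the humanities and 44% in the social sciences, and 20%, 29% and 44% for graduate students, early career/postdoctoral researchers and mid-career/senior researchers respectively.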
Subject area | Number of IT support staff | Percentage of total
Humanities | 9 | 45%
Creative arts | 3 | 15%
Social sciences | 10 | 50%
Maths or physical sciences | 5 | 25%
Medical or life sciences | 7 | 35%
Other | 1 | 5%
Table 3: Breakdown of IT support staff survey respondents by subject area of projects worked on
Roughly half the total respondents were from the University of Oxford (49%), and the other half
from elsewhere (51%). However, a larger proportion of the IT support staff (70%) were from outside
Oxford. This was probably due to the nature of the mailing lists used to publicize the survey: while
there are a number of national and international mailing lists targeted at IT staff, it is harder to reach
large groups of researchers outside one’s own institution.
In the sections that follow, the survey responses from researchers and IT support staff have been
aggregated where there were no substantial differences between the answers of the two groups.
When there were significant differences (and of course where different questions were asked), the
two are treated separately.
5. Current Practices
5.1. Dataset Details
5.1.1. Interviewees
The researchers interviewed were working on a wide range of different data types. These included:
- GIS data [1, 2]
- Aggregated statistics [3, 5]
- Governmental data [4]
- Survey responses [5, 9]
- Sensor observations [6]
- Protein structure information [7]
- Patient data [8, 9]
Most datasets consisted of a combination of textual and numerical information; some also included
images.
The size of datasets also varied widely. Estimates of total size ranged from 200MB to over 800GB.
Smaller datasets were in the majority, however, with more than half of the interviewees working
with data collections of under 10GB. There was no apparent correlation between subject area and
dataset size: both the smallest and the largest datasets were divided reasonably evenly across the
three divisions.
Even among the more technologically proficient interviewees, knowledge of the technical details of
datasets was often hazy. Only one interviewee knew the precise size of the data collection without
having to check, or was able to venture an informed guess at input and output figures (that is, the
quantity of data being pumped in and out of it within a given period of time) for the database. Most
interviewees had little idea what the ultimate size of their dataset would be (and in several cases
this question was not really applicable, as the researchers were engaged in ongoing work rather than
a discrete project). None of the interviewees working with relational databases could say precisely
how many tables the database had. This suggests that on a day-to-day basis, there is little need for
most researchers to have this sort of information readily available. One senior researcher [7]
commented that she tended only to think about the size of datasets when selecting appropriate
storage media, and this may well be a common pattern.
The interviewees were reasonably evenly split between those who worked mostly with data stored
on their personal hard drive (though perhaps still using networked storage for back-up, or to obtain
an initial copy of the raw data) and those whose data was hosted on local shared storage or an
institutionally-provided server – frequently to facilitate sharing with colleagues. One interviewee
who worked with large datasets habitually moved older data to tape for long-term storage; she
commented that this meant a lot of effort was involved in accessing previous datasets – it would be
useful to have an archive of all past data that could be easily consulted and shared [7].
5.1.2. Survey Respondents
As with the interviewees, textual (67%) and numerical (70%) data were the most common types
regularly used by the survey respondents. Images were used by a little under half (46%), and audio
or video material by 30%. 16% worked with GIS data.
Datasets were, on the whole, relatively small. Almost half (46%) were under 1GB, and another 26%
between 1GB and 10GB.[8] Only 5% of respondents said their dataset was larger than 100GB. 11% (all
from the researcher respondent group) did not know the size of their data collection. Once again,
there were no obvious correlations between dataset size and subject area – although given the
comparatively small number of respondents from the sciences, more data would be needed to draw
firm conclusions on this point.
When asked about the anticipated final size of their dataset, well over a third (43%) said they did not
expect their dataset to get significantly larger than it was at present (this includes 5% – all from the
IT support group – whose projects had already concluded). Just under another third (30%) said their
dataset would have grown, but not to more than double its current size. Those who forecast a larger
growth were typically those who had smaller datasets to begin with: no respondent with over 100GB
of data expected that their dataset would more than double in size.
Data storage practices varied noticeably between the two groups surveyed, with – perhaps
unsurprisingly – IT support staff reporting much greater use of networked storage. Only 15% of
datasets worked on by this group were stored on personal hard drives or other local storage such as
DVDs, compared to 49% for researchers. A tenth of each group used network-attached storage
managed by the research project. About a quarter of researchers (24%) used departmental servers,
compared to a third (35%) of IT support staff. Another third (35%) of the IT support staff used
centrally-provided institutional storage, while this was true for only 7% of researchers.
[8] As many IT support staff work on multiple projects, this group was asked to answer for a typical project they had recently worked on.
Use of the cloud was rare: only one researcher (and no IT support staff) reported this as the main
method of data hosting. One other made some use of the Grubba online database service’s cloud
storage, along with institutional servers.
5.2. Software Used
5.2.1. Interviewees
Four interviewees used relational databases to store and analyse their data. Another four worked
predominantly with data in flat file format, and used a combination of spreadsheets (usually Excel)
and statistical analysis packages (most commonly Stata and SPSS). Three of these four were social
scientists, and one other social scientist (who had a custom-built relational database system) also
reported making significant use of statistical analysis software. The final interviewee’s data was
stored in plain text format.
Some researchers also used specialist software tools to meet the specific demands of their subject
area. For example, the archaeology doctoral student [2] made extensive use of GIS software, and the
senior chemistry researcher [7] used a biomolecular simulation package.
5.2.2. Survey Respondents
We asked respondents to indicate which types of software or other tools they commonly used to
store and analyse their structured data (or, for IT support staff, the software or tools that were used
in the projects they worked on). On average, researchers reported making use of two to three types
of software, and IT support staff around five.
Spreadsheet programs were popular (used by 51% of researchers, and 75% of IT support staff), as
were statistical analysis programs (used by 44% and 70% respectively). Also widely used were plain
text files, word processing programs, and XML documents (all used by around a third of the
researchers, and about half the IT support staff).
A little over half of all respondents (56%) used some kind of relational database. However, there was
a more substantial difference here between the two groups: while only 40% of researchers used this
sort of system, 90% of IT support staff selected it. Respondents were asked to indicate whether they
used Microsoft Access, FileMaker Pro, another widely available package, or a custom-built system:
Access, custom-built systems, and other packages were roughly equal in popularity, with FileMaker
used by a significantly smaller number.
A second question asked which type of software or tool was used most. The answers broadly
followed the same pattern as the previous question. Among researchers, the most popular choices
were statistical analysis packages (27%), relational databases (17%), spreadsheets (15%), and XML
documents (12%). For IT support staff, relational databases and statistical analysis packages were in
joint first place (30% each), closely followed by spreadsheets (25%).
There were some notable disciplinary differences here. As with the interviewees, a large proportion
of social scientists made use of spreadsheets and/or statistical analysis packages – and all but one of
the researchers who used statistical packages most were from this subject area. Among humanities
scholars, XML documents proved to be the most popular method for dealing with structured data:
62% made some use of them, and 23% used them more than any other type of software.
5.3. Data Sharing and Publication
5.3.1. Interviewees
Two interviewees [1, 3] conducted their research predominantly alone, and did not share the
working version of their dataset with others. Three others did a substantial amount of solo work, but
stored at least some of their data in such a way as to permit colleagues to access it. Two of these [4,
8] used local shared storage, although one of the two [4] commented that in practice her colleagues
rarely (if ever) made use of the shared data; the third [5] used Dropbox to share selected portions of
his data (approximately 30% of the total) with collaborators. The other four interviewees [2, 6, 7, 9]
were all working on collaborative research projects that required multiple users to have access to
the data.
Where data was shared, this was typically with between one and three other people, though in two
research projects [6, 9] the groups were much larger. It should also be noted, however, that while
specific portions of a data collection are often only shared with a small number of people, each
researcher may have multiple such sharing groups. For example, the researcher using Dropbox
estimated his total number of collaborators as between eight and ten, although individual datasets
were not typically shared with more than three people. Similarly, another researcher [7] shared data
both with a research group in Oxford, and with collaborators at other institutions.
As far as can be judged from a sample of this size, sharing of working data seemed to be more
common among scientists: all four of the scientists interviewed shared data with colleagues. The two
projects with larger research groups were also both in the sciences.
When asked whether they had future plans to share their data, the interviewees gave mixed
responses. Four of the researchers (all social scientists) were working with data that was already
publicly available [1, 3, 4, 5]. However, at least one was planning to deposit copies of the analysed
and manipulated data in an archive at the end of the project (and another noted this was something
he should be better at doing, although in practice he didn’t usually get round to it).
The fifth social scientist was working on a project designed to make a large body of previously
unpublished material publicly available [2].
One scientist told us that in her area there were some central databanks in which certain types of
information would be deposited [7]. Otherwise, data sharing tended to happen informally: you
might email a colleague and ask to see some of their data (although you would normally only do this
with someone you knew, or if the data was quite old – more than three years post-publication).
Another scientist said there were no plans to make the whole dataset public, but that portions
associated with specific research publications would be released [6].
Both medical scientists were working with confidential information, and did not have personal
control over whether the data should be published or not [8, 9]. However, one of the two had
recently worked on a project to create an online compendium of anonymized datasets, and the
project the other worked for has made aggregate information (for example, numbers of cases)
available on the Web.
5.3.2. Survey Respondents
The respondents exhibited mixed views towards sharing data with people other than their own
immediate collaborators. Although 59% of researchers said that making use of publicly available
datasets (shared by other researchers or organizations) was important for their work, not much
more than two thirds of this number (41%) said they were happy to make their own research data
available once they had completed the work they intended to do and published the results. About a
third (34%) had previously published data, while roughly half (49%) said they intended to make all or
most of the data from their current project available in the future.
Just under half the researchers (46%) said they would be happy to share data privately with
colleagues (by, for example, emailing them a file), and a slightly larger number (51%) had actually
done so in the past.
In some cases, data sharing was restricted by factors beyond the researchers’ control. 41% said that
confidentiality or intellectual property restrictions made it hard to share at least a substantial
portion of their data, and 65% of IT support staff said this was at least occasionally true of the
researchers they worked with. For 42% of researchers, the decision about whether to make the data
available did not rest with them. Unsurprisingly, this was more likely to be true of respondents who
were part of a project team.
A fifth of the researchers said that they would like to make their data publicly available, but didn’t
currently have a straightforward means of doing so.
Only a relatively small proportion – 17% – reported that their funding body required them to make
their data available. When asked a similar question, 30% of IT support staff said that this was
frequently true for the researchers they worked with, and another 40% that it was at least occasionally
the case. (It is possible that the discrepancy between the two groups arises because projects with
significant input from IT support staff are more likely to be those funded by specific grants, leading
to funding agencies taking a greater interest in what happens to the data.)
Despite the somewhat lukewarm attitudes to data sharing, most researchers felt their data was
likely to be useful to other people. Just under half (48%) said that all or most of their data was of
potential value or interest to other researchers in higher education, and just over another third
(38%) said this was true of a substantial portion of their data. Almost two thirds (63%) said at least a
substantial portion would interest people outside the HE community, although a much smaller
number (21%) believed it had commercial value.
6. Database as a Service (DaaS)
6.1. Interest in the DaaS
6.1.1. Interviewees
When the DaaS was described to them, the interviewees by and large reacted positively. Only one of
the nine said he couldn’t see any real application for the service in his own research. However, in
most cases, interest in the DaaS was more hypothetical than actual: a number of researchers said
that the DaaS sounded as though it might be useful for projects they were thinking of undertaking in
the future, or that it would have been helpful for projects they had worked on in the past. This
mirrors the finding of the Sudamih Project requirements gathering exercise9 that researchers who
were some way into a project generally already had a well-established system and were not eager to
make major changes to their working practices, but were far more willing to consider the DaaS for
projects which had not yet got underway.
Two researchers did suggest possible uses for the DaaS as part of their current research projects,
although both were for relatively minor aspects of the work.
A doctoral student noted that the research group he worked for had initially planned to
produce a relational database of their data to enable colleagues with little experience of
working with the raw data to run simple queries. However, this had not yet happened, as
the amount of work that would be involved in setting it up would outweigh the benefits of
doing so. If the DaaS could offer a straightforward way of doing this, it might be of interest
[6].
A postdoctoral researcher told us that her project had inherited a dataset that was originally
compiled as a relational database, but now exists only as a flat file. A user-friendly database
service might provide a way of reconstructing the database, which would allow them to do
more with the data [4].
It is noticeable that both these interviewees stressed that for them to be interested in using it, the
DaaS would have to make accomplishing the task both easy and reasonably quick. In both cases the
suggested use for the service was something that they would have liked to be able to do, but which
was by no means essential to the success of the research project, and therefore would not merit any
major investment of time and effort in learning to use a new system.
However, there were two proposed features of the DaaS that produced a rather more enthusiastic
response: the possibility of multiple researchers being able to access the same database, and a
straightforward means of publishing datasets online.
Data Sharing
Several researchers said it would be useful to be able to share datasets with colleagues, especially
those located outside Oxford, and often outside the UK. At present, there are few straightforward
ways of doing this securely.
9 See Sudamih Researcher Requirements Report, http://sudamih.oucs.ox.ac.uk/docs/Sudamih%20Researcher%20Requirements%20Report.pdf, p. 24.
Methods currently employed include emailing files [4] and sharing data via an anonymous
FTP connection (although the researcher who had done this commented that the IT staff
were often reluctant to set these up [7]), but this only provides collaborators with a
duplicate copy, rather than allowing multiple parties to work on (and edit or annotate) the
same file.
One doctoral student also noted that some of his colleagues who are working with third-
party data often have difficulty sharing files in a way that meets the security requirements
imposed by the data providers [1].
On a related note, one researcher also said that it would be very helpful to be able to access
her own data remotely – when she is away from Oxford working with collaborators, for
example [7].
Data Publication
A number of interviewees indicated that it would be helpful to have a straightforward way of making
datasets available online.
In particular, several researchers were interested in being able to publish a particular subset
of their data – to accompany a journal article and allow other people to verify their findings,
for example [1, 3, 4, 6, 7].
o “You’re often required to do this somewhere, and providing the data in a suitable
format can be quite annoying.” [1]
o “It would be helpful if it were possible to publish a particular subset of the data, or a
particular layout, rather than the whole database. In some cases, you might want to
be able to work on the details with a small group of peers, and then publish a neat
version – with comments fields hidden, for example.” [3]
o “My supervisor doesn’t want the whole dataset to be made publicly available as it is.
However, he is very keen that whenever research papers based on the data are
published, relevant portions of the data that support the findings are also
published.” [6]
A number of interviewees mentioned the importance of providing stable and long-term
access to datasets, and suggested this might be more easily provided via a central service.
o “You really need a persistent URL, and researchers don’t want to have to worry
about this themselves – they don’t want to have to keep an old machine running or
to keep setting up redirects to ensure that the original URL still works.” [1]
o “We would really prefer not to have the responsibility of ensuring that the website
the data is made available on remains accessible for the long term.” [4]
o “I’ve sometimes had the experience of following up references to datasets from
publications, and finding that the data owner has moved institutions and the data is
no longer available from the original URL.” [7]
The importance of datasets being citable was also mentioned.
o “From the point of view of personal academic reputation, there’s no point in putting
a lot of work into something if you won’t get any credit for it [...]. It would be nice if
the online datasets had DOIs, or something similar.” [3]
A service like this would also provide a convenient way of making available information that
is too bulky to be easily presented in print.
o “My thesis included a large number of data tables in an appendix – these ended up
being in small print and not easy to read.” [3]
o “Some collaborators recently sent me a draft of a paper, which had some
supplementary material that would ultimately be provided alongside the paper –
probably on the journal website. I started to print this out, and then realized that it
included 234 pages of tables!” [7]
However, it should be noted that the probable usefulness of such a feature varies from discipline to
discipline.
Two senior researchers noted that in at least some parts of their fields, publication of
supporting datasets was usually handled by the journals themselves [5, 7].
One of these two also observed that if he was going to publish full datasets, it would make
more sense to do this through something like the UK Data Archive: they have the resources
to deal with large data deposits, and other researchers would be more likely to find the data
this way [5].
In some areas, such as archaeology, publication of full datasets is not common: articles will
usually just be accompanied by a couple of photographs. However, the archaeology doctoral
student we spoke to commented that “It would be amazing” if more researchers did choose
to share their data [2].
Making Dataset Details Available
In addition to the possibilities for data sharing and publication, DaaS users will be invited to add
information about their project to a central system that will allow other users to see what they’re
working on – as, for example, a means of identifying potential collaborators, or enquiring about
potential data re-use.
When asked about this, the majority of interviewees said they would in principle be happy to make
details of their own research available in this way. A couple, however, noted that the decision did
not rest entirely with them: in particular the two medical sciences researchers, who were both
working with confidential data [8, 9]. One doctoral student said he did not have any objections
himself, but that his supervisor might be concerned: because the data he is working with is all
publicly available, drawing attention to his project in this way might increase the risk that someone
else would imitate his research strategy and then publish before him [1].
Opinions about the usefulness of such a service for finding out about other researchers’ work were
more mixed. Some felt it would definitely be useful, while others were uncertain that it would tell
them anything that wasn’t already available from other sources.
Among the positive comments were:
“We’re very interested in having a public face for what we’re doing, especially considering
how important impact has become. [...] There are lots of project websites out there that few
people know about – some sort of central service might make it easier.” [4]
“Being able to find out what other researchers are working on would be very useful. I talked
to someone recently who’s spent the last year working on something related to my project,
and I’ve only just found out about it.” [4]
A doctoral student noted that as far as he was aware, there was currently no systematic
means of finding out about new datasets, other than trawling people’s websites to see what
they’re working on [6].
“I could imagine this being really useful if it covered researchers outside Oxford. Within
Oxford, I’d probably already have a good idea what people were working on.” [7]
“It might be useful for people working in a new area: they could see who else had worked on
that and perhaps learn from them.” [8]
More cautious voices included:
“The difficulty with this sort of service is that it would depend on people actually making use
of it, and in practice I suspect most people wouldn’t bother.” [3]
“I probably wouldn’t make much use of this myself: I already feel I’m drowning in
information sources, rather than needing to find more.” [3]
“The people whose names would appear would often be the usual suspects [i.e. people you
already know are working in that area]. And where they aren’t, it wouldn’t always be
appropriate for me to start asking them lots of questions.” [5]
“This probably wouldn’t be relevant to me.” [9]
6.1.2. Survey Respondents
When asked if the DaaS was something they could envisage use for in their own research (or for IT
support staff, the projects they work or advise on), just over half (56%) of the survey respondents
said they could. Another 31% gave the slightly more cautious reply that ‘It depends’.
When the latter group were asked what it depended on, some clear themes emerged. The single
most frequently mentioned factor was cost. Functionality was also mentioned by a number of
respondents: in some cases in general terms, and in others with reference to a specific feature the
service would need to offer to be of interest. These have been treated as user requirements, and are
discussed in Section 6.2.2 below. Other concerns included how easy it would be to learn to use the
system, data security, and whether it would be possible to persuade colleagues to use the system.
Figure 1 is a word cloud generated from the responses to this question.
Figure 1: Wordle.net word cloud generated from free text responses regarding factors
affecting likelihood of using the DaaS
A second question asked about the likelihood of using the DaaS to publish research data. A third of
respondents (33%) said they could definitely envisage themselves doing this, and just over another
third (38%) said it was possible. A further 13% were unsure; the remainder were evenly divided
between those who were happy with their existing ways of publishing data, and those who did not
expect to publish data at all.
Those who answered the previous question positively were also invited to say whether they would
be most likely to publish complete datasets, or specific subsets of their data (e.g. to support research
publications). This was an optional question, and relatively few people responded, but of those who
did, about a third said they would publish specific subsets, and another third both complete datasets
and subsets. The rest were equally divided between those who said they would publish just
complete datasets, and those who were unsure.
A free text question invited respondents to suggest ways in which the DaaS might save time,
improve research, increase efficiency, or otherwise make life easier. The answers here also indicated
a significant amount of interest in the prospect of using the service to share data.
“It would be most useful for enabling multiple users (like research assistants) to access the
database at the same time.”
“Could greatly simplify process of supporting collaboration between researchers working
across institutions / countries without unmanageable overhead of security and backup
issues.”
“Ease of sharing a very significant benefit to researchers who have neither expertise nor
resources to provide [this] by other means.”
“Having a secure and fairly straightforward means by which to share data with selected
collaborators around the world would be extremely useful.”
“Allow[ing] easy publishing and sharing of log type information and calibration information
for research datasets, even if research data itself not ingestable in this form.”
“By encouraging users to make datasets publicly available, it would also give them incentive
to be more rigorous in their experiments and data labelling.”
Secondly, researchers and IT support staff both liked the idea of a service that would provide a more
straightforward method of creating a database than existing solutions.
“When I can't afford developer time to build a robust database, I could turn to this to avoid
using more questionable methods (e.g. spreadsheets; Access).”
“It's usually quite difficult to explain to researchers the specifics of database design, so if
they can learn a manageable front-end themselves, that would help.”
“DaaS is something we could provide [to] many of the people we assist who are primarily
asking for database-backed websites, or database-backed management of project data.”
One respondent also commented on the role the DaaS might play in funding applications.
“I can potentially see DaaS being used as part of an NSF required Data Management Plan for
grant proposals.”
The final question in this section asked if respondents would be interested in finding out more about
the DaaS test user group. Just under half (44%) said they would, indicating that a substantial number
felt the DaaS was of at least some relevance to their work.
6.2. User Requirements for the DaaS
6.2.1. Survey Respondents – Software Feature Rankings
Survey respondents were presented with a list of possible software features, and asked to indicate
which of these they would find useful for managing data. The responses are summarized in Table 4.

Ratings for each suggested feature | Essential | Useful but not essential | Not relevant | Unsure what this means
Automated backup | 71% | 26% | 3% | 0%
Automated versioning | 31% | 57% | 10% | 2%
The ability to import or export data in a range of formats | 62% | 29% | 8% | 0%
The ability to view or present data in different ways (e.g. by creating customized forms or reports) | 51% | 34% | 13% | 2%
The ability to plot results on maps | 30% | 36% | 33% | 2%
The ability to enter and search data in XML formats | 25% | 34% | 30% | 11%
The ability for multiple users to access and edit the same database | 44% | 30% | 26% | 0%
The ability to set different permission levels | 48% | 33% | 20% | 0%
The ability to make data publicly available via the Web | 41% | 38% | 21% | 0%
Document-oriented database functionality | 16% | 43% | 16% | 25%
A mail merge function | 2% | 28% | 54% | 16%
Table 4: Overall ratings (researchers and IT support staff) for each feature
As one might anticipate given the popularity of XML documents in the humanities that was noted
above, humanities researchers were significantly more likely to rate the ability to enter and search
data in XML formats as essential than any other group. The same is true of document-oriented
database functionality (although 50% of the social science researcher respondents rated this feature
as useful).
Humanists were also slightly more likely to deem the features relating to data sharing to be essential
(that is, the abilities for multiple users to access and edit the same database, to set different
permission levels, and to make data publicly available via the Web). Social scientists, on the other
hand, tended to regard these as useful, but not essential.
For social science researchers, the most important feature was the ability to import or export data in
a range of formats: all the respondents in this group rated this as at least useful, with 61% regarding
it as essential.
Both researchers and IT support staff rated automated backup and import/export functions very
highly. However, there were some significant differences between the two groups of respondents in
terms of their next few priorities. These are listed in Table 5.
Rank | Researchers | IT Support Staff
1 | Automated backup | Automated backup
2 | The ability to import or export data in a range of formats | The ability to import or export data in a range of formats
3 | The ability to view or present data in different ways (e.g. by creating customized forms or reports) | The ability to set different permission levels
4 | Automated versioning | The ability for multiple users to access and edit the same database
5= | The ability to make data publicly available via the Web | The ability to make data publicly available via the Web
5= | The ability to set different permission levels | The ability to view or present data in different ways (e.g. by creating customized forms or reports)
Table 5: Highest priority data management software features for each respondent group
These rankings were derived by assigning each suggested feature a weighted score (two points were
awarded if a feature was deemed essential, and one point if it was deemed merely useful).
Coincidentally, both groups had a tie for fifth place.
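As an illustration of the weighted scoring, the short Python sketch below applies the same rule (two points per ‘Essential’ rating, one per ‘Useful but not essential’) to the combined percentages in Table 4. This is not how Table 5 itself was produced. Table 5 ranks researchers and IT support staff separately, and the per-group figures are not reproduced here. Using percentages in place of counts also assumes that every feature was rated by the same set of respondents.

```python
# Illustrative sketch: weighted scores from the combined Table 4 percentages.
# Scoring rule as described in the report: 2 points for "Essential",
# 1 point for "Useful but not essential". Percentages stand in for counts.

TABLE_4 = {  # feature: (% essential, % useful but not essential)
    "Automated backup": (71, 26),
    "Automated versioning": (31, 57),
    "Import/export in a range of formats": (62, 29),
    "View/present data in different ways": (51, 34),
    "Plot results on maps": (30, 36),
    "Enter and search data in XML formats": (25, 34),
    "Multiple users access and edit same database": (44, 30),
    "Set different permission levels": (48, 33),
    "Make data publicly available via the Web": (41, 38),
    "Document-oriented database functionality": (16, 43),
    "Mail merge": (2, 28),
}


def weighted_score(essential_pct: int, useful_pct: int) -> int:
    """Return 2 x essential + 1 x useful."""
    return 2 * essential_pct + useful_pct


def rank_features(table: dict) -> list:
    """Return (feature, score) pairs, highest score first."""
    scores = [(name, weighted_score(e, u)) for name, (e, u) in table.items()]
    # sorted() is stable, so any tied features keep their table order.
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for rank, (name, score) in enumerate(rank_features(TABLE_4), start=1):
        print(f"{rank:2d}. {name}: {score}")
```

On the combined figures, the ability to set different permission levels (2 × 48 + 33 = 129) also scores higher than the ability for multiple users to access and edit the same database (2 × 44 + 30 = 118).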
It is slightly puzzling to note that both groups apparently regarded the ability to set different
permission levels as more important than the ability for multiple users to access and edit the
same database – despite the fact that the former only becomes relevant in situations where the
latter is implemented. This may have resulted from some confusion about exactly what the question
was asking, or it is possible that some respondents understood the question about permission levels
conditionally – that is, to be asking how important they would deem this if multiple users could
access the database.
6.2.2. Interviewees and Survey Respondents – Feature Requests
In addition to the feature rankings listed above, survey respondents were given the opportunity to
list any additional features that would make a service like the DaaS attractive to them (and in the
case of IT support staff, what might make them inclined to recommend it to researchers). Just over a
quarter of researchers and just over half the IT support staff chose to complete this question. A
handful of additional user requirements were also mentioned in free text questions elsewhere in the
survey (such as the question discussed in Section 6.1.2 above). The interviewees were also asked
what features they would like to see in the DaaS.
For ease of reference, the comments and quotations in this section are colour-coded:
Black text: interviewees
Purple text: researcher survey respondents
Blue text: IT support staff survey respondents
(While they are presented together here for convenience, it should perhaps be borne in mind that
the comments from the interviews and from the survey responses are not directly comparable:
interviewees were simply asked to list useful features, whereas survey respondents were specifically
asked to name features other than those they had ranked in the earlier question.)
Ease of use was a key consideration, for two main reasons. First, it would be necessary to make the
system attractive to researchers with limited technical expertise. Secondly, even technically
proficient users are wary of investing a lot of time and effort in becoming familiar with a new
system, especially if it’s not immediately clear how useful it will be to them.
“It needs to have an interface that is intuitive and efficient – so it’s easy to work out how to
do things, and isn’t too time consuming to do them.” [3]
o The importance of an intuitive interface was also stressed by interviewees 4, 6, 7,
and 9.
“I’d have to look into it a bit more. There’s a cost to learning how to use a new program, and
for it to be useful, my colleagues would also need to use the same system.” [4]
“I find that a major barrier to getting things done is getting distracted; there aren’t enough
hours in the day to investigate everything that might possibly be useful.” [5]
“[I’d like] a system that makes it clear what you’re doing – so it’s hard to overwrite or delete
things accidentally.” [7]
Factors affecting survey respondents’ likelihood of using the service:
o “My data is already quite simple. The service would have to be equally simple.”
o “Default basic set up (easy out-of-the-box use).”
o “Whether it would be easy to use and whether I'd have time to learn to use it.”
o “Ease of use for researchers without significant technical experience.”
On a similar note, when asked about the training resources they would like to see available, a
couple of interviewees responded that to be attractive, the service would need to be sufficiently
intuitive that little or no training was needed.
“To be slightly controversial, unless I can see how to use it when I open it, I probably won’t
be too interested.” [4]
“If a service is well designed, you shouldn’t really need additional documentation or
training.” [6]
The issue of sustainability was also raised:
“You don’t want to put your data into a service and then find it’s no longer accessible, or
that a commitment to keep something available online hasn’t been honoured.” [4]
“We would like the service to be stable and durable.”
As noted in Sections 6.1.1 and 6.1.2 above, there was substantial interest in the possibility of using
the service for data sharing. Specific feature requests included:
The ability to share data securely with collaborators at other universities and in other
countries [1, 4, 7, plus several survey respondents]
A system that can be used to make datasets available to a wider group without needing
them to have specific software or technical expertise [3]
Several requests related to the ability to choose to share only a limited portion of the data:
The ability to easily extract a particular subset of the data for sharing [9]
o Capability to truly do collaborative work on defined datasets
o [The ability to] make a subset of the data available in an aggregated form to one
group of people, while another group have full access to the raw data
A number of interviewees and survey respondents noted that if multiple people were accessing the
same dataset, it was important to be able to set different permission levels. Specific requests
included:
Control over who can edit the data – perhaps including a mechanism where people can
make changes, but these have to be agreed or checked by another user [3]
Access controls – you need to be able to control who can edit the data, while perhaps
allowing others to view it but not change anything [4]
“I would like to make [a database of resources that is] searchable by other labs in our unit,
but in a way that they cannot see where things are kept so we can control what is taken.”
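The permission model these respondents describe can be sketched as follows. It has viewers who can read the data but not change it, and contributors whose edits must be checked by another user. This is a hypothetical illustration in Python: the role names, the `SharedDataset` class and its methods are invented for the example and do not describe any planned DaaS design.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Role(IntEnum):
    """Hypothetical access levels, ordered from least to most privileged."""
    VIEWER = 1       # may read the data but not change anything
    CONTRIBUTOR = 2  # may propose changes, which an editor must approve
    EDITOR = 3       # may change data directly and approve proposals


@dataclass
class SharedDataset:
    rows: dict = field(default_factory=dict)     # key -> value
    members: dict = field(default_factory=dict)  # user name -> Role
    pending: list = field(default_factory=list)  # (proposer, key, value)

    def _require(self, user: str, minimum: Role) -> Role:
        """Return the user's role, or raise if it is below `minimum`."""
        role = self.members.get(user)
        if role is None or role < minimum:
            raise PermissionError(f"{user} needs at least {minimum.name} access")
        return role

    def read(self, user: str, key: int) -> Optional[str]:
        self._require(user, Role.VIEWER)
        return self.rows.get(key)

    def write(self, user: str, key: int, value: str) -> str:
        """Editors change data at once; contributors' changes await approval."""
        role = self._require(user, Role.CONTRIBUTOR)
        if role >= Role.EDITOR:
            self.rows[key] = value
            return "applied"
        self.pending.append((user, key, value))
        return "pending approval"

    def approve(self, approver: str, index: int) -> None:
        """An editor accepts a pending change, which is then applied."""
        self._require(approver, Role.EDITOR)
        _proposer, key, value = self.pending.pop(index)
        self.rows[key] = value
```

Keeping proposed changes in a separate queue, rather than writing them straight into the data, is what makes the "agreed or checked by another user" workflow requested by interviewee 3 possible.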
One researcher survey respondent also requested the ability for multiple users to access the
database simultaneously. This would presumably require mechanisms to prevent users from
inadvertently overwriting each other’s changes.
Backup and versioning were also mentioned by a number of people. The latter was of interest for a
number of different reasons:
One postdoctoral researcher noted that the ability to see how a dataset looked at a
particular point in the past was helpful in many research projects, not just as a way of
rectifying mistakes, but because it can be useful to be able to track how the information has
changed over time. [3]
A senior researcher said that the security of knowing you can always revert to an earlier
version if anything goes wrong helps people to be more confident in using the software:
they are more likely to experiment if they aren’t worried about losing data. [7]
An IT support staff survey respondent requested “Not just version control (e.g. rollbacks) but
notification of conflicts (e.g. like Subversion) so a researcher knows they are overwriting a
colleague's changes (on re-import, or DB updates).”
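The combination requested here is a full history that allows rollback, plus a warning when a save would overwrite someone else's newer changes. This is commonly implemented with optimistic concurrency control: each save records which version it was based on, and the save is refused if the record has changed since. A minimal hypothetical sketch in Python follows. `VersionedRecord` and `ConflictError` are illustrative names, not part of Subversion or of the DaaS.

```python
class ConflictError(Exception):
    """Raised when a save is based on a version that is no longer current."""


class VersionedRecord:
    """Keeps every saved state so earlier versions can be viewed or restored."""

    def __init__(self, value: str) -> None:
        self.history = [value]  # oldest first; list index == version number

    @property
    def version(self) -> int:
        return len(self.history) - 1

    def checkout(self) -> tuple:
        """Return (current version number, current value) for editing."""
        return self.version, self.history[-1]

    def save(self, base_version: int, new_value: str) -> int:
        """Save an edit made against `base_version`; refuse if someone saved since."""
        if base_version != self.version:
            raise ConflictError(
                f"edit is based on version {base_version}, but version "
                f"{self.version} has since been saved by someone else"
            )
        self.history.append(new_value)
        return self.version

    def as_of(self, version: int) -> str:
        """See how the record looked at an earlier point."""
        return self.history[version]

    def revert_to(self, version: int) -> int:
        """Roll back by saving an old state as a new version; history is kept."""
        self.history.append(self.history[version])
        return self.version
```

Reverting by appending, rather than deleting later versions, keeps the full record of how the information changed over time, which was the other reason given for wanting versioning.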
As also noted above, a substantial proportion of researchers were interested in using the DaaS for
data publishing. Specific feature requests included10:
A straightforward way of publishing a particular subset or layout of one’s data – e.g. that
associated with a journal article [1, 3, 4, 6, 7]
Persistent URLs or DOIs for published datasets, so they’re citable and remain available long
term [1, 3, 4, 7]
The ability to supply explanatory notes or documentation alongside each individual dataset
[4]
Knowledge of who, if anyone, uses the data
And relatedly:
One researcher survey respondent also stressed the need for the use of the appropriate
metadata standards for the field or discipline.
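Publishing a particular subset or layout of a dataset largely comes down to two steps before export. The first is selecting the relevant rows, for example those supporting one article. The second is dropping internal columns such as a comments field. A minimal sketch using Python's standard `csv` module follows. The field names and the `publish_subset` helper are hypothetical.

```python
import csv
import io
from typing import Callable, Iterable


def publish_subset(rows: Iterable[dict], columns: list,
                   include: Callable[[dict], bool]) -> str:
    """Return CSV text with only the chosen rows and public columns.

    Any field not named in `columns` (e.g. an internal comments field)
    is silently dropped because of extrasaction="ignore".
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if include(row):
            writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    data = [
        {"site": "A", "year": 2010, "count": 4, "comments": "check this"},
        {"site": "B", "year": 2011, "count": 7, "comments": "private note"},
    ]
    # Publish only the 2011 rows, without the comments field.
    print(publish_subset(data, ["site", "year", "count"],
                         lambda r: r["year"] == 2011))
```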
Good data security was a concern for a number of people, especially when data was being shared.
A doctoral student noted that security would need to satisfy the requirements of data
providers who impose strict NDAs and confidentiality agreements [1].
“Security (e.g. appropriate authentication) is absolutely crucial.”
One IT support staff survey respondent unfortunately saw this as an insurmountable barrier:
“We would not be able to use a service like this due to security and confidentiality concerns;
I do not see a way that this could be overcome.”
As many researchers have pre-existing datasets, the ability to import data was also essential. Being
able to export data was similarly important, both to have the option of using it in other programs,
and so that researchers did not have to worry about getting locked into the system. Specific
requests included:
The ability to import data from Excel [3]
The ability to import layouts as well as data from FileMaker
XML document import either using simple mappings to relational fields, or true XMLDB
import using something like eXist
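The first of the two XML options above maps XML elements onto relational fields. It can be sketched with Python's standard library (`xml.etree.ElementTree` and `sqlite3`). The element names, the table layout and the mapping are hypothetical. A true XML database such as eXist would instead store and query the documents natively. The find numbers are stored as text, so values mixing letters, dots and hyphens survive unchanged.

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = """<finds>
  <find><number>AB-12.3</number><material>pottery</material></find>
  <find><number>AB-12.4</number><material>bone</material></find>
</finds>"""

# Hypothetical mapping from XML child element names to relational columns.
MAPPING = {"number": "find_number", "material": "material"}


def import_xml(xml_text: str, conn: sqlite3.Connection) -> None:
    """Insert one table row per <find> element, using MAPPING for columns."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS finds "
        "(find_number TEXT PRIMARY KEY, material TEXT)"
    )
    root = ET.fromstring(xml_text)
    for record in root.findall("find"):
        # findtext() returns None if an element is missing.
        values = {col: record.findtext(tag) for tag, col in MAPPING.items()}
        conn.execute(
            "INSERT INTO finds (find_number, material) "
            "VALUES (:find_number, :material)",
            values,
        )
    conn.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    import_xml(SAMPLE, db)
    print(db.execute("SELECT * FROM finds").fetchall())
```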
10 For ease of reference, this section includes some user requirements already mentioned in Section 6.1.1 above.
Once their data had been imported, researchers wanted to be sure that the system would have
sufficient processing power to handle it.
One doctoral researcher reported that he had previously had problems running complex
operations on his dataset (which was a few GB in size): the system froze, or took hours or
even days to complete the process [1].
A senior researcher commented that it was very frustrating if a system was “so cumbersome
that it becomes slow” [7].
A related point was made by a postdoctoral researcher, who said that although her datasets were
not large in terms of file size, they could often be quite unwieldy and difficult to view on screen.
Some way of keeping track of the work done would therefore be very helpful – for example
a way of cataloguing the graphs or data visualizations that have been created, to make it
easy to locate them again later [4].
Regarding working with their data, researchers expressed a desire for flexibility. Specific requests
included:
Flexible data formats – you need to be able to choose the format that works for your
project, not have one imposed on you [2]
o In particular, the ability to handle archaeological find numbers (and so on) that
include a mixture of letters, numbers, dots, and hyphens [2]
The ability to create custom forms for data entry [2]
Flexible layouts – the ability to present the data (or subsets of it) in different ways, for
example to rearrange columns, or filter out certain information [3]
Flexibility in terms of the type of data that can be entered – including the ability to enter
large amounts of text in a single field [3]
Reporting tools – the ability to save or print off a customizable summary of the data or a
subset of it [9]
High levels of customisability (as project needs grow)
However, some also noted that there are occasions when it is equally important to be able to
restrict the format of data that can be entered into specified fields. This might be needed:
To prevent people from entering a range rather than a single figure [3].
To force the user to select from a limited number of options (e.g. via a drop-down list) [9].
To standardize date formats or other information [6, 9].
To sift out entries that don’t make sense within the project in question – e.g. ensuring that
that date of death is always later than date of birth [9].
A good search function was also mentioned.
“It would be useful to have features that allow users with little technical knowledge to run
queries easily – e.g. combo boxes and a simple query wizard.” [6]
“It would need to have good search facilities.” [7]
“For XML imports, [we’d need] XPath and full XQuery search interfaces.”
Researchers and IT support staff also had a range of requests for other features. These included:
• Good graphing and visualization tools – e.g. the ability to create multi-dimensional graphs [4]
• Visualization of datasets in tables and graphs
• The ability to cross-reference between entries in the database [3]
• The ability to include audio and video snippets in addition to textual information [4]
• The ability to connect specific points on drawings or photographs to database entries (e.g. to link information about a pot discovered in an archaeological dig to the place where it was found) [2]
• A tool that allows you to create digital drawings which can go straight into the database [2]
• The ability to connect drawings or photographs to each other – to show how the areas depicted relate to each other, for example [2]
• “We need to be able to link to maps of the many different storage areas in a relational way.”
• A version of the system designed for use on mobile devices such as the iPad, so it’s possible to enter data while you’re doing fieldwork [2]
• The ability to parse data more easily – i.e. to transform non-standardized information into something you can actually work with and analyse [1]
• “It would be great if sufficient storage space were available to create an easily accessible archive of earlier datasets.” [7]
• Mail merge functionality [9]
• Data manipulation, either through scripts (in the style of PHP, R, etc.) or through a data mining interface (e.g. Cognos)
• Summary info, data inconsistency checks, and some basic textual analysis
• Native database functionality
• “Being able to work with no or only very slow network access is vital for me – e.g. during remote fieldwork.”
• Customization and scripting capabilities for the web-based front-end of databases created
• Being able to generate linked data as RDF
Platform interoperability
Some researchers told us that it would be very helpful if the database service were able to interact
with other software and tools:
• An archaeology graduate student observed that standard database packages (such as Microsoft Access and FileMaker Pro) are usually unsuitable for doing work in her field, as they cannot interact with the GIS tools that archaeologists use – the Harris matrix, for example. If the DaaS were able to do this, that would make it very attractive [2].
• A sociology doctoral student observed that it would be really helpful to have an easy way of bridging the gap between the database and analysis programs such as Stata, SPSS, or R, and between the database and visualization tools [1].
  o This interviewee did most of his analysis using Python, and also commented that it would be useful to have an easy interface with this [1].
A desire for documentation and other accompanying materials was also expressed:
• Comprehensive, cross-platform documentation [1]
• Tools/training resources to ease the process of designing the database (and designing it well!) at the beginning of the project [9]
• 'Out-of-the-box' support for the University of Oxford's data access policies (in the form of a short and reassuring document for managers and some extremely detailed documentation for IT staff)
Finally, there were comments relating to financial issues:
• Cost is the major stumbling block; if it was free at point of use for individual researchers, and paid for by funders for funded research projects and institutions, that would be a major benefit to research.
6.2.3. Interviewees – Views on Training
In addition to asking about user requirements for the database service itself, we also talked briefly to
the interviewees about training. We asked what training in data management or data handling they
had personally received, how happy they were with existing provision, and what sort of training
resources they would like to see offered alongside the DaaS.
General Training
Most of the interviewees were largely self-taught. Some had attended short courses (lasting
between a few hours and a few days) to learn to use a particular piece of software: in particular, it
was moderately common for those using statistical analysis packages to have received training (a
senior lecturer told us this was standard for graduate students in the social sciences). Two
interviewees had done more extended training (a Master’s degree in IT [6], and a longer course on
Access [9]). However, it was most common for people to learn to use a new piece of software by
experimenting with it, reading documentation or online resources, and perhaps asking colleagues or
friends for advice.
“If I have a project that needs me to learn something new in order to do it, I’ll do that – but
I’m unlikely to pick it up by just doing exercises.” [1]
“You generally learn how to use things like simulation software on the job, from the people
you’re working with [...] new people joining a lab tend to learn very quickly, as you can’t
really do anything without it.” [7]
In general, the researchers we spoke to were reasonably happy with the software training that was
provided. However, several people commented that it would be useful to have more training
available on the general principles of data management and database design [1, 2, 3, 5, 9].
“Ideally, training would be modular – you’d start with general training in the theory of
working with databases, which would be relevant whatever program you were using, and
then you’d add specific training on how to get the particular package you’re using to do
what you want it to do.” [2]
“It would be very helpful to have a short course – maybe a lunchtime session – about why
people should consider databases and what they’re helpful for. The idea of a database is
abstract, and without filling it with life, it’s quite difficult to convey its attractiveness.” [3]
“On previous projects, it’s been glaringly obvious that the scientists didn’t really know about
databases. They had done the best they could, [...] but they weren’t aware that database
design was important and that they should have got proper advice at the beginning.” [9]
Some interviewees commented that ideally, they would like to be able to consult someone on a one-
to-one basis for help in designing a database for their particular project [2, 3, 9].
Training Requirements for the DaaS
As noted in Section 6.2.2 above, there was a general feeling that for the service to be attractive, it
should be sufficiently intuitive to be usable with little or no training. The majority of interviewees
were happy to figure out how to do things using help files and documentation – as long as these
provided a sufficiently clear guide [1, 3, 4, 5, 7]. It was also commented that it was important for
accompanying materials to be searchable, and to be presented in a way that meant it was easy to
find information on the specific task you were interested in.
Some interviewees also said they liked online tutorials – either demonstrations of how to use the
software, or specific interactive examples you can work through, with a view to learning how to
apply particular techniques to your own material [4, 5, 7, 8].
A couple of researchers mentioned print resources, in addition to or instead of electronic ones.
“I actually rather like to have a hard copy of training material (as long as it’s not too long!),
rather than just an online resource: it’s much easier to flick through the paper version and
find what you’re after.” [7]
There were mixed views about face-to-face courses. Some researchers actively preferred this sort of
training, while others were less enthusiastic.
“Face-to-face is always better [...] though it needs to be run regularly, so that new starters
and people who’ve changed roles have an opportunity to learn.” [8]
“Nothing can replace face-to-face courses – because they commit you to going and doing
the training and setting aside specific time for it.” [3]
“Face-to-face training has the advantage of allowing you to talk to other people.” [9]
“I have been on courses in the past – but I’m not certain how much help they’ve actually
been.” [1]
It was acknowledged that not all researchers work in the same way.
“A combination of face-to-face courses and online material would be good – different things
suit different people.” [9]
A postdoctoral researcher who usually preferred to work things out for herself commented
that she had a colleague who would much rather go on a course to learn the basics. [4]
Similarly, a senior researcher who was not a great fan of face-to-face training himself
observed that “A lot of students like courses.” [5]
Some researchers cited lack of time as a reason for preferring online resources.
“My working time is already pretty full with research: taking time off to do a lot of training
isn’t a high priority.” [4]
“I find it increasingly difficult to fit in more courses; the ratio of time and effort to benefit,
and the risk of them turning out not to be that useful makes them less attractive.” [5]
It was also noted, however, that online resources have their drawbacks.
“Online courses are more flexible and can be done at any time, but in practice this means
you often don’t do them at all.” [3]
Opinions also varied about whether face-to-face courses were more useful for learning the basics of
a system, or for more advanced features.
“For technical questions that arise later on, I’m happy to use documentation, but for getting
started and getting basic ideas, face-to-face is useful.” [3]
“I’m happy to use help menus and online tutorials, but if that isn’t enough to get me at least
a little way, I probably won’t bother going further – though I might then go for more training
in the subtleties of a program, or when I run up against a specific question about how to do
something.” [4]
However, there was a general consensus that for face-to-face training to be really useful, it should
be short (perhaps half a day or so), and focused on the issues most likely to interest researchers.
“I could see myself going on a course that covered uploading data for publication – if there’s
an API that could be demonstrated in an hour or two, that’s fine, but I don’t want to spend a
week on it.” [1]
“It might be useful to have a short (say three-hour) information session that would give
people the basics, tell them what else they might need to know about, and point them
towards further resources that they could follow up.” [7]
7. Comparison with Sudamih Findings
The two requirements gathering exercises were not wholly equivalent: the Sudamih Project was
somewhat broader in scope, so its interviewees included researchers who were not working with
structured data; it also produced a much larger body of qualitative data, and had no quantitative
element. Nevertheless, some useful comparisons may be drawn.
In broad terms, the requirements of the researchers canvassed in the two exercises were similar.
Both groups expressed a desire for an interface that was straightforward to learn, while offering
sufficient flexibility to accommodate a wide range of projects.
The prospect of being able to share data with colleagues and to publish it online also appealed to
both groups. However, the VIDaaS interviews brought to light two specific considerations that were
not prominent during the Sudamih Project: first, the importance of being able to share data with
colleagues outside Oxford and outside the UK; and secondly, the desire to be able to publish a
specific subset of a dataset rather than the whole thing.
In the Sudamih requirements gathering process, a substantial number of researchers said it was
essential to their work to be able to use diacritics and non-standard character sets. This did not
appear to be a significant issue for any of the VIDaaS interviewees, suggesting that this is chiefly of
relevance to humanities scholars. The VIDaaS interviewees, on the other hand, seemed more
concerned with data security than the Sudamih researchers had been, at least partly because several
of them were working with confidential or otherwise restricted information.
Both groups showed a modest amount of interest in the idea of a service that allowed them to find
out about the datasets other researchers were working on, although other aspects of the service
excited more enthusiasm. Finally, the views of the two groups on training were very similar.
8. Conclusions and Recommendations
Both the interviews and the survey responses indicate that a substantial proportion of researchers
regard the DaaS as something that could have a positive impact on their research.
Although researchers described a wide range of user requirements, there were a number of
common themes. These included a desire for automated backup, the ability to import and export
data, and flexibility. One of the most frequently mentioned requests was that the system should
have an intuitive interface that could be used without much training.
Recommendation 1: DaaS development should proceed in line with the prioritized list of
requirements compiled as a result of this exercise.
Recommendation 2: As the service develops, it should be subjected to regular user testing
to ensure that it meets requirements, and in particular the requirement to be
straightforward and intuitive to use.
Recommendation 3: The DaaS should be accompanied by documentation and support
material that provide a clear and easily navigable guide to performing a range of common
tasks.
Recommendation 4: Any face-to-face training courses that are provided should be short and
focused.
When considering making use of a new tool or service (especially one without a proven track
record), researchers naturally wish to know what it will offer them that is not provided by the
resources they have used previously. The requirements gathering exercise highlighted two features
that seemed to catch researchers’ attention: the ability to share data securely with collaborators
(including those outside Oxford), and a straightforward way of publishing particular subsets of their
data to accompany research publications.
Recommendation 5: These features should be emphasized in publicity and training material
for the DaaS.
However, researchers also want to be sure that the new service will offer them the functionality
they have come to rely on in the tools and services they are currently using.
Recommendation 6: To avoid giving users a misleading or simply inadequate impression of
the DaaS’s purpose and capabilities, it will be important to ensure that core functionality and
other features are not neglected.
In general terms, many researchers seem rather ambivalent about data publication. Most have many
calls on their time, and unless it is required of them, it is usually low on the list of priorities.
Recommendation 7: To encourage use of the DaaS for data sharing, training materials
should cover not just how to publish data, but why data publication is worth considering.
In some cases, it seems that researchers may be choosing not to make data public because the effort
of doing so outweighs the potential benefits; if the DaaS can make the process more
straightforward, this might encourage more data publication.
When promoting the DaaS, it is important to catch people at the right point in the research cycle.
Unless there are serious deficiencies in their existing strategy, researchers are usually reluctant to
make major changes to their data management methods or tools once a project is underway. On the
other hand, they are far more willing to give serious consideration to a new system for a project that
is still in the planning stages, or which has only just begun.
Recommendation 8: The DaaS should be publicized extensively to researchers who are likely
to be at the beginning of a research project – for example, new doctoral students and new
post-docs.
Recommendation 9: Particular efforts should be made to ensure awareness of the DaaS
among research facilitators and other support staff who are likely to advise researchers
during the planning stages of a project.
Recommendation 10: While it is perhaps not where the focus of efforts should lie, it will
nevertheless be worth also taking advantage of any opportunities to promote the DaaS
more widely, so that researchers are more likely to be aware of it at the point when it
becomes relevant to them.
Humanists and social scientists have shown most interest in the DaaS thus far. However, although a
comparatively small number of hard science researchers responded to the survey invitation, IT
support staff working with researchers in this area were reasonably well represented, and seemed
as interested in the service as those working in other disciplines.
Recommendation 11: For maximum effect, publicity aimed at researchers should be
concentrated on the humanities and the social sciences.
Recommendation 12: Researchers in other disciplines should be reached indirectly, via the
IT support personnel who advise them.
Although there is still a significant amount of work to be done for the DaaS to satisfy the
requirements discussed in this report, this exercise seems to indicate that by facilitating
improvement of data management and curation practices and increased data sharing, the DaaS is a
service with the potential to be of considerable value to individual researchers, and beyond that, to
institutions and the research community at large.
Appendix A: Index of Interviewees
ID   Academic Division   Researcher Details
1    Social Sciences     Doctoral student in sociology
2    Social Sciences     Doctoral student in archaeology
3    Social Sciences     Postdoctoral researcher in sociology
4    Social Sciences     Postdoctoral researcher in politics
5    Social Sciences     Senior researcher in political sociology
6    MPLS                Doctoral student in engineering science and zoology
7    MPLS                Senior researcher in chemistry
8    Medical Sciences    Early career researcher in health economics
9    Medical Sciences    Researcher in cancer epidemiology
Throughout this report, numbers in square brackets indicate in which interview a point was raised or
a comment was made.
Appendix B: Interview Question Template
Introduction:
We’re conducting this interview as part of the VIDaaS Project – VIDaaS standing for ‘Virtual Infrastructure with
Database as a Service’. One major aim of the project is to develop a software tool that enables people to build,
edit, search, and share databases online. A pilot version of the tool already exists – this was developed as part
of the earlier SUDAMIH (Supporting Data Management Infrastructure for the Humanities) Project, and we’re
now aiming to improve and expand this into a full service.
We’d like to gain a better understanding of some of the ways that researchers are using structured data and
databases in the course of their research, and of what they’d be looking for in a database service – what’s
essential, what additional features would be useful, and what they’d like to be able to do that isn’t currently
easily possible.
If time permits, we’d also like to talk briefly about sharing and publishing data, and about the sort of training it
would be useful to have available.
We’ll keep the interview to no more than 60 minutes.
We’d like to record the interview, if that’s OK with you. The recording will only be used by project team
members – it won’t be made public. However, we would like your permission to use anonymized quotations in
project reports and other documents. Is that OK with you? [Respondent asked to sign consent form.]
Interview:
Could you start by telling us a little bit about the research you are engaged in?
What’s your area of research? Do you work in a project team or as an individual?
What sort of use do you make of structured data? What software tools do you use?
[How do you store your data? Is it backed up anywhere?]
If the data is stored in a database, how big is this? How much further do you expect it to
have grown by the end of the project?
o How many tables does it have?
o How many people use it?
o Do you have a sense of what the input and output are (that is, how much data (in
bytes) is being pumped in and out of it each day)?
Are there things you’d like to be able to do with your data but can’t, because the software
doesn’t permit it (or because it would be too difficult or time-consuming)?
The DaaS
The DaaS will be an online service that allows researchers to create, use, and share
databases. It will also be possible to import existing databases into the DaaS.
o Is this a service you could envisage a use for in your own research?
o What would such a service need to offer to make it attractive?
Data sharing
Who owns the data generated by your research project? If you were to move to a different
institution, what would happen to it?
Is your data accessible to anyone other than those working on your project? Do you have
plans to make it available in the future? (If not, what are the reasons for that decision?)
As well as permitting the sharing of datasets among colleagues, the DaaS will provide users
with a straightforward way of making their data publicly available online. Is this a feature
you could see yourself using?
Do you ever make use of other people’s datasets? How do you go about finding out about
and accessing these?
DaaS users will have the option of adding information about their project to a central system
that allows other users to see what they’re working on.
o Would you be happy making details of your own research available in this way?
o Is this something you might make use of to find out what other researchers are
working on?
Thinking now about training…
Have you received any training in either general database design or the use of specific
database software?
[How much IT support is available for the more technical aspects of your work? Are you
happy with the current provision, or are there areas in which you’d like more help to be
available?]
Given your own experience and that of your colleagues/students, do you see a need for
more training in either generic data skills (e.g. selecting the appropriate software tools, the
principles of database design, etc.) or in the use of specific pieces of software?
Other than the program’s own help files, what resources would you like to see being made
available for a new service like the DaaS? Face-to-face training courses? Online tutorials?
Video demos?
What advice would you give to graduate students or new researchers just starting data-
driven research? Is there anything you wish you’d learnt earlier?
Appendix C: Survey Questionnaires
Researcher Survey
The VIDaaS (Virtual Infrastructure with Database as a Service) Project, based at Oxford University Computing
Services, is in the process of developing an online database service for researchers. Consequently, we are
interested in finding out about researchers’ current use of structured data (that is, the sort of data which
might be stored in tables, spreadsheets, or databases), and the features they would find useful in a database
service.
If you are conducting academic research involving structured data, we would like to hear from you, and
would be very grateful if you would complete the survey below. We estimate that the survey will take
around 10-15 minutes to complete. All respondents will be entered into a draw for a £100 Amazon voucher.
If you are involved in providing IT support to researchers working with structured data, please complete the
alternative version of this survey, which can be found here [link].
Section 1: About your project
1. What sort of data do you regularly work with? (Please select all that apply.)
Textual
Numerical
Images
GIS data
Audio and/or video
Other (please specify)
2. What software or other tools do you use to store and analyse your structured data? (Please select all that
apply.)
Plain text files
A word processing program such as Microsoft Word
A spreadsheet program such as Microsoft Excel
Microsoft Access
FileMaker Pro
Another widely available database package
A custom-built relational database system
Other custom-built software
A statistical analysis package such as SPSS or Stata
XML documents
Other (please specify)
2.a. If you use another widely available database package, what is it called? (Optional)
3. Which of these do you use most to store and analyse your structured data?
Plain text files
Tables in a word processing program such as Microsoft Word
A spreadsheet program such as Microsoft Excel
Microsoft Access
FileMaker Pro
Another widely available database package
A custom-built relational database system
Other custom-built software
A statistical analysis package such as SPSS or Stata
XML documents
Other (please specify)
4. Approximately how big is the dataset you’re currently working with? (If you use multiple datasets
regularly, please estimate the approximate total size of the data collection.)
Under 1GB
Between 1GB and 10GB
Between 10GB and 100GB
Over 100GB
I don’t know
5. By the end of your current project, is it most likely that your dataset will:
Not be significantly bigger than it is at present?
Have grown, but not to more than double its current size?
Be more than double but less than ten times its current size?
Be more than ten times its current size?
I don’t know
6. Where do you host your data at present?
On local storage formats, such as DVDs, memory sticks, your computer’s hard-drive, etc.
On network-attached storage managed by your research project
On a departmental server
On centrally-provided storage (e.g. on a server provided by central university IT support services)
In the cloud
I don’t know
Other (please specify)
6.a. If you store your data in the cloud, please specify who provides your cloud storage:
Section 2: Data sharing
7. Thinking now about data sharing: which of the following statements are true of you and your research
project? (Please select all that apply.)
NB. In the statements below, ‘publicly available’ describes datasets that are generally accessible by members of the research
community – for example, those published on a website or deposited in an archive.
Making use of publicly available datasets (provided by other researchers or organizations) is
important for my research
I have made my own research data publicly available in the past
I intend to make all or most of the data from my current research project publicly available in the
future
I do not intend to publish the complete dataset from my current project, but have already made
or expect to make limited subsets of it publicly available – e.g. to accompany research
publications
I do not currently plan to make any data from my current research project publicly available
I would like to make my data publicly available, but don’t currently have a straightforward means
of doing so
My funding body requires me to make my data publicly available
I have shared data privately with colleagues (e.g. by emailing a file) in the past
I would be happy to share data privately with colleagues (e.g. by emailing a file) if asked
I am in principle happy to make my data publicly available at any point in a research project
I am in principle happy to make my data publicly available once I have completed the work I
intend to do with it and published the results
I generally prefer not to share my research data
None of the above
8. Please select the appropriate option for each of the following statements:
True of all or most of my data
True of a substantial portion of my data
True of little or none of my data
a. My data is of potential value or interest to other researchers in higher education
b. My data is of potential value or interest outside the HE community
c. My data has potential commercial value
d. There would be little or no value in sharing my data: other researchers are unlikely to find it useful
e. There would be little or no value in sharing my data: it’s already publicly available from other sources
f. Confidentiality or intellectual property restrictions make it hard to share the data I am working with
g. The decision about whether to make the data I am working with available does not rest with me
Section 3: Data management software features
9. Please think about the software packages you use (or might use in the future) to manage your data.
Which of the following features are (or would be) most important or useful for your work?
Essential
Useful, but not essential
Not relevant to my work
Unsure what this means
a. Automated backup
b. Automated versioning (that is, the ability to return to previous versions of the dataset)
c. The ability to import or export data in a range of formats
d. The ability to view or present data in different ways (e.g. by creating customized forms or reports)
e. The ability to plot results on maps
f. The ability to enter and search data in XML formats
g. The ability for multiple users to access and edit the same database
h. The ability to set different permission levels (that is, to control the extent to which different users can add, edit, or delete information)
i. The ability to make data publicly available via the Web
j. Document-oriented database functionality
k. A mail merge function
Section 4: The DaaS
10. The DaaS (Database as a Service) will be an online service that allows researchers to create databases
from scratch, import existing databases, work with the data via a clear Web interface, and (if desired) share
access to the database with other members of a project team or research group. It will be centrally
supported, enabling integration with back-up systems and timely upgrades. It has not yet been decided how
the service will be funded. Is this a service you could envisage a use for in your own research?
Yes
No
It depends
10.a. On what, most importantly?
11. The DaaS will also provide researchers with a straightforward way of publishing their data online. Is this a
feature you could envisage using to publish your own datasets?
Yes, definitely
Yes, possibly
Unsure
No – I am happy with existing ways of publishing my data
No – I don’t expect to publish my data
11.a. If you answered ‘Yes’, what would you be most likely to publish?
Complete datasets for use by other researchers (or other interested parties)
Specific subsets of my data – e.g. to support research publications
Both complete datasets and specific subsets
Unsure
Not applicable
7d. (Optional) Other than those listed above, are there any particular features that would make a service like
the DaaS attractive to you?
7e. Can you envisage any ways in which a service like the DaaS could help you save time, improve your
research, increase efficiency, or otherwise make life easier?
7e. We are currently recruiting a group of test users for the DaaS. Would you be interested in finding out
more about this?
Yes
No
Section 5: Finally, a few questions about you:
Are you:
A graduate student?
A postdoctoral/early career researcher?
A mid-career or senior researcher?
Other (please specify)
Which subject area are you currently working in?
Humanities
Creative arts
Social sciences
Maths or physical sciences
Medical or life sciences
Do you:
Other (please specify)
Mostly work alone?
Work as part of a project team?
Your name (optional)
Your institution
Your email address (optional)
Email addresses will be used only to contact the winner of the prize draw and those who have expressed an interest in the DaaS test user group.
Thank you for taking the time to complete this survey.
IT Support Staff Survey
The VIDaaS (Virtual Infrastructure with Database as a Service) Project, based at Oxford University Computing
Services, is in the process of developing an online database service for researchers. Consequently, we are
interested in finding out about researchers’ current use of structured data (that is, the sort of data which
might be stored in tables, spreadsheets, or databases), and the features that would be useful in a database
service of this sort.
If you are involved in providing IT support for researchers working with structured data (either by providing
advice, or through direct involvement in designing, building, and working with databases), we would like to
hear from you, and would be very grateful if you would complete the survey below. We estimate that the
survey will take around 10-15 minutes to complete. All respondents will be entered into a draw for a £100
Amazon voucher.
If you are a researcher using structured data, please complete the alternative version of this survey, which
can be found here [link].
Section 1: About the projects you work on
1. What sort of data do you (or the researchers you advise) regularly work with? (Please select all that apply.)
Textual
Numerical
Images
GIS data
Audio and/or video
Other (please specify)
2. To which subject area(s) do the research projects you work on or advise belong? (Please select all that
apply.)
Humanities
Creative arts
Social sciences
Maths or physical sciences
Medical or life sciences
Other (please specify)
3. What software or other tools are commonly used to store and analyse structured data in the projects you
advise or work on? (Please select all that apply.)
Plain text files
A word processing program such as Microsoft Word
A spreadsheet program such as Microsoft Excel
Microsoft Access
FileMaker Pro
Another widely available database package
A custom-built relational database system
Other custom-built software
A statistical analysis package such as SPSS or Stata
XML documents
Other (please specify)
3.a. If another widely available database package is used, what is it called? (Optional)
4. Which of these would you say is most commonly used to store and analyse data in the projects you advise
or work on?
Plain text files
Tables in a word processing program such as Microsoft Word
A spreadsheet program such as Microsoft Excel
Microsoft Access
FileMaker Pro
Another widely available database package
A custom-built relational database system
Other custom-built software
A statistical analysis package such as SPSS or Stata
XML documents
Unknown
Other (please specify)
5. Please think about a recent project you have worked on (or are still working on) or advised. Approximately
how big is/was the dataset in question?
(If you are working on multiple projects, please select a project you consider reasonably typical and answer
for that: the idea of this question is to get a snapshot of a sample of current or recent research databases.)
Under 1GB
Between 1GB and 10GB
Between 10GB and 100GB
Over 100GB
Unknown
6. If the project is ongoing, is it most likely that by the time the project concludes, the dataset will:
Not be significantly bigger than it is at present?
Have grown, but not to more than double its current size?
Be more than double but less than ten times its current size?
Be more than ten times its current size?
Unknown
Not applicable – the project has already concluded
7. Still thinking about the same project, where is the data hosted?
On local storage formats, such as DVDs, memory sticks, your computer’s hard-drive, etc.
On network-attached storage managed by members of the research project team
On a departmental server
On centrally-provided storage (e.g. on a server provided by central university IT support services)
In the cloud
Unknown
Other (please specify)
7.a. If the data is stored in the cloud, please specify who provides the cloud storage:
Section 2: Data sharing
8. Thinking now about data sharing: in your experience, how often are the following statements true of the
researchers you advise or work with?
NB. In the statements below, ‘publicly available’ describes datasets that are generally accessible by members of the research
community – for example, those published on a website or deposited in an archive.
Frequently true
Occasionally true
Rarely or never true
Unknown
a. Researchers want to be able to make their research data publicly available while the project is still ongoing
b. Researchers want to be able to make their research data publicly available at the end of the project
c. Researchers don't want to publish their whole dataset, but do want to make limited subsets of it publicly available - e.g. to accompany research publications
d. Researchers prefer not to make any of their data publicly available
e. Researchers' funding bodies require them to make their data publicly available
f. Researchers would like to make their data publicly available, but don't currently have a straightforward means of doing so
g. Confidentiality or intellectual property restrictions make it hard for researchers to share their data
Section 3: Data management software features
9. Please think about the software packages used (or which might be used in the future) to manage data in
the projects you advise or work on. Which of the following features are (or would be) important or useful?
Essential
Useful, but not essential
Not relevant to my work
Unsure what this means
a. Automated backup
b. Automated versioning (that is, the ability to return to previous versions of the dataset)
c. The ability to import or export data in a range of formats
d. The ability to view or present data in different ways (e.g. by creating customized forms or reports)
e. The ability to plot results on maps
f. The ability to enter and search data in XML formats
g. The ability for multiple users to access and edit the same database
h. The ability to set different permission levels (that is, to control the extent to which different users can add, edit, or delete information)
i. The ability to make data publicly available via the Web
j. Document-oriented database functionality
k. A mail merge function
Section 4: The DaaS
10. The DaaS (Database as a Service) will be an online service that allows researchers to create databases
from scratch, import existing databases, work with the data via a clear Web interface, and (if desired) share
access to the database with other members of a project team or research group. It will be centrally
supported, enabling integration with back-up systems and timely upgrades. It has not yet been decided how
the service will be funded.
Is this a service you could envisage a use for in the projects you advise or work on?
Yes
No
It depends
10.a. On what, most importantly?
11. The DaaS will also provide researchers with a straightforward way of publishing their data online. Is this a
feature you could envisage the projects you advise or work on making use of?
Yes, definitely
Yes, possibly
Unsure
No – data would probably continue being published by other means
No – data publication isn’t relevant for the projects I work on
11.a. If you answered 'Yes', what would you expect researchers to be most likely to publish? (Optional)
Complete datasets for use by other researchers (or other interested parties)
Specific subsets of my data – e.g. to support research publications
Both complete datasets and specific subsets
Unknown
Not applicable
12. Other than those already mentioned, are there any particular features that would make a service like the
DaaS attractive to you or to the researchers you work with? What might make you inclined to recommend it
to researchers in preference to existing solutions? (Optional)
13. Can you envisage any ways in which a service like the DaaS could help you or the researchers you work
with to save time, improve the quality of research, increase efficiency, or otherwise make life easier?
(Optional)
14. We are currently recruiting a group of test users for the DaaS. Would you be interested in finding out
more about this?
Yes
No
Section 5: Finally, a few questions about you:
15. Which of the following best describes your role?
General IT support within a higher education institution, including some advice and/or support for
researchers
IT support and/or advice specifically targeted at researchers, but not focused on one specific
project
IT support for one specific research project
Other (please specify)
Your name (optional)
Your institution
Your email address (optional)
Email addresses will be used only to contact the winner of the prize draw and those who have expressed an interest in the DaaS test user group.
Thank you for taking the time to complete this survey.