Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
ANDS 2012-13 Business Plan Page 1 of 118
Australian National Data Service (ANDS)
BUSINESS PLAN 2012-13
ANDS 2012-13 Business Plan Page 2 of 118
INDEX
1 Executive Summary 3
2 ANDS Context and Approach 4
3 Status of Project 9
3.1 Establishment 9
3.2 NCRIS funding 9
3.3 Additional Funding 10
3.4 Time Extension and Reporting 11
3.5 The Function of ANDS 11
3.6 Preparation for ANDS Function beyond June 2013 12
3.7 ANDS Programs and Relationship between NCRIS and EIF Activities 13
3.8 ANDS Principles 15
3.9 Scope 16
3.10 ANDS’ Impact 17
4 Research Infrastructure 23
4.1 Frameworks and Capabilities (NCRIS) 24
4.2 Seeding the Commons (NCRIS) 31
4.3 National Collections (NCRIS) 46
4.4 Data Capture (EIF) 52
4.5 Metadata Stores (EIF) 71
4.6 Public Sector Data (EIF) 76
4.7 ARDC Core Infrastructure (EIF) 84
4.8 ARDC Applications (EIF) 90
4.9 International Infrastructure 93
4.10 Overall 2012-13 Outcomes 96
5 Program Engagement Strategies 99
6 Promotion 100
7 Access and Pricing 102
8 Governance, Management and Implementation 103
8.1 Governance Framework 103
8.2 Risk Management 105
8.3 Milestones for 2012-13 111
8.4 Key Performance Indicators 115
ANDS 2012-13 Business Plan Page 3 of 118
1 Executive Summary The Australian National Data Service (ANDS) was established in January 2009 following the ANDS
Establishment Project. ANDS was originally created as part of the National Collaborative Research
Infrastructure Strategy (NCRIS) initiative to ensure that research data is used as effectively as
possible by Australian researchers. The Super Science initiative announced in May 2009 provided
additional funding from the Education Investment Fund (EIF) to establish the Australian Research
Data Commons (ARDC). This provided the opportunity to leverage both NCRIS and EIF funds to build
a co-ordinated set of programs through to June 2013.
ANDS exists to transform Australia’s research data environment by making Australian research data
collections more valuable though managing, connecting, enabling discovery and supporting the
multiple use of this data. The purpose of this activity is to enable richer research, more accountable
research, more efficient use of research data, and improved provision of data to support policy
development. The outcome of this activity will be that Australia’s research data as a whole becomes
a nationally strategic resource.
In this document ANDS’ plans for the 2012-13 financial year are described in detail. This plan is in
accordance with our extended strategic direction and completes the very large number of activities
conducted over the previous years. These activities are largely being carried out by others in
partnership with ANDS to deliver national reach, ensure re-use and maintain coherence. Rather than
institutions just meeting ANDS’ goals, the team will encourage organisations, research groups and
researchers to realise their research data ambitions.
There are no substantial changes to the proposed plan of work outlined in the 2011-12 business plan
other than the introduction of the National Collections program, the addition of the International
Infrastructure program, and a refinement of our Metadata Stores program engagements.
Significant progress has been made to date and by June 2012 ANDS will have:
established several national services: a data collections registration service, a dataset identification service through the DataCite consortium, a data collection description publication service, a researcher identification service in partnership with the National Library of Australia, and Research Data Australia – a data collections discovery service;
populated the ARDC with data collections that have been managed, connected, and have 35,000 collections descriptions discoverable through Research Data Australia, Google and other mechanisms;
helped establish coherent institutional research data infrastructure at 32 Universities, CSIRO, the Australian Synchrotron and ANSTO, including automated tools for capturing rich metadata from instruments, metadata stores, pipes connecting institutional systems, and connecting to national services;
improved the ability of the Australian research system to exploit the ARDC with guidance on research data management, including responding to the Australian Code for the Responsible Conduct of Research; and
improved data management around the country with a substantially increased cohort of research data managers, and new tools that enable more effective re-use of research data.
ANDS 2012-13 Business Plan Page 4 of 118
During 2012-13, ANDS will:
establish more national services: a “see-also” service that enables other discovery tools to use the ANDS discovery service, a more tightly integrated data citation service, a research activity identification service and an enhanced Research Data Australia;
populate the ARDC with data collections that have been managed and connected, ensuring over 40,000 collections descriptions discoverable through Research Data Australia, Google and other mechanisms;
support the custodians of a large number of nationally significant research collections in making those collections more connected, visible and available, often through a RDSI node;
work with NeCTAR and others to ensure that eResearch teams have access to the best tool suite appropriate to a wide range of data;
establish an international role for Australia in research data infrastructure,
help Australia’s researchers at institutions producing 98% of Australia’s research to get access to coherent institutional research data infrastructure, including automated tools for capturing rich metadata from instruments, metadata stores, pipes connecting institutional systems, and connecting to national services, and connected into appropriate discipline data stores and portals with RDA;
demonstrate that researchers are publishing their data, citing their data and being cited, as well as demonstrating the effective re-use of data; and
demonstrate the value of integrated and assembled research data environments through leading researchers providing practical demonstrations of value in a wide range of disciplines but with two foci in climate adaptation research and biological research.
By the end of 2012-13 it is expected that researchers across every discipline that uses research data
will be represented in the Australian Research Data Commons, and nearly all research institutions
will have improved their research data management, leading to routine publication of their data with
ANDS persistent identifiers into a data store that feeds information to the ANDS collections registry.
In addition, researchers will be able to find a wide variety of data sets using the ANDS data pages
through a variety of discovery paths, and more institutions will be successfully engaged in meeting
their responsibilities described in the Australian Code for the Responsible Conduct of Research. Most
importantly ANDS will have engaged the research community to the extent that researchers see
publishing their research data as their default practice.
2 ANDS Context and Approach Research is becoming more data intensive, and the data is becoming more complex. Moreover the
problems being tackled are increasingly large scale and span multiple disciplines. Consequently, and
as a result of high level Government reviews, the Department of Innovation, Industry, Science,
Research and Tertiary Education (DIISRTE) has invested in improving the Australia’s research sector’s
capability to use and re-use research data. This has been guided first by the NCRIS roadmaps and
then by the document entitled Towards the Australian Data Commons1 (TADC).
1 Available online at http://www.pfc.org.au/twiki/pub/Main/Data/TowardstheAustralianDataCommons.pdf
ANDS 2012-13 Business Plan Page 5 of 118
In support of this goal, Towards the Australian Data Commons identified a range of objectives for
ANDS. These objectives are based on the belief that “ANDS can contribute most effectively by
developing services and activities that enable stewardship within multiple federations of data
management and data user communities” (p. 6). TADC identified a number of longer-term objectives
for data management:
a) A national data management environment exists in which Australia’s research data reside in a cohesive network of research repositories within an Australian ‘data commons’.
b) Australian researchers and research data managers are ‘best of breed’ in creating, managing, and sharing research data under well-formed and maintained data management policies.
c) Significantly more Australian research data is routinely deposited into stable, accessible and sustainable data management and preservation environments.
d) Significantly more people have relevant expertise in data management across research communities and research managing institutions.
e) Researchers can find and access any relevant data in the Australian ‘data commons’.
f) Australian researchers are able to discover, exchange, reuse and combine data from other researchers and other domains within their own research in new ways.
g) Australia is able to share data easily and seamlessly to support international and nationally distributed multidisciplinary research teams. (p. 6)
As a result of these goals, initial activity, and consultations, ANDS role is to enable Australia’s
research data to be transformed:
From Data that are:
Unmanaged
Disconnected
Invisible
Single use
To Structured Collections that are:
Managed
Connected
Findable
Reusable
This will form a nationally significant resource so that Australian researchers can easily publish,
discover, access and use Australian research data.
ANDS is doing this by creating the Australian Research Data Commons (ARDC), the focus of the Super
Science project. The ARDC is a combination of the set of shareable Australian research collections,
the descriptions of those collections including the information required to support their re-use, the
relationships between the various elements involved (the data, the researchers who produced it, the
instruments that collected it and the institutions where they work), and the infrastructure needed to
enable, populate and support the commons. ANDS does not hold the actual data, but points to the
location where the data can be accessed. The ARDC can be envisaged below, where ANDS is
contributing to the green pipes and boxes:
ANDS 2012-13 Business Plan Page 6 of 118
Figure 1: Australian Research Data Commons
ANDS is thus creating a combination of national services and coherent institutional research data
infrastructure, combined with the ability to exploit that infrastructure with tools, policy and
capability. To deliver against these objectives, ANDS has nine inter-related programs of activity
(Frameworks and Capabilities, Seeding the Commons, National Collections, Data Capture, Metadata
Stores, Public Sector Data, ARDC Core, ARDC Applications and International Infrastructure).
The ANDS activities (whether NCRIS funded or EIF funded) are both being conducted under the same
management structure which can thus maximise opportunities for cost savings. However, as required
by contract, separate reports will be provided to DIISRTE for the NCRIS service and EIF project. ANDS
has also entered into three other Funding Agreements with the Department. The first agreement was
to progress outcomes arising from the First Joint Australia-EU Research Infrastructure (RI) Workshop
in June 2011. Subsequently, in 2012, two more agreements were entered into, for the Second Joint
RI Workshop, and the DataWeb Forum. This business plan provides a combined view of how ANDS
intends to execute all of these activities in 2012-13.
The ARDC Project represents a very significant change in the approach to research data in Australia,
enabling Australia’s research data as a whole to become a nationally strategic resource. It will give an
advantage to Australian researchers in three significant ways:
Research data will be routinely published, enhancing the visibility, reproducibility and reputation of Australian researchers.
It will be easier for international researchers to work with Australian researchers because of the excellence and visibility of the Australian research data environment.
ANDS 2012-13 Business Plan Page 7 of 118
New research will be carried out using existing data more effectively and often, exploiting more completely the value of Australia’s research data.
The ARDC will be a populated information infrastructure that will achieve the critical mass necessary
to become the primary means by which researchers routinely discover and access Australia’s
research data. In order to achieve the goals of the Project, ANDS will continue to work closely with
providers of research data storage, so that every researcher will be able to manage, store, publish
and share their research data. ANDS will also work closely with other NCRIS and EIF‐funded
capabilities, especially expanding existing relationships with the Atlas of Living Australia, Terrestrial
Ecosystem Research Network, the Integrated Marine Observation System, AuScope, the Australian
Urban Research Infrastructure Network, and the Australian Bio-security Intelligence Network.
ANDS has been operating since January 2009. In that time ANDS has built a consensus on the
importance of research data and research data infrastructure. In 2010-11, ANDS created a number of
national research data services and engaged with a large number of organisations to start the
realisation of the Australian Research Data Commons. In 2011-12, ANDS expanded and deepened
that institutional engagement, as well as added to the range of available services.
In 2012-13, the outcome of ANDS activity will be that Australian researchers have access to
infrastructure that enables them to:
systematically, reliably and authoritatively connect their research data to project, institutional and disciplinary descriptions, and
simultaneously publish citable research data collections through institutional, disciplinary and national services.
This will ensure that Australia has a mature, globally leading capability in research data, making it
a key locus for data intensive research. This capability will be demonstrated by leading researchers
in a variety of disciplines to show the power of this infrastructure to enhance their research.
It will be possible to do this as a result of ANDS and its partners developing a wide-ranging set of
coherent software, policy and process outputs that support the Australian Research Data Commons.
Substantial infrastructure has already been created. By July 2012, ANDS will have in place the
following national services:
a data collections registration service;
a data collection description publication service;
a national gazetteer service;
a data citation service;
a researcher identification service;
a research project identification service; and
an enhanced Research Data Australia – a data collections discovery service.
By July 2012 there will be coherent institutional research data infrastructure in place:
48 tools will have been deployed to automatically capture rich metadata along with the data for a wide range of instruments;
ANDS 2012-13 Business Plan Page 8 of 118
6 institutions will operate metadata stores;
40 institutions will provide collections descriptions feeds to ANDS, both Research Institutions and Public Sector data holders;
35,000 collections will be available for discovery through Research Data Australia; and
7 discipline oriented portals will be cross connected to Research Data Australia.
By July 2012 the following tools, frameworks and capability will be in place to exploit the ARDC:
improved institutional guidance for internal institutional data management;
improved institutional guidance for responding to national instruments such as the Australian Code for the Responsible Conduct of Research will be available;
institution wide research data management planning frameworks at 5 research institutions and 33 institutions with improved research data management;
increased institutional capability for research data management with 150 more staff trained with research data management capability; and
5 new tools that enable more effective re-use of research data.
In 2012-13, ANDS will build on this infrastructure that has been created. By July 2013, ANDS will have
in place additional and enhanced national services:
a “see-also” service that enables other discovery tools to use the ANDS discovery service;
a more tightly integrated data citation service;
a research activity identification service; and
an enhanced Research Data Australia.
By July 2013 there will be further coherent institutional research data infrastructure in place:
70 tools will have been deployed to automatically capture rich metadata along with the data for a wide range of instruments;
22 institutions will operate metadata stores;
60 institutions will provide collections descriptions feeds to ANDS, both Research Institutions and Public Sector data holders;
At least 20 institutions responsible for nationally significant collections will have been supported in making those collections connected, visible and available, often through an RDSI node;
At least 40,000 collections will be available for discovery through Research Data Australia; and
10 discipline-oriented portals will be cross connected to Research Data Australia.
By July 2013 the following tools, frameworks and capability will be in place to exploit the ARDC:
further institutional guidance for internal institutional data management;
institution wide research data management planning frameworks at 20 research institutions and all institutions;
ANDS partners will have improved research data management;
ANDS 2012-13 Business Plan Page 9 of 118
increased institutional capability for research data management with 160 more staff trained with research data management capability;
20 new tools that enable more effective re-use of research data; and
this capability will be demonstrated by 20 high profile researchers showing the value of this infrastructure
We will continue to engage with the National Infrastructure Capabilities in a variety of ways – direct
engagement on collections descriptions, common engagement with Government data providers,
support on research data licensing, publishing research data collections held by the capabilities,
especially by establishing collection descriptions feeds, providing another access point to the
collections and capability portals.
3 Status of Project The Australian National Data Service is one of the components of the Platforms for Collaboration
(PfC) capability. Its status can best be understood in terms of its four sequential stages:
establishment, initial NCRIS funding, additional EIF funding, and time extension.
3.1 Establishment During the course of the PfC facilitation process, a number of workshops were held to determine the
activities that might be included in the investment plan to assist research data management. Details
of these workshops and their outcomes are available at http://www.pfc.org.au/bin/view/Main/Data.
Following the approval of the overall PfC investment plan by NCRIS, an implementation workshop
with wide representation was held to confirm the proposal to establish the Australian National Data
Service (ANDS). This workshop took place on May 29, 2007. It endorsed the ANDS concept and
proposed that a technical working group (the ANDS TWG) should be formed to draft a more detailed
statement on the purpose and goals for ANDS, moving beyond the conceptual definition provided in
the PfC investment plan. This working group met both physically and virtually over the course of
2007, and in October produced Towards the Australian Data Commons: A proposal for an Australian
National Data Service2.
In late 2007, the then Department of Education, Science and Training (DEST) asked Monash as the
lead agency to work with ANU and CSIRO on a project to take the next step and establish the
Australian National Data Service (ANDS). ANDS is part of the Platforms for Collaboration capability
within the National Collaborative Research Infrastructure Strategy (NCRIS). The ANDS establishment
project concluded in December 2008.
3.2 NCRIS funding The ANDS Draft Interim Business Plan was submitted in September 2008, and the Interim Business
Plan was submitted in December 2008. ANDS commenced officially in January 1 2009. By March 1,
2 http://www.pfc.org.au/pub/Main/Data/TowardstheAustralianDataCommons.pdf
ANDS 2012-13 Business Plan Page 10 of 118
2009, ANDS had 16 staff. NeAT Round 1 projects were underway and a second round of NeAT
projects was identified.
3.3 Additional Funding In the May 2009 budget, the Commonwealth government announced a series of initiatives
collectively labelled as Super Science3. The ANDS 2009-10 business plan was submitted in March
2009 (prior to this announcement) and accepted in July 2009 (post this announcement). The
substance and execution of this plan was substantially affected by the ARDC project (announced
under the Super Science program and funded from EIF). Consequently considerable effort was
expended on creating a project plan for the ARDC that was complementary to the NCRIS-funded
activities.
The ANDS Steering Committee decided in mid 2009 to recommend to the Department of Innovation,
Industry, Science and Research (DIISRTE) that ANDS manage the NCRIS-funded and EIF-funded
activities as an integrated project. The Steering Committee also decided to reshape the portfolio of
ANDS programs to better reflect the implications of, and constraints on, the added funding. As a
consequence, the existing separate Frameworks and Capabilities programs were merged, and the
Utilities program was moved from NCRIS-funded to EIF-funded and renamed ARDC Core. Four new
EIF-funded programs were instated: Data Capture, Metadata Stores, Public Sector Data, and
Applications. In the period July 2009-March 2010, ANDS consulted widely on these changed plans,
and after some fine-tuning to respond to consultation feedback commenced executing against them.
One of the requirements of the ARDC project was that ANDS would provide to DIISRTE an initial
specification for the ARDC that would detail $10M of early expenditure to support the creation of the
ARDC. This required commitment of funds by September 30th 2009.This initial commitment was
described in the Preliminary Specifications Report as “early activity” expenditure, but has come to be
known as “fast start”. Consequently ANDS produced a description of proposed (now actual)
engagements based on discussions already underway and relationships that had already been
established. A number of the activities described under various programs in the Business Plan below
were funded as early activity/fast start, with two goals. The first was to start expending the allocated
funds (which at the time had to be expended by the end of 2010-11), thus smoothing somewhat the
expenditure curve. The second was to quickly undertake a range of activities from which ANDS could
learn and thereby fine-tune the process of expending the remainder of the Super Science funding.
As previously described, ANDS has also entered into three other Funding Agreements worth a total
of $462K with the DIISRTE. The first agreement was to progress outcomes arising from the First Joint
Australia-EU Research Infrastructure (RI) Workshop in June 2011. Subsequently, in 2012, two more
agreements were entered into, for the Second Joint RI Workshop, and the DataWeb Forum.
3 http://minister.innovation.gov.au/Carr/Pages/SUPERSCIENCEINITIATIVE.aspx
ANDS 2012-13 Business Plan Page 11 of 118
3.4 Time Extension and Reporting In April 2010, an opportunity arose to ask DIISRTE whether a short extension might be possible for
ANDS, and they advised that there was an opportunity to extend for a further period of two years to
harmonize with other NCRIS and EIF investments. ANDS staff and the Steering Committee managed
to identify shifts of funding and timing across reasonably permeable boundaries that still delivered a
viable ANDS, one able to continue to deliver on behalf of the Australian research community through
to June 2013. This extension of time required a re-allocation of funds between programs, as well as a
change to the funding profile within programs. It also required the funding of the project office over
a much longer period. Consequently an additional $0.5M was provided by DIISRTE under NCRIS
funding to support the operation of ANDS over a longer period. A three-year high-level project plan
was developed, so that ANDS:
honoured all existing commitments;
continued with existing partnerships - this means continuing to actively engage with the research institutions;
retained the capacity to work with data champions; and
maintained an ongoing capability of engaging with the sector.
The project plan shows an uneven level of expenditure (as ANDS had already made substantial
commitments) but does balance the need to engage with the sector over a longer period of time,
and to demonstrate value early. This business plan describes the proposed activity over the final year
of a three-year plan concluding in June 2013, and the chart in the finance section shows the intended
expenditure pattern for the various programs.
Another important variation to the ARDC project contract, agreed in March 2011, replaced quarterly
milestone reports being delivered individually with annual reporting that incorporates reports on
ARDC progress as well as NCRIS progress.
3.5 The Function of ANDS ANDS exists to transform Australia’s research data environment by making Australian research data
collections more valuable though improved management, more richly connecting data, enabling
discovery and supporting the multiple use of this data. The purpose of this activity is to enable richer
research, more accountable research, more efficient use of research data, and improved provision of
data to support policy development
The ANDS project runs from 2009 to 2013. The function of ANDS to help transform Australia’s
research data environment will not be completed by June 2013. As a consequence ANDS will develop
a map of the function of ANDS over 10 years informed by Towards the Australian Data Commons.
This will determine future directions necessary to ensure the effective delivery of the broader 2011
Strategic Roadmap for Australian Research Infrastructure.
ANDS 2012-13 Business Plan Page 12 of 118
3.6 Preparation for ANDS Function beyond June 2013 “Toward the Australian Data Commons”, the blueprint for ANDS, describes a ten year vision for
transforming Australia’s research data environment. ANDS was funded to deliver against that vision
from 2009 to June 2013. Substantial progress has been made, but the vision has yet to be fully
realised. The 2011 Strategic Roadmap for Australian Research Infrastructure described a strong and
central role for research data infrastructure, but this Roadmap has yet to be funded and executed.
The functions of ANDS remain critical to achieving this vision over a period in which there is no
allocated funding to enable this to occur.
There are two critical functions that ANDS should be able to deliver beyond June 2013:
Critical support for Australia’s research organisations and their infrastructure, ensuring that their research data infrastructure is able to continue to produce well managed, connected, discoverable and useable research data collections.
The core research data infrastructure of the Australian Research Data Commons, ensuring that its data collection registration, publication and discovery services and advisory services remain available and of high quality.
It is also important to maintain critical functions where the cost of disruption would be high:
Work on research data policy and advice, as the work currently undertaken on data management planning, licencing and funding policy is crucial to the long term vision.
International engagement, as there is an opportunity now to engage in the large international initiatives such as the DataWeb Forum and DataCite that cannot be paused at the moment without loss of Australia’s position and influence.
Work on institutional collection development and exposure. This activity is critical in its own right, and the advent of RDSI means there will be a strong focus on data collections – these collections have to be well managed, connected, discoverable and useable.
It is thus highly desirable that the functions of ANDS continue beyond June 2013. However it is
unlikely that there will be a significant research infrastructure investment made by government in
the near future, so there is a need for a bridging year, using minimal funding. As a result it is
appropriate for ANDS to be cautious with regard to spending in this final Business Plan until the
nature of the bridging year is determined, and if it is to occur. Consequently the Business Plan
describes activities that expend most of the available funds but preserves $4M for decision in
December 2012.
At that time ANDS will propose to DIISRTE a final 6 months of activity that either transfers as much of
the critical functions of ANDS to other parties; or proposes activities that most effectively enable
ANDS to continue in 2013-14 with modest funding in a way that preserves the critical functions of
ANDS. If ANDS is to conclude, then its task would be to transfer as much of the functions to other
parties in the sector as is possible. This would include the national services, the collections
descriptions, and the advice and services provided through the ANDS website. If ANDS is to continue,
then there would be a need for an extension of the contract between Monash University and
DIISRTE, and an agreement to continue the collaboration agreement between Monash University,
the Australian National University and CSIRO. DIISRTE has agreed in principle to extend this contract.
ANDS 2012-13 Business Plan Page 13 of 118
In addition to this, the work program of ANDS should reflect a smooth transition to a smaller and
more tightly focused ANDS, based on the functions described above.
3.7 ANDS Programs and Relationship between NCRIS and EIF Activities
There are now nine programs that have been established to meet the aims of ANDS and to create the infrastructure needed for the ARDC. The total set of programs therefore comprises:
Frameworks and Capabilities – an ANDS driven program ensuring that institutions have the capability and the research system have the structures in place to enable researchers to manage, publish, share and re‐use research data (NCRIS funded);
Seeding the Commons – an institutionally based program to ensure that well managed data is made available through the ARDC with a focus on data that cannot be automatically captured (NCRIS funded);
National Collections – an ANDS driven program partnering with institutions wishing to make National Collections available, and with RDSI and its nodes to help improve storage and access to those collection (NCRIS funded);
Data Capture – an institutionally based program to automate the capture of data and metadata from instruments in data intensive research (EIF Funded);
Public Sector Data – an outsourced program of making more public data collections visible and available through the ARDC (EIF Funded);
Metadata Stores – an institutionally based program that enables metadata to be stored coherently across an institution that supports data management, publishing, sharing and re‐use (EIF Funded);
ARDC Core Infrastructure – an ANDS driven program that puts in place the national services that enable research data to be published and discovered (EIF Funded); and
ARDC Applications – a program that develops tools and services to support demonstrations of the value of exploiting data in the ARDC (EIF Funded).
International Infrastructure – a program designed to work collaboratively with international organisations and partners to ensure a more compatible international data-sharing environment for Australian researchers.
The following table shows the intended size and focus of the programs over the whole funding
period. It takes into account interest earned as well as project funding.
ANDS 2012-13 Business Plan Page 14 of 118
Programs EIF ($M) NCRIS ($M) % Focus
Frameworks and Capabilities 4.07 5.7% ANDS
Seeding the Commons 12.16 17.1% Institutions
National Collections 0.67 1.0% ANDS
Data Capture 17.20 24.2% Institutions
Public Sector Data 6.06 8.5% Contractors
Metadata Stores 7.32 10.3% Institutions
ARDC Core Infrastructure 7.62 10.7% ANDS
ARDC Applications 9.54 13.4% Contractors
International Infrastructure 0.23 0.3% ANDS
EIF Project Management 0.90 1.3% ANDS
Project Office 5.21 7.5% ANDS
Total Allocated Funds ($70.98M) 48.87 22.11 100.0%
Table 1: Total allocated funds (EIF and NCRIS) by Program
Notes:
The Total Allocated Funds is $71,037,566.45 which consists of EIF & NCRIS Allocated Funds of $70,978,651.12 (EIF $48,869,829.23 and NCRIS $22,108,821.89 respectively) and 1st and 2nd Joint RI Workshop of $58,905.33. Further details are in Section 8.
Total International Infrastructure program planned expenditure for the financial year 2012-13 is $292,130.83, of which $233,393.75 is funded via EIF funds. The remaining $58,905.33 are funded via the two funding agreements for the 1st Joint RI Workshop ($12,500.00) and 2nd Joint RI Workshop ($46,405.33) respectively. There is a third agreement with DIISRTE for the DataWeb Forum for a total of $400,000.00, and these funds are intended to be expended in the 2013-14 and 2014-15 financial years.
Included in the Project Office figures is an amount of $3,999.26 which has been expended for the 1st Joint RI Workshop contract in the financial year 2011-12.
ANDS 2012-13 Business Plan Page 15 of 118
The next diagram shows how the NCRIS programs complement and inter-relate to the creation of the Australian Research Data Commons.
3.8 ANDS Principles In responding to the new objectives and program requirements, ANDS continues to follow these
principles:
Commons Framework: ANDS has started in a way that anticipates the need to scale up and adapt
over time via an extensible framework of data stores, federations and services that enable better
data creation, capture, management and sharing.
Focus: ANDS will continue to identify and work with those who are ready, willing and able to
contribute significantly to the ARDC vision, and who provide the most strategic return to the ARDC
for the effort expended. However, ANDS will endeavour to directly support all of the larger research
institutions directly, in order to rapidly achieve critical mass.
Content: ANDS is initially focussing on content recruitment into stores and federation across stores
so as to achieve a wide coverage of data quickly at an agreed level of quality; in later years the
emphasis will shift towards quality improvement.
Service Provision: ANDS is focussed on service provision and infrastructure development, not
research and exploration; its programs will develop, integrate, and continually improve production-
level systems in support of well-understood services.
Data Capture
Metadata Stores
ARDC Applications
Public Sector Data
ARDC Core Infrastructure
Frameworks and Capabilities International Infrastructure
National Collections
Data Stores (RDSI &
Institutional)
Seeding the Commons
Figure 2: Relationship between Programs
ANDS 2012-13 Business Plan Page 16 of 118
Strategic Partners: ANDS recognises the need to be open to, and engage appropriately with,
innovations and external institutions relevant to the ARDC, including the Australian Access
Federation (AAF), the National Computational Facility (NCI), the National eResearch Collaboration
Tools and Resources (NeCTAR) and the Research Data Stores Initiative (RDSI).
Stores: ANDS assumes an environment where storage and long-term curation occur in nationally or
institutionally-supported stores, either existing or brought into being over the life of ANDS. These
stores will preferably hold objects described by various discipline-specific and documented metadata
schemas. ANDS will work with whatever repositories exist, national, institutional or disciplinary.
Sustainability: Research data management requires a long-term commitment. ANDS has developed
its plans on the assumption that this does not represent a one-off investment in data. The enduring
changes forecast in this document within each program are also intended to be sustainable beyond
the end of the ANDS planning period.
3.9 Scope Constituency: ANDS works with a variety of publicly funded institutions that produce, manage or
consume research inputs and outputs to achieve its aims. The scope includes:
all Higher Education Providers in Australia;
all research organisations that are publicly funded, including CSIRO, GeoScience Australia (GA), Bureau of Meteorology, Australian Bureau of Statistics (ABS), Australian Institute of Marine Science (AIMS), the Australian Antarctic Division, Departments of Primary Industry; and
members of the cultural collections sector (galleries, libraries, archives and museums).
As a component of Platforms for Collaboration (PfC), ANDS is funded to work with all research
disciplines in Australia, not just the NCRIS capabilities. This means that the specific concerns of the
Humanities and Social Sciences are also taken into account.
ANDS Community: The ANDS Community consists of providers of research data and ANDS services,
consumers of research data and those services, and managers of research inputs/outputs. This
includes key stakeholder aggregations such as CAUDIT (The Council of Australian University Directors
of Information Technology) and CAUL (The Committee of Australian University Librarians). The ANDS
Community includes the general public only to the extent that they will be able to use some ANDS
services to discover and access publicly available data.
Data: ANDS is concerned with the digital data that is produced by researchers as well as data that is
used by and made accessible to them. Data is the information that researchers study, that is
transformed by researchers and produced by researchers. Research publications are not included
within the scope of ANDS but files, images, tables, databases, models, computer outputs, and similar
digital representations are included. ANDS will support the ability to create links between data,
publications, software code and visualisations, where these may appear as either research inputs or
research outputs.
ANDS 2012-13 Business Plan Page 17 of 118
3.10 ANDS’ Impact
3.10.1 Research Data Collecting Institutions Overview
Collections Approach
As the concept of data publication and sharing gains traction, and the volume of data available in the
Australian Research Data Commons increases, the value of taking a collections approach is becoming
apparent. This approach has long been used in libraries to introduce focus and applicability into the
presentation of material. Collectors can track content across time series, locations or related subject.
In ANDS this approach will allow the highlighting of data collections with an increasing focus on those
that are of national significance. It will work with collecting institutions to bring this material
together, describe it in a way that enables flexibility in presentation, and highlight it for discovery.
Enabling this approach to the management of data provides the following benefits:
enables increased discoverability and browsability of data collections;
provides a more complete picture of data on any given topic;
shines a spotlight on important and significant data; and
enables new and more complex research questions to be addressed.
By working with collecting institutions, ANDS can leverage their understanding of the data and
research environment in which it is created and used to develop the data descriptions that enable it
to be presented in meaningful contexts.
Collection Types
There are a number of ways in which this collecting may occur.
1. Single Significant National Collections
These collections are collected and maintained by a single institution or unit. They can be
definitive, comprise specimens or observations and may be seminal in nature. Their function may
be likened to reference material in libraries and be of broad uptake outside a specific discipline.
However there may also be collections in this category that are largely confined to a specific
discipline, but may have unexpected uses. Curation of these collection lies with a single
institution.
2. Collections brought together for a specific collaboration
These are often gathered from a range of sources and they could also cross a range of disciplines.
They are brought together to address a specific research activity and may be transitory in nature
depending on the size and longevity of the research activity. The question of the continuing
maintenance of the collection once the collaboration is complete will arise. Datasets may be
repurposed. Curation of these data collections will be as varied as the number of collaborators
unless data curation is identified as a specific activity within the collaboration. This may then
become an issue when the collaboration is complete and funding for curation ceases.
3. Distributed Collections
ANDS 2012-13 Business Plan Page 18 of 118
One of the values of a national metadata store is that it provides the ability to pull together ‘like’
data. There are two ways this can come about. One is a coordinated approach to collecting from
institutions (collecting with intent) in which relevant institutions describe and publish collections
with a view to their use. Examples would include the collections that are in discipline portals.
Another is to do this on the fly within Research Data Australia using the metadata. This method
allows a ‘slice’ to be taken that crosses disciplines and formats and profiles a collection based
around a topic or location. Thus a collection could be formed of the data within RDA on water
that includes observations, models, photographs, historical anecdotes, commentary on water
use, socio-economic impact statements etc. It has the potential to provide wider and deeper
insight into a given issue. It also delivers value in the serendipitous exploration of data for
innovative research and strongly supports the addressing of the ‘big’ questions: complex
research issues. Additionally it presents data in a way that supports e-research methodologies. A
characteristic of this type of collection is its fluidity. A dataset can belong to more than one
collection and be presented in many contexts.
4. Institutional profiles
While not necessarily a national collection, a collections approach has the value of being able to
showcase the data produced by an institution, either as research output or as support for
Australia’s research effort. The value of Australia being able to showcase the data outputs of its
institutions could contribute to attracting research and funding internationally.
The ANDS National Collections program proposes a series of activities to identify such collections
and work with institutions to describe and publish them as collections of national significance.
The program recognises the specific subject and content expertise contained in the institutions
as well as the coordination capability. It also proposes working with the current ANDS services to
allow the profiling of data collections within the Australian Research Data Commons.
5. Locally significant collections
Often collections arise from a single research project or from a single strand of research. Treating
the data as a collection enables effective management, connection, discovery and reuse. This
generally needs institutional support and has been the basis of engagement with institutions
through many of the Seeding the Commons projects and Data Capture projects.
How collections are used is also a factor in defining collections types and the NSF (National Science
Foundation) uses the following typology for data collections:
Reference data collections are intended to serve large segments of the research community.
Community data collections serve a single discipline community.
Research data collections are the products of one or more focused research projects and typically contain data that are subject to limited processing or curation.
National Collections would typically fall into the first two categories.
Collections Engagement
Through ANDS Seeding the Commons, Data Capture, Public Sector Data, and our new National
Collections program we have engaged with institutions to support their data collections and to
ANDS 2012-13 Business Plan Page 19 of 118
support the role of data collecting. Our focus in the National Collections program will be on those
collections of national significance, particularly those collections that may be made available through
an RDSI node. In all cases the engagement has been structured to enhance the value of those
collections through improved management, connectedness, discoverability, and reuse.
3.10.2 Research Institutional Engagement Overview - Coherence
As many institutions are now reaching significant milestones in their data management projects two
particular aspects of ANDS’ support for institutional research data management are emerging:
1. the continuing relevance of ANDS’ engagement with the institution as a whole; and
2. the importance of institutions undertaking an enterprise approach in a range of areas in order
to deliver the ANDS “four transformations” effectively.
At the same time, ANDS is ramping up the provision of services to allow for consistent ways of
publishing, discovering, accessing and using data.
In order to bring together these services and the institutional approach ANDS is encouraging a
coherent approach to research data infrastructure; without this, Australian research will be unable to
reach its full potential.
ANDS plans to engage with its research institutional partners beyond individual projects through a
coordinated approach that will assist partners to achieve this coherence and realise the following
benefits:
develop knowledge of the full extent of the institution’s data holdings and the subsequent ability to present it as their body of work;
secure valuable data assets for validation and reuse;
improve the institution's ability to collaborate through a consistent and supported data management infrastructure;
position the institution to be fully acknowledged in the re-use of its data by others and the consequent reputational benefits; and
improve the institution's ability to demonstrate research excellence.
The development of coherent institutional research data management across the sector ultimately
supports the building of a sound ARDC infrastructure.
This approach is a way of delivering a combination of services, software, policy and resourcing that
will enable a research producing institution to develop and provide the necessary infrastructure so
that its researchers can effectively manage their data to be published, discovered, accessed and
used.
The aim of the approach should be to present a unified, consistent picture of the infrastructure,
policy and procedures that ANDS thinks an institution needs to have in place for effective research
data management, publishing and research support. The approach should actively support the four
transformations as seamlessly and effectively as possible.
ANDS 2012-13 Business Plan Page 20 of 118
An institution with a coherent approach is one whose data infrastructure can be accessed and used
across the organisation, as well as being consistent with infrastructure approaches across the sector.
ANDS will continue to work to put in place services that enable and support this form of coherence
across the sector. Solutions should not limit the ability of an institution to interact with the broader
environment in the sector. The aim of the approach should be to enable every data collection that
can be made available to be made available in a timely fashion.
The way the approach is implemented, and the focus areas that the institution chooses to address
should be in line with the larger data aspirations of that institution. That is, they should enable the
institution to participate in collections of particular interest or value to them.
Note: ANDS defines data management as all activities associated with data other than the direct use
of the data. This may include:
data organisation;
backups;
archiving data for long-term preservation;
data sharing or publishing;
ensuring security of confidential data; and
data synchronisation.
The approach will be a substantial underpinning of ANDS’ activities to the end of the current
planning period, and will be designed to continue beyond this. Activities across ANDS should always
consider how they fit into the coherent model we are trying to develop, support and promulgate.
ANDS’ role will be in encouraging, supporting and/or funding institutions to have the following
coherent capabilities in place:
1. An institution wide data management policy that includes relevant planning options for
researchers where appropriate.
2. IT infrastructure for the institution that enables this policy to be implemented. At the most
basic level, if there is nowhere to store data, and/or no easy way to collect it into that storage,
then there is no point in having a management policy for it. Equally this infrastructure should
be capable of connection to IT tools that will allow for sharing, re-use and reconfiguring. These
tools may have been developed locally, through ANDS funded programs such as Data Capture
or Applications, through other DIISRTE funded projects such as NeCTAR, or outside Australia.
3. Co-ordination of the policy and IT infrastructure to encourage and facilitate the creation and
storage of data in a fashion that will enable sharing and re-use at an appropriate future time.
4. Storage and collection mechanisms that support the capture and creation of metadata about
the data, so that the data can be shared and re-used. Where possible this should be drawn
from existing sources (to avoid re-keying) and be as widely shared as possible. From an ANDS
perspective, these metadata should be made available in the Commons.
5. Dedicated services in place to support all of the above that draw from the relevant areas of
expertise across the institution.
ANDS 2012-13 Business Plan Page 21 of 118
Institutions will implement research data support in a wide variety of ways that will best suit their
needs. By seeking common outcomes from similar initiatives at institutions across the sector in
Australia, ANDS is seeking to enable the Australian Research Data Commons to be a nationally
strategic resource.
ANDS will work with institutions to attempt to ensure that all of the above capabilities are at least
partially present, although institutions may wish to go beyond basic research data support depending
on their existing infrastructure and research data ambitions.
ANDS will enter into engagements to promote coherent institutional research data infrastructure.
ANDS needs to be able to offer to institutions that wish to take this up a selection from the following
offerings, drawn from across the ANDS programs as appropriate:
advice on the scope and implications of this more coherent approach, as well as a rationale for its adoption;
services and tools that support aspects of coherence, such as guides, templates and software that we endorse, either from our own projects or elsewhere;
advice and examples for the less tangible aspects of coherence, especially policies, training and procedures that have worked elsewhere;
facilitated connections to aspects of coherence not directly under the control of ANDS, notably NeCTAR and RDSI;
solution patterns in particular domains that have been shown to be effective elsewhere;
services that support this, including those developed by others (such as the ability to connect to the Australian Research Council (ARC)/ National Health and Medical Research Council (NHMRC) grant information system), as well as the full range of services developed by us; and
assistance with deployment of software that we believe to be important, but which may be a lower priority of the institution (e.g. metadata stores).
Greater uptake of a coherent approach to research data infrastructure will also enable ANDS to point
to a change in the way institutions manage data, as well as delivering an increased amount of data
available in the Commons.
There are a number of outcomes we would expect to come out of these engagements:
a number of research institutions will have built on their initial ANDS funding to be able to provide a coherent infrastructure for their researchers to manage data;
they will have a better understanding of their research data holdings, and their areas of strength as a result;
more data will be transformed in a fashion consistent with ANDS’ recommendations;
more data will be shared as a result; and
more data will be cited.
3.10.3 National Research Data Infrastructure Partner Overview
ANDS is part of the Government’s investment in research data infrastructure in Australia. This
investment can be seen very broadly as comprising all of the investments in research data generating
ANDS 2012-13 Business Plan Page 22 of 118
activity – indeed many research grants have a substantial data acquisition component. However
ANDS has a particular responsibility and opportunity to engage in partnership with other major
national data infrastructure investments. ANDS has to date particularly engaged with:
some of the major data generating instrument investments including the Australian Synchrotron, the Australian Nuclear Science and Technology Organisation’s neutron beam instruments, the Australian Telescope National Facility, as well as investments in a larger number of smaller data generating instruments through investments like BioPlatforms Australia, the Australian Microscopy and Microanalysis Research Facility, and the National Imaging Facility;
the major problem or discipline focused data investments, including the Terrestrial Eco-Research Network (TERN), the Integrated Marine Observation System (IMOS), the Atlas of Living Australia, AuScope, the Australian Urban Research Information Network (AURIN), the Australian Data Archive, the Australian Biosecurity Information Network, and the Population Health Research Network (PHRN); and
the major eResearch enabling data investments including NeCTAR, with investments in data tools and data collaboration, RDSI for data storage, and the high end computation facilities to enable data analysis and generation.
ANDS’ engagements to date and planned engagements have occurred across all of the ANDS
programs. ANDS continues to engage with a number of capabilities on data licencing issues through
our work with AusGOAL and an ANDS sponsored working group. A number of institutionally based
projects in our Seeding the Commons and Data Capture projects have supported the richer capture
and management of research data from NCRIS or Super Science funded instruments, for example flux
tower data entered using a Monash University metadata access tool into a Metadata Store as part of
the TERN OzFlux program. Collections from many of the facilities are discoverable though Research
Data Australia and collections can be persistently identified. TERN has just launched its portal based
on theANDS registry tool to expose its own collections.. Public Sector Data is funding work to make
Geological Data more available to exploit using the AuScope portal and more discoverable using
Research Data Australia. ANDS funded the automation of real-time publication of data from Research
Vessel Southern Surveyor and Research Vessel Aurora Australis available through the IMOS portal
and with collections descriptions discoverable through Research Data Australia.
ANDS has partnered with a number of the National Data Investments to fund demonstrations of the
value of new approaches to research data through its Applications program. ANDS is partnering with
the Terrestrial Eco-Research Network, the Atlas of Living Australia, BioPlatforms Australia, the
Integrated Marine Observation System, the National Imaging Facility, the Population Health Research
Network and the Australian Urban Infrastructure Research Information Network to together
demonstrate these advantages, in particular demonstrating the value of integrating data from
different disciplines. Examples include the bringing together of satellite, genetic and field
observations of Australian soil, or the ability to visualise and combine biological data though many
dimensions from whole-of-organism through microscope level to genetic level. A particularly
powerful example is the demonstration of the value of examining very wide scale effects of climate
change, response and adaptation though built environment, natural environment and health though
sharing climate data downscaled to local conditions using National Computational Infrastructure
computation, though a portal at Griffith University, utilising the network of researchers around
ANDS 2012-13 Business Plan Page 23 of 118
Australia organised through the National Climate Change Adaptation Research Facility and with the
intent to store the data on an RDSI funded node.
Finally ANDS is focusing on nationally significant research collections though partnering with a range
of collection institutions with a view to enabling Australian researchers to get better access to this
data, often stored and made available through RDSI funding.
ANDS thus has engaged with a very wide range of activities in collaboration with other research data
capabilities, but also participates in efforts to ensure that ANDS strong institutional focus
complements other more specific investments.
The net effect of all of this activity is that the many data investments that Australia makes are
enhanced by having data collectors, research institutions, and research infrastructure providers
combine to give Australian researchers a significant research data advantage.
4 Research Infrastructure For researchers to work in the world of data-intensive research, they will need:
policies that support a new way of working;
a technical data fabric that enables storing and moving data;
a metadata infrastructure to manage rich information about their data;
a referencing mechanism that enables input data, modelling outputs (such as visualisations), software code and documents to be cross referenced;
the ability to search across all the collections that have been registered; and
training and training materials that enable the infrastructure to be used well.
To deliver that infrastructure a combination of national services and coherent institutional services is needed. ANDS is creating many of those services, and helping to seed institutional services that are optimised to be part of the Australian Research Data Commons where re-use, sharing and commonality of approach is possible because of the coherence achievable with national investments. These services can also be integrated with other national services – whether they be data storage services, high intensity data analysis, or discipline or problem specific data services.
ANDS has instituted nine programs to deliver on its NCRIS and EIF commitments:
Frameworks and Capabilities;
Seeding the Commons;
National Collections;
Data Capture;
Metadata Stores;
Public Sector Data;
ARDC Core Infrastructure;
ARDC Applications; and
International Infrastructure.
ANDS 2012-13 Business Plan Page 24 of 118
These activities will be delivered in a co-ordinated manner that will ensure that ANDS’ partners
engage with ANDS as a whole, not with the individual programs. This has the consequence that ANDS
needs to have a greater emphasis on customer relationship management so that partners do not
have to navigate their way through different parts of ANDS.
4.1 Frameworks and Capabilities (NCRIS) The Frameworks and Capabilities program addresses two of the systemic obstacles to the emergence
of the ARDC: policy irregularity and human capability constraints.
The common approach to addressing both of these generic issues is to partner with collaborators
around specific solutions. The Frameworks and Capabilities program produces materials that address
some of the fundamental shared issues in data intensive research. The key collaborators for the
Capabilities activities within the program are eResearch support groups and the key collaborators for
the Frameworks activities are research leaders and funding agencies. A central concern of the
program as a whole is the desire for research organisations and research groups to have effective
policies around the full lifecycle of data management.
4.1.1 Program Aims
The Frameworks activities within the program aim to support new approaches to data-intensive
research by strengthening the overall policy context for, and facilitating the emergence of, the ARDC.
The Capabilities activities aim to improve the level of capability for data management, data-intensive
research and associated technologies across Australia by partnering with willing institutions and
NCRIS facilities to improve core data [or data management] competencies. This program aims to
directly support the ANDS IT services provided through ARDC Core as well as the institutional
infrastructure projects managed through the Access to Public Sector Data, Data Capture, and Seeding
the Commons programs.
4.1.2 Program Overview
The Frameworks activities are focused primarily on the research community and the institutions in
which they work (as well as the collaborators described below), which include universities, museums,
libraries, galleries and government agencies. The activities will work towards harmonising and
streamlining the overall policy framework within which a research data commons can operate. The
result will be a shared vision of the opportunities, benefits and responsibilities of a data commons. It
is important to acknowledge that the Frameworks activities work through facilitating the goal of an
effective research data commons rather than prescribing the specific policies required to achieve that
end.
The Frameworks activities will bridge between researchers, research institutions, research funders,
and data creators and curators. In addition, the program will engage with current and emerging
initiatives such as the Government 2.0 Taskforce report (in collaboration with the Public Sector Data
program) and the National Committee for Data for Science. ANDS primary collaborators include:
ANDS 2012-13 Business Plan Page 25 of 118
institutional data holders (CSIRO, NCRIS Capabilities, National Library of Australia, National Archives of Australia, Departments of Primary Industry, GeoScience Australia, Australian Bureau of Statistics, etc.);
national initiatives such as the National Committee for Data for Science;
cross-governmental groups such as Australian Government Information Management Office (AGIMO), Office of Spatial Policy (OSP) and the Australian Spatial Consortium;
research funding departments such as DIISRTE and the Department of Employment, Education and Workplace Resources (DEEWR);
research funding schemes such as the Australian Research Commission (ARC), National Health and Medical Research Council (NHMRC), Research Infrastructure Block Grants (RIBG);
the Office of the Australian Information Commissioner;
sector peak bodies such as The Committee of Australian University Librarians (CAUL), The Council of Australian University Directors of Information Technology (CAUDIT) and Universities Australia;
discipline leaders within institutions; and
research office staff at institutions.
As cohesive networks of research data are increasingly regarded as an important and enduring part
of the collaborative research infrastructure, the Capabilities activities will focus in particular on
building the capability of researchers and support staff to contribute to and better exploit national
data infrastructure. The various activities will work with the sector to identify and document the
fundamentals of working with research data and the specifics of discipline-based data-intensive
research. They will also work with research communities and local e-Research support services to
improve particular data-related competencies, as well as enhancing and adding national focus to
institutionally based support, materials development, and training initiatives.
The Capabilities activities will lead to services such as consultancy, informal knowledge transfer,
workshops, documentation, and training materials both directly and by re-enforcing local services.
Staff from this program will work within an integrated engagement activity with staff from all other
ANDS programs and also within the community of researchers and e-Research support services.
These groups themselves are engaged in capability building within their own institutions as well as
with their own staff.
4.1.3 Infrastructure Created to Date
In order to improve the policy environment and the capability to enhance and exploit the research
data commons, a range of infrastructure initiatives has already been undertaken:
ANDS 2012-13 Business Plan Page 26 of 118
Infrastructure
Exploitation Support Area
Existing Support Infrastructure
Data Commons Re-Use
Policy
ARC, NHMRC, DIISRTE preliminary agreements
Membership of sector and government policy committees (the
Cross Jurisdictional Chief Information Officers' Committee -
CJCIOC, National Committe for Data in Science - NCDS,
Commonwealth Spatial Data Management Group - SDMG)
Licensing Frameworks –
implementation
Commonwealth AusGOAL practitioners working group
Membership of AusGOAL oversight committee
Establishment of licensing working groups for university and
research sectors
Creation of licensing area on ANDS website
ANDS guides
Collection Description
Publication
Content Providers’ Guide
ANDS Guides
Training, events
Data citation ANDS Guides
International Consortium website
Workshop materials
Institutional research
data management
infrastructure
development
ANDS Guides
Training, events
Data management plans ANDS Guides
Training, events
ANDS Infrastructure
Usage Support
ANDS Guides
Training, events
4.1.4 Agreed 2012-13 Activities
The 2012-13 activities will build on the work done to date, continuing the emphasis on data
management capability, data citation, and licensing. For the Frameworks agenda a new focus
emerges for 2012-13: alliance with sector peak bodies on data sharing policy. In the “Building
Capabilities” agenda there is a renewed emphasis on supporting people and organisations working
on ANDS funded projects to build the research data commons.
ANDS 2012-13 Business Plan Page 27 of 118
Infrastructure
Exploitation Support Area
Activities
Research Data Commons
Re-Use Policy
Building alliances with sector peak bodies (DVCs Research, the
Committee of Australian University Librarians (CAUL), the
Council of Australian University Directors of Information
Technology (CAUDIT), Universities Australia)
Collaboration and policy advocacy with research funders
(DIISRTE, ARC, NHMRC)
Membership of sector and government policy committees (the
Cross Jurisdictional Chief Information Officers' Committee -
CJCIOC, National Committee for Data in Science - NCDS,
Commonwealth Spatial Data Management Group - SDMG)
Preparation for possible social-benefit analysis and application
of its results in advocacy, materials and workshops
Public forum
Uptake by the Office of the Australian Information
Commissioner (OAIC)
Licensing Frameworks –
implementation
Materials development, events, website development
University and NCRIS facility licensing working groups
Webinars and other events
Ethics and data re-use
policy
Review of practice at Australian research organizations
Liaison with the NHMRC and the community using the National
Ethics Application Form
Public forum
Materials development
Third party report
Data citation DataCite collaboration and contribution
Proof of concept projects with mature archives and disciplines
Documentation and events to support ANDS DOI services,
events
Data Re-Use Information Materials development to document and support metadata
stores initiatives
ANDS 2012-13 Business Plan Page 28 of 118
Infrastructure
Exploitation Support Area
Activities
Institutional research
data management
infrastructure
development
Holistic approach to research data infrastructure for research
organisations (metadata stores, policy, plans, frameworks, IT
infrastructure)
Data management plans Pilot project with University partners
International liaison (DMPOnline, DMPTool), in collaboration
with the International Infrastructure program
Harmonising with Australian policy and funding processes
ANDS Infrastructure
Usage Support
Support for Australian researchers and research organisations
building ARDC infrastructure or using ANDS National Services
(collection description, dataset identification, party
identification, research project identification, location
identification)
4.1.5 Process for Other 2012-13 Activities
There are no significant activities yet to be determined for 2012-13. Some 2012-13 activities will
arise from ad hoc opportunities to collaborate with institutional, national, and international partners
over the course of the year (e.g., universities, NCRIS facilities, state-based eResearch organisations
and international players such as DCC). While largely resourced from existing program allocations,
some of these joint projects will involve new resources for collaborative activities such as joint
materials production, workshops, pilot projects and tools specification. These will not be allocated
through open call, but rather through a "ready, willing, and able" filter with the particular
stakeholders mentioned above. These are tentatively scheduled for Semester 1, 2013.
4.1.6 Highlights
ANDS has made significant progress in supporting research organizations and public sector agencies
building the Australian Research Data Commons.
Highlights include ANDS support forums, guides, materials, and events in a number of key areas:
Licensing Frameworks -implementation
Collection Description Publication
Data citation
Ethics
Institutional research data management infrastructure development
Data management plans
ANDS 2012-13 Business Plan Page 29 of 118
ANDS Infrastructure Usage Support
AND is also actively engaged with the ARC, NHMRC and DIISRTE and contributes to a number of
public sector data policy and implementation groups such as the Spatial Data Management Group,
the Australian Governments Open Access and Licensing Framework steering committee, and the
Office of the Australian Information Commissioner’s working group on valuing PSI information.
4.1.7 Challenges
The key challenge for the Frameworks activities is how to work effectively with and through
collaborators. Clearly the success of the ARDC depends on favourable adjustments to the policy
settings throughout the data and innovation sector. In some cases new paradigms will emerge. In all
cases these policy decisions are taken by research organisations or government bodies and not by
ANDS. The ANDS role in this area is that of advocate and catalyst. The challenge for ANDS is to be an
effective voice for research data.
A related challenge for the Capabilities activities is to create and empower a community of data
management practice. Research organisations and research service organisations are the right
entities to address questions of data capability at the individual, collective, and institutional levels.
ANDS will set an agenda for capability building, provide resources to support institutions’ aspirations,
and create the right environment for sharing amongst a community of practice. ANDS prefers not to
act independently, so a major challenge for this program is to act effectively through that
community.
Effective coordination and embedding of this program of work within the activities of the ANDS institutional infrastructure projects is a high priority, as is strategic collaboration with NCRIS facilities.
4.1.8 End of 2012-13 Outcomes
By the end of 2012-13 ANDS will have produced the following outputs:
contributed to institutional research data infrastructure by providing materials, events and support on a number areas of strategic importance:
ethics and data sharing;
enhanced institutional research data infrastructure;
data management plans for research projects;
journal policies on data submission, citation, ownership;
spatially-enhanced data infrastructure;
freedom of information;
data citation practice and policy;
data reuse practice and policy; and
data licensing practice and policy;
established data citation practice and procedures with targeted community partners;
progressed discussions with peak sector bodies and research funding agencies on the principles and guidelines for the dissemination, management, and re-use of research data;
ANDS 2012-13 Business Plan Page 30 of 118
held licensing support group meetings and events promoting harmonised licensing across the research and government sector; and
published a social benefit analysis for research organisations and data collecting agencies and promoted its conclusions in the sector.
4.1.9 End of 2012-13 Enduring Changes
By the end of the 2012-13 funding year, ANDS will have contributed toward, or brought about, the
following enduring changes:
A national resource of managed, connected, findable and re-usable research data now exists that previously did not.
Data providers are increasingly seeing the value in providing their data through richer shared environments.
Data are now visible to researchers that were never visible before.
Researchers can more easily find and gain access to public sector data.
Australian researchers can publish and cite their research data outputs.
Larger questions and grander challenges can be addressed by a data commons that spans facilities, institutions, disciplines and sectors.
New types of research are now possible that previously were not.
Richer data environments can facilitate evidence-based policy.
Data can be captured, managed, and shared more easily at most research organisations in Australia.
Research organisations and public sector agencies now provide policy support and invest more in data management.
Australian research institutions can more easily comply with the requirements of the Code for the Responsible Conduct of Research.
There is an established community of data management professionals.
Australian researchers and research organisations have better capability to operate and exploit the new research data infrastructure.
Australia has improved ability to measure the quality and impact of all of its research outputs.
Research Data Australia reduces duplication of effort and cost of unnecessary data acquisition by facilitating re-use.
The enduring changes will be supported by sound and consistent policies.
A community of practice with a sense of common national purpose has emerged amongst key stakeholders of the ARDC.
Fundamental issues of data management and usage are clearly recognised and addressed in policy and practice.
ANDS 2012-13 Business Plan Page 31 of 118
4.2 Seeding the Commons (NCRIS)
4.2.1 Program Aims
The aims of this program are to improve the fabric for data management in a way that will increase
the amount of content in the data commons; and to improve the state of data capture and
management across the research sector with a focus on the tertiary education sector, CSIRO and the
NCRIS Capabilities.
4.2.2 Program Overview
ANDS has a significant existing set of commitments for this program. Its major activities are:
40 research institutions developing local capability for managing research data and making collections visible in the ARDC through funded projects;
regional local support through funded positions; and
national advice and direct support.
These activities are focused on growing the commons, supporting partners, and providing national
advice. This advice will complement the written guides and training provided in the Frameworks and
Capability program. Specific contributions to growing the data commons will provide:
advice on metadata standards and requirements to ensure that metadata is prepared in a manner consistent with ANDS’ needs;
advice on ANDS service requirements to ensure that metadata is available in the ARDC;
advice and support on requirements to enable the creation and sustainability of research data infrastructure;
funding of staff at partner projects and institutions to provide support for ANDS’ goals; and
Identification of as much content as possible and making it discoverable.
An analysis of research intensity for the major Australian research-producing institutions was
undertaken in late 2009 based on the most recent publicly available data, and $4.55M of Seeding the
Commons funds were allocated in bands of $250K, $125K, or $75K. Institutions were sent an
invitation to take part in an Expression of Interest process in late 2009.
This offer was accepted by all of the institutions with the exception of Charles Sturt University. In
addition, the Australian Nuclear Science and Technology organisation (ANSTO) was allocated $125K
in funding to undertake similar work.
In late 2011 this offer was expanded to include 6 institutions with lower research intensity. Charles
Sturt was also offered another opportunity to take up the offer, which they did not take up. These
new projects have been influenced by successful projects already undertaken by ANDS, and are
intended to be complete within the current planning period.
Activities to work more closely with partners will provide:
ANDS 2012-13 Business Plan Page 32 of 118
review and assessment of partner projects to ensure they are completed on time and as specified;
advice to these projects and other ANDS programs;
targeted engagements around a coherent approach to research data management, with a focus on the specific needs and desires of the partners; where possible these will be integrated with the Metadata Stores projects;
advice and assistance on data management and related policy and procedures;
advice and assistance in the use and deployment of ANDS produced or funded services, applications and material;
identification of and partnering with exemplar institutions to maximise data management; and
analysis, reuse and redeployment assistance of the outputs of the projects funded by ANDS or drawn from the ANDS product catalogue.
ANDS will create a community of data managers through:
continuing to support the state based or related groups of data managers established to date, with effective communication channels between them;
training provided to Community members as required (in conjunction with Frameworks and Capabilities Program);
capturing information about successes for dissemination within the community and beyond; and
developing relationships with equivalent activities overseas to share approaches to data management systems that can inform ANDS.
4.2.3 Infrastructure Created To Date
By July 2012, ANDS will have funded the development of the following infrastructure components.
Over the past two years 20 projects funded under the Seeding the Commons program were
completed. All of these delivered collection records to the ARDC, as well as promoting the growth of
data management policy within the institution. In addition to these projects, ANDS-funded work at
has produced advice and guidance material on data management policy and practice, which has
been made available to the larger data manager community, through ANDS communication tools.
ANDS has been working to grow the community of data managers in Australia through the provision
of well-used communication tools (message board, email list and website, including a directory of
ANDS funded projects). Additionally training programs have been held to promote the understanding
of ANDS’ expectations around data management. This work has been done in conjunction with the
Frameworks and Capabilities program. ANDS has also been actively seeking to broaden the
discussion through a series of state based ANDS Community Events – this includes work on the
development of a software developers forum, and conversations with NECTAR on collaborating on
their roadshow program. Informal state based sessions have also been held to help grow the
community and promote information sharing.
ANDS 2012-13 Business Plan Page 33 of 118
Institution Project Name Project Description
The Australian
Nuclear
Science and
Technology
Organisation
(ANSTO)
Scientific
Information
Architecture
ANSTO used the project to kickstart the process of sharing
ANSTO's publicly funded research data. The Project resulted
in over 100 datasets being made available to the public
along with development of technology tools, workflows,
and processes to ensure the ongoing publication of
research data at ANSTO. ANSTO modified its internal
Metadata Repository/Aggregator to suit the storage and
publication of research data and metadata and released the
software as open source via Google Code. ANSTO created a
number of educational materials to induct researchers into
the project and to be re-used to continue researcher
involvement in the project in the longer term.
Central
Queensland
University
Centre for
Environmental
Management Core
Data Curation
project
This project seeks to influence researcher attitude and
institutional policy at Central Queensland University
through an exemplar activity involving curation of data
collected by the Centre for Environmental Management. To
accomplish this task the development of policies and
practical protocols around the management of research
data will be required.
The project will be based around longitudinal
environmental and social data collected by the Centre for
Environmental Management (CEM). This core curation
activity will be used to drive policy development and a
framework of practical procedures associated with data
curation at Central Queensland University.
Thus it is intended that the project will act as a pilot and an
exemplar activity towards the aim of improved research
data curation at Central Queensland University.
CSIRO Seeding the
Commons:
Enabling CSIRO’s
Biological
Collections for the
ARDC
The aim of this project was to identify biological collections
managed by CSIRO that are potentially of significance to the
broader research community and to enhance the visibility
of those collections. From an audit of collections managed
by CSIRO, approximately 40 biological collections were
identified. Contact was made with the relevant Collection
Managers from a range of CSIRO Business Units include
Ecosystem Sciences (CES), Land & Water (CLW), Materials
Science & Engineering (CMSE), Plant Industry (CPI),
Livestock Industries (CLI), Food & Nutritional Science (CFNS)
ANDS 2012-13 Business Plan Page 34 of 118
Institution Project Name Project Description
and Marine & Atmospheric Research (CMAR).
Edith Cowan
University
Data Management
Plan and Policy
This project has the following key aims:
to develop university wide data management policies and procedures;
to develop and deliver internal training programs and resources for research data management;
identify at least 12 strategically important research data collections at Edith Cowan University associated with publicly funded research and published material (2003 – 2010); and
ensure that RIF-CS metadata about these collections are made available to ARDC.
Flinders
University
Reformatting the
AusStage dataset
to support access
and re-use by
researchers AND
Reforming the
Movies: the Motion
Picture Producers
and Distributors of
America, Inc.
database
Databases evolve over time, both in straight growth and in
the description of their contents. A long-lived database,
such as AusStage, is expected to undergo change as more
data is added, more possible uses are identified and the
user and developer group gain experience.
This work has made the AusStage database available to all
potential researchers, without requiring expertise in
database manipulation and extraction. This increases the
audience of AusStage dramatically, providing a reusable
and human readable form of the AusStage data corpus.
The ANDS SC22B MPPDA (Motion Picture Producers and
Distributors of America, Inc.) project has succeeded in
making available a dataset of international significance
developed during an ARC Large Grant project entitled
Reforming the Movies: A Political History of the American
Cinema, 1908-1940, held from 2001 to 2003. The dataset
comprises 35,000 digital images (18GB, in JPEG format) of
documents digitised from a microfilm copy of the General
Correspondence files of the MPPDA, Hollywood’s industry
trade association, covering the period from 1922 to 1939.
In addition, the projects contributed to the development of
Flinders University research data management policy and
the ANDS projects steering committee, which highlighted
research data management issues for Flinders University.
Griffith Identifying and The aims of this project were:
ANDS 2012-13 Business Plan Page 35 of 118
Institution Project Name Project Description
University describing Griffith
University's
research datasets
and making
metadata available
to the Griffith
research
community and the
Australian
Research Data
Service
- Assess research data resulting from ARC, NHMRC and
Australia Arts Council grants
- Identify and describe research data associated with
Griffith research projects in order to meet the criteria set
out by the Australian Research Data Commons and the
National Collaborative Research Infrastructure Strategy
- Consult with researchers to identify appropriate access to
their research data (open, mediated, controlled, meta-data
only, restricted) in line with Federal, State and Griffith's
data management policies, and accommodate any legal,
ethical, security or other constraints over the lifecycle of
the research data identified.
Results: The successful import of 1144 registry objects
published to the ANDS Sandbox and thence to production.
Griffith
University/
Macquarie
University
National Linguistics
Corpus
Establishment of an Australian National Corpus that:
- Aggregates data from existing corpora residing at
Australian universities, to provide a diverse and accurate
representation of written and spoken languages in
Australia.
- Allows the discovery, access and deposition of written and
spoken data
- Allows linguistic researchers to collaboratively apply
textual annotations to written and spoken data
- Is multimodal: supports text, multimodal text, audio and
AV
- Is multilingual: English in Australia, Indigenous languages,
community languages and sign languages
- Increases the potential for re-use of linguistic research
datasets, and enables new research opportunities
- Is harvestable by the Australian Research Data Commons
(ARDC)
La Trobe
University
Archaeological
Database
Development: The
People and Place
Project
This project will build the platform for a potential national
database of historical archaeological collections, excavated
sites and the people connected to those objects and places.
This project will contribute to the Seeding the Commons
program by federating two Archaeological databases
together. Conducting data auditing for existing data sets;
and seeking new datasets from the private, public and
ANDS 2012-13 Business Plan Page 36 of 118
Institution Project Name Project Description
tertiary sectors. It will enable researchers to access a vast
dataset that is currently unavailable to them, and provide
the platform for future datasets to be made freely available
in a standardised and timely fashion.
Monash
University
Monash University
Seeding the
Commons Project
The Research Data Collections Project led by Monash
University Library aimed to identify and describe research
data collections arising from publicly funded research, and
to showcase these collections by contributing information
about them to Research Data Australia (RDA).
As a result of this project, 60+ Monash researchers have
been provided with an additional channel to promote their
research, including showcasing their work to new generalist
and cross-disciplinary audiences.
For those researchers with data that is restricted or only
available by negotiation, appearing in RDA may still increase
opportunities for collaboration and highlight data collection
and analysis techniques that may be of interest to other
researchers. Finally, and perhaps most importantly, the
project has led to a dramatic increase in the participation of
Library staff in data management activities.
Queensland
University of
Technology
Seeding the
Commons Funding
Project was to identify and described QUT research
datasets in preparation for making metadata more
accessible to QUT research community and the Australian
Research Data Service. With this project QUT intends to
identify its datasets created from ARC, NHMRC and Arts
Council funded projects, describe these datasets using the
RIF-CS Schema using information derived from data
interviews with researchers, and store these records in
QUT's research data metadata repository. Data Librarians
were seconded to the project which was completed by June
2010. Data interview activities were completed and some
statistical summaries were generated.
RMIT
University
Screen Media
Research Archive
The Media Virtual Research Collections (MaVReC) is a
virtual collections service designed to integrate cognate but
differently structured existing research collections into a
responsive, sustainable, user-friendly application for the
ingestion, search and retrieval of screen media research.
ANDS 2012-13 Business Plan Page 37 of 118
Institution Project Name Project Description
Screen media research is uniquely characterized by a wide
range of objects and outputs, such as: datasets,
publications (print and online), audio-visual media
(photography, film, video and sound including oral
histories/interviews as well as creative works), and
hypermedia. Cataloguing and archiving each of these
research outputs entails different data standards and
management protocols and has resulted in disaggregation
of research knowledge in the sector. By creating
opportunities to interrogate assets and metadata from
different collection frameworks, MaVReC aims to bridge the
divide between the practice based screen media research
(ERA Creative Works) and other forms of research.
Swinburne
University of
Technology
Swinburne’s
“Watering the
garden for the
seeds to grow”
project
They have developed the framework for a fledgling data
management service at Swinburne that is both customer-
focused and suited to their local environment and
resources. Discussions with researchers informed the
development of both of our public-facing deliverables, the
Research data collections survey and the Swinburne
research data management checklist. Wherever possible
they have shared their work with other ANDS partners, and
publicly online at
http://www.swinburne.edu.au/lib/researchdata.
They have also planted the seed—through their website,
and in devoting a section of the data management checklist
to detailing reuse of existing data—for Swinburne
researchers to consider the possibility of themselves as
data ‘end users’ as well as data ‘creators’.
University of
Melbourne
Seeding the
Commons
The Seeding the Commons project at The University of
Melbourne was an eclectic exercise in spreading the word
about the importance of research data, its management, its
discovery, and its potential re-use. Through the exercises
undertaken they have been able to communicate to a
broad range of individuals and groups across the university.
They have partnered with key stakeholders and developed
strong relationships that will hold them in good stead for
continuing the work that has been started and built upon.
The project has provided tangible outputs which illustrates
what can be achieved with the provision of targeted
ANDS 2012-13 Business Plan Page 38 of 118
Institution Project Name Project Description
resources. To this end they have been able to secure
internal funding for continuing some of the services
implemented during this project as well as continuing to
extend data management support.
University of
New South
Wales
Research Data
management
Services
The Seeding the Commons project at UNSW has developed
a number of services, applications and resources to support
research data management. Key deliverables have included:
- More than 80 research collections from the University of
NSW were documented and contributed to Research Data
Australia. Records of associated projects, services, people
and organisations have also been made available.
- Resources about research data management have been
developed and disseminated. This includes a detailed guide
to best-practice research data management as well as the
material from a series of workshops on data librarianship
and research data management.
- The ResData Deposit Tool was developed to enable UNSW
researchers to contribute descriptions of their data via a
simple web form. Records of research data captured via the
Deposit Tool are saved in ResData, an institutional registry
of research data and collections, as well as appearing in
Research Data Australia.
- The Data & Publications Linkage Tool was developed to
link research datasets with associated publications. The
ability to cite, attribute and link to research data is
desirable for researchers and other repository users. This
tool is designed to interoperate with Fedora, a popular
storage layer for institutional repositories and metadata
registries.
University of
Newcastle
Newcastle
Research Data
Online (Seeding the
Commons)
The aim of the Newcastle Research Data Online Project was
to enable the University of Newcastle to contribute to the
Australian Research Data Common (ARDC). The University
of Newcastle collaborated with a team from the University
of Southern Queensland on the development and
deployment of ReDBox 1.0 at the University of Newcastle.
Web based ingest forms were developed to capture rich
metadata around research data collections.
The project implemented strategic systems, interfaces and
processes to enable the capture of metadata for our
ANDS 2012-13 Business Plan Page 39 of 118
Institution Project Name Project Description
Research Data Collections. The project also identified and
developed interfaces or linkages to pre-determined
strategic triggers within the organisation to assist with the
identification of potential relevant metadata capture. A
number of these triggers have been implemented and
others will be implemented post project in the near future.
The project also focused on populating NOVA, the
institutional research repository, with Research Data
Collection records including links to research data wherever
possible. Or alternatively, to provide contact information
where linking was not possible so as to establish a
connection between the metadata record and the data. RIF-
CS compliant records are published from ReDBox to NOVA
allowing Research Data Collection Records to be harvested
by the ARDC from within NOVA in the required format,
which enables the visibility of the University’s data.
University of
Queensland
Seeding the
Commons Funding
To identify UQ-based data collections resulting from
publicly funded projects that can potentially be shared via
the Australian Research Data Commons;
To identify any potential barriers to sharing publicly funded
data collections and develop strategies and/or mechanisms
to overcome these barriers;
To establish a UQ Data Collections Registry that enables the
registration of new collections and streamlined generation
of metadata/collection descriptions that include metadata
about the researchers/parties and research
projects/activities associated with each collection
University of
South
Australia
Taking Australian
Architectural and
Built Environment
Records into the
Commons
Metadata on collections held at the Architecture Museum
was originally available in PDF Finding aids on the
museum’s website.
All collection and party metadata has been entered into
Metatecture and can be searched by UniSA staff and the
public using the Public Interface to Metatecture.
Metadata is now stored in a consistent manner in a
centralised location, which will lead to improved archival
practices and procedures.
Metadata is also available to a wider audience via Research
ANDS 2012-13 Business Plan Page 40 of 118
Institution Project Name Project Description
Data Australia (RDA), Trove etc.
The Library has used and expanded the database model
from this project to develop 2 research repositories at
UniSA - for SA.NT DataLink and the Division of Health
Sciences. We hope to develop repositories for other
research areas in UniSA using a similar model.
University of
Tasmania
(UTAS)
Publication of
collections into the
ARDC by UTAS
The Tasmanian Partnership for Advanced Computing (TPAC)
hosts and publishes an extensive range of data sets from
around 150 separate collections. The collections hosted
and/or published by TPAC comprise data from
approximately one million NetCDF data sets, and total
approximately 50 terabytes (at TPAC) and a further 30
Terrabytes as part of this federation of Open-source Project
for a Network Data Access Protocol (OpeNDAP) servers at
CSIRO and ANU http://dl.tpac.org.au/ . The collections are
produced by research organisations both internal and
external to UTAS.
University of
Technology
Sydney (UTS)
Community Tools
and Processes for
Effective Data
Management
Planning
The UTS Library, ITD and RIO will collaborate to develop an
effective process and relevant tools to enable the capture,
storage, access and reuse of data and metadata created at
UTS. Building on the existing work being undertaken at UTS
for the NSW node of Australian Social Sciences Data Archive
(ASSDA) and the national archive for Aboriginal and Torres
Strait Islander materials (ATSIDA), the project will identify
ARC and NHMRC funded research.
University of
Wollongong
(UOW)
Identifying and
locating UOW data
sets to seed the
Australian
Research Data
Commons and the
development of a
supporting
research data
management policy
Identify research data collections at the University of
Wollongong. The selected data collections will be described
using RIF-CS. The descriptions will be stored on site in a
metadata store and harvested into ARDC.
Develop university wide data management procedures
relevant to UOW.
ANDS 2012-13 Business Plan Page 41 of 118
4.2.4 Agreed 2012-13 Activities
Through this program, ANDS will concentrate in 2012-13 on a coherent approach to research data
management and infrastructure, on growing the data commons and expanding the use of the
product catalogue, working with ANDS funded projects, and creating a community of data managers.
The coherent approach to research data management and infrastructure will entail a review of the
state of play at institutions, including all existing projects and engagements and potential
engagements such as Metadata Stores. ANDS will meet with the partner to discuss needs and
aspirations, the agree a Statement of Work indicating which services ANDS will work to implement,
what resources the partner will provide, and when things will happen. Through this process, ANDS
should also be able to develop the next goal.
To grow the data commons, ANDS will emphasise the recruitment of existing content into
repositories, identifying existing repositories of useful content, and making all that content
discoverable through the ARDC. Where institutions with valuable existing content do not have the
required systems, ANDS will work with them to improve their ability to store, describe, persistently
identify and register their research data assets, through the re-use of tools described in the product
catalogue and in collaboration with the Metadata Stores program. If demand for this assistance
exceeds available capacity, ANDS will develop a transparent process for allocation of ANDS
resources. Where repositories (or federations) already exist, ANDS will assist with their integration
into the ARDC. This may be work performed by staff in the program target (with ANDS advice if
needed) or by ANDS staff (building on the technical consultancy expertise in the Frameworks and
Capabilities program).
To enable ANDS to undertake this across the country, partnerships have been established with
various eResearch support organisations. ANDS funds staff at the partners, whose role is to work
with ANDS, the partners and other regional or discipline specific organisations to achieve the
program’s goals. It is ANDS’ intention to continue to fund these positions in 2012-13 with selected
partners. Support for part of the program will also come from within the ANDS team. This includes
both work undertaken through partners, and provision of specialist advice.
The availability of EIF funding, and the consequent reorganisation of ANDS has freed up considerably
greater resources for use in the Seeding the Commons program. However, the amount of funding
available for ANDS is still insufficient to provide a data management solution across the entire
research-producing sector. To focus funding effectively, a large number of Expressions of Interest
were invited to understand better the needs and aspirations of institutions in this area. This provided
funding for institutions to better understand their data management needs, their current data
holdings, and put in place systems to manage and exploit this data. This can then be fed to the ARDC.
ANDS staff will continue to work with these projects and institutions to ensure that the work is
consistent with ANDS needs and recommendations. Any systems that are implemented will need to
manage data and metadata that is captured, stored, persistently identified and made accessible
(initially at collection level) through the ARDC. These institutional engagements will be undertaken
with an awareness of the tensions between international disciplinary practice and national or
institutional mandates.
ANDS 2012-13 Business Plan Page 42 of 118
ANDS has either entered into contracts (or has substantially agreed on project descriptions) for
Seeding the Commons projects at the institutions listed in the table below. The projects have
generally taken one of three forms:
Broad: a wide audit of available data, and of policies currently in place, with a view to describing data and creating wider policy.
Exemplar: working with exemplar data collections within the institution, with a view to applying the lessons learnt to other areas, and to the broader data management policy framework within the institution.
Combined: working with the Data Capture projects funded by ANDS, to apply the lessons learnt and help create a broader data management policy framework.
The shaded institutions in the table below are those that were offered and have accepted funding in
the second round of this program.
Institution Project Description/Name Form of Project
Australian Catholic
University
Data Management Plan and Policy Broad
Australian National
University
The ANU Seeding the Commons Project Broad
Bond University Data Management Plan and Policy Broad
Charles Darwin
University
Data Management Plan and Policy Broad
Curtin University of
Technology
Researcher Centric Model for the management
of Research Data
Broad
Deakin University Description and discovery of research data
collections available at Deakin University
Combined
James Cook University Systematic JCU Tropical data collection discovery
and description
Exemplar
Macquarie University Macquarie University Seeding the Commons Broad
Murdoch University Integrating precision agriculture Exemplar
Southern Cross
University
Data Management Plan and Policy Broad
University of Adelaide Research Data Storage and Management Broad
University of Ballarat Data Management Plan and Policy Broad
ANDS 2012-13 Business Plan Page 43 of 118
Institution Project Description/Name Form of Project
University of Canberra Cross-communication and enhanced accessibility
for research data management systems
Broad
University of New
England
UNE’s N.C.W. Beadle Herbarium Database Exemplar
University of Southern
Queensland
Sustainable policy and procedure for capturing
research data
Broad
University of Sydney Seeding the Commons at the University of
Sydney
Broad
University of the
Sunshine Coast
Data Management Plan and Policy Broad
University of Western
Australia
Seeding the Commons through research data
management at the University of Western
Australia
Broad
University of Western
Sydney
UWS Seeding the Commons Combined
Victoria University Research data framework Broad
In addition, ANDS is funding two “Gold Standard” records projects at Griffith University and QUT to
better understand how Research Data Australia records can be effectively enhanced using various
services funded by ANDS, with a requirement that these outcomes be broadly shared with the wider
data management community.
It is expected that program staff engaged in these institutional activities will continue to be closely
involved with the institutional Data Capture projects.
ANDS will also aim to share lessons learned and examples of best practice across the sector to create
a community of data managers. This will involve continuing to bring together data managers either
directly employed by ANDS, or funded by ANDS, from across the country, to effectively share
knowledge and experiences. To achieve this ANDS will need to continue to develop and update an
effective set of internal resources to centralise the necessary information and then to disseminate it
widely. It is expected that members of the community will both add to and learn from these
resources. ANDS will also work with other programs, key researchers, local bodies and overseas
institutions to identify tools and infrastructure that could be co-developed to improve the quantity
and quality of the data that is managed, and increase the richness of the contextual information
around the data that is available. Close coordination with the Frameworks and Capabilities program
will also play a key role.
ANDS 2012-13 Business Plan Page 44 of 118
As the projects under this and other ANDS programs are completed it is expected that there will a
wide range of services, documents and software made available. ANDS intends to examine these
outputs with a view to adapting, redeploying or further developing them for use in other institutions
or projects. These will be developed into a product catalogue, which will contain information about
the development and re-use potential of these tools. This will be designed in a way that will allow for
other initiatives (such as NeCTAR) to add their outputs to the catalogue, allowing for “one stop
shop”. ANDS staff will work with the institutions and other projects to achieve this.
4.2.5 Process for Other 2012-13 Activities
With the offers of $75,000 project funding to the following universities:
Charles Darwin University;
Southern Cross University;
University of Ballarat;
Australian Catholic University;
University of the Sunshine Coast; and
Bond University.
All funds budgeted for the Seeding the Commons program have now been allocated. Remaining
funds will be used to ensure the successful completion of the existing projects, and to assist in the
redeployment and re-use of the software and other materials developed.
4.2.6 Highlights
A particular highlight to date has been the continued growth in awareness of the importance of
institutional support for data management as a result of these projects. In many cases the exemplar
projects in particular began in isolation (within a division or research group) but have created links
across the institution and have resulted in new alliances and a desire to internally funded needed
infrastructure such as storage and data management professionals. This is seen as a particular
advantage as we take a more coherent approach to research data management and infrastructure.
4.2.7 Challenges
The proposed work to be funded under the contracts mentioned above is often ambitious in scope,
and there are a large number of projects involved. As such there is potential for these not being
completed, or failing to meet all their goals. Assessing and monitoring the number of projects in a
way that is both auditable and yet does not present a roadblock for the partners is an issue, although
considerable work has been done to refine the processes for this.
The new coherent approach to research data management and infrastructure will present new
challenges, as ANDS will need to offer a more cohesive set of advice than just that of completing
individual projects, and will need to develop engagements without the benefit of funding as a
stimulus. Considerable work has been done in 2011-12 to prepare for this new challenge.
ANDS 2012-13 Business Plan Page 45 of 118
In addition, work on these projects has often been slower than expected, for a number of reasons,
including difficulty in obtaining sufficient researcher engagement, staff recruitment and turnover,
partners not understanding ANDS requirements and the complications that come from software
development. ANDS has worked closely with partners to try and keep projects on track, and we are
confident that these issues can be overcome, and the projects will be finished by the ANDS
completion date.
Some of the material developed may not be suitable for redeployment, or may only be able to be
redeployed with considerable extra expense.
More recently the NeCTAR funding rounds have distracted many of our partners from their ANDS
projects. In a number of institutions the same staff who have been working on ANDS projects were
asked to participate in the NeCTAR applications, which was given a higher priority. This has delayed
some ANDS projects as a result. ANDS has also caused a similar problem for itself with the Metadata
Stores projects, as these were also handed to the same people doing the work on existing projects.
The 2012 Excellence in Research Australia (ERA) exercise is expected to have a similar impact at some
institutions. It is clear that resources to work on eResearch in general remain low across the sector,
with a consequent churn of available staff.
4.2.8 End of 2012-13 Outcomes
By the end of 2012-13 ANDS will have:
increased the amount of data discoverable and accessible through the data commons;
seeded the commons by integrating a number of strategic data sources and federations into ANDS registry and discovery infrastructure, thus increasing their visibility and accessibility;
worked with partners and stakeholders to meet their contractual requirements to ANDS, and deliver as per agreed timelines, and completed all funded projects;
undertaken a coherence engagement with a range of Australian research institutions;
grown and enhanced the Australian community of data managers; and
worked with institutions to redeploy or assist in the redeployment of software and other outputs developed within ANDS or drawn from the product catalogue.
4.2.9 End of 2012-13 Enduring Changes
By the end of the funding period ANDS will have produced the following enduring changes:
A national resource of managed, connected, findable and re-usable research data now exists that previously did not.
More data are being automatically captured from a wide range of instruments, ranging from small sensors to major national facilities.
Data providers are increasingly seeing the value in providing their data through richer shared environments.
Data are now visible to researchers that were never visible before.
Australian researchers can publish and cite their research data outputs.
ANDS 2012-13 Business Plan Page 46 of 118
Larger questions and grander challenges can be addressed by a data commons that spans facilities, institutions, disciplines and sectors.
New types of research are now possible that previously were not.
Data can be captured, managed, and shared more easily at most research organisations in Australia.
Research organisations and public sector agencies now provide policy support and invest more in data management.
Australian research institutions can more easily comply with the requirements of the Code for the Responsible Conduct of Research.
There is an established community of data management professionals.
Australian researchers and research organisations have better capability to operate and exploit the new research data infrastructure.
Research Data Australia reduces duplication of effort and cost of unnecessary data acquisition by facilitating re-use.
Common approaches, standards, and services have increased efficiency and coherence across the national data infrastructure investment.
Established data management infrastructure now allows for stronger national policy and easier institutional compliance.
4.3 National Collections (NCRIS)
4.3.1 Program Aims
The value of establishing national collections of research data is considerable and addresses many
nationally significant data needs. One is to have collections of data that enable big problems to be
addressed, often by investigating “survey” questions. Another is to have a reference collection that
enables researchers to compare their results with a standard. A further need is to have a locus of all
research data of a particular form to be explored. By bringing together ‘like’ data and highlighting
collections of national significance, ANDS supports research through the improvement of its
discoverability and subsequent potential for reuse. This consolidation also has the potential for
showcasing valuable holdings globally and increasing the potential attractiveness of Australian
research environment.
There are many different players that have an interest in addressing part of the problem. Through
the National Collections program ANDS intends to harness this and increase the focus on establishing
a rich set of such collections, leveraging off its many institutional partnerships and proceeding in a
manner complementary to RDSI and RDSI nodes activity in storing national collections. The National
Collections program will also work with other ANDS programs to effect the proposed changes.
4.3.2 Program Overview
The National Collections Program is a new program in ANDS. It is an NCRIS project and led by CSIRO.
In leveraging off work that has been conducted through the other programs, notably Data Capture,
ANDS 2012-13 Business Plan Page 47 of 118
Seeding the Commons and Public Sector Data, the National Collections program will increase focus
on collections of national significance that have already been captured and published via Research
Data Australia. It will also work through the engagement process within the Seeding the Commons
and Public Sector Data programs to identify and expose further collections for reuse with the
research institutions and agencies. Its purpose would be to support the research sector and the
research data providers in this identification of nationally significant collections of research data and
ensuring the availability of these collections; that they are well managed, connected, discoverable
and able to be used as widely as possible for as many purposes as possible.
This program will work with all parts of the research and public sector in determining these national
collections. It will work closely with research organizations to identify holdings of national
collections. It will also work in partnership with the ReDS program of RDSI and the RDSI Nodes on
approaches to making national collections available and connecting nodes that store national
collections to collections descriptions that are made discoverable through Research Data Australia
(RDA) and other portals, and it will work with research communities and research data providers on
establishing sustainable arrangements for national collections. ANDS is already undertaking this sort
of activity, through engagements such as:
working with the linguistics community and Griffith University to establish a national linguistics corpus;
working with EMBL (European Molecular Biology Laboratory) Australia to make EBI (European Bioinformatics Institute) data more visible and discoverable; and
working with CSIRO to establish a collection of pulsar data from the National Telescope that is now more visible, available and organised, as well as exposing their national biological collections.
Each of these discussions would involve the collection holders and RDSI nodes who might host these collections. The program will provide more focus to this type of work, and provide a greater focus on national collections within the broader Australian data holdings. This greater focus has a number of benefits:
provide researchers with unprecedented access to national collections that enable new forms of investigation to take place;
provide ANDS’ institutional research partners with a coherent effort that will enable them to build well managed, connected, discoverable and widely usable collections;
provide a focus for discussions with research infrastructure capabilities and research groups on how national collections might best be established and hosted; and
help provide a very rich set of national collections that might be hosted on RDSI infrastructure, reducing storage costs and improving accessibility of collections.
ANDS has chosen to work with institutions to identify collections of national significance. The
decision on this significance will lie with the institution.
4.3.3 Infrastructure Created to Date
As this is a new program, there is no infrastructure created to date in this program.
ANDS 2012-13 Business Plan Page 48 of 118
4.3.4 Agreed 2012-13 Activities
As this is a new program, there are no agreed activities as yet. However activity commenced in
January 2012 around planning and establishment of the program. Staff have been recruited to fill
technical and collection development roles. Activities have commenced around the identification of
‘fast start’ collections that are already captured within Research Data Australia to identify the
characteristics that would establish them as a National Collection as distinct from other types.
Negotiation with one such collection is scheduled for March 2012. Additionally work has commenced
on identifying areas of interaction and intersection with RDSI and a regular dialogue is occurring to
address items as they arise. Work has also commenced to analyse the Research Data Australia
collection to identify strengths and gaps.
4.3.5 Process for 2012-13 Activities
Activities in the program are designed to achieve the following outputs:
Working with ANDS’ research institutional partners on the development of national collections that are managed, connected, discoverable and able to be used as widely as possible for as many purposes as possible. This will include elements such as metadata quality, appropriate licensing and DOIs to enable reuse and citation. Many of these collections will be stored and available through RDSI.
Working with other research infrastructure providers on the identification and exposure of national collections that are applicable to their domains.
As a result of institutional engagements on national collections, ANDS will help with the development of agreed and well understood descriptions of national collections. ANDS’ role in this will be one of coordination to bring a focus to the data collections that are of high value to research.
Build an effective relationship with RDSI nodes to publish their hosted collections in the Australian Research Data Commons and ensure that aspects of data management that overlap with ANDS services are agreed and managed, for example the management of DOIs.
Broadly these activities cover the following areas:
Engagement with National Collection holders and potential holders
Initial scan of current holdings in Research Data Australia to identify collections already held and obvious gaps.
Develop possible candidates for discussion with institutional partners.
Engagement with institutions to identify and expose any collection they deem to be of national significance.
Work with the other ANDS programs, particularly Seeding the Commons, Data Capture and Public Sector Data in their institutional engagements. Incorporate the identification and exposure of national collections into the services offered in this process. It is important that this activity complement and not conflict with the engagement and partnering processes being established by these programs.
ANDS 2012-13 Business Plan Page 49 of 118
National collection publication support including the description of collections and engagement with one or more stakeholders: sole custodian, distributed collections and topic descriptions to point to smaller related collections. For example distributed collections may establish a data hub, have data collected together through a discipline portal such as AODN or AVO. Topic descriptions would manipulate records in Research Data Australia to present collections of ‘like’ data.
Presentation of National Collections in Research Data Australia in such a way that complements users’ access through RDSI or local institutional data provision; development of appropriate synergies or connections to RDSI, institutional and other relevant discipline portals where applicable.
Work quickly with custodians of existing National Collections and RDSI Nodes to act as ‘fast start’ activities to inform the longer term engagements. Such collections include National Linguistics Corpus, EMBL/EBI (European Molecular Biology Laboratory/European Bioinformatics Institute), the CSIRO National Biological Collections and where data is being captured from National Facilities and will inform further work.
Engaging with RDSI and RDSI Nodes
Ensuring the ANDS – RDSI/ReDS relationship is refined and covers content and responsibilities such as curation and custodianship.
Working with RDSI/ReDS to enable hosted collections to be published in Research Data Australia and ensure consistency for institutions engaging with both initiatives.
Working with RDSI Nodes to identify national collections and their holders to be hosted at that Node.
Establishing a framework for continued operation including ongoing dialogue involving regular interaction to introduce coherence to the overall management of data collections of national significance and minimise the potential for conflicting activity.
4.3.6 Highlights
As projects and engagements in Data Capture, Seeding the Commons and Public Sector Data
progress, collections of national significance have emerged. These include:
EMBL/EBI (European Molecular Biology Laboratory/European Bioinformatics Institute) mirror in Data Capture;
National Biological Collections and Linguistics Corpus in Seeding the Commons; and
Spatial Information Services Stack (SISS) and geospatial data and the engagement with Geoscience Australia in Public Sector Data.
This work should ensure that research institutions together with their storage partners make
important collections of research data that will improve the ability of their researchers to undertake
research with the partners of their choice. There are already collections that can be identified and
exposed within the National Collections program. Additionally there are further projects and
engagements within the other ANDS programs that National Collections can leverage and into which
National Collections can contribute.
ANDS 2012-13 Business Plan Page 50 of 118
The National Collections program will utilise engagement processes and tools developed within other
programs in ANDS to ensure coherence in how ANDS engages with institutions. Activities in the
program will also inform these processes and tools.
The National Collections program will also be able to apply the ANDS connections strategy
exemplifying the value of the collections.
4.3.7 Challenges
The decision to work with the institutions on the selection of collections may result in some variation
over the life of this Business Plan as actual activity will be subject to the preferences of the
institution. However ANDS is not in a position to determine research value and the ultimate
identification of collections that are truly of national significance will be evolutionary and determined
by uptake. In addition, and given that storage for national collections may vary, RDSI collections may
be a subset of a wider group of national collections. Research Data Australia should be the point
from which a consolidated view of collections of national significance is delivered. ANDS’ approach,
which is to engage at the institution level, must effectively complement the RDSI work on national
collections.
Sustainability plans for collections identified as national will need to be considered. What obligations
would accompany the identification of a ‘National Collection’ are yet to be fully determined. It is
anticipated that the early engagements will inform this to ensure real use cases and expediency.
Collection providers approach to collecting will vary. This will require some attention to ensure that
collections stay current and have appropriate coverage. It can be incorporated into sustainability
planning.
Where a collection may be distributed, determining agreed curation and custodianship may present
challenges. For example, different licensing frameworks could adversely affect how a distributed
national collection could be re-used. Again this is an issue that is more effectively addressed in the
course of engagements to leverage off actual use cases.
Nationally significant material is in varying stages of digitisation, making broader access difficult.
4.3.8 End of 2012-13 Outcomes
By the end of the 2012-13 funding period it is anticipated that the establishment of national
collections of research data would provide:
more nationally significant collections with a strong institutional custodian;
a point of access to a wide range of nationally significant data;
links between collections descriptions on RDA and the data held on RDSI Nodes;
the integration of current collection activity with increased definition to the presentation of collections in RDA;
context around these data collections demonstrating:
research activities associated with the collections,
services and tools available to enable and enhance re-use, and
ANDS 2012-13 Business Plan Page 51 of 118
connections to institutions and disciplines with which they are associated; and
a process for the identification and sustained exposure of such collections.
ANDS will also have examples of national collections that demonstrate the four transformations
ANDS espouses:
having clarity surrounding their management with description, curation and custodianship clearly defined and where applicable complementary storage arrangements with RDSI;
being well connected to provide paths into activities, parties and services that emanate from or are associated with National Collections;
being clearly identified in Research Data Australia with feeds to other appropriate discipline portals to improve their discoverability; and
providing paths to improved accessibility via RDSI where appropriate and identifying tools associated with their re-use.
4.3.9 End of 2012-13 Enduring Changes
By the end of the funding period ANDS will have produced the following enduring changes:
A national resource of discoverable, connected, accessible, and re-usable research data now exists that previously did not.
Data are now visible to researchers that were never visible before.
Data providers are increasingly seeing the value in providing their data through richer shared environments.
A cohort of leading Australian researchers has demonstrated the value of a richer data environment.
Larger questions and grander challenges can be addressed by a data commons that spans facilities, institutions, disciplines and sectors.
New types of research are now possible that previously were not.
Richer data environments facilitate evidence-based policy.
Australian researchers and research organisations have better capability to operate and exploit the new research data infrastructure.
Common approaches, standards, and services have increased efficiency and coherence across the national data infrastructure investment.
Australia is seen to be a leader in research data infrastructure internationally.
These collections become a key part of Australia’s contribution to international research.
ANDS 2012-13 Business Plan Page 52 of 118
4.4 Data Capture (EIF)
4.4.1 Program Aims
The Data Capture program aims to simplify the process of researchers routinely capturing data and
rich metadata as close as possible to the point of creation, and depositing these data and metadata
into well-managed stores. Metadata will need to be held at both collection and object level in order
to support re-use.
4.4.2 Program Overview
The Data Capture program will achieve this aim by augmenting and adapting existing data creation
and capture infrastructure commonly used by Australian researchers and research institutions to
ensure that the data creation and data capture phases of research are fully integrated so as to enable
effective ingestion into the Research Data and Metadata Stores at the institution or elsewhere. This
integration will make it easier for researchers to contribute data to the ARDC directly from the lab,
instrument, fieldwork site, etc. It will also ensure that higher quality metadata (critical for re-use and
discovery) is produced through automated and semi-automated systems. The approach taken will be
to partner with leading research groups and Super Science initiatives to augment or adapt data
creation and capture systems.
The resulting infrastructure components will include software to integrate tightly with the
experimental environment of the researcher to take the data that is being captured/created, and
augment this with metadata that describes the setting within which the data is being
captured/created, as well as other relevant details (where available) about the research project,
researcher, experiment, sample, analysis and instrument calibration details. ANDS will also
adopt/adapt/develop software to facilitate automatic/semi-automatic deposit from instruments into
data stores/repositories.
The Data Capture program was originally allocated $12M in the EIF ARDC Draft Project Plan.
Following the process of public consultation around this Draft Project Plan, this amount was
increased to $18.47M. The consultation process also validated the decision to take an institutional
approach in allocating the bulk of the available funds. An analysis of research intensity for the major
Australian research-producing institutions was undertaken in late 2009 based on the most recent
publicly available data on research productivity, and $11.6M of Data Capture funds was allocated in
bands of $1M, $500K, or $200K. In late 2009 institutions were each sent an individual invitation to
take part in an Expression of Interest process.
4.4.3 Infrastructure Created To Date
The National eResearch Architecture Taskforce (NeAT) projects were conceived as part of the
National Collaborative Research Infrastructure Strategy (NCRIS) under Platforms for Collaboration
(NCRIS 5.16). They were designed as a way of creating infrastructure that responded directly to the
needs of particular discipline communities. The following projects were funded under the Data
Capture program and have been completed:
ANDS 2012-13 Business Plan Page 53 of 118
Project Project Description
DTS: Data Transfer Service This project worked with ARCS Data Services team to develop a general-purpose data transfer service with initial deployments at NCRIS Characterisation facilities.
Human Variome: Software and Data Support for the Australian Node of the Human Variome Project
The Human Variome project is creating a national data repository called the Australian Human Variome Database (AHVD). The database will hold and provide access to information on genetic variations associated with human disease that have been characterised by Australian laboratories and clinics. The project will develop services to enable submission of laboratory and clinic data to the AHVD using existing organisational workflows.
BioFlows: Bioinformatics Workflows
The Bioflows project is providing a simple Web-based workflow tool that enables life sciences researchers to specify genomics and proteomics workflows that can be executed on the ARCS Compute Cloud and interface with the ARCS Data Fabric. The system is deployable as an “appliance,” with required software, middleware and server hardware able to be installed at a site and managed remotely, if required. The appliance can interface with local high performance computing systems and/or submit compute jobs to the ARCS Compute Cloud. The appliance concept is being tested with trial deployments at the Bioinformatics Facility at Murdoch University, the Queensland Facility for Advanced Bioinformatics and the Life Sciences Computation Centre in Victoria.
Aus-e-Stage: Collective Intelligence and Collaborative Visualisation for Creative eResearch
The Aus-e-Stage project is developing two new visually interactive services for exploring information in the AusStage database. It is also creating the capability to generate a new data set of immediate, on-location responses from spectators of Australian performing arts.
Biosecurity: Biosecurity Collaboration Platform
This project has implemented a collaboration platform at the CSIRO’s Australia Animal Health Laboratory (AAHL) facility that comprises two nodes – one on each side of the bio-containment barrier. This will greatly assist in the flow of complex information across the containment barrier from a variety of data sources including pathology and microscopy systems, live in-vivo animal experimental data (e.g. heart rates) and data from simulation models and historical information in both visual and written form. This platform is expected to have broader applicability within the NCRIS Australian Biosecurity Information Network
PODD: Phenomics Ontology Driven Data Management
The Integrated Biological Sciences component of NCRIS contains two major Phenomics initiatives: the Australian Plant Phenomics Facility and the Australian Phenomics Network. These facilities have common requirements to gather and annotate data from both high and low throughput phenotyping devices. The PODD project is delivering a data management service that can handle multiple phenotyping platforms and data formats (text, image, video).
ANDS 2012-13 Business Plan Page 54 of 118
Project Project Description
Remote CT: Remote Computed Tomography Reconstruction, Simulation and Visualisation
The Remote CT project is developing a three-part service for 3D reconstruction and visualisation of Computed Tomography (CT) images. The service will be deployed at the Imaging and Medical Beamline at the Australian Synchrotron and the ANU micro-CT facility.
AusCover Workflow: Workflow Services to Enable a Large-Scale Temporal-Spatial Ecosystem Digital Information Service
AusCover is a component of the National Collaborative Research Infrastructure Strategy (NCRIS) Terrestrial Ecosystems Research Network capability. It focuses on organising remote sensing data sources and products for terrestrial ecosystems research. AusCover will enable – for the first time in Australia – the online storage of data sets in a form that makes them directly accessible to the user community. The AusCover Workflow project is providing easy-to-use workflow tools and services that enable researchers to process AusCover data sets using the ARCS Cloud Computing infrastructure. The same workflow tools also will allow AusCover data providers to more easily process raw satellite data to generate derived data products in the standard formats users require.
Other ANDS funded projects begun since the beginning of the ARDC project that have been or will be completed before the end of the planning period:
Project Institution Project Description
Metadata management for neutron beam instrument data - a joint project with Australian Synchrotron
ANSTO Data capture from instruments; metadata management (combined project with Australian Synchrotron)
Meta Data Capture and Storage for the Three Mature Beamlines at the Australian Synchrotron - a joint project with ANSTO.
Australian Synchrotron
Data capture from instruments; metadata management. Joint project with ANSTO.
Activities include: Research data policies; identifying, describing and publishing datasets; creating central research data repository & metadata store; changing researcher culture.
ATNF Pulsar Data Management Project
CSIRO The ANDS-CSIRO-ATNF Pulsar Data Management Project enables the discovery of, assessment of and access to Pulsar data observed at the Parkes Telescope.
Automated measurement of the responses of wildlife populations to climate change
Flinders University
This ANDS project has caused the university to consider practical aspects of research data storage regarding copyright, access rights and storage of public and restricted file-based research datasets. Although data management policy seems likely to remain unchanged, practical guidelines are likely to adapt as curators and researchers find themselves in new situations with respect to research data. This project has been a stimulus to
ANDS 2012-13 Business Plan Page 55 of 118
Project Institution Project Description
consider those issues.
The particular project has enhanced the effectiveness of the research data for appropriate analyses. Specifically it has delivered more efficient cleaning software for data files collected from lizard data loggers. The project has also delivered a solution for storage of the lizard data files. This will allow much more effective data sharing among the current group of researchers, and among others who follow in the project.
Finally the creation of a university wide metadata store and the automatic transfer of data to the national register (visible at Research Data Australia) will allow the existence of lizard and other research datasets to be discoverable globally and therefore benefit a wider research community.
Smart Water Griffith University
Water end use study, data capture from instruments, reporting on domestic water use.
A software system was built to collect data from remote smart water meters and integrate it into related data sets obtained from other sources and use metadata to describe the data used and the cross-relationships between data. The system will be used to build and integrate data sets to explore patterns of water use, the data sets, descriptions and mega-processes developed.
This project developed and delivered the Smart Meter Information Portal which stores and processes all total water consumption data downloaded from 250+ homes that are being metered and logged as part of the South east Queensland Residential End Use Study (SEQREUS).
Adult Stem Cell & Neurobiological Microscopy Instrumentation and Research Data Management
Griffith University
To develop a software system to centralise the management of a large volume of microscopy image and related experimental metadata, allowing researchers within the National Centre for Adult Stem Cell Research ("NCASCR") to more effectively organise and analyse their biological imaging experiments.
Objectives of the project are to capture metadata from images generated from microscopy instruments, Import images in a standard format, allow web based browsing and management of image collections capabilities, share imaging collections, allow annotation of images and image collections, searching of metadata and it will generate and send "published" image collection metadata to Griffith's Research Activity Hub
The Centre for La Trobe The major outcome from the researchers' point of view is
ANDS 2012-13 Business Plan Page 56 of 118
Project Institution Project Description
Materials and Surface Science’s Remote Laboratory Instrumentation (CMSS RLI) Metadata Capture and Publication
University that the system provides a facility to store their datasets, describe them, and easily share them online, with persistent identifiers so that they can be cited conveniently.
Secondly, the system has provided a platform for the CMSS to publish data acquired from several standard materials. These datasets are now available to other researchers for comparison with their own data on these materials, as well as others derived from them.
Finally, the system is able to expose these datasets to the wider world through the Research Data Australia portal.
Glycomics Repository Macquarie University
This project advanced the state of data capture and management within the glycomics discipline by installing new repositories, create an automated data and meta-data capture system at the Australian Proteome Analysis Facility (APAF), with an OAI-PMH feed will be created to allow Research Data Australia to harvest details of the collections stored within the GlycoSuiteDB repository.
Research Data Management of the Monash Weather & Climate Program (Climate and Weather)
Monash University
The Climate and Weather Data Capture Project addresses many issues facing Climate and Weather researchers in Monash University, supporting existing research and enabling Monash University Weather & Climate (MW&C) researchers to contribute to the work of evaluating the newly deployed Australian ACCESS climate model. Through ANDS funding and support the climate and weather research data has been transformed in the following ways:
Overall, the Climate and Weather Data Capture Solution provides the MW&C researchers with the ability to conduct more effective research based on exploiting existing data sets, creation and storage of new data sets and better collaborative research opportunities.
Biomedical Data Platform (Molecular Biology)
Monash University
The Biomedical Data Platform Data Capture solution, myTardis, is a multi-institutional collaborative venture that facilitates the archiving and sharing of data and metadata collected at major facilities such as the Australian Synchrotron and the Australian Nuclear Science and Technology Organisation (ANSTO) and within Monash University and other institutions.
Tools for curating and publishing research data in the form of media collections (Multimedia Collections & ARROW)
Monash University
The purpose of this project was to develop generic software tools to support the organisation of pre-existing digital data collections, particularly photographic data, into files and formats suitable for publication into institutional repositories and into other discovery services including Australian Research Data Commons (ARDC).
ANDS 2012-13 Business Plan Page 57 of 118
Project Institution Project Description
Overall, the solution provided by the Multimedia Collections and the Australian Research Repositories Online to the World (ARROW) project has enhanced the library photographic collection curation process. This in turn supports research in areas that benefit from access to these photographic collections. Through ANDS funding and support the digital photographic data collections held by the Monash Library can easily been transformed in various ways.
Capture and publication of Australian ecosystem data from a network of measurement sites (Ecosystem Measurements)
Monash University
The solution provided by the Ecosystem project has enhanced the research process and provided new research opportunities. Overall, the solution cuts down the time significantly before the researcher can begin their research work and facilitates much simpler sharing of data. Due to these benefits the Ecosystem solution has been very well received by the OzFlux community of researchers. This adoption is a testament that the solution funded by ANDS is of significant value to the researchers in the community. The next key challenge is to enhance the Ecosystem solution to facilitate Ecosystem research on a larger scale at the national level.
Capture and publication of data on the history of adoption (History of Adoption)
Monash University
The History of Adoption Data Capture solution has provided automation of the following steps in the existing data capture process:
- The process of capturing stories and associated metadata from the website submission form and storing them in a Data Management system.
- The process of generating a story web page with attached story transcript, metadata files in MODS and DC formats and search tags.
- The process of "publishing" a story to ARROW, the Monash Library public repository.
This project has addressed the problem of simplifying and speeding up a very complex and time-consuming data process. In addition to this as some of the data is of a sensitive nature the act of automating steps has increased security through the removal of potential human error. Technical outcomes developed by this project include processing data through a number of platforms and repositories including Confluence, Mediaflux and Monash’s ARROW repository. In particular the ‘Publish to ARROW’ functionality developed on Confluence has been built as a global menu option that may be leveraged in future to simplify publication of data from Confluence to ARROW for other projects. Lessons learnt in developing
ANDS 2012-13 Business Plan Page 58 of 118
Project Institution Project Description
this project are that it is important to work collaboratively with the researcher throughout the project to ensure the solution is relevant and useable. The solution developed for this project has been specifically tailored to allow for manual steps to apply appropriate ethical and legal censorship.
Data Publication to Interferome (MIMR/Interferome)
Monash University
The researchers at the Monash Institute of Medical Research (MIMR) use Agilent, a microarray platform, and large genomics data repositories like ArrayExpress and GEO. They require the ability to assimilate the research data from these complementary data sources into a single repository to facilitate comprehensive analysis.
The system currently in place at MIMR does not capture research data directly from researchers and it also has limited capabilities to query the research data. Therefore some of the key data issues faced by the MIMR researchers are access, sharing, publishing and reuse. The Interferome System addresses these issues, supporting existing research and enabling MIMR researchers to perform complete analysis and maximise the value of available research data.
Comprehensive Data Management for Microscopy Research Datasets
Monash University
This solution allows the Faculty of Medicine, Nursing and Health Sciences to generate and capture experimental information and raw data from optical microscopy instruments in a centralised location, rather than the current practise of capturing this on researchers’ labnotes and personal disks. This approach will establish a concept for the mandatory annotation of digital imaging data to maximise the use, re-use and distribution of experimental information within the scientific community.
Greenhouse Gas Emissions from Australian Soils
Queensland University of Technology
The N2O Project aimed to re-engineer an existing client application called Morpho used in the ecological research domain. The Morpho software is used by researchers as part of work associated with the national N2O Network research program. Soil emissions data is automatically collected at various sites around Australia using automated gas sampling systems.
A Java-based data management application called MetaMaster was developed to overcome these problems. MetaMaster facilitates the management of Scientific Data Packages within the ecological research space and assists with the creation of metadata records that conform to the RDA metadata standard (RIF-CS). These are ingested into the QUT Metadata Hub which has an OAI-PMH handler that allows ANDS to harvest the N2O metadata records and publish them to Research Data Australia (RDA).
ANDS 2012-13 Business Plan Page 59 of 118
Project Institution Project Description
In addition, the N2O and the other two data capture projects provided QUT with an opportunity to establish a system architecture and suitable workflows for publishing research data information to Research Data Australia. The system architecture and workflows are now existing components of infrastructure that other research groups can use in the future, and therefore support the broader development of research data management and the publishing of QUT research data.
Biodiversity Queensland University of Technology
The Bio-diversity Data Capture Project aimed to study the management and publishing of audio data collected from acoustic sensors for environmental health monitoring led by Professor Paul Roe’s research group at Queensland University of Technology (QUT). The researchers have developed a research application, the Environmental Health Sensor Monitoring (EHSM) system, which adopts acoustic sensing, database and web service technologies to capture audio data from different environments for monitoring species that have regular and predictable vocalisations.
To help researchers to publish the information that are collected using the EHSM system, a Java-based desktop application called “RDB2RDF Mapper” was developed through the Bio-diversity Data Capture Project. The application assists a researcher and data manager to create a mapping between the metadata stored within the relational database and a given schema or web ontology. The mapping can then be used to transform the relational database data into a form (e.g., Resource Description Framework, or RDF) which can then be easily ingested into QUT’s Metadata Hub and published to Research Data Australia.
In addition, the Bio-diversity and the other two data capture projects provided QUT with an opportunity to establish a system architecture and suitable workflows for publishing research data information to Research Data Australia. The system architecture and workflows are now existing components of infrastructure that other research groups can use in the future, and therefore support the broader development of research data management and the publishing of QUT research data.
B150 Big Jam Queensland University of Technology
The Q150 Data Capture Project aimed to study the management of metadata relating to multimedia data captured during Q150 Big Jam Live Music Festival. The multimedia data includes video, image and text that was recorded relating to the music, bands, artists and other
ANDS 2012-13 Business Plan Page 60 of 118
Project Institution Project Description
background information during the festival.
A Java-based desktop software application, “Media Crawler”, was developed for this purpose through the data capture project. The application provides a view onto the file-system containing the multimedia assets, and which allows the user to annotate, search and group the recorded assets. Metadata is automatically extracted from various file formats and other metadata can be imported as well, for example, from CSV files. The software is meant to facilitate this process so that views, or “collections”, may be defined and exported with the associated metadata. Collection metadata may also be exported as XML.
In addition, the Q150 and the other two data capture projects provided QUT with an opportunity to establish a system architecture and suitable workflows for publishing research data information to Research Data Australia. The system architecture and workflows are now existing components of infrastructure that other research groups can use in the future, and therefore support the broader development of research data management and the publishing of QUT research data.
Data Capture from High Performance Computing Multi-User Environments
RMIT University
RMIT is currently a very large user of state and national high-performance computing (HPC) facilities. This project will develop and deploy software tools and applications which will be deployed for the NCI Supercomputer National Facility, the VPAC Supercomputer Facility and the RMIT HPC Facility.
Automated capture and publishing of data generated on high throughput plant phenomic platforms.
University of Adelaide
Extract and curate data generated by the Plant Accelerator's LemnaTec Smarthouse platforms.
Publish the data to a navigable public repository, and metadata to Research Data Australia.
Enhanced Metadata Capture for Sustainable Management, Sharing and Re-use of APN Histopathology Research Data
University of Melbourne
This project helped establish a national database of mouse pathology to enhance the utilisation of mouse models of disease by Australian researchers. It enhanced metadata capture facilities for the Histopathology and Organ Pathology Service based at the Department of Anatomy and Cell Biology, The University of Melbourne as part of facilitating the sharing and re-use of mouse pathology research data both now and into the future. The project addressed current metadata scalability and sustainability issues associated with the service in order for the Melbourne Histopathology Service to participate in and contribute to emerging research data networks like PODD and ANDS.
ANDS 2012-13 Business Plan Page 61 of 118
Project Institution Project Description
Melbourne Neuropsychiatry Centre (MNC) Bioinformatics Development Project
University of Melbourne
The MNC has one of the largest databases of brain scans and associated neuropsychiatric research data in the world. It has National and International collaborators using and contributing to the database. Broadly ANDS-related activities will involve:
- Building a workflow for automatic documentation of dataset segments used in individual studies and publications. This will include researchers, datasets, associated projects and publications.
- Building a workflow for automating creation of citable persistent identifiers for unique studies and linking with publications.
- Building software to automate capture of public facing metadata to University of Melbourne Registry which will deliver collections metadata to the ARDC.
- MNC has 270+ publications resulting from datasets stored in the MNC database. Completing the work above will result in ~ 100 dataset descriptions described in the ARDC by August 2011, with an expected 25+ being registered each year after that.
Youth Research Centre’s Life Patterns Project: Longitudinal qualitative and quantitative survey data capture and reuse
University of Melbourne
The Youth Research Centre's Life Patterns Research Program maintains an extensive qualitative and quantitative data base on a cohort of 2000 young Australians who left secondary school in 1991 and of a second cohort of 3000 who left school in 2005.
The project resulted in the implementation of a local system and related workflows and practices to describe the collection. Metadata about the Life Patterns Research Program is now exported to the University Research Data Registry and harvested to Research Data Australia.
It is anticipated that nationally and internationally there will be researchers who will better access the Life Patterns program and contact the research program to work in collaboration with the Life Patterns team. It also means that the public face of the program can be maintained and updated to remain current and therefore relevant. It is also expected that as a result of this project, policy-makers, media and other institutions and social actors interested on youth research and data will have a better access to the Life Patterns program.
Video data in the Social Sciences. Optimising Metadata Capture, Data Sharing
University of Melbourne
The University of Melbourne has an especially rich humanities and social science research community that utilises video as its primary form of data capture. The increasing use of video as a research tool poses particular
ANDS 2012-13 Business Plan Page 62 of 118
Project Institution Project Description
Procedures and Long-term Reuse
challenges for aggregated data storage initiatives. This project will integrate metadata capture facilities at selected sites within the University of Melbourne as part of facilitating sharing and re-use. The project will address current metadata issues associated with large-scale audio-visual repositories and workflows to enable efficient generation of metadata, ensuring that stored video data is accessible and searchable through the ARDC.
Federated Neuroimaging Collections in the National Data Commons
University of Melbourne
DaRIS is a raw data management system based on the Mediaflux digital asset management platform and has been in operation for the last 3 years at the Neuroimaging Computational and Data Management Facility (CDMF). There it has been used to routinely receive MR images from researchers and organise them into a subject-centric data model, ready for access by project members. It hosts over 70 mouse and human projects, each with many tens of subjects and some with time-dependent data.
Humanities and Social Science Research Data at the University of Melbourne
University of Melbourne
The University of Melbourne has one of the most rich and diverse humanities and social science (HASS) research communities in Australia and is well ranked internationally. HASS researchers at Melbourne generate and hold valuable data sets and associated materials that are currently not easily discoverable, accessible or configured for further research purposes. This project will build infrastructure (tools and services) to connect this diverse community with the UoM Registry (Vitro) which will in turn communicate the relevant metadata to the ARDC.
Capture of Complex Data to Support Clinical Research in Cardiovascular and Neurological Medicine
University of Melbourne
Complex physiological data is routinely collected on patients as part of clinical care (echocardiography, intravascular ultrasound, x-ray angiography, optical computerised tomography, patient clinical data, etc.). However, this rich multi-model data is not usually subjected to subsequent analysis nor is it made available to researchers from other disciplines for novel analysis. Making this multi-model data available along with patient outcomes such as morbidities will provide the opportunity for collaborative groups to employ novel strategies to developed assessments and models based on this data. This project will form necessary base of making multi-model data collections available, enabling the establishment of new links between biomedical research groups in engineering, physics and bioinformatics. This project will occur in collaboration with BioGrid Australia where it will use the access, de-identification and privacy protection protocols already established there.
ANDS 2012-13 Business Plan Page 63 of 118
Project Institution Project Description
Founders and Survivors Project
University of Melbourne
The Founders and Survivors Project (http://www.foundersandsurvivors.org/ ) has brought together a number of research data sets created from records relating to the 73,000 convicts transported to Tasmania in the 19th century and their descendants to create a population database of national and international significance for historical, demographic and population health researchers.
This project has:
- Developed a toolkit based around the projects XML/TEI workflow for further relevant records sets to be systematically ingested into the population database,
- Built the infrastructure to enable persistent identification and descriptions of derived data sets produced on request from the population database to be made available to the ARDC
An international antibiotic‐resistance gene cassette database
University of NSW
The project made technologies that allow us to archive relevant elements and to identify them in bacterial DNA sequences. It has also built a knowledge repository of antibiotic resistance gene cassettes available to the wider research community and to allow the community to contribute new entries to the repository as these are found.
Providing this application should enhance the team’s global reputation as leading researchers of antibiotics resistance and pioneers in the analysis of larger-than-gene structures. The application makes this research accessible to many research teams that don’t have the necessary computer skills to otherwise use it, thus increasing our impact on the scientific community.
Managing and Sharing Genomic Data
University of NSW
A new generation of DNA sequencers has recently been installed at UNSW, Southern Cross University and the Australian National University. These instruments can generate DNA sequence data 1000x faster than old technology, and can sequence the genomes of small organisms in a week. This project established databases for this DNA sequence information, so that users of the DNA sequencers can access their information in an efficient way. This will centralise DNA sequence data from these facilities, enable collaborative projects and facilitate data sharing. On publishing, research data and associated metadata can be made available for public use, contributing to the Australian Research Data Commons (ARDC).
ANDS 2012-13 Business Plan Page 64 of 118
Project Institution Project Description
Spatially Integrated Social Science
University of Queensland
The Urban and Regional Analysis Research Program (URARP) in the Institute for Social Science Research (ISSR) is developing a repository for capture, integration and sharing of diverse socio-spatial data sets.
Aquatic Species Tracking Repository
University of Queensland
This project is collecting data captured by the ECO-Lab at the University of Queensland (Prof Craig Franklin and Dr Hamish Campbell) - who are using arrays of underwater, acoustic receivers to track aquatic animals (crocodiles, turtles, rays and sharks) within river systems over large temporal (3-10 years) and spatial scales (100's km). The data is being collected to improve our understanding of the ecology and habitats of these species and to provide information to aid in their conservation and management.
The Health-e-Reef Project
University of Queensland
This project is developing the data capture and sharing services for coral-reef related data being generated by researchers at the University of Qld Centre for Marine Studies, together with their collaborators in community/volunteer groups (CoralWatch, ReefCheck) and government organizations (EPA, DERM). Together these researchers are monitoring and studying the impact of climate change and human activities on coral reef ecosystems. There will be particular focus on automatically capturing the metadata necessary to support discovery, decision, and reuse. The discovery metadata will be made available through RDA.
Development And Testing Of A Data Capture Tool For Social Datasets Being Used For Record Linkage
University of South Australia
The Ian Wark Research Institute (IWRI) is a major node for both the Characterisation and Fabrication initiatives, funded by NCRIS with supplementary EIF support. Both initiatives feature unique (in Australia) flagship instrumentation together with IWRI's existing equipment infrastructure, which generate data of a variety of types and sizes. The project is developing a MetaData Capture Tool, a semi-automatic tool to facilitate capture of both data and metadata from the IWRI TRIFT V and Mastersizer 2000 instruments.
Clarke eHealth (Early Activity): Capture, management, re-use and discovery of breast cancer microscopy virtual images
University of Sydney
This project constructed software to allow wide use and re-use of microscopy images in breast cancer research, generated at the Westmead Institute for Cancer Research (WICR), in analysis and study by approved researchers across Australia, with collection descriptions available through RDA.
AgDataCapt: Capturing Agricultural Data
University of Sydney
Standards-based body of data and metadata will be coordinated and include data from sensors on soil, water and rain, greenhouse gas and carbon data, and weather. Generic and extensible tools will be developed to
ANDS 2012-13 Business Plan Page 65 of 118
Project Institution Project Description
integrate appropriate standards across the areas of data capture. The areas of data capture are: Soil moisture, soil and air temperature, radiation, 3D wind speed and directions, CO2, water vapour, weather and tree water use data, greenhouse gas emissions from soils; GPS-referenced data on wheat and barley crop inputs and performance; Soil data from sensing system for monitoring of agro-ecosystems for sustainable landscape; Soil data and spatial prediction functions for soil variables; and Data from farm-based private rain gauges.
AMMRF Live Cell Microscope Data Capture
University of Sydney
Experiments conducted using high-end microscopy instruments currently record information in multiple dimensions, such as 3D and 4D, often with multiple detectors. Research data collections are therefore increasing in size as microscopy techniques evolve, often resulting in extremely large images that are difficult to manage using current infrastructure and systems. Images also often require post-acquisition processing steps where the research data moves between analytical platforms, which can be situated in different sites.
The University of Sydney DC2E Microscopy data capture project will complement the NeAT Australian Microscopy and Microanalysis Research Facility (AMMRF) PfC project by catering for additional light microscopy instruments, extending the scope for the repository that the current project is developing.
Metadata Store/Aggregator
University of Sydney
An institution-wide metadata store will be implemented in collaboration with University of Sydney ICT. This store will aggregate information from suitable University enterprise systems, have applicability across many areas of data capture and complement the Sydney “Seeding the Commons” project.
Data Capture of state-wide hydrological datasets
University of Tasmania
The aim of the project is to capture and publish data from the CSIRO and Forestry Tasmania sensor webs. The data will be exposed by two sensor observation services (SOS) serving twenty two sensors, and a THREDDS server providing forecast data.
Biomechanics Data Capture Project System
University of
Wollongong
This project aims to; • Create metadata entries associated with datasets. The tagging of metadata will be a combination of automatically from the devices and manually entered by the user. Some may be able to be captured from the instruments and they rest will be manually entered. If it proves more effort to capture automatically then all metadata may be manually entered. • Provide a central storage repository where metadata
ANDS 2012-13 Business Plan Page 66 of 118
Project Institution Project Description
coupled with raw datasets can be stored. This repository is not related to Research Online. All data coupled with metadata will be stored in a database. • Provide a mechanism whereby collections of data can be defined. • Provide a mechanism whereby metadata can be harvested and automatically transported to the ARDC making data collections discoverable within and outside of UOW. To achieve this there will be an automatic OAI-PMH feed from the system to ARDC. • Provide researchers with an interface to search for and download datasets associated with their research.
These projects have not only delivered on their goals but have also helped ANDS to offer advice and
refinements on how better to run other projects. ANDS is actively examining the outputs of these
projects to understand their potential for re-use, as part of a more coherent approach to research
data management and infrastructure, and for general dissemination.
4.4.4 Agreed 2012-13 Activities
The Data Capture projects are designed to build infrastructure on at least one of three levels,
although most work across these levels, as indicated in the table below. These levels are:
Instruments: building infrastructure “pipes” between instruments and well supported data and metadata storage facilities.
Software: to create these “pipes” and other infrastructure to enable better management and descriptions of research data and associated metadata.
Management: use of this infrastructure to better manage and describe research data and associated metadata to enable it to be connected, and feeding the relevant records into the ARDC to facilitate discovery of the data and then re-used.
Work has now commenced on the following Data Capture projects. These projects are all intended to
be completed during the current planning period.
Institution Project Description Instruments Software Management
Australian
National
University
Earth Sciences ✔ ✔ ✔
Optical Astronomy (Skymapper) ✔ ✔ ✔
Phenomics ✔ ✔ ✔
Humanities and allied disciplines ✔ ✔
ANDS 2012-13 Business Plan Page 67 of 118
Institution Project Description Instruments Software Management
CSIRO
Sensor Data management ✔ ✔ ✔
Curtin
University of
Technology
Curtin deployment and configuration of
Institutional Metadata Repository and
Research Data Portal
✔ ✔
Deakin
University
Filtration Membrane Fouling Data
Collection for Water Treatment
Research
✔ ✔
Crystal Orientation Data Collection for
Conversion to a General Data Type ✔ ✔ ✔
James Cook
University
Tropical Data Hub ✔ ✔ ✔
Macquarie
University
Glycomics Repository ✔ ✔ ✔
Papyri Data Capture ✔ ✔
University of
Adelaide
Genomics Data Capture ✔ ✔ ✔
University of
New South
Wales
Australian and
New Zealand Neonatal Network
(ANZNN) Neonatal Data Capture Portal
✔ ✔
Data capture and integration across
multiple platforms (includes Integrated
Health Data Network)
✔ ✔
University of
Newcastle
Data Capture for the Data Commons ✔ ✔
University of
Queensland
Microscopy/Microanalysis Image and
Data Repository ✔ ✔ ✔
The Diffraction Image Experiment
Repository (DIMER) ✔ ✔ ✔
ANDS 2012-13 Business Plan Page 68 of 118
Institution Project Description Instruments Software Management
3D Anthropological and Archaeological
Collection Repository ✔ ✔ ✔
Linking the EMBL Australia EBI Mirror
with the Australian Research Data
Commons ($1m)
✔ ✔
University of
Sydney
Square Kilometre Array Molonglo
Prototype (SKAMP) Data Capture:
astronomy
✔ ✔ ✔
NSW TARDIS Node ✔ ✔ ✔
FieldHelper: a workflow and tools for
improving fieldwork data collection and
submission to institutional repositories
✔ ✔ ✔
University of
Tasmania
Redmap Australia ✔ ✔ ✔
University of
Technology,
Sydney
Maximising the Benefit from Data-
Intensive Processes at UTS ✔ ✔ ✔
University of
Western
Australia
Deployment and configuration of
Institutional Metadata Repository
✔ ✔
Integrated Data Capture for
Characterization and Analysis ✔ ✔ ✔
Archaeological Rock Art Data Capture ✔ ✔
Marine Ecology Video Capture and
Storage
✔ ✔
University of
Western
Sydney
Climate Change and Energy Research
Facilities (CCERF) ✔ ✔ ✔
University of
Wollongong
Satellite data capture ✔ ✔ ✔
The projects all commence by providing a small number of hand-crafted feeds of Collections with
associated Activities and Services. These feeds are used to inform the process of specifying the
software that is required to automate the creation of this metadata, as well as to allow feedback on
ANDS 2012-13 Business Plan Page 69 of 118
the metadata to be produced. Over the remainder of the project, the software will be specified, built,
tested and documented.
In order to capitalise on the work already funded ANDS intends to further develop the product
catalogue established in the current year, and to assist in the redeployment and reuse of the
software developed at all institutions.
4.4.5 Process for Other Activities
All funds allocated for the Data Capture program have now been allocated. Remaining funds will be
used to ensure the successful completion of the existing projects, and to assist in the redeployment
and re-use of the software developed.
4.4.6 Highlights
The main highlight of the previous year has been the completion of a large number of projects. This
has resulted in the delivery of a wide variety of software from across most research areas, and in the
description and dissemination of many hundreds of research data collections. These projects have
delivered infrastructure that supports the ANDS transformation goals, as well as offering substantial
material to the product catalogue that ANDS will further develop and enhance in 2012-13.
4.4.7 Challenges
The Data Capture program has had to deal with many of the same challenges that have been
experienced in other ANDS Programs: starting the engagement, agreeing on proposals, and finalising
contracts. Getting the right level of engagement with institutions in order to start having the
conversations about what is possible has often been difficult. Once the conversation has started,
getting to agreed proposals has also often required many iterations. Finally, there have been delays
in getting to finalised contracts once agreement has been reached on what each proposal involves.
ANDS has identified these issues and put in place improved processes to reduce their occurrence and
impact. Partners have also found that finding suitable staff has also presented challenges.
Assessing and monitoring the number of projects in a way that is both auditable and yet does not
present a roadblock for the partners is an issue, although considerable work has been done on the
processes to refine this.
In addition, work on these projects has often been slower than expected, for a number of reasons,
including difficulty in obtaining sufficient researcher engagement, staff recruitment and turnover,
partners not understanding ANDS’ requirements and the complications that come from software
development. ANDS has worked closely with partners to try and keep projects on track, and we are
confident that these issues can be overcome, and the projects will be finished by the ANDS
completion date.
More recently the NeCTAR funding rounds have distracted many of our partners from their ANDS
projects. In a number of institutions the same staff who have been working on ANDS projects were
asked to participate in the NeCTAR applications, which was given a higher priority. This has delayed
some ANDS projects as a result. ANDS has also caused a similar problem for itself with the Metadata
ANDS 2012-13 Business Plan Page 70 of 118
Stores projects, as these were also handed to the same people doing the work on existing projects.
The 2012 Excellence in Research Australia (ERA) exercise is expected to have a similar impact at some
institutions. It is clear that resources to work on eResearch in general remain low across the sector,
with a consequent churn of available staff.
4.4.8 End of 2012-13 Outcomes
By the end of 2012-13 ANDS will have:
completed the institutional activity funded under the Data Capture program which will continue to build on the work undertaken previously to produce a greatly increased number of collections in the ARDC, as well as more data with associated rich metadata under management;
made available collections that will span a wide disciplinary range and contribute towards the ANDS vision;
used the Data Capture infrastructure developed in the NeAT projects to contribute significantly to the amount of data made available for re-use both within the disciplines and across other disciplines; and
made the infrastructure resulting from both institutional and NeAT activity available for adaption and redeployment through the product catalogue.
4.4.9 End of 2012-13 Enduring Changes
By the end of 2012-13 ANDS will have produced the following enduring changes:
A national resource of managed, connected, findable and re-usable research data now exists that previously did not.
More data are being automatically captured from a wide range of instruments, ranging from small sensors to major national facilities.
Data providers are increasingly seeing the value in providing their data through richer shared environments.
Data are now visible to researchers that were never visible before.
A cohort of leading Australian researchers has demonstrated the value of a richer data environment.
Australian researchers can publish and cite their research data outputs.
Larger questions and grander challenges can be addressed by a data commons that spans facilities, institutions, disciplines and sectors.
New types of research are now possible that previously were not.
Data can be captured, managed, and shared more easily at most research organisations in Australia.
There is more cost-effective reuse of research data software in Australia.
Research Data Australia reduces duplication of effort and cost of unnecessary data acquisition by facilitating re-use.
ANDS 2012-13 Business Plan Page 71 of 118
Common approaches, standards, and services have increased efficiency and coherence across the national data infrastructure investment.
Established data management infrastructure now allows for stronger national policy and easier institutional compliance.
Australia is seen to be a leader in research data infrastructure internationally.
4.5 Metadata Stores (EIF)
4.5.1 Program Aims
The Metadata Stores program aims to assist institutions and disciplines to better manage the
collection and object level metadata associated with research data outputs and associated entities.
4.5.2 Program Overview
Information that can be held about data (often called metadata) can be grouped into four categories.
The first is information for discovery, and is primarily held at the level of a collection. This consists of
the range of pieces of information that will assist in the discovery of the collection. The second is
information for determination of value (also primarily at collection level). This includes information
such as the name of the researcher, institution or funding program that might help a potential user
to decide whether they want to access the data. The third is information for access that might be a
direct link to the data objects (stored elsewhere, such as on national and institutional data stores),
both at collection and possibly object level, or contact details for where to source the data. The
fourth kind of information is information for re-use, and will include things like reading scales, field
names, variables, calibration settings that are needed in order to effectively re-use the data. This will
mostly be at object level.
In practice, the distinction between data and metadata can be somewhat arbitrary and depends on
the system that is being used to manage the data. If this system is files-oriented, then the metadata
will almost always be separately managed in some sort of associated system. If data management
system is database-oriented, then much of the metadata will either be attributes of rows and
columns for the database tables.
ANDS is concerned with information about data collections and data objects, but importantly also
with information about associated entities. These include parties (both people and organisations),
activities (that produce the data) and services (associated with the capture of, and access to, the
data). These associated entities serve as part of the rich context for the data collections, and also
contribute to the information for discovery and information for determination of value. This rich
context will come from existing institutional systems via software infrastructure that might be
thought of as pipes along which the contextual information flows. There also pipes between
metadata stores and data stores, and between metadata stores and the ARDC Core infrastructure.
So, software that is being developed or deployed by the Metadata Stores program needs to support
a range of functions for different kinds of objects. The first is the creation and management of these
kinds of information, or their harvesting from other sources (research management systems, human
ANDS 2012-13 Business Plan Page 72 of 118
resources systems, finance systems). In addition, the software needs to manage the relationships
between the information about data collections/objects and the data collections/objects themselves.
The software may need to support queries over the data by users within the institution. Finally, the
software needs to be harvestable by ANDS services, as well as by other organisations. This program
will therefore need to develop, configure and make available this metadata infrastructure at research
producing institutions.
The required functions can be provided in a wide variety of ways, and via different configurations of
software components. In practice, a small number of design patterns are appearing, in part because
of the ways in which ANDS has been funding activity at institutions. The current situation contains
four kinds of extant stores:
Combined Stores: manage both Collections and Object Metadata for a single institution across a range of disciplines.
Collection Stores: manage the information about data collections within an institution; may also accept feeds from enterprise systems (some of which ANDS has funded), and also feed the ANDS Data Collections Registry.
Instrument Stores: tightly coupled to particular instruments or clusters of instruments. A significant number of these have been developed, not with Metadata Stores funding, but with Data Capture funding. These solutions either feed the ANDS Collections Registry directly (the commonest pattern), or via an institutional Collections store (much less common).
For some disciplines, there are well-established international practices for managing data and metadata, as well as associated software. These Discipline Store solutions might be deployed within institutions or at national or international data centres. ANDS might fund some pipes from instances of these to institutional Collection stores.
In addition to these, ANDS sees the potential need in some settings for an institutional Object Store.
This would hold metadata crucial to enable reuse of the data. It would meet the needs of an
institution to manage object-level metadata for researchers whose needs are not met either by a
Discipline Store or Instrument Store. This solution (and the pipes connecting it to data stores and the
institutional Collection Store) is not currently part of the ANDS offering (nor deployed in any
institutions). The final solution would need to manage metadata that is common to a range of
disciplines, as well as specific metadata from particular disciplines.
As well as these different kinds of metadata stores, the data itself needs to be stored somewhere.
This might be a local store (either just associated with an instrument or institutionally supported),
one of the offerings that might be made available through RDSI, or an international discipline-
focussed data store (such as the Protein Data Bank (PDB) or European Molecular Biology
Laboratory/European Bioinformatics Institute (EMBL/EBI)).
Based on our existing engagements with ANDS partners, the most common implementation pattern
at an institutional level is a Collection Store to support Seeding the Commons funded projects,
combined with one or more Instrument Stores associated with Data Capture funded projects. In this
pattern, the Collections descriptions for the Data Capture data are usually fed directly to ANDS rather
than via the Collection Store.
ANDS 2012-13 Business Plan Page 73 of 118
4.5.3 Infrastructure Created To Date
By the commencement of this Business Plan, ANDS will have funded the development of the
following infrastructure components.
ANDS has funded three Combined Stores solutions. The CSIRO Data Management System has
completed its initial development and now meets the needs of astronomy and water data. CSIRO is
extending and progressively generalising it to the entire range of research areas within CSIRO. At this
point it will fully meet the criteria for an institutional combined store. The SQUIRREL system has been
built by Monash University to meet its institutional metadata store needs, managing both object and
collection metadata. This development of this system was co-ordinated with the combined metadata
store activities at the Australian Synchrotron and ANSTO, and was built on the same codebase.
ANDS has also funded two Collection Stores from the Metadata Stores program. The VITRO solution
was developed by Melbourne University (based on the VITRO software and ontology from Cornell)
and was deployed by QUT and Griffith University as the basis for their Research Metadata Store Hub.
The ReDBOX solution is being developed by QCIF and was initially deployed by the University of
Newcastle. Both solutions are being used to support Seeding the Commons and Data Capture funded
projects at those universities.
ANDS is funding nearly seventy Data Capture projects. Many of these are in institutions that have no
suitable local metadata solutions and so approximately 35 are creating their own Instrument Stores
to capture object metadata and provide collections descriptions to ANDS (and of these 35, 25 will be
unique).
Although this seems like a wide range of solutions, they have each been designed to be appropriate
to their particular setting. In addition, they are all interoperable through their use of the RIF-CS
interchange metadata format.
In late 2011, in a variation on the EOI approach used in Data Capture, ANDS offered the uncommitted
Metadata Stores funding to 22 institutions to improve their existing metadata store infrastructure.
Particular features of this program included:
More research intensive institutions were invited to participate, and not all institutions were invited to participate.
All institutions who were offered funding were offered the same amount (on the assumption that the costs to deploy were likely to be roughly equivalent for each institution).
Institutions that had already received Metadata Stores funding received an amount discounted by the previous funding – in one case this meant no funding went to one of the Data Capture institutions.
Funding could be used to enhance existing solutions, deploy new solutions, or provide improved connections to institutional context, to ensure that institutions were not penalised if they already put a store in place, and to enable the most suitable solution for each individual institution.
Institutions were expected to demonstrate a whole-of-institution commitment to their metadata infrastructure.
ANDS 2012-13 Business Plan Page 74 of 118
Institutions needed to install a metadata store solution that was integrated with other data infrastructure and provided at least a feed to RDA.
This program is focussing heavily on redeployment of existing solutions rather than the development
of new ones. Given the emphasis on existing solutions, ANDS is also encouraging the development of
communities of practice to provide ongoing support and share/align development plans. It is hoped
that this provides the maximum return on the investment across the sector.
4.5.4 Agreed 2012-13 Activities
The agreed 2012-13 Activities will primarily be the projects to deploy and configure (or in some cases
extend) metadata stores infrastructure at 22 institutions. These institutions are CSIRO, the University
of Sydney, the University of Melbourne, the University of Queensland, the University of New South
Wales, the Australian National University, the University of Adelaide, the University of Western
Australia, the University of Wollongong, Queensland University of Technology, Macquarie University,
Griffith University, Curtin University of Technology, RMIT University, the University of South
Australia, the University of Newcastle, the University of Technology, Sydney, La Trobe University,
Deakin University, the University of Western Sydney, the University of Tasmania, Flinders University
and James Cook University. All of these institutions have now agreed on a program of work which
has been contracted by the beginning of the planning period.
In addition to these activities, the funded activity through QCIF to support the maintenance of the
ReDBoX solution will complete early in 2012-13.
4.5.5 Process for Other 2012-13 Activities
As the remaining funds allocated to the Metadata Stores program are now fully committed, there
will be no need for any process for allocation of funds. ANDS will be providing support for any
institutions that need it in the form of expertise and effort.
4.5.6 Highlights
The funding offer described above was received extremely enthusiastically; a consistent message
coming from the institutions was that this funding program came at a very opportune time. The last
three years of ANDS’ engagement with the sector has produced a greatly improved level of maturity
of thinking, and institutions are now ready to engage with a whole-of-institution approach in a way
that wasn’t a possibility even a year ago.
This new funding round is being seen as a way of both complementing existing Seeding the
Commons and Data Capture activity, and of supporting institutional transformation in management
of research data outputs. A tangible demonstration of this increased institutional commitment is the
level of in-kind activity being proposed. Across the projects that have been contracted, the mean in-
kind is 43% of the funding provided by ANDS, and the standard deviation is 34%. The largest in-kind
contribution is 115% (RMIT) and the smallest is 0% (UWS).
Another pleasing aspect of this funding was the level of interest in taking up one of the existing ANDS
funded solutions. At least six universities will be installing/extending ReDBoX, four will be using VIVO
ANDS 2012-13 Business Plan Page 75 of 118
(this includes Griffith and QUT, who we funded in 2010-11 and who had already chosen VIVO), and
one is using the ORCA software. The remainder will be making their decisions early on in the life of
the projects, and are actively considering our existing offerings. ANDS part sponsored community
days in February 2012 (for VIVO) and March 2012 (ReDBoX) to bring together institutions adopting
the same software to build communities of practice, share experiences, and collaborate on
extensions.
4.5.7 Challenges
The main challenge for the Metadata Stores program during the previous planning cycle was
determining the best way to support institutional activity in a co-ordinated way. The funding
program that was implemented late in 2011 was the result of a significant amount of work by ANDS
staff and the Steering Committee to identify the best approach.
The main challenge in the current cycle will be the effective completion of all the projects during the
remaining time left to the project. ANDS is well positioned to ensure that this occurs, as long as we
keep a careful track on the work. Additionally, the integration of the projects in this area with ANDS’
overall coherent approach to research data management and infrastructure will also need to be
undertaken carefully.
An ongoing challenge is how best to support ongoing development and deployments of the software
solutions that have been created now that this program has committed all its funds.
4.5.8 End of 2012-13 Outcomes
By the end of 2012-13, the following outcomes will be achieved:
The publicly funded research institutions in Australia who are responsible for nearly 90% of research output (as measured by publications under the Higher Education Research Data Collection (HERDC)) will have in place an institutional metadata store solution for research data outputs.
This solution will be integrated into institutional sources of truth and will be providing feeds to Research Data Australia and other registries as appropriate.
4.5.9 End of 2012-13 Enduring Changes
By the end of 2012-13 ANDS will have produced the following enduring changes:
A national resource of managed, connected, findable and re-usable research data will exist that previously did not.
Data are now visible to researchers that were never visible before.
Data can be captured, managed, and shared more easily at most research organisations in Australia.
Research organisations and public sector agencies now provide policy support and invest more in data management.
ANDS 2012-13 Business Plan Page 76 of 118
Australian research institutions can more easily comply with the requirements of the Code for the Responsible Conduct of Research
There is more cost-effective reuse of research data software in Australia.
Australia has improved ability to measure the quality and impact of all of its research outputs.
Research Data Australia reduces duplication of effort and cost of unnecessary data acquisition by facilitating re-use.
Common approaches, standards, and services have increased efficiency and coherence across the national data infrastructure investment.
Established data management infrastructure now allows for stronger national policy and easier institutional compliance.
Australia is seen to be a leader in research data infrastructure internationally.
4.6 Public Sector Data (EIF)
4.6.1 Program Aims
To develop the infrastructure necessary to ensure the establishment of automated feeds of rich
collection level information from Federal, State and Territory government and non-government
agencies into the ARDC and thus the Research Data Australia.
To engage with government agencies and provide them with funding or assistance to ensure that key
data holdings are identified and the automation of data feeds into the ARDC are established.
4.6.2 Program Overview
Many areas of research are heavily dependent on government data – from cadastral data to
economic data to government-organised surveys – or could increase their use of such data if it were
more widely discoverable and accessible. The responsibilities inherent in data custody are a shared
challenge and include the need to address preservation, access and description. As such there is a
very close potential relationship between ANDS’ concerns and those government agencies that are
custodians of data or that are influential in data policy.
The Public Sector Data program will provide the infrastructure necessary to ensure that feeds of data
collection descriptions are made available from a range of public sector agencies. Identified agencies
include producers of research data, such as the Bureau of Meteorology (BOM), the Australian Bureau
of Statistics (ABS), GeoScience Australia (GA), the Australian Antarctic Division (AAD), CSIRO and
Departments of Primary Industry (DPI). Owners of data gathering activities and collections which
might be possible inputs to other research activities, such as the museum and library sectors, are also
in scope. ANDS also needs to maintain and develop stronger relationships with other organisations
with significant data holdings or interest in these areas such as the National Archives Australia (NAA)
and the Australian Government Information Management Office (AGIMO), for example. Finally,
ANDS will investigate the identification and exposure of collections of national significance held by
public sector agencies.
ANDS 2012-13 Business Plan Page 77 of 118
The key deliverable from this program is to make existing public sector data resources more
discoverable to the research community and to work with federal, state and territory government
agencies to improve access to data. Activities will vary across agencies according to their existing
infrastructure and the types of data being made available. In all cases there will be a strong
preference to have data services exposed using relevant international standards.
The Public Sector Data program was originally allocated a $10m budget in the ARDC Draft Project
Plan. During the review phases for ANDS this budget has been reduced to $6.66m. This was as a
result of discussions with key government agencies in the first quarter of 2010 where they identified
that their desire was for capability assisted infrastructure development from ANDS in preference to
the provision of funding to undertake the infrastructure development themselves.
4.6.3 Infrastructure created to date
ANDS has either entered into contracts, or has substantially agreed on project descriptions, for the
following Public Sector Data projects:
ANDS 2012-13 Business Plan Page 78 of 118
Agency or
Institution or
Project
Project Status and Description
AuScope SISS
Deployment
Deployment of the Spatial Information Services Stack (SISS) into 10
government and state or territory agency data providers including CSIRO,
Geoscience Australia, BoM, ODSM and DPI(Vic) and to release 73 spatial
information data collections into the Australian Research Data Commons.
A relationship has been established with VeRSI as a Victorian node to
provide support to Victorian deployments of SISS and work is underway on
the identification and cpture of datasets from the Bureau of Meteorology.
CSIRO Water
Resources
Observation
Network
Completed. 5 significant collections of data from the Murray Darling Basin
Sustainable Yields projects have been exposed via Research Data Australia.
Eight open source technology tools associated with the project were
released for re-use, including: one to enable the ingest of data and
metadata into the CSIRO Data Management System, a RIF-CS harvesting
tool, a software library that provides programmatic access to the ANDS
Persistent Identifier (PID) Service, NetCDF conversion tools, establishment
of data storage and serving capability and a data and metadata
management infrastructure.
Additionally a web interface to provide data consumer access to data held
within CSIRO was established and will be implemented for access to other
data from the organisation. Work from this and other ANDS projects has
been leveraged into the establishment of an enterprise wide data
management and delivery system and service.
AODN Data from
National Research
Vessels
This project has delivered 900 core data sets captured from research
vessels (Southern Surveyor and Aurora Australia) published via the
Australian Ocean Data Network portal and Research Data Australia (RDA)
with further feeds of other data from the AODN portal under review.
These data collections will expose the data of 6 commonwealth agencies
with primary responsibility for marine data. Subsequent to the completion
of the project a further 9000 datasets have been made available.
ANDS 2012-13 Business Plan Page 79 of 118
Agency or
Institution or
Project
Project Status and Description
Australian Legal
Information Institute
(AustLII)
The original commitment has been met with over 400 databases and
collections from courts tribunals and government agencies exposed via
RDA. These data sets are principally the ‘raw materials’ of most legal
research: legislation of all forms; Court and Tribunal decisions; treaties;
official materials interpreting legislation; and reports proposing law
reform. Exposure to AustLII databases provides the only free access,
comprehensive national view of legal data of this nature. Additionally
exposure has been given to LawCite, a completely automatically-
generated case citation service.
An extension has been granted to December 2011 – to enable the
gathering of a further 50 significant data collections and feeds of this data
are underway.
The AustLII team are also keen to network further, with for example,
NCJRDN, to provide an even more comprehensive view of Australia’s legal
data. They have also obtained a LIEF grant to digitise backsets. This data
will also be contributed to Research Data Australia and the ARDC and
when finalised will provide a comprehensive longitudinal view.
Museums Metadata
Exchange
This project has coordinated metadata from 18 museums across Australia
including both major museums such as Powerhouse, state museums and
the Australian Museum as well as smaller regional museums. The project
will deliver a metadata exchange which will gather metadata from these
museums and automate a feed into RDA. A test ingest of initial records
into RDA test area to facilitate exposure of aggregated museum and
history collections has already taken place. It was anticipated that on
completion 700 collection records would be published. It delivered over
1000 collections. As well as the metadata store collecting metadata from
other museums and the tools to automate the feed to RDA, the project
will establish a vocabulary tool to support data standardisation in the
museum sector.
National Archives of
Australia
Proposal reviewed and value of commitment revised. NAA are re-assessing
their intentions.
Public Records Office
of Victoria
A proposal is being developed to identify collections a feed metadata from
the state based archives agencies as well as the National Archives of
Australia. This is currently under negotiation. If successful it wil provide a
comprehensive view of Australian archives data.
ANDS 2012-13 Business Plan Page 80 of 118
Agency or
Institution or
Project
Project Status and Description
GeoSciences
Australia
This engagement has enabled the automated exposure of data holdings
from the GeoMet and GeoCat data catalogues and has resulted in an initial
release of approximately 700 data collections which was soon followed by
a feed bringing it to 7300 collections. It has enabled the identification of
relevant data to be fed to other discipline portals, for example marine
data. Automation of feeds has leveraged off the SISS deployment at GA.
Work continues on the development of ANZSRC classification to allow
greater manipulation of the data collections.
4.6.4 Committed 2012-13 Activities
The following table represents engagements that are at the initiation stage of the process (a
commitment has been made but the detail has not yet been finalised) for the Public Sector Data
program of work:
Agency or
Institution or
Project
Project Status and Description
Australian Institute
of Health and
Welfare
The original proposal to automate the exposure of key holdings related to
aggregated Australian health and morbidity data which was rescheduled at
agency’s request has now commenced and indications are that over 100
collections descriptions will be added to Research Data Australia by Q3
2012
Australian
Government
Information
Management Office
This is an ongoing strategic engagement to ensure alignment with GOV2.0
deployment taskforce
Australian Bureau of
Statistics
Engagement activities have commenced and work is proceeding on the
identification of data collections and establishment of a feed into Research
Data Australia. Initially the engagement will be targeting Timeseries
Products, which cover high level topics of Population, Key Economic
Indicators, Census data, Consumer Price Index, Labour Force, National
Accounts and Measures of Australia's Progress. Later targets include query
services for Census data and investigation of spatial services which is
under development.
ANDS 2012-13 Business Plan Page 81 of 118
Agency or
Institution or
Project
Project Status and Description
Bureau of
Meteorology
The engagement activities have commenced both directly and via the
AuScope project. Quality of metadata has been reviewed within BoM in
the light of NPEI activities and work will have to be undertaken within the
bureau on this aspect.
AODN Investigation of the addition of state fisheries data will be commenced in
the coming period.
Australian Antarctic
Division
The intention of this engagement is to create an exemplar of collection
presentation and transformation. It will provide data directly from the
institution providing a vehicle for establishing tools to address the
presentation of data from varying contexts. This work will build on the
work done by the AAD to establish the Polar Information Commons. This
engagement was put on hold owing to availability of staff at AAD during
the period, but will be re-commenced in Q3 2012.
Murray Darling Basin
Authority
An engagement has commenced with this agency and a statement of work
is under negotiation. It is anticipated that work to identify and expose
datasets will be undertaken in Q3 and Q4 2012
Atlas of Living
Australia
Work is underway to establish a feed of data collections. It is anticipated
that this work will be completed in Q3 2012 delivering over 400
collections.
AIMS An engagement to explore data not fed to AODN will be initiated.
State agencies The program is planning to engage with state agencies through the state
based eResearch agencies. To this end approaches have already been
made to initiate work in Queensland and NSW. Further interactions will be
commenced later in 2012.
The focus this year will be on providing government sector agencies with either the capability or the
funding to ensure that the requirement to expose public sector data is achieved.
4.6.5 Process for Other 2012-13 Activities
To date the selection of government agencies with whom to engage has been as an opportunistic
response to either identified researcher needs or as a follow through of enquiries initiated from the
government sector itself. In the coming period a review of engagements will be mapped against an
environmental scan. A survey will also be conducted to provide a high level view of data demand and
this will be compared to the gap analysis and inform the prioritisation of approaches to further
engagements. There will, however, continue to be a requirement for flexibility to accommodate
opportunities as they may arise.
ANDS 2012-13 Business Plan Page 82 of 118
For capability-focussed public sector data access the engagement model will continue to concentrate
on responding to the needs of those state, federal and territory government agencies where the
organisation has indicated a desire for skills, knowledge and capability transfer. Implementation
services will be established and offered by ANDS to offer assistance with the identification of key
data holdings, and the implementation of automated data collection information into the ANDS
ARDC by partnering consultants with government personnel for short periods of time with pre-
arranged tangible outcomes. This will ensure that standard ANDS infrastructure requirements be
applied to the data collections and that the agency has an opportunity to develop and build data
management capability, standards and policies which would continue to make more public sector
information available through ANDS well after the consultancy engagement has concluded.
For public sector institutions providing data feeds, the model of engagement will establish and fund
contracted partnerships with representative agencies from federal, state and territory governments
for the provision of agreed infrastructure components that will result in the release of rich collection
level metadata to the ARDC. This work might be conducted by the agency or by partner such as
AustLII for legal data, or AuScope for the Geological Surveys. It is expected that ANDS will fund
partner engagements with those large government agencies that have existing capabilities and skills
required to make their data holdings visible through the ARDC consistent with the selection process
described above.
4.6.6 Highlights
The majority of contracted engagements will be complete by June 2012, with only the commitment
to Public Records Office of Victoria to be resolved. These contracts and engagements to date have
delivered approximately 20,000 data collections of varying sizes from about 30 agencies ranging
across disciplines. Many of these datasets have never been exposed before and interest across
disciplines is starting to emerge. For example representatives from the Museums Metadata Exchange
(MME) project have been invited to speak about the MME at the Institute of Australian Geographers
Annual Conference in July 2012.
In the previous period the Public Sector engagements tested some of the protocols around content
provision as the business of these institutions differs from the university sector. While this has, on
occasion delayed delivery, it has resulted in revised and strengthened processes that will enable
more efficient delivery in subsequent engagements. It has also focussed the activity to seek data of
high value.
Agencies have welcomed an approach from the Public Sector Data Program to offer the services of
skilled ANDS personnel for short durations to provide assistance with the identification and exposure
of data holdings and build organisational data management capability. However they have
experienced some difficulty in resourcing the activity within their agencies.
4.6.7 Challenges
This is an ambitious program of work and engagement with the government sector is often difficult
as not all agencies have internal policies and standards fully developed to support or enable the
ANDS 2012-13 Business Plan Page 83 of 118
automated release of data. Engagements have also discovered compliance issues with data
descriptions that have required effort from agencies at a time when resources are stretched.
Most government agencies do recognize the need to comply with OIC policy and the value of good
data sharing and reuse but are constrained by the state of legacy data and resourcing to address the
issues associated with it. There is an overriding shift towards data management practices throughout
the whole of government at a policy level. This provides the opportunity for PSD to work in partnership
with Capabilities and Frameworks to achieve the change necessary to effective utilize the EIF funding.
ANDS has commenced negotiation with agencies with identified data holdings and has found the
offer of skilled capabilities remains more attractive than the offer of funding. However the issue of
sustainability beyond the engagement remains an issue.
The creation of the National Collections program and initial engagement has revealed the desire for
some key collections from public sector agencies that are of national significance. Work in this period
will focus on leveraging of this information.
4.6.8 End of 2012-13 Outcomes
By the end of the 2012-13 funding period ANDS will have:
commenced partnered engagements with a further 8 government agencies to establish infrastructure to expose data through the ARDC and RDA;
made key collections from over 120 government data collecting agencies visible through the ARDC;
engaged with agencies holding data collections of national significance and worked with them and the National Collections program to make the collections visible through the ARDC and RDA;
supported key NCRIS capabilities in making public collections significant to them visible through both the ARDC and capability specific portals; and
strengthened partnerships with key NCRIS capabilities to provide a coordinated approach to the exposure of highly connected public data.
4.6.9 End of 2012-13 Enduring Changes
By the end of the funding period ANDS will have produced the following enduring changes:
Researchers have better visibility of and access to public sector data.
Data are now visible to researchers that were never visible before.
Data providers are increasingly seeing the value in providing their data through richer shared environments.
Larger questions and grander challenges can be addressed by a data commons that spans facilities, institutions, disciplines and sectors.
New types of research are now possible that previously were not.
Richer data environments facilitate evidence-based policy.
ANDS 2012-13 Business Plan Page 84 of 118
Research organisations and public sector agencies now provide more recurrent investment in and policy support for data management.
Research Data Australia reduces duplication of effort and cost of data acquisition.
Common approaches, standards, and services have increased efficiency and coherence across the national data infrastructure investment.
Established data management infrastructure now allows for stronger national policy and easier institutional compliance.
Australia is seen to be a leader in research data infrastructure internationally.
4.7 ARDC Core Infrastructure (EIF)
4.7.1 Program Aims
To ensure necessary technical and 24x7 operational services are established so that the content in repositories can be findable, re-usable and linkable, thus underpinning the development of the Australian Research Data Commons
To ensure that services develop and evolve to meet changing data reuse requirements.
To catalyse the emergence of core data commons infrastructure operated by government agencies and research organisations
4.7.2 Program Overview
The ARDC Core Infrastructure program provides fundamental services for a cohesive network of data
collections and enables discovery, access and other value-add services across the resulting data
commons. A technical consultancy is also available to assist integrating distributed data commons
infrastructure at research and government instrumentality repositories with core ANDS utilities. The
program has three areas of activity.
ANDS is establishing a range of utility data services at a sector-wide level (such as cross-discipline discovery services, national collections registry, persistent identifier service, etc). As appropriate these utility services either aggregate information nationally or provide component services across several NCRIS domain areas, Super Science facilities, and other research organisations and communities. This ANDS program also improves existing services and establishes new data commons utility services. Robust infrastructure will be established together with a service delivery framework that defines the roles and responsibilities of those providing, supporting and participating in the services.
ANDS is commissioning the establishment of a distributed set of data commons infrastructure typically run and operated by government agencies and research organisations. This information infrastructure allows elements of the research data commons to be connected in an information mesh of interconnected references to the people, organisations, projects, fields of research, and locations related to research data.
ANDS offers specialist technical advice and consultancy services around establishing data federation utility services. This advice and assistance applies to research organisations, facilities, and government agencies establishing their own data commons infrastructure and others seeking to make use of and integrate with ANDS core infrastructure.
ANDS 2012-13 Business Plan Page 85 of 118
4.7.3 Infrastructure Created to Date
National Service Status
National Data Collections Registration Infrastructure:
Systems to allow information about research data collections to be
registered and maintained dynamically through automated harvesting
systems
In production
National Data Collection Discovery Infrastructure:
The Research Data Australia portal that aims to provide a
comprehensive window onto the Australian Research Data Commons
In production
National Data Collection Page Creation Infrastructure:
Allowing web pages to be created to advertise every registered
collection
In production
Dataset Identifier Infrastructure:
Allowing unique, persistent, and internet resolvable identifiers to be
minted for objects in the research data commons
In production
Place Names Infrastructure:
Web-service enabled gazetteer of Australian places established in
partnership with OSDM/ GA
Under development
Researcher Identification Infrastructure:
Infrastructure to enable access to definitive source information and
identifiers for Australian researchers and research organisations
(established with NLA) providing valuable information context to the
ARDC.
In Production
Research Activity Information Infrastructure:
Web service enabled information system publishing definitive source
information and identifiers for all funded Australian research projects
(established with ARC and NHMRC) providing valuable information
context to the ARDC.
In design
Standard Vocabularies Infrastructure:
Infrastructure enabling the web-service publication, management and
creation of standardised vocabularies used in research data laying a
foundation for better linkages between datasets
In Prototype
ARDC Infrastructure Establishment: Ongoing
ANDS 2012-13 Business Plan Page 86 of 118
National Service Status
Setting up best practice procedures for commissioning, establishment
and management of national IT infrastructure commissioned through
the ARDC project.
4.7.4 Agreed 2012-13 Activities
National Service Development Activities
National Data Collections
Registration
Infrastructure
Multi-schema harvest and transformation support (xml storage)
Downloadable software package and support for independent
database layer.
Integration with Location Infrastructure and example code
snippets for partners
Integration with FOR vocabulary and example code snippets for
partners
“Community source” development support (Plug-in component
architecture)
National Data Collection
Discovery Infrastructure
'See Also' linking: DataCite, NLA, ALA.
Enhanced subject browse: “Field of Research”’
Integration with Location Infrastructure
Downloadable software with independent indexing layer.
“Community source” development support (Plug-in component
architecture)
National Data Collection
Description
Infrastructure
Topic pages
Institutional Party pages
Visualise the data commons (contributors map; connections
mapping)
Graphic design update
Integration with Location Infrastructure
Downloadable package
ANDS 2012-13 Business Plan Page 87 of 118
National Service Development Activities
“Community source” development support (Plug-in component
architecture)
Dataset Identifier
Infrastructure
ANDS Cite My Data service integrated with existing ANDS
services: allocation of DOIs to already registered collections and
simultaneous DOI allocation and collection registration.
In collaboration with the International Infrastructure program,
continue to contribute to the design and implementation of
global data identifier and citation system
Place Names
Infrastructure
ANDS is in partnership with responsible government agencies
(GA/OSDM) to establish a robust national infrastructure that will
allow place names to be validated by both individuals and
software systems against an Australian Gazetteer (a directory
that lists names of geographical place and features and includes
spatial co-ordinates)
Phase 2 (marine, boundary boxes, expanded dataset)
Phase three planning (historical names, vernacular names,
crowd sourcing…)
Integrate Gazetteer services with existing ANDS services
Researcher Identification
Infrastructure
NLA party identifier project benefit realisation stage
Trial / adopt new directions: ORCID.
Integrate Researcher Identification Infrastructure with existing
ANDS services
Research Activity
Information
Infrastructure
In collaboration with the Australian Research Council and the
National Health and Medical Research Council create software
components and services that will improve the accessibility and
quality of information about research activities undertaken
within Australia. When completed, the project will enable the
public to discover and track the research undertaken through
ARC and NH&MRC funding and to follow its connections to
research outputs, both nationally and internationally.
Standard Vocabularies
Infrastructure
Build on the existing prototype to enable the creation,
management, and publication of human and machine readable
‘terminologies’ (also known as controlled vocabularies) for use
by the Australian innovations sector.
Maintain an external advisory group (technical working group)
ANDS 2012-13 Business Plan Page 88 of 118
National Service Development Activities
to receive input from government and research stakeholders.
ARDC Infrastructure
Establishment
Provide technical assistance and design support to those
organisations building the distributed infrastructure of the ARDC
Provide specialist technical backup to members of the Australian
research and higher education sector in the uptake of ANDS
infrastructure
Continue the establishment of ANDS infrastructure to support
those services which have been implemented to date
In collaboration with the International Infrastructure program,
continue to work to enable integration of ARDC Core
Infrastructure with international infrastructure networks (such
as the National Science Foundation and JISC)
Facilitate adoption of ANDS software solutions
4.7.5 Process for Other 2012-13 Activities
There are no other national services planned for 2012-13. Some unallocated budget for 2012-13
activities is retained to enable ad hoc opportunities to collaborate with local and international
partners over the course of the year. This is to allow ANDS to respond to any requirements that arise
from partner adoption of the ANDS software products or cloud deployments of ANDS services. There
is also the possibility of new requirements related to aggregation and harmonization efforts with
respect to an emerging international research data infrastructure. If these are to be completed in
the 2012-13 time frame additional resource allocation will be needed within ANDS and in some cases
in coordination with initiatives. These resources may be allocated internally and if external are
unlikely to be allocated through open call. Activity in this area is tentatively scheduled for Semester
1, 2013.
4.7.6 Highlights
ANDS has established significant national services to underpin the Australian Research Data
Commons. Highlights include:
National Data Collections Registration Infrastructure;
National Data Collection Discovery Infrastructure;
National Data Collection Page Creation Infrastructure;
Dataset Identifier Infrastructure;
Place Names Infrastructure; and
Researcher Identification Infrastructure.
ANDS 2012-13 Business Plan Page 89 of 118
In broad terms, the base national infrastructure now exists for researchers and research
organisations to publish, discover and re-use research data with richer contextual information.
Potential now exists for tracking and acknowledgement of re-use, but further infrastructure
establishment and policy development are required into the future.
4.7.7 Challenges
For this program, within the 2012-13 period it will be most challenging to:
Take software custom-made for ANDS needs and transform it to be used by a community of adopters and contributors
Support development partners and applications within ANDS projects and engagements for the ARDC Core Infrastructure
Implement transitional arrangements for post 2013 service provision
4.7.8 End of 2012-13 Outcomes
By the end of 2012-13 ANDS will have provided the following outputs:
provided a re-usable software product for data collection description aggregation and enabled community sourced contribution;
enhanced Research Data Australia including topic and institutional views, graphic design review and visualisation of the data commons;
enhanced the ANDS Collections Registry to allow support of multiple schema;
integrated the DOI service into ANDS workflows and contributed to the international infrastructure network;
commissioned the Location Infrastructure;
commissioned the Research Grant Project Information Infrastructure;
taken the terminology support service from prototype to production service;
enabled additional universities and organisations to take advantage of ANDS utility infrastructure and assisted them with local implementations of data utility services; and
facilitated a community of adopters and adapters of ANDS software solutions.
4.7.9 End of 2012-13 Enduring Changes
By the end of the funding period ANDS will have produced the following enduring changes:
A national resource of managed, connected, findable and re-usable research data now exists that previously did not.
Data are now visible to researchers that were never visible before.
Researchers can more easily find and gain access to public sector data.
Australian researchers can publish and cite their research data outputs.
Larger questions and grander challenges can be addressed by a data commons that spans facilities, institutions, disciplines and sectors.
ANDS 2012-13 Business Plan Page 90 of 118
New types of research are now possible that previously were not.
Richer data environments can facilitate evidence-based policy.
There is more cost-effective reuse of research data software in Australia.
Australia has improved ability to measure the quality and impact of all of its research outputs.
Research Data Australia reduces duplication of effort and cost of unnecessary data acquisition by facilitating re-use.
Common approaches, standards, and services have increased efficiency and coherence across the national data infrastructure investment.
Established data management infrastructure now allows for stronger national policy and easier institutional compliance.
Australia is seen to be a leader in research data infrastructure internationally.
Australian researchers can more easily engage with international partners.
Strong international technical partnership enables Australian infrastructure to leverage international research infrastructure networks.
4.8 ARDC Applications (EIF)
4.8.1 Program Aims
To develop a range of compelling demonstrations of the overall value of the ARDC by bringing
together a range of data sources combined with new integration and synthesis tools to enable new
research or generate new policy outcomes.
4.8.2 Program Overview
Applications is intended to leverage the outputs from the other ANDS programs, which have been
designed to:
provide underpinning infrastructure to support discovery and citation (ARDC Core, in collaboration with International Infrastructure);
enable rich metadata about data to be managed and accessible (Metadata Stores);
make new data and associated metadata available from a range of instruments (Data Capture);
make a selection of existing data and associated metadata available from the bulk of Australia’s research-producing universities (Seeding the Commons);
make data and associated metadata available from government departments (Public Sector Data);
work with RDSI to ensure a rich landscape of well-managed and reusable national data collections (National Collections); and
provide the overall policy and practice frameworks to support better data management and re-use (Frameworks and Capabilities).
ANDS 2012-13 Business Plan Page 91 of 118
ANDS has been funded to bring about an Australian Research Data Commons. This has required a set
of co-ordinated programs of activity that are described elsewhere in this Business Plan. The resulting
infrastructure supports data discovery and access. Once accessed the data can be re-used as is, but
bringing together different data sets can enable new kinds of research. Before this can occur the data
may need to be transformed or recoded. Once combined special analysis techniques may be needed
to provide the right starting point for further research. There are many possibilities here across a
whole range of research problems, and so ANDS is selecting a subset to demonstrate what is
possible.
The goal of the Applications program is to produce compelling demonstrations of the value of having
data available for re-use. These demonstrations of value should:
result in data being transformed or integrated across multiple sources to produce new forms of information that enable innovative, high-quality research outcomes;
deliver value to a high-profile research champion;
be relevant to a range of government portfolios; and
engage with national research capabilities.
Consistent with these criteria for demonstrations of value, ANDS has funded a portfolio of around 25 projects across two broad thematic areas: Climate Change Adaptation, and Characterisation of Biological Systems. There is also a collection of other projects to provide balance and demonstrate what is possible across a range of discipline areas. These projects have been carefully balanced across major research institutions, States/Territories, NCRIS Capabilities and national research priorities.
4.8.3 Infrastructure Created To Date
All of the projects have commenced, but none have yet completed.
4.8.4 Agreed 2012-13 Activities
The following Climate Change Adaptation projects have been funded:
Tropical Data Hub;
Tropical Data Hub - Tools Development;
Species Distribution Records;
Climate Change Impacts Downscaling tool;
Marine Video;
Health Impacts of Climate Change;
Climate Change Adaptation Information Hub; and
Impacts on Ports Infrastructure.
The following Characterisation of Biological Systems projects have been funded:
BPA X-omics tool;
Data-Models-Papers interconnections;
ANDS 2012-13 Business Plan Page 92 of 118
Satellites to Soils;
Mouse Brain Map;
Multiscale Kidney Imaging
Biology NeCTAR Project;
ALA Bird Distribution; and
Human Chromosome 7 proteomics.
The following other projects have been funded:
Public Open Space modelling;
SMART Infrastructure dashboard;
Founders and Survivors;
CODCD/BMRI demonstrator;
Marine NeCTAR Project;
TERN/e-MAST Project;
TERN/ACEAS Project;
Urban Climate Modelling; and
Catchment to Coast.
4.8.5 Process for other 2012-13 Activities
An amount of $400K is being retained for targeted expenditure late in the period covered by this
Business Plan. The process for allocating these funds will be determined by the ANDS Directors and
approved by the Steering Committee.
4.8.6 Highlights
The main highlight of the Applications program to date has been the appetite within the research
sector for this kind of infrastructure activity. The projects that have been funded will deliver a strong
demonstration of the value of bringing data together. Another highlight has been the way in which a
number of the projects in the Climate Change Adaptation space have strong connections (both
personal and technological) with each other. In the case of this theme, the whole will be bigger than
the sum of its parts.
4.8.7 Challenges
One challenge for the Applications program has been the timelines to commence each project. The process from identifying a candidate through a series of discussions to describe the engagement through to contracting has taken up to 12 months in some cases. ANDS has become better at terminating discussions that are not progressing (or where the project is likely to be problematic) and this has helped.
ANDS 2012-13 Business Plan Page 93 of 118
Another challenge has been constructing the balanced portfolio of activities. Early on in the process this was easier as the portfolio had more degrees of freedom. As the process commenced, the criteria that each new project needed to satisfy became more restrictive and demanding. Despite this, we believe the resulting portfolio provides excellent balance.
The last challenge has been the delivery of the project outcomes. While all the projects will be contracted before the start of this Business Plan, some may take until the start of Q3 2012 to commence. Because ANDS wants to see the demonstrations of value as soon as possible, these later projects will either be required to complete in 9 months or less, or will be required to deliver their demonstrations well before they complete. In particular, ANDS is proposing the end of Q1 2013 as an absolute drop-dead date for the demonstrations of value.
4.8.8 End of 2012-13 Outcomes
By the end of the 2012-13 funding period ANDS funding will have delivered 25 innovative and
compelling projects that enable new kinds of research, deliver the potential for evidence-based
policy outcomes, and demonstrate the value of the Australian Research Data Commons.
4.8.9 End of 2012-13 Enduring Changes
As a result of the ANDS investment in the Applications program, the following enduring changes will
have occurred:
Data providers are increasingly seeing the value in providing their data through richer shared environments.
A cohort of leading Australian researchers has demonstrated the value of a richer data environment.
Larger questions and grander challenges can be addressed by a data commons that spans facilities, institutions, disciplines and sectors.
New types of research are now possible that previously were not.
Richer data environments can facilitate evidence-based policy.
Australia is seen to be a leader in research data infrastructure internationally.
4.9 International Infrastructure
4.9.1 Program Aims
To work collaboratively with international organisations and partners to establish an effective
international data-sharing environment for Australian researchers.
4.9.2 Program Overview
ANDS has been established to provide services to, and build infrastructure with, the Australian
research and innovation sector. As described elsewhere in this document, ANDS is partnering with
Australian research organisations, government agencies, and cultural organisations to build the
Australian research data commons so as to provide a research advantage to Australian researchers.
ANDS 2012-13 Business Plan Page 94 of 118
Despite this essentially Australian focus for the implementation and outcomes of ANDS, there are
still strong motivations for ANDS to engage internationally. Research itself is an international activity.
Research organisations in Australia are in a global business and are involved in international
collaboration and competition. Research infrastructure is also increasingly international.
ANDS will therefore pursue international opportunities in order to:
1. join mature international infrastructure initiatives to provide better infrastructure and services at home;
2. be exposed to alternative national, regional, and domain specific infrastructure implementation approaches to inform the ANDS approach;
3. collaborate on and explore directions in international infrastructure initiatives with peer organisations to maintain our position of international prominence by pulling our weight; and
4. influence developing international infrastructure initiatives to support the ANDS mission by adopting technologies and approaches compatible with those adopted by ANDS.
The intention of ANDS’ international engagements is to further its mission and take a leading place in
emerging international infrastructure initiatives. The value of this international engagement on
research data infrastructure includes:
ensuring that Australian researchers use infrastructure that makes international engagement simpler;
ensuring Australian researchers work with data that can be easily shared;
ensuring that Australian researchers have an advantage in data cooperation;
increasing the value of Australian research data investments by future proofing them;
decreasing the cost of Australian research data investments by sharing the costs with international partners; and
demonstrating to stakeholders the value of the Australian approach to research data
infrastructure by establishing international pre‐eminence.
Key goals of ANDS in all of this activity continue to be emphasizing the importance of richly described
and connected research data collections, and focusing on the significance of research institutions in
managing research data.
4.9.3 Infrastructure Created To Date
ANDS has been particularly engaged to date with the UK and the Netherlands. In the UK it has
engaged with the JISC Managing Research Data (JISC MRD) program and the Digital Curation Centre
(DCC). ANDS and the DCC have an active program of co-operation and have already produced a co-
branded guide to assessment of digital resources.
In the Netherlands ANDS has engaged with Data Access and Networking Services (DANS). DANS was
originally focussed on social science data, but has recently broadened its remit to include humanities
data as well as some of the sciences. The engagement has focused on the NARCIS.nl (National
ANDS 2012-13 Business Plan Page 95 of 118
Academic Research and Collaborations Information System) portal which provides access to
hundreds of thousands of scientific datasets, e-publications, e-theses and research projects in the
Netherlands. ANDS has also engaged with SURF, a collaborative IT organisation for higher education
institutions and research institutes.
In the US, other than data activities embedded in funding for specific disciplines, the major national
commitment is the DataNet program from NSF (National Science Foundation). ANDS has engaged
through representation on the DataNet evaluation panel. ANDS is a foundation member of the
DataCite consortium, acting as the Australian registration authority in order to deliver an
international research data identification service for Australian researchers (more details on the Cite
my Data Service can be found in section 4.7 ARDC Core Infrastructure (EIF)).
4.9.4 Agreed 2012-13 Activities
ANDS will actively participate with counterparts to establish global research data infrastructure. This
work will enable Australian researchers to be the beneficiary of this infrastructure and will also
enable the distinctive elements of the ANDS approach to research infrastructure, most particularly
the generalist collection approach and the strong institutional role, to influence the form of the
global approach. Over the course of 2012-13, this will be carried out in the following ways:
ANDS has been designated as the Australian Non-Government Sector organisation (NGS) for the DataWeb Forum initiative; this is being led by the NSF and NIST in the US and involves the EC and Canada as partners. The DWF seeks to combine the bottom-up strengths of the Internet Engineering Task Force approach with a community-based governance model to work on data interoperability issues.
ANDS is a partner in the EU FP7-funded ODIN (ORCID and DataCite Interoperability Network) project. This is working on connections between data, authors and publications. Other partners include DataCite, arXiv at Cornell, the British Library, CERN, Dryad at Duke University and ORCID Europe.
ANDS will run the data plenary at the Second EU-AU Research Infrastructure workshop in Brussels in July, as well as a collocated one-day data infrastructure workshop to develop plans for concrete data bridges in the Marine and Linguistic domains.
ANDS will continue to work directly with national counterparts in the UK, Netherlands, Finland, Germany and Denmark.
ANDS may also further develop its relationships with two NSF DataNet projects: the Data Conservancy and DataONE.
4.9.5 Process for other 2012-13 Activities
Not applicable.
4.9.6 Highlights
The main highlight was being designated as the Australian NGS for the DWF. This initiative looks to
have the potential to bring about significant improvements in co-ordination of data interoperability.
ANDS 2012-13 Business Plan Page 96 of 118
4.9.7 Challenges
The main challenge is the tyranny of distance and time zone. It simply isn’t possible to have the same
quality of interaction with international colleagues over voice or video linkages as face to face. This
means that this area of ANDS will continue to require overseas travel, although this will be
undertaken only when required and for significant events.
4.9.8 End of 2012-13 Outcomes
By the end of 2012-13, ANDS will have:
contributed significantly to the successful establishment of the DWF
delivered concrete outcomes from the one-day data infrastructure workshop collocated with the Second EU-AU Research Infrastructure workshop
4.9.9 End of 2012-13 Enduring Changes
By the end of 2012-13, the DWF will be in existence, co-ordinating a range of working groups to
deliver data and data infrastructure interoperability.
4.10 Overall 2012-13 Outcomes ANDS has been operating since January 2009. In that time ANDS has built a consensus on the
importance of research data and research data infrastructure. In 2010-11, ANDS created a number of
national research data services and engaged with a large number of organisations to start the
realisation of the Australian Research Data Commons. In 2011-12, ANDS expanded and deepened
that institutional engagement, as well as added to the range of available services.
As a result, substantial infrastructure has already been created. By July 2012, ANDS will have in place
the following national services:
a data collections registration service;
a data collection description publication service;
a national gazetteer service;
a data citation service;
a researcher identification service;
a research project identification service; and
an enhanced Research Data Australia – a data collections discovery service.
By July 2012 there will be coherent institutional research data infrastructure:
48 tools will have been deployed to automatically capture rich metadata along with the data for a wide range of instruments.
6 institutions will operate metadata stores.
40 institutions will provide collections descriptions feeds to ANDS, both Research Institutions and Public Sector data holders.
ANDS 2012-13 Business Plan Page 97 of 118
35,000 collections will be available for discovery through Research Data Australia.
7 discipline oriented portals will be cross connected to Research Data Australia.
By July 2012 the following tools, frameworks and capability will be in place to exploit the ARDC:
improved institutional guidance for internal institutional data management;
improved institutional guidance for responding to national instruments such as the Australian Code for the Responsible Conduct of Research will be available;
institution wide research data management planning frameworks at 5 research institutions and 33 institutions with improved research data management;
increased institutional capability for research data management with 150 more staff trained with research data management capability; and
10 new tools that enable more effective re-use of research data.
In 2012-13, ANDS will build on this infrastructure that has been created. By July 2013, ANDS will have
in place additional and enhanced national services:
a “see-also” service that enables other discovery tolls to use the ANDS discovery service;
a more tightly integrated data citation service;
a research activity identification service; and
an enhanced Research Data Australia.
By July 2013 there will be further coherent institutional research data infrastructure in place:
70 tools will have been deployed to automatically capture rich metadata along with the data for a wide range of instruments.
22 institutions will operate metadata stores.
60 institutions will provide collections descriptions feeds to ANDS, both Research Institutions and Public Sector data holders.
At least 20 institutions responsible for nationally significant collections will have been supported in making those collections connected, visible and available, often through an RDSI node.
At least 40,000 collections will be available for discovery through Research Data Australia.
10 discipline oriented portals will be cross connected to Research Data Australia.
further institutional guidance for internal institutional data management;
institution wide research data management planning frameworks at 20 research institutions and all institutions;
ANDS partners will have improved research data management;
increased institutional capability for research data management with 160 more staff trained with research data management capability;
40 new tools that enable more effective re-use of research data; and
this capability will be demonstrated by 20 high profile researchers showing the value of this infrastructure.
ANDS 2012-13 Business Plan Page 98 of 118
We will continue to engage with the National Infrastructure Capabilities in a variety of ways – direct
engagement on collections descriptions, common engagement with Government data providers,
support for research data licencing, and publishing of research data collections held by the
capabilities, especially by establishing collection descriptions feeds, thus providing another access
point to the collections and capability portals.
As a result of the individual programs, the following integrated and aggregated list of outcomes will
be realised for the life of the ANDS Project.
Better Data
A national resource of managed, connected, findable and re-usable research data now exists that previously did not.
More data are being automatically captured from a wide range of instruments, ranging from small sensors to major national facilities.
Data providers are increasingly seeing the value in providing their data through richer shared environments.
Data are now visible to researchers that were never visible before.
Researchers can more easily find and gain access to public sector data.
Better Research
A cohort of leading Australian researchers has demonstrated the value of a richer data environment.
Australian researchers can publish and cite their research data outputs.
Larger questions and grander challenges can be addressed by a data commons that spans facilities, institutions, disciplines and sectors.
New types of research are now possible that previously were not.
Richer data environments can facilitate evidence-based policy.
Better Institutional Capacity
Data can be captured, managed, and shared more easily at most research organisations in Australia.
Research organisations and public sector agencies now provide policy support and invest more in data management.
Australian research institutions can more easily comply with the requirements of the Code for the Responsible Conduct of Research.
There is an established community of data management professionals.
Australian researchers and research organisations have better capability to operate and exploit the new research data infrastructure.
ANDS 2012-13 Business Plan Page 99 of 118
Better Investment
There is more cost-effective reuse of research data software in Australia.
Australia has improved ability to measure the quality and impact of all of its research outputs.
Research Data Australia reduces duplication of effort and cost of unnecessary data acquisition by facilitating re-use.
Common approaches, standards, and services have increased efficiency and coherence across the national data infrastructure investment.
Established data management infrastructure now allows for stronger national policy and easier institutional compliance.
Better International Engagement
Australia is seen to be a leader in research data infrastructure internationally.
Australian researchers can more easily engage with international partners.
As a consequence of these individual outcomes there is an over-arching outcome; Australian
researchers now have access to infrastructure that enable them to:
systematically, reliably and authoritatively connect their research data to project, institutional and disciplinary descriptions; and
simultaneously publish citable research data collections through institutional, disciplinary and national services.
This will ensure that Australia has a mature, globally leading capability in research data, making it
a key locus for data intensive research. This capability will be demonstrated by leading researchers
in a variety of disciplines to show the power of this infrastructure to enhance their research.
It will be possible to do this as a result of ANDS and its partners developing a wide-ranging set of
coherent software, policy and process outputs that support the Australian Research Data Commons
(ARDC).
5 Program Engagement Strategies ANDS succeeds by building strong partnerships that enable the changes it seeks. Generally its
partners will have needs that require a response from a number of ANDS programs, and its partners
should never need to navigate ANDS’ internal structures. Moreover the programs have strong
interdependencies. For a university to respond to the Australian Code for the Responsible Conduct of
Research effectively there is a particularly strong need for efforts from both the Capabilities program
and the Frameworks program. Consequently from an external point of view partners should be
engaged with ANDS as a whole, not a specific program within ANDS.
ANDS has developed a customer relationship approach that enables it to have a single point of
contact for a given level in the organisation – this might mean that for a given university one of the
ANDS 2012-13 Business Plan Page 100 of 118
directors oversees the relationship, and one of the ANDS team is the point of contact for the
university. In this way the university never needs to work out whom to talk to in order to discuss the
challenges of data publication; rather the ANDS client liaison officer will ensure the appropriate
conversations take place.
Engagements will also be quite varied in nature. ANDS is engaging with partners to directly change
research data practice at universities, NCRIS Capabilities, Publically Funded Research Organisations,
government departments conducting research, and other locations of publicly funded research.
ANDS is engaging with organisations that have a direct influence on the Australian research system –
data providers and holders including government departments such as the ABS, NAA, GA, Cultural
Collections organisations, policy and funding bodies, such as the ARC, NHMRC, The Committee of
Australian University Librarians (CAUL), The Council of Australian University Directors of Information
Technology (CAUDIT), AVCC, AGIMO, etc. ANDS sees AeRIC, NECTAR, RDSI, and NCI as important
partners in delivering the vision of the Platforms for Collaboration. Finally and most importantly
ANDS is engaged with government through DIISRTE.
ANDS’ forms of engagement will be equally varied – in some cases ANDS will have a staff member
work alongside a staff member in the partner organisation so that together they can institute good
data practises within that organisation in a nationally consistent manner. ANDS will do this for
example with TERN and IMOS. In some instances ANDS will have several staff work intensively but for
a short period of time with a partner such as it has done with the University of Newcastle. In some
instances ANDS might simply build a feed to a data repository to capture collection information that
is already locally held.
ANDS intends to support local engagement as much as is possible, consistent with the view that
ANDS is seeking cultural change, not just technical change, so personal engagement and
relationships are important. Staff have been appointed in most states working on the ANDS
relationships based in that state. These state based staff are generally expected to be located at the
state based eResearch organisations with local line management, and ANDS project management.
6 Promotion ANDS is a complex, innovative research infrastructure program, the continued success of which
requires an increased awareness of the function of ANDS amongst its partners and the recipients of
its services. ANDS has reached an unprecedented level of activity and delivery of outcomes and thus
the need to promote ANDS – its successes, its value and the impact on research – is paramount.
ANDS will make ‘taxpayer impact’ a particular focus during this period in order to communicate the
return on the government’s investment to a wider audience. To provide a focus for promotion in this
period ANDS has developed an external communication plan, which details the key messages, key
audiences, communication channels and promotional strategies.
Another primary focus during the 2012-13 period will be to work with DIISRTE and other
infrastructure partners on a co-ordinated communications approach to eResearch.
Key promotional activities and communication channels planned for the 2012-13 period include, but
are not restricted to:
ANDS 2012-13 Business Plan Page 101 of 118
ANDS Website
Following a restructure of the site to ensure it aligns with ANDS’ overall Communication
goals, ands.org.au will be used to build communities through features such as a blog,
webinars and links to communities of practice.
The website, through continuous content refinement will also act as a comprehensive,
focused tool for ANDS’ partners and reinforce ANDS’ reliability and credibility.
Collaboration with infrastructure partners (National and State-based)
Harnessing promotional activities such as road shows, workshops, media relations
Media Relations
Demonstrate the value of the Federal Government’s investment through taxpayer
impact stories focusing on specific outcomes that have been achieved through the reuse
of data which was made available through the research infrastructure ANDS developed
Events
Launches of major impact eResearch initiatives such as the Tropical Data Hub and
Climate Change Adaptation projects.
Collaborative events with other capabilities, in co-ordination with DIISRTE’s
communication strategy.
In collaboration with the International Infrastructure program, continued international engagement with a view to maintaining Australia’s position as a global leader in data management
Meetings, working groups, conferences, presentations
Social media
ANDS has recently opened an official twitter account (@andsdata). This communication
channel is an effective channel for ANDS to communicate with researchers and the
research community, both within Australia and internationally.
Other social media platforms are under evaluation.
A key element of the ANDS external communication plan is ongoing communications with the
community through various engagements. This approach utilises the expertise of three distinct areas
within ANDS to ensure that our key messages are being delivered to our key audiences through
correct communication channels, utilising the extensive expertise within ANDS:
Directors (including the Executive Director)—This group is the strategic ‘face’ of ANDS. They liaise with research institutions and organisations at a high level, both in Australia and Internationally.
Client Liaison Officers (CLOs) and the Capabilities Team—This group plays an operational role in communications. They communicate with practitioners and researchers within institutions; they provide advice, guidance and community building through training, knowledge sharing and community building activities.
Communications Officer—based within a larger operations team, this role promotes ANDS to a broader audience, and specifically promotes the value of ANDS (i.e. the government’s
ANDS 2012-13 Business Plan Page 102 of 118
investment) and the benefit to taxpayers. This role is also responsible for working with other infrastructure partners on the co-ordinated communications approach to eResearch mentioned above.
7 Access and Pricing The mechanisms for deciding access and pricing will be consistent across the ANDS services.
Generally speaking, ANDS will provide services for research purposes and aims to ensure the
legitimate research use of those services will be free and access to the services open.
Software developed under the programs will be released as Open Source code, with the choice of
licence and licensing conditions varying on a case-by-case basis. Documents produced or funded by
ANDS will be made available as public documents, on a no warranty, royalty free basis using the CC-
BY license. ANDS will maintain a register of software that is produced through its funded projects.
However content access and charging regimes belong in the hands of content providers, so that the
access and pricing issue in ANDS relates to the rules under which content may be provided into the
ARDC and therefore supported by ANDS utilities and other support activities.
ANDS services will be restricted to users who are non-commercial, and engaged in research, with the
exception of Research Data Australia, which will be searchable by any interested party. Research
Data Australia will not be used for the advertisement of paid services. Equally, the Identify my Data
service will only be accessible for non-commercial use.
ANDS 2012-13 Business Plan Page 103 of 118
8 Governance, Management and Implementation The Governance and Management arrangements for ANDS are described in the contracts for the
NCRIS project and the EIF project, as well as in a separate Collaboration Agreement. These
arrangements have been deliberately designed to ensure that the governance is as open as possible,
consistent with the acceptance and management of risk by the lead agency. The Governance
arrangements were established for the NCRIS contract, but DIISRTE, Monash University, the lead
agent, and the Steering Committee have agreed to use the same approach to governance and
management for the EIF ARDC contract as well.
8.1 Governance Framework Monash University has entered into an agreement with DIISRTE to implement the Projects, receive
NCRIS and EIF Funds and be accountable to DIISRTE for execution and performance of both Projects.
Monash University has established a Collaboration agreement with the Australian National University
and CSIRO as partners in the projects.
Monash hosts and operates one of the ANDS Offices, which will be used to manage the Project. ANU
hosts the other office that houses both ANU and CSIRO staff.
Monash appointed the independent Chair of the Steering Committee after consultation with DIISRTE
and the ANDS partners and formally includes the independent Chair in the performance
management arrangements of the Executive Director of ANDS. The Executive Director of ANDS is Dr
Ross Wilkinson.
8.1.1 Steering Committee
The current ANDS Steering Committee comprises a minimum of four (4) and a maximum of eight (8)
voting members, including;
(a) an independent chair appointed by Monash;
(b) one representative appointed by each of the ANDS Members; and
(c) such additional persons as the ANDS Steering Committee may agree, such as data provider, data policy and other specialist representatives.
DIISRTE has nominated a non-voting observer, Anne-Marie Lansdown or her nominee.
The processes of the ANDS Steering Committee will be as transparent as possible.
As at March 2010 the current ANDS Steering Committee Members are:
Independent Chair: Dr Ron Sandland;
Ms Cathrine Harboe-Ree (Monash University);
Mr David Toll (CSIRO);
Prof Robin Stanton (The Australian National University);
ANDS 2012-13 Business Plan Page 104 of 118
Prof Mark Ragan (University of Queensland);
Mr Paul Sherlock (University of South Australia);
Dr Siu Ming Tam (Australian Bureau of Statistics);
Prof Craig Johnson (University of Tasmania); and
Executive Director (ex-officio): Dr Ross Wilkinson (Australian National Data Service).
It is anticipated that the ANDS Steering Committee membership can be expanded over time to
incorporate any additional requirements of the Project.
8.1.2 Management structure
ANDS is currently managed by a full time executive staff comprising an Executive Director (located at
Monash), and four Directors (2 Monash directors, an ANU director and a CSIRO director) as currently
agreed under the ANDS Collaboration Agreement.
Directors report to the Executive Director with regard to ANDS activities and to a nominated person
in the host institution for administrative purposes (the Supervisor). The Supervisor is normally the
host institution’s representative on the Steering Committee.
Directors normally have a high degree of autonomy within their areas of responsibility but work
under the leadership of the Executive Director.
If there is disagreement or conflict between the Executive Director and a Director the matter is
discussed with the Supervisor in the first instance, after which it can be escalated to the Chair of the
Steering Committee and, if necessary, the Steering Committee.
Any alterations to this arrangement will be as a result of, and documented in, a revised Collaboration
Agreement that takes account of this Project.
ANDS staff work collaboratively with each other and support activities across ANDS. Some will be
located at ANDS Member institutions and others out ‘in the field’. These field locations may include
members of the Australian Research Collaboration Services consortium, a Division of CSIRO or major
data federating institutions.
ANDS staff within or appointed by an ANDS Member institution report to the relevant Director, or as
otherwise negotiated for staff located in other institutions. These staff are appointed in consultation
with the Executive Director.
If necessary, the Executive Director can direct, through the Directors, or other supervisory
arrangements applicable at other institutions, the work of ANDS staff located in any institution.
The ANDS central office at Monash provides administrative support to ANDS and its staff, including
communications, branding, and website maintenance.
ANDS 2012-13 Business Plan Page 105 of 118
8.2 Risk Management ANDS maintains a Risk Register. The risk assessment methodology, adapted from the Australian Risk
Management Standard AS/NZS 4360:2004, involves identifying and analysing each risk in terms of
how likely it is to happen (Likelihood) and the possible impacts (Consequence).
The key risks for ANDS in executing the Projects and the risk management strategies to be employed
can be grouped into four major categories.
8.2.1 Political and Governance
Risk 1 – That there are persistent negative perceptions of the Project among funding agencies and influential groups leading to a lack of buy-in
Risk Factors:
A particular project does not have the confidence of a subsection of a community.
Lack of confidence in governance, management, or Project delivery.
Perceptions of slow engagement with areas of the sector.
Change of emphasis with regard to the policies around publicly funded research data.
Lack of certainty of the funding of the function of ANDS.
International engagement is halted as a result of the closure of ANDS.
Risk Mitigations:
The communications plans have been updated to ensure that the specific research communities have input into specific projects and their outcomes before, during and after the projects are undertaken.
Diagnostic strategies have been implemented and run to mitigate against failure.
Use a central point where progress of the ARDC is being tracked by metrics such as number of collections available, and numbers of datasets accessed, and the status of every project is tracked.
Clearly articulate the Project's message and brand.
Engage actively with communities to avoid perception (or reality) of not meeting its needs.
Ensure that the Project reflects the Government’s expectations through constant dialogue.
Maintain close contact with key DIISRTE officers to ensure they provide input to decision making, including having an observer on the Steering Committee.
ANDS communicates the message about the longer term vision of the function of ANDS in the sector.
Working with funding agencies on future plans for investment in the function of ANDS.
Risk 2 – That the Project is not managed effectively
Risk Factors:
ANDS 2012-13 Business Plan Page 106 of 118
Lack of effective mechanisms for planning, leadership and management.
The structure of ANDS has a negative impact on coordinated delivery of required activities.
Collaboration between the Project and across locations is not effective.
EIF funding guidelines do not allow for sufficient Project staff to administer funded programs of work.
State based staff have mixed allegiances.
Projects start too late.
Risk Mitigations:
Management and planning processes have been put in place that include formal reporting and regular reviews to ensure the efficient conduct of the Project.
Regular meetings of Project staff are held to build a team approach. Communication structures in place to facilitate working together.
Staffing levels are monitored and adjusted as required.
Contracts and partnerships with state based organisations that host Project staff have been put in place that ensure that staff are clear about their role. Ensure that ANDS-funded staff based in organisations who are ANDS sub-contractors are not placed in a position of conflict of interest.
Ensure all projects commence by June 2012.
Ensure all late starting projects are closely managed.
Risk 3: That the increased emphasis on external contracted engagements represents too big a burden on the lead agent
Risk Factors:
University processes, focussed on student and supplier engagement, are not a good fit for “funding agency” activities. ANDS’ role as a “funding agency” in many of its programs has imposed additional requirements on the lead agent causing pressure on its staff to assist ANDS.
ANDS EOI approach generates clusters of work with tight timelines that impact on specific university functions such as the Solicitors’ Office and Finance.
Risk Mitigations:
Approval has been obtained for stream-lined approaches at Monash University to enable ANDS to work more effectively.
Fund additional staff or specific work at Monash University to enable ANDS to work more effectively.
8.2.2 Relationships
Risk 4 – That the Project's external stakeholders are not effectively engaged
Risk Factors:
ANDS 2012-13 Business Plan Page 107 of 118
Stakeholders are not prepared to undertake the changes within their own organisations that are necessary for the realisation of the ARDC.
Stakeholders do not see their interests in data management and those of the Project as being aligned.
ARDC Applications program – Stakeholders might feel that the wrong decisions have been made in determining which projects to fund.
Substantial new funding for other eResearch activities and reduction in new funding for ANDS projects.
Risk Mitigations:
Maximise the effectiveness of connections between the Project and related PfC and other initiatives, including involvement of groups outside ANDS in the ANDS Policy Forum, the ANDS Technical Forum, and the ANDS Content Forum.
Ensure that ANDS’ engagement with stakeholders meet their research data ambitions as well as ANDS’ requirements.
Ensure ongoing, strong engagement with the Research Sector, including research infrastructure capabilities.
All activity plans were developed after consultation with relevant stakeholders.
Membership of the Steering Committee includes key stakeholders.
Performance measurement for the Project should include effective stakeholder engagement.
Effective communication of benefits to stakeholders.
Provide a clear rationale behind the decision process for project funding.
Communications activities have been increased to create awareness of the value of ANDS' activities.
ANDS effort has been increased in creating partnerships as compared to contracting.
Risk 5 – That the Project's partners do not appropriately contribute to the Project
Risk Factors:
Partner produces outcomes of low quality or does not meet the requirements of the contract.
Partner expends funds in a way that is not consistent with the EIF guidelines.
Lack of effective arrangements in place to ensure the contracted services are provided to an agreed service level.
Service providers see themselves as disconnected from the Project's decision-making or strategic planning.
Risk Mitigations:
Formal procurement processes have been implemented to ensure that the requirements are understood and that potential suppliers meet the set criteria.
Provide ongoing contract management to ensure the delivery of required outcomes to the contracted service levels.
ANDS 2012-13 Business Plan Page 108 of 118
Effective vendor and partner engagement approaches have been put in place.
Risk 6: That ANDS is not perceived as a long-term partner and hence the services are not taken up
Risk Factors:
The impending end of ANDS NCRIS and EIF funding causes a perception that ANDS initiated services will not continue.
Risk Mitigations:
ANDS gained approval to expend existing funding over longer timelines (consistent with other Super Science funded activities).
ANDS creates reliable sustainable services that are offered over the longer term by other long term service providers.
Strong contribution to DIISRTE Roadmap processes will be a mitigating factor.
Risk 7: That there is confusion about role of ANDS versus other related service providers in e-Research sector which impedes effective service delivery
Risk Factors:
ANDS and PfC partners’ offerings are confused by possible users.
Relationship between ANDS and state-based eResearch providers (such as Intersect) is not clear to users.
Risk Mitigations:
Ensure that ANDS’ communications to a range of stakeholders provide greater clarity about ANDS services.
Ensure that ANDS’ offerings are clearly targeted and that this is clearly stated.
Seek greater clarity from other eResearch service providers about their offerings, avoiding either actual or perceived overlap with ANDS’ offerings.
Increased coordination of offerings by eResearch service providers through eResearch Infrastructure.
Discussion with NCI, NeCTAR and RDSI taking place to ensure clarity of eResearch service offerings.
8.2.3 Impact
Risk 8 – That data providers/federators do not make their data available
Risk Factors:
The storage needs of researchers are not met, so will not consider sharing.
Researchers do not wish to share their research data.
Confidentiality agreements prevent researchers from making their data available.
ANDS 2012-13 Business Plan Page 109 of 118
Existing data federations see insufficient value in making their data available.
Risk Mitigations:
ANDS will co-ordinate with RDSI and Institutional stores to mitigate this risk.
Enable data citation so that researchers get recognised for the publication of their research data.
Encourage the use of access controlled data stores.
Ensure that ethics agreements balance confidentiality with openness.
Recommend that funding be linked to the provision of data via the ARDC as it becomes available.
Risk 9 – That re-users of research data do not use ANDS Services to discover, access and exploit data
Risk Factors:
The various strategies for exposing data in the ARDC do not result in the data being easily discoverable.
Access control mechanisms are too restrictive or complex.
Other sources of data for re-use are more attractive or easier to use.
Risk Mitigations:
Ensure a nuanced and multi-faceted approach to exposing the Project's accessible data.
Work with AusGOAL and the Australian Access Federation to identify a simple set of licensing and standard access control policies.
Ensure that it is easy to re-purpose ARDC accessible data.
Risk 10: That the standards and technologies that ANDS adopts are not adopted more widely
Risk Factors:
ANDS is the only user and maintainer of actual or de facto standards, leading to inability to share maintenance and development costs.
ANDS is the only source of development activity on particular technologies (RIF-CS, ORCA, ANDS Handle code).
Risk Mitigations:
Seek international engagements and partnerships to take up standards and technologies favoured by ANDS and share development load.
Ensure enough people are trained on the standards and technologies that ANDS is adopting to support wide adoption.
Make implementation decisions such that ANDS is not dependent on particular standards and technologies, but on general approaches that can be transferred across technologies.
ANDS 2012-13 Business Plan Page 110 of 118
Encourage the use of ANDS-developed technologies by other data aggregators such as Terrestrial Ecosystem Research Network (TERN).
8.2.4 Resourcing
Risk 11 – That high quality staff are hard to recruit and retain
Risk Factors:
Limited availability of skilled staff (both within ANDS and in ANDS-funded projects) impacts ability to perform tasks funded by ANDS
The life of the ANDS project is limited and towards the end of that life, staff may lose motivation
Risk Mitigations:
Build a vision for the function of the ANDS for the longer term and communicate this to staff.
Demonstrating to ANDS staff their contribution to the value and outcome of ANDS projects.
ANDS 2012-13 Business Plan Page 111 of 118
8.3 Milestones for 2012-13 The main milestones for ANDS in 2012-13 are based on the activities and outcomes from the
individual programs. Some of these will be derived directly from an individual program; many require
activities across several programs to succeed. The milestones for 2012-13 are:
Milestones 12Q3 12Q4 13Q1 13Q2
Frameworks and
Capabilities
Training
delivered for
Metadata Stores
projects
Community
support webinars
phase 1 held:
data
management,
licensing
ANDS projects
showcase held
Materials
developed:
metadata stores,
location,
vocabularies,
national
collections
Citation proof of
concept projects:
first reports
received
Briefings
provided to
DIISRTE, CAUDIT,
CAUL on national
data sharing
policy agenda
Community
support webinars
phase 2 held:
ANDS project
support Apps,
Metadata Stores
Briefings
provided to
funding agencies
on national data
sharing policy
agenda
ANDS projects
showcase held
Citation proof of
concepts: final
reports received
Materials
development
phase 2
completed
Seeding the
Commons
Completion of 5
projects
Other projects
being monitored
and assessed as
required.
Delivery of 100
records to
Research Data
Australia
3 new data
management
tools available via
Product
catalogue
Completion of 6
projects
Other projects
being monitored
and assessed as
required.
Delivery of 200
records to
Research Data
Australia
2 new data
management
tools available via
Product
catalogue
Completion of 9
projects
Other projects
being monitored
and assessed as
required.
Delivery of 400
records to
Research Data
Australia
Institutional
engagements
underway
3 Community
events held
3 new data
management
tools available via
Product
catalogue
3 Community
events held
Institutional
engagements
underway
ANDS 2012-13 Business Plan Page 112 of 118
Milestones 12Q3 12Q4 13Q1 13Q2
3 Community
events held
3 policy
documents
available or in
process
5 institutional
engagements
begun
3 Community
events held
5 policy
documents
available or in
process
5 institutional
engagements
begun
8 policy
documents
available or in
process
National
Collections
Establish
cooperative
working
relationship with
RDSI/ReDS team
Establish
cooperative
working
relationship with
RDSI Nodes
Initiate
institutional
collection
discussions
Work on 3 identified and fast start collections to inform development of principles and to act as exemplar
Produce
collection
development
plan and socialise
with potential
data providers
Produce material
to increase
uptake in
discovery and re-
use of collections:
topic pages in
RDA,
communications
material,
presentations
Work with 10
data providers to
publish
collections
Work with 10
data providers to
publish
collections
Implement
connections to
provide rich
context
Delivery of
records relating
to 20 collections
to Research Data
Australia
ANDS 2012-13 Business Plan Page 113 of 118
Milestones 12Q3 12Q4 13Q1 13Q2
Data Capture Completion of 7
projects
Other projects
being monitored
and assessed as
required.
Delivery of 70
records to
Research Data
Australia
5 new data
management
tools available via
Product
catalogue
Completion of 11
projects
Other projects
being monitored
and assessed as
required.
Delivery of 500
records to
Research Data
Australia
10 new data
management
tools available via
Product
catalogue
Redeployment of
2 tools
Completion of 10
projects
Other projects
being monitored
and assessed as
required.
Delivery of 300
records to
Research Data
Australia
10 new data
management
tools available via
Product
catalogue
Redeployment of
5 tools
Completion of 1
projects
Delivery of 200
records to
Research Data
Australia
1 new data
management
tools available via
Product
catalogue
Redeployment of
5 tools
Metadata Stores All projects
underway and
project plans
received.
Projects being
monitored and
assessed as
required.
Completion of 2
projects
Other projects
being monitored
and assessed as
required.
Completion of 10
projects
Other projects
being monitored
and assessed as
required.
Completion of 10
projects
Public Sector
Data
Engagement with
agencies to
deliver water
data
Contract with
PROV to deliver
archives data
from key state
agencies and NAA
Agreements with
state eResearch
agencies to
deliver state
Engagements
commenced with
4 government
agencies
Engagements
with state
eResearch
agencies
commenced
Automated feeds
of collection level
data by initial 4
government
agencies
Engagements
commenced with
a further 4
agencies
Automated feeds
of data into the
other discipline
portals in ARDC
has been
provided by
selected
government
agencies
Automated feeds
of collection level
data by 4
prioritised
ANDS 2012-13 Business Plan Page 114 of 118
Milestones 12Q3 12Q4 13Q1 13Q2
public sector data engagements
ARDC Core RDA Institutional
Pages and topic
pages
Integration of
Citation Identifier
Infrastructure
integration – first
phase
User feedback to
Registration,
Discovery and
Data Collection
Page Creation
Infrastructure
Vocab service
phase 2
Integration of
Citation Identifier
Infrastructure
second phase
Research Activity
infrastructure
complete
ANDS Software
Products re-
factoring,
packaging, and
documentation
ANDS software
full integration
with national
services (ARC,
NHMRC,
GeoScience Au)
ARDC
Applications
25 projects
started
2 projects
complete
15 projects
produce early
demonstrations
of value
13 projects
complete
10 remaining
projects produce
demonstrations
of value
10 remaining
projects
complete
International
Infrastructure
ANDS appointed
as DWF NGS for
Australia
Australia's
membership in
DWF confirmed
Data thematic
session held at
the 2nd EU-AU
Joint Workshop
for Research
Infrastructure
1-day data/e-
infrastructure
workshop held
with Australian
and European
participants
Outcomes from
1-day data/e-
infrastructure
workshop funded
and commenced
ODIN project
funding agreed
by EC and project
commenced
Activity plan
outlining
Australia's future
participation in
DWF accepted by
DIISRTE
Outcomes from
1-day data/e-
infrastructure
workshop
completed and
reported
DWF successfully
commenced
Data
contributions
provided as part
of 3rd EU-AU
Joint Workshop
for Research
Infrastructure
ANDS 2012-13 Business Plan Page 115 of 118
Milestones 12Q3 12Q4 13Q1 13Q2
Whole of ANDS Proposal to
DIISRTE of ANDS
activity in
13Q1/Q2
informed by
whether ANDS
concludes at the
end of June 2013,
or transitions
through a
bridging year
8.4 Key Performance Indicators A condition of the NCRIS Funding Agreement for ANDS is that “Key Performance Indicators (KPIs)
acceptable to DIISRTE must be developed” (Attachment A, Section 5.1). The KPIs will enable ANDS to
guide its behaviour, and DIISRTE and the steering committee to monitor the success of ANDS.
8.4.1 Key Performance Indicator Series
1. The number and coverage of data repositories providing metadata feeds to the national registry compared to the number of data repositories.
2. The number and coverage of institutions and number of research groups with which ANDS has engaged
3. The number of institutions with research data management policies and practices consistent with ANDS recommendations
4. The number of times a search is initiated with an ANDS discovery service 5. The number of times an ANDS data page (defined below) is accessed 6. The satisfaction of researchers and partners (see below) with ANDS services as measured by an
annual survey 7. The number of data access and sharing agreements with stakeholders – principally research
institutions, government data agencies, government research agencies
There are two measures that ANDS will not have full control over, but that are important and will
measure success in influencing others’ behaviour:
8. The number of research data sets in harvestable repositories 9. The number of research data sets with persistent identifiers
There is a final measure that ANDS aspires to – it will be measured but is unlikely to be a useful short-
term KPI
10. The number of times a data set is reused and referenced – the ultimate long term measure
These KPIs address ANDS objectives (refer 2.1) as follows:
The commons: KPIs 1, 2, 4, 5, 7, 8, 9 and the long-term measure 10 address objective A.
Data management: KPIs 3, 6 and ANDS’ long-term measure address objectives B and D.
ANDS 2012-13 Business Plan Page 116 of 118
Repositories: KPIs 3, 8 and 9 address objective C.
Access: KPIs 4, 5, 6, and 7 address objective E.
Use: KPIs 4, 5, 6, 7 and the long term aspirational measure 10 address objectives F and G. (Note –
when KPIs 4 and 5 are being measured, not only use will be noted, but where it is initiated so that
analysis can be done both within and across disciplinary use. The satisfaction survey will be
qualitative, enabling an understanding of how well disciplinary, cross-disciplinary and multinational
interaction is being facilitated.)
The form in which ANDS services are offered will be shaped by adherence to the guidance provided
above. This guidance will be reflected in the business plans, and adherence to this guidance will be
determined in discussion with stakeholders.
Notes:
An ANDS data page is a page generated from the ANDS collections registry that describes a data set,
a collection, a research group, a research project, or an institution.
ANDS will focus on monitoring Institutions that are research data producing organisations, such as
the Bureau of Meteorology, Landsat, the Australian Synchrotron, the Cultural Collections sector, etc.,
and the research data using organisations, such as the Universities, the PFROs, and affiliates. Many
organisations have both roles.
Researchers have many partners in carrying out research and ANDS needs to satisfy their needs as
well – this includes funders, assessors, institutional representatives, such as DVC-Rs, eResearch
Directors, Information providers such as libraries, IT providers such as University ITS Departments,
partner service providers, such as ARCS and NCI, as well as umbrella organisations such as
disciplinary bodies such as the Academies, international research bodies, etc.
The qualitative measures are intended to capture not only usage figures, but also attitudinal
attributes – ANDS only succeeds with cultural change, so this will be measured as well. The first
survey will again set benchmarks, but also help inform future surveys.
8.4.2 Key Performance Indicators for 2012-13
1. The number and coverage of data repositories providing metadata feeds to the national registry compared to the number of data repositories. ANDS intends to build at least 60 automatic plus 40 manual metadata feeds. This will cover at least 60 research data-holding institutions.
2. The number and coverage of institutions and number of research groups with which ANDS has engaged: ANDS will continue to engage with all Australian universities, PFROs, and 20 major Government data providers this year, and through them at least 50 research groups.
3. The number of institutions with research data management policies and practices consistent with ANDS recommendations: 29
4. The number of times a search is initiated with an ANDS discovery service: 25,000. 5. The number of times an ANDS data page (defined below) is accessed: 100,000 (now counting
filtered page views using Google Analytics standard excluding robots spiders etc). 6. The satisfaction of researchers and partners (see below) with ANDS services as measured by an
annual survey - no number can be given here, but a report will be provided.
ANDS 2012-13 Business Plan Page 117 of 118
7. The number of data access and sharing agreements with stakeholders – principally research institutions, government data agencies, government research agencies: we aim to strike at least 30 agreements to make information available.
8. The number of research data sets in the ARDC: more than 40,000 collections 9. The number of research data sets with persistent identifiers: 10,000 (no increase in target) 10. The number of times a data set is reused and referenced – this will be tracked but cannot yet
be reported.
These measures will provide indication of the effectiveness of the ANDS outputs this year and will be
used as indicators to determine whether the overall outcome for 2011-12 is being delivered - the
first time ever, researchers can:
systematically, reliably and authoritatively connect their research data to project, institutional and disciplinary descriptions, and
simultaneously publish citable research data collections through institutional, disciplinary and national services.
8.4.3 Measuring our Performance against the Transformations
The KPIs just described provide a picture of ANDS’ overall performance, but do not explicitly show how well ANDS is addressing the key transformations that focus on our data improvement strategy. In this section we discuss ANDS performance in this light.
Data that is unmanaged into managed collections: ANDS has worked on the basis that data management is best centred at the institutions, so we have focused on improving research data management at institutions. Our KPIs 1, 2 and 3 provide an indication of this capability. More generally we see increase in the people, policies and procedures available at institutions to support research data management. We can also determine whether the relevant systems are in place to manage research data – does an institution have a metadata store, and is there software in place to support automated metadata capture? There has been a substantial improvement in institutional capability in this regard. Another important measure would be the number of funded research projects that have data management plans. This number is currently very small and is only likely to improve dramatically if there are changes in policy settings at funding agencies – this is where the work of the frameworks program is focused.
Data collections that are connected: There a number of ways for data collections to be connected: they may have a persistent identifier that enables them to be connected to publications, they may be described using standard terminology that connects the collections to similar collections, and they may have links to descriptions of the projects that generated them, parties associated with them, and services that are available to operate on them. They may be deliberately associated through an overarching collection, or through commentary. Our KPIs 7, 8 and 9 provide an indication of this capability, but the number of records describing collections in RDA – currently over 32,000 – provides an indication of a much more richly connected ARDC.
Data collections that are discoverable: Discovery of research data can occur in many ways: through an institutional portal, a disciplinary portal, through Research Data Australia, or though internet search. KPIs 4 and 5 give an indication of discovery through Research Data Australia, but it is equally important that data is discovered through the form that makes most sense to a researcher. Whether water data is discovered through a Murray Darling Basin portal, through a TERN portal or through Research Data Australia is unimportant. That the data is discoverable is more important, and the number of feeds between different portals as indicated by KPI 1 is important to maximise discovery.
ANDS 2012-13 Business Plan Page 118 of 118
Data collections that are re-usable: Rather than simply measuring the level of reuse of research data
(quite difficult to do, and with potentially significant lags) it is also useful to measure the enablement
of reuse. The number of tools that support automatic capture of metadata to enable wider use, a
particular focus of the data capture project, is a useful measure. As reported in the Data Capture
program there are 25 tools in place. Equally important is to ensure that there are data tools in place
to enable researchers to reuse the data – this has been the focus of the NeAT program and the
Applications program but has not been counted to date. The value of the National Corpus of
Australian Language has been greatly enhanced not only by collecting the data, but by having
associated analysis tools available, and we see great potential in other areas for similar associations
between data and tools. Data collections that have clear descriptions of licencing terms greatly
facilitate integration and reuse. This has been another key focus of the Frameworks program, but
statistics for this have not yet been gathered. Finally, the behaviour of routinely reusing research
data is greatly enhanced through demonstrations by leading researchers of what is possible. The
number of compelling demonstrations of reuse is intended to increase substantially this year through
the Applications program.