

Gumby, Professional Selfies, Technology Traps & Irreproducibility

Volume 5 Issue 1

©2016 DataONE 1312 Basehart SE University of New Mexico Albuquerque NM 87106

For many of us, Gumby brings to mind fond childhood or college memories of the green, stretched-out clay character who was accompanied by his trusted sidekick Pokey, a talking orange pony, and Nopey, his dog with a single-word vocabulary: “nope”. Gumby was frequently battling his mischievous nemeses, the Blockheads, who constantly caused chaos. Gumby also reflects how many, if not most, scientists feel these days: stretched way too thin and spending far too much time warding off today’s academic version of the Blockheads—hypercompetition and administrivia (the overwhelming bureaucratic details and record keeping that affect all aspects of our academic/research lives).

In a thoughtful opinion piece in PNAS1, Don Geman and Stuart Geman argue that the pace of science is slowing because scientists today engage less in deep thinking and spend much of their time taking “professional selfies”, i.e. publishing quickly and frequently in “minimum publishable units” and focusing attention on those vita elements that make us visible (e.g. extensive lists of publications, talks, proposals, conferences attended, etc.). They attribute this lack of deep thinking to hypercompetition and the present academic reward structure that focuses on quantity as opposed to quality.

As a consequence of being stretched too thin, researchers often unwittingly fall into technology traps—e.g. software tools that seemingly meet all of our needs and are simple and easy to use (or misuse, as the case may be). One such technology trap is Excel—the ubiquitous spreadsheet program that can be used for organizing and storing data as well as for QA/QC, statistical analysis, modeling and visualization. The literature is full of reports about the problems and mathematical errors incurred when using various versions of Excel for statistical analysis and modeling.

As if these issues are not serious enough, Mark Ziemann and colleagues2 recently examined and reported on another equally challenging problem with Excel: inadvertently converting codes or gene symbols to dates and floating-point numbers (e.g. automatically converting “SEPT2”, a commonly used code for Septin 2, to “2-Sep”). Of the supplementary materials associated with almost 3,600 genomics-related papers examined from journals such as Nature, PLOS One and Science, approximately one-fifth contained erroneous name/data conversions. Such errors clearly also extend to the environmental and Earth sciences communities, which use codes extensively for species names, locations and so forth, although the extent of the problem there has yet to be documented.
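One concrete guard against this failure mode is to screen supplementary tables for values that look like Excel date conversions before analysis. The snippet below is a minimal illustrative sketch, not a published tool; the column name and the regular expression are assumptions that would need tuning for real data.

```python
import re

# Excel silently converts gene symbols such as "SEPT2" or "MARCH1" into
# date-like strings ("2-Sep", "1-Mar"). This illustrative checker flags
# values that match the pattern such conversions produce.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$"
)

def flag_converted_symbols(rows, column):
    """Return (row_number, value) pairs whose value looks like an
    Excel date conversion of a gene symbol or other code."""
    return [
        (i, row[column])
        for i, row in enumerate(rows, start=1)
        if DATE_LIKE.match(row.get(column, "").strip())
    ]

rows = [{"gene": "TP53"}, {"gene": "2-Sep"}, {"gene": "1-Mar"}]
print(flag_converted_symbols(rows, "gene"))  # [(2, '2-Sep'), (3, '1-Mar')]
```

A simpler preventive measure is to import code columns as text in the first place (for example, reading CSV files with all-string types) rather than letting the spreadsheet guess.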

Errors such as those discussed above contribute to the alarming irreproducibility crisis in science. According to a recent Nature survey3, “more than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.” In addition to faulty software, immediate causes of shoddy work and irreproducibility include inadequate metadata (the documentation that describes the data and analyses), poor use of statistics, difficult and finicky techniques and protocols, incomplete data, and selective reporting. But all of the factors attributed to the crisis are greatly exacerbated by hypercompetition for grants and positions, as well as by the heavy bureaucratic burdens that are placed on scientists.

Hypercompetition, administrivia and irreproducibility are difficult and seemingly insolvable problems. Individual scientists interviewed as part of the Nature study3 reported that efforts to improve reproducibility, such as improving documentation, pre-registering studies, standardizing experimental methods and repeating their own experiments, required 30-100% more time. That said, the path forward is straightforward albeit challenging:

cont’d page 4 ›››


Fall 2016


WorkingGroupFOCUS Usability and Assessment Working Group

One of the important tasks of the Usability and Assessment (U&A) Working Group is to assess the value and usability of DataONE products and services. By observing scientists at work, U&A helps DataONE to provide better and more effective tools and services.

UX testing measures the usability, efficiency, and effectiveness of a product or system by capturing the experience of users to identify any problems. DataONE has incorporated UX testing into every stage of the design process. The Usability and Assessment Working Group incorporates user surveys and personas throughout all stages of the project to continuously identify user needs. During the design and development phases, heuristic evaluations, prototype testing, and eye tracking studies are completed to identify any problems before the product is released. Once the product is released, iterative UX testing is performed to ensure the product continues to meet the users’ needs.

DataONE UX testing is conducted wherever the users are: at conferences, at scientists’ workplaces, and in the University of Tennessee’s state-of-the-art User Experience Lab. In addition to going where the users are, the U&A working group solicits a wide range of users to participate in UX testing. Users come from the DataONE users group, friends of friends, and a scientist panel, a voluntary list of scientists who are willing to perform quick UX scenarios with little notice. DataONE’s goal is to perform UX testing with user groups that represent all aspects of the DataONE community.

From DataONE’s beginning, UX has been an important part of the project. In particular, in the past two years the U&A working group has performed over 20 UX tests, including tests of the DataONE Search, the MATLAB Toolbox, and the DataONE website, as well as Member Node UX testing for the United States Geological Survey (USGS) and the United States Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) Program.

At this year’s DataONE User Group Meeting (DUG) in Durham, North Carolina on July 17th and 18th 2016, UX testing evaluated the DataONE Search. Throughout its development and life, the DataONE Search has undergone multiple UX tests. The goal of the DUG testing was to assess the usability of new search features added to the interface. Users were asked to: (1) sign into the search; (2) locate the area to create groups; (3) find Member Node profiles; and (4) perform searches using multiple filters. Testing identified potential usability issues with the sign-in button, access to Member Node profiles, the map functions (e.g., showing the world map, using geographical locators/icons), the heading of the summary page, and engaging/disengaging the map filter; overall, it showed that UX testing has improved the search. Changes were made to the search UI as a result of the UX results.

A guiding principle in UX design is “don’t make the users think.” To fulfill that principle, DataONE’s UX program has identified some basic UX lessons. First, DataONE’s users have opinions and want to be heard. When users are involved in the design and development of a product, they feel a sense of pride in, and relationship with, the project.

Second, users have existing habits that need to be accounted for in the design. For example, users tend not to read large bodies of text and prefer quickly scanning a page for the necessary information.

Third, it is important to keep in mind that the DataONE developers and designers are not the users, and involving the users early and throughout the design process creates better products.

Finally, trust is important. When problems arise and the user cannot solve the issue or seek help, they lose trust in the product. If, for example, they used the mapping feature in the Search to limit their search to data sets in Australia and they receive data sets from North America, they worry that the results they are given may not be accurate.

As a result of the UX lessons DataONE has learned over the years, the U&A team has created a list of UX best practices to keep in mind as new tools and products are developed:

• Conduct iterative testing
• Use a simple and clean design
• Remember aesthetics are important
• Provide help
• Have a feedback mechanism
• Open new tabs
• Make links obvious

Figure 1: Heat map created through eye-tracking study

Figure 2: UX testing (top) and an eye tracking study (bottom) at the University of Tennessee’s User Experience Laboratory

cont’d page 4 ›››


Each Member Node within the DataONE federation completes a description document summarizing the content, technical characteristics and policies of their resources. These documents can be found on the DataONE.org site at bit.ly/D1CMNs. In each newsletter issue we will highlight one of our current Member Nodes.

National Ecological Observatory Network (NEON) https://www.neonscience.org

The National Ecological Observatory Network (NEON) is a continental-scale ecological observation facility sponsored by the National Science Foundation and operated by Battelle that gathers and synthesizes data on the impacts of climate change, land use change and invasive species on natural resources and biodiversity.

The NEON project was conceived as an instrument to advance the ability of scientists to examine and understand the interactions between life and the environment at the scale of an entire continent. During this century, technology and research tools have rapidly evolved to support science at a scale that hardly anyone imagined a generation ago. Hundreds of scientists and engineers have contributed their expertise to plan, design and operate a grand instrument that could harness the power of networked technology to gather and provide high-quality information on interactions between land, life, water and climate across a continent and over the course of a human generation.

The observatory is designed to collect high-quality, standardized data from 81 field sites (47 terrestrial and 34 aquatic) across the U.S. (including Alaska, Hawaii and Puerto Rico). Data collection methods are standardized across sites and include in situ instrument measurements, field sampling, and airborne remote sensing. Field sites are strategically selected to represent different regions of vegetation, landforms, climate, and ecosystem performance. NEON data and resources are freely available to enable users to tackle scientific questions at scales not accessible to previous generations of ecologists.

The NEON project entered its construction phase in Spring 2012, after a five-year planning and design phase for the infrastructure. As sites are constructed and transition to operations, “provisional”-grade data products are increasingly available on the Data Portal or by request, and are free to use. Nearly 50 data products are currently available, including physical data such as wind speed, air temperature, and surface and groundwater physical properties; observational data, including plant biomass and small mammal abundance; and remote sensing data, including LiDAR and hyperspectral data. Physical and biological samples are also increasingly available for use by researchers on request; to date these include small mammal fecal samples, genomic extracts for beetles and mosquitoes, and soil samples.

NEON’s data services are rapidly expanding as well. Some remote sensing data that were previously only shipped on hard drives can now be downloaded by request. New data products from instrumented and observational monitoring are being posted on the data portal on an ongoing basis. Structured metadata in the Ecological Metadata Language (EML) format are also now available with each data package downloaded from the data portal. These files are also shared with the DataONE search to promote data discovery. Most recently, an early prototype of a public NEON API (application programming interface) has been released for access to information about data products and study locations as well as for direct access to data. As this is an early release, community feedback is highly welcome.
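As an illustration of how such an API is typically used, the sketch below builds a product-description request URL and parses a JSON response of the general shape REST data services return. Because the API described above is an early prototype, the base URL, endpoint path and response fields shown here are assumptions for illustration, not NEON’s documented interface.

```python
import json

# Assumed base URL and endpoint layout -- illustrative only; consult the
# released NEON API documentation for the real interface.
BASE = "https://data.neonscience.org/api"

def product_url(product_code):
    """Build a request URL for a data-product description."""
    return f"{BASE}/products/{product_code}"

# Parsing a mocked response of the general shape such APIs return:
sample = '{"data": {"productCode": "DP1.00001.001", "productName": "2D wind speed and direction"}}'
record = json.loads(sample)["data"]
print(product_url(record["productCode"]))
print(record["productName"])
```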

The insights gleaned from NEON data and tools may inform decisions at the national and community levels that will impact natural resource management and human well-being for generations to come.

MemberNodeDESCRIPTION


CoverSTORY cont’d

• Geman and Geman1 recommend that the hiring, promotion and reward system be changed to focus on “reviewing a small number of publications chosen by the candidate”, an action that would result in “more time spent on each project” and “less time taking professional selfies.”

• The burgeoning tidal wave of administrivia may be most difficult to stem, as bureaucrats and administrators seem inclined to take two steps back for every step forward. At a minimum, departments, colleges, and research offices can and should provide significant administrative support to scientists. All too many researchers are failing at work-life balance, study reproducibility and quality-over-quantity because they must immerse themselves in the endless details of local and national bureaucratic rules, regulations and financial reporting systems.

• Researchers and students can all benefit from training in and adoption of research data management concepts, practices and enabling tools such as those provided by DataONE and the Center for Open Science. Research reproducibility starts with having a solid data management plan and proceeds when attention is paid to: (1) using standardized and well-documented approaches for data collection, QA/QC and analysis; (2) creating comprehensive metadata that allow you and others to fully understand and replicate a study; and (3) quality-assuring the data and the workflows using statistical and graphical QA/QC methods and by sharing data and results with others.
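As one small example of the statistical QA/QC screening mentioned in item (3), a robust outlier check can catch sensor glitches before data are analyzed or shared. The median-absolute-deviation rule and the cutoff below are illustrative choices, not a prescribed standard.

```python
import statistics

def flag_outliers(values, k=5.0):
    """Return indices of values far from the median, using the median
    absolute deviation (MAD), which a single bad value cannot inflate
    the way it inflates a standard deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [i for i, v in enumerate(values) if abs(v - med) > k * mad]

# A stream-temperature series with one likely sensor glitch:
temps = [12.1, 11.8, 12.4, 12.0, 99.9, 11.9]
print(flag_outliers(temps))  # [4]
```

Flagged values still warrant a human look (a graphical check against the rest of the series) before being corrected or excluded, and any such decision belongs in the study’s metadata.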

• Lastly, like Gumby, we must all be vigilant in fending off the Blockheads. This entails recognizing the causes and consequences of hypercompetition and administrivia and fighting back whenever possible through such activities as serving on promotion and tenure committees, responding to requests for comments regarding policies and procedures that affect you and your colleagues (just say “nope” to bad policies!), and promoting scientific and data management best practices. n

— Bill Michener, Principal Investigator

1 Don Geman and Stuart Geman. 2016. Science in the age of selfies. PNAS 113(34): 9384-9387.

2 Mark Ziemann, Yotam Eren and Assam El-Osta. 2016. Gene name errors are widespread in the scientific literature. Genome Biology 17:177.

3 Monya Baker. 2016. 1,500 scientists lift the lid on reproducibility. Nature 533: 452-454.

Members of the DataONE Team will be at the following events. Full information on training activities can be found at bit.ly/D1Training and our calendar is available at bit.ly/D1Events.

Dec. 12-16 American Geophysical Union Meeting San Francisco, CA https://fallmeeting.agu.org/2016/

Dec. 12-13 Coalition for Networked Information Washington, DC https://www.cni.org/events/membership-meetings/upcoming-meeting/fall-2016

Feb. 20-23 International Digital Curation Conference (IDCC) Edinburgh, UK http://www.dcc.ac.uk/events/idcc17

Apr. 5-7 Research Data Alliance Plenary Barcelona, Spain https://www.rd-alliance.org/plenaries/rda-ninth-plenary-meeting-barcelona

UpcomingEVENTS

WorkingGroupFOCUS cont’d

As DataONE goes forward, UX testing will continue to be involved throughout the design and development process to help DataONE effectively meet its mission and objectives. UX testing is also available for DataONE Member Nodes. For more information on DataONE UX testing, please contact Rachel Volentine ([email protected]).

— Rachel Volentine, User Experience Researcher, University of Tennessee

OutreachUPDATE

Concurrent with the start of a new academic calendar we launched the 2nd full season of the DataONE Webinar Series. Our opening webinar was a panel presentation from International Data Week, a five-day meeting of inspirational and motivating talks that comprised the SciDataCon, International Data Forum and RDA line-up. Jens Klump (CSIRO), Heidi Lane (University of Helsinki) and Fiona Murphy (University of Reading) gave an insightful overview of the incentives, challenges and barriers that face researchers when sharing data, despite the audio challenges we encountered with an off-site webinar. As with all our webinars, a recording was made publicly available and that webinar can be viewed at https://www.dataone.org/webinars/incentives-challenges-barriers-exploring-social-institutional-and-economic-reasons-sharing.

In October, Stephanie Hampton from the University of Washington joined us to talk about “How not to collect data”: stories from the field, so to speak. Stephanie provided a very candid and entertaining review of her journey in the world of data management, imparting her lessons learned. That webinar can be viewed at https://www.dataone.org/webinars/how-not-collect-data-organizing-data-long-term-use-and-re-use and some of the material she presented can be found in Lesson 4 of the DataONE Education Modules (https://www.dataone.org/education-modules).

Let us know how you value DataONE

Complete this brief survey on our Tools and Services:

http://bit.ly/D1MemberSvy


CyberSPOT

Status Update

The DataONE production environment now has 34 participating Member Nodes after the addition of the National Ecological Observatory Network (NEON) Member Node (see page 3). Combined, these Member Nodes provide access to more than 127,759 publicly readable, current-version data sets comprised of 238,171 metadata and 407,431 data objects. A total of 1,074,002 individual objects are resolvable and retrievable through DataONE and the participating Member Nodes.

The most recent update to the DataONE infrastructure included internal updates to the Metacat software and metadata indexing software on the Coordinating Nodes. These incremental updates help resolve issues encountered when processing certain types of metadata where the referenced schemas may overlap in some features, such as namespace, with other standards. Numerous security and other patches have also been applied across all infrastructure to help ensure ongoing reliable services.

The development team is currently busy testing the next production release, which will include updates to the operating systems and Java version on all the DataONE infrastructure. Besides these updates, the new release also significantly improves metadata indexing performance, and so will allow more frequent iterations on index tuning to ensure that DataONE is able to provide continually improving performance on precision and recall of searches conducted through the search web interface or more directly through the query APIs. n

Figure 1: Counts of data/metadata/resource maps uploaded to DataONE since release in July 2012

1312 Basehart SE
University of New Mexico
Albuquerque, NM 87106

Fax: 505.246.6007

DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under a Cooperative Agreement.

Project Director:
William [email protected]
505.814.7601

Executive Director:
Rebecca [email protected]
505.382.0890

Director of Community Engagement and Outreach:
Amber [email protected]
505.205.7675

Director of Development and Operations:
Dave [email protected]

OutreachUPDATE cont’d

Moving forward into the remainder of this year and into 2017 we aim, as always, to provide webinar presentations that engage participants in relevant and cutting-edge topics concerning data management within the Earth and environmental sciences. Topics may be broad conceptual themes or more specific instructional webinars focused on open science, stages of the data lifecycle or community tools for data management. The upcoming season of webinar topics and speakers includes:

• November: Analyzing, interpreting, and implementing data management plans. Amanda Whitmire, Heidi Imker, Sarah Jones (register here: https://attendee.gotowebinar.com/register/6815189062176067330)
• December: Data Preservation - title TBA. Nancy McGovern
• January: Currently open
• February: Jupyter Notebooks - title TBA. Fernando Perez
• March: Data Collection - title TBA. Bob Arko
• April: Currently open
• May: Currently open

We are always open to suggestions for speakers and topics, and welcome your feedback via [email protected]. Please also join us online to view previous webinars and engage in our interactive discussion forum. n

[Figure 1 axis data: counts in thousands (0-450), by date uploaded to DataONE from Jun ’12 to May ’16; series: Data, Metadata, Resource maps]