Linking systems to improve data quality Javier Otegui and Rob Guralnick

DESCRIPTION

My talk at TDWG 2013, showing our work on linking different biodiversity information systems to improve the quality of the data on both sides. We enabled an interoperability workflow between data aggregators, such as GBIF or VertNet, and other biodiversity information aggregators, like Map of Life, which hold information such as IUCN expert range maps, regional checklists or gridded surveys. Although this was a work in progress at the time of the presentation (hence the "warning" sign), we were able to show that intercommunication between both systems allowed us to detect spatio-taxonomic biases and issues in both sources. We also explored the possible causes of those errors and tried to model the error rates, finding that newer data, published through newer mechanisms, showed lower error rates.

We concluded that, although more work is needed to gain a deeper understanding, we are entering a new age of biodiversity information sharing, in which quality, rather than quantity, is becoming the key feature. We also believe that the Integrated Publishing Toolkit (IPT), developed by GBIF, might be the banner of this movement towards better-quality data sharing. This might be because it is an easier tool to use, because it makes building auxiliary tools and mechanisms for improving quality easier, or simply because people are becoming aware of the importance of having a good-quality data set.

Citation preview

Page 1: Linking systems to improve data quality

Linking systems to improve data quality

Javier Otegui and Rob Guralnick

Page 8: Linking systems to improve data quality

>210M occurrence points
+ All country boundaries
+ All IUCN range maps
----------------------------------
9 different spatial and spatio-taxonomic issues

Causes??
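
The kind of check behind this slide, comparing occurrence points against range maps, can be illustrated with a minimal sketch. It is not the workflow used in the talk: the function names, the category labels and the toy square "range map" are all hypothetical, and a real pipeline would use a spatial database or GIS library with proper geographic polygons rather than this planar ray-casting test.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray going east."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify_record(coords: Optional[Point],
                    range_map: Optional[Sequence[Point]]) -> str:
    """Assign one hypothetical issue category to an occurrence record."""
    if coords is None:
        return "no_coordinates"
    if range_map is None:
        return "no_range_map"
    if point_in_polygon(coords, range_map):
        return "inside_range"
    return "outside_range"


# Toy data (assumption): a square range map and three made-up records.
toy_range = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
records = [
    ("rec-1", (5.0, 5.0)),
    ("rec-2", (15.0, 5.0)),
    ("rec-3", None),
]
results = {rec_id: classify_record(coords, toy_range)
           for rec_id, coords in records}
print(results)
```

The same pattern would apply to country boundaries: swap the range map polygon for the boundary of the country stated in the record.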

Page 10: Linking systems to improve data quality

Fewer records without coordinates

More records inside range map

Most populated: IPT

Fewer issues in data: IPT

Proportion of terrestrial vertebrate records with any issue:
Amphibia 12%
Aves 14%
Mammalia 17%
Reptilia 14%

Proportion of terrestrial vertebrate records inside range maps:
Amphibia 30%
Aves 69%
Mammalia 37%
Reptilia 7%

Proportion of terrestrial vertebrate records without range maps:
Amphibia 50%
Aves 7%
Mammalia 38%
Reptilia 76%
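
The last two sets of figures can be combined into a rough derived view. The sketch below assumes, and the slides do not state this, that "inside range map" and "without range map" are mutually exclusive shares of the same total, so that the remainder is records that have a range map but fall outside it. Under that assumption it also computes the share of records inside the range among only those records that have a range map.

```python
# Percentages copied from the slide.
inside = {"Amphibia": 30, "Aves": 69, "Mammalia": 37, "Reptilia": 7}
no_map = {"Amphibia": 50, "Aves": 7, "Mammalia": 38, "Reptilia": 76}

summary = {}
for cls in inside:
    # Assumption: the categories partition 100% of the records in each class.
    with_map = 100 - no_map[cls]
    outside = with_map - inside[cls]
    # Share inside the range among records that have a range map at all.
    share_inside = round(100 * inside[cls] / with_map, 1)
    summary[cls] = (outside, share_inside)

for cls, (outside, share_inside) in summary.items():
    print(f"{cls}: {outside}% outside range map, "
          f"{share_inside}% inside among records with a range map")
```

If the assumption holds, the low "inside range map" figure for Reptilia would mostly reflect missing range maps rather than misplaced records.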

Time Source Some values

Page 12: Linking systems to improve data quality

Thank you!