Data Science and Visualization
2014 Summer Internship - Tetherless World Constellation
Sumithra Gnanasekar, Lakshmi Chenicheri
Objective
• Visualize Minimum Information about a Marker Gene Sequence (MiMarks) compliant datasets
• A dark data exercise
MiMarks
• A standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences
• Describes the environment from which the sample was taken
• Ensures contextual data is collected and submitted
MiMarks Checklist
Datasets
• Two datasets from a bacterial diversity study from the Western English Channel
• Focused on the seasonal structure of microbial communities
• Dataset 1 was converted from Excel to CSV
• Dataset 2 was converted from SRA to CSV
• Data cleaning was undertaken to retrieve relevant fields
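The cleaning step is not described in detail on the slide. As a minimal sketch, assuming it amounts to keeping a chosen set of columns and dropping rows where any of them is blank, it could look like the following. The column names and sample values are hypothetical and are not taken from either dataset.

```python
import csv
import io


def select_fields(csv_text, wanted):
    """Keep only the wanted columns and drop rows where any of them is blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        # Missing or empty cells become "" so they can be detected uniformly.
        subset = {name: (row.get(name) or "").strip() for name in wanted}
        if all(subset.values()):
            cleaned.append(subset)
    return cleaned


# Hypothetical sample; the values are made up for illustration only.
SAMPLE = """sample_id,depth,nitrate,comment
S1,5,0.8,ok
S2,,1.1,missing depth
S3,10,2.4,
"""

rows = select_fields(SAMPLE, ["sample_id", "depth", "nitrate"])
```

Row S2 is dropped because its depth is blank, while S3 is kept because its only blank cell is in a column that was not requested.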
Tools for Visualization
• R
• Google Charts integrated with R
• Shiny (RStudio)
• D3.js
D3.js was ultimately chosen for its flexibility and the range of visualizations it supports
Scatter Plot Dataset 1
• Allows the user to filter fields
• Drill and expand
• Group based on fields
• Handy in determining correlations between variables
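Grouping by a field can be illustrated outside the browser as well. The sketch below is not the D3.js code behind the plot; it only shows the idea of bucketing records by one field and summarising another. The field names `season` and `temperature` are hypothetical.

```python
from collections import defaultdict


def group_mean(records, group_field, value_field):
    """Group records by one field and return the mean of another per group."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[group_field]].append(float(record[value_field]))
    return {key: sum(values) / len(values) for key, values in buckets.items()}


# Hypothetical records, made up for illustration only.
records = [
    {"season": "winter", "temperature": "8"},
    {"season": "winter", "temperature": "10"},
    {"season": "summer", "temperature": "16"},
]
means = group_mean(records, "season", "temperature")
```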
Analysis of Scatter Plot Dataset 1
• Depth, density, total_Depth of water column, longitude and latitude were found to be independent of the other environmental variables
• Near-linear correlation between nitrate and silicate, and between nitrate and phosphate
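The slides report these relationships from visual inspection of the scatter plots. One way to put a number on "near linear" is the Pearson correlation coefficient, sketched below. This is an added illustration rather than the analysis the authors ran, and the nitrate and phosphate values are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measurements, made up for illustration only.
nitrate = [0.5, 1.0, 2.0, 4.0, 6.0]
phosphate = [0.05, 0.09, 0.16, 0.33, 0.47]
r = pearson_r(nitrate, phosphate)
```

A value of `r` close to 1 or -1 indicates a strong linear relationship; a value near 0 indicates none.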
Scatter Plot Dataset 2
• Allows the user to filter fields
• Drill and expand
Analysis of Scatter Plot Dataset 2
Linear trend seen in the scatter plots of:
1. Spots vs Bases
2. Nitrate vs Phosphate
3. Org_nitro vs Org_carb
4. Temperature vs Density
Temporal Visualization
Allows the user to filter values by time and analyze the effect on the other variables
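The time filter can be sketched as a simple date-range selection over the records. This is an illustration of the idea, not the code behind the visualization; the field name `collection_date`, the ISO date format, and the sample records are assumptions.

```python
from datetime import date


def filter_by_date(records, start, end, field="collection_date"):
    """Return the records whose ISO-formatted date lies in [start, end]."""
    return [
        record
        for record in records
        if start <= date.fromisoformat(record[field]) <= end
    ]


# Hypothetical records, made up for illustration only.
samples = [
    {"sample_id": "S1", "collection_date": "2008-01-15"},
    {"sample_id": "S2", "collection_date": "2008-04-20"},
    {"sample_id": "S3", "collection_date": "2008-08-01"},
]
spring = filter_by_date(samples, date(2008, 3, 1), date(2008, 5, 31))
```

The filtered subset can then be passed to any other chart or statistic, which is what lets a time selection change what the other views show.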
DOI Visualization
• Visually represents DOIs associated with data points
• On clicking a bubble, the metadata for that DOI is fetched and displayed
Bubble Chart
• Visually represents the environment data associated with each sample
• Bubble size corresponds to organism count
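The slide does not say how organism count is mapped to bubble size. A common choice, assumed here, is to make the bubble's area proportional to the count, which means the radius grows with the square root of the count. The function name and the `max_radius` default are hypothetical.

```python
import math


def bubble_radius(count, max_count, max_radius=40.0):
    """Radius such that bubble area is proportional to count (assumed scaling)."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    if count < 0:
        raise ValueError("count must be non-negative")
    return max_radius * math.sqrt(count / max_count)
```

With this scaling, a sample with a quarter of the largest count gets half the largest radius, so its drawn area is a quarter as large.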
RDF Conversion
The RDF conversion for MiMarks compliant datasets involves two steps:
1. Construct an ontology or use an existing one
2. Convert the dataset into triple instances using CSV-to-RDF conversion tools
csv2rdf4lod is an open-source tool that converts the data in a CSV file into RDF
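To make the second step concrete, the sketch below turns each CSV cell into one triple in N-Triples syntax. It is a plain-Python stand-in for illustration and does not use or imitate csv2rdf4lod. The `http://example.org/mimarks/` namespace is a placeholder, not a real ontology, and every value is emitted as a plain string literal.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/mimarks/"  # placeholder namespace (assumption)


def escape_literal(value):
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text, id_column):
    """Emit one N-Triples line per non-empty cell, keyed by the id column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}sample/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the id is the subject; empty cells carry no triple
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Hypothetical input, made up for illustration only.
triples = csv_to_ntriples("sample_id,depth,env\nS1,5,sea water\n", "sample_id")
```

In a real conversion the predicates would come from the ontology chosen in step 1 rather than from the raw column names.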
Spatio-temporal features of MiMarks, VAMPS and CoDL datasets
The following tools and visualizations could be used for the MiMarks, VAMPS and CoDL datasets:
• Planetary.js, an open-source tool, can represent the spatial features interactively
• Motion charts can show change over time, with the changing quantity represented by the size of each bubble
• A calendar-based representation of values is another option when the data is continuous
Links to Visualizations
• Timeline crossfiltering visualization: http://dco.tw.rpi.edu/viz/timeline/index.html
• DOI visualization: http://dco.tw.rpi.edu/viz/doiVis/index.html
• Scatterplot visualization for Dataset 1: http://dco.tw.rpi.edu/viz/scatterPlot/demo/demo.html
• Bubble chart visualization: http://dco.tw.rpi.edu/viz/Bubblechart/bubble_dataset2/index.html
• Scatterplot visualization for Dataset 2: http://dco.tw.rpi.edu/viz/scatterplot_dataset2/demo/demo.html