Big Data and Genomics
Al Costa, Alkol Biotech
Low sequencing costs = lots of data
As the cost of sequencing a genome decreases, the DNA of more and more organisms becomes publicly available, meaning more data.
This problem grows if we consider initiatives such as the 1000 Genomes Project, or if we were to sequence everyone in the US today (313 exabytes).
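The 313-exabyte figure can be reproduced with simple arithmetic. The sketch below assumes a US population of about 313 million and roughly 1 terabyte of raw sequencing data per person. Both inputs are assumptions chosen because they match the figure on the slide; the slide itself does not state how the number was derived.

```python
# Back-of-the-envelope estimate. Both inputs are assumptions, not measured values.
US_POPULATION = 313_000_000      # assumed: about 313 million people
BYTES_PER_PERSON = 10**12        # assumed: about 1 TB of raw sequencing data each
BYTES_PER_EXABYTE = 10**18       # decimal (SI) exabyte


def total_exabytes(people: int, bytes_per_person: int) -> float:
    """Total storage, in exabytes, for sequencing `people` individuals."""
    return people * bytes_per_person / BYTES_PER_EXABYTE


print(total_exabytes(US_POPULATION, BYTES_PER_PERSON))
```

Changing either assumption scales the result linearly, so halving the data kept per person halves the total.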
Genomics = lots of data
In fact, the number of bytes needed to store a single genome ranges from millions to billions.
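To see where "millions to billions" comes from, a genome can be stored as one byte per base, or packed at two bits per base since there are only four letters (A, C, G, T). The genome lengths below are rounded, approximate values used only for illustration.

```python
# Approximate genome lengths in base pairs (rounded, illustrative values).
APPROX_GENOME_LENGTHS_BP = {
    "Escherichia coli": 4_600_000,
    "Homo sapiens": 3_100_000_000,
}


def genome_bytes(base_pairs: int, bits_per_base: int = 2) -> int:
    """Bytes needed to store a sequence, rounded up to a whole byte."""
    return (base_pairs * bits_per_base + 7) // 8


for organism, length in APPROX_GENOME_LENGTHS_BP.items():
    packed = genome_bytes(length)        # 2 bits per base
    plain = genome_bytes(length, 8)      # 1 byte per base, as in plain text
    print(f"{organism}: {packed:,} bytes packed, {plain:,} bytes as text")
```

These are sizes for the finished sequence only. Raw sequencer output, which covers each position many times and carries quality scores, is far larger.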
And if you are still unconvinced, take the MinION, by UK company Oxford Nanopore Technologies, which sells for US$900, is the size of a USB stick, and can sequence a human genome in 8 hours.
Comparative genomics = promising
However, if we compare the genomes of different species, we'll realize they share a lot of common ground.
Tools used = complicated
For genomics data we use ADAM, BLAST and several comparison tools
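ADAM is an open-source, high-performance, distributed platform for genomic analysis. ADAM defines:
1 - A data schema and layout on disk
2 - A Scala API
3 - A command line interface
BLAST (Basic Local Alignment Search Tool) is an alignment tool that finds regions of local similarity between a query sequence and the sequences in a database. Reconstructing an entire strand from shotgun chunks is a separate step, normally done by an assembler.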
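Local alignment, the kind of comparison BLAST performs, can be illustrated with the Smith-Waterman algorithm. This is not BLAST itself: BLAST uses a faster heuristic, while Smith-Waterman is the exact dynamic-programming method. The scoring values (match, mismatch, gap) and the toy sequences are assumptions picked for the example.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which lets an alignment start anywhere.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy sequences that share the stretch "ACGT" and nothing else.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8 (four matches, 2 points each)
```

Only two rows of the matrix are kept in memory, which is enough to obtain the score. Recovering the aligned region itself would require storing the full matrix and tracing back from the best cell.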
An example = our project
We are currently using Big Data to find promising strands among millions of DNA sequences, using the tools described above, as I'll explain now.
How we use it = to build new crops
The biobased industry (biofuels, bioplastics, etc.) is currently trying to adapt to unsuitable feedstocks. That is exactly the opposite of what mankind did with food, where it adapted crops to its feeding needs.
Sugarcane = much more than sugar!
Among the feedstocks currently used by the biobased industries, one stands out: sugarcane. However, it currently grows only in tropical regions. A pity, considering the number of products derived from it.
EUnergyCane = European sugarcane
Thus, being able to adapt sugarcane to grow in Europe would mean a lot of new products being sustainably produced. We are halfway through that project with our EUnergyCane variety, the only genuinely European one.
A pine tree and an edelweiss?
Perhaps the only thing a pine tree and an edelweiss have in common is that both can withstand cold places.
Looking for a philosopher's stone
Thus, a comparison between the DNA strands of the pine tree and of the edelweiss should reveal common regions, one of which may be responsible, for example, for giving a crop the ability to withstand the cold.
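A simple way to picture the search for common regions is to list the short substrings (k-mers) that two sequences share. The sketch below is a naive illustration, not the pipeline used in the project, and the two sequences are made-up toy strings rather than real pine or edelweiss DNA.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """All distinct substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmers(a: str, b: str, k: int) -> set[str]:
    """k-mers that occur in both sequences."""
    return kmers(a, k) & kmers(b, k)


# Hypothetical toy sequences; both contain the stretch "GGCCTTAGG".
pine_like = "ATGGCCTTAGGCAT"
edelweiss_like = "CCGGCCTTAGGTTA"

print(sorted(shared_kmers(pine_like, edelweiss_like, 6)))
# ['CCTTAG', 'CTTAGG', 'GCCTTA', 'GGCCTT']
```

Overlapping shared k-mers point to one longer common region. On real genomes the sets become very large, which is where distributed tools come in.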
How we use it = to build new crops
This is how we develop our work: by analyzing DNA strands of crops which can resist the cold in order to find that philosopher's stone which, when inserted into sugarcane, would make it able to grow in Europe. For that, new techniques such as CRISPR/Cas9 avoid the use of plasmids and GMOs.
Conclusion = big data is much more
Big Data is not only for gathering customer data at banks and telcos, but a valuable tool for finding new and unsuspected data in any area of human knowledge.
Its use in genomics may allow finding cures for otherwise incurable diseases, developing new crops with increased capabilities, and much more.
Thank you
11/04/16