Ghent University Global Campus (GUGC) Research Seminar
Wesley De Neve
Ghent University – iMinds & KAIST
Het Pand, Ghent, Belgium
January 19, 2015
Towards Using Multimedia Technology for Biological Data Processing
Background
• Credentials
- Master’s degree in computer science, Ghent University, Belgium (2002)
- Ph.D. degree in computer science engineering, Ghent University, Belgium (2007)
• Employment
- Multimedia Lab @ Ghent University - iMinds, Belgium (since 2011)
- Image and Video Systems Lab @ KAIST, Korea (since 2007)
Research Activities
• Main track
- machine learning for social media and video content understanding
• Side track
- compression of genomic data using video coding tools
Problem Statement
• Challenge: data handling
- the ability of researchers to sequence DNA is outrunning their ability to store, transmit, and analyze DNA
• Research question
- can DNA be compressed with video coding tools in order to alleviate storage, transmission, and analysis problems?
DNA Compression Framework
[Figure: pipes-and-filters diagram in which an input filter, an encoding filter, an output filter, and a statistics component are connected by pipes]
• Input filter
- software for reading DNA data from the hard disk or the network
• Encoding filter
- software for compressing DNA data
• Output filter
- software for writing DNA data to the hard disk or the network
• Statistics
- software for gathering compression performance statistics
Characteristics (1)
• Modular and extensible
- thanks to the use of the pipes and filters design pattern
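The pipes and filters arrangement can be sketched in Python, with generators acting as filters and plain iteration acting as the pipes. This is only an illustration under stated assumptions, not the framework's actual code: the names `input_filter`, `encoding_filter`, `output_filter`, and `run_pipeline`, the `stats` dictionary, the block size, and the use of zlib as a stand-in encoder are all hypothetical.

```python
import io
import zlib


def input_filter(stream, block_size=1024):
    """Read raw sequence data from a binary stream, one block at a time."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield block


def encoding_filter(blocks, stats):
    """Compress each block (zlib is a stand-in encoder) and record sizes."""
    for block in blocks:
        encoded = zlib.compress(block)
        stats["bytes_in"] += len(block)
        stats["bytes_out"] += len(encoded)
        yield encoded


def output_filter(encoded_blocks, sink):
    """Write each compressed block, prefixed with its 4-byte length."""
    for encoded in encoded_blocks:
        sink.write(len(encoded).to_bytes(4, "big"))
        sink.write(encoded)


def run_pipeline(source, sink, block_size=1024):
    """Chain the filters; the statistics are gathered as a side channel."""
    stats = {"bytes_in": 0, "bytes_out": 0}
    output_filter(encoding_filter(input_filter(source, block_size), stats), sink)
    return stats


# Toy input: an in-memory stream instead of a file or network socket.
source = io.BytesIO(b"ACGT" * 5000)
sink = io.BytesIO()
stats = run_pipeline(source, sink)
```

Because each filter only consumes and produces blocks, swapping the encoder or adding another filter would not require changes to the other stages, which is the modularity the slide refers to.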
Characteristics (2)
• Block-based compression
- allows selecting the best compression tool per block (adaptivity)
- enables random access, streaming, and parallel processing
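A minimal sketch of per-block tool selection and random access, assuming general-purpose codecs from the Python standard library as stand-ins for the framework's compression tools. The codec table, the function names `compress_blocks` and `read_block`, and the block size are hypothetical and chosen only for illustration.

```python
import bz2
import lzma
import zlib

# Hypothetical codec table: id -> (compress, decompress).
CODECS = {
    0: (zlib.compress, zlib.decompress),
    1: (bz2.compress, bz2.decompress),
    2: (lzma.compress, lzma.decompress),
}


def compress_blocks(data, block_size):
    """Compress each block with whichever codec yields the smallest output."""
    blocks = []
    for start in range(0, len(data), block_size):
        raw = data[start:start + block_size]
        codec_id, payload = min(
            ((cid, enc(raw)) for cid, (enc, _) in CODECS.items()),
            key=lambda item: len(item[1]),
        )
        # Store the codec id next to the payload so the block is self-describing.
        blocks.append((codec_id, payload))
    return blocks


def read_block(blocks, index):
    """Random access: decode one block without touching the others."""
    codec_id, payload = blocks[index]
    return CODECS[codec_id][1](payload)


# Toy input with regions of different statistics.
data = b"ACGT" * 2000 + b"N" * 4000 + b"GATTACA" * 1000
blocks = compress_blocks(data, block_size=4096)
```

Since every block is encoded independently, blocks could also be compressed or decoded concurrently, or sent as soon as they are ready, which is what makes streaming and parallel processing possible.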
Characteristics (3)
[Figure: diagram labeled Efficiency, Functionality, and Effectiveness, showing the proposed solution and the state of the art (SOTA)]
• Allowing for a flexible trade-off between efficiency, effectiveness, and functionality has always been a major design goal
Experimental Results
• Effectiveness: compression of the human Y chromosome
• Efficiency: no meaningful measurements thus far
Format                    File size (MB)
No compression (FASTA)    18.70
Binary                    7.01
Huffman                   5.16
Proposed framework        4.26
(*) Tom Paridaens, Yves Van Stappen, Wesley De Neve, Peter Lambert, Rik Van de Walle, Towards block-based compression of genomic data with random access functionality, Proceedings of the IEEE GlobalSIP 2014 Workshop on Genomic Signal Processing and Statistics
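To put the file sizes in perspective, the short script below derives compression ratios from the table above. The bits-per-symbol column rests on an assumption that is not stated in the slides: that the FASTA file spends one 8-bit character per nucleotide, ignoring header lines and line breaks. Under that assumption the figures come out at roughly 3.0, 2.2, and 1.8 bits per nucleotide for the binary, Huffman, and proposed formats.

```python
# File sizes in MB, copied from the results table.
sizes_mb = {
    "No compression (FASTA)": 18.70,
    "Binary": 7.01,
    "Huffman": 5.16,
    "Proposed framework": 4.26,
}

baseline = sizes_mb["No compression (FASTA)"]
results = {}

for name, size in sizes_mb.items():
    ratio = baseline / size
    # Assumption: one 8-bit character per nucleotide in the FASTA file.
    bits_per_symbol = 8 * size / baseline
    results[name] = (ratio, bits_per_symbol)
    print(f"{name:24s} ratio {ratio:4.2f}  ~{bits_per_symbol:4.2f} bits/symbol")
```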
Future Research (1)
• Compression
- integration of advanced entropy coding
- support for the protein alphabet
- performance optimizations (I/O, GPU)
• Privacy protection
- encryption
• Streaming
• Compressed-domain manipulation
- only download and decode that part of the compressed genome that belongs to a particular gene (region-of-interest)
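The region-of-interest idea can be sketched as follows, assuming fixed-size blocks that are compressed independently. The helper names `blocks_for_region` and `extract_region`, the block size, the region coordinates, and the use of zlib are hypothetical. The sketch only shows how a position range maps onto the blocks that need to be fetched and decoded.

```python
import zlib


def blocks_for_region(start, end, block_size):
    """Return the indices of the blocks covering positions [start, end)."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    first = start // block_size
    last = (end - 1) // block_size
    return range(first, last + 1)


def extract_region(read_block, start, end, block_size):
    """Decode only the needed blocks and trim them to [start, end)."""
    needed = blocks_for_region(start, end, block_size)
    decoded = b"".join(read_block(i) for i in needed)
    offset = start - needed[0] * block_size
    return decoded[offset:offset + (end - start)]


# Toy genome stored as independently compressed 1000-symbol blocks.
BLOCK_SIZE = 1000
genome = b"ACGTTGCA" * 1000
stored = [
    zlib.compress(genome[i:i + BLOCK_SIZE])
    for i in range(0, len(genome), BLOCK_SIZE)
]

fetched = []  # records which blocks were actually decoded


def fetch_block(index):
    fetched.append(index)
    return zlib.decompress(stored[index])


# Hypothetical gene occupying positions 2500..4199.
region = extract_region(fetch_block, 2500, 4200, BLOCK_SIZE)
```

In this toy run only three of the eight blocks are decoded, which is the saving the region-of-interest bullet is after.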
Future Research (2)
• From
- What video coding technologies can be re-used in the context of DNA data compression?
• To
- What multimedia technologies can be re-used in the context of biological data processing?
References
[1] Tom Paridaens, Wesley De Neve, Peter Lambert, Rik Van de Walle, Genome Sequences as Media Files: Towards Effective, Efficient, and Functional Compression of Genomic Data, Proceedings of DCBIOSTEC 2014
[2] Tom Paridaens, Yves Van Stappen, Wesley De Neve, Peter Lambert, Rik Van de Walle, Towards block-based compression of genomic data with random access functionality, Proceedings of the IEEE GlobalSIP 2014 Workshop on Genomic Signal Processing and Statistics