Towards Using Multimedia Technology for Biological Data Processing

Ghent University Global Campus (GUGC) Research Seminar

Wesley De Neve

Ghent University – iMinds & KAIST

Het Pand, Ghent, Belgium

January 19, 2015




Background

• Credentials

- Master’s degree in computer science (2002) at Ghent University, Belgium

- Ph.D. degree in computer science engineering (2007) at Ghent University, Belgium

• Employment

- Multimedia Lab @ Ghent University - iMinds, Belgium (since 2011)

- Image and Video Systems Lab @ KAIST, Korea (since 2007)

Teaching Activities

• Informatics 1

• Informatics 2

Research Activities

• Main track

- machine learning for social media and video content understanding

• Side track

- compression of genomic data using video coding tools

In what follows…

COMPRESSION OF GENOMIC DATA USING VIDEO CODING TOOLS

Context

• DNA sequencing (digitization) is quickly becoming cheaper

Problem Statement

• Challenge: data handling

- the ability of researchers to sequence DNA is outrunning their ability to store, transmit, and analyze DNA

• Research question

- can DNA be compressed with video coding tools in order to alleviate storage, transmission, and analysis problems?

DNA Compression Framework

[Diagram: an input filter, an encoding filter, an output filter, and a statistics component, connected by pipes]

• Input filter

- software for reading DNA data from the hard disk or the network

• Encoding filter

- software for compressing DNA data

• Output filter

- software for writing DNA data to the hard disk or the network

• Statistics

- software for gathering compression performance statistics

Characteristics (1)

• Modular and extensible

- thanks to the use of the pipes and filters design pattern
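To make the pipes and filters pattern concrete, here is a minimal Python sketch, not the framework's actual code. Generators play the role of pipes. The names `input_filter`, `encoding_filter`, `output_filter`, and `Statistics` are hypothetical, and `zlib` merely stands in for the real encoding tools, which the slides do not specify.

```python
import zlib
from typing import Iterable, Iterator, List


class Statistics:
    """Gathers simple compression statistics (hypothetical helper)."""

    def __init__(self) -> None:
        self.raw_bytes = 0
        self.coded_bytes = 0

    def record(self, raw_size: int, coded_size: int) -> None:
        self.raw_bytes += raw_size
        self.coded_bytes += coded_size


def input_filter(sequence: str, block_size: int) -> Iterator[bytes]:
    """Input filter: yields the DNA data block by block.

    Reads from an in-memory string here; a real filter would read
    from the hard disk or the network.
    """
    data = sequence.encode("ascii")
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]


def encoding_filter(blocks: Iterable[bytes], stats: Statistics) -> Iterator[bytes]:
    """Encoding filter: compresses each block (zlib is only a stand-in)."""
    for block in blocks:
        coded = zlib.compress(block, 9)
        stats.record(len(block), len(coded))
        yield coded


def output_filter(blocks: Iterable[bytes]) -> List[bytes]:
    """Output filter: collects coded blocks instead of writing them out."""
    return list(blocks)


if __name__ == "__main__":
    stats = Statistics()
    coded = output_filter(encoding_filter(input_filter("ACGT" * 1000, 512), stats))
    print(len(coded), "blocks,", stats.raw_bytes, "->", stats.coded_bytes, "bytes")
```

Because each filter only consumes and produces an iterable of blocks, one filter can be swapped for another (a different encoder, a network writer) without touching the rest of the chain, which is the modularity the slide refers to.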

Characteristics (2)

• Block-based compression

- allows selecting the best compression tool per block (adaptivity)

- enables random access, streaming, and parallel processing
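The two block-based properties can be illustrated with a short Python sketch under stated assumptions: the general-purpose codecs `zlib`, `bz2`, and `lzma` stand in for the framework's compression tools, and the helpers `encode_blocks` and `read_range` are hypothetical. Each block is coded with whichever codec yields the smallest output, and a range query decodes only the blocks that overlap it.

```python
import bz2
import lzma
import zlib
from typing import List, Tuple

# Codec id -> (compress, decompress). These are stand-ins, not the
# tools used by the framework described in the slides.
CODECS = {
    0: (zlib.compress, zlib.decompress),
    1: (bz2.compress, bz2.decompress),
    2: (lzma.compress, lzma.decompress),
}

Block = Tuple[int, bytes]  # (codec id, compressed payload)


def encode_blocks(data: bytes, block_size: int) -> List[Block]:
    """Compress each fixed-size block with the codec giving the smallest output."""
    blocks: List[Block] = []
    for start in range(0, len(data), block_size):
        raw = data[start:start + block_size]
        candidates = [(cid, comp(raw)) for cid, (comp, _) in CODECS.items()]
        blocks.append(min(candidates, key=lambda item: len(item[1])))
    return blocks


def read_range(blocks: List[Block], block_size: int, start: int, end: int) -> bytes:
    """Random access: decode only the blocks overlapping [start, end)."""
    if start >= end:
        return b""
    first = start // block_size
    last = (end - 1) // block_size
    decoded = b"".join(
        CODECS[cid][1](payload) for cid, payload in blocks[first:last + 1]
    )
    offset = start - first * block_size
    return decoded[offset:offset + (end - start)]
```

Since every block is self-contained, blocks could also be encoded or decoded in parallel, or sent over a network one at a time; the sketch leaves those parts out.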

Characteristics (3)

• Allowing for a flexible trade-off between efficiency, effectiveness, and functionality has always been a major design goal

[Figure: the proposed solution compared with the state of the art (SOTA) along three axes: efficiency, effectiveness, and functionality]

Experimental Results

• Effectiveness: compression of the human Y chromosome

• Efficiency: no meaningful measurements thus far

Format                    File size (MB)
No compression (FASTA)    18.70
Binary                    7.01
Huffman                   5.16
Proposed framework        4.26

(*) Tom Paridaens, Yves Van Stappen, Wesley De Neve, Peter Lambert, Rik Van de Walle, Towards block-based compression of genomic data with random access functionality, Proceedings of the IEEE GlobalSIP 2014 Workshop on Genomic Signal Processing and Statistics
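The file sizes in the table can be turned into compression ratios and approximate bits per symbol with a few lines of Python. This is only a worked calculation on the reported numbers. The bits-per-symbol figure assumes that the FASTA file stores one byte per symbol and that header lines and line breaks are negligible, which the slides do not state.

```python
# File sizes in MB, copied from the table on this slide.
FASTA_MB = 18.70
SIZES_MB = {"Binary": 7.01, "Huffman": 5.16, "Proposed framework": 4.26}


def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """Original size divided by compressed size."""
    return original_mb / compressed_mb


def bits_per_symbol(original_mb: float, compressed_mb: float,
                    input_bits: float = 8.0) -> float:
    """Approximate coded bits per symbol.

    Assumption: the uncompressed file spends `input_bits` bits per symbol.
    """
    return input_bits * compressed_mb / original_mb


for name, size_mb in SIZES_MB.items():
    print(f"{name}: ratio {compression_ratio(FASTA_MB, size_mb):.2f}, "
          f"{bits_per_symbol(FASTA_MB, size_mb):.2f} bits/symbol")
```

Under that assumption the proposed framework reaches a ratio of roughly 4.4 relative to FASTA, and the binary format comes out at close to 3 bits per symbol.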

Future Research (1)

• Compression

- integration of advanced entropy coding

- support for the protein alphabet

- performance optimizations (I/O, GPU)

• Privacy protection

- encryption

• Streaming

• Compressed-domain manipulation

- only download and decode the part of the compressed genome that belongs to a particular gene (region-of-interest)

Future Research (2)

• From (past)

- What video coding technologies can be re-used in the context of DNA data compression?

• To (future)

- What multimedia technologies can be re-used in the context of biological data processing?

Thank you for your attention

Any questions or comments?

References

[1] Tom Paridaens, Wesley De Neve, Peter Lambert, Rik Van de Walle, Genome Sequences as Media Files: Towards Effective, Efficient, and Functional Compression of Genomic Data, Proceedings of DCBIOSTEC 2014

[2] Tom Paridaens, Yves Van Stappen, Wesley De Neve, Peter Lambert, Rik Van de Walle, Towards block-based compression of genomic data with random access functionality, Proceedings of the IEEE GlobalSIP 2014 Workshop on Genomic Signal Processing and Statistics