13
Enabling Data Analytics in a SSD A Converged Platform Vladimir Alves, PhD CTO – NxGn Data Flash Memory Summit 2015 Santa Clara, CA 1

Enabling Data Analytics in a SSD - Flash Memory Summit Data Analytics in a SSD A Converged Platform Vladimir Alves, PhD CTO – NxGn Data Flash Memory Summit 2015 Santa Clara, CA 1

  • Upload
    vophuc

  • View
    219

  • Download
    3

Embed Size (px)

Citation preview

Enabling Data Analytics in a SSD

A Converged Platform Vladimir Alves, PhD CTO – NxGn Data

Flash Memory Summit 2015 Santa Clara, CA

1

The Evolution of Storage

Flash Memory Summit 2015 Santa Clara, CA

2

Since 1956 storage devices have been performing the same basic functions: read and write and sometimes fail…

Flash Memory Summit 2015 Santa Clara, CA

3

Computational Storage

Flash Memory Summit 2015 Santa Clara, CA

4

A  Converged  Pla.orm  that  takes  compute  to  data    

High performance storage

Computational capabilities

Virtualization

A Paradigm Shift in Storage

Flash Memory Summit 2015 Santa Clara, CA

5

Flash  Media  

HOST  I/F  &  FLASH  MANAGEMENT  

OS  

HW  ACCELERATION  

VIRTUALIZATION  (CONTAINER)   SW  AGENTS  VIRTUAL  TUNNEL  

(TCP/IP)  

+  

In-­‐Situ  processing  

A Paradigm Shift in Storage

Flash Memory Summit 2015 Santa Clara, CA

6

+  

In-­‐Situ  processing  

§  ~10x  accelera&on  in  data  analy&cs  §  Virtually  eliminate  network  &  interface  traffic  §  90%  reduc&on  in  energy  consump&on  §  75%  cost  reduc&on  

In-­‐Situ  Processing    

A Paradigm Shift in Storage

Flash Memory Summit 2015 Santa Clara, CA

7

+  

In-­‐Situ  processing  

Distributed  Grep  on  Amazon  S3/  CloudFiles  Ø  Low-­‐cost  search/scan  feature  in  cloud-­‐storage    

In-­‐situ  Cloud–enabled  MapReduce  Ø  Low-­‐cost  MapReduce  embedded  into  cloud-­‐storage  

Swi]/Ceph  Storage  Node  Ø  Lightweight,  low-­‐cost  Storage  Node  for  scalability  

Making  Hadoop  Fly  in  Clouds  Ø  Smarter  cloud  storage  for  performance  op&miza&on  

And  more…  

System Use Case: MapReduce

Flash Memory Summit 2015 Santa Clara, CA

8

System Use Case: MapReduce

Flash Memory Summit 2015 Santa Clara, CA

9

Leveraging    In-­‐Storage  Computa&on  

System Use Case: Distributed Grep

Flash Memory Summit 2015 Santa Clara, CA

10

Common Unix command for search: grep -Eh <regex> <inDir>/* | sort | uniq -c | sort -nr - counts lines in all files in <inDir> that match <regex> and displays the counts in descending order

C B B C

C A

3 C 1 A

Result File 2 File 1

System Use Case: Distributed Grep

Flash Memory Summit 2015 Santa Clara, CA

11

Map function in this case: - input is (file offset, line) - output is either:

1. an empty list [] (the line does not match) 2. a key-value pair [(line, 1)] (if it matches)

Reduce function in this case: - input is (line, [1, 1, ...]) - output is (line, n) where n is the number of 1s in the list.

Future Applications

Flash Memory Summit 2015 Santa Clara, CA

12

Genome Sequencing

High Performance Scientific Computing

IoT

Facial Recognition

Thank you

NxGn Data

Flash Memory Summit 2015 Santa Clara, CA

13