Radu Pastia - Couchdoop - Connecting Hadoop with Couchbase

Uploaded by: huguk

DESCRIPTION

Radu Pastia: I have been working with Hadoop for two years, since I started the Big Data Team at Avira. At first I was oriented more towards the operations side: sizing and setting up our new Hadoop cluster to run smoothly. As our setup stabilized, I started delving deeper into data science and machine learning. I have been coding ever since I had my first home computer running BASIC, and my background before Hadoop is in backend scripting for web-based applications.

Citation preview

Page 1: Radu Pastia - Couchdoop - Connecting Hadoop with Couchbase

pastiaro.wordpress.com

@rpastia


Building a connector – The Wrong Way

Mapper   Reducer  


Building a connector – The Right Way

Mapper   Reducer   Partitioner

Input Split

Input Format

Record Reader

Record Writer

Output Format
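
The slide only names the components, so the following is a toy, single-process sketch of how they usually fit together in a MapReduce job, not Couchdoop's code and not the Hadoop API. The names `run_job`, `read_split` and `toy_splits` are hypothetical. The point it tries to illustrate is that external I/O sits at the edges (split, record reader, record writer), while the mapper and reducer only transform key/value pairs.

```python
from collections import defaultdict


def run_job(splits, read_split, mapper, partitioner, reducer, number_of_reducers):
    """Walk one job through the stages named on the slide, in memory."""
    # Input side: each input split is read by its own record reader,
    # which hands (key, value) pairs to the mapper.
    partitions = [defaultdict(list) for _ in range(number_of_reducers)]
    for split in splits:
        for key, value in read_split(split):
            for out_key, out_value in mapper(key, value):
                # The partitioner decides which reducer receives this key.
                target = partitioner(out_key, number_of_reducers)
                partitions[target][out_key].append(out_value)

    # Output side: reducer results go to a record writer (here, a plain list).
    written = []
    for partition in partitions:
        for key in sorted(partition):
            written.extend(reducer(key, partition[key]))
    return written


def read_split(split):
    # Hypothetical record reader: a split is just a list of (key, value) pairs.
    return iter(split)


def mapper(key, value):
    for word in value.split():
        yield word, 1


def partitioner(key, number_of_reducers):
    # Deterministic stand-in for a hash partitioner.
    return len(key) % number_of_reducers


def reducer(key, values):
    yield key, sum(values)


toy_splits = [[(0, "a b a")], [(1, "b c")]]
result = run_job(toy_splits, read_split, mapper, partitioner, reducer, 2)
```

In a real connector the reading and writing stages would talk to the external store, which is why they are kept out of the mapper and reducer in this sketch.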


The InputFormat: From Input to Mapper

--range 2014-09-01;2014-09-20

--number_of_mappers 4

2014-09-01   2014-09-02   2014-09-03   2014-09-04   2014-09-05   2014-09-06   …   2014-09-20

Input Split 1

2014-09-01

2014-09-02

2014-09-05

…

Record Reader 1

(2014-09-01-A; record A)

(2014-09-01-B; record B)

(2014-09-01-…; record …)

(2014-09-02-A; record A)

(2014-09-02-B; record B)

(2014-09-02-…; record …)

(2014-09-05-A; record A)

(2014-09-05-B; record B)

(2014-09-05-…; record …)

Mapper
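
The flow on this slide (a date range divided among a fixed number of mappers, with each split's record reader emitting keyed records) can be sketched roughly as follows. This is a minimal illustration, not Couchdoop's implementation: the slide does not say how dates are assigned to splits (its Input Split 1 holds 2014-09-01, 2014-09-02 and 2014-09-05), so the round-robin policy here is an assumption, and `plan_splits`, `read_records` and `fake_fetch` are hypothetical names.

```python
from datetime import date, timedelta


def dates_in_range(start, end):
    """Return every date from start to end, inclusive."""
    days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(days + 1)]


def plan_splits(start, end, number_of_mappers):
    """Assign each date to one split, round-robin (an assumed policy)."""
    splits = [[] for _ in range(number_of_mappers)]
    for index, day in enumerate(dates_in_range(start, end)):
        splits[index % number_of_mappers].append(day)
    # Drop empty splits so no mapper is started without work.
    return [split for split in splits if split]


def read_records(split, fetch_records_for_date):
    """Yield (key, record) pairs for one split, as a record reader would."""
    for day in split:
        for suffix, record in fetch_records_for_date(day):
            yield f"{day.isoformat()}-{suffix}", record


def fake_fetch(day):
    """Hypothetical stand-in for a per-date query against the data store."""
    return [("A", "record A"), ("B", "record B")]


splits = plan_splits(date(2014, 9, 1), date(2014, 9, 20), 4)
first_pairs = list(read_records(splits[0], fake_fetch))
```

With the slide's parameters, 20 dates spread over 4 mappers gives 5 dates per split, and each pair handed to the mapper has a key of the form date plus suffix, matching the `(2014-09-01-A; record A)` shape shown above.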

