Gerrit and Jenkins for Big Data Continuous Delivery, London, UK, June 2015


Page 1: Gerrit jenkins-big data-continuous-delivery


Gerrit and Jenkins for Big Data Continuous Delivery

London, UK, June 2015

Page 2: Gerrit jenkins-big data-continuous-delivery

www.gerritforge.com

#jenkinsconf


About GerritForge

• Founded in 2009 in London
• Committed to OpenSource

Page 3: Gerrit jenkins-big data-continuous-delivery


The Team

Luca Milanesio
• Co-founder and Director of GerritForge
• Over 20 years in Agile Development and ALM
• OpenSource contributor to many projects (BigData, Continuous Integration, Git/Gerrit)

Antonios Chalkiopoulos
• Author of Programming MapReduce with Scalding
• Open source contributor to many BigData projects
• Working on the "land-of-Hadoop" (landoop.com)

Page 4: Gerrit jenkins-big data-continuous-delivery


The Team (2)

Tiago Palma
• Data Warehouse & Big Data Development
• Senior Data Modeler
• Big Data infrastructure specialist

Stefano Galarraga
• 20 years of Agile Development
• Middleware, Big Data, Reactive Distributed Systems
• Open Source contributor to many BigData projects

Page 5: Gerrit jenkins-big data-continuous-delivery


Agenda

• Why continuous deployment on BigData?
• Our Development Lifecycle ingredients
  – Gerrit, Jenkins, Mesos, Marathon, CDH / Spark
• Topics to address in BigData development
  – Types of tests (Unit vs. Integration)
  – Testing the "real thing" (aka the Cluster)
• Our BigData virtualised infrastructure
  – Marathon, Mesos and Docker all around
• Live (minimised) Demo

Page 6: Gerrit jenkins-big data-continuous-delivery


WHY?

• Early BigData had no process at all, so it could fail at any time
• Mature BigData is a mission-critical decision maker
• Need for more stable software-engineering methodologies:
  – Test-Driven Development (Stefano's ScaldingUnit)
  – Continuous Integration with Jenkins
  – Integration & Performance testing
  – Code review and validation

Page 7: Gerrit jenkins-big data-continuous-delivery


Code-Review BigData Lifecycle (1)

• Git used by distributed teams (UK, Israel, India)
• Topics and Code Review
• Jenkins build on every patch-set
• Commits reviewed / approved via Gerrit Submit

Page 8: Gerrit jenkins-big data-continuous-delivery


Code-Review BigData Lifecycle (2)

Page 9: Gerrit jenkins-big data-continuous-delivery


Code-Review BigData Lifecycle (3)

• Submitting a Topic automatically:
  – merges all its patch-sets (semi-atomically)
  – triggers a longer chain of CI steps
  – promotes a release candidate (RC) if everything passes
• Jenkins automation via Gerrit Trigger Plugin
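
The split between the quick per-patch-set build and the longer post-submit chain can be pictured as a mapping from Gerrit stream events to pipelines. The sketch below is only an illustration: in the setup described here the Gerrit Trigger Plugin does this inside Jenkins through job configuration, and the pipeline names are hypothetical. It relies on the `type` field of Gerrit stream events, with `patchset-created` for a new patch-set and `change-merged` for a submitted change.

```python
import json

# Hypothetical pipeline names; the slides do not name the Jenkins jobs.
SHORT_BUILD = "patchset-verify"
LONG_CHAIN = "topic-integration-and-rc"


def select_pipeline(event_json):
    """Map one Gerrit stream event (a JSON string) to a pipeline name.

    Returns None for event types that should not start a build.
    """
    event = json.loads(event_json)
    event_type = event.get("type")
    if event_type == "patchset-created":
        # Every new patch-set gets the quick build.
        return SHORT_BUILD
    if event_type == "change-merged":
        # A submitted (merged) change starts the longer chain of CI steps.
        return LONG_CHAIN
    return None


# Example event, reduced to the fields this sketch reads or might log.
example = json.dumps({
    "type": "patchset-created",
    "change": {"project": "etl", "branch": "master", "topic": "oracle-import"},
})
print(select_pipeline(example))
```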

Page 10: Gerrit jenkins-big data-continuous-delivery


Ingredients: Gerrit

• Git-based Code Review system
• Pre-commit review
• Allows multiple validation steps (pipeline)
• Validation + Integration flags
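
How several flags gate a submit can be sketched with a simplified model of Gerrit's default label rule (MaxWithBlock): a change needs the highest score on each label, and the lowest score from any reviewer blocks it. The label names and ranges below are assumptions for illustration; the slides only mention "Validation + Integration flags" without giving their configuration.

```python
# Hypothetical labels and (lowest, highest) score ranges.
LABEL_RANGES = {
    "Code-Review": (-2, 2),
    "Verified": (-1, 1),      # assumed: set by the per-patch-set Jenkins build
    "Integration": (-1, 1),   # assumed: set by the integration-test job
}


def is_submittable(votes, label_ranges=LABEL_RANGES):
    """Return True if every label has its top score and no blocking score.

    votes maps a label name to the list of scores given by reviewers or CI.
    """
    for label, (lowest, highest) in label_ranges.items():
        scores = votes.get(label, [])
        if lowest in scores:
            return False  # a single lowest vote blocks the submit
        if highest not in scores:
            return False  # the top vote is still missing
    return True


print(is_submittable({"Code-Review": [2], "Verified": [1], "Integration": [1]}))
print(is_submittable({"Code-Review": [2, -2], "Verified": [1], "Integration": [1]}))
```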

Page 11: Gerrit jenkins-big data-continuous-delivery


Ingredients: Jenkins

• Plugins:
  – Gerrit trigger
  – Docker build step
  – Post-build script plugin

Page 12: Gerrit jenkins-big data-continuous-delivery


Fitting CDH Into this Picture

• Integration Test
  – Running integration tests in a CDH-enabled Docker container
  – Hadoop/local and Spark/standalone are not enough
  – Need to test class serialisation
  – Validate packaged fat-jars (library conflicts with CDH)
  – Performance on a real cluster
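
One way to catch fat-jar library conflicts before the job reaches the cluster is to compare the class entries bundled in the fat-jar with those shipped by the cluster's own jars. The following is a minimal sketch of that idea, not the check used in the talk; the jar names and class paths are made up, and the jars are built in memory so the example runs anywhere.

```python
import io
import zipfile


def class_entries(jar_file):
    """Return the set of .class entry names in a jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_file) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def conflicting_classes(fat_jar, cluster_jars):
    """Find classes bundled in the fat-jar that a cluster jar also ships.

    cluster_jars maps a jar name to a path or file-like object.
    Returns {jar name: sorted list of overlapping class entries}.
    """
    bundled = class_entries(fat_jar)
    conflicts = {}
    for jar_name, jar_file in cluster_jars.items():
        overlap = bundled & class_entries(jar_file)
        if overlap:
            conflicts[jar_name] = sorted(overlap)
    return conflicts


def make_jar(names):
    """Build a tiny in-memory jar with empty entries, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        for name in names:
            jar.writestr(name, b"")
    buffer.seek(0)
    return buffer


fat = make_jar(["com/example/Etl.class", "com/google/common/base/Joiner.class"])
cluster = {"guava.jar": make_jar(["com/google/common/base/Joiner.class"])}
print(conflicting_classes(fat, cluster))
```

An overlap is not always a problem (the versions may match), so a real check would likely also compare versions or rely on shading; this sketch only reports candidates.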

Page 13: Gerrit jenkins-big data-continuous-delivery


Fitting CDH Into this Picture

• Acceptance / performance tests with short-lived CDH clusters
• Solution: Mesos, Marathon and Docker:
  – Ephemeral clusters with defined capacity
  – Automatic cluster-config
  – All controlled via Docker/Mesos

Page 14: Gerrit jenkins-big data-continuous-delivery


Mesos + Marathon

• Apache Mesos
  – Abstracts CPU, memory, storage and other compute resources away from machines
• Marathon Framework
  – Runs on top of Mesos
  – Guarantees that long-running applications never stop
  – REST API for managing and scaling services
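
As an illustration of scaling a service through the Marathon REST API, the sketch below builds (but does not send) a `PUT /v2/apps/{appId}` request that changes the instance count. The Marathon host and the app id are hypothetical placeholders.

```python
import json
import urllib.request

MARATHON_URL = "http://marathon.example:8080"  # hypothetical host and port


def scale_request(app_id, instances):
    """Build a PUT /v2/apps/{appId} request that sets the instance count."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{MARATHON_URL}/v2/apps/{app_id.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = scale_request("/cdh/agent", 5)  # hypothetical app id
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the example has no network dependency.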

Page 15: Gerrit jenkins-big data-continuous-delivery


CDH Components

• CDH 5.4.1 distribution
  – Apache Spark
  – Hadoop HDFS
  – YARN

Page 16: Gerrit jenkins-big data-continuous-delivery


Integration Test Flow on CDH Cluster

[Diagram: Jenkins Master, Mesos Master, Marathon, Private Docker Registry, and a Slave Host running a Mesos Slave and Docker]

• POST to Marathon REST API to start 1 Docker container with Cloudera Manager and N Docker containers with Cloudera agents
• Marathon Framework receives resource offers from Mesos Master and submits the tasks
• The task is sent to the Mesos Slave
• Mesos Slave starts the Docker container
• Docker image is fetched from the Docker registry if not present on the Slave host
• Waiting for Dockers
• Dockers UP
• Install Cloudera packages via Cloudera Manager API using Python
• Deploy the ETL, run the ETL and the Integration Tests
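
The first step and the "waiting for Dockers" phase of this flow can be sketched as follows. This is an assumption-laden illustration, not the scripts used in the demo: the registry address, image names, app ids and resource sizes are hypothetical. The app definitions follow the JSON shape accepted by Marathon's `POST /v2/apps`, and the polling compares the `tasksRunning` and `instances` fields of the app object returned by `GET /v2/apps/{appId}`. The HTTP call is injected as `fetch_app` so the logic can run without a cluster.

```python
import json
import time

REGISTRY = "registry.example:5000"  # hypothetical private Docker registry


def docker_app(app_id, image, instances, cpus=1.0, mem=2048):
    """Build a Marathon app definition (the JSON body for POST /v2/apps)."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,  # in MB; the value is an assumption
        "container": {
            "type": "DOCKER",
            "docker": {"image": f"{REGISTRY}/{image}", "network": "BRIDGE"},
        },
    }


def cluster_apps(agent_count):
    """One Cloudera Manager container plus N Cloudera agent containers."""
    return [
        docker_app("/cdh/manager", "cloudera-manager", 1),
        docker_app("/cdh/agent", "cloudera-agent", agent_count),
    ]


def wait_until_running(fetch_app, app_id, timeout_s=600, poll_s=5,
                       sleep=time.sleep):
    """Poll until all requested instances of app_id are running.

    fetch_app(app_id) must return the app object from GET /v2/apps/{appId}.
    Returns True when the Dockers are up, False when the timeout is reached.
    """
    waited = 0
    while waited <= timeout_s:
        app = fetch_app(app_id)
        if app["tasksRunning"] >= app["instances"]:
            return True
        sleep(poll_s)
        waited += poll_s
    return False


print(json.dumps(cluster_apps(3)[1], indent=2))
```

Injecting `fetch_app` and `sleep` keeps the waiting logic testable on its own; a Jenkins build step would supply a real HTTP call against Marathon.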

Page 17: Gerrit jenkins-big data-continuous-delivery


Unit and Integration Tests sample

• Test project:
  – Test Spark project
  – ETL from Oracle to HDFS
• Unit-test directly on Spark logic
• Integration tests for every patch-set:
  – VERY small dataset just for this demo
  – CDH and Oracle Docker Images
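
Unit-testing the Spark logic directly is easiest when each transformation is a plain function that needs no cluster. The sketch below assumes such a structure; the function, column names and output format are hypothetical and are not taken from the demo project. With PySpark, a function like this could be passed to `rdd.map`.

```python
import unittest


def to_hdfs_record(oracle_row):
    """Hypothetical ETL step: turn one Oracle row (a dict) into a CSV line."""
    customer_id = int(oracle_row["ID"])
    name = oracle_row["NAME"].strip().upper()
    return f"{customer_id},{name}"


class ToHdfsRecordTest(unittest.TestCase):
    """Exercises the transformation without Spark, Oracle or HDFS."""

    def test_trims_and_uppercases_name(self):
        self.assertEqual(to_hdfs_record({"ID": "7", "NAME": " ada "}), "7,ADA")

    def test_rejects_non_numeric_id(self):
        with self.assertRaises(ValueError):
            to_hdfs_record({"ID": "x", "NAME": "ada"})
```

Saved as a module, the tests run with `python -m unittest`; the integration tests then only need to cover what a unit test cannot, such as serialisation and the real Oracle and HDFS endpoints.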

Page 18: Gerrit jenkins-big data-continuous-delivery


Unit and Integration Tests

[Diagram: Jenkins (Build Job), Oracle, and CDH (Hadoop pseudo-distributed mode, Spark Standalone); arrows labelled "init", "Submit job" and "Init/read HDFS"]

Page 19: Gerrit jenkins-big data-continuous-delivery


DEMO

Small-scale BigData Delivery Pipeline


Page 20: Gerrit jenkins-big data-continuous-delivery


References

• Demo sources: https://github.com/GerritForge
• Blog: https://gitenterprise.me
• Twitter: @GerritReview @GitEnterprise @GerritForge
• Learn Gerrit Code Review book: GerritHub.io/book
• Get in touch with GerritForge: [email protected]