devops@cineca



Riccardo Capecchi, Andrea Capriotti, Matteo Turra

Cineca - About Us

Cineca is a non-profit consortium made up of 70 Italian universities*, 5 Italian research institutions and the Italian Ministry of Education. Today it is the largest Italian computing centre and one of the most important worldwide. With around 700 employees, it operates in the technology transfer sector through high-performance scientific computing, the management and development of networks and web-based services, and the development of complex information systems for handling large amounts of data.

Cineca Organization

In 2012:

5 big development teams (split into smaller project teams)

1 infrastructure team (in charge of all services and split up to support the different dev teams)

1 High Performance Computing team


Ops How it used to be

2012: Golden image installation, with configurations managed via overrides

Ops Infrastructure as code

2013: A corporate merger with Cilea and Caspur gave us new goals: manage the new infrastructure and merge the 3 Ops teams. After scouting the best and most widely used tools, we chose Puppet. The 3 IT-Ops teams started a joint project to write and configure everything related to Linux servers with Puppet. Along with Puppet, we also started using Git as the VCS for all Puppet files.

Ops Expanding our code

From 1 January 2014 we managed all our Linux installations with Puppet and began expanding the module base to cover more services and configurations.

We also started a large-scale training programme for all Ops staff: in around 2 months, more than 60 people received internal Puppet training.

Ops Automation

In 2015 we continued to expand our Puppet code and started to integrate it with other tools such as Nagios, Monit and Collectd, so that services automatically got everything they needed to be production-ready. We also expanded our use of Jenkins to manage and automate some common tasks more easily. In summer 2015 we were given the task of rebuilding the infrastructure for the Pentaho application.

Use Case:


New platform release to deliver to 80 customers, on both on-premise and hosted infrastructure!

Platform Components

Tomcat + a bunch of jars and wars

Use Case:

NO PROBLEM!

I'm a Black Belt Tomcat Expert

Java WebApp deployment

Oracle JDK 6/7

Apache Tomcat

Apache HTTPD with mod_jk

Jar(s) & War(s)

Some shell scripts

Mega Package!

Every time a little update is needed

e.g. a Xalan bugfix

Solution: Standardize

DEV ENVIRONMENT

OPS ENVIRONMENT

DEV vs OPS

Interaction between DEV and OPS


Developer

Operation

Merge Request

Accept/Reject Request

GitLab Web User Interface

Test Driven Puppet:

How to test puppet scripts?

Clean environment

Reuse of code

Side Effect: docker images
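One way to get a clean environment for every test is to apply the manifest inside a throwaway container and inspect the exit code. The sketch below only builds the command line and interprets the result; the image name, the mount path and the `site.pp` file name are hypothetical, and the image is assumed to have Puppet already installed. The exit codes are those documented for `puppet apply --detailed-exitcodes`.

```python
# Meaning of the exit codes returned by `puppet apply --detailed-exitcodes`.
EXIT_MEANINGS = {
    0: "no changes, system already in the desired state",
    1: "run failed",
    2: "run succeeded, some resources changed",
    4: "run succeeded, some resources failed",
    6: "run succeeded, with both changes and failures",
}


def build_test_command(image, manifests_dir, manifest="site.pp"):
    """Build a docker command that applies one manifest in a fresh container.

    `image`, `manifests_dir` and `manifest` are illustrative parameters;
    the container is removed afterwards (--rm), so every test starts clean.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{manifests_dir}:/manifests:ro",
        image,
        "puppet", "apply", "--detailed-exitcodes", f"/manifests/{manifest}",
    ]


def run_passed(exit_code):
    """A test run passes when Puppet reports no failed resources (0 or 2)."""
    return exit_code in (0, 2)
```

The returned list can be handed to `subprocess.run`, and its `returncode` passed to `run_passed`, for example from a Jenkins job.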

From Puppet Apply to Puppet Agent

Provisioning: Docker vs Puppet

Puppet is slow, but its modules are extremely powerful

A Dockerfile has limited configuration options, but building a new image on top of an existing one is very fast.

Use slow puppet to build images (run once)

Note: Puppet can be used for container provisioning

Portability: The same image can be used in test, staging, production and development, lowering the diversity of environments
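The split described above can be sketched as two generated Dockerfiles: a base image where the slow Puppet run happens once, and a thin application image layered on top of it. This is a minimal illustration, not our actual build procedure; the image names, the manifest layout and the webapps path are hypothetical, and the starting image is assumed to ship with Puppet installed.

```python
def render_base_dockerfile(base_image, manifest_dir="manifests", manifest="site.pp"):
    """Dockerfile for the slow step: run `puppet apply` once at build time."""
    lines = [
        f"FROM {base_image}",
        f"COPY {manifest_dir}/ /tmp/manifests/",
        f"RUN puppet apply /tmp/manifests/{manifest}",
    ]
    return "\n".join(lines) + "\n"


def render_app_dockerfile(puppet_built_image, war_file, webapps_dir):
    """Dockerfile for the fast step: add one artifact on top of the base image.

    `webapps_dir` depends on how Tomcat was installed, so it is a parameter.
    """
    lines = [
        f"FROM {puppet_built_image}",
        f"COPY {war_file} {webapps_dir}/",
    ]
    return "\n".join(lines) + "\n"
```

Rebuilding the application image after a small fix only repeats the `COPY` layer, while the Puppet layer stays cached in the base image.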

Deploy on Docker

Now we have Docker images.

Where do we store it?

How can we run containers for the test, QA and production environments?

How do we achieve high availability and scalability?

OPS have the answer!

Docker swarm

Turns a pool of Docker host machines into a single virtual host

Allows us to distribute container workloads across multiple machines running in a cluster

Serves the standard Docker API

Ships with simple scheduling and discovery backend
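To make "simple scheduling" concrete, here is a toy model of a spread-style placement, where each new container goes to the node currently running the fewest containers. It is an illustration only, not Swarm's scheduler: a real strategy also takes resources and constraints into account, and the node and container names are hypothetical.

```python
def schedule_spread(nodes, containers):
    """Place each container on the node with the fewest containers.

    `nodes` maps a node name to the list of containers it already runs.
    Ties are broken by node name so the result is deterministic.
    The input mapping is not modified.
    """
    placement = {name: list(running) for name, running in nodes.items()}
    for container in containers:
        target = min(sorted(placement), key=lambda name: len(placement[name]))
        placement[target].append(container)
    return placement


cluster = {"node-a": ["db"], "node-b": []}
result = schedule_spread(cluster, ["web1", "web2", "web3"])
```

Because the cluster is addressed as one virtual host, the client asking for `web1` does not need to know which node was chosen.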

Consul

Service discovery and configuration

Failure detection

Key/Value storage and DNS server

Distributed and highly available system with gossip protocol (Serf based)
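As an illustration of the key/value side, Consul exposes its store over HTTP under `/v1/kv/<key>` and returns a JSON list in which each `Value` is base64-encoded. The sketch below builds such a URL and decodes a response body; the host, the key name and the sample value are hypothetical, and 8500 is assumed to be the agent's HTTP port (the default).

```python
import base64
import json


def kv_url(key, host="localhost", port=8500):
    """URL of one key in Consul's HTTP key/value API."""
    return f"http://{host}:{port}/v1/kv/{key}"


def decode_kv_response(body):
    """Turn the JSON body of a KV read into a {key: text value} dict.

    Values arrive base64-encoded; a key stored without a value has null.
    """
    decoded = {}
    for entry in json.loads(body):
        raw = entry["Value"]
        decoded[entry["Key"]] = (
            None if raw is None else base64.b64decode(raw).decode("utf-8")
        )
    return decoded
```

The body would normally come from an HTTP GET on `kv_url(...)`, for instance with `urllib.request.urlopen`.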

Registrator

Service registry bridge for docker

Monitors the Docker UNIX socket for events

Dynamically registers and unregisters the services of Docker containers
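The behaviour described above can be modelled as a small event handler: a container start adds a service entry, a container exit removes it. This is a simplified model and not Registrator's code; the event dictionaries are a hypothetical, reduced form of what the Docker event stream delivers, and an in-memory dict stands in for the Consul catalog.

```python
def apply_event(registry, event):
    """Update a service registry (a dict keyed by container id) from one event."""
    container_id = event["id"]
    if event["status"] == "start":
        registry[container_id] = {
            "service": event["service"],
            "address": event["address"],
            "port": event["port"],
        }
    elif event["status"] in ("die", "stop"):
        # Removing an unknown container is not an error.
        registry.pop(container_id, None)
    return registry


registry = {}
apply_event(registry, {"id": "c1", "status": "start",
                       "service": "tomcat", "address": "10.0.0.5", "port": 8080})
apply_event(registry, {"id": "c2", "status": "start",
                       "service": "tomcat", "address": "10.0.0.6", "port": 8080})
apply_event(registry, {"id": "c1", "status": "die"})
```

After these three events only `c2` is still registered, so clients resolving the `tomcat` service would no longer be sent to the stopped container.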

Docker registry (Distribution)

Image registry to store and distribute images

Private registry to host customer applications

Test and continuous integration images
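Storing an image in a private registry comes down to tagging it with the registry's address and pushing it. The helper below composes the image reference and the two `docker` commands; the registry host, the repository and the tag are hypothetical placeholders.

```python
def image_reference(registry, repository, tag="latest"):
    """Full image name for a private registry: <registry>/<repository>:<tag>."""
    return f"{registry}/{repository}:{tag}"


def push_commands(local_image, registry, repository, tag="latest"):
    """Commands that tag a local image for the registry and push it."""
    reference = image_reference(registry, repository, tag)
    return [
        ["docker", "tag", local_image, reference],
        ["docker", "push", reference],
    ]
```

The same reference is what test, staging and production hosts would later pull, which is what keeps the environments identical.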

DockerUI

A web interface for docker

Start, stop, kill, pause, restart and commit containers

Provides details about running containers

Log management: ELK

Central point to collect, manage and visualize logs from all containers

Docker apps

Docker Swarm, Consul - Registrator

Devops done, but we forgot ...

What we have now

Puppet

~800 Linux server installations fully managed via Puppet

More than 100 modules

Training done for ops and some devs

Devs can submit merge requests to apply changes to the infrastructure

Docker

Procedures to build images from Puppet recipes

Tests run in a full Docker environment

Training done for dev AND ops

OpenShift

Orchestration of containers

RBAC management for dev and ops

Conclusions

In practice, DevOps is more about organization than tools and software.

But tools can help in day-to-day operations: using GitLab we have been able to give RBAC access to different Git repositories.

A lot of training is needed on both sides, teaching the best dev practices to ops people and vice versa.

After months of working together to find a solution, we forgot to explain it in detail to our CTO, so he thought we had stalled. Lesson learnt.

How it ended

Questions