
Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production

michele.orsi@lastminute.com manuel.ranieri@lastminute.com

KubeCon - Berlin, 29-30 March 2017

An inspiring travel company ..

A tech company to the core

Tech department: 300+ people

Applications: ~100

Database: 4 TB of data

Servers: 1400 VMs, 300 physical machines

Locations: Chiasso, Milan, Madrid, London, Bengaluru

Business: "technology is slow"

https://www.pexels.com/photo/turtle-walking-on-sand-132936/

Technology: "the monolith is the problem"

https://www.flickr.com/photos/southtopia/5702790189

https://www.pexels.com/photo/gray-pebbles-with-green-grass-51168/

"... let’s break into microservices!"

A lot of issues

● LONG provisioning time

● LACK OF alignment across environments

● LACK OF alignment across applications

● LACK OF awareness about ops

A year-long endeavour

● build a new, modern infrastructure

● migrate the search (flight/hotel) product there

... without:

● impacting the business

● throwing away our whole datacenter

https://www.pexels.com/photo/colorful-toothed-wheels-171198/

Our infrastructure and our architecture

Virtualization platform

TONS OF VIRTUAL MACHINES


● CoreOS, the all-in-one choice

○ cloud-config configuration

○ Automatable in one shot

○ Really simple patch management
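As an illustration, a minimal cloud-config sketch (the unit names are the stock CoreOS ones; the discovery token is a placeholder) is enough to bring up the base services on every node:

#cloud-config
coreos:
  etcd2:
    # placeholder discovery URL; every node joins the same cluster
    discovery: https://discovery.etcd.io/<token>
  units:
    - name: etcd2.service
      command: start
    - name: flanneld.service
      command: start
    - name: docker.service
      command: start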

Engage

Our Kubernetes on CoreOS architecture is born

● The stack
○ ETCD
○ FLANNELD
○ DOCKER

● KUBERNETES (Google!)

[Diagram: each server runs CoreOS with etcd, flanneld and Docker, topped by Kubernetes (K8S) scheduling pods across NODE 1 and NODE 2]

How to talk with pods

[Diagram: two ways in from the F5 — tcp traffic lands on NodePorts (np) on each node and is proxied to the pods, while http traffic goes through NGINX ingress controllers in front of the pods]

In the name of service

- host: awesomeservice.prd.mykubecluster.intra
  http:
    paths:
    - path: /
      backend:
        serviceName: awesomeservice
        servicePort: 8081

awesomeservice-ingress.yaml
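The snippet above is only the rules section; for context, a complete manifest (metadata names hypothetical, using the extensions/v1beta1 API that was current in early 2017) would look roughly like:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: awesomeservice
  namespace: awesomeservice-production   # hypothetical namespace
spec:
  rules:
  - host: awesomeservice.prd.mykubecluster.intra
    http:
      paths:
      - path: /
        backend:
          serviceName: awesomeservice
          servicePort: 8081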

In the name of service

*.[prd|qa|dev].mykubecluster.intra. IN CNAME kubef5ingress

The return of NodePort

[Diagram: for tcp/TLS traffic the F5 passes connections straight through to NodePorts (np) on NODE n; TLS is terminated by proxy pods inside the cluster]
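A sketch of the NodePort side, assuming a hypothetical TLS proxy service and illustrative port numbers; the F5 pool then simply targets the fixed nodePort on every node:

apiVersion: v1
kind: Service
metadata:
  name: tls-proxy
spec:
  type: NodePort
  selector:
    app: tls-proxy          # hypothetical label on the proxy pods
  ports:
  - port: 8443              # service port inside the cluster
    targetPort: 8443        # container port on the proxy pods
    nodePort: 30443         # fixed port opened on every node for the F5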

The registry brought another question...

?

Seriously?

Rear window on kubernetes

[Diagram: collectd on each server's OS and the Kube API feed metrics into graphite — visualized with Nagios first, Grafana 4 now]

icons from http://www.flaticon.com

We were happy!

Not happy anymore

Seriously?

The change… It’s a kind of magic

KEEP CALM

and TRUST KUBERNETES

What we learned

Lots of things!

The final architecture (so far…)

[Diagram: the final stack — each server now runs Ubuntu with etcd, flanneld, Docker and Kubernetes hosting pods; the F5 and the ingress tier sit OUTSIDE KUBERNETES]

INSIDE KUBERNETES:
● 3 different environments
● 7 MASTERS + 47 ETCDs + 7 DNS
● 2 REGISTRYs
● 70 PHYSICAL NODES
● 140 Namespaces
● 1300 PODs

Our infrastructure and our architecture

https://www.pexels.com/photo/colorful-toothed-wheels-171198/

Our core axioms

● same architecture across environments
● a common framework to align software
● centralized monitoring/logging, with alerts
● zero downtime deployment
● automation everywhere

Kubernetes: our architecture

[Diagram: a production cluster hosting the APP1/APP2/APP3-PRODUCTION namespaces, and a nonproduction cluster hosting each application's PREVIEW, DEVELOPMENT and QA namespaces]
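Each application/environment pair maps to a plain Kubernetes namespace; a minimal sketch following that APP-ENVIRONMENT naming convention (the exact name is illustrative):

apiVersion: v1
kind: Namespace
metadata:
  name: app1-production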

Kubernetes: our architecture and choices

[Diagram: inside the production cluster, the APP1-PRODUCTION namespace holds a deployment driving a replica-set of POD-1/2/3, configured through a secret and a configmap and reachable at app1-production.prd.mykubecluster.intra]

"To ingress or not to ingress? .."

[Diagram: the F5 balancing across NGINX ingress controllers spread over NODE 1/2/3 in front of APP1-PRODUCTION]

Pro:
● easier DNS management
● customizable proxy server

Con:
● 3rd party tool
● requires external sync
● all requests go through it
● reload risks

Kubernetes: our architecture and choices

[Diagram: inside each APP1-PRODUCTION pod, the application runs alongside fluentd and collectd sidecars, shipping metrics over carbon]

Monitoring and alerting: grafana/graphite cluster

[Diagram: application and collectd metrics flow over carbon into the graphite cluster and are visualized in Grafana 4]

icons from http://www.flaticon.com
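The sidecar layout from the diagram could be expressed in a pod template roughly like this (image names hypothetical); collectd pushes metrics over the carbon protocol while fluentd ships logs:

spec:
  containers:
  - name: application
    image: registry.intra/app1:1.0.0        # hypothetical application image
  - name: collectd
    image: registry.intra/collectd:latest   # sends metrics to graphite via carbon
  - name: fluentd
    image: registry.intra/fluentd:latest    # ships application logs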

Zero downtime (1): graceful shutdown

lifecycle:
  preStop:
    exec:
      command: ["/stop_helper.sh"]

deployment.yaml

#!/bin/bash

wget http://localhost:8002/stop

stop_helper.sh
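In context, the hook sits in the container spec next to a termination grace period (the value is illustrative), giving the application time to drain before Kubernetes escalates from SIGTERM to SIGKILL:

spec:
  terminationGracePeriodSeconds: 60   # illustrative drain window
  containers:
  - name: awesomeservice
    lifecycle:
      preStop:
        exec:
          command: ["/stop_helper.sh"]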

Zero downtime (2): graceful startup

private CompletableFuture<Void> run(Stream<CompletableFuture<?>> startupJobs) {
    return allOf(startupJobs.toArray(CompletableFuture[]::new))
        .thenAccept(this::raiseReadinessUp)
        .exceptionally(this::shutdown);
}

JobsExecutor.java
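On the Kubernetes side, a readiness probe keeps traffic away until the startup jobs have completed and readiness is raised; the path is a hypothetical endpoint, while port 8002 matches the admin port used by stop_helper.sh:

readinessProbe:
  httpGet:
    path: /health       # hypothetical endpoint flipped by raiseReadinessUp
    port: 8002
  initialDelaySeconds: 5
  periodSeconds: 5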

Automate everything: pipeline DSL

microservice = factory.newDeployRequest()
    .withArtifact("com.lastminute.application1", 2)
    .fromGitRepo("git.lastminute.com/team/application")

lmn_deployCanaryStrategy(microservice, "qa")
lmn_deployCanaryStrategy(microservice, "preview")
lmn_deployCanaryStrategy(microservice, "production")

pipeline

Automate everything: pipeline

pull jar → build docker → (gate) → QA canary → (gate) → QA stable → (gate) → PREV canary → (gate) → PREV stable → (gate) → PROD canary → (gate) → PROD stable
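One plausible way to realize the canary/stable split on Kubernetes (labels, replica counts and image tags are all hypothetical): two Deployments share the Service selector label, so the single canary pod takes a small slice of traffic until the gate promotes the new version to stable.

# stable track: the bulk of the pods
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: application1-stable
spec:
  replicas: 9
  template:
    metadata:
      labels:
        app: application1   # the Service selects on this shared label
        track: stable
    spec:
      containers:
      - name: application1
        image: registry.intra/application1:2.0.0
---
# canary track: one pod on the candidate version
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: application1-canary
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: application1
        track: canary
    spec:
      containers:
      - name: application1
        image: registry.intra/application1:2.1.0-rc1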

● git push
○ continuous integration
○ continuous delivery

https://www.flickr.com/photos/ghost_of_kuji/2763674926

.. failure ..

nginx ingress controller problem

[Diagram: the F5 fronting NGINX ingress controller pods scattered across IPs 10.0.0.1–10.0.0.6, with controllers multiplying and moving between nodes]

https://www.pexels.com/photo/grayscale-photography-of-person-at-the-end-of-tunnel-211816/

There’s light .. at the end

● 20K req/sec in the new cluster
● 2M metrics/minute flows
● 10 minutes to create a new environment
● whole pipeline runs in 16 minutes

○ 4 minutes to release 100 instances of a new version

Give me the numbers .. again!

Yes, we’re hiring!

THANKS

careers.lastminutegroup.com
