Resilience, ServiceDiscovery and ZD Deployment Bodo Junglas, York Xylander

Resilience, Service-Discovery and Z-D Deployment · 2020-04-01


Resilience, Service-Discovery and Z-D Deployment

Bodo Junglas, York Xylander


Who is leanovate?

25 people, HQ in Kreuzberg

Mission: Build learning organizations


What do we do?

Dev: Java, Scala, SPA, ...

Meth: Kanban, Scrum, Lean*, ...

• Consulting
• Coaching
• Training
• Doing


Promises of µServices architectures

• Independent (parallel) development
• Independent release cycles
• Independent scaling


Challenges of µServices architectures

• Developing & Running

• Configuring

• Debugging

• Deploying

• Discovering

• Resilience

• …

Tough to learn & understand!


Microzon: A lab for µServices

https://github.com/leanovate/microzon


Case Study: Microzon-Shop

• Browse through catalog
• Select product
• Put into cart
• Checkout

DEMO


Case Study: Microzon-Shop

[Diagram: a Load-Balancer / Firewall in front of the Web Facade, which calls the Customer, Product, Cart and Billing services; data stores in the same order: mysql, mongo, mysql, mysql.]


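Technology-Babylon

[Diagram: the Web Facade and the Customer, Product, Cart and Billing services (data stores: mysql, mongo, mysql, mysql), annotated with the technologies in use:]

• Web Facade: Play 2.3, Scala 2.11
• Spring-Boot 1.2.1, JPA/Hibernate
• Spray 1.3 / Akka 2.3, Scala 2.10, ReactiveMongo
• Scalatra, Jetty, …
• finatra 1.6, async-mysql, finagle 6.20
• Dropwizard 0.7, JDBI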


Challenges:

• Running:

  • How long does it take to get a dev system up and running for a new team member?

  • How do you run your CI system in your favorite cloud?

• Configuring

• Debugging


Case Study: Microzon-Shop

• Start system with docker
• Create products
• Logstash
• Zipkin

DEMO


Why is that a challenge?

[Diagram: the Web Facade calling the Customer, Product, Cart and Billing services; data stores in the same order: mysql, mongo, mysql, mysql.]


Why is that a challenge?

[Diagram: the Web Facade with the Customer, Product, Cart and Billing services and their data stores (mysql, mongo, mysql, mysql), now joined by a Support UI, a Marketing tool, Reporting and a Payment adapter.]


Why is that a challenge?

[Diagram: the Web Facade, the Customer, Product, Cart and Billing services, their data stores, the Support UI, Marketing tool, Reporting and Payment adapter, with a question mark added.]


Running/Configuring/Debugging

Takeaway

• … will quickly become a non-trivial matter
• We have chosen …
  • docker for development system
  • puppet for »production«
  • elasticsearch/logstash/kibana for distributed logging
  • zipkin for request tracing
• … but that is not the focus of this talk


Challenge: Deploying

Zero-downtime deployment strategies/variants:

Blue/Green, Wave, Canary, …

[Diagram: a Delivery Pipeline deploys Service Version n+1 next to the running instances of Service Version n, all behind a Load balancer/Router.]
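The blue/green variant can be sketched in a few lines. This is only an illustration under stated assumptions: `BlueGreenRouter` and its methods are hypothetical names standing in for a real load balancer/router, and health checks and connection draining are left out.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-process stand-in for the load balancer/router on the slide.
class BlueGreenRouter {
    // The pool that currently receives traffic (e.g. all nodes of version n).
    private final AtomicReference<List<String>> active;
    private final AtomicInteger counter = new AtomicInteger();

    BlueGreenRouter(List<String> initialPool) {
        requireNonEmpty(initialPool);
        this.active = new AtomicReference<>(List.copyOf(initialPool));
    }

    // Round-robin over whichever pool is active at the time of the call.
    String nextEndpoint() {
        List<String> pool = active.get();
        int idx = Math.floorMod(counter.getAndIncrement(), pool.size());
        return pool.get(idx);
    }

    // Called by the delivery pipeline once version n+1 is up and healthy.
    // Returns the old pool so it can be drained and shut down afterwards.
    List<String> switchTo(List<String> newPool) {
        requireNonEmpty(newPool);
        return active.getAndSet(List.copyOf(newPool));
    }

    private static void requireNonEmpty(List<String> pool) {
        if (pool.isEmpty()) {
            throw new IllegalArgumentException("pool must not be empty");
        }
    }
}
```

The switch is a single atomic reference swap, so no request ever sees a half-updated pool; requests already in flight against the old pool still have to finish before those nodes are stopped.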


Challenge: Service discovery

• How to remove service nodes from the cluster or take them temporarily offline?

• How to add new ones or take them online again?

How do services find each other? => Service Discovery


Service Discovery by configuration

You (mis)use your configuration management tool (puppet/chef) to generate service configuration with explicit endpoints

[Diagram: Config Management pushes an update (add/remove node entry) to the configuration files on Node1, Node2 and Node3.]


Service Discovery by configuration

Pros:
• Simply works
• No extra technology involved (and thereby no extra point of failure)

Cons:
• To take a service node offline one has to update all of its consumers (or wait for them to be updated)
• All consumers have to be able to reload their configuration without restart
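The second con is the one that tends to hurt. A minimal sketch of a consumer that can pick up a regenerated endpoint list without a restart, assuming a hypothetical file format with one endpoint per line and `#` for comments (`ReloadableEndpoints` is an illustrative name, not part of Microzon):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;

// Hypothetical helper: holds the endpoint list generated by puppet/chef.
class ReloadableEndpoints {
    private final Path configFile;
    private final AtomicReference<List<String>> endpoints = new AtomicReference<>(List.of());

    ReloadableEndpoints(Path configFile) throws IOException {
        this.configFile = configFile;
        reload();
    }

    // Re-reads the generated file; meant to be triggered by a file watcher,
    // a timer or an admin endpoint (none of which is shown here).
    void reload() throws IOException {
        List<String> fresh = Files.readAllLines(configFile).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .collect(Collectors.toUnmodifiableList());
        // Atomic swap: readers see either the old or the new list, never a mix.
        endpoints.set(fresh);
    }

    List<String> current() {
        return endpoints.get();
    }
}
```

Every consumer needs something like this, in every technology of the stack, which is why the approach scales poorly once the landscape grows.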


Service Discovery by DNS

DNS is actually a service discovery

[Diagram: CustomerService asks DNS for product.loc and receives {10.X.X.1, 10.X.X.2, 10.X.X.3}, the addresses of ProductService1, ProductService2 and ProductService3.]
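From the consumer's side, the lookup in the diagram is a plain multi-address DNS query. A minimal sketch using only the JDK (`DnsDiscovery` is an illustrative name; the name `product.loc` only resolves in a setup like the one on the slide):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

class DnsDiscovery {
    // Returns every address the resolver reports for the name, in resolver order.
    static List<String> resolveAll(String serviceName) throws UnknownHostException {
        List<String> result = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(serviceName)) {
            result.add(address.getHostAddress());
        }
        return result;
    }
}
```

Calling `DnsDiscovery.resolveAll("product.loc")` would hand back all registered product nodes; which of them to call, and what to do when one is down, is still left to the caller.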


DNS

Pros:
• Old technology that is proven to work on a very, very large scale
• Supported by almost everyone

Cons:
• Rather crude (and very inconvenient) interface (especially for updates)
• Resolved service names might be cached on multiple levels
• Focuses purely on service nodes, not on the services themselves
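One of those caching levels is the JVM itself. A small sketch of how a Java consumer could shorten it via the `networkaddress.cache.ttl` security property; the class and method names are illustrative, and caches in the OS resolver or in intermediate DNS servers are not affected by this.

```java
import java.security.Security;

class DnsCacheSettings {
    // Limits how long the JVM caches successful lookups.
    // Assumed to be called early at startup, before the first lookup,
    // because the JVM reads the property when its address cache is initialised.
    static void limitPositiveCaching(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }
}
```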


Also: DNS might lead to wrong assumptions

    public void connect( ... ) throws IOException {
        ...
        final InetAddress[] addresses = this.dnsResolver.resolve(host.getHostName());
        ...
        for (int i = 0; i < addresses.length; i++) {
            final InetAddress address = addresses[i];
            final boolean last = i == addresses.length - 1;
            Socket sock = sf.createSocket(context);
            ...
            try {
                sock.setSoTimeout(socketConfig.getSoTimeout());
                ...
                conn.bind(sock);
                return;
            } catch (final SocketTimeoutException ex) {
                ...
            } catch (final ConnectException ex) {
                ...
            }
        }
    }

Apache HttpClient 4.3: org.apache.http.impl.conn.HttpClientConnectionOperator


Service discovery service

Zookeeper, Curator, consul, etcd, doozerd, SkyDNS, Eureka, …

[Diagram: the Customer, Product, Cart and Billing services all connected to a central Discovery Service.]


Evaluation criteria

- High availability
- Consistency (consistency over availability?)
- Support for service-level checks
- API (http, DNS?, …)
- Footprint (Memory/CPU)
- Multiple datacenter support
- UI
- Template engine (for config files)


Consul

[Diagram: a Consul server cluster of five consul servers using Raft consensus. Two service nodes are shown (both labelled "service node A"); each runs a consul agent next to the service. Labels on each node: http / dns, health check, and an optional consul template that writes config files and restarts/reloads the service.]
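The "http" label in the diagram can be made concrete with a small client for the local consul agent. A sketch under stated assumptions: the agent listens on its default HTTP port 8500, the service name is URL-safe, and `ConsulLookup` is an illustrative name; the health endpoint with the `passing` filter is part of Consul's HTTP API.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ConsulLookup {
    // Builds the health-endpoint URI; "passing" restricts the result to
    // instances whose health checks currently pass.
    // Assumes serviceName needs no URL encoding (e.g. "cart").
    static URI healthyInstancesUri(String agentBaseUrl, String serviceName) {
        return URI.create(agentBaseUrl + "/v1/health/service/" + serviceName + "?passing");
    }

    // Returns the raw JSON body; parsing is left to whatever JSON library
    // the calling service already uses.
    static String fetchHealthyInstances(String agentBaseUrl, String serviceName)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(healthyInstancesUri(agentBaseUrl, serviceName))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Consul agent answered with HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A call such as `ConsulLookup.fetchHealthyInstances("http://localhost:8500", "cart")` would then list the cart nodes that are safe to call right now, which is exactly what the configuration and plain DNS approaches cannot express.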


Consul

• Show consul UI
• Show web-service status page
• Add cart/remove cart

DEMO


Challenge: Resilience

Every system has a retry!


Do not do failover/retry on connection

• GET ok
• DELETE ok
• PUT ok
• POST ???

Duplicates might be ok (e.g. create new shopping cart) … or not (e.g. register new customer)

Might be solved by a request token (e.g. the xsrf token from the web) as long as the service supports this

GET really ok? What about streaming?
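The request-token idea can be sketched on the service side as follows. This is an illustration only: `CustomerRegistration` is a hypothetical service, the token store is an in-memory map, and a real service would have to persist tokens together with the created data.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical service; the token handling is the point, not the customer model.
class CustomerRegistration {
    private final Map<String, Long> idsByToken = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    // A retried POST carries the same token and therefore gets the same
    // customer id back instead of creating a duplicate.
    long register(String requestToken, String name) {
        return idsByToken.computeIfAbsent(requestToken, token -> createCustomer(name));
    }

    private long createCustomer(String name) {
        // A real service would persist the customer here.
        return nextId.getAndIncrement();
    }
}
```

With this in place a POST becomes as safe to retry as a PUT, but only because the service cooperates; the client cannot achieve this on its own.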


Failover should be part of the business

• Usually the failover strategy depends on the concrete use-case
• Handling failover on the protocol layer (http-client) might hide error scenarios from the programmer
• It can be difficult to distinguish between technical and business errors on the protocol layer

As a rule of thumb: You want to retry all technical errors, but not the business errors … though even that is debatable in some cases
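The rule of thumb can be sketched like this. The classification is an assumption made for the example: an `IOException` counts as a technical error, and the hypothetical `BusinessException` marks errors the service raised on purpose.

```java
import java.io.IOException;

class SelectiveRetry {
    // Hypothetical marker for business errors (e.g. "customer already exists").
    static class BusinessException extends RuntimeException {
        BusinessException(String message) {
            super(message);
        }
    }

    @FunctionalInterface
    interface Attempt<R> {
        R run() throws IOException;
    }

    // Retries technical errors up to maxAttempts times in total.
    static <R> R retryTechnical(Attempt<R> attempt, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.run();
            } catch (IOException e) {
                last = e; // technical error: try again
            }
            // BusinessException is deliberately not caught, so it reaches
            // the caller on the first occurrence.
        }
        throw last;
    }
}
```

Where exactly the line between the two kinds of error runs is a decision for the use-case at hand, which is the argument for keeping this logic out of the generic http-client.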


How »not« to do service failover

    import java.io.IOException;
    import java.util.List;
    import java.util.Random;

    public class ServiceFailover {
        private static final Random RANDOM = new Random(System.currentTimeMillis());

        // Starts at a random endpoint and tries each one in turn until a call succeeds.
        public static <E, R> R retry(final List<E> endpoints, final Requester<E, R> requester) throws IOException {
            final int size = endpoints.size();
            if (size == 0)
                throw new RuntimeException("No active endpoints found");
            final int offset = RANDOM.nextInt(size);
            IOException lastException = null;
            for (int idx = 0; idx < size; idx++) {
                final E endpoint = endpoints.get((idx + offset) % size);
                try {
                    return requester.performTry(endpoint);
                } catch (IOException e) {
                    // Every IOException triggers a retry, whatever its cause.
                    lastException = e;
                }
            }
            throw lastException;
        }

        @FunctionalInterface
        public interface Requester<E, R> {
            R performTry(E endpoint) throws IOException;
        }
    }


Resilience

• Kill 2 consul nodes
• Kill one cart node

DEMO


Hystrix

• Circuit-Breaking
• Fail-Early
• Developers are »forced« to think in commands with a potential fallback result rather than in REST calls
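The two ideas, circuit-breaking and a command with a fallback, fit into a small hand-rolled sketch. This is not Hystrix code and does not use its API; `SimpleCircuitBreaker`, the threshold and the open interval are assumptions made for illustration, and calls are serialized for simplicity.

```java
import java.util.function.Supplier;

// Hand-rolled illustration of the concept, not the Hystrix implementation.
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    // Runs the command, or returns the fallback if the command fails or the
    // circuit is open. After openMillis one trial call is let through again.
    synchronized <R> R execute(Supplier<R> command, Supplier<R> fallback) {
        long now = System.currentTimeMillis();
        boolean open = consecutiveFailures >= failureThreshold && now - openedAt < openMillis;
        if (open) {
            return fallback.get(); // fail early: the remote call is not even attempted
        }
        try {
            R result = command.get();
            consecutiveFailures = 0; // a success closes the circuit
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = now; // (re)open the circuit
            }
            return fallback.get();
        }
    }
}
```

A caller would write something like `breaker.execute(() -> cartClient.load(id), () -> emptyCart)`, where `cartClient` and `emptyCart` are hypothetical; the caller has to decide up front what a sensible fallback is, which is the "forced" thinking the slide refers to.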


How it actually should look

[Diagram, according to finagle: a stack of layered modules: Observe, Service Timeout, Request Draining, Load Balancer, Monitor, Trace, Observe, Failure Accrual, Request Timeout, Pool, Fail Fast, Expiration, Dispatcher.]


What other people do

Netflix

Many libraries and tools that build on top of each other

• RxJava/ReactiveX: Reactive extension/Reactive streaming
• Based on netty: RxNetty
• Hystrix: Basic command system for circuit-breaking/fail-early
• Eureka: Service registry
• Ribbon: REST client with failover/service discovery based on Hystrix/Eureka
• …


What other people do

Twitter

• Services based on finagle (scala)
  • … which is itself based on netty
  • … which also contains the basics for failover/retry/service-discovery/monitoring
• Service-Discovery done via zookeeper, but can be adapted/extended to other tools
• Several frameworks/connectors built on top: finatra, async-mysql-connector …


Service discovery

Takeaway

• Helps a lot to realize …
  • … any kind of zero-downtime deployment strategy
  • … a self-healing micro-service jungle
• Does not create a fully resilient system by itself, even though it is the basis of it
• Might conflict with your existing configuration system (when creating config files via templates)
• Might be just another central component that fails


Failover/Retry

Takeaway

• The failover strategy usually depends on the business case
• A full failover stack is quite a piece of work
• Emerging frameworks might make life easier or at least provide a reference implementation