Alban Crequy Exploration of Linux Container Network Monitoring and Visualization ContainerCon Europe - October 2016 https://goo.gl/iDL8te

Page 1: An Exploration of Linux Container Network Monitoring and

Alban Crequy

Exploration of Linux Container Network Monitoring and Visualization

ContainerCon Europe - October 2016

https://goo.gl/iDL8te

Page 2: An Exploration of Linux Container Network Monitoring and

Alban Crequy

∘ Worked on the rkt container run-time
∘ Contributed to systemd

https://github.com/alban

Page 3: An Exploration of Linux Container Network Monitoring and

Berlin-based software company building foundational Linux technologies

Some examples of what we work on...

OSTree: git for operating system binaries

Page 4: An Exploration of Linux Container Network Monitoring and

Find out more about us…

Blog: https://kinvolk.io/blog

Github: https://github.com/kinvolk

Twitter: https://twitter.com/kinvolkio

Email: [email protected]

Page 5: An Exploration of Linux Container Network Monitoring and

Plan

∘ First use-case: visualizing TCP connections
  ∘ Microservices application with containers: Weave Socks
  ∘ CoreOS Linux, Kubernetes, Weave Scope
  ∘ Using /proc & conntrack
  ∘ Limitations
  ∘ proc connector, eBPF & kprobes
∘ Next use cases:
  ∘ L7, HTTP: eBPF & kprobes
  ∘ Simulating degraded networks with traffic control

Page 6: An Exploration of Linux Container Network Monitoring and

The demo application

Page 7: An Exploration of Linux Container Network Monitoring and

microservices-demo

https://github.com/microservices-demo/microservices-demo

Page 8: An Exploration of Linux Container Network Monitoring and

Some micro-services

[Diagram: Firefox, front-end, catalogue, orders, orders-db, payment]

Page 9: An Exploration of Linux Container Network Monitoring and
Page 10: An Exploration of Linux Container Network Monitoring and

Orchestrating containers with Kubernetes

Page 11: An Exploration of Linux Container Network Monitoring and

Kubernetes Replica Sets

[Diagram: Kubernetes nodes 1, 2 and 3 running replicated pods: front-end (×2), orders (×2), catalogue (×2)]

Page 12: An Exploration of Linux Container Network Monitoring and

Kubernetes Services

[Diagram: Kubernetes nodes 1, 2 and 3 running front-end (×2) and orders (×2) pods, with an "orders service"]

Page 13: An Exploration of Linux Container Network Monitoring and

Kubernetes Services

Proxying the traffic from the virtual service IP to a Kubernetes pod

Several implementations possible:

- Userspace proxy in kube-proxy
- iptables rules (Destination NAT) installed by kube-proxy
- Cilium implements a load balancer based on eBPF (tc level)

Page 14: An Exploration of Linux Container Network Monitoring and

Weave Scope

Page 15: An Exploration of Linux Container Network Monitoring and

Weave Scope

Page 16: An Exploration of Linux Container Network Monitoring and

Weave Scope

demo

Page 17: An Exploration of Linux Container Network Monitoring and

procfs

Page 18: An Exploration of Linux Container Network Monitoring and

procfs files

- /proc/$PID
- /proc/$PID/ns/net: network namespace
- /proc/$PID/fd/: file descriptors
- /proc/$PID/net/tcp: TCP connections
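As a rough illustration of what harvesting these files involves, the sketch below decodes one socket entry in the /proc/$PID/net/tcp layout and extracts the namespace inode from a /proc/$PID/ns/net link target. The function names and the sample values are made up for this example, and the address decoding assumes IPv4 on a little-endian machine such as x86_64.

```python
import socket
import struct


def decode_addr(hex_addr):
    """Decode an 'ADDR:PORT' field such as '0100007F:1F90' (IPv4 only).

    Assumption: little-endian host (e.g. x86_64), where the 32-bit address
    is printed with its bytes reversed. The port is plain hexadecimal.
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_tcp_line(line):
    """Parse one socket line (not the header line) of /proc/$PID/net/tcp."""
    fields = line.split()
    return {
        "local": decode_addr(fields[1]),
        "remote": decode_addr(fields[2]),
        "state": int(fields[3], 16),  # 0x01 = ESTABLISHED, 0x0A = LISTEN
        "inode": int(fields[9]),      # matches 'socket:[inode]' in /proc/$PID/fd/
    }


def netns_inode(link_target):
    """Extract the inode from a ns/net link target like 'net:[4026531956]'."""
    return int(link_target[link_target.index("[") + 1:-1])


# Made-up sample line in the /proc/net/tcp layout.
SAMPLE = ("   1: 0500000A:9C40 0A00030A:0050 01 00000000:00000000 "
          "00:00000000 00000000  1000        0 54321 1 0000000000000000")
```

On a live system, the link target would come from os.readlink("/proc/<pid>/ns/net"), and the socket inode can be matched against the targets of /proc/<pid>/fd/* to attribute a connection to a process.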

Page 19: An Exploration of Linux Container Network Monitoring and

procfs files

Page 20: An Exploration of Linux Container Network Monitoring and

procfs limitations

- No notifications
- Need to read procfs for
  - new processes
  - new network namespaces
  - new sockets
  - every second?
- CPU intensive on systems with a high number of processes
- Misses short-lived connections
- Issues with packet modifications (e.g. DNAT)
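The short-lived connection problem can be shown with a toy simulation. This is only a sketch: the connection lifetimes and the one-second polling interval are invented for the example, and a "poll" here is just a check of which connections exist at that instant.

```python
def polled_connections(connections, poll_interval, duration):
    """Return the names of the connections seen by at least one poll.

    connections: dict mapping a name to its (start, end) lifetime in seconds.
    A poll at time t sees a connection when start <= t < end.
    """
    seen = set()
    n_polls = int(duration / poll_interval) + 1
    for i in range(n_polls):
        t = i * poll_interval
        for name, (start, end) in connections.items():
            if start <= t < end:
                seen.add(name)
    return seen


# Hypothetical lifetimes: one long connection, one lasting half a second.
LIFETIMES = {"long": (0.5, 5.0), "short": (1.2, 1.7)}
SEEN = polled_connections(LIFETIMES, poll_interval=1.0, duration=5.0)
```

With these values only "long" ends up in SEEN: the "short" connection opens and closes between two polls, so a procfs scanner never observes it.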

Page 21: An Exploration of Linux Container Network Monitoring and

Packet modifications

[Diagram: a packet travelling between a local process on Kubernetes node 1 and a local process on Kubernetes node 2. Labels: protocol layer, network layer, NAT, link layer, traffic control (egress), traffic control (ingress), socket lookup]

Page 22: An Exploration of Linux Container Network Monitoring and

Netlink

Page 23: An Exploration of Linux Container Network Monitoring and

Netlink sockets

socket(AF_NETLINK, SOCK_RAW, NETLINK_...);

Several Netlink sockets:

- NETLINK_ROUTE
- NETLINK_INET_DIAG
- NETLINK_SELINUX
- NETLINK_CONNECTOR
- NETLINK_NETFILTER
- ...
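Whatever the Netlink family, messages share the same 16-byte header (struct nlmsghdr). The sketch below packs and unpacks that header in pure Python; the helper names are made up, and no socket is opened, so it only illustrates the wire format.

```python
import struct

# Protocol numbers from <linux/netlink.h>, i.e. the third argument of socket().
NETLINK_PROTOCOLS = {
    "NETLINK_ROUTE": 0,
    "NETLINK_INET_DIAG": 4,
    "NETLINK_SELINUX": 7,
    "NETLINK_CONNECTOR": 11,
    "NETLINK_NETFILTER": 12,
}

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSGHDR = struct.Struct("=IHHII")
NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300  # NLM_F_ROOT | NLM_F_MATCH


def pack_nlmsg(msg_type, flags, seq, payload=b"", pid=0):
    """Build one Netlink message, padded to a 4-byte boundary."""
    length = NLMSGHDR.size + len(payload)  # nlmsg_len excludes the padding
    padding = b"\0" * (-length % 4)
    return NLMSGHDR.pack(length, msg_type, flags, seq, pid) + payload + padding


def unpack_nlmsg(data):
    """Split the first message of a buffer into header fields and payload."""
    length, msg_type, flags, seq, pid = NLMSGHDR.unpack_from(data)
    return msg_type, flags, seq, pid, data[NLMSGHDR.size:length]
```

A real client would send such a buffer on a socket created with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, protocol), with the payload layout depending on the family.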

Page 24: An Exploration of Linux Container Network Monitoring and

conntrack

Page 25: An Exploration of Linux Container Network Monitoring and

conntrack -E

- Uses NETLINK_NETFILTER sockets to subscribe to conntrack events from the kernel
- Is aware of NAT rewritings
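To make the NAT awareness concrete, the sketch below parses one event line in the style printed by conntrack -E and compares the original tuple with the reply tuple. The sample line and its addresses are invented, and the exact output layout is assumed from typical conntrack-tools output rather than taken from the talk.

```python
import re

FIELD = re.compile(r"\b(src|dst|sport|dport)=(\S+)")


def parse_event(line):
    """Return (event_type, original_tuple, reply_tuple) for one event line."""
    event = line.split("]", 1)[0].strip(" [")
    pairs = FIELD.findall(line)
    if len(pairs) < 8:
        raise ValueError("expected an original and a reply tuple")
    return event, dict(pairs[:4]), dict(pairs[4:8])


def dnat_target(orig, reply):
    """Return the real (ip, port) destination if DNAT was applied, else None.

    Without NAT, the reply comes from the address the original packet was
    sent to. With destination NAT, the reply source is the rewritten target.
    """
    if (reply["src"], reply["sport"]) != (orig["dst"], orig["dport"]):
        return reply["src"], int(reply["sport"])
    return None


# Hypothetical event: a pod connects to a service IP that is DNATed to a pod.
SAMPLE = ("[NEW] tcp      6 120 SYN_SENT src=10.2.0.5 dst=10.3.0.10 "
          "sport=40000 dport=80 [UNREPLIED] src=10.2.1.7 dst=10.2.0.5 "
          "sport=8080 dport=40000")
```

Here the process believes it talks to 10.3.0.10:80, while the reply tuple reveals 10.2.1.7:8080 as the actual peer, which is the information procfs alone does not provide.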

Page 26: An Exploration of Linux Container Network Monitoring and

conntrack limitations

- Conntrack events don’t include:
  - Process ID
  - Network namespace ID
- Conntrack zones are included but not necessarily used by container run-times
- So harvesting procfs regularly is still necessary

Page 27: An Exploration of Linux Container Network Monitoring and

Other kinds of Netlink sockets?

Page 28: An Exploration of Linux Container Network Monitoring and

NETLINK_INET_DIAG

socket(AF_NETLINK, SOCK_RAW, NETLINK_INET_DIAG);

- Fetch information about sockets
- Used by ss (“another utility to investigate sockets”)
- Basic bytecode to filter the sockets (e.g. “INET_DIAG_BC_JMP”)
- But no notification mechanism
- Patch “sock_diag: notify packet socket creation/deletion” (2013) rejected

Page 29: An Exploration of Linux Container Network Monitoring and

Kernel Connector

socket(AF_NETLINK, SOCK_RAW, NETLINK_CONNECTOR);

Several Kernel Connector agents:

- Device mapper
- HyperV
- Proc connector

Page 30: An Exploration of Linux Container Network Monitoring and

Proc connector

bind(sockfd, ...CN_IDX_PROC...);
sendmsg(sockfd, ...PROC_CN_MCAST_LISTEN...)

- Since Linux v2.6.15 (January 2006)

Notifications for:

- fork
- exec
- exit
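The subscription message and the fork/exec/exit notifications have a fixed binary layout (from linux/connector.h and linux/cn_proc.h). The sketch below builds the PROC_CN_MCAST_LISTEN message and decodes an event payload in pure Python. The function names are made up, and the socket calls themselves are left out because they need privileges.

```python
import struct

CN_IDX_PROC = 1
CN_VAL_PROC = 1
PROC_CN_MCAST_LISTEN = 1
NLMSG_DONE = 3

NLMSGHDR = struct.Struct("=IHHII")  # len, type, flags, seq, pid
CN_MSG = struct.Struct("=IIIIHH")   # id.idx, id.val, seq, ack, len, flags
EVENT_HDR = struct.Struct("=IIQ")   # what, cpu, timestamp_ns

PROC_EVENT_FORK = 0x00000001
PROC_EVENT_EXEC = 0x00000002
PROC_EVENT_EXIT = 0x80000000


def build_listen_message(pid):
    """Bytes to send on the connector socket to start receiving events."""
    op = struct.pack("=I", PROC_CN_MCAST_LISTEN)
    cn = CN_MSG.pack(CN_IDX_PROC, CN_VAL_PROC, 0, 0, len(op), 0) + op
    return NLMSGHDR.pack(NLMSGHDR.size + len(cn), NLMSG_DONE, 0, 0, pid) + cn


def decode_event(data):
    """Decode a struct proc_event (the data carried after the cn_msg header)."""
    what, _cpu, _timestamp = EVENT_HDR.unpack_from(data)
    body = data[EVENT_HDR.size:]
    if what == PROC_EVENT_FORK:
        parent_pid, _ptgid, child_pid, _ctgid = struct.unpack_from("=iiii", body)
        return ("fork", parent_pid, child_pid)
    if what == PROC_EVENT_EXEC:
        pid, _tgid = struct.unpack_from("=ii", body)
        return ("exec", pid)
    if what == PROC_EVENT_EXIT:
        pid, _tgid, exit_code, _signal = struct.unpack_from("=iiII", body)
        return ("exit", pid, exit_code)
    return ("other", what)
```

In a real listener, the message would be sent on a NETLINK_CONNECTOR socket bound to the CN_IDX_PROC group, and each received datagram would be stripped of its nlmsghdr and cn_msg headers before being passed to decode_event().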

Page 31: An Exploration of Linux Container Network Monitoring and

Proc connector

Missing:

- Network namespaces
  - RFC patch “proc connector: add namespace events” last month (September 2016): https://lkml.org/lkml/2016/9/8/588
- Sockets

So harvesting procfs regularly is still necessary

Page 32: An Exploration of Linux Container Network Monitoring and

Proc connector

demo

Page 33: An Exploration of Linux Container Network Monitoring and

BPF

Page 34: An Exploration of Linux Container Network Monitoring and

Classic BPF (cBPF)

[Diagram: userspace attaches a BPF program (BPF_JMP..., BPF_LD..., BPF_RET...) to a socket in the kernel, then reads the filtered packets]

setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf));
recvfrom()
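For a feel of what the BPF_LD / BPF_JMP / BPF_RET instructions look like, here is a small classic BPF program that accepts TCP over IPv4 on Ethernet frames, with a tiny interpreter to try it on sample frames. The interpreter is a teaching aid written for this example and covers only the four opcodes used; encode() produces the array of struct sock_filter entries that SO_ATTACH_FILTER expects, but no filter is attached here.

```python
import struct

# Opcodes as composed in <linux/filter.h> and <linux/bpf_common.h>.
LDH_ABS = 0x28  # BPF_LD | BPF_H | BPF_ABS: load 16 bits at offset k
LDB_ABS = 0x30  # BPF_LD | BPF_B | BPF_ABS: load 8 bits at offset k
JEQ_K = 0x15    # BPF_JMP | BPF_JEQ | BPF_K: skip jt or jf instructions
RET_K = 0x06    # BPF_RET | BPF_K: accept k bytes of the packet (0 = drop)

# Each instruction is (code, jt, jf, k).
TCP_OVER_IPV4 = [
    (LDH_ABS, 0, 0, 12),      # EtherType field
    (JEQ_K, 0, 3, 0x0800),    # not IPv4: jump to the final "ret 0"
    (LDB_ABS, 0, 0, 23),      # IPv4 protocol field (14 + 9)
    (JEQ_K, 0, 1, 6),         # not TCP: jump to "ret 0"
    (RET_K, 0, 0, 0xFFFF),    # accept
    (RET_K, 0, 0, 0),         # drop
]


def encode(program):
    """Serialize to the struct sock_filter layout: u16 code, u8 jt, u8 jf, u32 k."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in program)


def run(program, packet):
    """Interpret the program on a packet (only the opcodes defined above)."""
    acc = 0
    pc = 0
    while pc < len(program):
        code, jt, jf, k = program[pc]
        pc += 1
        if code in (LDH_ABS, LDB_ABS):
            size = 2 if code == LDH_ABS else 1
            if k + size > len(packet):
                return 0  # out-of-bounds load: the packet is dropped
            acc = int.from_bytes(packet[k:k + size], "big")
        elif code == JEQ_K:
            pc += jt if acc == k else jf
        elif code == RET_K:
            return k
        else:
            raise ValueError("unsupported opcode 0x%02x" % code)
    return 0
```

The kernel runs the same kind of program on every packet reaching the socket, so recvfrom() only returns the packets for which the program returned a non-zero length.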

Page 35: An Exploration of Linux Container Network Monitoring and

Extended BPF (or eBPF)

Program type:

- BPF_PROG_TYPE_SOCKET_FILTER
- BPF_PROG_TYPE_KPROBE
- BPF_PROG_TYPE_SCHED_CLS
- BPF_PROG_TYPE_SCHED_ACT
- BPF_PROG_TYPE_TRACEPOINT (Linux >= 4.7)
- BPF_PROG_TYPE_XDP

Page 36: An Exploration of Linux Container Network Monitoring and

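eBPF classifier for qdiscs

[Diagram: building an eBPF classifier and attaching it to eth0]

- Userspace: C source, e.g. if (skb->protocol…) return TC_H_MAKE(TC_H_ROOT, mark);
- Compilation: clang... -march=bpf, producing BPF bytecode (BPF_JMP..., BPF_LD..., BPF_RET...)
- Upload in the kernel:
  - bpf()
  - Netlink
- Kernel: JIT compilation to x86_64 code, run as the classifier on eth0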
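The value returned by the classifier is a traffic control handle: a 32-bit number whose upper 16 bits are the major part and lower 16 bits the minor part. The arithmetic behind TC_H_MAKE (as defined in linux/pkt_sched.h) can be reproduced in a few lines; the format_handle() helper is made up for display purposes.

```python
TC_H_MAJ_MASK = 0xFFFF0000
TC_H_MIN_MASK = 0x0000FFFF
TC_H_ROOT = 0xFFFFFFFF


def tc_h_make(major, minor):
    """Combine the major bits of one handle with the minor bits of another."""
    return (major & TC_H_MAJ_MASK) | (minor & TC_H_MIN_MASK)


def format_handle(handle):
    """Render a handle the way tc prints it: hexadecimal 'major:minor'."""
    return "%x:%x" % (handle >> 16, handle & 0xFFFF)


# A classifier returning TC_H_MAKE(TC_H_ROOT, mark) with mark = 0x10.
CLASS_ID = tc_h_make(TC_H_ROOT, 0x10)
```

With mark = 0x10, the result is 0xFFFF0010, which format_handle() renders as ffff:10.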

Page 37: An Exploration of Linux Container Network Monitoring and

eBPF maps

[Diagram: eBPF maps shared between the JIT-compiled x86_64 code in the kernel and a userspace program]

∘ Keep context between calls
∘ Report statistics to userspace

Page 38: An Exploration of Linux Container Network Monitoring and

Tracepoints with eBPF

- BPF_PROG_TYPE_TRACEPOINT since Linux 4.7
- Find the list of tracepoints in /sys/kernel/debug/tracing/events
- Stable API
- But limited tracepoints

Page 39: An Exploration of Linux Container Network Monitoring and

kprobes with eBPF

- BPF_PROG_TYPE_KPROBE since Linux 4.1
- No ABI guarantees
- Probe any kernel function

Page 40: An Exploration of Linux Container Network Monitoring and

Socket events with kprobe / eBPF

- BPF Compiler Collection (BCC)
  - bcc/examples/tracing/tcpv4connect.py
  - Iago’s tcp4tracer (WIP)
- Get connection tuple, pid, netns
  - tcp_v4_connect
  - tcp_close
  - inet_csk_accept

Page 41: An Exploration of Linux Container Network Monitoring and

Packet modifications

[Diagram: same packet path as before, between a local process on Kubernetes node 1 and a local process on Kubernetes node 2, with NAT applied at the network layer]

Page 42: An Exploration of Linux Container Network Monitoring and

tcp4tracer & NAT

- The connection tuple from the process’ point of view is not enough
  - NAT
  - Kubernetes Services
- Iago’s tcp4tracer (WIP)
  - nf_nat_ipv4_manip_pkt
  - nf_nat_tcp_manip_pkt

Page 43: An Exploration of Linux Container Network Monitoring and

More metrics

Page 44: An Exploration of Linux Container Network Monitoring and

Weave Scope architecture

[Diagram: a Scope Probe on each Kubernetes node (node 1, node 2) reporting to the Scope App, which is viewed from Firefox]

Page 45: An Exploration of Linux Container Network Monitoring and

Weave Scope plugins

[Diagram: same architecture, with plugins attached to each Scope Probe]

Page 46: An Exploration of Linux Container Network Monitoring and
Page 47: An Exploration of Linux Container Network Monitoring and

HTTP requests plugin

- Number of HTTP requests per second
- Without instrumenting the application
- eBPF kprobe on skb_copy_datagram_iter

[Diagram: the HTTP client calls sendmsg() with "GET / HTTP/1.1"; when the HTTP server calls recvfrom(), skb_copy_datagram_iter() copies the skb into the iovec]
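The counting logic can be sketched in userspace Python. This is not the plugin's eBPF code: it assumes the probe hands over a timestamp and the first bytes copied to the server, and the check on the request line is a simplification chosen for this example.

```python
import collections

HTTP_METHODS = (b"GET", b"HEAD", b"POST", b"PUT", b"DELETE",
                b"OPTIONS", b"PATCH", b"CONNECT", b"TRACE")


def is_http_request(payload):
    """True if the payload starts with a request line like 'GET / HTTP/1.1'."""
    first_line = payload.split(b"\r\n", 1)[0]
    parts = first_line.split(b" ")
    return (len(parts) == 3
            and parts[0] in HTTP_METHODS
            and parts[2].startswith(b"HTTP/"))


def requests_per_second(events):
    """Count HTTP requests per whole second.

    events: iterable of (timestamp_in_seconds, payload_bytes).
    """
    counts = collections.Counter()
    for timestamp, payload in events:
        if is_http_request(payload):
            counts[int(timestamp)] += 1
    return dict(counts)


# Hypothetical captured payloads; the third one is not HTTP.
EVENTS = [
    (0.1, b"GET / HTTP/1.1\r\nHost: front-end\r\n\r\n"),
    (0.7, b"POST /orders HTTP/1.1\r\n\r\n"),
    (1.2, b"\x16\x03\x01binary"),
    (1.5, b"GET /catalogue HTTP/1.1\r\n\r\n"),
]
```

For these events, requests_per_second(EVENTS) yields two requests in second 0 and one in second 1, without any change to the application itself.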

Page 48: An Exploration of Linux Container Network Monitoring and

HTTP responses plugin

- Number of HTTP responses by category (404, etc.)
- Without instrumenting the application
- eBPF kprobe on skb_copy_datagram_from_iter
- Using an eBPF map to track the context between kprobe & kretprobe

[Diagram: the HTTP server calls sendmsg() with "HTTP/1.0 200 OK" and skb_copy_datagram_from_iter() copies the iovec into the skb; the HTTP client calls recvfrom()]
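The role of the map between kprobe and kretprobe can be imitated in plain Python. In this sketch a dict keyed by thread id stands in for the eBPF hash map, on_entry() and on_return() stand in for the two probes, and the status categories (2xx, 4xx, ...) are one possible grouping; all of these names are made up for the example.

```python
import collections

in_flight = {}                   # stands in for an eBPF hash map keyed by thread id
counts = collections.Counter()   # responses per category


def status_category(payload):
    """Map 'HTTP/1.0 200 OK...' to '2xx'; return None if not a status line."""
    first_line = payload.split(b"\r\n", 1)[0]
    parts = first_line.split(b" ", 2)
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/"):
        return None
    code = parts[1]
    if len(code) != 3 or not code.isdigit():
        return None
    return "%sxx" % chr(code[0])


def on_entry(tid, payload):
    """kprobe side: remember the arguments until the function returns."""
    in_flight[tid] = payload


def on_return(tid, retval):
    """kretprobe side: look the context up again and count on success."""
    payload = in_flight.pop(tid, None)
    if payload is None or retval != 0:
        return
    category = status_category(payload)
    if category is not None:
        counts[category] += 1
```

Calling on_entry(1, b"HTTP/1.0 200 OK\r\n\r\n") followed by on_return(1, 0) increments counts["2xx"]; a return without a matching entry, or with a non-zero return value, is ignored.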

Page 49: An Exploration of Linux Container Network Monitoring and

Testing degraded networks

Page 50: An Exploration of Linux Container Network Monitoring and

Traffic control, why?

[Diagram: a web server reaching several clients through the Internet]

∘ fair distribution of bandwidth
∘ reserve bandwidth for specific applications
∘ avoid bufferbloat

Page 51: An Exploration of Linux Container Network Monitoring and

Queuing disciplines (qdisc)

[Diagram: a qdisc between eth0 and the Internet]

∘ Network scheduling algorithm
  ∘ which packet to emit next?
  ∘ when?
∘ Configurable at run-time:
  ∘ /sbin/tc
  ∘ Netlink
∘ Default on new network interfaces: sysctl net.core.default_qdisc

Page 52: An Exploration of Linux Container Network Monitoring and

Stochastic Fairness Queueing (sfq)

[Diagram: packets are spread over FIFO 0, FIFO 1, ..., FIFO n, which are served in round robin towards eth0 and the Internet]
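The idea of hashing flows into FIFOs and serving them in round robin fits in a short simulation. This is a simplified model written for this example: the real sfq qdisc also perturbs its hash periodically and accounts for packet sizes, and the bucket function here is supplied by the caller.

```python
from collections import deque


class SfqSim:
    """Toy model of stochastic fairness queueing."""

    def __init__(self, n_queues, bucket_of):
        # bucket_of maps a flow identifier to an integer; in the kernel this
        # is a hash of the packet's addresses and ports.
        self.queues = [deque() for _ in range(n_queues)]
        self.bucket_of = bucket_of
        self.next_queue = 0

    def enqueue(self, flow, packet):
        index = self.bucket_of(flow) % len(self.queues)
        self.queues[index].append(packet)

    def dequeue(self):
        """Return the next packet in round-robin order, or None if empty."""
        n = len(self.queues)
        for i in range(n):
            index = (self.next_queue + i) % n
            if self.queues[index]:
                self.next_queue = (index + 1) % n
                return self.queues[index].popleft()
        return None


# Flow A sends a burst of three packets before flow B sends one.
sim = SfqSim(n_queues=4, bucket_of={"A": 0, "B": 1}.get)
for packet in ("a1", "a2", "a3"):
    sim.enqueue("A", packet)
sim.enqueue("B", "b1")
ORDER = [sim.dequeue() for _ in range(4)]
```

ORDER comes out as a1, b1, a2, a3: flow B's packet is not stuck behind flow A's burst, which is the fairness a single FIFO would not give.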

Page 53: An Exploration of Linux Container Network Monitoring and

Demo

Reproduce this demo yourself: https://github.com/kinvolk/demo

Page 54: An Exploration of Linux Container Network Monitoring and

Network emulator (netem)

[Diagram: a netem qdisc between eth0 and the Internet, emulating bandwidth, latency, packet loss, corrupt...]

Page 55: An Exploration of Linux Container Network Monitoring and

Testing with containers

[Diagram: a testing framework configures “netem” qdiscs (bandwidth, latency, packet drop...) on eth0 of container 1 and container 2]
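One way a testing framework could drive this is by generating tc commands and running them inside each container's network namespace. The sketch below only builds the argument lists; the helper names are hypothetical, the use of nsenter is an assumption (the talk does not say how the qdiscs are installed), and nothing is executed.

```python
def netem_command(dev, delay_ms=None, loss_pct=None, corrupt_pct=None,
                  rate_kbit=None):
    """Build a 'tc qdisc replace ... netem ...' argument list."""
    args = ["tc", "qdisc", "replace", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        args += ["delay", "%dms" % delay_ms]
    if loss_pct is not None:
        args += ["loss", "%g%%" % loss_pct]
    if corrupt_pct is not None:
        args += ["corrupt", "%g%%" % corrupt_pct]
    if rate_kbit is not None:
        args += ["rate", "%dkbit" % rate_kbit]
    return args


def in_container_netns(pid, args):
    """Prefix a command so it runs in the network namespace of process pid."""
    return ["nsenter", "-t", str(pid), "-n"] + args


# Hypothetical test setup: 100 ms latency and 1% loss on a container's eth0.
COMMAND = in_container_netns(4242, netem_command("eth0", delay_ms=100, loss_pct=1))
```

The resulting list could be passed to subprocess.run() by a privileged test runner; pid 4242 is a placeholder for a process running in the target container.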

Page 56: An Exploration of Linux Container Network Monitoring and

Add latency on a specific connection

[Diagram: the micro-services (Firefox, front-end, catalogue, orders, orders-db, payment) with latency=100ms on one connection]

Page 57: An Exploration of Linux Container Network Monitoring and

How to define classes of traffic

[Diagram: a netem qdisc on interface eth0 with latency=100ms, and traffic split into classes: dest_ip=10.0.4.*, dest_ip=10.0.5.*, other]

Page 58: An Exploration of Linux Container Network Monitoring and

u32: filter on content

[Diagram: qdisc tree on interface eth0]

- root qdisc (type = HTB)
- root class (type = HTB)
- filters (type = u32): ip=10.0.4.*, ip=10.0.5.*, other
- leaf classes (type = HTB)
- leaf qdiscs (type = netem), e.g. latency=10ms
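The tree above maps onto a short sequence of tc commands. The generator below is a sketch under assumptions: the class numbering, the 1gbit placeholder rate and the delays per subnet are invented for the example, and the commands are only produced as strings, not run.

```python
def htb_netem_commands(dev, delays_ms_by_subnet, other_delay_ms=0, rate="1gbit"):
    """Build tc commands for: root HTB qdisc, root HTB class, one HTB leaf
    class plus netem qdisc per subnet, and u32 filters selecting the class.

    delays_ms_by_subnet: list of (subnet, delay_ms), e.g. ("10.0.4.0/24", 100).
    Traffic matching no filter falls into the last ("other") class.
    """
    leaves = list(delays_ms_by_subnet) + [(None, other_delay_ms)]
    # tc parses class ids as hexadecimal, so number the leaves 10, 11, 12...
    minors = [format(0x10 + i, "x") for i in range(len(leaves))]
    cmds = [
        "tc qdisc add dev %s root handle 1: htb default %s" % (dev, minors[-1]),
        "tc class add dev %s parent 1: classid 1:1 htb rate %s" % (dev, rate),
    ]
    for minor, (subnet, delay_ms) in zip(minors, leaves):
        cmds.append("tc class add dev %s parent 1:1 classid 1:%s htb rate %s"
                    % (dev, minor, rate))
        cmds.append("tc qdisc add dev %s parent 1:%s handle %s: netem delay %dms"
                    % (dev, minor, minor, delay_ms))
        if subnet is not None:
            cmds.append("tc filter add dev %s protocol ip parent 1: prio 1 "
                        "u32 match ip dst %s flowid 1:%s" % (dev, subnet, minor))
    return cmds


# Hypothetical values: 100 ms towards 10.0.4.*, 10 ms towards 10.0.5.*.
COMMANDS = htb_netem_commands(
    "eth0", [("10.0.4.0/24", 100), ("10.0.5.0/24", 10)])
```

Printing COMMANDS one per line gives a script that can be reviewed before being applied; replacing the u32 filters with a BPF classifier leaves the HTB and netem parts unchanged.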

Page 59: An Exploration of Linux Container Network Monitoring and

Filtering with cBPF/eBPF

[Diagram: a BPF classifier on eth0 selecting between netem qdiscs]

- Userspace: C source, e.g. if (skb->protocol…) return TC_H_MAKE(TC_H_ROOT, mark);
- Compilation: clang... -march=bpf, producing BPF bytecode (BPF_JMP..., BPF_LD..., BPF_RET...)
- Upload in the kernel:
  - bpf()
  - Netlink
- Kernel: JIT compilation to x86_64 code

Page 60: An Exploration of Linux Container Network Monitoring and

eBPF maps

[Diagram: the BPF classifier on eth0, in front of the netem qdiscs, runs as x86_64 code in the kernel and shares an eBPF map with tc in userspace]

Page 61: An Exploration of Linux Container Network Monitoring and

Questions?

The slides: https://goo.gl/iDL8te