About Me
• Lei (Harry) Zhang @resouer
• #CNCF member, #Microsoft MVP
• Previous: VMware, Baidu
• Feature Maintainer of Kubernetes
• HyperCrew: https://hyper.sh/
• Publication: Docker & Kubernetes Under the Hood
• PhD candidate #Large-scale cluster scheduling and management
A survey about “boundary”
• Are you comfortable with Linux containers as an effective boundary?
• Yes, I use containers in my private/safe environment
• No, I use containers to serve the public cloud
As long as we care about security…
• We have to wrap containers inside full-blown virtual machines
• But we lose cloud-native deployment
• Slow startup time
• Huge waste of resources
• Memory tax for every container
• …
dream
reality
Revisit container
• Container Runtime
• The dynamic view and boundary of your running process
• Container Image
• The static view of your program, data, dependencies, files and directories
namespace cgroups
FROM busybox
ADD temp.txt /
VOLUME /data
CMD ["echo", "hello"]
Docker Container
[Figure: the container created from the Dockerfile above, running "echo hello". Layers shown: read-write layer (with /data), init layer (/etc/hosts /etc/hostname /etc/resolv.conf), and read-only layer (/bin /dev /etc /home /lib /lib64 /media /mnt /opt /proc /root /run /sbin /sys /tmp /usr /var /data /temp.txt), plus json metadata.]
HyperContainer
• Container Runtime
• RunV
• https://github.com/hyperhq/runv
• An OCI-compatible, hypervisor-based runtime implementation
• Control daemon
• https://github.com/hyperhq/hyperd
• Container Image
• Docker Image Spec
Combine the best parts
• Portable and behaves like a Linux container
• $ hyperctl run -t busybox echo helloworld
• sub-second startup time*, ~12MB memory cost
• Fully isolated sandbox with an independent guest kernel
• $ hyperctl exec -t busybox uname -r
• 4.4.12-hyper (or your provided kernel)
• security, backward compatibility, maturity
See: http://hypercontainer.io/why-hyper.html
HyperContainer is a Pod
• That’s how HyperContainer fits into the Kubernetes philosophy
• Wait, why is Pod so important?
Pod: lesson learned from Borg
• InitContainers: one or more containers started in sequence before the pod's normal containers are started.
• Share volumes, perform network operations, and perform computation prior to the app containers.
So, Pod is
• The group of super-affinity containers
• The atomic scheduling unit
• The process group in container cloud
• Do right things
• without modifying your container image
• Kubernetes = Spring Framework
• Pod = IoC
[Figure: a Pod containing the log and app containers, an infra container, a volume, and an init container.]
Pod is not easy to simulate
• log and app have super affinity
• Requirement:
• app: 1G, log: 0.5G
• Available:
• Node_A: 1.25G, Node_B: 2G
• What happens if app scheduled to Node_A?
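The question above can be worked through with a small sketch. It uses only the numbers from the slide; the two function names and the "first node with enough room" placement rule are assumptions made for illustration, not how the Kubernetes scheduler is implemented.

```python
# Sizes are in GB and come from the slide; everything else is illustrative.

def schedule_per_container(requests, nodes):
    """Place each container on its own, first node with enough free memory."""
    free = dict(nodes)
    placement = {}
    for name, need in requests.items():
        target = next((n for n, cap in free.items() if cap >= need), None)
        placement[name] = target
        if target is not None:
            free[target] -= need
    return placement


def schedule_as_pod(requests, nodes):
    """Treat the containers as one atomic unit: the node must fit their sum."""
    total = sum(requests.values())
    return next((n for n, cap in nodes.items() if cap >= total), None)


requests = {"app": 1.0, "log": 0.5}
nodes = {"Node_A": 1.25, "Node_B": 2.0}

per_container = schedule_per_container(requests, nodes)
pod_node = schedule_as_pod(requests, nodes)

print(per_container)  # {'app': 'Node_A', 'log': 'Node_B'}
print(pod_node)       # Node_B
```

Scheduled one by one, app takes Node_A and leaves only 0.25G there, so log cannot follow it and the affinity is broken. Scheduled as a Pod, the 1.5G total rules out Node_A up front.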
HyperContainer is a Pod
• Linux container based runtimes
• wraps and encapsulates several app containers into a logical group
• Hypervisor container based runtime
• hypervisor serves as a natural boundary of Pod
HyperContainer is a Pod
• Container Runtime Interface
• create sandbox Foo --> create container C --> start container C
• stop container C --> remove container C --> delete sandbox Foo
• Sandbox
• Normally: the infra container
• HyperContainer: hypervisor
• with HyperKernel
• a HyperStart process as PID 1
• set up the mnt namespace, launch apps from the images, etc.
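The sandbox and container call order listed above can be sketched as a toy state machine. `ToyRuntime` and its method names are hypothetical stand-ins, not the real CRI gRPC API or the frakti implementation; the sketch only shows that containers live inside a sandbox and that the sandbox goes away last.

```python
class ToyRuntime:
    """Toy stand-in for a CRI runtime (hypothetical, for illustration only)."""

    def __init__(self):
        self.sandboxes = {}  # sandbox name -> {container name: state}
        self.calls = []      # ordered record of lifecycle calls

    def create_sandbox(self, sandbox):
        self.sandboxes[sandbox] = {}
        self.calls.append(f"create sandbox {sandbox}")

    def create_container(self, sandbox, name):
        self.sandboxes[sandbox][name] = "created"  # KeyError if no sandbox
        self.calls.append(f"create container {name}")

    def start_container(self, sandbox, name):
        self._expect(sandbox, name, "created")
        self.sandboxes[sandbox][name] = "running"
        self.calls.append(f"start container {name}")

    def stop_container(self, sandbox, name):
        self._expect(sandbox, name, "running")
        self.sandboxes[sandbox][name] = "stopped"
        self.calls.append(f"stop container {name}")

    def remove_container(self, sandbox, name):
        self._expect(sandbox, name, "stopped")
        del self.sandboxes[sandbox][name]
        self.calls.append(f"remove container {name}")

    def delete_sandbox(self, sandbox):
        if self.sandboxes[sandbox]:
            raise RuntimeError("sandbox still has containers")
        del self.sandboxes[sandbox]
        self.calls.append(f"delete sandbox {sandbox}")

    def _expect(self, sandbox, name, state):
        actual = self.sandboxes[sandbox].get(name)
        if actual != state:
            raise RuntimeError(f"{name} is {actual}, expected {state}")


rt = ToyRuntime()
rt.create_sandbox("Foo")
rt.create_container("Foo", "C")
rt.start_container("Foo", "C")
rt.stop_container("Foo", "C")
rt.remove_container("Foo", "C")
rt.delete_sandbox("Foo")
print(" --> ".join(rt.calls))
```

With a Linux container runtime the sandbox would be backed by the infra container; with HyperContainer it would be the hypervisor itself.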
Hypernetes
• Also: h8s
• Kubernetes + HyperContainer runtime
• officially supported by using kubernetes/frakti
• Multi-tenant network and persistent volumes
• battle tested Neutron + Cinder plugin
Multi-tenant Network
• Goal:
• leveraging tenant-aware neutron network for Kubernetes
• following the network plugin workflow
• Non-goal:
• break k8s network model or hack k8s code
Define the Network
• Network
• a top-level API object
• each tenant (created by Keystone) has its own Network
• a Network maps to a Neutron “net”
• a Network Controller is responsible for managing the Network lifecycle
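A minimal sketch of what "managing the Network lifecycle" could look like as a control loop: compare the Network objects that should exist with the Neutron networks that do exist, and derive the create and delete calls. The function name and the network names are hypothetical, and plain lists stand in for the API server and for Neutron; no real Neutron client call is shown.

```python
def plan_network_sync(desired, real):
    """Return which networks to create and delete so that real matches desired."""
    desired, real = set(desired), set(real)
    return {
        "create": sorted(desired - real),  # wanted but missing in Neutron
        "delete": sorted(real - desired),  # present in Neutron but no longer wanted
    }


plan = plan_network_sync(
    desired=["net-tenant-a", "net-tenant-b"],   # Network API objects
    real=["net-tenant-b", "net-tenant-old"],    # networks found in Neutron
)
print(plan)  # {'create': ['net-tenant-a'], 'delete': ['net-tenant-old']}
```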
Example
[Figure: Kubernetes control flow. Components: api-server, etcd, scheduler, controller-manager (ControlLoop), kubelet (SyncLoop) and proxy on each node. API objects: network, pod, replica, namespace, service, job, deployment, volume, petset, … Desired World vs. Real World: call Neutron to create/delete the network.]
Kubernetes Network Model
• Container reaches container
• all containers can communicate with all other containers without NAT
• Node reaches container
• all nodes can communicate with all containers (and vice versa) without NAT
• IP addressing
• a Pod in the cluster can be addressed by its IP
How does h8s fit that?
• A Network can be assigned to one or more Namespaces
• Pods belonging to the same Network can reach each other directly through IP
• a Pod’s network maps to a Neutron “port”
• kubelet network plugin is responsible for Pod network setup
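The Namespace-to-Network mapping above can be illustrated with a small lookup. The namespace and network names are made up, and the helper only answers "are these two Pods on the same Network?"; what happens across Networks is left to Neutron and is not modelled here.

```python
# Hypothetical assignment: one Network may back several Namespaces.
namespace_to_network = {
    "team-a-dev": "net-a",
    "team-a-prod": "net-a",
    "team-b": "net-b",
}


def same_network(namespace_x, namespace_y):
    """True when Pods in the two Namespaces belong to the same Network."""
    return namespace_to_network[namespace_x] == namespace_to_network[namespace_y]


print(same_network("team-a-dev", "team-a-prod"))  # True
print(same_network("team-a-dev", "team-b"))       # False
```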
Example
[Figure: same components (api-server, etcd, scheduler, kubelet SyncLoop, proxy). Step 3.1: new pod object detected. Step 3.2: bind pod with node.]
Example
[Figure: same components (api-server, etcd, scheduler, kubelet SyncLoop, proxy). Step 4.1: detected pod bound to me. Step 4.2: start containers in pod.]
Design of kubelet
[Figure: kubelet components. Choose Runtime (docker, rkt, hyper/remote), InitNetworkPlugin, SyncLoop, HandlePods{Add, Update, Remove, Delete, …}, PodUpdate, PLEG, NodeStatus, Network Status, status Manager, volume Manager, image Manager.]
Pod Update Worker (e.g. ADD)
• generate Pod status
• check volume status (discussed later)
• call runtime to start containers
• set up Pod network (see next slide)
kubestack
A standalone gRPC daemon
1. to “translate” the SetUpPod request to the Neutron network API
2. to handle the multi-tenant Service proxy
Service
$ iptables-save | grep my-service
-A KUBE-SERVICES -d 10.0.0.116/32 -p tcp -m comment --comment "default/my-service: cluster IP" -m tcp --dport 8001 -j KUBE-SVC-KEAUNL7HVWWSEZA6
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-6XXFWO3KTRMPKCHZ
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-57KPRZ3JQVENLNBRZ
-A KUBE-SEP-6XXFWO3KTRMPKCHZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.2:80
-A KUBE-SEP-57KPRZ3JQVENLNBRZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.3:80
[Figure: portal 10.10.0.116:8001 --> random mode rules --> backend rule_1 (172.17.0.2:80) and backend rule_2 (172.17.0.3:80). Handlers: OnServiceUpdate, OnEndpointsUpdate.]
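The rule chain above (portal, random choice, DNAT to one backend) can be mimicked with a toy proxy. `ToyServiceProxy` is hypothetical: its two handler methods only borrow the names OnServiceUpdate and OnEndpointsUpdate from the slide, and a seeded `random.Random` stands in for the random-mode rules. The portal address is the one from the KUBE-SERVICES rule.

```python
import random


class ToyServiceProxy:
    """Toy model of portal -> random backend selection (not kube-proxy code)."""

    def __init__(self, seed=0):
        self.portals = {}   # service name -> "ip:port" portal
        self.backends = {}  # service name -> list of "ip:port" endpoints
        self._rng = random.Random(seed)

    def on_service_update(self, name, portal):
        self.portals[name] = portal
        self.backends.setdefault(name, [])

    def on_endpoints_update(self, name, endpoints):
        self.backends[name] = list(endpoints)

    def route(self, destination):
        """Return the backend a connection to `destination` is sent to."""
        for name, portal in self.portals.items():
            if portal == destination and self.backends[name]:
                return self._rng.choice(self.backends[name])
        return None  # not a known portal, or no endpoints yet


proxy = ToyServiceProxy()
proxy.on_service_update("default/my-service", "10.0.0.116:8001")
proxy.on_endpoints_update("default/my-service",
                          ["172.17.0.2:80", "172.17.0.3:80"])

picks = [proxy.route("10.0.0.116:8001") for _ in range(1000)]
print(sorted(set(picks)))  # ['172.17.0.2:80', '172.17.0.3:80']
```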
Multi-tenant Service
• Default iptables-based kube-proxy is not tenant aware
• Endpoint Pods and Nodes with iptables rules are isolated into different networks
• Hypernetes uses a built-in HAProxy as the Service portal
• to proxy all Service instances within same namespace
• the same OnServiceUpdate and OnEndpointsUpdate process
• ExternalProvider
• an OpenStack LB will be created as Service
• e.g. curl 58.215.33.98:8078
Kubernetes Persistent Volume
[Figure: Host with a host path; the Cinder volume plugin performs attach and mount; two Pods each see the volume at their mountPath; the VolumeManager reconciles toward the desired World.]
• Get mountedVolume from actualStateOfWorld
• Unmount volumes in mountedVolume but not in desiredStateOfWorld
• AttachVolume() if vol in desiredStateOfWorld and not attached
• MountVolume() if vol in desiredStateOfWorld and not in mountedVolume
• Verify devices that should be detached/unmounted are detached/unmounted
• Tips:
1. -v host:path
2. attach VS mount
3. Totally independent from container management
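The reconcile steps above can be sketched as a pure function over three sets. The function and volume names are illustrative and do not mirror the kubelet's actual types. The final verification step is left out, and detaching volumes that are no longer desired is not modelled.

```python
def reconcile(desired, attached, mounted):
    """Return the ordered actions that move the actual state toward desired."""
    actions = []
    # Unmount volumes that are mounted but no longer desired.
    for vol in sorted(mounted - desired):
        actions.append(("unmount", vol))
    # Attach volumes that are desired but not yet attached to the node.
    for vol in sorted(desired - attached):
        actions.append(("attach", vol))
    # Mount volumes that are desired but not yet mounted.
    for vol in sorted(desired - mounted):
        actions.append(("mount", vol))
    return actions


actions = reconcile(
    desired={"vol-a", "vol-b"},   # desiredStateOfWorld
    attached={"vol-b", "vol-c"},  # attached devices in actualStateOfWorld
    mounted={"vol-c"},            # mountedVolume in actualStateOfWorld
)
for step in actions:
    print(step)
```

Here vol-c is unmounted, vol-a is attached and then mounted, and vol-b (already attached) is only mounted. Attach and mount stay separate steps, which is the point of tip 2.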
Persistent Volume with HyperContainer
• Enhanced Cinder volume plugin
• Linux container:
1. full OpenStack cluster
2. query Nova to find node
3. attach Cinder volume to host path
4. bind mount host path to Pod containers
• HyperContainer:
• directly attach block devices to Pod
• thanks to the hypervisor based Pod boundary
• eliminates extra time to query Nova
[Figure: Host with a vol; the enhanced Cinder volume plugin attaches the vol; two Pods each see it at their mountPath; the VolumeManager reconciles toward the desired World.]
Future of CRI
• Keep Docker as the only default container runtime
• ocid, rktlet, hyperd
• Frakti: the Remote Container Runtime Kit
• https://github.com/kubernetes/frakti
• you are welcome to try it out, star and fork
“if image becomes non-standard”
• e.g. Docker image becomes somehow Docker specific
• Don’t worry, kubelet.imageManager is becoming runtime specific
• but then k8s will probably choose
• NO DEFAULT runtime
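Full Topology
[Figure: full topology. Nodes, each running kubelet, kube-proxy, kubestack, the Neutron L2 Agent and the Cinder Plugin, and hosting Pods. Master holding the API objects (Object: Network, Object: Pod, Object: …). Backing services: KeyStone, Neutron, Cinder, Ceph.]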
Summary
• A new way to build secure and multi-tenant Kubernetes
• Kubernetes + HyperContainer + Neutron Plugin + Cinder Plugin + Keystone
• Roadmap
• Graduate HyperContainer runtime on k8s upstream
• Neutron CNI plugin
• Project URL: https://github.com/hyperhq/hypernetes
• Tip: https://hyper.sh is totally built on Hypernetes, try it out :)