Container Deployment Foundation
Enterprise readiness white paper
Document Status: PUBLISHED
Document Version: 2.0
Date: July 2017
Contents

Intended Audience
Disclaimer – Forward-looking statements
Introduction
Software requirements
Changes from 2017.03 to 2017.06
ITOM Suites
Container standards and policy
    Why containers?
    What drives container adoption?
    Standards and policies
        Container OS Standard
        Docker and Engines Standard
        Docker Container Repository Standard
        Docker Store Delivery Standard
Container Deployment Foundation (CDF)
    Layered and containerized
    Architecture
    Portable infrastructure
        Maximize containerization
        Dual Docker Daemons
    Open source
Deployment
    CDF: physical, virtual and cloud
        Regular Linux hosts
        Internet access
        Layers
        Hybrid deployments
        Cloud support
    CDF configurations
    CDF Installation
        Packaging
        JDBC drivers
    Suite installation
        Deployment architecture
        Suite installation
Networking
    Docker
    CDF
        Ports
        CDF Connections map
        Overlay networking
        Closed clusters
        Pods and Services
        Ingress/egress
        Name resolution (DNS)
    ITOM Suites
Security
    Host firewall
    SELinux
        Host-level
        Docker-level
    Linux kernel process isolation features
    Container users and privileges
        Users
        Privileged containers
        User and privileged summary
    Docker configuration
    Certificates and secure communication
        Installation
        Certificate deployment configurations
        Certificate zones
        Certificate generation
        Use of customer certificates
    Secure configuration store and cluster-internal certificate generation
        Vault and Kubernetes-vault
    Kubernetes
        Security context for containers
    Storage security
        Node local directory/file access policies
        External NFS volume security
    Image security
        Image scanning
        Image signing
        Image signature checking
    CDF user management
Storage
    Overview
    Containers
    Docker storage
        The devicemapper storage driver
        Docker image cache
    Kubernetes storage
    Host storage
        Docker storage driver
        ETCD
        Other host storage
    Container Deployment Foundation storage
        CDF external storage
    Suites storage
        Suite external storage
        Performance
        High availability
High availability (HA)
    Overview
    HA summary
    Host services monitoring
    Docker runtime high availability
    Docker container high availability
    Kubernetes' own high availability
    Container Deployment Foundation lifecycle high availability
        Kubernetes basic restart policy
        Liveness and readiness probes
    Multiple component instances
    Container resource requirements and limits
        Request/Limit
        Kubelet pod eviction
    Storage high availability
        Node local storage for Kubernetes code and runtime data
        Persistent container storage
    Component replicated database
        Component external databases
    Multi-master and HA virtual IP
        Leader election for ETCD
        HA virtual IP
        Generating tokens and certificates for containers
    Embedded Postgres database high availability
    High availability through use of external databases
Updates
    Overview
        Experience and challenges
        Suite versions and image versions
        Update, upgrade, hotfix
        CDF vs Suite
    Upgrading the CDF
        Runtime and install time compatibility
    Patching/hot fixing the CDF
    Updating/upgrading the Suite
    Hot fixing the Suite
Databases
    Embedded databases
    External databases
Backup/restore and disaster recovery
    Backup and restore
        Backup
        Restore
    Disaster recovery
        Cattle instead of pets
        Cattle and CDF disaster recovery
        NFS volumes failure
        Embedded database failure
        External database failure
Multi-tenancy
    Kubernetes multi-tenancy isolation
    Technical multi-tenancy in the Container Deployment Foundation
    Additional namespace information
Elasticity
    What is elasticity?
    Manual vs automatic elasticity
    Suite installation scaling
    Suite capabilities scaling
    Kubernetes controller scaling
    Cluster node scaling
        Adding nodes
        Reconfiguring nodes
        Deleting nodes
Monitoring
    Overview
    Heapster
        Inside the CDF
        Additional CDF manageability
Licensing
    Single License File
    Redeem License
        Offline
        Online
    License Management
Supportability
    Logs
        Locations, commands
        Container logs in pods
        CDF core
        Suite
        Configuring log rotation
    Monitoring
    Support tool
        support-dump tool
        Tool output
        Collected data and intended use
    FAQ and known issues
Migration
Internationalization & Localization
Federation
Documentation
Send documentation feedback
Legal notices
    Warranty
    Restricted rights legend
    Copyright notice
    Trademark notices
    Documentation updates
    Support
Intended Audience

This information is intended for IT Operations Management administrators, solution architects and integrators, software administrators, virtual infrastructure administrators, and operations engineers deploying, operating, scaling, or building solutions around the HPE ITOM Suites portfolio.
Disclaimer – Forward-looking statements

This document may contain forward-looking statements regarding future operations, product development, product capabilities and availability dates. This information is subject to substantial uncertainties and is subject to change at any time without prior notification. Statements contained in this document concerning these matters only reflect Hewlett Packard Enterprise's predictions and/or expectations as of the date of this document, and actual results and future plans of Hewlett Packard Enterprise may differ significantly as a result of, among other things, changes in product strategy resulting from technological, internal corporate, market and other changes. This is not a commitment to deliver any material, code or functionality and should not be relied upon in making purchasing decisions.
Introduction

The HPE IT Operations Management Container Deployment Foundation (CDF) powers the deployment of the container-based software suites and significantly improves the customer's overall time to value.

The installation and deployment of ITOM software packages in an IT environment has traditionally been very complex and expensive to set up and operate, involving different IT teams, including infrastructure, network, storage, and virtualization groups, as well as additional resources such as professional services. Customers would also expect to spend additional time on post-deployment activities such as product-to-product integrations and licensing. Upgrades have also been slow to be adopted, with very large regression cycles gated by many environments and processes, delaying the delivery of updates into production.

By building a new and agile software delivery platform alongside modernized software, the Container Deployment Foundation allows customers to install pre-integrated, feature-rich software suites. Not only are Day-1 operations resolved immediately due to the nature of a container-based infrastructure; the same platform also allows for easier access, deployment, and operation of subsequent upgrades.

The distribution unit of the software delivery is container-based, leveraging the speed and format of the new containerized environment. By bundling a fully automated orchestration layer to bootstrap and manage the lifecycle of the many suite-related containers, we are able to standardize the deployment, upgrade and patching, scaling, and rollback of ITOM Suites while accelerating our software delivery with innovation and quality.
Software requirements

The following table shows the major components required to use this implementation.

| Software | Supported version | Recommended version |
| Container Deployment Foundation (CDF) | 2017.06 | 2017.06 |
Changes from 2017.03 to 2017.06

- "ITOM Suites"
  o Updated diagram "Figure 1 – IT Operations Management Suites".
- "Container Deployment Foundation (CDF)"
  o Updated section "Layered and containerized". Kubelet is now a host process.
  o Updated architecture diagrams in section "Architecture".
  o Updated section "Portable infrastructure". Kubelet is now a host process.
- "Deployment"
  o Added generic CDF AWS deployment blueprint in section "Cloud support".
  o Updated diagram in section "CDF configurations".
  o Updated section "CDF installation".
    § Removed production support restriction for UI-based worker node installation.
    § Added JDBC driver information.
  o Updated "Interactive installation" section with a new diagram showing multiple volume configuration.
- "Networking"
  o Added subnet / network detail in section "Overlay networking".
  o Added master and worker node inbound communication details.
  o Added CDF connections map.
  o Added Flannel host-gw configuration and data flows.
  o Updated section "Explicit ingress" to match the 2017.06 default installation.
  o Added local hostname resolution section "Customer DNS resolution is not available".
- "Security"
  o Updated section "Host firewall". Installation is now supported with the firewall up.
  o Updated section "User and privileged summary" to match the 2017.06 default installation. Added a new table for host processes.
  o Added new section "Certificate deployment configurations".
  o Added new section "Certificate zones".
  o Updated section "External NFS volume security" to reflect multiple volume support for Suite installation.
- "Storage"
  o Updated sizes in section "Overview".
  o Updated "CDF external storage" with mention of the new offline_sync_tools/ directory.
  o Updated "Suite external storage" to mention that multiple suite volumes are now supported.
- "High availability"
  o Updated HA level definitions.
  o Requalified achievable HA levels for the 2017.06 release.
  o Added mention of which services suffice with liveness probes and which services require multiple instances for HA support.
  o Updated table in section "HA summary" to match the 2017.06 default installation.
    § Liveness probes; multiple instances; external database support.
  o Added section "Host services monitoring".
  o Added comment about readiness probes in section "Liveness and readiness probes".
  o Added section "Kubelet pod eviction".
  o Updated references to external databases indicating support for external databases for foundational services.
- "Updates"
  o Refined definitions for updates, upgrades and hotfixes.
  o Described Suite update, upgrade and hotfix.
  o Added diagrams showing the UX/UI for Suite update/upgrade.
  o Added detail on the CDF upgrade process.
  o Added detail on run- and install-time compatibility for CDF upgrades.
- "Databases"
  o Indicated external database support for CDF foundational services.
  o Updated versions of the embedded database engines used.
- "Elasticity"
  o Added details about horizontal Pod auto-scaling.
- "Monitoring"
  o Updated the kubectl command used to get its logs.
ITOM Suites

HPE IT Operations Management delivers key software suites to protect, manage and grow your IT environments. We have just released a set of Suites powered by the Container Deployment Foundation, alongside a new delivery and deployment model, to accelerate time-to-value for our customers.

• IT Service Management Automation ---
• Operations Bridge ---
• Datacenter Automation ---
• Hybrid Cloud Management ---

Each suite embeds the same Container Deployment Foundation as part of its installation and deployment process, and uses it to continuously access and download updates and new software revisions.
Figure 1 – IT Operations Management Suites
We recommend you familiarize yourself with each suite's container-based version, including the benefits it is able to leverage from the Container Deployment Foundation. Each Suite offers a set of dedicated documentation on how it can best be deployed and operated in this new environment. Adoption of key services may vary from Suite to Suite.
Container standards and policy

Why containers?

The application landscape is changing in four main areas:
- Architecture: from "glued" applications to loosely coupled services
- Process: from slow changes to continuous updates
- Technology: from human-operated to machine-operated
- Sourcing: from single delivery to a hybrid delivery model

The adoption of containers is driven by the transformation of applications, the desire for higher productivity and agility, and the requirement for a SaaS-like experience.

"Write once, run anywhere" is an enticing proposition for any software developer and opens up new ways of delivering software with a SaaS-like experience.
What drives container adoption?

Standards and policies

In the process of containerizing products, HPE is defining usage standards in the following areas: container OS, images and engines, repositories, and delivery.

Container OS Standard

openSUSE is our container OS base layer due to its relatively small size, support by a commercial entity, alignment with our host OS strategy, and the fact that it is under the same Micro Focus entity as HPE Software.

Docker and Engines Standard

A consistent set of best practices for building a lean, high-quality container image has been defined, along with a support policy for Docker engine versions, both open source and Enterprise Edition.

Docker Container Repository Standard

All container images are stored in Docker Hub as the primary repository for internal and external usage. A local Registry2 cache is configured.
Docker Store Delivery Standard

For software products that are delivered through the new Docker Store, a set of standards and best practices outlines how to prepare for container publishing in aspects such as vulnerability scanning and cleaning, image structure best practices, etc.
Container Deployment Foundation (CDF)

Layered and containerized

The CDF provides a common runtime platform for containerized ITOM Suites. Like the Suites, the CDF is almost completely containerized. The CDF uses a layered approach, with host, Docker, Kubernetes, CDF core and Suite as the layers.
The following diagram shows the layers:
Figure 2 - CDF layers
The following table shows the detailed layer components:

| Layer | Component | Containerized? | Purpose | Maintained by | Supported by |
| Hosts | Host OS | No | Base OS; supported Linux distro | Depends on chosen Linux distro | Customer |
| System | Docker | No | Container runtime | Docker | HPE |
| System | Etcd | Yes | Configuration database for Vault and Kubernetes | CoreOS | HPE |
| System | Flannel | Yes | Network overlay for container-to-container communication | CoreOS | HPE |
| System | Vault | Yes | Secure configuration data store; certificate generation | HashiCorp | HPE |
| Kubernetes | Kubelet | No | Primary node agent for Kubernetes | Google | HPE |
| Kubernetes | Kubernetes | Yes | Container orchestration | Google | HPE |
| CDF Core | Kubernetes-vault | Yes | Vault token generation & distribution; certificate generation | Boostport.com | HPE |
| CDF Core | Heapster | Yes | Monitoring | Google | HPE |
| CDF Core | All HPE code | Yes | CDF core | HPE | HPE |
| Suite | All HPE Suite code | Yes | Suite | HPE | HPE |
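A quick way to see this split on a running node is to compare the host processes with the containers running under each Docker instance. This is a minimal sketch using standard tooling; the exact pod names and namespaces vary by release:

# Kubelet and the two Docker daemons run as host processes
ps -ef | grep -E 'kubelet|dockerd'

# ETCD, Flannel and Vault run under the bootstrap Docker instance
docker -H unix:///var/run/docker-bootstrap.sock ps

# Kubernetes, the CDF core and the Suite run as pods under the workload Docker instance
kubectl get pods --all-namespaces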
Architecture
The following functional layer diagram shows the three main parts:
• System: ETCD, Vault, Flannel
• Orchestration: Kubernetes
• CDF core: hybrid of HPE components and open source
Figure 3 - Architecture block diagram
The components that make up the layers can run on the same host if the master and worker node are one and the same. However, multi-node configurations will typically be employed, and in such cases not all components run on all nodes.
The following diagram shows the component distribution across the master and worker node types:
Figure 4 - Components by node type
Portable infrastructure

Maximize containerization

Almost all of the CDF components (except Docker and the Kubernetes Kubelet) run containerized. This increases portability because the touch points where direct compatibility with the underlying OS is required are minimized.
The CDF employs two instances of Docker per node:
- Bootstrap Docker
o This Docker instance runs ETCD, Flannel and Vault
- Workload or platform Docker
o This Docker instance runs Kubernetes, the CDF core and the Suite.
The components that run inside the bootstrap Docker would typically run directly on the Linux hosts in a less portable infrastructure. In the case of the CDF, everything is containerized and runs on Docker.

Note: The exception is the Kubelet. The Kubelet options detailed on https://kubernetes.io/docs/admin/kubelet/ describe a containerized Kubelet as an experimental feature.
Dual Docker Daemons

At installation time, two Docker instances are configured and started. Run: ps -ef | grep dockerd

root 7173 1 0 Apr10 ? 00:28:06 dockerd -H unix:///var/run/docker-bootstrap.sock --exec-root=/var/run/docker-bootstrap -g <CDF root>/data/docker-bootstrap -p /var/run/docker-bootstrap.pid
root 8916 1 3 Apr10 ? 12:21:34 dockerd --bip=172.77.33.1/24 --mtu=1500 -H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock -g <CDF root>/data/docker
To access the bootstrap Docker instance, specify its socket with the -H option. Run:
docker -H unix:///var/run/docker-bootstrap.sock ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4b9da745280b gcr.io/google_containers/flannel-amd64:0.5.5 "/opt/bin/flanneld --" 2 weeks ago Up 2 weeks kube_flannel
1786a3598cc7 localhost:5000/vault:0.6.3 "/init" 2 weeks ago Up 2 weeks vault_container
b46ba8a5d202 gcr.io/google_containers/etcd-amd64:2.2.1 "/usr/local/bin/etcd " 2 weeks ago Up 2 weeks etcd_container
To run Docker commands against the workload Docker, the -H parameter can be omitted.
Figure 5 - Docker multi-node architecture
The Docker instances are installed and can be controlled as regular system services.
# systemctl | grep -E docker.*service
docker-bootstrap.service   loaded active running   Docker Application Container Engine
docker.service             loaded active running   Docker Application Container Engine
See also: https://github.com/kubernetes/community/wiki/Roadmap:-Cluster-Deployment, portable deployments.
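Because both daemons are registered as regular systemd services, they can be checked and managed with standard systemctl commands; a minimal sketch:

# check the state of both Docker daemons
systemctl status docker-bootstrap.service docker.service

# restart the workload Docker daemon if required
systemctl restart docker.service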
Open source
HPE uses open source components wherever possible.
The open source components come bundled with the CDF installation.
Support and maintenance of the open source components is HPE’s responsibility.
The following table lists the main open source components that make up the CDF:
| Use | Maintainer link |
| Container runtime | Docker: https://www.docker.com/ |
| Container orchestration | Kubernetes: https://kubernetes.io/ |
| Secure configuration store / certificates | Vault: https://github.com/hashicorp/vault |
| Distributed configuration database | Etcd: https://github.com/coreos/etcd |
| Network overlay | Flannel: https://github.com/coreos/flannel |
| Monitoring and metrics | Heapster: https://github.com/kubernetes/heapster |
| Token and certificate generation/maintenance | Kubernetes-vault: https://github.com/Boostport/kubernetes-vault |
Deployment

CDF: physical, virtual and cloud

Regular Linux hosts
The CDF is installed on a regular Linux OS host. No special configuration except a number of prerequisite packages
is required. Refer to the Suite installation documentation for the package prerequisites.
HPE uses minimal installations of the supported Linux distributions. Refer to the platform matrix for the supported
Linux distributions.
Internet access

There is no requirement for Internet access for the servers that will host the CDF infrastructure. However, if the first master server has Internet access (direct or via proxy), this improves the installation experience, because the Suite containers can be downloaded directly on the node from which they will be imported into the local Docker registry.
Layers
The only components that execute directly on the host are the installer, Docker and Kubernetes Kubelet. All of the
other CDF components are containerized. See “Understanding the Container Deployment Foundation” for details.
The above applies to physical servers, virtual servers or cloud-based servers.
Hybrid deployments
A hybrid deployment where a CDF cluster spans both physical and virtual servers simultaneously is supported.
A hybrid deployment where a CDF cluster spans physical, virtual and cloud-based nodes is not supported.
Cloud support
The CDF and Suite deployments are supported on AWS. Refer to the AWS Deployment Guide for specific details
relating to a particular Suite.
A generic blueprint of a CDF / Suite deployment on AWS is shown here:
Figure 6 - AWS generic deployment blueprint
CDF configurations
The CDF can be deployed in various configurations for various purposes: test, demo, proof of concept, single-node installation, multi-node, multi-node for production, multi-master, and multi-master for production.

The diagram below shows the four basic configurations and their typical use cases. For production use, the two rightmost configurations can be used, although only the rightmost one, multiple masters with multiple workers, provides the highest level of high availability (HA). For a discussion of HA, see the dedicated chapter.
Figure 7 - CDF deployment configurations
CDF Installation

Packaging
The CDF code is located on the HPE Software Depot. It comes as a single compressed and signed archive.
The steps to install the CDF are:
- Prepare systems to act as cluster nodes.
- Download the CDF compressed archive and uncompress.
- Configure the installation for master and worker nodes using the install.properties file.
- Copy the installation files to the nodes.
- Execute the installation of the CDF.
Refer to the Suite installation/administration guide for complete package prerequisites.
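As a rough illustration only, these steps map to a command-line flow like the following sketch; the archive name and installer entry point are placeholders that differ per release, so consult the installation guide for the actual commands:

# download the CDF archive from the HPE Software Depot, then unpack it (file name is a placeholder)
unzip cdf-2017.06.zip -d /opt/cdf-install
cd /opt/cdf-install

# configure master and worker node settings
vi install.properties

# copy the installation files to every node (for example with scp), then run the installer on each node
./install          # installer entry point; actual script name may differ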
The CDF and/or Suite installations can be performed using a command-line script and/or a UI, as shown in the table below.

| Component | Command line-based install | UI-based install |
| CDF Master | Yes | No |
| CDF Worker | Yes | Yes |
| Suite | No | Yes |
JDBC drivers

License restrictions prevent HPE from redistributing the JDBC drivers for the Oracle database. These drivers are required if customers want to deploy the CDF using Oracle as the external database engine.

Before the installation of the CDF, the JDBC drivers can be placed under <CDF install sources>/tools/drivers/jdbc as a JAR archive. They can then be injected and used by the deployment.
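For example, dropping the Oracle driver into place before running the installer could look like the following; the JAR file name is illustrative and should match your Oracle release:

# copy the Oracle JDBC driver into the CDF installation sources
cp ojdbc7.jar "<CDF install sources>/tools/drivers/jdbc/"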
Suite installation

Deployment architecture

A next-generation Suite deployment is fully containerized. (*)

Suite functionality is split into capabilities, which themselves are split into front-end and back-end services. A Suite deployment always also has installation, configuration and administration services.

The actual services (front-end or back-end) are modeled using Kubernetes controllers such as Deployments or Replication Controllers. These deploy a number of Pods (1 to n; the Pod is the unit of scheduling in Kubernetes), which in turn consist of 1 to n containers. Each container typically executes a single process, such as a web server or a database engine.
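On a running deployment these layers can be inspected with standard kubectl commands; a minimal sketch, with the namespace and pod names as placeholders:

# controllers (Deployments / Replication Controllers) in a Suite namespace
kubectl get deployments,rc -n <suite-namespace>

# the Pods they create, and the containers inside one of them
kubectl get pods -n <suite-namespace>
kubectl describe pod <pod-name> -n <suite-namespace>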
A simplified architecture may look like this:
Figure 8 - Simplified application architecture
A sample deployment of a Suite is shown below.
It shows how public endpoints are defined using Ingress rules and exposed via the Ingress controller and the
resulting unified URL namespace that this creates. The Suite functionality is grouped into capabilities, implemented
as front- and backend services.
Figure 9 - Sample application deployment
(*) During the transition phase from classic deployment to containerized deployment it is likely that some parts of the
Suite installation are still classic installations. This hybrid mode will be addressed over time as the Suite components
are re-architected so they can be deployed and run as containers.
Suite installation

Download and installation preparation
All of the Suite code is hosted on Docker Hub. The container image repositories are private, and all of the images are signed.

Authentication is provided by Docker Hub, and access requires a Docker Hub account. HPE does not create Docker Hub accounts; it only authorizes them. Once a Docker Hub account has been created, and with a valid agreement in place with HPE for Suite access, HPE will authorize access to one or more Suite container image repositories. A typical suite may require anywhere from 10 to 50 container image repositories.

HPE container images will only work within the CDF and within a particular Suite deployment. Downloading and executing individual containers based on the Suite repository images is not supported.
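Once an account has been authorized, pulling Suite images from a node with Internet access follows the standard Docker workflow; a minimal sketch, with the repository name as a placeholder:

# authenticate against Docker Hub with the authorized account
docker login

# pull a Suite image from its private repository
docker pull <organization>/<suite-image>:<tag>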
The following diagram shows the supported offline mode installation steps/flow:
Figure 10 - Offline installation flow
Interactive installation
Once the CDF is installed, Suites can be installed using the Suite Installer.
The installation user experience (UX) is similar for all Suites and consists of the following steps:
- Review and accept the License Agreement and (optionally) allow HPE to collect suite and product usage
data.
- Select the Suite and a version
- Select the Suite edition and capabilities
- Define the parameters for the container storage (database, configuration, etc.)
o In this release the backend storage technology is limited to NFS.
- Configure the Suite capabilities
Figure 11 - Suite installation experience
After the aforementioned steps, the installation of the Suite will occur. Depending on the Suite, the installation progress can be monitored from the Suite Installer or from the Management Portal in the RESOURCES | Workloads | Deployments (and/or Pods) sections.

Under RESOURCES, select the correct namespace, which can be viewed via SUITE | Management, column "Namespace". A namespace in Kubernetes allows the creation of virtual clusters on the same physical cluster. Its primary use in the CDF is to separate core CDF services from Suite services. See also the chapter "Multi-tenancy".
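The same information is available from the command line on a master node; a minimal sketch, with the Suite namespace as a placeholder:

# list all namespaces in the cluster
kubectl get namespaces

# watch the Suite pods come up in the Suite's namespace
kubectl get pods -n <suite-namespace> -w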
CLI-based
In this release it is not possible to automate the Suite installation via scripting.
Networking

Networking in the CDF is based on Docker and Kubernetes networking concepts and implementations, so this chapter first provides some background on both.
Docker
See: https://docs.docker.com/engine/userguide/networking/
In the default Docker networking model, every container has a single network interface, eth0, which is mapped into the container from a virtual interface (veth) that Docker creates at the host level for every container. These virtual interfaces are connected to a bridge-type interface, docker0, and a subnet from which the containers receive an IP address.
Figure 12 - Docker host/container networking
This model allows each container on a Docker host to talk to every other container on the same host, but not to a container running on another host. Allowing cross-host communication requires setting up port forwarding and/or proxying. Cross-host communication is required in a Kubernetes cluster, which almost always consists of multiple Docker hosts.
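The default bridge network and the addresses handed to containers can be inspected directly on a Docker host; a minimal sketch:

# list the Docker networks and inspect the default bridge (subnet, attached containers)
docker network ls
docker network inspect bridge

# show the docker0 bridge interface and its subnet
ip addr show docker0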
CDF

Ports

Master node inbound

| Protocol | Port Range | Source | Purpose |
| TCP | 2380 | Master nodes | Etcd peer communication |
| TCP | 4001 | Master and worker nodes | Etcd client communication |
| TCP | 5443 | External | Access to CDF management portal |
| TCP | 8200 | Master and worker nodes | Vault client communication |
| TCP | 8201 | Master nodes | Vault peer communication |
| TCP | 8443 | Master and worker nodes | Communication with Kubernetes API server |
| TCP | 10250 | Master nodes | Communication with node Kubelet |
| TCP | 10255 | External | Kubelet read-only metrics |
| Direct layer 2 | - | Master and worker nodes | Cross-node container-to-container traffic for containers that do not share the host IP |
Worker node inbound

| Protocol | Port Range | Source | Purpose |
| TCP | 10250 | Master nodes | Communication with node Kubelet |
| TCP | 10255 | External | Kubelet read-only metrics |
| Direct layer 2 | - | Master and worker nodes | Cross-node container-to-container traffic for containers that do not share the host IP |
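Where the host firewall is kept up, these tables translate into host firewall rules. The following is a minimal sketch for a master node assuming firewalld; the port list is abbreviated and the authoritative rules are in the Suite installation documentation:

# open the Etcd, management portal, Vault, Kubernetes API and Kubelet ports
firewall-cmd --permanent --add-port=2380/tcp --add-port=4001/tcp --add-port=5443/tcp
firewall-cmd --permanent --add-port=8200/tcp --add-port=8201/tcp --add-port=8443/tcp --add-port=10250/tcp
firewall-cmd --reload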
CDF Connections map
Overlay networking
See: https://kubernetes.io/docs/concepts/cluster-administration/networking/
Setting up port forwarding across multiple containers, hosts and users is not a viable model as the number of
containers, hosts and users rises.
Therefore Kubernetes takes a different approach: overlay networking.
Overlay networking exists almost since computer networks were invented. It is essentially a network built on top of
another network. See: https://en.wikipedia.org/wiki/Overlay_network
Kubernetes by itself will not work if installed across multiple Docker hosts without the addition of an overlay
network.
The overlay network provides the ability for each container on every host to talk to every other container on every
other node.
No network address translation (NAT) is required. The Kubernetes documentation words this nicely as three items:
- all containers can communicate with all other containers without NAT
- all nodes can communicate with all containers (and vice-versa) without NAT
- the IP that a container sees itself as is the same IP that others see it as
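This flat Pod network can be observed directly; a minimal sketch, assuming kubectl access to the cluster:
kubectl get pods -o wide --all-namespaces    # the IP and NODE columns show the cluster-wide unique Pod IPs and the nodes they run on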
Flannel
When the CDF installs and configures Kubernetes, it configures an overlay network. In the 2017.06 release, the CDF
supports the Flannel overlay network option only.
Every host runs its own flannel daemon, which creates a subnet allocated from a preset range: 172.77.0.0/16.
For Docker / Kubernetes, Flannel provides each node with its own subnet from which IP addresses are then handed
out to the containers/Pods using internal DHCP.
The subnet that Flannel generates is unique per cluster node. This guarantees that all Pods get a cluster-wide unique
IP address. The uniqueness is guaranteed by Flannel using Etcd to store the subnet definitions.
The Flannel subnet is then passed on to the Workload Docker daemon using the --bip option. This happens at install
time.
The Flannel generated and assigned subnet does not change for a node during the node lifecycle. It persists across
reboots.
Host-gw configuration and data flow
Figure 13 - Flannel host-gw configuration and data flow
Note: the Bootstrap Docker daemon does not use the Flannel generated subnet. This daemon’s container subnet is a
separate Docker self-generated subnet. However, it is not actively used by any bootstrap containers. All bootstrap
containers running on the Bootstrap Docker instance share the host IP through the use of the --net=host switch
when started.
Note 2: In a cloud setup on AWS the Flannel backend is set to aws-vpc. See section “Cloud support” for AWS detail.
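The containers running on the two Docker instances can be listed by pointing the docker client at the respective socket; a sketch, assuming the default socket locations:
docker -H unix:///var/run/docker-bootstrap.sock ps    # Bootstrap Docker: Etcd, Vault, Flannel, keepalived, all started with --net=host
docker ps                                             # Workload Docker: Kubernetes-managed CDF and Suite containers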
Configuring the Docker network
Options from Flannel for Workload Docker:
# cat /run/flannel/docker
DOCKER_OPT_BIP="--bip=172.77.35.1/24"
DOCKER_OPT_MTU="--mtu=1500"
DOCKER_OPTS="-H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock -g /opt/kubernetes/data/docker --bip=172.77.35.1/24 --mtu=1500 "
Subnet passed to Workload Docker daemon:
# ps -ef | grep "\--bip"
root 5938 1 8 17:46 ? 00:04:50 dockerd --bip=172.77.35.1/24 --mtu=1500 -H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock -g /opt/kubernetes/data/docker --log-driver=json-file --log-opt labels=io.kubernetes.container.name,io.kubernetes.pod.uid --log-opt max-size=10m --log-opt max-file=5
The Workload Docker daemon uses this network to configure the docker0 bridge on the node:
[root@myd-vm01841 scripts]# ifconfig docker0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.77.35.1 netmask 255.255.255.0 broadcast 0.0.0.0
        inet6 fe80::42:4aff:fe06:8ceb prefixlen 64 scopeid 0x20<link>
        ether 02:42:4a:06:8c:eb txqueuelen 0 (Ethernet)
        RX packets 37570 bytes 5998282 (5.7 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 40072 bytes 33943594 (32.3 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The Workload Docker daemon assigns DHCP IP addresses for the containers on this node from this network.
Kubernetes records this container IP address and stores the information in Etcd and decorates the pause container
with the network information.
For the example node above, non-Kubernetes, non-cluster services get IP addresses assigned from
172.77.35.1/24. This subnet will be different on every node.
There are two exception cases to this rule.
Case 1: containers share the host IP address:
- Containers started directly on Bootstrap Docker with --net=host: Etcd, Vault, Flannel, keepalived
- Kubernetes Pods started on Workload Docker with “hostNetwork: true”: Kubernetes API server, Kubernetes
controller manager, Kube Proxy, Kube Scheduler
Case 2: cluster services (Kind:Service): assigned from a network defined when the Kubernetes API server is started.
This does not overlap with any Pod IPs.
# kubectl describe svc mng-portal -n core
Name:             mng-portal
Namespace:        core
Labels:           app=mng-portal
Annotations:      <none>
Selector:         app=mng-portal
Type:             ClusterIP
IP:               172.78.78.118     <-- Cluster service IP address
Port:             <unset> 80/TCP
Endpoints:        172.77.35.8:9090  <-- Pod IP address
Session Affinity: None
Events:           <none>
The cluster services IP address is specified on the Kubernetes API server, see <CDF_ROOT>/manifests/kube-
apiserver.yaml:
.....
  - command:
    - /hyperkube
    - apiserver
    # - --advertise-address=myd-vm01841.hpeswlab.net
    - --bind-address=0.0.0.0
    - --etcd-servers=https://myd-vm01841.hpeswlab.net:4001
    - --insecure-port=8080
    - --secure-port=8443
    # - --token-auth-file=/etc/kubernetes/ssl/token
    - --service-cluster-ip-range=172.78.78.0/24    <-- Cluster service IP address range
.....
All other CDF system/core containers have an IP address assigned by the Workload Docker daemon from the
network defined by Flannel.
The following diagram shows a practical example:
- The host node has a statically assigned IP address 16.59.63.32
- The Bootstrap Docker is only accessible via localhost and has a self-generated container subnet of
172.17.0.0/16. This subnet is unused because all the containers on the Bootstrap Docker share their IP
address with the host.
- The Workload Docker is only accessible via localhost and has a container subnet of 172.77.35.1/24. This
subnet has been generated by Flannel for this node.
- The three CDF system services that run on the Bootstrap Docker share the host’s IP address.
- Kubernetes Kubelet runs directly on the node and shares the host’s IP address.
- Various other Kubernetes components run containerized but share the host’s IP address. See (*).
- All other CDF and Suite components run containerized and get an IP address assigned in the
172.77.35.1/24 subnet.
Figure 14 - CDF networking
The Pod is decorated with the DHCP IP address in the local node subnet address range.
All containers that make up the Pod share the same IP address. Inside the Pod, containers can talk to each other’s
processes using “localhost”.
The decoration is stored on a container called “pause”, using the image gcr.io/google_containers/pause-amd64:3.0
(different CDF releases may use a newer or different image). These containers are only visible when using direct
“docker” commands.
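A way to see these pause containers on a node, as a sketch (the image tag may differ between releases):
docker ps | grep pause    # one pause container per Pod; it holds the Pod's network namespace and IP address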
For more detail on Flannel, see: https://github.com/coreos/flannel
Closed clusters
Kubernetes clusters are closed from a network perspective. All intra-container traffic stays inside the cluster. No
outside access to any internal cluster services is possible. All Ingress has to be declared explicitly: traffic from outside
the cluster is only possible if explicitly allowed.
Cross-node traffic is secured using TLS.
Note: refer to the Suite documentation for any exceptions.
Pods and Services
Pods in Kubernetes are ephemeral. They can be created and destroyed dynamically. When destroyed, they will not
have the same network identity when recreated.
For this reason, Services were defined as a logical group of Pods. Services are long lived. Services load balance
incoming traffic across the set of Pods that they group.
A service represents a micro-service, such as a web server or a database engine. Services can be discovered using
environment variables (every container gets injected with all of the services in its namespace) or DNS (services
get a fixed, long-lived DNS name and can be addressed this way).
Inside the cluster containers can talk to micro-services exposed via Services without having to know where the
service runs, how many replicas serve it or what its IP address is.
Cluster services receive an IP address from the cluster service IP range. This is preset to 172.78.78.0/24. All cluster
services will receive an IP address allocated in this range. (This range is a configuration parameter of the Kubernetes
API server; see https://kubernetes.io/docs/admin/kube-apiserver/, --service-cluster-ip-range parameter.)
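To illustrate both discovery mechanisms, a minimal sketch; the Pod name my-pod, the namespace suite1 and the service svc1 are hypothetical, and the image is assumed to contain nslookup:
kubectl exec -n suite1 my-pod -- env | grep _SERVICE_     # injected service environment variables
kubectl exec -n suite1 my-pod -- nslookup svc1            # DNS-based discovery; resolves to the service cluster IP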
Ingress/egress
Ingress
To access a service from an external network requires adding some additional information to expose it via a port on
all nodes or via a load balancer.
In the CDF, services are of type ClusterIP, so by default they are not visible from outside the cluster. They may or may
not be grouped under an Ingress controller to provide a unified URL namespace and access to the cluster services.
For example, when the CDF is installed, the following services are installed. Run: kubectl get svc --all-namespaces
NAMESPACE   NAME                  CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
core        heapster-apiserver    172.78.78.87    <none>        80/TCP,443/TCP   14d
core        idm-postgresql-svc    172.78.78.234   <none>        5432/TCP         14d
core        idm-svc               172.78.78.253   <none>        443/TCP          14d
core        kube-dns              172.78.78.78    <none>        53/UDP,53/TCP    14d
core        kube-registry         172.78.78.19    <none>        5000/TCP         14d
core        kubernetes-vault      None            <none>        80/TCP           14d
core        mng-portal            172.78.78.92    <none>        80/TCP           14d
core        suite-db-svc          172.78.78.16    <none>        5432/TCP         14d
core        suite-installer-svc   172.78.78.143   <none>        8080/TCP         14d
default     kubernetes            172.78.78.1     <none>        443/TCP          14d
Some of these services will ultimately be externally facing; others will never be exposed.
By default, none of the services are exposed externally. The default Kubernetes service type is ClusterIP which
keeps the services externally invisible but internally fully visible.
To allow external access for CDF services, an Ingress definition processed by an Ingress controller is needed, or the
service must be redefined as type nodePort.
For example, to access the CDF Management Portal, an Ingress definition is created. Run: kubectl describe ing mng-portal -n core
Name:            mng-portal
Namespace:       core
Address:         16.59.63.32
Default backend: default-http-backend:80 (<none>)
TLS:
  nginx-default-secret terminates myd-vm01841.hpeswlab.net
Rules:
  Host                      Path         Backends
  ----                      ----         --------
  myd-vm01841.hpeswlab.net  /            mng-portal:80 (<none>)
                            /mngPortal   mng-portal:80 (<none>)
Annotations:
  rewrite-target:  /
  secure-backends: true
- Both the / and /mngPortal URLs will redirect traffic to the mng-portal service port 80.
- TLS termination happens at the Ingress layer.
The mng-portal service is defined as follows. Run: kubectl describe svc mng-portal -n core
Name:             mng-portal
Namespace:        core
Labels:           app=mng-portal
Selector:         app=mng-portal
Type:             ClusterIP
IP:               172.78.78.92
Port:             <unset> 80/TCP
Endpoints:        172.77.33.8:9090
Session Affinity: None
- The service itself is of type ClusterIP so only visible externally when explicitly described by an Ingress rule.
- The cluster internal IP is 172.78.78.92.
- The actual Management Portal service is running inside a Pod with IP address 172.77.33.8 on port 9090.
Ingress for Suites is similar to CDF Ingress.
Note: some Suites may not use Ingress rules and Ingress controller but rather expose their functionality via
nodePort-type services or a combination of both.
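Purely as an illustration of the nodePort alternative (Suites ship their own service definitions), a ClusterIP service could be switched to NodePort as follows; the service name my-svc and namespace suite1 are hypothetical:
kubectl patch svc my-svc -n suite1 -p '{"spec":{"type":"NodePort"}}'
kubectl get svc my-svc -n suite1      # the PORT(S) column now also shows the allocated node port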
Egress
All containers can talk to the outside world. They are only limited by firewall and network rules at the host and host
network(s) level.
Name resolution (DNS)
Customer DNS resolution is available
Inside the cluster, a container resolves DNS names as follows:
"<namespace>.svc.cluster.local", "svc.cluster.local", "cluster.local", "<search in host /etc/resolv.conf>"
So if a container running in Kubernetes namespace “suite1” wants to talk to service svc1, then it will try the following
path:
"svc1.suite1.svc.cluster.local", "svc1.svc.cluster.local", "svc1.cluster.local", svc1.<prefix(es)> from the search statement in the host /etc/resolv.conf
Customer DNS resolution is not available
All of the internal Kubernetes resolution will be unaffected. Hostnames outside of the cluster (including the host
nodes) will have to be statically specified.
For this purpose, customers can specify a fixed hostname-to-IP list on the CDF core NFS volume in the baseinfra-1.0/kube-dns-hosts directory. See the Administration Guide for details.
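A sketch of such a static entry file, assuming a hosts-style format (one "IP hostname" pair per line); consult the Administration Guide for the exact file name and format:
# <CDF NFS root>/baseinfra-1.0/kube-dns-hosts/<file>   (path and entries are illustrative)
10.10.1.15 dbserver01.example.net
10.10.1.16 ldap01.example.net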
ITOM Suites
Suite functionality can be accessed via defined Ingress rules, directly via services exposed using nodePort, or both.
Suites may use:
- “clusterIP” services exposed via Ingress
- “nodePort” services available on a particular port on every cluster node
- A hybrid of both approaches.
Refer to the Suite installation/administration guides for details.
Figure 15 - Cluster ingress
Refer to the Suite installation/administration documentation for details on accessible end points.
Security
The CDF and Suite installation leverage various techniques for tight security control:
- Host firewall
- Host SE Linux
- Linux kernel process isolation features
- Container users and privileges
- Host directory / file permissions
- Docker configuration
- Kubernetes configuration
- Certificate-secured communication
- Dedicated secure configuration store and certificate generation: Hashicorp Vault
- CDF user management
Host firewall
The host firewall can be up and running during and after the CDF is installed.
A Suite can be installed with the host firewall running.
Security limitations in Kubernetes 1.4 require additional host firewall configuration. Consult the
installation/administration guide on how to configure the host firewall to restrict access to the Kubernetes cluster
nodes from external systems for particular services and ports.
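As an illustration only (the authoritative port list is in the installation/administration guide and in the CDF Ports tables above), allowing external access to the CDF management portal while keeping firewalld enabled could look like this:
firewall-cmd --permanent --add-port=5443/tcp
firewall-cmd --reload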
SELinux
Host-level
In this release, host-level SELinux can be in either “disabled” or “permissive” mode for installation and runtime of the
CDF.
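A sketch for checking and, if needed, switching SELinux to permissive mode on a cluster node:
getenforce                      # reports Enforcing, Permissive or Disabled
setenforce 0                    # switch the running system to permissive
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config   # make the setting persistent across reboots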
Docker-level
The CDF does not support reconfiguring the Docker daemon to enable SELinux support using the --selinux-enabled flag.
Linux kernel process isolation features
The basis of container security is provided by several Linux kernel isolation features:
- namespaces: http://man7.org/linux/man-pages/man7/namespaces.7.html
- cgroups: http://man7.org/linux/man-pages/man7/cgroups.7.html
- chroots: http://man7.org/linux/man-pages/man2/chroot.2.html
Docker uses the above Linux kernel features to provide secure and isolated execution of application processes inside
containers.
For details see: https://www.docker.com/sites/default/files/WP_IntrotoContainerSecurity_08.19.2016.pdf
The remainder of this chapter focuses on the specific CDF security configuration.
Container users and privileges
Users
Suite containers generally do not use the root user inside the container.
They use a regular user account and group inside the container, defined when the container image is built.
This non-privileged user is then used to start all the processes inside the container.
The example below shows a typical Suite container, with the output of the 'id' command and the running processes:
sh-4.3$ id
uid=495(bvd) gid=493(bvd) groups=493(bvd)
sh-4.3$ ps -ef
UID   PID  PPID  C STIME TTY  TIME     CMD
bvd     1     0  0 Apr12 ?    00:03:15 /bin/node /bvd/server.js control
bvd   180     0  0 11:42 ?    00:00:00 sh
bvd   197   180  0 11:45 ?    00:00:00 ps -ef
CDF containers generally do use the root user.
This is because:
- HPE uses off-the-shelf open source images and the container user is set as root
- Containers need Docker in Docker support (which also requires a privileged container)
- Containers need access to the host to change the host configuration (which also requires a privileged container)
The "root" user inside the container has far fewer privileges than a regular "root" user, except in the case of a
privileged container. See Privileged containers.
Privileged containers
Docker
Docker containers can run in privileged mode to support Docker in Docker and/or to have access to the host devices
and processes.
This is separate from whether the processes inside the container run as the root user.
Any CDF containers run directly on Docker can use either the --privileged or the --cap-add flags. For details see
https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities.
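A sketch of the two options on plain Docker; the image name is hypothetical:
docker run --cap-add=NET_ADMIN my-image        # grant a single additional capability (as done for Keepalived)
docker run --privileged my-image               # full privileged mode, including access to host devices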
Kubernetes
To allow Kubernetes to start containers in privileged mode via the Pod specification, the Kubelet needs to be
configured to allow this. In the CDF installation Kubelet is configured to allow starting privileged containers.
Note: this does not mean all containers run in privileged mode!
Pod specifications can use the “privileged:true” flag in the securityContext configuration to allow a Pod to start its
containers in privileged mode.
User and privileged summary
As the below table shows, privileged mode is the exception. The "Why privileged?" column explains the reason for
running one or all containers in a particular Pod in privileged mode.
CDF or Suite container | User | Privileged? | Why privileged?
Flannel | root | Y | Must be allowed to change host routing tables for packet routing of container traffic between cluster nodes.
Vault | root | N; uses cap-add |
ETCD | root | N |
Keepalived | root | N; uses cap-add | Switch HA IP address across participating HA master nodes.
Kubernetes API server load balancer | root | N |
Kubernetes API server | root | N |
Kubernetes controller | root | N |
Kubernetes DNS | root | N |
Kubernetes Proxy | root | Y | Required to change the iptables rules on the node hosts.
Kubernetes registry proxy | root | N |
Kubernetes local registry | root | N |
Kubernetes-vault | root | N |
Kubernetes scheduler | root | N |
Heapster | root | N |
CDF Autopass LMS and database | root | N |
CDF IDM and database | root | N |
CDF Management Portal | root | Y | Requires Docker in Docker support.
CDF Suite Installer and database | root | N |
CDF Nginx (Ingress) | root | N |
Suite container | Use of root user is the exception. | N (*) |
(*) The Suite security guide may detail any exceptions.
The following Kubernetes processes run non-containerized:
Process | Run as user
Kubernetes Kubelet | root
Docker daemon | root
Docker configuration
See: https://docs.docker.com/engine/security/security/
HPE reviewed these Docker security areas for CDF and containerized suites.
- Standard use of kernel isolation features and control groups.
o HPE does not require any special modifications of the host system or its kernel.
- Docker daemon attack surface
o HPE recommends restricting the access to the cluster nodes.
o The example of instrumenting Docker through a containerized web server does not apply to the CDF
or Suites.
o Docker daemons are only listening on localhost
§ See <CDF root>/cfg/docker and docker-bootstrap files.
- Linux kernel capabilities
o HPE is reducing the use of the “root” user as much as possible.
o The runtime use of privileged containers is limited. See Privileged containers.
o Providing access to additional capabilities for containers is restricted as much as possible. The
following components get additional capabilities:
§ Keepalived: NET_ADMIN
- Other kernel security features
o Support for “well-known systems”:
§ HPE has verified SELinux in the following modes: disabled, permissive. See also chapter
“SELinux”.
o HPE is not using the “User Namespaces” Docker feature.
Certificates and secure communication
Installation
At CDF and Suite installation time, certificates are generated for:
- the CDF internal communication
- the Suite internal communication and communication to the CDF (*)
(*) A Suite may decide not to use the internal certificate generation feature and instead generate its own self-signed
certificates or ask for customer-provided certificates.
Certificate deployment configurations
The following table describes the possible certificate deployment configurations.
Component | 2017.06 All internal CA | 2017.06 Ingress uses customer PKI issued certificate | 2017.06 Using ROOTCA/ROOTCAKEY from customer PKI | All certificates generated by external customer PKI
Docker | N/A | N/A | N/A | N/A
Etcd | OpenSSL from internal CA | OpenSSL from internal CA | OpenSSL using ROOTCA/ROOTCAKEY from customer PKI | Customer PKI issued certificate manually loaded
Vault | OpenSSL from internal CA | OpenSSL from internal CA | OpenSSL using ROOTCA/ROOTCAKEY from customer PKI | Customer PKI issued certificate manually loaded
Kubelet | OpenSSL from internal CA | OpenSSL from internal CA | OpenSSL using ROOTCA/ROOTCAKEY from customer PKI | Customer PKI issued certificate manually loaded
K8S API server | OpenSSL from internal CA | OpenSSL from internal CA | OpenSSL using ROOTCA/ROOTCAKEY from customer PKI | Customer PKI issued certificate manually loaded
Containers | Vault (from imported internal CA) | Vault (from imported internal CA) | Vault (from imported ROOTCA/ROOTCAKEY from customer PKI) | Vault CA not in use; customer provides a certificate for every container instance
Ingress | Vault (from imported internal CA) | Customer PKI issued certificate manually loaded | Customer PKI issued certificate manually loaded | Customer PKI issued certificate manually loaded
Recommendations | Default installation case 01|03|06. All self-signed certificates. | Recommended for production. All cluster-internal communication uses self-signed certificates. Customer-generated certificates loaded on the externally facing Nginx controllers. | NOT RECOMMENDED. Customers must supply an (intermediate) root CA and key. Trust will extend into the cluster. | UNSUPPORTED. HPE does not consider this configuration to be a technical possibility (or even realistic) in a dynamic container-based system.
The internal certificates are signed by (options are mutually exclusive):
- a self-generated root CA
- a customer-provided root CA or, better, an intermediate root CA (not recommended)
The Nginx (Ingress) can be decorated with a customer-generated certificate. The CDF does not generate that
certificate. The customer needs to use their PKI to generate and sign a certificate for the externally facing Nginx
services: CDF and suite. See “Use of customer certificates”.
Certificate zones
The following diagram shows the certificate zones in an out-of-the-box CDF installation:
The following diagram shows the certificate zones when customer PKI generated and signed certificates are loaded
on the externally facing Nginx controllers:
Certificate generation
The below diagram shows:
- Creation of self-generated root CA (uses openssl). The validity period is 10 years.
- Creation of cluster communication certificates for bootstrap Docker and base Kubernetes components (uses
openssl). The validity period is 1 year.
- Import of root CA into the Vault pki backend
- Generation of Vault access tokens for secure configuration data access for remaining CDF and Suite
containers (uses Vault). The validity is 30 minutes for an access token; inside the Pod the certificate is
renewed automatically to prevent expiry. See also sidecar container and Kubernetes-vault chapters.
- Generation of server certificates for server processes inside remaining CDF and Suite containers (uses
Vault). The validity period is 1 year.
Figure 16 - Token and certificate structure
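Conceptually, the self-generated root CA step is equivalent to the following openssl sketch; the file names and subject are illustrative, not the installer's actual paths:
openssl genrsa -out ca.key 4096
openssl req -x509 -new -key ca.key -sha256 -days 3650 -subj "/CN=CDF internal CA" -out ca.crt    # 10-year validity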
Use of customer certificates
For CDF Nginx (Ingress)
By default, the certificate used by the CDF Nginx (Ingress) to access CDF services on port 5443 is
generated from either the self-generated root CA or a customer-provided root CA.
If neither of the above options is acceptable, then a customer-provided certificate can be loaded. Refer to the Suite
installation/administration guide for details.
For Suite Nginx (Ingress)
Suite installations will install a separate Nginx (Ingress) instance that listens on port 443.
A customer-provided certificate for customer access can be loaded into this instance.
Refer to the Suite installation/administration documentation for details.
Secure configuration store and cluster-internal certificate generation
Vault and Kubernetes-vault
The CDF uses Hashicorp’s Vault solution to provide:
- secure storage of sensitive configuration data
- generation of cluster-internal certificates
HPE leverages the following core functionality:
- Generic secret Vault backends to store configuration data.
o The data is secured at rest and in-flight.
o Vault to component communication is secured using certificates.
- PKI Vault backends to generate cluster-internal certificates.
- Vault Approle authentication mechanism for services to access secure configuration data.
For ease of use, the Vault server init and unseal steps happen automatically. The init step happens at installation
time; the unseal step needs to occur whenever the server is started.
The Vault root key and unseal shards are stored in ETCD.
Note: Vault itself does not have an automatic unseal.
The CDF only supports automatic unlock of the Vault server. This has been added for ease of use. Restrict access to
ETCD so the Vault root key and unseal shards are not compromised.
See also: https://www.vaultproject.io/ ; https://github.com/Boostport/kubernetes-vault
Kubernetes
Security context for containers
The Kubernetes securityContext is used inside Pod specification to enable privileged mode.
securityContext:
privileged: true
Storage security
See also: Storage
Node local directory/file access policies
CDF installation
Node local storage can only be accessed by the root user.
Note: Root user access to cluster nodes must be strictly controlled and limited. With root permissions, users can
inadvertently create major issues. Root access should be limited to select administrative actions only.
The access permissions for <CDF root>/data have been set such that a regular user cannot view/change sensitive
information when browsing the <CDF root> installation directories. Execution permissions for Kubernetes and other
tools and processes have also been restricted.
Node local storage access by CDF containers
The following table lists links (mount) from CDF containers / components to local node storage, except for local node
storage below <CDF root>/data/docker and <CDF root>/data/docker-bootstrap.
Service / component | Path on host | Access from inside container | Use
Bootstrap: CoreOS ETCD | <CDF root>/ssl | RW | Cluster certificate access
Bootstrap: CoreOS ETCD | <CDF root>/data/etcd | RW | ETCD database data location
Bootstrap: CoreOS ETCD | <CDF root>/log/etcd | RW | ETCD logs
Bootstrap: Hashicorp Vault | <CDF root>/ssl | RW | Cluster certificate access
Bootstrap: CoreOS Flannel | <CDF root>/ssl | RW | Cluster certificate access
Bootstrap: CoreOS Flannel | /dev/net | RW | Flannel must be able to manage host subnets, distribute IP addresses to containers and map routes.
HPE Autopass license manager server | None | |
HPE Autopass Postgres database instance | <CDF root>/ssl | RW | Cluster certificate access
HPE Identity Manager (IDM) | <CDF root>/cfg/idm/seeded | RW | IDM initialization for seeded users
HPE IDM Postgres database | <CDF root>/ssl | RW | Cluster certificate access
Boostport kubernetes-vault | None | |
Container Deployment Foundation management portal | <CDF root>/ssl | RW | Cluster certificate access
Container Deployment Foundation management portal | <CDF root>/zip | RW | Used in Add Node through management portal
Container Deployment Foundation Suite installer | <CDF root>/ssl | RW | Cluster certificate access
Container Deployment Foundation Suite installer Postgres database | <CDF root>/ssl | RW | Cluster certificate access
Local Docker registry | None | |
Kubernetes Proxy | <CDF root>/ssl | RW | Cluster certificate access
Kubernetes API Server | <CDF root>/log/apiserver | RW | API server logs
Kubernetes API Server | <CDF root>/ssl | RO | Cluster certificate access
Kubernetes Controller Manager | <CDF root>/log/controller | RW | Controller logs
Kubernetes Controller Manager | <CDF root>/ssl | RO | Cluster certificate access
Kubernetes Scheduler | <CDF root>/log/scheduler | RW | Scheduler logs
Kubernetes DNS | <CDF root>/ssl | RO | Cluster certificate access
Nginx (Ingress) | None | |
External NFS volume security
CDF and Suite containers persist data on multiple shared volumes. The only supported storage backend for these
shared volumes is NFS.
Minimally a setup will require two NFS volumes:
- CDF data
- Suite data
HPE recommends that these volumes be two separate NFS mount points and that the CDF core and Suite volumes not
be shared.
Suite installations may request the creation and use of multiple NFS volumes to separate configuration, logs and
databases, and for performance reasons.
All user access is squashed to a particular user and group. The user ID and group ID will be 1999 and 1999 respectively.
This uid and gid are defined/used inside the container image and may also be present in Pod specifications in the
securityContext section.
Because of the required all_squash option with anonuid/anongid for the NFS mount points, access to the NFS volumes
must be restricted using firewall rules so that only mount and access requests from cluster nodes will succeed.
Note: The CDF and suite use uid 1999 and gid 1999. This cannot be changed at this time. Customers need to
contact HPE support if these values conflict with locally defined/used values.
Refer to the Suite installation/administration guide for configuration details.
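As an illustration of the required export options, an /etc/exports entry on the NFS server could look like the sketch below; the path and the client network range are examples only:
/var/vols/itom/core 10.10.0.0/24(rw,sync,all_squash,anonuid=1999,anongid=1999)
exportfs -ra    # re-export after editing /etc/exports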
Image security
Image scanning
All HPE images (including the open source 3rd-party images) are scanned for vulnerabilities and malware.
This scanning occurs a first time during the CI/CD (continuous integration / continuous delivery) process that is
integrated into our development process.
It occurs a second time during the signing process when the images are uploaded to Docker Hub. See Image
signing.
Image signing
All HPE images (including the open source 3rd-party images) are cryptographically signed when uploaded into Docker
Hub. At this time the images are rescanned for malware.
HPE uses the standard Docker Notary process and server to sign and store the image signatures.
The malware scanning occurs on HPE’s servers and is not a feature of Docker Notary.
Image signature checking
The image signatures can be checked when the images are downloaded from Docker Hub and before they are
pushed into the customer’s local registry.
The <CDF root>/downloadimages.sh script has a -c|--content-trust parameter which when set to ‘on’ will
verify the image signature of all the downloaded images with the Docker Notary server (https://notary.docker.io).
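A usage sketch; the script's other parameters (suite metadata, credentials, target directory) are omitted here and described in the installation guide:
<CDF root>/downloadimages.sh --content-trust on    # image signatures are verified against https://notary.docker.io before the images are accepted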
CDF user management
The CDF Management Portal has user management for its “core” tenant.
There are two users “admin” and “suite_admin”.
The following table lists the basic, fixed role-based access control (RBAC) permissions.
Function User “admin” access? User “suite_admin” access?
SUITE | Installation Yes No
SUITE | Management Yes No
ADMINISTRATION | Admin Yes No
ADMINISTRATION | Nodes Yes No
ADMINISTRATION | LDAP Yes No
ADMINISTRATION | LWSSO Yes No
ADMINISTRATION | User Management Yes No
ADMINISTRATION | Local Registry Yes No
RESOURCES | Namespace Yes (all namespaces) Yes (all namespaces)
RESOURCES | Workloads Yes Yes
RESOURCES | Services and Discovery Yes Yes
RESOURCES | Persistent Volume Claims Yes Yes
RESOURCES | Configuration Yes Yes
Storage
Overview
Note: <CDF root> is the CDF installation root directory, by default it is /opt/kubernetes.
Component/process requiring storage | Backend | Path | Typical size or size range
CDF installation | Node local | <CDF root> | +- 3 GB
CDF image download | Node local | /var<CDF root> | Between 5 GB and 80 GB
CDF external storage | NFS | /baseinfra-1.0 and /suite-install in the CDF external volume | Between 5 GB and 50 GB (includes the CDF private registry)
Docker storage driver (devicemapper) for container runtime images | Node local | <CDF root>/data/bootstrap-docker and <CDF root>/data/docker | Between 5 GB and 200 GB
CDF ETCD instance | Node local | <CDF root>/data/etcd | Between 250 MB and 2 GB
CDF core logs | Node local | <CDF root>/log | Between 100 KB and 1 MB
Docker image cache | Node local | <CDF root>/data/bootstrap-docker and <CDF root>/data/docker | Between 5 GB and 50 GB
CDF private registry | NFS | Below /baseinfra-1.0 in the CDF external volume | Between 5 GB and 50 GB
Suite external storage | NFS | Below the Suite external volume | See Suite installation/administration guides
Containers
At runtime, container file systems are either completely read-only (RO) or read-write (RW).
Files on-disk inside a container are ephemeral. When a container crashes, it will be restarted but the files will be lost.
A container always starts with a clean file system.
Changes to a RW container file system will not survive a container restart: any changes made will be gone if the
container is (for whatever reason) restarted.
Therefore, any container state that needs to survive a restart needs to be stored outside of the container, in so-called
external storage.
When running in Kubernetes (the scheduling unit being a Pod possibly consisting of multiple containers) it may also
be necessary to share files between containers. This needs to be done by accessing a shared external data store.
External storage can be backed by a variety of technologies ranging from traditional local hard drives or SSD, SAN
attached storage (iSCSI, fiber channel), disk clustering software, NFS and others.
A choice of storage depends on what customers are using and/or performance and HA requirements for the data in
the external storage.
The data stored in the external storage can be any of the following (and the list is not exhaustive):
- Configuration files
- Add-on binary files
- Product customizations (can take many forms)
- Database files
- Temporary files
- Platform and Suite logs
Docker storage
Docker provides a RW container filesystem by layering a RW layer on top of the RO image layer.
See: https://docs.docker.com/engine/reference/glossary/#union-file-system
Excerpt from the above link: “Union file systems implement a union mount and operate by creating layers. Docker
uses union file systems in conjunction with copy-on-write techniques to provide the building blocks for containers,
making them very lightweight and fast.”
Docker hosts use a storage driver to provide a unified view of the layered file system that makes up Docker images.
Various storage drivers exist such as AUFS, OverlayFS, OverlayFS2, Btrfs, and others.
The Docker hosts inside the Container Deployment Foundation (CDF) use the devicemapper storage driver. This is
the default storage driver on the supported Linux host platforms: Red Hat, Oracle Linux and CentOS.
The devicemapper storage driver
The devicemapper storage driver can be configured in two modes:
- loop-lvm or loopback mode, suitable for testing and small installations
- direct-lvm, suitable for production use
By default, for easy install, demo or very small Suite installations, the devicemapper configuration for the CDF uses
the loopback mode for the devicemapper storage driver.
The loopback mode will support demo and small suite installations, but it is not suitable for production use.
For production use, we recommend the use of the direct-lvm mode of the devicemapper storage driver.
Storage drivers can use node local storage or can use node local storage that is in fact provided by SAN or NAS
arrays. Once mounted as a volume inside a directory, it is transparent for the storage driver.
For a complete discussion of the devicemapper storage driver including the loop-lvm and direct-lvm modes, refer to
https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/.
Docker image cache
The image cache is located here:
- for the bootstrap Docker instance: <CDF root>/data/docker-bootstrap/devicemapper/devicemapper
- for the workload Docker instance: <CDF root>/data/docker/devicemapper/devicemapper
The directories contain two files:
- data
- metadata
These are sparse files, so the size reported by "ls -l" for the individual files is not the same as the actual size on
disk. Run ls -l on <CDF root>/data/docker-bootstrap/devicemapper/devicemapper:
total 707492
-rw------- 1 root root 107374182400 Apr 26 12:10 data
-rw------- 1 root root   2147483648 Apr 19 14:50 metadata
The reported size of the sparse files is 100 GB for data and 2 GB for metadata.
However in this example the actual size on disk is 707492 KB, around 700MB, which is also reported when “du -hs”
is run on <CDF root>/data/docker-bootstrap/devicemapper/devicemapper:
691M .
Cleanup of the Docker image cache must be done using Docker commands: docker rmi.
See also “Docker storage driver”.
Kubernetes storage
Kubernetes provides external storage to containers via an abstraction called Volumes.
Volumes are attached to containers either directly or via a further abstraction called Persistent Volumes.
Essentially, volumes are just directories that are made accessible to running containers inside a Pod.
See https://kubernetes.io/docs/concepts/storage/volumes/#background
The CDF uses both directly attached Volumes (for ephemeral data) and Persistent Volumes (for persistent data)
attached to containers.
Volume mounts are used for:
- exposing a node host path inside a container (hostPath)
- providing ephemeral empty directories (emptyDir)
Persistent Volumes are used for:
- container configuration data
- log files
- database files
Volumes link the container to a specific type of storage and are therefore not ideal when creating portable
deployments of Suites. For this reason, their use is kept limited.
Persistent Volumes do not link the container to a specific type of storage. Through the creation of persistent volume
claims (PVC) that are linked 1:1 with a particular persistent volume (PV) and the use of the PVC as a volume inside
the container, we can achieve volume storage backend agnostic deployments.
Volumes: https://kubernetes.io/docs/concepts/storage/volumes/
Persistent volumes: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
Notes:
- Other than hostPath and emptyDir, CDF containers do not use Volumes directly.
- CDF does not use StorageClasses in persistent volumes
- CDF only supports NFS persistent volumes
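To illustrate the PV/PVC pattern described above, a minimal NFS-backed sketch; server, path, size and names are placeholders, not the actual CDF definitions (a Suite would then reference the claim as a volume in its Pod specification):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 5Gi
  accessModes: ["ReadWriteMany"]
  nfs:
    server: nfs.example.net
    path: /var/vols/itom/example
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 5Gi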
Host storage
Docker storage driver
See also Docker storage.
Local node storage, i.e. the local disks, is used extensively when running Docker containers.
The Docker storage driver is backed by local storage.
The directories where the Docker storage driver stores its files are <CDF root>/data/docker and <CDF root>/data/bootstrap-docker.
The devicemapper storage driver uses the Copy-on-Write https://en.wikipedia.org/wiki/Copy-on-write technique to
reduce the actual physical size of the container image storage as much as possible.
This may throw off typical utilities to check for used disk space such as ‘df’. The ‘df’ utility will report much more
storage than a node may physically have as it reports all container devicemapper mount points and their reserved
storage.
Running the ‘du’ utility on <CDF root>/data/docker, will show a more accurate picture. Try running ‘du -hs .’ from
<CDF root>/data/docker.
The most accurate information is available when running ‘docker info’.
The storage section in the output of the 'docker info' command gives the detailed picture:
Storage Driver: devicemapper
 Pool Name: docker-253:1-641978369-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 25.89 GB
 Data Space Total: 107.4 GB
 Data Space Available: 81.48 GB
 Metadata Space Used: 33.24 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.114 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: <CDF root>/data/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: <CDF root>/data/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.135-RHEL7 (2016-11-16)
It is possible to run out of space and need more capacity on the thin-pool device.
Increasing the capacity works for both loopback and direct-lvm modes.
See https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#increase-capacity-on-a-running-
device for details on how to increase the capacity.
Note: it may be required to perform this task for both the bootstrap and the workload Docker instances. To run the
Docker commands against the bootstrap Docker instance, use the -H socket parameter:
docker -H unix:///var/run/docker-bootstrap.sock info
ETCD
Additional local node storage is used by ETCD, but only on the master nodes.
The directory where the ETCD storage is located is <CDF root>/data/etcd.
A typical CDF and subsequent Suite deployment would require around 500 MB of ETCD storage. This can be more if
more nodes and/or Suite capabilities are added.
Other host storage
Additional storage is needed for CDF component logs for:
- Kubernetes API server
- Kubernetes controller
- Kubernetes scheduler
- ETCD
The directory for these logs is <CDF root>/log.
Typically, after installation of the CDF and a Suite, the size will be between 512 KB and 1 MB.
Container Deployment Foundation storage
The storage for the CDF is split between:
- Node local storage for the Docker, Kubernetes binaries and their runtime data
- Node local storage for the Docker image cache
- Node local storage for hostPath and emptyDir directly attached container volumes
- External storage backed by an NFS server attached to containers via Persistent Volumes and persistent
volume claims.
For Docker, Kubernetes, hostPath and emptyDir, see the previous chapters.
CDF external storage
The external storage for CDF is backed by a single NFS volume.
NFS is the only external volume backing store that is supported in this release.
The top-level path for the CDF external volume will depend on your NFS configuration.
Let’s assume that it is exported from the NFS server as /var/vols/itom/core.
In this path, two main data locations are located:
- baseinfra-1.0/
o IDM and suite installer PostgreSQL database files
o Docker private registry data store
o Static local DNS resolution configuration
- suite-install/
o Suite Installer configuration data
- offline_sync_tools
o Data for use by suite update process
Note: No folders/files can be changed/edited except the configuration files in the <CDF NFS root>/baseinfra-1.0/kube-dns-hosts
folder.
The typical size (minus the Docker private registry store) of the CDF and the configuration data for an installed Suite
is around 160 MB. This is not expected to grow significantly when the CDF and an installed Suite are run for
extended periods of time.
The size of the Docker private registry depends on the number and size of container images that were downloaded
prior to the installation of a suite. It may range between 5 and 50 GB.
Suites storage
The storage for an installed Suite is split between:
- Node local storage for the Docker image cache
- Node local storage for hostPath and emptyDir directly attached container volumes
- External storage backed by an NFS server attached to containers via Persistent Volumes and Persistent
Volume Claims.
For Docker, Kubernetes, hostPath and emptyDir, see the previous chapters.
Suite external storage
The external storage for suites is backed by one or more NFS volumes.
NFS is the only external volume backing store that is supported.
The top-level path for the suite external volume will depend on your NFS configuration.
For details of the contents of the external storage for Suites, refer to the Suite installation and administration guides.
Performance
See Docker storage driver direct-lvm use.
High availability
See High availability (HA) | Storage high availability on page 63.
High availability (HA)
Overview
Note about ITOM Suites: All of the high availability (HA) aspects that apply to Container Deployment Foundation
containers executing on top of Kubernetes also apply to ITOM Suite containers when a suite is deployed and running
on top of the Container Deployment Foundation and underlying Kubernetes. For specific documentation and
guidance for a particular ITOM Suite, refer to the Suite documentation.
High availability is the result of applying/using the following technologies and configurations:
• Highly available redundant storage
• Highly available database instances
• Servers in active/active or active/passive configuration (with or without load balancing)
• Services in active/active or active/passive configuration (with or without load balancing)
• Intra-service data replication (clustering)
• Fast restart in case of failure of a single instance of a server providing a particular service
• Keep-alive monitoring
• Automatic restart on termination
The Container Deployment Foundation employs various techniques listed above to achieve HA for its foundational
services.
Not all foundational services have the same level of HA, and the level of achievable HA is dependent on the
installation configuration: single or multiple master, single or multiple worker nodes.
The Container Deployment Foundation can be installed in 4 different configurations, resulting in different levels of HA
for these foundational services:
• Single combined master + worker node (SCMW)
• Two nodes: single master, single worker (SMSW)
• Multiple workers: single master, multiple workers (SMMW)
• Multi-master: multiple master nodes, multiple worker nodes (MMMW)
In the multi-master configuration, the level of HA is highest. In the single combined master + worker configuration, the
level of HA is lowest.
The level of HA is achieved by adding up all of the various techniques employed at different levels of the software
stack, from bottom to top:
• Host + storage
• Docker
• Container Deployment Foundation ETCD, Vault and Flannel
• Kubelet
• Kubernetes services (API server, controller manager, scheduler, DNS, etc)
• Container Deployment Foundation foundational services
• Suite containers
Each layer builds upon the previous layer and adds additional HA functionality.
The following levels of HA are defined:
- Basic:
o Has Docker HA
o No Kubernetes HA
o Incomplete selected CDF services multiple instances; others liveness and/or readiness probes
o Embedded databases not HA
o None or partial use of external databases
o None or partial use of HA storage
- Intermediate:
o Has Docker HA
o Has Kubernetes HA
o Incomplete selected CDF services multiple instances; others liveness and/or readiness probes
o Embedded databases not HA
o None or partial use of external databases
o None or partial use of HA storage
- Complete:
o Has Docker HA
o Has Kubernetes HA
o Selected CDF services multiple instances; others liveness and/or readiness probes
o Embedded database HA or external databases used (customer-supported clustered HA is
assumed)
o HA storage
The following table shows the maximum level of HA that can be achieved depending on the installed configuration.
Note: In the 2017.06 release, all foundational services are fully HA.
Install configuration | Can achieve HA level with embedded DB | Can achieve HA level with external DB and HA storage
Single combined master + worker node | Basic | Basic
Two nodes: single master, single worker | Basic | Basic
Multiple workers: single master, multiple workers | Basic | Basic/Intermediate
Multi-master: multiple master nodes, multiple worker nodes | Intermediate | Complete
HA summary
The following table lists all of the foundational services (including Kubernetes itself) and the type of HA supported.
This may depend on the installed configuration and will be noted as such.
Notes:
- ETCD and Flannel are included in this list as Container Deployment Foundation foundational services.
- N/A = not applicable because one or more of the following
o Not run in Kubernetes
o Not connected to a database (or is a database)
- Readiness probes: HPE considers readiness probes functionality in Kubernetes 1.6.x to be suboptimal
because of the lack of serialization between readiness and liveness probes. Hence, the use of readiness
probes is very limited and does not contribute positively or negatively to the level of HA that can be
achieved.
- Multiple instances: HPE considers liveness probe and fast restart sufficient to declare HA for selected CDF
components. For other components, running multiple instances is supported and will be automatically
enabled if the deployed configuration can support it.
Service / component | Docker run-time | Docker container restart policy | Kubernetes own HA | Kubernetes restartPolicy | Kubernetes liveness probe | Multiple instances running | Support external database | Kubernetes readiness probe
Bootstrap: CoreOS ETCD | Y when MMMW | Y | N/A | N/A | N/A | Y when MMMW | Always local on master nodes. Replicated when MMMW. | N/A
Bootstrap: Hashicorp Vault | Y when MMMW | Y | N/A | N/A | N/A | Y when MMMW | N/A | N/A
Bootstrap: CoreOS Flannel | Y, always run on every node | Y | N/A | N/A | N/A | N, one instance per node | N/A | N/A
HPE Autopass license manager server | Y when MMMW | Y | Y when MMMW | Y | N | N | N | N
HPE Autopass Postgres database instance | Y when MMMW | Y | Y when MMMW | Y | Y | N | N/A | N
HPE Identity Manager (IDM) | Y when MMMW | Y | Y when MMMW | Y | Y | Y | Y | N
HPE IDM Postgres database | Y when MMMW | Y | Y when MMMW | Y | Y | N | N/A | N
Boostport kubernetes-vault | Y when MMMW | Y | Y when MMMW | Y | Y | N | N/A | N
Container Deployment Foundation management portal | Y when MMMW | Y | Y when MMMW | Y | Y | N | N/A | N
Container Deployment Foundation Suite installer | Y when MMMW | Y | Y when MMMW | Y | Y | N | Always embedded database | N
Container Deployment Foundation Suite installer Postgres database | Y when MMMW | Y | Y when MMMW | Y | Y | N | N/A | N
Local Docker registry | Y when MMMW | Y | Y when MMMW | Y | Y | N | N/A | N
Kubernetes Proxy | Y when MMMW | Y | Y when MMMW | Y | Y | N | N/A | N
Kubernetes API Server | Y when MMMW | Y | Y when MMMW | Y | Y | Y | N/A | N
Kubernetes Controller Manager | Y when MMMW | Y | Y when MMMW | Y | Y | Y | N/A | N
Kubernetes Scheduler | Y when MMMW | Y | Y when MMMW | Y | Y | Y | N/A | N
Kubernetes DNS | Y when MMMW | Y | Y when MMMW | Y | Y | Y | N/A | Y
Nginx (Ingress) | Y when MMMW | Y | Y when MMMW | Y | Y | Y | N/A | N
Host services monitoring
The following services are run directly on the host:
- Bootstrap Docker daemon
- Workload Docker daemon
- Kubernetes kubelet
These processes run as systemd services and are monitored by the systemd services subsystem.
When these processes crash, they will be automatically restarted.
Note: the Docker daemons will not be automatically restarted when they crash. This will be corrected in a future
release. Until such time, restart on failure can be added to the Docker daemons by changing the systemd service
configuration and adding the following lines in the [Service] section for the following files:
/usr/lib/systemd/system/docker-bootstrap.service
/usr/lib/systemd/system/docker.service
Lines to add in [Service] section:
Restart=on-failure
RestartSec=5
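After editing the unit files, reload systemd so the change takes effect; restarting the daemons is disruptive and should be done during a maintenance window:
systemctl daemon-reload
systemctl restart docker-bootstrap.service docker.service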
Docker runtime high availability
The Docker runtime itself can be considered to be highly available only in case of a multi-master, multi-worker node
CDF cluster.
In this case, foundational services and deployed suites run across multiple Docker hosts, whereby Kubernetes provides
the desired / required level of high availability.
For details, please read the following sections.
Docker container high availability
The Docker runtime has limited support for HA for containers that it runs.
Essentially, Docker containers can be decorated with a flag so that they restart automatically when they exit.
Other levels of high availability have to be achieved by software that complements the Docker runtime, such as
container orchestration, storage HA or other.
All components of the Container Deployment Foundation that run directly on Docker (thus without being scheduled or
monitored by Kubernetes or another external process), use the following restart policy:
“unless-stopped” (*)
This applies to the following components' containers:
- ETCD
- Vault
- Flannel
- Keepalived
The above configuration ensures that the above containers are restarted if they terminate for any reason, except if
they were already stopped before the Docker daemon starts.
(*) Source: https://docs.docker.com/engine/reference/run/#restart-policies---restart
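For reference, the same policy on a manually started or already running container, as a sketch (image and container names are hypothetical):
docker run --restart unless-stopped my-image
docker update --restart unless-stopped my-container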
Kubernetes' own high availability
The Kubernetes project describes the following requirements for Kubernetes HA (*):
- Multiple master and multiple worker nodes
- A redundant, reliable data storage layer for Kubernetes
- Replicated Kubernetes API servers
- Leader election for controller and scheduler components
- Kubelet component on worker nodes talks to load-balanced master API server endpoint
(*) Source: https://kubernetes.io/docs/admin/high-availability/
The Kubernetes installation inside the Container Deployment Foundation supports all of the above Kubernetes HA
requirements, if the platform is installed with multiple master nodes.
Requirement | Implemented support in Container Deployment Foundation
Multiple master and multiple worker nodes | Can be installed with multiple master and multiple worker nodes. On a multiple master setup, a virtual IP address can be defined, assigned and kept alive across multiple master nodes.
A redundant, reliable data storage layer for Kubernetes | ETCD (the Kubernetes backend configuration database) is configured to run in clustered mode with replicated data across all master nodes.
Replicated API servers | The Kubernetes API server runs on every master node.
Leader election for controller and scheduler components | The "--leader-elect" flag is set to "true" for the controller and scheduler components on each master node.
Kubelet component on worker nodes talks to load-balanced master endpoint | The Kubelet is configured to talk to the configured HA virtual IP address that is kept alive and automatically assigned to two master nodes.
Container Deployment Foundation lifecycle high availability
See: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
The Container Deployment Foundation consists of more foundational services than just the Kubernetes services.
These services also can be configured for various levels of HA:
- Basic Kubernetes restart policy
- Kubernetes liveness probes
- Kubernetes readiness probes
Kubernetes basic restart policy
Kubernetes will restart pods that crash. This can be influenced by configuring the restartPolicy setting in the pod
specification. If absent, the default value is “Always”.
CDF pod specifications do not explicitly contain the restartPolicy setting so the above default is applied.
This means that when containers inside pods crash, they will always be restarted, and the restart will always occur
on the same node.
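As a quick sketch (the pod name is a placeholder), the restart policy that is in effect for a pod can be read back with kubectl:
kubectl get pod <pod name> -n core -o jsonpath='{.spec.restartPolicy}'
For CDF pods this prints “Always”, confirming that the default applies.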
Liveness and readiness probes
On top of the default restart, Kubernetes can restart pods when a probe fails.
Two probe types exist:
- Liveness: restart containers when a probe fails
- Readiness: only send traffic to a pod when a probe succeeds
The following CDF components have liveness probes defined:
- Kubernetes API server
- Kubernetes controller manager
- Kubernetes scheduler
- Kubernetes DNS
- CDF Management Portal
- CDF Nginx (Ingress)
The following CDF components have readiness probes defined:
- Kubernetes DNS
Note: HPE considers readiness probes functionality in Kubernetes 1.6.x to be suboptimal because of the lack of
serialization between readiness and liveness probes. Hence, the use of readiness probes is very limited and does not
contribute positively or negatively to the level of HA that can be achieved.
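As an illustrative check (the pod name is a placeholder), the probes defined for a pod can be listed with kubectl describe, which prints a Liveness and/or Readiness line for each container that has a probe:
kubectl describe pod <name of Kubernetes DNS pod> -n core | grep -E 'Liveness|Readiness'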
Multiple component instances
Additional high availability can be achieved by running multiple instances of a CDF component connected to an
external database (HA assumed) or connected to a cluster-local replicated database.
The following CDF components will run with multiple instances:
- Vault: on every master when in MMMW cluster configuration
- ETCD: on every master when in MMMW cluster configuration
- Flannel: one instance on every cluster node
- IDM: multiple instances on master node
For the other CDF components, HPE does not deem it necessary to configure multiple instances.
Container resource requirements and limits
Request/Limit
All Kubernetes-managed container specifications for the foundational services contain resource request and limit
configurations. (*)
- Resource request: allows the Kubernetes scheduler to improve container scheduling based on available
resources on the worker nodes.
- Resource limits: allows the Kubelet to terminate and/or evict pods that exceed their allowed memory and CPU.
(*) Source: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
Resource requests and limits help in achieving high availability because:
- A container will not be scheduled to an already fully loaded node which may cause the node to run out of
resources.
- A node will not be continuously overloaded if a container exceeds its memory and/or CPU limits.
- A container that is terminated or evicted because it has exceeded its memory and/or CPU limits may be
rescheduled on another node that has a lower overall usage.
All the Kubernetes and CDF foundational services have resource request and limits defined. This is defined via the
YAML specification and typically looks like below excerpt. The actual values vary from service to service.
resources:
  requests:
    cpu: 100m
    memory: 100Mi
  limits:
    cpu: 100m
    memory: 200Mi
The following services run either directly on Docker or on the host outside of Kubernetes’ control and have no resource requests or limits (unbound):
- Kubelet
- ETCD
- Vault
- Flannel
- Keepalived
ITOM Suite YAML deployment specifications also use the resource request and limit annotations to help the
Kubernetes scheduler and Kubelet in scheduling and ongoing node workload management.
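The requests and limits in effect for a running foundational pod can be verified with kubectl describe (the pod name is a placeholder); the output contains per-container Requests and Limits sections:
kubectl describe pod <name of Management Portal pod> -n core | grep -A 2 -E 'Requests|Limits'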
Kubelet pod eviction
The Kubelet will start to evict pods when node resources run low.
Out of the box policy:
--eviction-hard=memory.available<500Mi,nodefs.available<5Gi,imagefs.available<5Gi
The above policy means that if less than 500 MB of RAM is available, Pods will be evicted. The same applies if the file
system free space for volumes and logs falls below 5 GB, or if the file system space for container runtime image
storage falls below 5 GB.
The Kubelet configuration takes a reserved amount of memory into account for system use (non-Kubernetes). By
default this is 1.5GB.
- This may seem high but it includes provisions for: the host OS, two Docker daemons, Kubelet and Vault,
ETCD and Flannel containers.
- The value includes the memory.available value from the eviction policy.
Container Deployment Foundation
Enterprise Readiness White Paper
Version 2017.06 Page 63
When Pods are evicted, they will be rescheduled to either the same node (provided sufficient resources have been
reclaimed) or another node with sufficient available resources.
Kubernetes details: see https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource
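A quick way to confirm the eviction thresholds that a node’s Kubelet is actually running with, assuming the thresholds are passed on the Kubelet command line as shown above, is:
ps -ef | grep kubelet | grep -o -- '--eviction-hard=[^ ]*'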
Storage high availability
Node local storage for Kubernetes code and runtime data
The Container Deployment Foundation core binaries are located in <CDF root>.
The Docker data and log directories are located in <CDF root>/data and <CDF root>/logs.
Only the top level <CDF root> installation directory can be configured at install time through the install.properties file.
The location cannot be changed after the installation.
To increase the level of HA, the /opt partition on the master and worker nodes can be set up to be hosted on a form of
HA storage such as SAN or NAS arrays.
Additionally, it is recommended to use logical volume management (LVM) in the local partitioning scheme for /opt so
that the storage can be expanded very simply if the partition is in danger of running out of space.
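A minimal expansion sketch, assuming hypothetical volume group and logical volume names (vg00, lv_opt) and free extents in the volume group:
lvextend --resizefs --size +20G /dev/vg00/lv_opt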
Persistent container storage
Persistent container storage is hosted using NFS. No other persistent container storage types are supported by the
Container Deployment Foundation.
A typical Linux OS-hosted single NFS server does not offer much in terms of HA.
To increase the level of HA for the NFS storage, various options exist:
- The NFS server data partitions use software or hardware RAID
- The NFS server data partitions are hosted on network attached storage. Example:
o HPE StoreEasy Storage (http://h20195.www2.hpe.com/v2/GetPDF.aspx%2F4AA4-7477ENW.pdf)
- The NFS server itself is virtualized and provided by the storage array software/hardware. Examples:
o HPE 3PAR File Persona Software (https://www.hpe.com/h20195/v2/GetPDF.aspx/4AA5-
6078ENW.pdf)
o Other storage vendors may offer similar capabilities. Refer to the vendor documentation for details.
- Set up an active/passive NFS server in an HA cluster. This approach is available and documented from
various Linux vendors / distributions. Links:
o Red Hat: https://access.redhat.com/documentation/en-
US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/ch-nfsserver-
HAAA.html
o (*) SUSE: https://www.suse.com/documentation/sle-ha-
12/singlehtml/book_sleha_techguides/book_sleha_techguides.html
(*) SUSE Linux is not a supported OS for the Container Deployment Foundation.
Component replicated database
The ETCD component will use a replicated database in MMMW cluster configuration. The database will be replicated
across all master nodes.
Component external databases
The CDF components can be configured to use an external database. See also section “High availability through use
of external databases”.
Multi-master and HA virtual IP
Leader election for ETCD
When installing a CDF with multiple masters, the recommended number is 3 master nodes. This is so that the leader
election for ETCD’s distributed database scheme works properly.
HA virtual IP
All of the master nodes participate in the HA virtual IP configuration. This is so that a pre-defined IP address,
configurable through the CDF installer configuration file (install.properties), can be bound to whichever master node is
found to be alive, or load-balanced across the master nodes if more than one is active.
For this to work, the installer sets up and runs a copy of keepalived and an Nginx load balancer to make the Kubernetes
API server highly available. They are set up to run on the workload Docker instance.
Generating tokens and certificates for containers
When containers are being started, they need access to:
- Securely stored configuration data
- (optional) A server certificate to use for server processes
Tokens and certificates are generated dynamically and on demand. The token is renewed periodically.
The Pod specification for CDF and Suite containers contains annotations to get Vault tokens and server certificates.
The patterns used for this are init (get token and certificate) and sidecar (renew certificate) containers.
When a Pod starts, the init container is infused with a Vault access token. Then the init container requests a server
certificate from Vault.
While the Pod is running, a sidecar container continuously renews the short-lived Vault token.
HPE uses the Kubernetes-vault implementation to supply containers with access tokens. The init container code was
updated to also generate server certificates.
Figure 17 - Token init and sidecar containers
Embedded Postgres database high availability
The embedded Postgres database instances only benefit from the underlying host, container runtime and Kubernetes
HA technologies.
The embedded database instances are not configured for any form of HA through the use of multiple database
instances.
The database files are stored on the externally provisioned persistent volumes. See Storage high availability to
increase the level of HA by providing underlying HA storage.
High availability through use of external databases
Note: This document does not cover setting up external database HA.
Connecting the Container Deployment Foundation services and/or an installed ITOM Suite to (an) external
database(s) can significantly increase the level of HA.
The Container Deployment Foundation itself does support connecting its services to (an) external database(s).
Connecting ITOM Suites to (an) external database(s) is supported and described in the ITOM Suite documentation.
Updates
Overview
Experience and challenges
Moving to container-based delivery is also intended to simplify the update and upgrade experience for customers.
Ideally, the customer experience for installation is simple:
- Install infrastructure
- Install a suite
- Use the suite
At best, all the steps are easy. The outcome is a fully configured, integrated, working Suite.
Similarly, the customer experience for update/upgrade should be equally simple:
- A notification that there is some update or an upgrade available.
- The update is applied
- Use of the updated / upgraded Suite
The complexity involved with making installation and upgrade simple is considerable.
Updating the code is the easy part: swap a container image for another container image. The complexity comes with
database schemas, data migration, configuration data and rollback.
The Suite can be patched and upgraded.
Patching the CDF is manual or at best semi-automatic.
Suite versions and image versions
Definitions
Suite releases occur quarterly. The version naming scheme is therefore YYYY.MM.
An upgrade is moving from one quarterly release to the next, or the release after that. Skipping release upgrades is
not supported at this time for CDF. For Suites, it may be possible to skip releases. See the Suite upgrade guide.
A suite version consists of a set of images, each with its own version. This metadata is also stored in Docker Hub as
an image.
Image versions may follow various numbering schemes:
- YYYY.MM with optional <build>
- Semantic versioning (with optional build number)
Regardless of the versioning scheme for image tags, different versions (for upgrade/update/hotfix) will be clearly
distinguishable by their image tag value. There is no need for image introspection to figure out exactly which image
version is available or running.
Update, upgrade, hotfix
Typical
When the suite version changes, it is an upgrade.
When one or more image versions change for a particular suite version, it is an update.
Updates occur as and when bugs need to be fixed and this cannot wait until the next quarterly release.
Updates and upgrades are provided via the Docker Hub.
Hotfix
One special case exists: the hotfix. These are customer-specific, one-off fixes that are also supplied via Docker Hub but
may require additional installation enablement through the regular support channels.
Types, delivery, installation
The following lists the types of updates supported, how they are delivered, and how they are applied.
- Upgrade: delivered via Docker Hub; applied automatically through the CDF and Suite.
- Update: delivered via Docker Hub; applied automatically through the CDF and Suite.
- Hotfix: delivered via Docker Hub and the support channel; applied manually or semi-automatically via an external script.
CDF vs Suite
Layer split
The update/upgrade of the CDF is separate from the Suite. HPE identified the following runtime layers for the
update/upgrade case:
- Non-containerized (Docker, Kubelet): updated via external script.
- Containerized in bootstrap Docker (ETCD, Flannel, Vault): updated via external script.
- Workload Docker (Kubernetes, Platform update service): updated via external script.
- Workload Docker (IDM, Mgmt Portal, Heapster): updated via external script, or platform update service and config/update containers.
- Containerized suite (Suite images, Autopass): updated via update service and/or config/update container(s).
Upgrading the CDF
The 2017.03 CDF can be upgraded to 2017.06. The upgrade will update all components such as Docker,
Kubernetes, Heapster, Vault, Etcd, Flannel and the CDF core to the 2017.06 level.
The upgrade process is done by executing a script on the cluster nodes.
An upgrade should only occur when:
- An upgrade of the installed Suite to a 2017.06-compatible Suite needs to be performed.
- The installed 2017.03-compatible Suite is uninstalled and a 2017.06 Suite will be installed.
See also chapter “Runtime and install time compatibility”.
For details on the CDF upgrade, see the Administration Guide.
Runtime and install time compatibility
- CDF 2017.03 fresh install: installed 2017.04|05 suite continues to run after a CDF upgrade to 2017.06: N/A; can (re)install 2017.04|05 suites: Yes; can (re)install 2017.07|08 suites: No.
- CDF 2017.06 fresh install: installed 2017.04|05 suite continues to run after a CDF upgrade to 2017.06: N/A; can (re)install 2017.04|05 suites: No (*); can (re)install 2017.07|08 suites: Yes.
- CDF 2017.06 upgraded from CDF 2017.03: installed 2017.04|05 suite continues to run after the CDF upgrade to 2017.06: Yes; can (re)install 2017.04|05 suites: No (*); can (re)install 2017.07|08 suites: Yes.
(*) Special situation for CDF update from 2017.03 to 2017.06. Limitation to be removed in later CDF releases.
Patching/hotfixing the CDF
HPE will provide updates through its support channels.
These updates will contain updated binaries, configuration files, metadata and scripts.
Customers will need to transfer the updates to the CDF nodes and run the update script(s).
Updating/upgrading the Suite
Updating/upgrading a suite happens via the integrated suite update process.
A prerequisite for a suite upgrade may be the upgrade of the CDF first. If this is required, the Suite upgrade guide will
detail it.
This is the flow of an update or patch to an installed Suite:
- A bug is identified whose fix cannot be delayed until the next quarterly CDF release.
- A code change is made and a new fix is generated. This fix can take the form of:
o A new (binary) executable (such as those in the CDF ZIP distribution)
o An updated configuration file
o An updated YAML specification
o An updated image
§ CDF image or Suite images
§ Updated images will also always be published to Docker Hub
o All (or subset) of the above
- New images are built and provided through Docker Hub
- The customer is notified via the HPE support communication channels.
- Using the Management Portal, the user can select the Update option for the installed Suite.
- A wizard will guide the user through the suite update process.
Note: the wizard process and the user experience for a suite update or upgrade are the same.
Figure 18 - Selecting Update to start updating/upgrading a Suite version 2017.03.001
Figure 19 - A suite upgrade using the Upgrade Wizard – upgrade to 2017.06.02
Figure 20 - Suite upgraded to 2017.06.02
Figure 21 - Updating the Suite from 2017.06.02 to 2017.06.021
Hotfixing the Suite
HPE will provide updates through its support channels.
These updates will contain updated binaries, configuration files, metadata and scripts.
Customers will need to transfer the updates to a CDF master node and run the update script(s).
Databases
Both the CDF and a deployed Suite use databases to store their data.
The CDF components support external databases.
Suite deployments support embedded and external databases.
The following table lists the CDF databases:
- HPE Identity Management (IDM): embedded PostgreSQL, single instance, no HA, no clustering; external database supported: Yes.
- HPE Autopass License Manager: embedded PostgreSQL, single instance, no HA, no clustering; external database supported: No.
- HPE CDF Suite Installer: embedded PostgreSQL, single instance, no HA, no clustering; external database supported: Yes.
- Kubernetes data store: embedded ETCD, distributed and replicated in multi-master CDF configurations; external database supported: No.
- Flannel configuration: embedded ETCD, distributed and replicated in multi-master CDF configurations; external database supported: No.
- Vault data backend: embedded ETCD, distributed and replicated in multi-master CDF configurations; external database supported: No.
Embedded databases
The CDF uses two embedded databases:
- ETCD 3.0.17 (2)
- PostgreSQL 9.4.12 (2)
(2) Actual versions may change with updates to the 2017.06 and subsequent releases.
Embedded databases are always containerized database engine instances with external storage attached to the
container to store the database files.
The container images for the CDF database instances are located under <CDF root>/images:
- itom-platform-postgresql-9.4.12-00136.tgz
- etcd-amd64-3.0.17.tgz
The external storage is NFS-based and is attached to the database container using the volume → persistent volume
claim → persistent volume pattern. Example: the autopass-pg.yaml file under <CDF root>/objectdefs.
…
volumeMounts:
- name: db-store
  mountPath: /var/pgdata
  subPath: baseinfra-1.0/autopass_db
…
volumes:
- name: db-store
  persistentVolumeClaim:
    claimName: itom-vol-claim
…
The PostgreSQL database directory is mounted into the container under /var/pgdata. Its NFS volume is the one
backed by the itom-vol-claim persistent volume claim and the database files are in a subfolder called baseinfra-1.0/autopass-db.
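To see which persistent volume backs this claim, the claim can be inspected with kubectl (assuming the claim lives in the core namespace):
kubectl get pvc itom-vol-claim -n core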
For ETCD, as this runs directly on Docker, the mounting mechanism is Docker-specific using the -v flag but the
pattern is the same: the data is stored outside of the running container. To check the database volume mount for
ETCD, inspect the container and look at the Mounts section:
# docker -H unix:///var/run/docker-bootstrap.sock ps | grep etcd
b46ba8a5d202 gcr.io/google_containers/etcd-amd64:2.2.1 "/usr/local/bin/etcd " 2 weeks ago Up 2 weeks etcd_container
# docker -H unix:///var/run/docker-bootstrap.sock inspect b46ba8a5d202
…
"Mounts": [
  {
    "Source": "<CDF root>/ssl",
    "Destination": "/etc/etcd/ssl",
    "Mode": "",
    "RW": true,
    "Propagation": "rprivate"
  },
  {
    "Source": "<CDF root>/data/etcd",
    "Destination": "/var/etcd",
    "Mode": "",
    "RW": true,
    "Propagation": "rprivate"
  },
  {
    "Source": "<CDF root>/log/etcd",
    "Destination": "/var/log",
    "Mode": "",
    "RW": true,
    "Propagation": "rprivate"
  }
]
…
External databases
The CDF and Suite support using external databases. See the Installation guide for configuration details.
For Suite production use, HPE recommends external databases for the Suite deployments.
Note: Setup, configuration, (high) availability of external databases are a customer responsibility.
Backup/restore and disaster recovery
Data backup is a prerequisite to disaster recovery.
Accurately backing up all the data needed so it can be restored in case of a disaster is of paramount importance.
Backup and restore
Backup
Containers provide an edge over classic product installations because of the standardized way of building and running
containerized applications.
All application components running in containers share the same common runtime and a similar and known way and
place to store external data.
In the case of the CDF and the Suite, backing up all of the data is simplified as it lives in only a handful of known locations:
- <CDF root>/data/etcd: Vault, Flannel and Kubernetes data; CDF core data (management portal, suite installer).
- /var/vols/itom/core: CDF NFS external volume (container persisted configuration and data): database files and
configuration data. Note: the location can be changed at install time via install.properties.
- <Suite NFS export folders>, for example /var/vols/itom/hcm: Suite NFS external volumes (container persisted
configuration and data): database files and configuration data. Note: the location can be set as per customer
requirements.
- <CDF root>/objectdefs, <CDF root>/runconf, <CDF root>/manifests, <CDF root>/ssl, <CDF root>/cfg: CDF core
container specifications; CDF certificate CA and certificates; Docker daemons configuration; Autopass/IDM
configuration; API server Nginx configuration.
Note: hostPath volume storage can be ignored as it is not usable for persistent data in a multi-node cluster.
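A minimal backup sketch based on the locations above, assuming $CDF_ROOT points to the actual CDF installation root and the default NFS paths are used; Suite NFS export folders must be added per deployment:
CDF_ROOT=<CDF root>
tar czf /backup/cdf-node-$(date +%Y%m%d).tgz \
  "$CDF_ROOT/data/etcd" "$CDF_ROOT/objectdefs" "$CDF_ROOT/runconf" \
  "$CDF_ROOT/manifests" "$CDF_ROOT/ssl" "$CDF_ROOT/cfg"
tar czf /backup/cdf-core-volume-$(date +%Y%m%d).tgz /var/vols/itom/core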
Restore
Restoring saved data to a fresh CDF cluster is not supported.
Disaster recovery
Cattle instead of pets
Nodes, even master nodes, are much more like cattle than pets.
In 2012, in a presentation given at CERN (CH, http://www.slideshare.net/gmccance/cern-data-centre-evolution), this
new service model was explained.
Instead of treating servers like an admin’s favorite (set of) unique pets, caring for them and nursing them back to
health if they get sick, administrators simply treat servers as cattle.
Servers are almost identical, loving care is limited and if they get sick, admins just kill them and get fresh ones.
Virtualization removes the need to deal with physical machines. Servers can be provisioned in minutes.
Why try to save a server in a CDF multi-node cluster if that cluster is resilient to node failure (workload is rescheduled
elsewhere and/or is highly available) and the downed server can be replaced with a fresh new one and added back
into the cluster in minutes?
Figure 22 – Pets or Cattle?
Cattle and CDF disaster recovery
In a multi-master setup with multiple worker nodes, the system can tolerate node outages up to a fairly severe level.
On multi-master setups the ETCD database is distributed, so it is essentially sufficient (worst case) to reduce the
number of cluster members to one and then start adding them back in as new members.
Worker nodes are totally like cattle. If a worker node crashes or dies, it can simply be reinstalled or replaced. The
workload will be moved automatically by Kubernetes.
So for disaster recovery, the HPE strategy is to take a fresh look at traditional server management and treat all of the
machines in a cluster like cattle and not like pets.
If we take this approach to disaster recovery, things become much easier, but only if we take care of building
resilient systems that revolve around the concepts of shared nothing, stateless file systems, configuration automation
and repeatability.
As with all things, creating such systems is a journey and the CDF and Suites are underway.
Master node failure
Multiple (3) master node cluster
Case and recommended action:
- ETCD failure on one master node (*):
o Remove the node.
o If part of an HA pair, reconfigure the HA pair with the surviving node.
o Reinstall a fresh master node.
o Add the new master node to the cluster.
o ETCD will replicate onto the new node.
- ETCD failure on two master nodes (*):
o Find the healthy node.
o Remove the unhealthy nodes.
o If part of an HA pair, reconfigure Kubernetes API server HA to disabled so it effectively becomes a single node cluster (temporary).
o Create new master nodes.
o Add the new master nodes to the cluster.
o Reconfigure Kubernetes API server HA to a working state.
- Total node failure on one master node (*): see ETCD failure on one master node.
- Total node failure on two master nodes (*): see ETCD failure on two master nodes.
- All master nodes gone: restore from backup.
Note: HPE is working to compile a more complete list of failure cases and mitigation.
(*) Note: HPE is working on detailed guidance for mitigation of the above listed node failure cases.
Single master node cluster
Disaster recovery for single node configurations needs to be covered by backup/restore procedures.
Once HPE delivers restore-from-backup functionality, this can be used as disaster recovery for a single master node
cluster.
Worker node failure
Worker nodes are set up so that they do not contain valuable data, or if they do, it is only temporary, non-critical data
(such as disk caches, temporary files or logs).
In case of a node failure:
- Delete the node from the cluster using kubectl delete node
- Re-install the server, re-provision a new node, re-install the CDF if the host OS is not broken (any of these
three)
- Add the node into the cluster
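A minimal sketch of the removal step (the node name is a placeholder); listing the nodes afterwards confirms that the failed node is no longer part of the cluster:
kubectl delete node <node name>
kubectl get nodes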
NFS volumes failure
Refer to the chapter on High availability to understand how you can guard against storage failures.
In case of non-HA storage and subsequent storage failure with partial or total loss of data, only restoring a complete
backup will allow a return to normal operation. There will be downtime as the system will be unavailable until the data
is restored and the CDF and Suite brought back online.
Embedded database failure
- Engine failure options:
o Restart the database engine container.
o Redeploy the database engine container.
- Database corruption:
o Restore the database from a backup.
External database failure
This document does not cover or recommend customer procedures for disaster recovery of external databases.
Multi-tenancy
Note: For multi-tenancy inside an installed Suite, refer to the Suite documentation.
Within the Container Deployment Foundation, we distinguish between administrative and technical multi-tenancy.
- Administrative: ability to separate multiple Suite installations and individually configure and manage them.
This is not supported.
- Technical multi-tenancy:
o Separation of multiple suite installations using Kubernetes namespaces. This is not supported.
o Separation of Suite and foundational services using Kubernetes namespaces. This is supported.
Kubernetes multi-tenancy isolation
Kubernetes provides multi-tenancy isolation using Kubernetes namespaces. Applications can be deployed into
separate namespaces without interference from other namespaces.
Objects, when created in Kubernetes, can optionally have a namespace attribute. If no namespace attribute is set,
the objects are created inside the “default” namespace.
On the Container Deployment Foundation, every object must have a namespace attribute when created. An object
must exist in either a Suite namespace or the “core” namespace.
Technical multi-tenancy in the Container Deployment Foundation
In the 2017.06 release, particular namespaces are used to separate the following functionality:
- An installed ITOM Suite. The Suite is installed in a namespace whose name is auto-generated based on
the Suite acronym and a number. Example: “opsbridge1”.
- The platform foundation services. The underlying foundational services that provide authentication,
licensing, installer and management functionality. These are installed in two different namespaces called
“default” and “core”. Most of the foundation services exist in the “core” namespace. The “default” namespace
only holds the foundation services Nginx controller.
Additional namespace information
The “default” namespace holds objects that do not have an explicit namespace definition when they are created. It
comes out of the box with Kubernetes and cannot be disabled. In the Container Deployment Foundation, the “default”
namespace should be empty except for the foundation services Nginx controller.
The “kube-system” namespace is for objects created by the Kubernetes system. In the Container Deployment
Foundation, the “kube-system” namespace should remain empty.
Kubernetes namespaces do not imply network segmentation. Network segmentation would be provided by the Flannel
network overlay, which only supports this as an experimental feature (June 2017). Network policies can also be
specified at the Kubernetes level but this is not yet supported by HPE. Network segmentation is not supported in the
Container Deployment Foundation.
DNS resolution for intra-cluster traffic does use Kubernetes namespaces. It is possible to have the same DNS name
in multiple namespaces, which can be accessed by suffixing the service name with the namespace. Internally, the
Container Deployment Foundation uses this pattern to separate core from Suite services.
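For illustration only, assuming a hypothetical core service named idm-svc and the default cluster DNS domain, a Suite pod could resolve the core service as follows:
nslookup idm-svc.core
nslookup idm-svc.core.svc.cluster.local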
Namespaces exist across all of the cluster nodes. Therefore it is possible to find both core and Suite namespace
containers and services across all nodes, except in cases where container functionality is restricted to particular
nodes with a separate, unique namespace. The latter is not the case in this release.
Elasticity
What is elasticity?
Elasticity is the way the cluster resources scale up and down in relation to resource demand.
The Suite resource requirements vary according to a range of parameters, including but not limited to the number of
entities managed, tasks or events processed and/or the number of concurrent users.
Scaling may occur on various levels:
- Cluster: add, reconfigure or remove cluster nodes.
o Example: add one or more nodes as a result of scale up of a particular replication controller
- Kubernetes: scaling up or down of pods (through Kubernetes Deployments, Replication Controllers, Replica
Sets, PetSets or StatefulSets)
o Example: increase the number of replicas for a particular replication controller
- Suite capabilities: business use-case wrapper on top of Kubernetes and/or cluster scaling.
o Example: scale up a web server service results in scale up of particular (one or more) replication
controllers.
- Suite installation: at installation time, one of various static deployment sizes can be selected to define the
initial size of the Suite installation.
So elasticity or scaling is layered and exhibits a possible ripple effect from top to bottom:
Figure 23 - Scaling ripple effect
Scaling up a particular Suite capability requires scaling the Kubernetes controller that controls its deployment, which
in turn may require scaling the cluster by adding, reconfiguring or removing nodes.
Note: Cluster nodes can always be added independently of any sizing specifications or resource demands and this
will almost always have a positive effect on performance.
Manual vs automatic elasticity
Ideally it should be possible to scale Suite capabilities automatically based on various parameters such as usage
patterns, time-of-day and others.
Suite capabilities can be scaled manually by directly:
- Scaling the Kubernetes controller that controls their deployment
- Scaling the cluster
Alternatively, Suites can use Pod auto-scaling to automatically scale services based on resource consumption.
Suite capabilities are always scaled indirectly because a single capability is provided by a combination of containers.
Level, how it is scaled, whether it makes sense to scale it independently, and whether it can be scaled manually or automatically:
- Suite capability: scaled indirectly; makes sense to scale independently: N/A; can be scaled manually: Yes; can scale automatically: No.
- Kubernetes controller: scaled directly; makes sense to scale independently: N/A; can be scaled manually: Yes; can scale automatically: Yes, using Pod horizontal auto-scaling.
- Cluster: scaled directly; makes sense to scale independently: Yes, load will be better distributed; can be scaled manually: Yes; can scale automatically: No.
Suite installation scaling
Suite installation scaling allows for selection of one of a set of predefined installation sizes.
Typically these sizes are listed as: small, medium and large.
The exact meaning of the size value differs from suite to suite. Refer to the Suite documentation for details on the
installation sizes.
Based on the selected size, the administrator needs to prepare a cluster with (just) enough nodes. Exactly how many
nodes are needed is described in the Suite documentation.
When the installation is running, the installation process will set an initial size for the Kubernetes controllers for each
Suite capability.
The initial number of cluster nodes and/or the size specification for the Kubernetes controllers can be manually
adjusted after the installation. See Suite capabilities, controller and node scaling.
Note: Some suites may not support install-time sizing. Refer to the Suite documentation to check whether the Suite
installation supports this.
Suite capabilities scaling
To scale Suite capabilities up or down, refer to the Suite documentation.
The Suite documentation explains how a particular Suite capability can be scaled by configuring the Kubernetes
controllers scaling controls and/or adding or removing cluster nodes.
Through the use of the Reconfigure feature it may be possible to scale a capability without requiring direct access to
the Kubernetes controllers.
Kubernetes controller scaling
See also: https://kubernetes.io/docs/concepts/ | Workloads | Controllers
All Kubernetes controllers have ways to provide horizontal scaling.
The exact how-to is controller-type specific. Refer to the Kubernetes documentation for details.
The following table gives a brief overview of how to scale a Kubernetes controller.
- Replica Set: edit the replica set instance and change the value for “replicas”.
- Deployment: edit the deployment instance and change the value for “replicas”.
- StatefulSet: edit the stateful set instance and change the value for “replicas”.
- PetSet: edit the pet set instance and change the value for “replicas”.
- Daemon Set: scales automatically as nodes are added/removed, by virtue of the Daemon Set design and purpose.
Note: the Suite documentation will state whether a particular Kubernetes controller instance that is used within a Suite
deployment for a particular capability supports manual or automatic scaling.
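As an illustrative sketch (the deployment name and namespace are placeholders), a Deployment can be scaled manually or configured for horizontal Pod auto-scaling with kubectl:
kubectl scale deployment <deployment name> -n <suite namespace> --replicas=3
kubectl autoscale deployment <deployment name> -n <suite namespace> --min=2 --max=5 --cpu-percent=80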
Cluster node scaling
As a result of scaling a Suite capability and thus the Kubernetes controllers it may become necessary to add,
reconfigure or remove a node.
- Add nodes: increase the number of nodes in the Kubernetes cluster.
- Reconfigure nodes: add CPU, memory or disk to existing nodes in the cluster.
- Delete nodes: remove nodes from the cluster
Adding nodes
Cluster nodes can be manually added using the Management Portal | ADMIN | Nodes | Add Node. Cluster nodes can
also be added using the command line.
Nodes provide computing resources in the form of CPU, memory and disk.
Adding nodes allows the Kubernetes scheduler to distribute the container workload over a larger available set of
computing resources, to cope with possible rising resource demands from the installed Suite.
Once a node is added, it will be automatically used by the Kubernetes scheduler and workload will be scheduled on
it.
Reconfiguring nodes
Note: Consider cluster nodes as cattle, not as pets.
Clusters can be scaled by reconfiguring the nodes. Administrators can upgrade the hardware like swapping CPUs,
adding more RAM or upgrading the disks.
Before node maintenance can be performed, a node must be drained of all running containers and marked so that no
new functionality will be scheduled on it.
The following steps need to be performed:
- Mark the node as un-schedulable - no new workload will be scheduled.
- Drain the node of all running workload
- Perform the required maintenance
- Mark the node as schedulable
If the node is to be replaced completely then the steps are as follows:
- Mark the node as un-schedulable - no new workload will be scheduled.
- Drain the node of all running workload
- Delete the node from the cluster
- Decommission the old server
- Prepare the new server
- Install the node and add the node to the cluster
Refer to the Suite documentation for details on how to perform these tasks.
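A hedged outline of the maintenance steps above, using standard kubectl commands (the node name is a placeholder):
kubectl cordon <node name>
kubectl drain <node name> --ignore-daemonsets
(perform the required maintenance)
kubectl uncordon <node name>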
Deleting nodes
Note: Consider cluster nodes as cattle, not as pets.
Before node maintenance can be performed, a node must be drained of all running containers and marked so that no
new functionality will be scheduled on it.
If the node is to be removed completely then the steps are as follows:
- Mark the node as un-schedulable - no new workload will be scheduled.
- Drain the node of all running workload
- Delete the node from the cluster
- Decommission the server
Monitoring
Overview
Systems that run Docker and Kubernetes can be monitored in various ways.
Docker provides basic runtime metrics through the use of “docker stats”.
See https://docs.docker.com/engine/admin/runmetrics/.
In the CDF, monitoring occurs at the Kubernetes level.
The monitoring data is collected from Docker by cAdvisor which acts as an abstraction layer for particular container
runtime engines.
The CDF includes Heapster for cluster-wide monitoring. Heapster talks directly to Kubernetes metrics endpoints
where the raw data is provided by the Kubelet.
The Kubelet gets this information from cAdvisor.
The Kubelet information is then aggregated by Heapster and visualized via the CDF Management Portal.
See https://kubernetes.io/docs/concepts/cluster-administration/resource-usage-monitoring/ ;
https://github.com/kubernetes/heapster
Heapster
Inside the CDF
Heapster only runs on the cluster master node(s). The Pod specification is located under <CDF
root>/objectdefs/heapster.yaml.
The visualization of the current resource usage for CPU and RAM is integrated into the CDF Management Portal.
Historical information is available through the visualization layer for 15 mins.
Heapster is not configured to dump the metrics to a database. HPE has not certified connecting Heapster to an
external database for storage (such as InfluxDB) and additional visualization and analysis (such as Grafana).
Additional CDF manageability
HPE has not certified deploying and connecting any other monitoring tools to the Kubernetes metric endpoints.
Licensing
Single License File
The Container Deployment Foundation delivers a unique Licensing service that allows a Suite to deploy a single
license file, dynamically injecting all the software license keys into each individual product. This greatly simplifies the
license management process, which is traditionally very complex and cumbersome when deploying an ITOM suite as
a software bundle.
Redeem License
Offline
Based on their entitlements, customers can visit the HPE software entitlement site –
https://h20576.www2.hp.com/mysoftware/index – to redeem and download the exact Suite-level license. Similar to
other software licenses, we offer both production and non-production licenses, allowing greater flexibility for
evaluation and staging environments.
By visiting the Suite Container Management Console, customers can then plug in their downloaded license file and
proceed.
Online
If the customer has direct or proxy-based internet access from the Suite deployment, the license can be automatically
redeemed and deployed by entering the Suite activation code received as part of the HPE order processing welcome
emails. This will dynamically redeem, download and deploy the associated license key.
License Management
The Suite Console has a dedicated licensing section that allows customers to add/remove/redeem licenses once the
suite has been deployed. Customers can manage additional entitlements based on additional purchases.
For more information on License Management, review the Container Deployment Foundation Administration Guide.
See the Documentation section in this white paper for complete documentation availability.
Supportability
Logs
Locations, commands
Log files are created in various places throughout the lifecycle of installation and running of CDF and Suite(s).
In this release there is:
- Node level logging for the CDF system and Kubernetes layers
- Cluster level logging for some but not all CDF core services
- Cluster level logging for most but not all Suite services
- No aggregation for CDF node level logs
- No log rotation
Because of the various layers that make up the CDF, the logs are in a number of locations and/or require command
execution to obtain.
Essentially all container logs are available under <CDF root>/data/docker and <CDF root>/data/docker-bootstrap. It
can be difficult to find the right log below these locations.
The table below describes more details about the CDF processes and components:
- Location <CDF root>/data/docker/containers: all container logs for the workload Docker instance.
- Location <CDF root>/data/docker-bootstrap/containers: all container logs for the bootstrap Docker instance.
- Location Linux syslog /var/log/messages / dmesg: the kernel ring buffer contains some messages related to Docker.
- Command docker logs <container ID>: stdout from inside a particular container. Same as <CDF root>/data/docker/containers but obtainable via command.
- Command journalctl -u <unit name>: Docker daemon logs.
- Location <CDF root>: installation logs.
- Location <CDF root>/logs/: component logs for core Kubernetes components.
- Location External NFS service: CDF core and Suite logs.
The following table lists the locations and the logging information for the individual processes and/or CDF
components:
Process, CDF or Suite container – location of log files or command to run to obtain logs:
- CDF installation: during installation under /tmp/; after installation under <CDF root>/install-YYYYMMDDHHMMSS.log
- Bootstrap Docker instance: journalctl -u docker-bootstrap.service
- Workload Docker instance: journalctl -u docker.service
- Flannel: docker -H unix:///var/run/docker-bootstrap.sock logs `docker -H unix:///var/run/docker-bootstrap.sock ps | grep flannel | cut -d " " -f 1`
- Vault: docker -H unix:///var/run/docker-bootstrap.sock logs `docker -H unix:///var/run/docker-bootstrap.sock ps | grep vault | cut -d " " -f 1`
- ETCD: command: docker -H unix:///var/run/docker-bootstrap.sock logs `docker -H unix:///var/run/docker-bootstrap.sock ps | grep etcd | cut -d " " -f 1` ; location: <CDF root>/log/etcd
- Keepalived: docker logs `docker ps | grep keepalived | cut -d " " -f 1`
- Kubernetes API server Nginx load balancer: kubectl logs <name of API server Nginx LB pod>
- Kubernetes API server: location: <CDF root>/log/apiserver ; command: kubectl logs <name of Kubernetes API server pod> -n core
- Kubernetes controller: location: <CDF root>/log/controller ; command: kubectl logs <name of Kubernetes controller pod> -n core
- Kubernetes DNS: kubectl logs <name of Kubernetes DNS pod> -n core
- Kubernetes Proxy: kubectl logs <name of Kubernetes Proxy pod> -n core
- Kubernetes registry proxy: kubectl logs <name of Kubernetes registry proxy pod> -n core
- Kubernetes local registry: kubectl logs <name of Kubernetes local registry> -n core
- Kubernetes-vault: kubectl logs <name of Kubernetes-vault pod> -n core
- Kubernetes scheduler: <CDF root>/log/scheduler
- Kubernetes Kubelet: journalctl -u kubelet -f
- Heapster: kubectl logs <name of heapster pod> -n core
- CDF Autopass LMS and database: /var/vols/itom/core/baseinfra-1.0/autopass/logs/
- CDF IDM and database: kubectl logs <name of IDM pod> -n core ; kubectl logs <name of IDM database pod> -n core
- CDF Management Portal: kubectl logs <name of Management portal pod> -n core
- CDF Suite Installer and database: kubectl logs <name of Suite Installer pod> -n core ; kubectl logs <name of Suite Installer database pod> -n core
- CDF Nginx (Ingress): kubectl logs <name of CDF Nginx pod>
- Suite images download: /tmp/downloadsuiteimages-YYYYMMDDHHMMSS.log
- Suite images upload: /tmp/uploadsuiteimages-YYYYMMDDHHMMSS.log
- Suite container: refer to the Suite installation/administration guide.
Container logs in pods
Pods can contain multiple containers. For kubectl commands, add -c <name of container> to get individual container
logs.
To see which containers are inside a Pod, run kubectl describe pod <name of pod>, then look for the Init Containers:
and Containers: sections to see which containers make up the pod. Then use the container name as the value for the
-c option on kubectl logs. See the container names in the example below.
# kubectl describe pod mng-portal-2981853324-x8f29 -n core
…
Init Containers:
  install:
…
Containers:
  mng-portal:
…
  kubernetes-vault-renew:
…
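For example, based on the pod above, the logs of the main container can then be retrieved with:
kubectl logs mng-portal-2981853324-x8f29 -c mng-portal -n core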
CDF core
The logs can be found on the external volume. For NFS, this is by default /var/vols/itom/core/*/*.log.
Logs contained in this location are CDF core services application logs such as Tomcat server and other logs.
Suite
The logs can be found on the external volume. Assuming a base directory of /var/vols/suite, the path would be
/var/vols/suite/*/*.log.
Logs contained in this location are application container logs such as Tomcat server and other logs
Configuring log rotation
DOCKER_OPTS in the <CDF root>/cfg/docker files includes these options:
--log-opt max-size=10m --log-opt max-file=5
- max-size=10m: every log file will grow until it reaches 10 MB. After that the log information will roll over into a
subsequent file.
- max-file=5: there will be a maximum of 5 rollover log files. When the 5th file reaches the max-size size, the rollover
will start overwriting the oldest log file.
If a customer wants to change these values, then after changing them, trigger a daemon configuration reload:
systemctl daemon-reload
Monitoring
See chapter “Monitoring”
Support tool
support-dump tool
The CDF includes a support tool that gathers the data discussed in the “Logs” chapter and adds additional data.
The tool is located in <CDF root>/tools/support-tool and is called support-dump.
Tool output
The compressed file that the support tool generates is encrypted and password-protected with a chosen password:
<CDF root>/tools/support-tool/dmp/support_data_YYYYMMDD-HHMMSS.des3
It can be made readable again using the following command:
dd if=<root file name>.des3 | openssl des3 -d -k <your_password> | tar zxf -
A log file is generated below <CDF root>/tools/support-tool/dmp.
Refer to the Suite installation/administration guide for detailed information.
Collected data and intended use
The data is intended to be handed to HPE support.
The data is a mix of output from OS commands, Docker commands and log files.
Its console output is also generated in report format.
The following data is generated:
- CDF version information
- Docker version information
- Kubernetes version information
- CDF components version information
- Bootstrap Docker running containers (ID, image, status, node)
- Workload Docker running containers (ID, image, status, node)
- Cluster nodes
- Kubernetes pods (namespace, name, status, uptime, IP, node)
- Containers by pod (namespace, pod, node, image, container, container ID)
- All suite deployments (including deleted ones) (suite, version, namespace, deployment status, install data,
NFS info)
- Capability information of the deployed suite
- Docker inspect output for all containers
- Kubernetes cluster-info dump output
- Kubernetes describe output for pods
- Dump of the suite installer database
- Dump of the suite deployment metadata
- Output of various OS commands listed in <CDF root>/tools/support-tool/conf/supportdump.config.
FAQ and known issues
Refer to the Suite Troubleshooting Guide for the current FAQ. Known issues are listed in the release notes.
Migration
Note: Migration is defined in the context of this document as follows: a customer wants to migrate from a classic
ITOM product-set or classic ITOM Suite to the Next-Gen ITOM Suite.
This is not supported. Refer to the ITOM Suite documentation for more information.
Internationalization & Localization
The Container Deployment Foundation, through its console, is offered in the following localizations:
• English
• French
• German
• Spanish
Based on the user’s browser preference, the CDF management portal will pick up the right locale to display. The
Console Web application, including each Suite Installation Wizard, has been localized as well.
Federation
Federation is not supported. A Suite installation is bound to a single Kubernetes cluster. A Suite installation cannot be
spread across multiple Kubernetes clusters.
Documentation
The following Container Deployment Foundation documentation is now available online on our software documentation portal at
http://docs.software.hpe.com/
Partners and HPE employees can download directly from https://irock.jiveon.com/docs/DOC-141235
• Support Matrix
• Release Notes
• Quick Start Guide
• Troubleshooting Guide
• Installation Guide
• Administration Guide
• Open Source Guide
• AWS Deployment Guide
• Enterprise Readiness Whitepaper
Send documentation feedback
If you have comments about this document, you can send them to [email protected].
Legal notices
Warranty
The only warranties for Hewlett Packard Enterprise products and services are set forth in the express warranty
statements accompanying such products and services. Nothing herein should be construed as constituting an
additional warranty. Hewlett Packard Enterprise shall not be liable for technical or editorial errors or omissions
contained herein. The information contained herein is subject to change without notice.
Restricted rights legend
Confidential computer software. Valid license from Hewlett Packard Enterprise required for possession, use or
copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software
Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s
standard commercial license.
Copyright notice
© Copyright 2015 Hewlett Packard Enterprise Development Company, L.P.
Trademark notices
Adobe® is a trademark of Adobe Systems Incorporated.
Microsoft® and Windows® are U.S. registered trademarks of Microsoft Corporation.
Oracle and Java are registered trademarks of Oracle and/or its affiliates.
UNIX® is a registered trademark of The Open Group.
RED HAT READY™ Logo and RED HAT CERTIFIED PARTNER™ Logo are trademarks of Red Hat, Inc.
The OpenStack word mark and the Square O Design, together or apart, are trademarks or registered trademarks of
OpenStack Foundation in the United States and other countries, and are used with the OpenStack Foundation’s
permission.
Documentation updates
The title page of this document contains the following identifying information:
• Software Version number, which indicates the software version.
• Document Release Date, which changes each time the document is updated.
• Software Release Date, which indicates the release date of this version of the software.
To check for recent updates or to verify that you are using the most recent edition of a document, go to the following
URL and sign-in or register: https://softwaresupport.HPE.com.
Select Manuals from the Dashboard menu to view all available documentation. Use the search and filter functions to
find documentation, whitepapers, and other information sources.
You will also receive updated or new editions if you subscribe to the appropriate product support service. Contact
your Hewlett Packard Enterprise sales representative for details.
Support
Visit the Hewlett Packard Enterprise Software Support Online web site at https://softwaresupport.HPE.com.