PPTV runs CloudStack 3.0.2 in its production environment. There are currently more than 150 hosts, and applications are being migrated to the cloud every day (about 10 hosts per day). By the end of 2013, there will be more than 1,000 hosts in the CloudStack environment.
CloudStack Best Practices In PPTV
Dean Wei
About Me
OPS Architect at PPTV
• 3 years of experience in software development and design
• 6 years of experience as a technical consultant (infrastructure architecture design, integration, solutions, capacity planning and performance tuning) for top insurance companies (AIG, ASR, ACE, Fortis, SNS REAAL, Chubb, GEL, SBI)
• 1 year of experience in ASP (Application Service Provider) platform architecture design, security, performance analysis and optimization, and operations
• Current focus: automated operations architecture, cloud platform building, large-scale distributed system operations, performance analysis and optimization, continuous delivery, and system performance tuning
SINA WEIBO (DeanWei) : http://weibo.com/deanw
Agenda
Why Cloud?
What is CloudStack?
How to Build?
Overview
Why Use Cloud?
Why CloudStack?
What is CloudStack?
How to Build a Cloud-Based Infrastructure Platform?
CloudStack Best Practices In PPTV
Deployment Architecture
Network Considerations And Design
Storage Considerations And Design
Services Offering Considerations And Design
Troubleshooting Best Practices
Performance Tuning
Background And Challenge
The Original Infrastructure Provisioning Processes
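App Ops requests resources
IDC looks up the CMDB
IDC initializes the OS; IDC installs the VM software
IDC creates the VM
Monitoring team updates Zabbix monitoring
App Ops updates the CMDB
App Ops installs the application
App Ops installs the middleware
App Ops initializes the VM
Tools team adjusts the release configuration
Change control approval; migrate to the environment; re-cable and migrate to the production environment; application goes live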
Problems
A. Ties up a large number of people
B. A large number of manual steps
C. Servers are built one at a time
D. No self-service
E. Nothing works out of the box
F. Not elastic
G. Path dependence
H. Long build times
I. Many points of failure
Five Characteristics of Clouds
A. On-Demand Self-Service
B. Scalable
C. Resource Pooling
D. Rapid Elasticity
E. Measured Service
Cloud technology can solve our current problems!
Cloud-based Infrastructure Provisioning Processes
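App Ops requests an application environment
Ops accesses the Services UI
Ops picks the application's most recent snapshot template
Resources are allocated and registered automatically
Select available resources
(verify resource allocation) (select the application template and resource size)
Press "Launch"
(resource allocation, automatic VM creation, monitoring registration, etc.)
(available resources and when to use them)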
ERP CRM app
APP
App1
APP2
o Out of the box
o Parallel building
o Self Service
o One-button for All
o Elastic
Provisioned when needed
Cloud Still Requires Architectural Design
Cloud computing isn't a magic solution: apps need to be able to scale out
Design your architecture with the end in mind
Make your infrastructure easily replicable
Popular Cloud Software Platform
Why CloudStack?
Open Source: Apache 2.0
CloudStack users (it is proven and has a good track record)
It is very easy to install and get up and running
Fewer man-hours for implementation
Easy to integrate and customize
Matches our requirements at this stage
What is CloudStack?
Open source Infrastructure as a Service (IaaS) solution.
Programmable Data Center orchestrator
Hypervisor agnostic
Supports scalable storage (Ceph, Swift, NFS)
Supports complex enterprise networking (e.g. firewall, load balancer, VPN, VPC…)
Multi-tenant
Core Components
Hosts
o Servers onto which services will be provisioned
Primary Storage
o VM disk storage
Cluster
o A grouping of hosts and their associated storage
Pod
o Collection of clusters in the same failure boundary
Network
o Logical network associated with service offerings
Secondary Storage
o Template, snapshot and ISO storage
Zone
o Collection of pods, network offerings and secondary storage
Management Server Farm
o Management and provisioning tasks
[Diagram: a Zone containing CloudStack Pods; each Pod contains Clusters of Hosts running VMs, with a Network and Primary Storage; Secondary Storage sits alongside the Pods in the Zone]
Two Types of Storage
[Diagram: Pod 1 containing Cluster 1 (Host 1, Host 2) with Primary Storage, an L2 switch, an L3 switch and Secondary Storage]
Primary Storage
• Stores disk volumes for VMs in a cluster
• Configured at cluster level
• Close to hosts for better performance
• Each cluster has at least one primary storage
• Requires high IOPS (can be expensive)
Secondary Storage
• Stores all templates, ISOs and snapshots
• Configured at zone level
• A zone can have one or more secondary storages
• High-capacity, low-cost commodity storage
Deployment Architecture
[Diagram: Zone 1 containing Pod 1 … Pod N; each pod has Cluster 1 … Cluster N of hosts with Primary Storage behind an L2 switch; an L3 switch connects the pods to Secondary Storage, the Management Server Cluster and the Internet]
The hypervisor is the basic unit of scale.
A cluster consists of one or more hosts with the same hypervisor.
All hosts in a cluster have access to shared (primary) storage.
A pod is one or more clusters, usually with L2 switches.
An availability zone has one or more pods and has access to secondary storage.
One or more zones represent a cloud.
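The containment rules above (zone, pod, cluster, host) can be sketched as a small data model. This is only an illustration of the hierarchy described on the slide, not CloudStack's own data model; all class, field and object names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str


@dataclass
class Cluster:
    name: str
    hypervisor: str        # every host in a cluster runs the same hypervisor
    primary_storage: str   # storage reachable by all hosts in the cluster
    hosts: List[Host] = field(default_factory=list)


@dataclass
class Pod:
    name: str
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Zone:
    name: str
    secondary_storage: str  # zone-wide store for templates, ISOs, snapshots
    pods: List[Pod] = field(default_factory=list)

    def host_count(self) -> int:
        """Count hosts across every cluster of every pod in this zone."""
        return sum(len(c.hosts) for p in self.pods for c in p.clusters)


def build_example_zone() -> Zone:
    """Build one zone with one pod, one KVM cluster and two hosts."""
    cluster = Cluster("cluster-1", "KVM", "primary-1",
                      [Host("host-1"), Host("host-2")])
    return Zone("zone-1", "secondary-1", [Pod("pod-1", [cluster])])


if __name__ == "__main__":
    zone = build_example_zone()
    print(zone.name, zone.host_count())
```

Keeping the hypervisor type on the cluster rather than on the host mirrors the rule that a cluster is homogeneous.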
Software Architecture
[Diagram: Management Server software architecture]
Clients: UI, Cloud Portal, CLI, other clients
REST API: OAM&P API, End User API, EC2 API, Pluggable Service API Engine, other APIs
Security Adapters
Account Management Connectors
ACL & Authentication
- Accounts, Domains, and Projects
- ACL, limits checking
Services API: Console Proxy Management, Template Access, HA, Usage Calculations, Additional Services
Orchestration Engine
- Drives long-running VM operations
- Syncs between managed resources and DB
- Generates events
Management Server components: Resource Management, Cluster Management, Job Management, Deployment Planning, Database Access, Alert & Event Management
Plugin API: Network Gurus, Network Elements, Hypervisor Gurus
Resource API: Hypervisor Resources, Network Resources, Storage Resources, Image Resources, Snapshot Resources
Event Bus, Message Bus, Usage Server, DB
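Clients of the REST API authenticate by signing each request with an API key and a secret key. The sketch below follows the signing scheme as documented for the CloudStack API (sort the URL-encoded parameters, lower-case the string, HMAC-SHA1 it with the secret key, Base64-encode the result). The key values are placeholders, and Python's URL encoding may differ from the server's in edge cases, so treat this as an illustration rather than a reference client.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(params: dict, api_key: str, secret_key: str) -> str:
    """Return a signed query string for one API call."""
    all_params = dict(params, apiKey=api_key, response="json")
    # URL-encode every value, then sort the pairs by lower-cased field name.
    pairs = sorted(
        ((key, quote(str(value), safe="")) for key, value in all_params.items()),
        key=lambda pair: pair[0].lower(),
    )
    query = "&".join(f"{key}={value}" for key, value in pairs)
    # The signature covers the lower-cased, sorted query string.
    digest = hmac.new(
        secret_key.encode("utf-8"),
        query.lower().encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"{query}&signature={quote(signature, safe='')}"


if __name__ == "__main__":
    # "MY_API_KEY" and "MY_SECRET_KEY" are placeholders, not real credentials.
    print(sign_request({"command": "listZones"}, "MY_API_KEY", "MY_SECRET_KEY"))
```

The returned string would be appended to the management server's API URL; that URL depends on the deployment and is not shown here.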
Data And Control Flow
[Diagram: a Management Server controlling a cloud that spans Data Center 1, Data Center 2 and Data Center 3; each data center has an SSVM, a CPVM and a VR; the SSVMs transfer templates, ISOs and snapshots; the VRs connect to the Internet]
Management Servers control all resources, both virtual and physical
SSVMs (secondary storage VMs) are deployed to transfer data between zones
CPVMs (console proxy VMs) are deployed to transfer VNC console traffic
VRs (virtual routers) are deployed for traffic to the public Internet
The Management Server is never in the data path
How to Build a Cloud-Based Infrastructure Platform?
An infrastructure management platform consists of:
Provisioning
Configuration Management
Services Orchestration
Monitoring And Alert
How to build?
Architecture
A programmable infrastructure architecture
Open source toolchains
An infrastructure management platform consists of
Provisioning
Installation of operating systems and other software
Configuration Management
Sets the parameters for servers; can specify initial parameters
Services Orchestration
Automates tasks across systems
Monitoring And Alert
Records errors and the health of the infrastructure
Alert services
A Programmable Infrastructure Architecture
Open Source Provisioning Tools
Tool                                     Year Started  License  Installation Targets
Kickstart                                ?             GPL      Most .deb and RPM based Linux distros
Cobbler (plus koan for PXE boot of VMs)  2007          GPL      Red Hat, OpenSUSE, Fedora, Debian, Ubuntu
Spacewalk                                2008          GPL      Fedora, CentOS
Crowbar                                  2011          Apache   (Bare metal provisioning)
Open Source Configuration Management Tools
Tool      Year Started  Language  License  Client/Server
Cfengine  1993          C         Apache   Yes
Chef      2009          Ruby      Apache   Chef Solo: No; Chef Server: Yes
Puppet    2004          Ruby      GPL      Yes
Salt      2011          Python    Apache   Yes
Open Source Monitoring Tools
Tool             License  Type of Monitoring                             Collection Methods
Cacti / RRDTool  GPL      Performance                                    SNMP, syslog
Nagios           GPL      Availability                                   SNMP, TCP, ICMP, IPMI, syslog
Zabbix           GPL      Availability/Performance and more              SNMP, TCP/ICMP, IPMI, Synthetic Transactions
Zenoss           GPL      Availability, Performance, Event Management    SNMP, ICMP, SSH, syslog, WMI
Open Source Automation/Orchestration Tools
Tool                 Year Started  Language  License  Client/Server  Support Organization
Capistrano           2006          Ruby      MIT      Yes            None
ControlTier/RunDeck  2010          Java      Apache   Yes            DTO Solutions
Func                 2007          Python    GPL      Yes            Fedora Project
MCollective          2009          Ruby      Apache   Yes            PuppetLabs
Salt                 2011          Python    Apache   Yes            SaltStack Inc. ?
Provisioning Activity Flow And Open Source Tools
[Diagram: provisioning activities mapped to tools]
Bootstrapping (VM Image Launch, OS Install): Cobbler, CloudStack
Configuration (System Configuration): Puppet, Zabbix
Command and Control (Application Services Orchestration And Management): ControlTier, Services Portal
Automated Tool Chain in PPTV
Bootstrapped Image: Cobbler/CloudStack
Configuration: Puppet
Services Orchestration: ControlTier/Zabbix agent
Provision: Cobbler/CloudStack/Koan
Monitoring: Zabbix, Cacti
Generate Images: BoxGrinder
CMDB: CMDBuild/RackTables
CloudStack In PPTV
CS Version : 3.0.2
Hypervisor : KVM
Host OS : CentOS 6.2
KVM Guest OS : CentOS 5.8
Multiple management servers are deployed in the multi-line/BGP IDC
Deployed in all core IDCs and used for the non-VOD business
More than 150 hosts
Primary storage : local Storage
Secondary Storage : Local NFS Server and GlusterFS
Network : Basic Network
Monitoring : Zabbix
System configuration management : Puppet
Services Orchestration management : ControlTier/Services Portal
Patches for performance, integration and stability
Workarounds for some issues
Deployment Architecture
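[Diagram: one zone per IDC, managed from the management farm in the BGP/multi-line IDC]
BGP IDC: BGP Zone, BGP/Multi-line Management Farm (Management Server)
Guangzhou Telecom IDC: GZTB Zone
Shanghai Telecom IDC: SHTB Zone
Beijing Netcom IDC: BJCB Zone
Chengdu Telecom IDC: CDTB Zone
Shenyang Telecom IDC: SYCB Zone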
Management Server Deployment Architecture
[Diagram: User API and Admin API traffic passes through a Load Balancer to Management Server1 and Management Server2; the management servers use a MySQL database with replication to a slave, and manage the infrastructure resources of Zone1, Zone2 and Zone3]
Network Considerations And Design
Use Basic Network
Custom network offering for the basic network (DHCP only)
Disable iptables for performance (requires modifying the source code)
Disable security groups
Storage Considerations And Design
Primary Storage
Multi-zone design for primary storage performance
Use local storage
One cluster maps to one host
A local disk serves only one VM instance
Back up VM instances as templates on a schedule
Using shared storage type
Separate application data and log data onto the root volume and the data volume
Secondary Storage
Local NFS server
Back up data using inotify and rsync
Network card bonding
Uplink to 10G
Manual failover
GlusterFS over NFS
[Diagram: Pod 1, Cluster 1, Host 1 with Primary Storage; L2 switch; L3 switch; Secondary Storage]
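The inotify and rsync backup of secondary storage could look roughly like the sketch below. It only builds and runs the rsync command; the file-change watcher that would call `mirror` (for example an inotify-based tool) is not shown. The `--delete` flag (mirror deletions too) and the function names are assumptions for illustration, not settings taken from the slides.

```python
import subprocess
from typing import List


def rsync_command(source_dir: str, destination: str) -> List[str]:
    """Build an rsync command that mirrors source_dir's contents to destination."""
    # A trailing slash tells rsync to copy the directory's contents,
    # not the directory itself.
    source = source_dir.rstrip("/") + "/"
    # -a preserves permissions, times and symlinks; --delete removes files
    # on the destination that no longer exist on the source (an assumption).
    return ["rsync", "-a", "--delete", source, destination]


def mirror(source_dir: str, destination: str) -> None:
    """Run the mirror once; a watcher would call this after each change."""
    subprocess.run(rsync_command(source_dir, destination), check=True)
```

Because failover is manual, a one-way mirror like this is enough: the backup server is only promoted by an operator.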
Services Offering Considerations And Design
Disable HA
Each disk offering is bound to a specific disk
Each compute offering is bound to a specific host and disk
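One way such a binding might be expressed is through host tags and storage tags on the offerings. The slides do not say how PPTV implemented it, so the tag scheme, the naming convention and the function below are assumptions. The parameter names follow the CloudStack API as I understand it and should be checked against the API reference for the version in use.

```python
def offerings_for_host(host_name: str, disk_label: str, cpu_cores: int,
                       cpu_mhz: int, memory_mb: int, disk_gb: int) -> dict:
    """Build API parameters for one host/disk pair (hypothetical naming scheme)."""
    # One storage tag per physical disk, so an offering can target that disk.
    storage_tag = f"{host_name}-{disk_label}"
    compute = {
        "command": "createServiceOffering",
        "name": f"compute-{storage_tag}",
        "displaytext": f"Compute offering pinned to {host_name} / {disk_label}",
        "cpunumber": cpu_cores,
        "cpuspeed": cpu_mhz,
        "memory": memory_mb,
        "storagetype": "local",   # primary storage is local in this deployment
        "offerha": "false",       # HA is disabled per the slide
        "hosttags": host_name,    # assumes the host carries its own name as a tag
        "tags": storage_tag,      # assumes the disk's storage pool carries this tag
    }
    disk = {
        "command": "createDiskOffering",
        "name": f"disk-{storage_tag}",
        "displaytext": f"Disk offering pinned to {host_name} / {disk_label}",
        "disksize": disk_gb,
        "tags": storage_tag,
    }
    return {"compute": compute, "disk": disk}
```

For example, `offerings_for_host("kvm-host-01", "sdb", 4, 2000, 8192, 200)` yields two parameter dictionaries that share the storage tag `kvm-host-01-sdb`.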
Provisioning Processes Best Practices
A. Install the host OS with Cobbler
B. Install the CS agent and apply system settings with Puppet
C. Install and configure monitoring with Puppet
D. The services orchestration system triggers scripts to register the host with CS
E. The services orchestration system triggers a script to generate disk offerings and compute offerings for the host
F. The services orchestration system registers the host in the CMDB
G. The host goes live
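Steps A to G form an ordered pipeline in which each step depends on the previous one. A minimal sketch of such a runner, assuming each step can be wrapped as a function that reports success or failure; the step functions here are stand-ins and do not call Cobbler, Puppet, CloudStack or a CMDB.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[str], bool]]


def provision_host(host: str, steps: List[Step]) -> List[str]:
    """Run steps in order, stop at the first failure, return the completed step names."""
    completed = []
    for name, action in steps:
        if not action(host):
            break
        completed.append(name)
    return completed


def placeholder(host: str) -> bool:
    """Stand-in for a real integration; always reports success."""
    return True


# Step names mirror A to G above; every action is a placeholder.
STEPS: List[Step] = [
    ("install host OS", placeholder),
    ("install CS agent and system settings", placeholder),
    ("install and configure monitoring", placeholder),
    ("register host with CS", placeholder),
    ("generate disk and compute offerings", placeholder),
    ("register host in CMDB", placeholder),
    ("go live", placeholder),
]

if __name__ == "__main__":
    print(provision_host("kvm-host-01", STEPS))
```

Returning the list of completed steps makes it easy to see where a host stopped and to resume from there.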
Troubleshooting Best Practices
Analyse Log files
Management Log : /var/log/cloud/management/
Agent Log : /var/log/cloud/agent/
Adjust log4j level for debugging
Source Code
Data Models
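A quick first pass over the logs is to count lines per log level, which shows whether errors or warnings dominate before reading in detail. The sketch assumes log4j-style lines that contain a level token such as `ERROR` or `WARN`; the function names are hypothetical, and the directory is the management log path listed above.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Iterable

LEVEL_RE = re.compile(r"\b(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b")


def count_levels(lines: Iterable[str]) -> Counter:
    """Count lines by the first log level token found on each line."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_levels_in_dir(log_dir: str) -> Counter:
    """Aggregate level counts over every *.log file in a directory."""
    totals = Counter()
    for path in sorted(Path(log_dir).glob("*.log")):
        with path.open(errors="replace") as handle:
            totals.update(count_levels(handle))
    return totals


if __name__ == "__main__":
    print(count_levels_in_dir("/var/log/cloud/management/"))
```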
Performance Tuning
BIOS Settings for KVM Host
For Dell PowerEdge servers:
A. Set the Power Management Mode to Maximum Performance.
B. Set the CPU Power and Performance Management Mode to Maximum Performance.
C. Processor Settings: set Turbo Mode to enabled.
D. Processor Settings: set C States to disabled.
Performance Tuning (contd)
CS Tuning
NFS Server Tuning
Use NFSv4
noatime,nodiratime,noacl,data=writeback,commit=15
IDE/SATA parameters
NIC & TCP/IP
Use GlusterFS
Management Server Tuning
Increase Worker Process Number
Turn off stats collectors
Tuning Allocation Algorithm
Tuning Direct Agent Load Size
MySQL DB tuning
JVM Tuning
Heap Size Tuning
Use CMS GC Algorithm
Performance Tuning (contd)
KVM Tuning
CPU
Disable KSM in KVM Host
Disable tickless mode in KVM guest
PIN CPU in KVM host
Memory
THP in KVM Host
echo 'yes' > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
echo 'always'> /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo 'never'> /sys/kernel/mm/redhat_transparent_hugepage/defrag
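To confirm the settings took effect, the same sysfs files can be read back. The kernel typically marks the active choice with square brackets (for example `[always] never`), so the sketch below extracts the bracketed token. The function names are hypothetical; the base directory is the one used in the commands above.

```python
from pathlib import Path

THP_DIR = Path("/sys/kernel/mm/redhat_transparent_hugepage")


def selected_option(text: str) -> str:
    """Return the bracketed option, or the stripped text if nothing is bracketed."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return text.strip()


def current_thp_settings(base: Path = THP_DIR) -> dict:
    """Read the active THP settings; files that do not exist are skipped."""
    settings = {}
    for relative in ("enabled", "defrag", "khugepaged/defrag"):
        path = base / relative
        if path.exists():
            settings[relative] = selected_option(path.read_text())
    return settings


if __name__ == "__main__":
    print(current_thp_settings())
```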
Network performance issue in CentOS 6.2
Workaround: blacklist vhost-net by adding it to /etc/modprobe.d/blacklist-kvm.conf.
Linux kernel parameters tuning
TCP Buffer Tuning
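A common starting point for TCP buffer sizing is the bandwidth-delay product (bandwidth × round-trip time). The sketch below computes it and prints matching sysctl lines. The 10 Gbit/s figure echoes the 10G uplink mentioned earlier, while the 1 ms round-trip time and the minimum/default values in the output are illustrative assumptions, not values from the slides.

```python
from typing import List


def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: bits/s * seconds / 8."""
    # Integer arithmetic: (bits/s * ms) / 1000 ms-per-s / 8 bits-per-byte.
    return bandwidth_bits_per_s * rtt_ms // 8000


def sysctl_lines(max_bytes: int) -> List[str]:
    """Render sysctl settings that allow buffers up to max_bytes."""
    return [
        f"net.core.rmem_max = {max_bytes}",
        f"net.core.wmem_max = {max_bytes}",
        # min / default / max; the first two values are illustrative.
        f"net.ipv4.tcp_rmem = 4096 87380 {max_bytes}",
        f"net.ipv4.tcp_wmem = 4096 65536 {max_bytes}",
    ]


if __name__ == "__main__":
    size = bdp_bytes(10_000_000_000, 1)  # 10 Gbit/s link, assumed 1 ms RTT
    print("\n".join(sysctl_lines(size)))
```

With these inputs the product is 1,250,000 bytes; longer round-trip times between IDCs would raise it proportionally.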
Q&A