Using OpenStack and Puppet to deliver IaaS at CERN
Ben Jones
NEC'2013 3
Agile Infrastructure
• Why change the operating model?
  • Twice the compute, same staff levels
  • New DC at Wigner, Budapest
  • “We’re not special”
  • Existence of open source tool chain: OpenStack, Puppet, Foreman, Kibana
  • “Coffee time” provisioning of cloud servers
12/9/2013
New Data Centre
• Geneva data centre at the limit of its electrical capacity (3.5 MW)
• New centre chosen in Budapest, Hungary
• Additional 2.7 MW of usable power
• Local on-site support for hardware maintenance and installations
What is Cloud?
• Technology model
  • virtualization of compute, network, storage
• Operational model
  • run your services in a certain way
• Consumption model
  • “don’t make me talk to IT”
  • delivered instantly* over the wire, variable price
What is IaaS?
Private Cloud Software
• We use OpenStack, an open source cloud project: http://openstack.org
• ATLAS and CMS High Level Trigger clouds
• HEP clouds at BNL, IN2P3, NeCTAR, FutureGrid, …
• Clouds at HP, IBM, Rackspace, eBay, PayPal, Yahoo!, Comcast, Bloomberg, Fidelity, NSA, CloudWatt, Numergy, Intel, Cisco …
OpenStack
Open Source
• Apache 2.0 licensed
• No “enterprise” version
Open Design
• Open design summit
• Anyone is able to define core architecture
Open Development
• GitHub
• Launchpad
Open Community
• OpenStack Foundation in 2012
• Now 190+ companies, 3000+ developers, 11000+ members
[Diagram: OpenStack components (Horizon, Keystone, Nova with Compute, Network and Scheduler, Glance, Cinder with a block storage provider) alongside CERN systems (Microsoft Active Directory, CERN DB on Demand, CERN Network Database, account management system)]
Nova
• Cloud computing fabric controller
• Network manager modified for CERN
  • integration with network database
  • specific to our use case, not pushed upstream
• Nova Compute aware of CERN DNS & AD
• Multiple availability zones
  • special zone for Hyper-V
  • scheduler has filter based on image distribution metadata
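The image-aware scheduling described above can be sketched as a host filter. This is a minimal, self-contained illustration and not CERN's actual filter. The metadata key `os_distro`, the distro-to-hypervisor mapping, and the `Host`/`Image` classes are hypothetical. The function name is only loosely modelled on the `host_passes` hook that Nova host filters implement.

```python
from dataclasses import dataclass, field

# Hypothetical policy: Windows images may only land on Hyper-V hosts,
# every other distribution runs on KVM.
DISTRO_TO_HYPERVISORS = {"windows": {"hyperv"}}
DEFAULT_HYPERVISORS = {"kvm"}


@dataclass
class Host:
    name: str
    hypervisor_type: str  # e.g. "kvm" or "hyperv"


@dataclass
class Image:
    name: str
    properties: dict = field(default_factory=dict)  # image metadata


def host_passes(host: Host, image: Image) -> bool:
    """True if the host's hypervisor may run the image's distribution."""
    distro = image.properties.get("os_distro", "").lower()
    allowed = DISTRO_TO_HYPERVISORS.get(distro, DEFAULT_HYPERVISORS)
    return host.hypervisor_type in allowed


def filter_hosts(hosts: list, image: Image) -> list:
    """Keep only the candidate hosts that pass the filter, in order."""
    return [h for h in hosts if host_passes(h, image)]


if __name__ == "__main__":
    hosts = [Host("kvm-01", "kvm"), Host("hv-01", "hyperv"), Host("kvm-02", "kvm")]
    windows = Image("win2012", {"os_distro": "windows"})
    linux = Image("slc6", {"os_distro": "slc"})
    print([h.name for h in filter_hosts(hosts, windows)])  # ['hv-01']
    print([h.name for h in filter_hosts(hosts, linux)])    # ['kvm-01', 'kvm-02']
```

An image with no distribution metadata falls back to the default KVM hosts.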
Glance
• Services for discovering, registering and retrieving VM images
• Aim for automated image creation / update
  • common process for Linux & Windows images
  • common tools: Aeolus Oz
  • CERN tools to hook up Oz & the Glance API
• Images for all CERN-supported OSes
  • user-defined images supported
• Initial contextualization via cloud-init
  • Cloudbase contributed cloud-init for Windows
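For the contextualization step, a cloud-init user-data document might be generated as below. This is a hedged sketch: the hostname, package and command are placeholders, and `make_user_data` is a hypothetical helper rather than one of the CERN tools mentioned above. `hostname`, `packages` and `runcmd` are standard cloud-config keys.

```python
import json


def make_user_data(hostname: str, packages: list, commands: list) -> str:
    """Build a cloud-init '#cloud-config' user-data document.

    JSON is emitted because it is also valid YAML, so cloud-init can parse it
    without this script needing a YAML library.
    """
    config = {
        "hostname": hostname,                   # instance hostname
        "packages": list(packages),             # packages installed on first boot
        "runcmd": [list(c) for c in commands],  # commands run once at first boot
    }
    # The '#cloud-config' header must be the first line of the user data.
    return "#cloud-config\n" + json.dumps(config, indent=2)


if __name__ == "__main__":
    print(make_user_data("vm0042", ["httpd"], [["service", "httpd", "start"]]))
```

The resulting text would be supplied as instance user data when the VM is booted. Each `runcmd` entry is given as a list, so cloud-init executes it without a shell.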
Keystone
• Identity service: authentication, authorization and service catalog
• Full integration with Active Directory via LDAP
  • CERN’s AD: 44K users & 29K groups
  • Minimal changes to AD
  • CERN submitting changes upstream
• Account management system integration for project creation / deletion
• SSL for everything
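The account management hook for project creation and deletion can be reduced to a reconciliation between two inventories. This is a minimal sketch. It assumes the account management system and Keystone can each be listed as a set of project names. `plan_project_sync`, the protected-project list and the example names are hypothetical, and the actual create/delete calls against Keystone are left out.

```python
# Projects that must never be removed automatically (hypothetical list).
PROTECTED = {"admin", "service"}


def plan_project_sync(account_projects, keystone_projects):
    """Compare both inventories and return (to_create, to_delete), sorted."""
    wanted = set(account_projects)
    existing = set(keystone_projects)
    to_create = sorted(wanted - existing)
    # Delete only projects the account system no longer knows about,
    # and never touch protected ones.
    to_delete = sorted(existing - wanted - PROTECTED)
    return to_create, to_delete


if __name__ == "__main__":
    create, delete = plan_project_sync(
        ["atlas-analysis", "cms-hlt", "it-batch"],
        ["admin", "cms-hlt", "old-project"],
    )
    print(create)  # ['atlas-analysis', 'it-batch']
    print(delete)  # ['old-project']
```

Computing a plan first and applying it separately makes the sync easy to review or dry-run before anything is deleted.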
Operational practices evolving
• Security incidents
  • old: reinstall; new: replace with a new VM
• Misconfiguration requiring reboot
• Resizing a service
  • lxplus.cern.ch: add VMs to serve demand
  • resize VMs (or rather, replace them with bigger ones)
• In future, resize services automatically
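Resizing by replacement, as listed above, amounts to picking a bigger flavor and booting a new VM from it rather than modifying the old one. This is a minimal sketch of the flavor choice only. The catalogue and `replacement_flavor` are hypothetical, and booting the replacement and moving the service alias are not shown.

```python
from typing import NamedTuple, Optional


class Flavor(NamedTuple):
    name: str
    vcpus: int
    ram_mb: int


# Hypothetical flavor catalogue, ordered from smallest to largest.
FLAVORS = [
    Flavor("small", 1, 2048),
    Flavor("medium", 2, 4096),
    Flavor("large", 4, 8192),
    Flavor("xlarge", 8, 16384),
]


def replacement_flavor(vcpus: int, ram_mb: int) -> Optional[Flavor]:
    """Return the smallest flavor meeting the new requirements, or None."""
    for flavor in FLAVORS:
        if flavor.vcpus >= vcpus and flavor.ram_mb >= ram_mb:
            return flavor
    return None  # nothing big enough: the service needs to scale out instead


if __name__ == "__main__":
    print(replacement_flavor(3, 4096))  # Flavor(name='large', vcpus=4, ram_mb=8192)
```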
Service Models
• Pets are given names like pussinboots.cern.ch
  • They are unique, lovingly hand raised and cared for
  • When they get ill, you nurse them back to health
• Cattle are given numbers like vm0042.cern.ch
  • They are almost identical to other cattle
  • When they get ill, you get another one
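The cattle model can be sketched in a few lines. VMs get sequential names in the style of vm0042.cern.ch, and a sick one is discarded and replaced rather than repaired. The `Herd` class is hypothetical and only tracks names; actually creating and deleting VMs through Nova is omitted.

```python
import itertools


class Herd:
    """Interchangeable VMs named by number; sick ones are replaced, not repaired."""

    def __init__(self, domain: str = "cern.ch", size: int = 0):
        self.domain = domain
        self._numbers = itertools.count(1)  # numbers are never reused
        self.members: list = []
        for _ in range(size):
            self.add()

    def add(self) -> str:
        """Provision a new member with the next free number."""
        name = f"vm{next(self._numbers):04d}.{self.domain}"
        self.members.append(name)
        return name

    def replace_sick(self, name: str) -> str:
        """Drop the sick VM and provision a fresh one in its place."""
        self.members.remove(name)
        return self.add()


if __name__ == "__main__":
    herd = Herd(size=3)
    print(herd.replace_sick("vm0002.cern.ch"))  # vm0004.cern.ch
    print(herd.members)  # ['vm0001.cern.ch', 'vm0003.cern.ch', 'vm0004.cern.ch']
```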
Some other use cases…
• Hippos are cattle with block storage, useful where there is redundancy, e.g. MongoDB, Cassandra.
• Canaries are cattle placed at high risk to give early warning of failures. Fail fast and fix.
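The canary idea translates into a staged rollout. A change goes to a small set of canary VMs first, and the rest of the herd is only touched if the canaries stay healthy. This is a minimal sketch. `staged_rollout` and its callbacks are hypothetical stand-ins for whatever deployment and health-check mechanism is actually used.

```python
from typing import Callable, List, Tuple


def staged_rollout(
    hosts: List[str],
    canary_count: int,
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[bool, List[str]]:
    """Apply a change to a few canaries first and stop if any of them fails.

    Returns (True, []) when the whole herd was updated, or
    (False, failed_canaries) when the rollout was halted early.
    """
    canaries, rest = hosts[:canary_count], hosts[canary_count:]
    for host in canaries:
        apply_change(host)
    failed = [h for h in canaries if not is_healthy(h)]
    if failed:
        return False, failed  # fail fast: the rest of the herd is untouched
    for host in rest:
        apply_change(host)
    return True, []
```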
Heat
• Heat orchestrates composite cloud apps (stacks)
• HA (restarts resources) & “auto-scaling”
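Heat's auto-scaling couples a group of identical resources to alarms and scaling policies. The decision such a policy takes can be sketched as a simple threshold rule. The thresholds, step and size limits below are hypothetical. In Heat they would be declared in the stack template rather than coded by hand.

```python
def desired_capacity(
    current: int,
    avg_cpu_percent: float,
    *,
    scale_up_at: float = 80.0,
    scale_down_at: float = 20.0,
    step: int = 1,
    min_size: int = 1,
    max_size: int = 10,
) -> int:
    """Threshold-based scaling decision, clamped to the group's size limits."""
    if avg_cpu_percent > scale_up_at:
        target = current + step
    elif avg_cpu_percent < scale_down_at:
        target = current - step
    else:
        target = current
    return max(min_size, min(max_size, target))
```

Clamping to `min_size` and `max_size` keeps a noisy metric from shrinking the group to nothing or growing it without bound.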
Configuration Management
• Adopted Puppet
  • widely used, large community, scales
• Needed to make services in the CERN CC reproducible
• Simplify the configuration of OpenStack itself
  • community modules from Red Hat, Puppet Labs, users
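What makes Puppet-managed services reproducible is idempotent convergence. You describe the desired state, compute the difference from the actual state, and apply only that difference, so a second run changes nothing. Below is a toy model of that idea in Python, not Puppet's implementation; the resource names are hypothetical.

```python
def plan(desired: dict, actual: dict) -> list:
    """List (resource, current, wanted) changes needed to reach `desired`.

    Only resources named in `desired` are managed; others are left alone.
    """
    return [
        (resource, actual.get(resource), wanted)
        for resource, wanted in desired.items()
        if actual.get(resource) != wanted
    ]


def converge(desired: dict, actual: dict) -> dict:
    """Apply the planned changes and return the resulting state."""
    new_state = dict(actual)
    for resource, _current, wanted in plan(desired, actual):
        new_state[resource] = wanted
    return new_state


if __name__ == "__main__":
    desired = {
        "package:openstack-nova-compute": "installed",
        "service:openstack-nova-compute": "running",
    }
    actual = {
        "package:openstack-nova-compute": "installed",
        "service:openstack-nova-compute": "stopped",
    }
    print(plan(desired, actual))
    state = converge(desired, actual)
    print(plan(desired, state))  # [] : a second run changes nothing
```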
Accounting
• CERN computing is funded from CERN central budgets: no billing, but quotas
  • Experiments don’t have credit cards
• What to do when quota is exceeded?
• Unused capacity?
  • Low-SLA usage to plug the gaps?
• Fair share across the cloud?
  • Worked for supercomputers, but heavy for clouds at scale
• Bursting to public clouds?
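Because capacity is allocated by quota rather than billed, the basic admission check asks whether a request would push a project over its limits. This is a minimal sketch. The limits and `quota_excess` are hypothetical, and the resource names are only modelled on Nova's instance, core and RAM quotas. The open questions on the slide (what to do on excess, fair share, bursting) are not answered by it.

```python
# Hypothetical per-project limits, modelled on Nova's instances/cores/RAM quotas.
QUOTA = {"instances": 10, "cores": 40, "ram_mb": 81920}


def quota_excess(usage: dict, request: dict, quota: dict = QUOTA) -> dict:
    """Return {resource: amount over the limit} if the request were granted.

    An empty result means the request fits within the project's quota.
    """
    excess = {}
    for resource, limit in quota.items():
        needed = usage.get(resource, 0) + request.get(resource, 0)
        if needed > limit:
            excess[resource] = needed - limit
    return excess
```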
Ceilometer
• Accounting for OpenStack by project
• Collects statistics from each compute node
  • via the common OpenStack message bus
  • stored in a sharded MongoDB
  • 2 GB / day
• Hyper-V in Havana
• Cinder statistics upcoming
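Per-project accounting boils down to grouping the collected samples by project. Below is a simplified sketch over in-memory samples. The field names (`meter`, `project_id`, `volume`) and the `vcpu_hours` meter are simplifications, not Ceilometer's exact schema. In practice the aggregation would run as a query against the MongoDB store.

```python
from collections import defaultdict


def usage_by_project(samples: list, meter: str) -> dict:
    """Sum the volume of one meter per project, ignoring other meters."""
    totals = defaultdict(float)
    for sample in samples:
        if sample["meter"] == meter:
            totals[sample["project_id"]] += sample["volume"]
    return dict(totals)


if __name__ == "__main__":
    samples = [
        {"meter": "vcpu_hours", "project_id": "atlas", "volume": 4.0},
        {"meter": "vcpu_hours", "project_id": "cms", "volume": 2.5},
        {"meter": "vcpu_hours", "project_id": "atlas", "volume": 1.0},
        {"meter": "disk_gb", "project_id": "atlas", "volume": 100.0},
    ]
    print(usage_by_project(samples, "vcpu_hours"))  # {'atlas': 5.0, 'cms': 2.5}
```

At the quoted 2 GB/day, the store grows by about 730 GB per year.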
CERN Status
• CERN IT OpenStack Cloud
  • Folsom-based service: ~500 hypervisors on KVM and Hyper-V
  • New “Grizzly” production service opened late July: 280 hypervisors, 600 VMs, 50 projects and growing rapidly
  • High-availability components using load balancing, i.e. 3 Nova controllers per cell
  • OpenStack configuration entirely managed by Puppet
• LHC experiment farms
  • CMS currently running 1,300 hypervisors with 50,000 cores
  • ATLAS starting to ramp up to a similar size
• Other science grid sites moving to private cloud on OpenStack
  • Brookhaven, IN2P3, FutureGrid, NeCTAR, IHEP, …
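Load balancing across the three Nova controllers in a cell can be pictured as round-robin with health checks. This is a toy sketch of the distribution logic only, and the controller names are hypothetical. In a real deployment this job would typically be done by a load balancer in front of the controller APIs, not by client code.

```python
import itertools


class ControllerPool:
    """Round-robin over a cell's controllers, skipping those marked down."""

    def __init__(self, controllers):
        self.controllers = list(controllers)
        self.down = set()  # names of controllers that failed a health check
        self._cycle = itertools.cycle(self.controllers)

    def pick(self) -> str:
        # Try each controller at most once per call.
        for _ in range(len(self.controllers)):
            candidate = next(self._cycle)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no healthy controller in this cell")


if __name__ == "__main__":
    pool = ControllerPool(["ctl1", "ctl2", "ctl3"])
    pool.down.add("ctl2")
    print([pool.pick() for _ in range(4)])  # ['ctl1', 'ctl3', 'ctl1', 'ctl3']
```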
Outlook
• Track stable Grizzly releases in Red Hat RDO
  • Up to date, but not too close to the leading edge
• Scaling
  • Expect 15,000 hypervisors and 150,000 VMs by 2015
• Manageability
  • Metering, orchestration with Heat, bare metal
• Functionality
  • Load balancing, high-availability storage and pets
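The scaling targets are easier to compare with current figures as ratios. Below is a small worked calculation using only numbers from the status and outlook slides. The ratios are derived here and not stated in the talk, and the CMS figure counts cores rather than VMs.

```python
# Figures quoted on the CERN Status and Outlook slides.
grizzly_hypervisors, grizzly_vms = 280, 600
cms_hypervisors, cms_cores = 1_300, 50_000
target_hypervisors, target_vms = 15_000, 150_000

vms_per_hv_now = grizzly_vms / grizzly_hypervisors             # Grizzly service today
vms_per_hv_2015 = target_vms / target_hypervisors              # 2015 target
cores_per_cms_hv = cms_cores / cms_hypervisors                 # CMS farm density
hypervisor_growth = target_hypervisors / grizzly_hypervisors   # growth factor

print(f"VMs per hypervisor (Grizzly service): {vms_per_hv_now:.1f}")    # 2.1
print(f"VMs per hypervisor (2015 target):     {vms_per_hv_2015:.1f}")   # 10.0
print(f"Cores per CMS hypervisor:             {cores_per_cms_hv:.1f}")  # 38.5
print(f"Hypervisor growth vs Grizzly:         {hypervisor_growth:.1f}x")  # 53.6x
```

The 2015 target implies about 10 VMs per hypervisor, against roughly 2 on the Grizzly service. It also implies about 54 times the Grizzly service's hypervisor count.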
What have we learnt?
• Automate everything from the beginning
  • Puppet and Stackforge are a great help
  • Distributions and appliances make getting started much easier
• A constant rate of change requires a different approach
  • Focus on core technologies and keep up to date
  • Track new projects, but don’t adopt too early unless strategic
• Many of our users are cloud aware
  • Culture changes are needed for legacy application coding and IT services
• Communities are major motivators
  • But administrators need to engage and adapt rather than re-invent
Conclusions
• CERN IT is re-engineering to deliver additional capacity to 11,000 physicists within fixed resources
• Cloud models can simplify current large-scale computing infrastructure
• OpenStack and its ecosystem allow us to meet this challenge and to help others through open source
Questions?
Preproduction Service
[Diagram of tools and services: Bamboo; Koji, Mock; AIMS/PXE, Foreman; Yum repo, Pulp; PuppetDB; mcollective, yum; JIRA; Lemon / Hadoop / LogStash / Kibana; git; OpenStack Nova; Hardware database; Puppet; Active Directory / LDAP]
Training for Newcomers
Buy the book rather than guru mentoring
Job Opportunities