© Mach Technology Group Pty Ltd | ABN 58 115 162 | [email protected]
Paul Pettigrew, CEO - 27th April 2011
Offices & Data Centres: Brisbane | Noosa | Cooroy | USA
Integrated Managed Services
"Next Generation" of full-service ICT solution outsourcing
Design Summit / Conference, Santa Clara, California
Firstly… Thank You
• Wish to acknowledge the contributions of the community and the sponsoring organisations
• Hope that this presentation adds value to this conference, and helps this community understand how this fabulous and innovative software is being taken to market by Australia's leading next-generation technology solutions company
We believe OpenStack is on the cusp of being to Cloud what Linux was/is to the OS.
Scope of Presentation
• Current Context
  – About Mach Technology
  – Who is Paul Pettigrew? Why am I here?
  – Underlying Facilities Platform
  – How we define Cloud
  – Cloud Security Lessons
• Integrated Technology + Services Solution
  – Our "NG" Project's Business Requirements
  – Integrated Solution Architecture
  – Rollout Strategy
  – Our "to-do wish list" for OpenStack
• Conclusion & Questions
Current Context
About Mach Technology
• Founded in 2005 on the rising wave of Open Source, Linux and High Performance / Federated Grid Computing – to deliver outsourced solutions smarter, better and at lower cost than proprietary alternatives
• Lines of business and deep expertise in:
  – Consulting & Project Management
  – Mach owned & operated Data Centres, Hosting and Cloud (inc. IaaS/PaaS/SaaS)
  – Service Desk
  – Technical Services
  – Onsite Field Services
  – 24/7 automated deep monitoring and self-healing
  – Turnkey outsourced ICT Managed Services
Who & Why am I here?
• Loved technology and computing since writing my first program on a 1" thick pile of IBM punch cards in the early 80s
• F/A-18 Instructor & Fighter Pilot, also on the front line of introducing modern computing into the Royal Australian Air Force
• Presenting is Mach's way to make a contribution to the community and share our hard-learned knowledge
• To learn from the experts, take knowledge back to Australia and pass it on to others in our industry and to our clients
• Validate and improve our strategy
Underlying Facilities Platform & Data Centres
#1 Services: shared-nothing, federated HA, 4N
Since go-live in 2005, not a single second of downtime
How we define "Cloud"
• Elastic: resources are re-allocated, increased or decreased on demand (typically instantly)
• Multi-tenant: compute, storage and network resources are shared to deliver multiple platform services per hardware component, dynamically (re)allocated as required
• Thin Provisioning: compute, storage and network assets are provided via virtualised, location-independent, technology-agnostic platforms (no vendor lock-in) – i.e. using real hardware resources only when required
• Over-subscription: compute, storage and network assets are over-allocated but pro-actively capacity-managed so that physical provisioning occurs in time for actual demand

plus
• Automation: extreme levels of automation: provisioning, monitoring, self-healing, patching/updating and maintaining
• Open: portable sub-systems / no proprietary lock-in
• Billing: sophisticated presentation of data to enable "to the second" invoicing
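The over-subscription principle above can be sketched in a few lines of Python. This is an illustration of the concept only; the function name and thresholds are our own, not Mach's actual capacity-management tooling:

```python
# Illustrative sketch only: over-subscription means allocated capacity may
# exceed physical capacity, but a capacity-management check raises a flag in
# time to add hardware before real usage catches up with what was promised.

def oversubscription_headroom(physical_gb, allocated_gb, used_gb,
                              max_ratio=2.0, usage_alarm=0.8):
    """Return (allocation_ratio, needs_hardware) for one resource pool."""
    ratio = allocated_gb / physical_gb              # how far over-subscribed we are
    needs_hardware = (ratio > max_ratio             # promised too much, or
                      or used_gb / physical_gb > usage_alarm)  # real usage is close
    return ratio, needs_hardware

# A pool with 10 TB physical, 15 TB sold, but only 6 TB actually written:
ratio, alarm = oversubscription_headroom(10_000, 15_000, 6_000)
```

The thresholds (a 2:1 allocation ceiling and an 80% usage alarm) are arbitrary placeholders; in practice they would be tuned per resource class.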
Cloud Service Layers "X-as-a-Service"
[Diagram: layered cloud stack]
• IaaS: physical servers, storage and network; virtualisation layer; resource management; dynamic workload management; availability and security (backup, firewall, HA, monitoring); image libraries (operating system ISOs, custom templates, application catalogue)
• PaaS: platforms – directory, database, message, web, other
• SaaS: applications A–E
• Cross-cutting: service management (billing, metering, accounts, monitoring, etc.); integration API; developer API; user interface (administrator and end-user console)
Acknowledgement: based on a depiction produced by Cloud.com
Sample Customer Deployment Pattern
Cloud Security Lessons
• The current bandwagon/marketing hype is not helping
• Mach has government and large commercial clients acutely aware of the issues
  – Mach has solved these to date, without incident – but not in a fashion that meets our future automation and multi-tenant objectives
• We must leverage existing "education" in the market
  – E.g. "VLAN" is accepted, but there is no understanding of what "OpenFlow" is or whether it is "secure"
  – Public Key cryptography ("PKI") is also known and trusted
• Storage must be addressed through unified security and identity management
  – Easy option: encryption of files, using PKI
• We must be able to answer the question "is my data safe, secure and private?" in a single word: Yes!
• Goal: build upon the acceptance of the VLAN ID "tag" concept, and apply it across all IaaS aspects to create unified "security zones"
  – Networking: VLAN (and also OpenFlow)
  – Storage: encrypted files
  – Server: SSH / PKI-based access, already proven and trusted

Can "Cloud Security" be as simple as PKI+VLAN per "security zone"?
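The unified "security zone" idea can be sketched as a small data model (hypothetical names throughout; this is not a Mach or OpenStack API): one zone record drives the network VLAN tag, the storage encryption key and the server SSH access list together.

```python
# Hypothetical sketch of a unified "security zone": one identifier ties
# together the three IaaS aspects from the slide - networking (VLAN tag),
# storage (encryption key id) and server access (authorised SSH public keys).
from dataclasses import dataclass, field

@dataclass
class SecurityZone:
    vlan_id: int                      # L2 isolation tag (802.1Q)
    storage_key_id: str               # key used to encrypt this zone's files/images
    ssh_authorized_keys: list = field(default_factory=list)  # PKI server access

    def admits(self, other_vlan_id: int) -> bool:
        """Endpoints may talk at L2 only if they carry this zone's VLAN tag."""
        return other_vlan_id == self.vlan_id

zone_a = SecurityZone(vlan_id=101, storage_key_id="tenant-a-key")
zone_b = SecurityZone(vlan_id=102, storage_key_id="tenant-b-key")
```

The point of the sketch is that a customer question about safety can be answered by pointing at one object: everything the zone touches is tagged, encrypted or keyed from the same record.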
Integrated Technology + Services Solution
Elevator Pitches
• "Bill to the second, on demand"
• "Australia's first Next Generation, IPv6 XaaS Cloud Hosting Solutions"
• "Consulting, Services, XaaS, Onsite – Unified Platform"
• "Price per VM = others' price per website"
• "Single bill, single sign-on"
• "Metal + Virtualisation"
• "Quad Data Centres, redundant highly available deployment"
Business Requirements
• Vision: next generation of full ICT service outsourcing
• Principles:
  – 100% free/libre Open Source; commodity hardware
  – Mach value-adds through SI and staff service excellence
  – Flexible, to accommodate waves of innovation over the next 5 years
  – Extreme levels of automation and self-healing
  – No real technical skill required to enact subscriptions
  – Moderate technical skill required to provision and maintain the platform
  – Scale out "just-in-time" per sales/revenue
  – Multi-DC, federated architecture
  – IPv6
Our first line-up
[Diagram: layered architecture, with candidate technologies per layer]
• Unified Storage Layer: storage nodes (… + N) – OpenSolaris/ZFS; iSCSI, NFS & CIFS
• Management/Storage Network Layer – non-routed 10GE switching network
• Compute Layer: HW, KVM and other-VM compute nodes (… + N) – KVM, Xen, ESX, OpenVZ, Physical
• Applications Layer: SaaS (… + N), e.g. Plesk, DNS – Plesk, Alfresco, OpenERP, etc.
• Virtual Network Layer: VM FW appliances (… + N) plus HW FW appliance – pfSense, m0n0wall, IPCop, BGP, DNS
• Management Layer: provisioning & resource tracking; monitoring & alerting; billing; website & online purchasing; customer access; service management; bespoke/other – OpenQRM, Zabbix, OTRS::ITSM, 389 Directory Server, OpenERP, Drupal, Magento
Integrated Architecture
• The issue with the first line-up? Too much SI
• We have progressed with the aspects we knew would be right (Phase 1)
• Fundamental: integration of the Billing Engine, Identity Management, Ticketing and Provisioning
• Knew from experience that the IaaS solution must be simplified and integrated across Storage + Compute + Network
• Held off over the past year for technology to emerge… and then we discovered OpenStack…
OpenStack makes it simple
Use Cases
• Mach is not in the business of competing with massive-scale, constrained-product-parameter Cloud hosting
• Our value is leveraging emerging technologies in a thrifty + high-quality fashion, to deliver a total solution
• Must accommodate both "metal" and/or "virtual"
• The Security Lessons must be addressed
• Able to abstract to 5x Use Cases…
Scenario 1: Single VM
• Single VM connected to the internal network for storage access and the external network for WAN
• Should be able to talk L2 to the management system, storage nodes and WAN gateway, but nothing else (VM1 and VM2 cannot see each other)
• Should be the default for all new VMs
[Diagram: VM1 alone in an L2 zone, reaching the storage/management nodes via the internal network and the WAN gateway via the external network; VM2 sits outside the zone]
Scenario 2: Multiple VMs
• Multiple VMs connected to the internal network for storage access and the external network for WAN, all in the same security zone
• Like Scenario 1, but VMs in the same zone should be able to talk L2 to each other (VMs 1, 2 and 3 can communicate, but cannot talk to VM4)
• This is for customers with multiple VMs that need to communicate at L2 (they can already communicate at L3 via the WAN gateway)
[Diagram: VM1, VM2 and VM3 share one L2 zone with access to storage/management and the WAN gateway; VM4 is outside the zone]
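The L2 reachability rule behind Scenarios 1 and 2 can be stated as a tiny predicate (our illustration only; endpoint and zone names are invented): endpoints exchange L2 frames only when they share a security zone, while the storage/management segment and the WAN gateway are reachable from every zone.

```python
# Illustrative sketch of the L2 visibility rule from Scenarios 1 and 2.
SHARED = {"storage-mgmt", "wan-gateway"}   # reachable from every security zone

def can_talk_l2(a, b, zones):
    """zones maps an endpoint name to its security-zone id."""
    if a in SHARED or b in SHARED:
        return True                        # shared infrastructure is always visible
    return zones[a] == zones[b]            # same zone => same VLAN => L2 visible

zones = {"VM1": "zone-A", "VM2": "zone-A", "VM3": "zone-A", "VM4": "zone-B"}
```

Under this rule VM1–VM3 see each other, VM4 sees none of them, and every VM can still reach storage and the WAN gateway.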
Scenario 3: VSD
• Single or multiple VMs connected to the internal network for storage access and the external network for WAN, with a VSD in the same security zone
• All access to the VMs from the WAN is filtered (bridged or routed modes supported) by the VSD; VMs cannot talk to the WAN gateway directly
[Diagram: as Scenario 2, but a VSD inside the L2 zone sits between the VMs and the WAN gateway]
Scenario 4: Physical FW
• Single or multiple VMs connected to the internal network for storage access and the external network for WAN, with a physical FW (e.g. a Cisco ASA device) in the same security zone
• All access to the VMs from the WAN is filtered by the FW device; VMs cannot talk L2 to the WAN gateway directly
[Diagram: as Scenario 3, but with a physical firewall appliance filtering the zone's WAN access instead of a VSD]
Scenario 5: Dedicated Metal
• To support customer requests for non-virtualised, physical server OS deployment (i.e. Linux/Windows running on metal) – "dedicated metal machines" (DMM)
• We want them configured the same as all compute nodes, so they can easily be managed in the same way and converted DMM<->VM
  – Any solution that gets these servers onto the virtual switching fabric so we can control their L2 in the same way requires re-patching when a DMM becomes a VM host or vice versa
• The solution is to gateway the DMM L2 through a box that is on the virtual switching fabric – see next slide
Scenario 5: DMM Patching
• Dedicated metal machines are re-patched to gateway through a Linux GW that places them on the virtual switching fabric
• The physical switch is configured with every port in its own port-based VLAN, with the Linux GW port in every VLAN
• Supporting this feature will guarantee we can meet the specialist/dedicated requirements of our large customers
[Diagram: DMMs connect to a physical switch; the Linux GW bridges each port-based VLAN onto the virtual switch fabric]
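The port-based VLAN rule above can be expressed as a small plan generator (a sketch only; port names and VLAN numbering are invented, and this does not emit any real switch configuration): each DMM port gets a private VLAN, and the Linux GW port joins all of them.

```python
# Illustrative sketch of the DMM patching rule: every physical switch port
# carrying a dedicated metal machine sits in its own port-based VLAN, and the
# Linux GW port is a member of all of them, so all DMM traffic must transit
# the GW and can be controlled on the virtual switching fabric.
def dmm_vlan_plan(dmm_ports, gw_port, first_vlan=100):
    """Return {port: [vlan, ...]} membership for the physical switch."""
    plan = {}
    for i, port in enumerate(dmm_ports):
        plan[port] = [first_vlan + i]      # one private VLAN per DMM port
    plan[gw_port] = [first_vlan + n for n in range(len(dmm_ports))]  # GW in all
    return plan

plan = dmm_vlan_plan(["port-1", "port-2", "port-3"], gw_port="port-24")
```

Converting a DMM to a VM host then needs no re-patching: its traffic already arrives at the virtual fabric via the GW.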
What was left?
• Identity
  – The issue is that every subsystem has its own identity/auth solution
  – Critical that a centralised, multi-tenant, multi-subsystem platform exists
• Billing
  – Basic subscription billing for a rudimentary Cloud hosting product stack is easy
  – As an outsourcer, we must support a single invoice
    • Subscription + Fixed Price or T&M + Known & Ad Hoc
Identity Management
• Closely integrated with the Billing Engine is the idea of Identity Management
• Needs to map all of our Accounts, all of their Users, and all of the access control for which users can access which Subscriptions
• Billing can use this information for generation of XaaS billing data
• Ticketing can use this information for linking Tickets to the associated Subscription procured
  – E.g. add a note "article" to a ticket for 15 mins worked on "Hosted Exchange" for Account "ABC"
• Provisioning can use this information for access control to systems/applications
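The Account → User → Subscription mapping the slide describes can be sketched as one structure (a hypothetical data model of ours, not Mach's schema) that billing, ticketing and provisioning could all consume:

```python
# Hypothetical data model: an Account owns Users and Subscriptions, and the
# access-control records say which Users may touch which Subscriptions.
account = {
    "name": "ABC",
    "users": ["alice", "bob"],
    "subscriptions": {
        "hosted-exchange": {"allowed_users": ["alice"]},
        "vm-small":        {"allowed_users": ["alice", "bob"]},
    },
}

def may_access(account, user, subscription):
    """Access control: the user must belong to the account AND be
    explicitly allowed on the specific subscription."""
    sub = account["subscriptions"].get(subscription)
    return (user in account["users"]
            and sub is not None
            and user in sub["allowed_users"])
```

The same lookup answers the slide's three uses: which subscription a ticket note bills against, whose usage a billing record belongs to, and who may be provisioned onto a system.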
Centralised Directory (389 HA)
• Clustered 389 Directory Server for centralised authentication, account and billing metadata
  – Multi-master replication of directory writes
  – Many active nodes for directory reads (authentications etc.)
  – Support multiple customers in the directory hierarchy
  – Easily add new attributes for billing and account/profile metadata to be used in other applications
  – Support SSL authentications over the Internet
  – Provide password and account metadata synchronisation for Active Directory
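A multi-customer directory hierarchy of the kind described might look like the following LDIF fragment. This is purely illustrative: the suffix, the per-customer subtree layout and the entry names are our assumptions, not Mach's actual directory tree.

```ldif
# Illustrative only: one subtree per customer under a common suffix, holding
# the users that centralised authentication and billing metadata hang off.
dn: o=customer-abc,dc=example,dc=com
objectClass: organization
o: customer-abc

dn: uid=alice,o=customer-abc,dc=example,dc=com
objectClass: inetOrgPerson
uid: alice
cn: Alice Example
sn: Example
```

Custom billing/profile attributes would be added via a schema extension on entries like the `uid=alice` one above, so other applications can read them from the same place they authenticate against.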
Billing Architecture
[Diagram: a Mediation Process feeds billing data from every subsystem into the Billing Engine (jBilling)]
• Sources integrated via a Mach API: PBAS, DNS, Plesk, VZ; OTRS (work orders, timesheets, tickets); SaaS/PaaS platforms; OpenStack (SaaS, Storage, VMs, Networking)
• Identity Management (389) supplies identity data to the Billing Engine
• Outputs: a single invoice; website shopping cart; bill presentment & payment; ticket presentment; in-house management tools – and happy customers
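A mediation step for the "bill to the second, on demand" pitch can be sketched as follows (our illustration, not jBilling's API): raw usage events are reduced to an abstract billing unit, and the price book is applied later inside the billing platform, so mediation needs no pricing knowledge.

```python
# Illustrative mediation sketch: reduce raw usage events to seconds of usage
# per subscription - a universal "billing unit". The billing platform applies
# the price book to these units; this layer knows nothing about prices.
def to_billing_units(events):
    """events: [(subscription_id, start_epoch_s, stop_epoch_s), ...]
    Returns {subscription_id: seconds_of_usage}."""
    units = {}
    for sub_id, start, stop in events:
        units[sub_id] = units.get(sub_id, 0) + max(0, stop - start)
    return units

events = [("vm-1", 1000, 4600),   # one hour of runtime
          ("vm-1", 5000, 5030),   # a 30-second burst, billed to the second
          ("vm-2", 1000, 1001)]   # even a single second is countable
usage = to_billing_units(events)
```

Keeping the unit abstract is what lets the same pipeline meter VMs, storage and SaaS subscriptions without the billing engine knowing their technical details.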
Rollout Plan
• Phase 1 – Alignment & Initial Steps
• Phase 2 – Trial Remaining Pieces
• Phase 3 – BETA Launch
• Phase 4 – "New Normal"
Phase 1 – Alignment / Initial Steps
• MediaWiki for all documentation, with smart URL linking from other systems (1,944 articles, 15,708 edits/updates)
• Zabbix for 24/7 monitoring and (initial) self-healing via a federated HA platform (578 hosts; 17,573 data items; 6,325 smart trigger calculations; 5,592 automated tests performed per minute)
• OTRS::ITSM for ITIL v3 Service Management (2,700 tickets per month)
• Bacula for backup & recovery independent of the cloud platform (42,689 backups, 0.53 TB delta per day across 546 volumes)
• 389 Directory Server cluster, deployed federated HA (~3,000 customers)
• BETA: KVM-based platforms for Linux/Windows/BSD/Solaris (15 hosts, 55 VMs)
Completed & working perfectly :-)
Phase 2 – Trial Remaining Pieces
• OpenStack
• IPv6
• Billing: jBilling
• Unified web portal platform (Drupal)
• OpenFlow & Open vSwitch
Photo: R&D rig, 4x compute + 2x storage
Phase 3 – BETA Launch
• Phase 1 + 2 aspects completed
• Across 2x DCs (only) to prove the distributed/federated solution
• Limited (spin-off brand) launch to prove it in the marketplace
Phase 4 – New Normal (no longer "Next Generation")
• New sales onto the new platform
• Migrate old services
  – Complete within 6-9 months
• Risks addressed in Phases 2/3 before the company goes "all in"
OpenStack Wishlist
• CAVEAT: very happy to be corrected if these points are already addressed
• Unified security zone across compute + storage + network, for each cloud/domain
  – Spans multiple DCs (low WAN comms)
  – Multiple clouds per account
  – Encryption of storage objects (files, VM disk images, etc.)
• Billing units and metering
  – The billing platform must not need to know detailed technical operation
  – Abstract to a universal "billing unit", with the price book applied in the billing platform
  – Bill "on demand", to the second
• Class of service – abstracted as a high-level concept, then given technical meaning and billing alignment, e.g.:
  – Single
  – Cold/offline DR failover
  – Hot/live HA failover
• Extreme automation, especially in management of the platform and the patching burden
  – An integrated and supported toolchain
Conclusion & Questions