A DEEP LOOK INSIDE WINDOWS AZURE AND ITS VIRTUAL MACHINE
Wely Lau ([email protected])
Microsoft MVP, Windows Azure
Solutions Architect, NCS Pte Ltd
Blog: http://wely-lau.net
AGENDA
• Introduction (10 mins)
• Windows Azure Service Model (10 mins)
• Fabric Controller Internal (10 mins)
• Deploying a Service (15 mins)
• Service Allocation and Service Healing (10 mins)
• Inside Windows Azure Virtual Machine (15 mins)
• Q & A (5 mins)
WHAT IS A “CLOUD”?
• Cloud: on-demand, scalable, multi-tenant, self-service compute and storage resources
TYPES OF CLOUD
• Infrastructure as a Service (IaaS): basic compute and storage resources
  • On-demand servers
  • E.g. Amazon EC2, VMware vCloud, Rackspace
• Platform as a Service (PaaS): cloud application infrastructure
  • On-demand application-hosting environment
  • E.g. Google AppEngine, Salesforce.com, Windows Azure
• Software as a Service (SaaS): cloud applications
  • On-demand applications
  • E.g. Office 365, GMail, Microsoft Office Web Companions
CLOUD: EFFICIENCY VERSUS CONTROL
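[Figure: the stack layers Applications, Runtimes, Database, Operating System, Virtualization, Server, Storage, and Networking compared across Standalone Servers, IaaS, PaaS, and SaaS, with shading marking the layers that are managed for you. Windows Azure is marked on the chart. Axes: Efficiency versus Control + Cost.]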
WINDOWS AZURE
• Windows Azure is an OS for the data center
  • Model: treat the data center as a machine
  • Handles resource management, provisioning, and monitoring
  • Manages application lifecycle
  • Allows developers to concentrate on business logic
• Provides common building blocks for distributed applications
  • Reliable queuing, simple structured storage, SQL storage
  • Application services like access control and connectivity
WINDOWS AZURE PLATFORM BUILDING BLOCKS
• Service Bus
• Access Control
• Caching
• Data Sync
• Database
• Reporting
• Storage: Tables, Blobs, Queues
• Compute: Web Role, Worker Role, VM Role
• Virtual Network: Connect, Traffic Manager
• Fabric Controller
MULTI-TIER CLOUD APPLICATIONS
• A cloud application is typically made up of different components
  • Front end: e.g. load-balanced stateless web servers
  • Middle worker tier: e.g. order processing, encoding
  • Backend storage: e.g. SQL tables or files
  • Multiple instances of each for scalability and availability
[Figure: “My Cloud Application”, showing HTTP/HTTPS traffic, a Load Balancer, Front-End instances, Middle-Tier instances, and Windows Azure Storage / SQL Azure.]
THE WINDOWS AZURE SERVICE MODEL
• A Windows Azure application is called a “service”
  • Definition information (role name, role type, VM size, etc.)
  • Configuration information (number of instances, number of update domains, etc.)
  • At least one “role”
  • Your code
• Roles are like DLLs in the service “process”
  • Collection of code with an entry point that runs in its own virtual machine
• There are currently three role types:
  • Web Role: IIS 7 and ASP.NET in Windows Azure-supplied OS
  • Worker Role: arbitrary code in Windows Azure-supplied OS
  • VM Role: uploaded VHD with customer-supplied OS
[Figure: “My Service”]
• Role: Front-End
  • Definition: Type: Web; VM Size: Small; Endpoints: External-1
  • Configuration: Instances: 2; Update Domains: 2; Fault Domains: 2
• Role: Middle-Tier
  • Definition: Type: Worker; VM Size: Large; Endpoints: Internal-1
  • Configuration: Instances: 3; Update Domains: 2; Fault Domains: 2
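The split between definition and configuration in the “My Service” example can be sketched as plain data. This is only an illustration in Python: the class names, field names, and the `validate_service` helper are hypothetical and are not part of any Windows Azure SDK; the values come from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleDefinition:
    """Fixed at packaging time (what the slide calls definition information)."""
    name: str
    role_type: str  # "Web", "Worker", or "VM"
    vm_size: str
    endpoints: tuple


@dataclass(frozen=True)
class RoleConfiguration:
    """Adjustable per deployment (what the slide calls configuration information)."""
    instances: int
    update_domains: int
    fault_domains: int


@dataclass(frozen=True)
class Role:
    definition: RoleDefinition
    configuration: RoleConfiguration


def validate_service(roles):
    """Return a list of problems; an empty list means the model looks consistent."""
    problems = []
    if not roles:
        problems.append("a service needs at least one role")
    for role in roles:
        d, c = role.definition, role.configuration
        if d.role_type not in ("Web", "Worker", "VM"):
            problems.append(f"{d.name}: unknown role type {d.role_type!r}")
        if c.instances < 1:
            problems.append(f"{d.name}: needs at least one instance")
    return problems


# The two roles from the "My Service" figure.
MY_SERVICE = [
    Role(RoleDefinition("Front-End", "Web", "Small", ("External-1",)),
         RoleConfiguration(instances=2, update_domains=2, fault_domains=2)),
    Role(RoleDefinition("Middle-Tier", "Worker", "Large", ("Internal-1",)),
         RoleConfiguration(instances=3, update_domains=2, fault_domains=2)),
]
```

Keeping the two parts in separate types mirrors the idea that instance counts can change without touching the role's type, size, or endpoints.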
SERVICE MODEL FILES
• Service definition is in ServiceDefinition.csdef
• Service configuration is in ServiceConfiguration.cscfg
• The CSPack program zips the service binaries and definition into the service package file (service.cspkg)
AVAILABILITY: UPDATE DOMAINS
• Purpose: ensure the service stays up during service upgrades and Windows Azure OS updates
• The system considers update domains when upgrading a service
  • 1 / (number of update domains) = fraction of the service that will be offline
  • Default is 5 and max is 20, but you can override with the upgradeDomainCount service definition property
• The Windows Azure SLA is based on at least two update domains and two role instances in each role
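The arithmetic behind update domains can be sketched in a few lines of Python. The round-robin assignment below is an assumption made for illustration; the slides do not say how instances are mapped to domains, and both function names are hypothetical.

```python
def assign_update_domains(instance_names, update_domain_count=5):
    """Spread instances over update domains 1..N in round-robin order (assumed policy)."""
    if update_domain_count < 1:
        raise ValueError("need at least one update domain")
    domains = {d: [] for d in range(1, update_domain_count + 1)}
    for i, name in enumerate(instance_names):
        domains[i % update_domain_count + 1].append(name)
    return domains


def max_fraction_offline(instance_count, update_domain_count):
    """Largest share of a role's instances that is down while one domain is updated."""
    if instance_count < 1 or update_domain_count < 1:
        raise ValueError("counts must be positive")
    used_domains = min(instance_count, update_domain_count)
    largest_domain = -(-instance_count // used_domains)  # ceiling division
    return largest_domain / instance_count
```

With 10 instances and 5 update domains this gives 0.2, matching the 1/N rule. With 3 instances in 2 domains the fuller domain holds 2 of the 3 instances, so the rule is exact only when instances divide evenly across domains.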
[Figure: role instances Front-End-1, Front-End-2, Middle Tier-1, Middle Tier-2, and Middle Tier-3 distributed across Update Domains 1 to 3.]
AVAILABILITY: FAULT DOMAINS
• Purpose: avoid single points of failure
  • Similar concept to update domains
  • But you don’t control the updates
• Unit of failure based on data center topology
  • E.g. top-of-rack switch on a rack of machines
• Windows Azure considers fault domains when allocating service roles
  • 2 fault domains per service
  • Will try to spread roles out across more
  • E.g. don’t put all roles in same rack
[Figure: role instances Front-End-1, Front-End-2, Middle Tier-1, Middle Tier-2, and Middle Tier-3 distributed across Fault Domains 1 to 3.]
“SKETCH” OF DATACENTER ARCHITECTURE
[Figure: datacenter routers at the top; aggregation routers (Agg) and load balancers (LB) below them; racks, each with a top-of-rack (TOR) switch, a power distribution unit (PDU), and nodes.]
DATACENTER CLUSTERS
• Datacenters are divided into “clusters”
  • Approximately 1,000 rack-mounted servers (we call them “nodes”)
• Each cluster is managed by a Fabric Controller (FC)
• FC is responsible for:
  • Blade provisioning
  • Blade management
  • Service deployment and lifecycle
[Figure: Cluster 1, Cluster 2, ... Cluster n on the datacenter network, each with its own FC.]
INSIDE A CLUSTER
• FC is a distributed, stateful application running on nodes (servers) spread across fault domains
  • Top blades are reserved for FC
  • Installed by “Utility Fabric Controller”
  • One FC instance is the primary and all others keep their view of the world in sync
• Supports rolling upgrade, and services continue to run even if FC fails entirely
[Figure: FC instances FC1 to FC5 on nodes in separate racks, each rack with a TOR switch, connected through an aggregation router (AGG) and load balancers (LB).]
THE FABRIC CONTROLLER (FC)
• The “kernel” of the cloud operating system
  • Manages datacenter hardware
  • Manages Windows Azure services
• Four main responsibilities:
  • Datacenter resource allocation
  • Datacenter resource provisioning
  • Service lifecycle management
  • Service health management
• Inputs:
  • Description of the hardware and network resources it will control
  • Service model and binaries for cloud applications
[Figure: analogy between a server and a datacenter. Server: Windows Kernel running processes such as Word and SQL Server. Datacenter: Fabric Controller running services such as Exchange Online and SQL Azure. Labels include “DataCenter.xml”.]
INSIDE A NODE
[Figure: a physical node with a host partition running the FC host agent and several guest partitions, each running a guest agent and a role instance; a trust boundary is marked. Outside the node: the primary Fabric Controller, its replicas, and the image repository (OS VHDs, role ZIP files).]
FABRIC VIEWER
• Used by the Windows Azure operations team to view the fabric inside the datacenter
[Screenshot: Fabric Viewer showing clusters and racks in the US-North Central datacenter, with the RDFE Service and Portal Service labeled.]
DEPLOYING A SERVICE TO THE CLOUD: THE 10,000 FOOT VIEW
• Service package uploaded to portal
  • Windows Azure Portal Service passes service package to the “Red Dog Front End” (RDFE) Azure service
• RDFE converts service package to native “RD” version
• RDFE sends service to Fabric Controller (FC) based on target region
• FC stores image in repository and deploys and activates service
[Figure labels: FC, Service.]
SERVICE RESOURCE ALLOCATION
• Goal: allocate service components to available resources while satisfying all hard constraints
  • HW requirements: CPU, memory, storage, network
  • Fault domains
• Secondary goal: satisfy soft constraints
  • Prefer allocations which will simplify servicing the host OS/hypervisor
  • Optimize network proximity: pack nodes
• Service allocation produces the goal state for the resources assigned to the service components
  • Node and VM configuration (OS, hosting environment)
  • Images and configuration files to deploy
  • Processes to start
  • Assign and configure network resources such as LB and VIPs
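The interplay of hard and soft constraints can be illustrated with a small greedy placement in Python. This is a sketch under stated assumptions, not the Fabric Controller's actual algorithm: the `Node` type, the `allocate` function, and the use of CPU cores as the only hardware requirement are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    fault_domain: int
    free_cores: int
    placed: list = field(default_factory=list)


def allocate(role_name, count, cores_per_instance, nodes):
    """Place `count` instances of a role on nodes, one instance at a time.

    Hard constraint: a node must have enough free cores.
    Spreading: prefer the fault domain holding the fewest instances of this role.
    Soft constraint: among equal candidates, pack onto the node with the least
    free capacity, then break ties by node name.
    Note: nodes are mutated as instances are placed, even if a later one fails.
    """
    per_domain = {}
    placement = []
    for i in range(1, count + 1):
        candidates = [n for n in nodes if n.free_cores >= cores_per_instance]
        if not candidates:
            raise RuntimeError(f"no node can host {role_name}-{i}")
        best = min(
            candidates,
            key=lambda n: (per_domain.get(n.fault_domain, 0), n.free_cores, n.name),
        )
        best.free_cores -= cores_per_instance
        instance = f"{role_name}-{i}"
        best.placed.append(instance)
        per_domain[best.fault_domain] = per_domain.get(best.fault_domain, 0) + 1
        placement.append((instance, best.name))
    return placement
```

Placing three 4-core instances on nodes spread over three fault domains puts one instance in each domain, so the loss of a single rack takes out at most one instance of the role.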
SERVICE ALLOCATION EXAMPLE
[Figure: Role A (Count: 3, Update Domains: 3, Size: Large) and Role B (Count: 2, Update Domains: 2, Size: Medium) allocated across Fault Domains 1 to 3. A load balancer for www.mycloudapp.net is shown with the addresses 10.100.0.36, 10.100.0.122, and 10.100.0.185.]
NODE AND ROLE HEALTH MAINTENANCE
• FC maintains service availability by monitoring the software and hardware health
  • Based primarily on heartbeats
  • Automatically “heals” affected roles

Problem | How Detected | Fabric Response
Role instance crashes | FC guest agent monitors role termination | FC restarts role
Guest VM or agent crashes | FC host agent notices missing guest agent heartbeats | FC restarts VM and hosted role
Host OS or agent crashes | FC notices missing host agent heartbeat | Tries to recover node; FC reallocates roles to other nodes
Detected node hardware issue | Host agent informs FC | FC migrates roles to other nodes; marks node “out for repair”
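Heartbeat-based detection and the problem-to-response mapping above can be sketched as follows. This is a minimal Python illustration: the function names, the problem keys, and the timeout value used in any call are assumptions, since the slides give no intervals or interfaces.

```python
# Responses taken from the table above, keyed by hypothetical problem identifiers.
RESPONSES = {
    "role_crash": "restart role",
    "guest_crash": "restart VM and hosted role",
    "host_crash": "try to recover node; reallocate roles to other nodes",
    "hardware_issue": "migrate roles to other nodes; mark node out for repair",
}


def missed_heartbeats(last_seen, now, timeout):
    """Return the agents whose most recent heartbeat is older than `timeout` seconds.

    `last_seen` maps an agent name to the timestamp of its last heartbeat.
    """
    return sorted(name for name, seen in last_seen.items() if now - seen > timeout)


def fabric_response(problem):
    """Look up the healing action for a detected problem."""
    return RESPONSES[problem]
```

A monitor would call `missed_heartbeats` periodically and feed each silent agent into the matching response, for example `fabric_response("guest_crash")` when a guest agent stops reporting.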
SERVICE HEALING
[Figure: Role A, V2, VM Role (Front End): Count: 3, Update Domains: 3, Size: Large. Role B, Worker Role: Count: 2, Update Domains: 2, Size: Medium. Instances are spread across Fault Domains 1 to 3. A load balancer for www.mycloudapp.net is shown with the addresses 10.100.0.36, 10.100.0.122, 10.100.0.185, and 10.100.0.191.]
WINDOWS AZURE VM SIZES
• Each Windows Azure compute instance represents a virtual server.
• Although many resources are dedicated to a particular instance, some resources associated with I/O performance, such as network bandwidth and the disk subsystem, are shared among the compute instances on the same physical host.
• The different instance types will provide different minimum performance from the shared resources depending on their size.
VM Size | CPU | Memory | Instance Storage | I/O Performance | Cost Per Hour
Extra Small | 1.0 GHz | 768 MB | 20 GB | Low | $0.05
Small | 1.6 GHz | 1.75 GB | 225 GB | Moderate | $0.12
Medium | 2 x 1.6 GHz | 3.5 GB | 490 GB | High | $0.24
Large | 4 x 1.6 GHz | 7 GB | 1,000 GB | High | $0.48
Extra Large | 8 x 1.6 GHz | 14 GB | 2,040 GB | High | $0.96
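Because billing is per instance-hour, the cost of a deployment follows directly from the table. The Python sketch below uses the hourly prices listed on this slide; the function name is hypothetical, and prices are held in whole cents to avoid floating-point drift.

```python
# Hourly prices from the VM size table, in US cents.
HOURLY_PRICE_CENTS = {
    "Extra Small": 5,
    "Small": 12,
    "Medium": 24,
    "Large": 48,
    "Extra Large": 96,
}


def deployment_cost_usd(roles, hours):
    """Cost in USD of running `roles` for `hours`.

    `roles` is an iterable of (vm_size, instance_count) pairs.
    """
    cents_per_hour = sum(HOURLY_PRICE_CENTS[size] * count for size, count in roles)
    return cents_per_hour * hours / 100
```

For instance, three Large instances plus two Medium instances cost 3 x $0.48 + 2 x $0.24 = $1.92 per hour, so `deployment_cost_usd([("Large", 3), ("Medium", 2)], 720)` covers a 720-hour month.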
LOCAL DRIVES
• C: = Resource local drive (transient storage for VM)
• D: = OS drive
• E: = Application’s code (size of the package)
[Figure: Resource Volume, OS Volume, and Role Volume; Guest Agent, Role Host, and Role Entry Point.]
RUNTIME INSTALLED
• .NET 3.5 SP1
• .NET 4 (RTM)
• VC80 CRT (8.0.50727)
• VC90 CRT (9.0.30729)
• URL Rewrite Module 2.0
• VC10 CRT (e.g. MSVCR100.DLL) is not fusion-ized and can be packaged together with the application
• Others?
  • Java runtime
    • (planned in future)
  • PHP
    • PHP SDK for Windows Azure (“Web Platform Installer”)
• Else?
  • Start-up Task is your friend
OS VERSION
• Two guest OS families are currently managed by Windows Azure
  • Guest OS 1.x: Windows Server 2008 (WS08) 64-bit compatible
  • Guest OS 2.x: Windows Server 2008 R2 (WS08 R2) 64-bit compatible
• Windows Azure Guest OS Releases and SDK Compatibility Matrix
  • http://msdn.microsoft.com/en-us/library/ee924680.aspx
PROCESSES IN WINDOWS AZURE VM

WebRole process | WorkerRole process | Description
clouddrivesvc | clouddrivesvc | enables Windows Azure Drives
csrss (3) | csrss (3) | client/server runtime subsystem
explorer | explorer | Windows explorer
IISConfigurator | - | a WCF named pipes service managing IIS configuration of the web role
LogonUI | LogonUI | login UI and screen switching
lsass | lsass | local security authority subsystem management
lsm | lsm | local session management
MonAgentHost | MonAgentHost | DiagnosticMonitor agent supporting Azure diagnostics
msdtc | msdtc | Distributed Transaction Coordinator console
osdiag | osdiag | Remote Desktop Performance Agent
rdpclip | rdpclip | remote copy/paste support for Remote Desktop Services
RemoteAccessAgent | RemoteAccessAgent | Azure agent supporting Remote Desktop Services (via port 3389)
RemoteForwarderAgent | RemoteForwarderAgent | Azure agent to which all Remote Desktop traffic is routed from the load balancer; this process then routes the traffic to the specific targeted role instance
PROCESSES IN WINDOWS AZURE VM
WebRole process | WorkerRole process | Description
services | services | management of OS services
SLsvc | SLsvc | Windows software licensing service
smss | smss | Windows OS session manager subsystem
svchost (17) | svchost (15) | host application for various Windows services
System | System | kernel
vds | vds | Windows server virtual disk service
vmicsvc (2) | vmicsvc (2) | Hyper-V guest VM integration services
w3wp | - | IIS Worker process hosting ASP.NET site
WaAppAgent | WaAppAgent | Windows Azure guest agent
WaHostBootstrapper | WaHostBootstrapper | Bootstrapper run by WaAppAgent
WaIISHost | - | Web role host run by bootstrapper
- | WaWorkerHost | Worker role host run by bootstrapper
wininit | wininit | core OS process (starts all services)
winlogon (2) | winlogon (2) | OS login management
CONCLUSION
• The Cloud enables pay-as-you-go self-service provisioning of application resources
• Platform as a Service is all about reducing management and operations overhead
• The Windows Azure Fabric Controller is the foundation for Windows Azure’s PaaS
  • Provisions machines
  • Deploys services
  • Configures hardware for services
  • Monitors service and hardware health
• The Fabric Controller continues to evolve and improve
• VMs in Windows Azure are provisioned, optimally configured VMs running on the Windows Azure Hypervisor
REFERENCES
• Inside Windows Azure
  • http://channel9.msdn.com/Events/PDC/PDC10/CS08
• Inside Windows Azure Virtual Machines
  • http://channel9.msdn.com/Events/PDC/PDC10/CS63
• Inside Windows Azure: The Cloud Operating System
  • http://channel9.msdn.com/Events/BUILD/BUILD2011/SAC-853T
• Inside The Web and Worker Role VMs
  • http://blogs.msdn.com/b/jimoneil/archive/2011/01/03/azure-home-part-14-inside-the-webrole-and-workerrole-vms.aspx
• Windows Azure Role Architecture
  • http://blogs.msdn.com/b/kwill/archive/2011/05/05/windows-azure-role-architecture.aspx
QUESTIONS?