General Bare-metal
Provisioning Framework
Mikyung Kang, USC/ISI David Kang, USC/ISI
Ken Igarashi, NTT Docomo Mana Kaneko, NTT Docomo
Hiromichi Ito, Virtual Tech Japan Arata Notsu, Virtual Tech Japan
Agenda
o Speakers/Contributors
o USC/ISI + NTT Docomo + Virtual Tech Japan: nova-scheduler
o HP: scalability and CI process
o NEC: security enhancement for network/volume isolation
o Calxeda: deployment/scalability
o USC/ISI: fault-tolerance
General Bare-Metal Provisioning Framework (Summit Session)
Copyright©2011 NTT DOCOMO, INC. All rights reserved.
New features on bare-metal provisioning framework
o Role of Hypervisor
Ø Image Provisioning
Ø VM’s Power Management
Ø Network Isolation (VLAN)
Ø Volume Isolation (iSCSI)
Ø Console Access (VNC)
Ø VM’s Snapshot
Ryota Mibu
Ken Igarashi
Difference between Virtual and Bare-Metal Machines
o Virtual Machines
Ø A hypervisor exists between the physical resources and the virtual machines
o Bare-Metal Machines
Ø There is no hypervisor
Ø A bare-metal machine can access physical resources freely
o Bare-metal machines need to achieve the same security level as virtual environments
[Figure: a Virtual Machine stack (Host OS, Hypervisor (OpenStack), HW: CPU, MEM, HDD, NIC) next to a Bare-Metal Machine stack (HW: CPU, MEM, HDD, NIC). Each is shown with NW VLAN, NW Storage (iSCSI), OS image, and DB.]
Why Bare-Metal Provisioning?
o Virtual machine vs. bare-metal machine
[Figure: Nova-Compute (Virtual) sits on a Host OS and Hypervisor over HW (CPU, MEM, HDD, NIC) and runs virtual machines. Nova-Compute (Bare-Metal) controls bare-metal machines (each with its own CPU, MEM, HDD) through IPMI and PXE. Flavor labels shown: m1.tiny, m1.medium, m1.large, b1.medium, b1.tiny.]
o Nova DB record: Id : vcpus : memory_m : cpus_used …
o BM DB record: Id : vcpus : memory_m : cpus_used … ipmi_address : ipmi_user : ipmi_password : pxe_mac_address
Legacy Scheduler and Bare-Metal Provisioning
o A Nova-Compute can report only one set of resources and capabilities to the scheduler
o Which information should it report to the scheduler?
Ø Aggregated resources: report the sum of all bare-metal machines' resources
ü Bare-metal nodes must be homogeneous (CPU arch, # of CPUs, amount of memory)
ü An instance must be smaller than the bare-metal machine
Ø Maximum resources: report one bare-metal machine’s resources
ü Cannot provision multiple instances at a time
ü Cannot use resources efficiently
[Figure: Nova-Compute (Bare-Metal) manages three machines (each CPU, MEM, HDD) and reports Resources and Capability to Nova-Scheduler, either as Aggregated Resources (all three combined) or as Maximum Resources (one machine).]
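The trade-off between the two reporting options can be illustrated with a small sketch. The node inventory, the resource numbers, and the helper names `aggregated`, `maximum`, and `fits` are hypothetical and are not taken from Nova; the sketch only shows why each option misleads a scheduler that sees a single resource record.

```python
# Hypothetical inventory of bare-metal nodes behind one Nova-Compute.
nodes = [
    {"cpus": 4, "memory_mb": 8192, "disk_gb": 500},
    {"cpus": 4, "memory_mb": 8192, "disk_gb": 500},
    {"cpus": 16, "memory_mb": 65536, "disk_gb": 2000},
]

KEYS = ("cpus", "memory_mb", "disk_gb")


def aggregated(nodes):
    """Report the sum of all nodes as if they were one large host."""
    return {k: sum(n[k] for n in nodes) for k in KEYS}


def maximum(nodes):
    """Report only the largest node (ranked by CPU, then memory, then disk)."""
    return dict(max(nodes, key=lambda n: tuple(n[k] for k in KEYS)))


def fits(request, resources):
    """True if the request is no larger than the reported resources."""
    return all(request[k] <= resources[k] for k in KEYS)


# A request larger than any single machine.
request = {"cpus": 20, "memory_mb": 70000, "disk_gb": 1000}

# Aggregated reporting accepts it even though no single node can host it.
print(fits(request, aggregated(nodes)))
# Maximum reporting rejects it, but it also hides the two smaller nodes.
print(fits(request, maximum(nodes)))
```

With aggregated reporting the scheduler can accept a request that no machine can satisfy unless the nodes are homogeneous and the instance fits on one of them; with maximum reporting the smaller machines are invisible to the scheduler.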
Simple Solution
o Put one Nova-Compute in front of each bare-metal machine
o Nova-Compute : Bare-Metal machine = 1:1 ⇒ Should not be configured like this.
[Figure: Nova-Compute1 manages BM-machine-1 and Nova-Compute2 manages BM-machine-2 (and so on). Each Nova-Compute gets its BM-machine information from the BM DB and reports capability to the Scheduler via messaging (MQ); the Scheduler also uses the Nova DB.]
Expose BM-Machines to the scheduler
o Nova-Compute creates/updates "compute_nodes" entries for each BM-Machine
o Nova-Compute sends a list of capabilities for each BM-Machine
o Scheduler can choose an appropriate BM-Machine since all BM-Machines are exposed to the scheduler
[Figure: Nova-Compute gets all BM-Machines' information from the BM DB and exposes it to the Scheduler in two ways: via the compute_nodes table in the Nova DB, (cpu, mem, disk) x3, and via capability messages over MQ, (extra specs) x3.]
[Figure: on an instance request, the Scheduler sends the request to Nova-Compute over MQ with the node ID in instance_system_metadata, and the node is PXE booted.]
Changes in Capability Management
o ComputeDriver passes a list of capabilities to ComputeManager with nodenames
Ø before: {'cpu_arch': 'x86_64', ...}
Ø after: [{'node': 'node-1', 'cpu_arch': 'x86_64', ...}, {'node': 'node-2', 'cpu_arch': 'tilera', ...}, ...]
o ComputeManager sends them to scheduler
Ø We can reuse almost all code related to passing capability
o Scheduler holds the capabilities of the nodes
Ø the dictionary keys are changed: "$host" -> "$host/$node"
Ø before: {'host-1': {capability}, ...}
Ø after: {'host-1/node-1': {capability}, 'host-1/node-2': {capability}, ...}
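The key change on the scheduler side can be sketched as follows. The function name `to_scheduler_keys` is hypothetical, and dropping the 'node' field from each stored capability is an assumption made for the illustration; the slides only specify the "$host/$node" key format.

```python
def to_scheduler_keys(host, capabilities):
    """Re-key a driver's per-node capability list as '$host/$node' entries."""
    keyed = {}
    for cap in capabilities:
        cap = dict(cap)          # copy so the driver's list is not mutated
        node = cap.pop("node")   # assumption: the node name moves into the key
        keyed[f"{host}/{node}"] = cap
    return keyed


# Capability list in the "after" format passed by the ComputeDriver.
caps = [
    {"node": "node-1", "cpu_arch": "x86_64"},
    {"node": "node-2", "cpu_arch": "tilera"},
]

host_caps = to_scheduler_keys("host-1", caps)
print(host_caps)
# {'host-1/node-1': {'cpu_arch': 'x86_64'}, 'host-1/node-2': {'cpu_arch': 'tilera'}}
```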
Changes in Resource Tracking
o ComputeManager holds multiple ResourceTrackers corresponding to the nodes
Ø In a dictionary: {'node-1': tracker1, 'node-2': tracker2, ...}
o ResourceTracker is aware of node
Ø holds the name of the node
Ø creates/updates an entry in compute_nodes table corresponding to the node
ü To store the name, use hypervisor_hostname? or add a new column?
o Compatibility
Ø Under non-bare-metal drivers, the current behavior is unchanged
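A minimal sketch of the per-node tracker dictionary, assuming simplified stand-in classes: `NodeResourceTracker` and `ComputeManagerSketch` are hypothetical names, not Nova's actual classes, and storing the node name under `hypervisor_hostname` is only one of the two options the slide raises.

```python
class NodeResourceTracker:
    """Hypothetical stand-in for a node-aware ResourceTracker."""

    def __init__(self, host, nodename):
        self.host = host
        self.nodename = nodename  # the tracker holds the name of its node

    def compute_node_entry(self, resources):
        """Build the compute_nodes row for this node.

        Assumption: the node name is stored in hypervisor_hostname
        rather than in a new column.
        """
        entry = {"host": self.host, "hypervisor_hostname": self.nodename}
        entry.update(resources)
        return entry


class ComputeManagerSketch:
    """Hypothetical manager holding one tracker per node."""

    def __init__(self, host, nodenames):
        self.host = host
        # {'node-1': tracker1, 'node-2': tracker2, ...}
        self._trackers = {n: NodeResourceTracker(host, n) for n in nodenames}

    def tracker_for(self, nodename):
        return self._trackers[nodename]


manager = ComputeManagerSketch("host-1", ["node-1", "node-2"])
entry = manager.tracker_for("node-2").compute_node_entry(
    {"vcpus": 8, "memory_mb": 16384}
)
print(entry)
```

A non-bare-metal driver would correspond to a manager with a single node name, so the one-tracker behavior is unchanged in this sketch.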
Other Topics
o ResourceTracker for bare-metal
Ø calculates resources in an all-or-nothing manner
Ø a custom RT class can be specified by the driver or by a flag
ü under review
o Best-match scheduling
Ø choose the best-fitting node rather than the largest one
o ...
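The two ideas above can be sketched together. The node records, the helper names `free_resources` and `best_match`, and the definition of "best" as the smallest idle node ordered by CPU, then memory, then disk are all assumptions for illustration; the slides do not define the ranking.

```python
KEYS = ("cpus", "memory_mb", "disk_gb")


def free_resources(node):
    """All-or-nothing: a node in use offers nothing, an idle node offers everything."""
    if node["instance"] is not None:
        return {k: 0 for k in KEYS}
    return {k: node[k] for k in KEYS}


def best_match(nodes, request):
    """Return the smallest idle node that satisfies the request, or None."""
    candidates = [
        n for n in nodes
        if all(request[k] <= free_resources(n)[k] for k in KEYS)
    ]
    if not candidates:
        return None
    # Assumption: "best" means smallest by (cpus, memory_mb, disk_gb).
    return min(candidates, key=lambda n: tuple(n[k] for k in KEYS))


# Hypothetical bare-metal inventory; node-3 already hosts an instance.
nodes = [
    {"name": "node-1", "cpus": 2, "memory_mb": 4096, "disk_gb": 100, "instance": None},
    {"name": "node-2", "cpus": 8, "memory_mb": 16384, "disk_gb": 500, "instance": None},
    {"name": "node-3", "cpus": 4, "memory_mb": 8192, "disk_gb": 250, "instance": "i-1"},
    {"name": "node-4", "cpus": 16, "memory_mb": 65536, "disk_gb": 2000, "instance": None},
]

request = {"cpus": 4, "memory_mb": 8192, "disk_gb": 200}
chosen = best_match(nodes, request)
print(chosen["name"])  # node-2
```

Here node-3 would be the tightest fit but is fully consumed by its instance, so the smallest remaining candidate is chosen instead of the largest machine.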