General Bare-metal Provisioning Framework

Mikyung Kang, USC/ISI
David Kang, USC/ISI
Ken Igarashi, NTT Docomo
Mana Kaneko, NTT Docomo
Hiromichi Ito, Virtual Tech Japan
Arata Notsu, Virtual Tech Japan

2012 Fall OpenStack Bare-metal Summit Session



Page 2: 2012 Fall OpenStack Bare-metal Summit Session

Agenda

o  Speakers/Contributors
Ø  USC/ISI + NTT Docomo + Virtual Tech Japan: nova-scheduler
Ø  HP: scalability and CI process
Ø  NEC: security enhancement for network/volume isolation
Ø  Calxeda: deployment/scalability
Ø  USC/ISI: fault-tolerance


Page 3: 2012 Fall OpenStack Bare-metal Summit Session

Copyright©2011 NTT DOCOMO, INC. All rights reserved.

New features on bare-metal provisioning framework

Page 4: 2012 Fall OpenStack Bare-metal Summit Session


Difference between Virtual and Bare-Metal Machines

Ryota Mibu
Ken Igarashi

o  Role of Hypervisor
Ø  Image Provisioning
Ø  VM’s Power Management
Ø  Network Isolation (VLAN)
Ø  Volume Isolation (iSCSI)
Ø  Console Access (VNC)
Ø  VM’s Snapshot

o  Virtual Machines
Ø  Hypervisor exists between physical resources and virtual machines

o  Bare-Metal Machines
Ø  There is no Hypervisor
Ø  Bare-Metal Machine can access physical resources freely

Need to achieve same security level as virtual environments

[Figure: two stacks side by side. Virtual Machine: OS image and DB on top of Hypervisor (OpenStack), Host OS and HW (CPU, MEM, HDD, NIC), attached to NW Storage (iSCSI) and NW (VLAN). Bare-Metal Machine: OS image and DB directly on HW (CPU, MEM, HDD, NIC), attached to NW Storage (iSCSI) and NW (VLAN).]

Page 5: 2012 Fall OpenStack Bare-metal Summit Session


Why Bare-Metal Provisioning?

o  Virtual machine vs. Bare-metal Machine

[Figure: Nova-Compute (Virtual) manages a Hypervisor on a Host OS over HW (CPU, MEM, HDD, NIC) and runs virtual machines of flavors m1.tiny, m1.medium and m1.large. Nova-Compute (Bare-Metal) manages bare-metal machines over IPMI and PXE; each machine has its own CPU, MEM and HDD, with labels m1.large, b1.medium and b1.tiny.]

Nova DB: Id : vcpus : memory_m : cpus_used …

BM DB: Id : vcpus : memory_m : cpus_used … ipmi_address : ipmi_user : ipmi_password : pxe_mac_address

Page 6: 2012 Fall OpenStack Bare-metal Summit Session


Legacy Scheduler and Bare-Metal Provisioning

o  A Nova-Compute can report only one set of resources and capabilities to the scheduler

o  Which information should be reported to the scheduler?
Ø  Aggregated resources:
ü  Bare-Metal nodes must be homogeneous (CPU arch, # of CPUs, amount of memory)
ü  Instance must be smaller than the bare-metal machine
Ø  Maximum resource: report one bare-metal machine’s resources
ü  Cannot provision multiple instances at a time
ü  Cannot use resources efficiently

[Figure: Nova-Compute (Bare-Metal) reports Resources and Capability to Nova-Scheduler for three machines (each with CPU, MEM, HDD), either as Aggregated Resources (all three machines combined) or as Maximum Resources (a single machine).]
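The two reporting options can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Resources` class, the helper names `aggregated`, `maximum` and `fits`, and the machine sizes are made up and are not Nova code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Illustrative resource triple; not a Nova data structure."""
    cpus: int
    memory_mb: int
    disk_gb: int


def aggregated(machines):
    """Option 1: report the sum of all machines as one pool."""
    return Resources(
        sum(m.cpus for m in machines),
        sum(m.memory_mb for m in machines),
        sum(m.disk_gb for m in machines),
    )


def maximum(machines):
    """Option 2: report only the largest single machine."""
    return max(machines, key=lambda m: (m.cpus, m.memory_mb, m.disk_gb))


def fits(request, reported):
    """True if the request is no larger than the reported resources."""
    return (request.cpus <= reported.cpus
            and request.memory_mb <= reported.memory_mb
            and request.disk_gb <= reported.disk_gb)


# Made-up, deliberately heterogeneous machines.
machines = [
    Resources(2, 4096, 100),
    Resources(4, 8192, 200),
    Resources(8, 16384, 500),
]
request = Resources(12, 8192, 100)

# The aggregated report accepts a request that no single machine can host.
accepted_by_pool = fits(request, aggregated(machines))
hostable = any(fits(request, m) for m in machines)
```

With these numbers the pooled report accepts a 12-CPU request although the largest machine has 8 CPUs, which is why the aggregated option needs homogeneous nodes and instances smaller than one machine. The maximum option avoids that mismatch but shows the scheduler only one machine at a time.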

Page 7: 2012 Fall OpenStack Bare-metal Summit Session


Simple Solution

o  Run one Nova-Compute per Bare-Metal machine

o  1:1 = Nova-Compute : Bare-Metal machine ⇒ Should not be configured like this.

[Figure: Nova-Compute1 and Nova-Compute2 each get all BM-machines information from the BM DB and send their capability to the Scheduler via messaging (MQ); the Nova DB sits alongside. Mapping: Nova-Compute1 : BM-Machine-1, Nova-Compute2 : BM-Machine-2…]

Page 8: 2012 Fall OpenStack Bare-metal Summit Session


Expose BM-Machines to a scheduler

o  Nova-Compute creates/updates "compute_nodes" entries for each BM-Machine
o  Nova-Compute sends a list of capabilities for each BM-Machine
o  Scheduler can choose an appropriate BM-Machine since all BM-Machines are exposed to the scheduler

[Figure: Nova-Compute gets all BM-machines’ information from the BM DB and exposes it to the Scheduler via the compute_nodes table in the Nova DB, labeled (cpu,mem,disk)x3, and via capability message over the MQ, labeled (extra specs)x3. Instance request path: Scheduler, MQ, Nova-Compute, PXE boot, with the labels "instance_system_metadata" and "Node : ID".]
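A minimal sketch of this flow, using plain dictionaries in place of the BM DB, the compute_nodes table and the capability messages. All function names, field names and values are hypothetical, and the first-fit choice is only a placeholder for whatever the scheduler actually does.

```python
# Stand-in for the BM DB; all values are made up.
bm_db = [
    {"id": "bm-1", "vcpus": 4, "memory_mb": 8192, "local_gb": 100,
     "extra_specs": {"cpu_arch": "x86_64"}},
    {"id": "bm-2", "vcpus": 64, "memory_mb": 16384, "local_gb": 200,
     "extra_specs": {"cpu_arch": "tilera"}},
    {"id": "bm-3", "vcpus": 8, "memory_mb": 32768, "local_gb": 500,
     "extra_specs": {"cpu_arch": "x86_64"}},
]

SIZE_KEYS = ("vcpus", "memory_mb", "local_gb")


def compute_node_entries(host, machines):
    """One compute_nodes-like entry per bare-metal machine."""
    return [
        {"host": host, "node": m["id"], **{k: m[k] for k in SIZE_KEYS}}
        for m in machines
    ]


def capability_list(machines):
    """One capability dictionary per bare-metal machine."""
    return [dict(m["extra_specs"], node=m["id"]) for m in machines]


def choose_node(entries, capabilities, request):
    """Return the first node that is big enough and matches the extra specs."""
    caps_by_node = {c["node"]: c for c in capabilities}
    wanted = request.get("extra_specs", {})
    for entry in entries:
        caps = caps_by_node.get(entry["node"], {})
        big_enough = all(entry[k] >= request[k] for k in SIZE_KEYS)
        specs_match = all(caps.get(k) == v for k, v in wanted.items())
        if big_enough and specs_match:
            return entry["node"]
    return None


entries = compute_node_entries("host-1", bm_db)
capabilities = capability_list(bm_db)
request = {"vcpus": 8, "memory_mb": 16384, "local_gb": 100,
           "extra_specs": {"cpu_arch": "x86_64"}}
chosen = choose_node(entries, capabilities, request)

# Assumption: the chosen node ID travels with the instance request.
system_metadata = {"node": chosen}
```

Because every machine has its own entry, the request above can be matched to one specific machine instead of to the Nova-Compute host as a whole.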

Page 9: 2012 Fall OpenStack Bare-metal Summit Session


Changes in Capability Management

o  ComputeDriver passes a list of capabilities to ComputeManager with nodenames
Ø  before: {'cpu_arch': 'x86_64', ...}
Ø  after: [{'node': 'node-1', 'cpu_arch': 'x86_64', ...}, {'node': 'node-2', 'cpu_arch': 'tilera', ...}, ...]

o  ComputeManager sends them to scheduler
Ø  We can reuse almost all code related to passing capability

o  Scheduler holds the capabilities of the nodes
Ø  the dictionary keys are changed: "$host" -> "$host/$node"
Ø  before: {'host-1': {capability}, ...}
Ø  after: {'host-1/node-1': {capability}, 'host-1/node-2': {capability}, ...}
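The key change can be sketched as follows. The function name `index_capabilities` is hypothetical, and keeping the 'node' entry inside each stored capability is an assumption; the slide only shows the before and after shapes.

```python
def index_capabilities(host, reported):
    """Build the scheduler-side dictionary from one host's report.

    Accepts the old format (a single dict) or the new format
    (a list of dicts, each carrying a 'node' name).
    """
    if isinstance(reported, dict):
        # Old format: one capability set per host, keyed by "$host".
        return {host: dict(reported)}
    # New format: one capability set per node, keyed by "$host/$node".
    return {f"{host}/{cap['node']}": dict(cap) for cap in reported}


old_style = index_capabilities("host-1", {"cpu_arch": "x86_64"})
new_style = index_capabilities("host-1", [
    {"node": "node-1", "cpu_arch": "x86_64"},
    {"node": "node-2", "cpu_arch": "tilera"},
])
```

Handling both shapes in one place is one way to leave non-bare-metal drivers untouched while bare-metal drivers report per node.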

Page 10: 2012 Fall OpenStack Bare-metal Summit Session


Changes in Resource Tracking

o  ComputeManager holds multiple ResourceTrackers corresponding to the nodes
Ø  In a dictionary: {'node-1': tracker1, 'node-2': tracker2, ...}

o  ResourceTracker is aware of node
Ø  holds the name of the node
Ø  creates/updates an entry in compute_nodes table corresponding to the node
ü  To store the name, use hypervisor_hostname? or add a new column?

o  Compatibility
Ø  Under non-bare-metal drivers, the current behavior is unchanged
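A minimal sketch of the per-node tracker dictionary. These classes are simplified stand-ins that only share names with the Nova classes, the compute_nodes table is modeled as a plain dict, and storing the node name in hypervisor_hostname is just one of the two options raised above, picked here for illustration.

```python
class ResourceTracker:
    """Tracks one node and owns that node's compute_nodes entry."""

    def __init__(self, host, nodename):
        self.host = host
        self.nodename = nodename

    def update(self, compute_nodes, resources):
        """Create or update this node's entry in the stand-in table."""
        entry = dict(resources)
        # Assumption: the node name is stored in hypervisor_hostname.
        entry["hypervisor_hostname"] = self.nodename
        compute_nodes[(self.host, self.nodename)] = entry


class ComputeManager:
    """Holds one ResourceTracker per node, created on first use."""

    def __init__(self, host):
        self.host = host
        self.trackers = {}

    def get_tracker(self, nodename):
        if nodename not in self.trackers:
            self.trackers[nodename] = ResourceTracker(self.host, nodename)
        return self.trackers[nodename]


compute_nodes = {}  # stand-in for the compute_nodes table
manager = ComputeManager("host-1")
manager.get_tracker("node-1").update(compute_nodes, {"vcpus": 4})
manager.get_tracker("node-2").update(compute_nodes, {"vcpus": 64})
manager.get_tracker("node-1").update(compute_nodes, {"vcpus": 0})
```

A non-bare-metal driver would end up with a single tracker in the dictionary, which is one way to keep the current behavior unchanged.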

Page 11: 2012 Fall OpenStack Bare-metal Summit Session


Other Topics

o  ResourceTracker for bare-metal
Ø  calculates resources in all-or-nothing manner
Ø  specify a custom RT class by driver or by flag
ü  under review

o  Best-match scheduling
Ø  choose the best-suited node rather than the largest one

o  ...
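The first two topics can be sketched together. This is an illustration under stated assumptions: the function names and node sizes are made up, and "best-suited" is interpreted here as the fitting node with the least leftover vcpus, then memory, then disk, which the slides do not define.

```python
KEYS = ("vcpus", "memory_mb", "local_gb")


def free_resources(node, in_use):
    """All-or-nothing: a node in use has nothing free, an idle node has everything."""
    if in_use:
        return {k: 0 for k in KEYS}
    return {k: node[k] for k in KEYS}


def best_match(nodes, used, request):
    """Pick the fitting node that leaves the least unused capacity."""
    candidates = []
    for name, node in nodes.items():
        free = free_resources(node, name in used)
        if all(free[k] >= request[k] for k in KEYS):
            leftover = tuple(free[k] - request[k] for k in KEYS)
            candidates.append((leftover, name))
    if not candidates:
        return None
    return min(candidates)[1]


# Made-up node sizes.
nodes = {
    "node-1": {"vcpus": 4, "memory_mb": 8192, "local_gb": 100},
    "node-2": {"vcpus": 16, "memory_mb": 65536, "local_gb": 1000},
    "node-3": {"vcpus": 8, "memory_mb": 16384, "local_gb": 200},
}
request = {"vcpus": 4, "memory_mb": 4096, "local_gb": 50}

first = best_match(nodes, set(), request)        # smallest node that fits
second = best_match(nodes, {"node-1"}, request)  # node-1 is now fully used
```

A largest-first policy would have sent this small request to node-2 and left only the smaller machines for later, larger requests; picking the tightest fit keeps the big machine free.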