
Ironic Towards Truly Open and Reliable, Eventually for Mission Critical


Naohiro Tamura

Professional Engineer

Fujitsu Limited

Ironic

towards truly open and reliable,

eventually for mission critical

Copyright 2015 FUJITSU LIMITED

OpenStack Summit October 2015 Tokyo

Thursday, October 29 • 11:00am - 11:40am

TOC

Introduction

Who am I?

What is Fujitsu good at?

Vision

Customer Values: What matters most to customers?

Mid Term Vision: Truly Open and Reliable

Long Term Vision: Eventually Mission Critical

Contribution

What we have done and are doing

What we are going to do

Conclusion


Who am I?

I joined the Ironic community at the beginning of the Kilo development cycle and focus on Ironic driver development.

Before that, I mainly worked on proprietary software development for system management.

I developed bare metal provisioning and IO virtualization for N+1 server redundancy by enhancing a PXE server on legacy BIOS and UEFI.

OpenStack is my first open source project; it’s a whole new experience

Joyful part – working with talented and smart people

Interesting phenomenon – bikeshedding

Contacts

Email: [email protected] IRC: naohirot


What is Fujitsu good at?

Fujitsu sustains many social infrastructure systems

Banking

Stock exchange

Factory automation

Government agency system

For example, Tokyo Stock Exchange Trading System

Highly Reliable IA x64 server (PRIMEQUEST/PRIMERGY)

Open Source Linux (RHEL)

Fujitsu’s in-memory database system

Fujitsu is good at mission critical systems


Customer Values


What are the most important things for the customer?

Three Customer Values We See


Truly open, no vendor lock-in, to give customers the freedom to switch to any vendor at any time

Reliable, robust, highly available system to keep the customer's business running continuously

Responsive, responsible, competent support to resolve the customer's incidents quickly and accurately

Mid Term Vision: Truly Open and Reliable


What should truly open and reliable look like?

The Android Robot logo is licensed under the terms of the Creative Commons Attribution license

[Figure: "Used to be" vs. "Current"]

Mid Term Vision: Truly Open and Reliable

Android as a concrete reference model


[Chart: 2015Q2 WW Smartphone Shipments: Android 82.8%, iOS 13.9%]

Source: Worldwide Smartphone Growth Expected to Slow to 10.4% in 2015, Down From 27.5% Growth in 2014, According to IDC http://www.idc.com/getdoc.jsp?containerId=prUS25860315

No vendor lock-in: customers can switch from one vendor to another whenever they want, because the platform is truly open and reliable.

[Figure: Current vs. Future]

Mid Term Vision: Truly Open and Reliable

OpenStack would be in the same situation as Android


[Figure: "Big Four"]

Customers can switch from one cloud to another whenever they want, if it’s truly open and reliable.

Interoperability is key to avoiding vendor lock-in among public, private, and hybrid clouds.

Mid Term Vision: Truly Open and Reliable

Android

Android defines the hardware specification.

Customers can switch at any time to whichever vendor they like.

From the UI’s point of view, all Android smartphones have the same functionality.

Hardware reliability and support responsiveness differ among vendors.


Ironic

If the market accepts Ironic, Ironic will define the datacenter server hardware specification.

Customers will be able to switch at any time to whichever vendor they like.

From the API, CLI, and UI point of view, Ironic will provide the same functionality for all bare metal servers.

Hardware reliability and support responsiveness will differ among vendors.

Situation Comparison between Android and Ironic

Mid Term Vision: Truly Open and Reliable


How can we achieve truly open and reliable?

Mid Term Vision: Truly Open and Reliable

1. First of all, we need to complete the following table to create the same situation as Android, that is, a no vendor lock-in situation.

Current status as of Liberty: Ironic BMC Driver Implementation

Legend: ✔ done, △ ongoing, × not yet, - not applicable. Green: contributed, Yellow: contributing, Pink: plan to contribute

2. Then enhance proactive/reactive features to achieve higher reliability.


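| State | I/F | Method | IPMI | AMT | DRAC | iLO | iRMC | UCS |
|---|---|---|---|---|---|---|---|---|
| Power On/Off | Power | hard | ✔ | ✔ kilo | ✔ | ✔ | ✔ kilo | ✔ liberty |
| Power On/Off | Power | soft | △ liberty | × | × | × | △ liberty | × |
| Power Off to On | Boot | pxe | ✔ | ✔ kilo | ✔ | ✔ | ✔ kilo | ✔ liberty |
| Power Off to On | Boot | vmedia | - | × | × | ✔ | ✔ liberty | × |
| Deploy | Deploy | iscsi | ✔ | ✔ kilo | ✔ | ✔ | ✔ kilo | ✔ liberty |
| Deploy | Deploy | agent | ✔ | × | × | ✔ | ✔ liberty | ✔ liberty |
| Active | Mgmt | oob | ✔ | ✔ kilo | ✔ | ✔ | ✔ kilo | ✔ liberty |
| Inspect | Inspect | ib | ✔ kilo | × | ✔ kilo | △ liberty | × | × |
| Inspect | Inspect | oob | - | × | × | ✔ kilo | △ liberty | △ liberty |
| Clean | Clean | iscsi | × | × | × | ✔ kilo | × | × |
| Clean | Clean | agent | ✔ kilo | × | × | ✔ kilo | × | × |
| Zap | Raid | ib | × | × | × | △ liberty | × | × |
| Zap | Raid | oob | - | × | × | × | × | × |
| Rescue | Rescue | pxe | × | × | × | × | × | × |
| Rescue | Rescue | vmedia | - | × | × | × | × | × |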
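Step 1 above treats the table as a checklist to complete. As a minimal sketch, two of its rows can be encoded as data so that the remaining gaps (cells that apply but are not done yet) can be listed. The names `Status`, `TABLE`, `bmcs_with`, and `gaps` are hypothetical, and only the Power/soft and Boot/vmedia rows are copied in.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Status(Enum):
    DONE = "✔"
    ONGOING = "△"
    NOT_YET = "×"
    NOT_APPLICABLE = "-"


# (status, release name if the table gives one)
Cell = Tuple[Status, Optional[str]]

# Two rows copied from the table above, keyed by (interface, method).
TABLE: Dict[Tuple[str, str], Dict[str, Cell]] = {
    ("Power", "soft"): {
        "IPMI": (Status.ONGOING, "liberty"), "AMT": (Status.NOT_YET, None),
        "DRAC": (Status.NOT_YET, None), "iLO": (Status.NOT_YET, None),
        "iRMC": (Status.ONGOING, "liberty"), "UCS": (Status.NOT_YET, None),
    },
    ("Boot", "vmedia"): {
        "IPMI": (Status.NOT_APPLICABLE, None), "AMT": (Status.NOT_YET, None),
        "DRAC": (Status.NOT_YET, None), "iLO": (Status.DONE, None),
        "iRMC": (Status.DONE, "liberty"), "UCS": (Status.NOT_YET, None),
    },
}


def bmcs_with(row: Tuple[str, str], status: Status) -> List[str]:
    """BMCs whose cell in the given row has exactly the given status."""
    return sorted(bmc for bmc, (s, _) in TABLE[row].items() if s is status)


def gaps(row: Tuple[str, str]) -> List[str]:
    """Cells still to fill in: applicable to the BMC but not done yet."""
    return sorted(bmc for bmc, (s, _) in TABLE[row].items()
                  if s in (Status.ONGOING, Status.NOT_YET))
```

For example, `gaps(("Boot", "vmedia"))` lists the BMCs that would still need virtual media boot before that row stops being a source of lock-in.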

Long Term Vision: Eventually Mission Critical

Can you imagine the Tokyo Stock Exchange trading system running on Ironic?

No, I can’t right now. The stock market involves a huge amount of investment money.

Our vision, “Ironic for Mission Critical”, is really challenging.

We believe that it’s difficult for a single company to achieve this vision.

But we believe that a community can achieve it, because there are a lot of things to be done.


https://ja.wikipedia.org/wiki/%E6%9D%B1%E8%A8%BCArrows


To realize the mid term vision:

1) Complete the table to be truly open

2) Add proactive/reactive features for reliability


What we have done and are doing

Virtual Media Deployment

• Out of Band Boot

Soft Power Off and Inject NMI

• Power Control Finite State Machine

• Abort Task

What we are going to do

Rescue Mode in Tenant Network

• Repair Instance Image in Cinder by Virtual Media Boot

Bare Metal N+1 Redundancy

• Cold Migration by Soft Power Off and Virtual Media Boot

Virtual Media Deployment


What is it good for?

Virtual Media Deployment is the basis for what we are going to do:

• Rescue Mode in Tenant Network: Repair Instance Image in Cinder by Virtual Media Boot

• Bare Metal N+1 Redundancy: Cold Migration by Soft Power Off and Virtual Media Boot


Virtual Media enables Out of Band (OOB) Boot

It is good for multi tenant and networked storage environments

Element Technology

Virtual Media Deployment


How does it work?

Note: Ironic Deploy Basics

Boot methods

PXE (network) - IB (In Band)

Virtual Media - OOB (Out Of Band)

Types of Image

Deploy image (Deploy ramdisk)

User image (Boot ramdisk, Instance boot image, OS instance)

Deploy methods

iSCSI

Ironic Python Agent (http/https)
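Boot method and deploy method are independent choices, and a driver picks one of each. The small sketch below (hypothetical names `DeployCombo` and `DRIVERS`) models that with only the terms from this slide. It assumes, as the iscsi_irmc and agent_irmc walkthroughs in this deck describe, that both drivers boot out of band from virtual media and differ only in how the user image reaches the local disk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployCombo:
    """One boot method plus one deploy method (terms from the slide above)."""
    boot: str    # "pxe" (network, in band) or "vmedia" (virtual media, out of band)
    deploy: str  # "iscsi" or "agent" (Ironic Python Agent over http/https)

    @property
    def boot_band(self) -> str:
        return "OOB" if self.boot == "vmedia" else "IB"

    @property
    def image_transfer(self) -> str:
        # iscsi: the node exports its local disk and the conductor writes the image.
        # agent: the agent running on the node downloads the image itself.
        if self.deploy == "iscsi":
            return "conductor writes over iSCSI"
        return "agent downloads over HTTP(S)"


DRIVERS = {
    "iscsi_irmc": DeployCombo(boot="vmedia", deploy="iscsi"),
    "agent_irmc": DeployCombo(boot="vmedia", deploy="agent"),
}
```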


Virtual Media Deployment

How does the iscsi_irmc driver work?

[Diagram: Ironic API, Ironic Conductor, Image Service (instance boot image), CIFS/NFS (deploy.iso, floppy.img), and the bare metal server (BMC, local disk, mounted virtual cd/fd), connected by the Management Network and the Tenant Network]

1) Create a virtual floppy and copy it into CIFS/NFS

2) Attach the virtual CD-ROM and floppy

3) Boot from the virtual CD-ROM

4) Export the local disk as an iSCSI target, and call the Ironic API to continue

5) Dispatch the Ironic API call to the conductor

6) Call the Image Service

7) Download the boot image

8) Attach the local disk by iSCSI, and dd the boot image to the local disk

9) Boot from the local disk
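Step 8 is where the image is actually written: the slide describes it as a dd of the boot image onto the attached disk. Below is a minimal Python sketch of that block copy, assuming the exported iSCSI disk is already attached and visible as a local block device path (attaching it, e.g. with iscsiadm, is not shown). `dd_image` is a hypothetical helper, not Ironic's code.

```python
import os


def dd_image(image_path: str, target_path: str, block_size: int = 1024 * 1024) -> int:
    """Copy a raw image onto a block device (or file) block by block, like `dd bs=1M`.

    The target must already exist (a block device always does); it is opened
    read/write without truncation and overwritten from offset 0.
    Returns the number of bytes written.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    written = 0
    with open(image_path, "rb") as src, open(target_path, "r+b") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the data is on disk before step 9 reboots the node
    return written
```

Opening the target with `"r+b"` rather than `"wb"` avoids truncating a regular file used for testing and matches how a device node is written in place.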

Virtual Media Deployment

How does the agent_irmc driver work?

[Diagram: Ironic API, Ironic Conductor, Image Service (instance boot image), CIFS/NFS (deploy.iso, floppy.img), and the bare metal server (BMC, local disk, mounted virtual cd/fd), connected by the Management Network and the Tenant Network]

1) Create a virtual floppy and copy it into CIFS/NFS

2) Attach the virtual CD-ROM and floppy

3) Boot from the virtual CD-ROM

4) Export the IPA (Ironic Python Agent) API, and call the Ironic API to heartbeat

5) Dispatch the Ironic API call to the conductor

6) Call the IPA API to start the boot image download

7) Download the boot image by HTTP to the local disk

8) Call the IPA API to see if the deploy has been done

9) Boot from the local disk
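On the conductor side, steps 6 to 8 are a start-then-poll loop. The sketch below covers the polling in step 8, assuming a hypothetical `get_status` callable that wraps the agent API and returns "RUNNING", "SUCCEEDED", or "FAILED"; those strings, the default timeout, and the interval are illustrative assumptions, not IPA's actual values. Sleep and clock are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


class DeployTimeout(Exception):
    """Raised when the agent does not report completion in time."""


def wait_for_agent_deploy(
    get_status: Callable[[], str],
    timeout: float = 1800.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll the agent until the image download/write finishes (step 8).

    Returns the final status ("SUCCEEDED" or "FAILED"); raises DeployTimeout
    if the agent is still busy when the deadline passes.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        if clock() >= deadline:
            raise DeployTimeout(f"deploy still {status!r} after {timeout}s")
        sleep(interval)
```

After a "SUCCEEDED" result, the conductor would go on to step 9 and boot the node from its local disk.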


Use Cases of Soft Power Off and Inject NMI*

In what situation or scenario does Soft Power Off help?

Unscheduled hardware maintenance, because the cloud provider cannot log on to the customer’s instance

Scheduled hardware maintenance, but the customer didn’t shut down the instance

In what situation or scenario does Inject NMI help?

Cloud provider support can ask the customer to provide an OS dump to resolve the customer's incident quickly and accurately.

Customers can investigate problems by themselves while keeping sensitive business data, such as credit card numbers, private.


*NMI: Non-maskable interrupt https://en.wikipedia.org/wiki/Non-maskable_interrupt

Benefits of Soft Power Off and Inject NMI

Soft Power Off protects the customer’s data

Current power control is “hard” only. Imagine an in-memory database is running: a hard power off is a very dangerous operation!

• ironic node-set-power-state $NODE_UUID off

Soft Power Off (proposed) shuts down the OS gracefully, and it is abortable

• ironic node-set-power-state $NODE_UUID soft_off

• ironic node-set-power-state $NODE_UUID abort_soft_off

Inject NMI enables responsive support

Inject NMI (proposed) behaves like a reboot, and the OS dump is available when the reboot has completed

• ironic node-set-power-state $NODE_UUID inject_nmi


Soft Power Off and Inject NMI

Power State and Target Power State

Power control is basic, but it is neither simple nor easy to implement

$ ironic node-show-states $NODE_UUID
+------------------------+---------------------------+
| Property               | Value                     |
+------------------------+---------------------------+
| target_power_state     | None                      |
| target_provision_state | None                      |
| last_error             | None                      |
| console_enabled        | False                     |
| provision_updated_at   | 2015-10-01T05:20:15+00:00 |
| power_state            | power off                 |
| provision_state        | available                 |
+------------------------+---------------------------+

power_state: Power On | Power Off | Error

target_power_state (existing): power on | power off

target_power_state (new): soft power off | inject NMI
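The pair of fields above is what a client watches during a power action. A minimal sketch, assuming the two-column table layout shown in the `node-show-states` output and the convention that target_power_state is cleared (printed as None) once a power action finishes; `parse_show_states` and `power_transition_in_progress` are hypothetical helpers, not part of python-ironicclient.

```python
from typing import Dict, Optional


def parse_show_states(output: str) -> Dict[str, Optional[str]]:
    """Parse the two-column ASCII table printed by `ironic node-show-states`."""
    states: Dict[str, Optional[str]] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, the command line, and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Property":
            continue  # skip the header row and anything unexpected
        key, value = cells
        states[key] = None if value == "None" else value
    return states


def power_transition_in_progress(states: Dict[str, Optional[str]]) -> bool:
    """A non-empty target_power_state means a power action has not finished yet."""
    return states.get("target_power_state") is not None
```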

Soft Power Off and Inject NMI

Current Implementation

There is no power control finite state machine like the one for deployment states


[Diagram: stable states Power ON and Power OFF; existing target states Power ON and Power OFF; Timeout | IOError lead to Error]

Reboot = Power Cycle (Power OFF + Power ON)

Soft Power Off and Inject NMI

Proposed Implementation

Create a power control finite state machine like the one for deployment states

The most difficult part is supporting Abort


[Diagram: stable states Power ON and Power OFF; existing target states Power ON and Power OFF; new target states Power OFF SOFT and Inject NMI; transitions on Abort | Timeout | IOError; Error state]

Reboot = Power Cycle (Power OFF + Power ON)
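The proposed machine fits in a few lines. This is a minimal sketch, not Ironic's implementation: state and target names follow the diagram, and `target` plays the role of target_power_state. Three behaviours are assumptions, labeled in the code: soft power off and inject NMI require a powered-on node, only soft power off is abortable, and an aborted soft power off leaves the node in its previous stable state. Timeout and IOError move the node to Error.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    POWER_ON = "power on"
    POWER_OFF = "power off"
    ERROR = "error"


# Stable state each target ends in when it succeeds.
# "reboot" is a power cycle (power off + power on), so it ends in POWER_ON.
TARGETS = {
    "power on": State.POWER_ON,
    "power off": State.POWER_OFF,
    "reboot": State.POWER_ON,
    "soft power off": State.POWER_OFF,  # new target state
    "inject nmi": State.POWER_ON,       # new target state; assumed to end powered on after the dump
}
NEEDS_POWER_ON = {"soft power off", "inject nmi"}  # assumption: need a running OS
ABORTABLE = {"soft power off"}                      # assumption: only soft power off can be aborted


class PowerStateMachine:
    def __init__(self, state: State = State.POWER_OFF) -> None:
        self.state = state
        self.target: Optional[str] = None  # mirrors target_power_state; None when idle

    def request(self, target: str) -> None:
        if self.target is not None:
            raise RuntimeError(f"transition to {self.target!r} already in progress")
        if target not in TARGETS:
            raise ValueError(f"unknown target {target!r}")
        if target in NEEDS_POWER_ON and self.state is not State.POWER_ON:
            raise RuntimeError(f"{target!r} requires the node to be powered on")
        self.target = target

    def finish(self, event: str = "done") -> State:
        """End the in-flight transition with 'done', 'abort', 'timeout' or 'ioerror'."""
        if self.target is None:
            raise RuntimeError("no transition in progress")
        if event == "done":
            self.state = TARGETS[self.target]
        elif event == "abort":
            if self.target not in ABORTABLE:
                raise RuntimeError(f"{self.target!r} cannot be aborted")
            # assumption: abort keeps the previous stable state (the OS keeps running)
        elif event in ("timeout", "ioerror"):
            self.state = State.ERROR
        else:
            raise ValueError(f"unknown event {event!r}")
        self.target = None
        return self.state
```

Keeping "abort" as an event that ends a transition, rather than a new target, is what lets the machine return to a stable state without a second power action.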

Soft Power Off and Inject NMI

How to implement Abort

How to handle abort, cancel, timeout, and exceptions of a background task is a common problem in concurrent programming

How should we implement it with eventlet green threads?

• CSP (Communicating Sequential Processes) channel

• Channel registry, like the Erlang process registry

[Diagram: Ironic API, Ironic Conductor, Database (node state per node), Channel Registry (node uuid → channel), Soft Power OFF Task (green thread), Abort Task (green thread)]

1) Soft Power Off

2) Exclusive Node Lock

3) Get Chan

4) Read Chan

5) Abort

6) Get Chan

7) Send Message (Abort Message)

8) Read Chan

9) If Abort Message, Exit

10) Unlock Node
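Steps 1 to 10 can be sketched with Python's standard `threading` and `queue` modules standing in for eventlet green threads and channels (eventlet's own queue module offers a similar `Queue` with `get`/`put`). This is a minimal sketch under stated assumptions: `ChannelRegistry`, `soft_power_off_task`, and `abort_task` are hypothetical names, a plain `threading.Lock` per node stands in for Ironic's exclusive node lock, and `is_off` stands in for polling the BMC.

```python
import queue
import threading
from typing import Callable, Dict, Optional

ABORT = "abort"  # the abort message sent over a node's channel


class ChannelRegistry:
    """Node UUID -> channel map, in the spirit of an Erlang process registry."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._channels: Dict[str, "queue.Queue[str]"] = {}

    def register(self, node_uuid: str) -> "queue.Queue[str]":
        with self._guard:
            chan: "queue.Queue[str]" = queue.Queue()
            self._channels[node_uuid] = chan
            return chan

    def lookup(self, node_uuid: str) -> Optional["queue.Queue[str]"]:
        with self._guard:
            return self._channels.get(node_uuid)

    def unregister(self, node_uuid: str) -> None:
        with self._guard:
            self._channels.pop(node_uuid, None)


def soft_power_off_task(node_uuid: str, registry: ChannelRegistry,
                        node_locks: Dict[str, threading.Lock],
                        is_off: Callable[[], bool],
                        poll: float = 0.01, timeout: float = 5.0) -> str:
    # 1) A soft power off request for node_uuid has arrived; this task handles it.
    # 2) Take the exclusive node lock; 10) it is released when the `with` block exits.
    with node_locks[node_uuid]:
        chan = registry.register(node_uuid)      # 3) get (create) this node's channel
        try:
            # The graceful shutdown request would be sent to the BMC here (omitted).
            waited = 0.0
            while waited < timeout:
                try:
                    msg = chan.get(timeout=poll)  # 4, 8) read chan
                except queue.Empty:
                    msg = None
                if msg == ABORT:                 # 9) abort message received: exit
                    return "aborted"
                if is_off():
                    return "power off"
                waited += poll
            return "timeout"
        finally:
            registry.unregister(node_uuid)


def abort_task(node_uuid: str, registry: ChannelRegistry) -> bool:
    # 5) An abort request arrived; it must not wait for the node lock the task holds.
    chan = registry.lookup(node_uuid)            # 6) get chan
    if chan is None:
        return False                             # nothing in flight to abort
    chan.put(ABORT)                              # 7) send message
    return True
```

The registry is what lets the abort side reach a task that holds the node lock: it never touches the lock, it only finds the channel and sends a message.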


Rescue Mode in Tenant Network

(Multi Tenant and Networked Storage Environment)

Rescue Use Case in Multi Tenant Support

The instance image is deployed by PXE (In Band) boot, and then the network is flipped

[Diagram: Ironic Conductor, Neutron, L2 Switch, Bare Metal Server 1 (BMC), Cinder (instance boot image), deploy image; Management Network, Deploy Network, Tenant Network; labels "PXE boot" and "Flip Network"]

What if the instance is damaged?

Rescue Mode in Tenant Network

Multi Tenant Network Support – Provider Network

The rescue image needs both a Rescue Network and the Tenant Network

Bare Metal Server 1 now has a different network configuration from the production environment, which could make rescue difficult

[Diagram: Ironic Conductor, Neutron, L2 Switch, Bare Metal Server 1 (BMC), Cinder (instance boot image), rescue image; Management Network, Rescue Network, Tenant Network]

Fix the damaged instance in a different network configuration

Rescue Mode in Tenant Network

Virtual Media Boot provides an Out Of Band Rescue Mode with the same tenant network configuration as the real instance

[Diagram: Ironic Conductor, Neutron, L2 Switch, Bare Metal Server 1 (BMC), CIFS/NFS (rescue.iso), rescue image, Cinder (instance boot image); Management Network, Rescue Network, Tenant Network]

Fix the damaged instance in the same network configuration


Bare Metal N+1 Redundancy

Cold Migration by Soft Power Off and Virtual Media Boot

(Multi Tenant and Networked Storage Environment)

[Diagram: Ironic Conductor; Bare Metal Servers 1, 2, ..., N and a spare Bare Metal Server N+1, each with a BMC; Cinder holding the instance boot image; CIFS/NFS holding migration.iso and floppy.img; Management Network and Tenant Network]

1) Bare Metal Server 1 is running normally

2) A sign of failure is detected

3) % ironic cold-migration “Bare Metal Server 1”

4) Graceful shutdown by Soft Power Off

5) Boot migration.iso from Virtual Media on the spare server, and set IO to attach Cinder

6) Reboot from the same instance boot image in Cinder
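The order of steps 4 to 6 is the essence of the proposal, so here is a minimal sketch of it as an orchestration function. Everything in it is hypothetical: `ironic cold-migration` is the command proposed on this slide, and `FakeBMC` and `cold_migrate` are illustrative stand-ins that only record the operations they are asked to perform. A real implementation would wait for the soft power off to finish (and handle abort or timeout) before touching the spare server.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FakeBMC:
    """Hypothetical stand-in for a server's BMC; records requested operations."""
    name: str
    log: List[Tuple[str, ...]] = field(default_factory=list)

    def soft_power_off(self) -> None:
        self.log.append((self.name, "soft_power_off"))

    def attach_virtual_media(self, image: str) -> None:
        self.log.append((self.name, "attach_virtual_media", image))

    def boot(self, device: str) -> None:
        self.log.append((self.name, "boot", device))

    def reboot(self, device: str) -> None:
        self.log.append((self.name, "reboot", device))


def cold_migrate(failing: FakeBMC, spare: FakeBMC, volume: str) -> None:
    """Steps 4 to 6 of the cold migration walkthrough (hypothetical orchestration)."""
    # 4) Graceful shutdown of the failing server, so its OS stops cleanly.
    failing.soft_power_off()
    # 5) Boot migration.iso on the spare server from virtual media; the ISO is
    #    assumed to set up the IO configuration needed to attach the Cinder volume.
    spare.attach_virtual_media("migration.iso")
    spare.attach_virtual_media("floppy.img")
    spare.boot("vmedia")
    # 6) Reboot the spare from the same instance boot image in Cinder.
    spare.reboot(f"cinder:{volume}")
```

Passing one shared `log` list to both fakes makes the cross-server ordering visible: nothing happens on the spare until the failing server has been asked to shut down.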

Recap

[Diagram: Customer Values, Mid Term Vision, and Contribution, under the Long Term Vision: Eventually for Mission Critical]

Customer Values: Truly open, no vendor lock-in; Reliable, robust, highly available system; Responsive, responsible, competent support

Mid Term Vision: Truly Open; Reliable

Contribution (Proactive / Reactive): Virtual Media Deployment; Rescue Mode in Tenant Network; Soft Power Off and Inject NMI; Bare Metal N+1 Redundancy

Conclusion

Thanks!

Q&A


Let’s make Ironic a great product together!
