Towards the Cloud: Architecture Patterns and VDI Story

#me

Hieu LE
◎ hieulq19@gmail.com
◎ https://github.com/vietstacker
◎ Skype: hieulq89
◎ Community leader Vietnam OPS
◎ Technical Manager @ VRD
◎ FB.com/vietstack

Interest
● Distributed System
● HPC
● Cloud Computing, e.g. OpenStack, CloudStack

Favorite Language
● Python
● C++

50 cent for PR

Mistral - Kuryr - Murano - Magnum - Barbican - Solum ..

Vietnam OpenStack Community

◎ Official User Group @ Vietnam
◎ 7 meetups @ Hanoi, HCM
◎ Sponsored by VietStack, DTT, VCCorp, Fujitsu and OPS Foundation
◎ FB/vietstack
◎ groups.openstack.org/groups/vietnam-vietopenstack
◎ Co-op with other Cloud/Virtualization/Container UG: Cloud Computing VN, DockerHN..

Here we are

VOPS

Agenda ~ 1.5h

Towards the Cloud

[WHAT] Cloud services and deploy models [10%]

[WHOSE] The cloud we are heading toward [10%]

[HOW] Cloud architecture patterns (for appliances and systems) ~ User Story: VDI [80%]

➔ Approach
➔ Cost and Effectiveness
➔ Architecture

Towards the cloud

Cloud 101

“CLOUD COMPUTING” ?

“First to mind when asked what ‘the cloud’ is, a majority respond it’s either an actual cloud, the sky, or something related to weather.” - Citrix Cloud Survey - 2012

Cloud Deploy Models

◎ Public Cloud
◎ Private Cloud
◎ Hybrid Cloud
◎ Federated Cloud

Cloud Deploy Models

Public Cloud

◎ Access anywhere, anytime.
◎ Unit: VM/Instance; Bandwidth..
◎ Amazon EC2, RackSpace, VCCloud, Z.Com, FPT Public Cloud, GApp, Heroku, M$ 365
◎ Public Cloud vs {VPS, Web Builder, WebApp (EyeOS)} ??

“I don’t need a hard disk in my computer if I can get to the server faster… carrying around these non-connected computers is byzantine by comparison.” - Steve Jobs

Cloud Deploy Models

Private Cloud

◎ On-premise solution for managing internal IT resources.
◎ Unit: VM/Instance
◎ OpenStack, CloudStack, Eucalyptus

Cloud Deploy Models

Hybrid Cloud

Federated Cloud

A co-op game between multiple cloud providers for sharing resources.
◎ Cloud Standardization: AWS, OCCI, CDMI.
◎ Policies ?
◎ EGI Federated Cloud, ACCA

Cloud Services

◎ IaaS
◎ PaaS
◎ SaaS

*XaaS* → whatever as a service. (Just a marketing term.)

Cloud Services

IaaS is the prerequisite: it must be in place before thinking about PaaS or SaaS.

So what about the term “in da cloud” ?

◎ Many companies claim that they have cloud in their product.

◎ Does your company have one ?

Moving to Cloud

“Stop thinking about your servers as pets, and start thinking about them as cattle” - Las Vegas INTEROP

◎ Definition
◎ Architecture
◎ Cost + Effectiveness

Towards the cloud: VDI

What is VDI ?

◎ SBC - Server Based Computing + Thin Client
◎ Virtualization: decoupling OS from HW, sharing HW resources.
◎ VDI = SBC + Virtualization

→ SBC + Cloud Computing

“THE FACT: YOU CAN DEPLOY VDI WITHOUT CLOUD.”

Tech Stack Example: XRDP + AD (CB) - KVM - RDP

Traditional Architecture

Connection Broker

Missions:

◎ Handle requests from clients.
◎ Manage all current working sessions:
◉ Session recovery (network interrupt..)
◉ Session control: CRUD → requires an Agent deployed in the VM.
◎ Integrate with Cloud via API/WS:
◉ Handle connection to Cloud, ensure HA
◉ Handle all working VM state: on/off, migrate
◉ Deliver virtual disk/storage on-demand to client.
◎ Remote Machine: grant the client remote access to the VM.
◎ Remote App: use remote apps as if in the native environment.
◎ Quota: ? VMs per user, ? sessions per user.
◎ Scheduler
◎ Multi-tenancy support.

Client & Agent

Client:

◎ Native app deployed on Thin/Zero Client (Linux, Windows), or Web App (via HTML5)
◎ Show user’s resources: apps, VDI VMs
◎ Remote to user’s resources

Agent:

◎ Deployed in VDI VMs.
◎ Interconnects with the CB to handle remote sessions.

Connection Broker (CB) Problem

1. Very high traffic:
◉ Between CB and Cloud service endpoint: for command and query tasks.
◉ Between CB and VM (Agent): init session, grant access and apply policy (for multi-tenancy).
◉ Between CB and Client: update user’s resources (VM state, session status), connection status.
◉ Between Client and VM (Agent): remote desktop, remote apps → huge bandwidth consumer.

2. Data consistency:
◉ VM state: conflict between CB (scheduler, manual), client, cloud endpoint.
◉ Session status: conflict between CB, VM and client.

→ Approach: apply some cloud design patterns.

VDI Biggest Problem

1. IOPS
◉ Many users read/write from one storage system ?

2. Network bandwidth
◉ Depends on the remote protocol (Spice, ICA, PCoIP, RDP, VNC,..)

3. User Experience
◉ Login/Logout time
◉ Using virtual app
◉ Using remote environment

VDI Biggest Problem Solution

1. IOPS
◉ Deploy a multi-tier and auto-tiering storage system
◉ Caching (in-memory,..)

2. Network bandwidth
◉ Use UDP
◉ Implement a compression algorithm (LZMA)
◉ Security concern.

3. User Experience
◉ Apply cloud patterns to optimize the connection broker: throttling, retry, external configuration store, runtime reconfiguration, health endpoint monitoring

VDI Flow

1. The client sends a request (RD, RA) to the CB to work in a VDI VM.

2. The CB Session Manager sends a request to the Cloud endpoint to ensure the VM is started and performing correctly. If yes, it creates a session by sending a request to the agent. If no, it makes a new request to deploy a new VM.

3. The CB sends remote parameters (display, channel enable…) to the client.

4. The VM agent sends the session’s status to the CB (ready, fail, creating).

5. If the session status is ready, the CB announces it to the client.

6. The client grabs the session id and remote parameters and starts working with the VDI VM.

#1 Problem: Too many duplicate requests between CB services → waste of resources.

◉ CB monitors cloud status (VDI VM, Cloud service..) → periodically sends health-check requests to the Cloud service.

◉ CB monitors session status → periodically sends health-check requests to the VM Agent.

◉ Cloud Service must deploy multiple VMs from the same images.

→ Monitoring Solution ??

#1 Solution:

1. Apply the Event Sourcing pattern: make the CB eventually consistent and store the history of data operations.

E.g.: VM state change event, session status change event, cloud service status change event..

2. Apply the Cache-aside pattern: cache all VDI VM states, Cloud service status, session status.

or:

Apply the Health Endpoint Monitoring pattern.

Event Sourcing

When: Viewing/restoring from a historical record of data operations, and restricting data update conflicts.

What: Implement an append-only event store for publishing and replaying. Events are immutable, simple objects.

How: (ITLC SA - CQRS)
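
A minimal sketch of an append-only event store, assuming an in-memory list and illustrative names (`Event`, `EventStore`, the `vm-1` ids); none of these come from a real CB implementation. It shows that current state, and any earlier state, can be rebuilt by replaying events.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """Immutable record of one change (field names are illustrative)."""
    kind: str    # e.g. "vm_state_changed"
    target: str  # e.g. a VM id
    value: str   # the new state


class EventStore:
    """Append-only log: events are added and replayed, never edited."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def replay(self, upto: Optional[int] = None) -> Dict[str, str]:
        """Rebuild the latest value per target from the first `upto` events."""
        state: Dict[str, str] = {}
        for event in self._events[:upto]:
            state[event.target] = event.value
        return state


store = EventStore()
store.append(Event("vm_state_changed", "vm-1", "starting"))
store.append(Event("vm_state_changed", "vm-1", "running"))
store.append(Event("vm_state_changed", "vm-2", "stopped"))

current = store.replay()        # state after all events
earlier = store.replay(upto=1)  # state as it was after the first event
```

A real store would persist the log and publish each event to subscribers; the replay logic stays the same.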

Cache-aside

When: Deploying an app/service on a PaaS that does not support caching.

What: Implement a local read/write-through caching mechanism in the app.

How:

1. Determine whether the item is currently held in the cache.

2. If the item is not currently in the cache, read the item from the data store.

3. Store a copy of the item in the cache.
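
The three steps above can be sketched as follows. This is an illustration only: `CacheAside`, `store_reads` and the `vm_states` dict (standing in for the Cloud API) are assumed names, and the cache is a plain in-process dict.

```python
from typing import Callable, Dict


class CacheAside:
    """Read path of cache-aside: check the cache, fall back to the store."""

    def __init__(self, load_from_store: Callable[[str], str]) -> None:
        self._load = load_from_store
        self._cache: Dict[str, str] = {}
        self.store_reads = 0  # counts how often the data store is hit

    def get(self, key: str) -> str:
        # Step 1: is the item currently held in the cache?
        if key in self._cache:
            return self._cache[key]
        # Step 2: cache miss, read the item from the data store.
        value = self._load(key)
        self.store_reads += 1
        # Step 3: store a copy of the item in the cache.
        self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached item, e.g. when a state-change event arrives."""
        self._cache.pop(key, None)


vm_states = {"vm-1": "running"}  # stands in for the Cloud service endpoint
cache = CacheAside(lambda key: vm_states[key])

first = cache.get("vm-1")   # miss: reads the store
second = cache.get("vm-1")  # hit: served from the cache
```

Calling `invalidate` when the underlying state changes is what keeps the cache from serving stale VM states.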

Event Sourcing + Cache-aside

Health Endpoint Monitoring

When: A complex system deployed in a distributed environment, including external services/agents.

What: Implement health monitoring to ensure they are available and performing correctly.

How:
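
A minimal sketch, assuming each monitored component exposes a check function that returns True when healthy. The names `check_all`, `cloud_api_check` and `agent_check` are hypothetical; a real deployment would call HTTP health endpoints instead.

```python
from typing import Callable, Dict


def check_all(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run every health check and report up / degraded / down per component."""
    report: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            report[name] = "up" if check() else "degraded"
        except Exception:
            # An unreachable component must not break the monitor itself.
            report[name] = "down"
    return report


def cloud_api_check() -> bool:
    return True  # pretend the Cloud service endpoint answered correctly


def agent_check() -> bool:
    raise ConnectionError("agent unreachable")  # pretend the VM agent is gone


report = check_all({"cloud-api": cloud_api_check, "vm-agent": agent_check})
```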

Event Sourcing + Cache-aside vs Health Endpoint Monitoring

Health Endpoint Monitor

◎ Number of requests depends on the monitoring solution
◎ Lower performance (passive check)
◎ Data consistency
◎ Easier and more flexible to integrate with the Throttling pattern or Auto-Scaling.

ES+Cache

◎ Lower request rate to Cloud API and Agent
◎ Higher performance (active change)
◎ Eventual consistency
◎ Provides only the current state of data → to improve, use the CQRS pattern.

CQRS

When: The traditional CRUD model cannot handle a large volume of queries (read/write), scales poorly and struggles to ensure data consistency.

What: Segregate operations that read data from operations that update data by using separate interfaces. Integrates with ES as the write model.

How: (ITLC SA - CQRS)
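
A small sketch of the split, assuming a shared in-memory event log as the write model. `SessionCommands` and `SessionQueries` are illustrative names; the status values are the ones from the VDI flow (ready, fail, creating).

```python
from typing import Dict, List, Optional, Tuple

EventLog = List[Tuple[str, str]]  # (session id, new status)


class SessionCommands:
    """Write side: validates a command and appends an event."""

    def __init__(self, log: EventLog) -> None:
        self._log = log

    def set_status(self, session_id: str, status: str) -> None:
        if status not in {"creating", "ready", "fail"}:
            raise ValueError(f"unknown status: {status}")
        self._log.append((session_id, status))


class SessionQueries:
    """Read side: answers queries from a view projected from the log."""

    def __init__(self, log: EventLog) -> None:
        self._log = log
        self._view: Dict[str, str] = {}
        self._applied = 0  # number of events already projected

    def refresh(self) -> None:
        for session_id, status in self._log[self._applied:]:
            self._view[session_id] = status
        self._applied = len(self._log)

    def status_of(self, session_id: str) -> Optional[str]:
        return self._view.get(session_id)


log: EventLog = []
commands = SessionCommands(log)
queries = SessionQueries(log)

commands.set_status("s-1", "creating")
commands.set_status("s-1", "ready")
stale = queries.status_of("s-1")  # view not refreshed yet
queries.refresh()
fresh = queries.status_of("s-1")
```

The gap between `stale` and `fresh` is the eventual consistency mentioned in the comparison: reads lag behind writes until the projection catches up.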

Issues of Cache-aside

◎ Determining which data to cache and where to store the caches is sometimes very hard.
◉ What if I want to “cache” all virtual apps in virtual machines to improve UX ? → Atlantis Computing Tech.
◉ In-memory cache or NoSQL ? (reduce IOPS or consume more memory)

#2 Problem: What if an error occurs in the VDI Flow (6 steps) ?
◉ CB forwards a ready session to the client but the VM state is corrupt ?
◉ Client deploys/restarts/shuts down a VDI VM but the Cloud service is not available.
◉ Session is initializing but the VDI VM OS hits a BSOD/Kernel Panic.

→ Data inconsistency.

#2 Solution:

1. Apply the Retry pattern: a fault-tolerance mechanism that repeats tasks which are expected to succeed.

2. Apply the Circuit Breaker pattern: a fault-tolerance mechanism that prevents the system from repeating tasks which are likely to fail.

3. Apply the Compensating Transaction pattern: revert data to its old state.

Retry

When: Deploying services/apps whose functions depend on actions that are expected to succeed.

What: Implement a mechanism to handle failed actions.

How:
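
A minimal sketch of retry with exponential backoff. The function name `retry`, the default of 3 attempts and the choice to treat only `ConnectionError` as transient are assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 3, base_delay: float = 0.01) -> T:
    """Call `action`, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionError:
            if attempt == attempts:
                raise  # retry budget used up: surface the failure
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


calls = {"count": 0}


def start_session() -> str:
    """Fails twice, then succeeds, like an agent that is still booting."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("agent not ready")
    return "ready"


result = retry(start_session)
```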

Circuit Breaker

When: Preventing an application/service from performing actions that are likely to fail.

What: A mechanism modeled on an electrical circuit breaker, with 3 states for handling failed actions.

How:
• Closed: route requests to services/apps; track failures with a counter.
• Open: requests from the application fail immediately; return an exception.
• Half-Open: a limited number of requests are allowed to pass through and invoke the operation. Change to the Closed state once the success counter is reached.
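
The three states can be sketched as below. This is a simplified illustration: the class and parameter names are assumptions, the clock is injectable so the example runs without waiting, and the Half-Open state here lets every request through until the first failure instead of enforcing a request limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, success_threshold: int = 2,
                 reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.reset_timeout = reset_timeout
        self.state = "closed"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0
        self._clock = clock

    def call(self, action: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")
            self.state = "half-open"  # timeout elapsed: allow trial requests
            self._successes = 0
        try:
            result = action()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self._failures += 1
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
            self._failures = 0

    def _on_success(self) -> None:
        if self.state == "half-open":
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = "closed"
        else:
            self._failures = 0


now = [0.0]  # fake clock, advanced by hand
breaker = CircuitBreaker(failure_threshold=2, success_threshold=1,
                         reset_timeout=10.0, clock=lambda: now[0])


def failing() -> str:
    raise ConnectionError("cloud API down")


for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
state_after_failures = breaker.state

now[0] = 11.0  # past the reset timeout
breaker.call(lambda: "ok")
state_after_recovery = breaker.state
```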

Retry co-op Circuit Breaker issues

◎ Define which tasks are expected to succeed and which are likely to fail.

E.g.: all tasks that interact with Cloud services → likely to fail; all tasks that interact with the VM agent → expected to succeed. (Scope of interaction)

◎ Define the correct time-out for heavy tasks.

E.g.: a deploy-VM task needs a longer time-out than a start-VM task.

◎ Define the correct thresholds for retry (retry counter) and circuit (success/failure counter)

Compensating Transaction

When: Tracing the path/restoring the state of data in services/apps that perform many operations on a data store.

What: Use a workflow model to define each operation as a step, and define a counter-operation for each step. (Ref Mistral Cloud workflow engine)

E.g:

◎ Create - Delete
◎ Plus - Minus
◎ Multiply - Divide
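
A minimal sketch of step/counter-step pairs, assuming an in-memory set of resources and hypothetical step names. It is not how Mistral defines workflows; it only shows completed steps being undone in reverse order when a later step fails.

```python
from typing import Callable, List, Set, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, do, undo)


def run_with_compensation(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo the completed ones in reverse."""
    log: List[str] = []
    done: List[Step] = []
    try:
        for step in steps:
            name, do, _undo = step
            do()
            log.append(f"did {name}")
            done.append(step)
    except Exception as exc:
        log.append(f"failed: {exc}")
        for name, _do, undo in reversed(done):
            undo()
            log.append(f"undid {name}")
    return log


resources: Set[str] = set()


def create_vm() -> None:
    resources.add("vm")


def delete_vm() -> None:
    resources.discard("vm")


def attach_disk() -> None:
    resources.add("disk")


def detach_disk() -> None:
    resources.discard("disk")


def start_session() -> None:
    raise RuntimeError("agent unreachable")


def noop() -> None:
    pass


log = run_with_compensation([
    ("create vm", create_vm, delete_vm),
    ("attach disk", attach_disk, detach_disk),
    ("start session", start_session, noop),
])
```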

Fault Tolerance in VDI CB

#3 Problem: Applying NEW system configuration (for Private Cloud, CB) requires restarting services/system.

◉ CloudStack requires restarting all services, OpenStack requires restarting the related services.

→ Downtime risk !

#3 Solution:

1. Apply External Configuration Store pattern.
2. Apply Runtime Reconfiguration pattern.

Runtime Reconfiguration

When: Minimizing application downtime when updating configuration. (ref plugin architecture)

What: Implement a configuration-change event handler; keep configuration outside of the deployed app.

How:
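
A minimal sketch, assuming the configuration lives in a JSON file outside the app and is polled for changes. `ReloadableConfig`, the file name `cb.json` and the `max_sessions` key are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List, Optional


class ReloadableConfig:
    """Re-reads an external JSON file and notifies handlers when it changes."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._raw: Optional[str] = None
        self._handlers: List[Callable[[Dict[str, Any]], None]] = []
        self.values: Dict[str, Any] = {}
        self.reload_if_changed()

    def on_change(self, handler: Callable[[Dict[str, Any]], None]) -> None:
        self._handlers.append(handler)

    def reload_if_changed(self) -> bool:
        """Return True if the file content changed and handlers were called."""
        with open(self._path, encoding="utf-8") as handle:
            raw = handle.read()
        if raw == self._raw:
            return False
        self._raw = raw
        self.values = json.loads(raw)
        for handler in self._handlers:
            handler(self.values)  # apply new settings without a restart
        return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cb.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"max_sessions": 10}, handle)

    config = ReloadableConfig(path)
    seen: List[int] = []
    config.on_change(lambda values: seen.append(values["max_sessions"]))

    unchanged = config.reload_if_changed()  # nothing edited yet
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"max_sessions": 50}, handle)
    changed = config.reload_if_changed()    # picks up the edit
```

In a running service, `reload_if_changed` would be called from a timer or a file-watch callback rather than by hand.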

External Configuration Store

When: Sharing configuration between multiple apps/instances/services.

What: Implement a centralized configuration store; it can be integrated with service discovery, health endpoint monitoring and the retry pattern.

How:

How to reconfigure system in runtime ?

◎ Use a plugin architecture → requires independent plugins, hard to design.

◎ Write the VDI CB in an interpreted programming language: PHP, Python.

#4 Problem: CB Server/VDI VM resources are overloaded.
◉ HW upgrade for CB server ?
◉ Increase VDI VM HW resources (requires downtime - restart VM) ?

#4 Solution:

1. Virtualize CB Server !!!
2. Apply the Throttling pattern together with the Auto-Scaling feature in the Cloud.

or:

Apply some design patterns for distributed processing of requests (messages) → reference

◉ Competing Consumer
◉ Priority Queue
◉ Leader Election

Throttling

When: Avoiding resource overload; optimizing performance for higher-priority services/apps.

What: Disable lower-priority features/services; integrate with health endpoint monitoring.

How:
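
A minimal sketch of priority-based throttling. The feature names, the priority numbers (0 is most important) and the 70% / 90% load limits are assumptions; the load value would come from whatever monitoring is in place.

```python
from typing import Dict, List


def enabled_features(priorities: Dict[str, int], load: float,
                     soft_limit: float = 0.7, hard_limit: float = 0.9) -> List[str]:
    """Return the features that stay enabled at the given load (0.0 to 1.0)."""
    if load >= hard_limit:
        max_priority = 0  # only the most important features survive
    elif load >= soft_limit:
        max_priority = 1  # drop the least important tier
    else:
        max_priority = max(priorities.values(), default=0)  # everything on
    return sorted(name for name, prio in priorities.items() if prio <= max_priority)


features = {"remote-desktop": 0, "remote-app": 1, "usage-report": 2}

normal = enabled_features(features, load=0.40)
busy = enabled_features(features, load=0.80)
overloaded = enabled_features(features, load=0.95)
```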

Throttling

Integrated with Auto-Scaling in Cloud

Auto Scaling

Server Overload:
◎ Increase the resources (CPU, RAM, Storage..) of the system that carries the load → Vertical Scaling (1)
◎ Buy new servers (systems?) and share the load between them → Horizontal Scaling (2)

(1)/(2) + Automation → Auto Scaling.

Some products:

Amazon CloudWatch + Auto Scaling; EXA TrueCloud.

Hyper-V (Dynamic Memory), VMWare (Memory Overcommit)

Citrix NetScaler (Hardware)

Auto Scaling components

◎ Monitoring System: metrics (counters). <2 approaches: agentless, agent>

◎ Decision Support mechanism: rules (conditions), rule-conflict resolver.

◎ Scaling engine: action trigger (scale up/down, out/in).

Auto Scaling Monitoring

◎ Metrics (Counters): the amount of each resource you want to check in real time. Used for measuring and calculating against the scaling policies (rules).

◎ Agentless: hypervisor based.
◉ E.g: libvirt API (KVM), XAPI RRD (Xen)
◉ Pros: Fast, secure.
◉ Cons: The metrics are too simple (CPU, MEM, Storage – FullVirt; Network RX/TX – ParaVirt)

◎ Agent:
◉ E.g: SNMP ..
◉ Pros: Flexible and easy to manage
◉ Cons: Slow, sometimes can break user’s policies.

Auto Scaling DS

Decision Support Machine: takes the output from monitoring and, based on the user’s policies (rules), calculates the most suitable actions.

E.g. rules:
• if CPU > 80% then scale-up CPU to 4 cores 3.7GHz
• if Memory < 30% then scale-in to <n-1> VMs

→ Why do we need DS ?

Look at the following mesh case:

• Input metrics: CPU, MEM, Concurrent Connections (CCC).

• Rules:

If CPU > 80% then scale-out by 01 VM and LB between them.

If Mem > 85% then scale-out by 01 VM and LB between them.

If CCC > 1000 then scale-up CPU to 4 cores 3.5GHz.

If CPU < 20% then scale-in 01 VM.

If Mem < 25% then scale-in 01 VM.

• So: what if 01 VM has 80% CPU load and 10% Mem ?
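
The rule set above can be sketched in code to show the conflict. The action labels and helper names are illustrative, and the sample uses 85% CPU (instead of exactly 80%) so that the strict `> 80` rule fires.

```python
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]
Rule = Tuple[Callable[[Metrics], bool], str]

SCALE_OUT = "scale-out +1 VM"
SCALE_IN = "scale-in -1 VM"
SCALE_UP = "scale-up CPU"

RULES: List[Rule] = [
    (lambda m: m["cpu"] > 80, SCALE_OUT),
    (lambda m: m["mem"] > 85, SCALE_OUT),
    (lambda m: m["ccc"] > 1000, SCALE_UP),
    (lambda m: m["cpu"] < 20, SCALE_IN),
    (lambda m: m["mem"] < 25, SCALE_IN),
]


def triggered_actions(metrics: Metrics) -> List[str]:
    """Collect the distinct actions whose conditions hold, in rule order."""
    actions: List[str] = []
    for condition, action in RULES:
        if condition(metrics) and action not in actions:
            actions.append(action)
    return actions


def has_conflict(actions: List[str]) -> bool:
    """Scaling out and scaling in at the same time contradict each other."""
    return SCALE_OUT in actions and SCALE_IN in actions


actions = triggered_actions({"cpu": 85, "mem": 10, "ccc": 200})
conflict = has_conflict(actions)
```

High CPU asks for one more VM while low memory asks for one fewer, so something has to decide which rule wins.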

Auto Scaling DS

DS needs a conflict resolver.

Approaches:
• Rule-conflict check before applying auto-scaling: NetScaler, IBM Cloud.
• Use an algorithm for decision support:
• Neural Network
• Fuzzy Logic
• Neuro-Fuzzy (ANFIS)

Auto Scaling Scaling Engine

Based on the cloud computing platform you use/manage.
• OpenStack: Heat, Ceilometer, Mistral.
• Docker: Marathon, Swarm, Mesos.
• CloudStack: VR.
• Azure (Hyper-V): Dynamic Memory API.
• AWS: CloudWatch & Auto-Scaling.
• VMWare VDI/Cloud: Memory Overcommit.

#5 Problem: Deploy VDI solution for different departments whose identity/authorization system is not the same.

◉ Migrate old identity data to VDI identity system and abandon the old one ?

◉ Implement new module in VDI identity system to interact with the old mechanism ?

◉ Implement some IdM solutions (SSO, OpenID, STS..) for both VDI and old identity system ?

#5 Solution:

1. Federated Identity pattern (~ Federated Cloud)

Federated Identity

When: Deploying an app (multiple services) across multiple clouds (IaaS) or on multiple platforms (SaaS).

What: Implement an authentication mechanism that uses federated identity: separate user authentication from the application code, and delegate authentication to a trusted identity provider.

How:

Federated Identity in VDI Env

1. The client authenticates with its OWN identity provider (e.g. AD/LDAP) and receives an issued token.

2. The client forwards this token to the CB federation provider (e.g. KeyStone) and gets back a token valid for the VDI init phase.

3. The federation provider transforms the claims in the token into the VDI CB authorization mechanism.

4. The client applies the authorization rules of VDI remote access with the new token from the federation provider.

Benefit ?

Benefits

◎ Reduce cost (IN THE FUTURE)
◉ HW maintenance
◉ Troubleshooting problems (network, OS..)
◉ Human resources

◎ Centralized management (network, security, resources, session)

◎ Cloud benefits (HA, HS..)

Cost

The initial cost is often VERY HIGH

(depending on system design, application design and how big your organization is)

“Cost Saving/Reduce cost” only starts to apply at least 1 year after deploying VDI

Which costs to reduce:

◎ HW maintenance
◎ PC maintenance
◎ Human resources (network, sysadmins)
◎ Time (troubleshooting time, maintenance time..)

VDI Report

‘The state of the VDI and SBC union’ report, running from Feb 12 2015. About 519 participants completed the full survey. Participants come from US, UK, The Netherlands, Germany and 20+ other countries.

VDI Report

VDI Process

POC

TCO (Total cost of ownership) Calculate

Deploying

VDI TCO

TOWARD THE CLOUD

Ref

◎ Cloud Design Pattern - MS
◎ AWS Cloud Design Pattern [1]
◎ Pacific Asia Cloud report - 4th Meetup VietStack [2]
◎ VMWare Cloud index [3]
◎ Microservices vs Enterprise service bus by voxxed [4]
◎ Plugin Architecture in Wikipedia
◎ IdM in Wikipedia
◎ ANFIS in Wikipedia

[1]: http://en.clouddesignpattern.org/
[2]: http://vietopenstack.org/2015/05/09/tong-quan-thi-truong-dien-toan-dam-may-tai-chau-a-thai-binh-duong-va-viet-nam/
[3]: http://info.vmware.com/content/APAC_APJ_Enterprise_Cloud_Index_2013
[4]: https://www.voxxed.com/blog/2015/01/good-microservices-architectures-death-enterprise-service-bus-part-one/

Thanks!

Any questions?

Competing Consumer

Priority Queue

When: Services/apps handle multiple kinds of messages that differ in time/resource consumption.

What: Mark priority and select suitable consumers for each message.

How:
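
A minimal sketch using Python's `heapq`, assuming lower numbers mean higher priority and that messages of equal priority should keep their arrival order. The class name and the sample messages are illustrative.

```python
import heapq
import itertools
from typing import List, Tuple


class PriorityMessageQueue:
    """Delivers the lowest priority number first, FIFO within a priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._order = itertools.count()  # tie-breaker that keeps arrival order

    def put(self, message: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), message))

    def get(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityMessageQueue()
queue.put("refresh usage report", priority=2)
queue.put("start session for user A", priority=0)
queue.put("resize VM", priority=1)
queue.put("start session for user B", priority=0)

handled = [queue.get() for _ in range(len(queue))]
```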

Leader Election

When: Multiple instances/services do the same task and cause data inconsistency.

What: Select one instance as leader to command the other instances/services.

How:

1: An instance requests the mutex from the BlobDistributedMutex object and is elected the leader.

2: Other instances request the mutex to run the task and are blocked.

3: The leader runs a task that coordinates the work of the subordinate instances.

4: The mutex in the leader periodically renews the lease.
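
The lease mechanics in the steps above can be sketched with an in-memory stand-in. `LeaseMutex` is a hypothetical class, not the BlobDistributedMutex API: it runs in a single process, uses an injectable clock, and returns False to a non-leader instead of blocking it.

```python
import time
from typing import Callable, Optional


class LeaseMutex:
    """In-memory stand-in for a distributed lease (not a real distributed lock)."""

    def __init__(self, lease_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lease = lease_seconds
        self._clock = clock
        self._holder: Optional[str] = None
        self._expires_at = 0.0

    def try_acquire(self, instance_id: str) -> bool:
        """Acquire or renew the lease; True means instance_id is the leader."""
        now = self._clock()
        free = self._holder is None or now >= self._expires_at
        if free or self._holder == instance_id:
            self._holder = instance_id
            self._expires_at = now + self._lease
            return True
        return False


now = [0.0]  # fake clock, advanced by hand
mutex = LeaseMutex(lease_seconds=10.0, clock=lambda: now[0])

a_leads = mutex.try_acquire("cb-a")          # step 1: first caller is elected
b_leads = mutex.try_acquire("cb-b")          # step 2: others are refused
now[0] = 5.0
a_renews = mutex.try_acquire("cb-a")         # step 4: leader renews the lease
now[0] = 12.0
b_still_waits = mutex.try_acquire("cb-b")    # lease now runs until t=15
now[0] = 20.0
b_takes_over = mutex.try_acquire("cb-b")     # leader stopped renewing
```

If the leader crashes and stops renewing, the lease expires and another instance can take over, which is what the last call shows.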
