BITS Pilani, Hyderabad Campus
BITS Pilani presentation
D. Powar, Lecturer, BITS-Pilani, Hyderabad Campus
SSZG527
Lecture 18
Cloud Computing
Lectures
Lecture No Objectives
Lecture 10 Capacity management
Lecture 11 Introduction to PAAS (Drupal, Wolf frameworks, force.com), 5 Principles of UI Design by AWS: MADPO Principles
Lecture 12 RAID (Redundant Array of Independent Disks)
Lecture 13 MapReduce - distributed programming framework, Pig, Hive
Lecture 14 Distributed File System (GFS, HDFS), cloud storage
Lecture 15 Multi-tenancy, 4 levels of multi-tenancy
Lecture 16 Cloud security
Lecture 17 OpenStack – a cloud computing operating system
MapReduce

Map:
– Accepts an input key/value pair
– Emits intermediate key/value pairs
Reduce:
– Accepts an intermediate key together with all of its values (key/value*)
– Emits output key/value pairs
Map + Reduce: Very big data -> MAP -> REDUCE -> Result

MapReduce Programming Model
Data type: key-value records
Map function:
(K_in, V_in) -> list(K_inter, V_inter)
Reduce function:
(K_inter, list(V_inter)) -> list(K_out, V_out)
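The two signatures of the programming model can be sketched as a tiny single-process driver. This is only an illustration of the data flow, not how Hadoop is implemented; the names `run_mapreduce`, `length_mapper`, and `count_reducer` are hypothetical.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run mapper over (K_in, V_in) records, group by K_inter, then reduce."""
    groups = defaultdict(list)
    # Map phase: each input record may emit any number of intermediate pairs.
    for k_in, v_in in records:
        for k_inter, v_inter in mapper(k_in, v_in):
            groups[k_inter].append(v_inter)
    # Reduce phase: each intermediate key is reduced with the list of its values.
    output = []
    for k_inter in sorted(groups):
        output.extend(reducer(k_inter, groups[k_inter]))
    return output


def length_mapper(k, v):
    # (K_in, V_in) -> list(K_inter, V_inter): key each value by its length.
    yield (len(v), v)


def count_reducer(k, values):
    # (K_inter, list(V_inter)) -> list(K_out, V_out): count values per key.
    yield (k, len(values))


records = [("hi", "test"), ("x", "quux"), ("y", "ab")]
result = run_mapreduce(records, length_mapper, count_reducer)
print(result)
```

In a real cluster the grouping step is the distributed shuffle; here a dictionary stands in for it.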

Examples
let map(k,v) = emit(k.toUpper(), v.toUpper())
– ("foo", "bar") -> ("FOO", "BAR")
– ("key2", "data") -> ("KEY2", "DATA")
let map(k,v) = foreach char c in v: emit(k, c)
– ("A", "cats") -> ("A", "c"), ("A", "a"), ("A", "t"), ("A", "s")
– ("B", "hi") -> ("B", "h"), ("B", "i")
let map(k,v) = if (isPrime(v)) then emit(k, v)
– ("foo", 7) -> ("foo", 7)
– ("test", 10) -> (nothing)
let map(k,v) = emit(v.length, v)
– ("hi", "test") -> (4, "test")
– ("x", "quux") -> (4, "quux")
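The four map examples can be written as Python generators, where `yield` plays the role of `emit`. The function names and the `is_prime` helper are assumptions made for this sketch.

```python
def map_upper(k, v):
    # One output pair per input pair.
    yield (k.upper(), v.upper())


def map_chars(k, v):
    # One output pair per character of the value.
    for c in v:
        yield (k, c)


def is_prime(n):
    # Trial division; assumes n is an integer.
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def map_primes(k, v):
    # Acts as a filter: may emit nothing at all.
    if is_prime(v):
        yield (k, v)


def map_length(k, v):
    # The output key does not have to be related to the input key.
    yield (len(v), v)


print(list(map_chars("A", "cats")))
print(list(map_primes("test", 10)))
```

A map function may therefore emit zero, one, or many pairs for each input pair.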

Example: Word Count
def mapper(line):
    # Emit (word, 1) for every word in the input line
    for word in line.split():
        yield (word, 1)

def reducer(key, values):
    # Sum the counts collected for one word
    yield (key, sum(values))
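
Word Count Execution
Stages: Input -> Map -> Shuffle & Sort -> Reduce -> Output
Input splits:
– the quick brown fox
– the fox ate the mouse
– how now brown cow
Map output (one Map task per split):
– Map 1: (the, 1), (brown, 1), (fox, 1), (quick, 1)
– Map 2: (the, 1), (fox, 1), (the, 1), (ate, 1), (mouse, 1)
– Map 3: (how, 1), (now, 1), (brown, 1), (cow, 1)
Reduce output (after Shuffle & Sort):
– Reduce 1: (brown, 2), (fox, 2), (how, 1), (now, 1), (the, 3)
– Reduce 2: (ate, 1), (cow, 1), (mouse, 1), (quick, 1)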
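The execution above can be replayed stage by stage in plain Python. This sketch runs in one process and uses a single reducer, so it does not reproduce how the figure partitions keys between its two Reduce tasks; the variable names are assumptions.

```python
from itertools import groupby
from operator import itemgetter

splits = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]


def mapper(line):
    for word in line.split():
        yield (word, 1)


def reducer(key, values):
    return (key, sum(values))


# Map: one list of (word, 1) pairs per input split.
map_outputs = [list(mapper(line)) for line in splits]

# Shuffle & sort: bring all pairs together and order them by key.
shuffled = sorted(
    (pair for output in map_outputs for pair in output), key=itemgetter(0)
)

# Reduce: every run of equal keys is handed to the reducer as one group.
counts = [
    reducer(word, [count for _, count in group])
    for word, group in groupby(shuffled, key=itemgetter(0))
]
print(counts)
```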

Word Count example code (Java)
http://hadoop.apache.org/docs/stable/mapred_tutorial.html
http://wiki.apache.org/hadoop/WordCount
Distributed File Systems

The Google File System
GFS stores a huge number of files, totaling many terabytes of data
Individual file characteristics:
– Very large, multiple gigabytes per file
– Files are updated by appending new entries to the end (faster than overwriting existing data)
– Files are virtually never modified (other than by appends) and virtually never deleted
– Files are mostly read-only

Google File System
Files are divided into large 64 MB chunks, and the chunks are distributed and replicated across many servers.
A couple of important details:
– The master maintains only a (file name, chunk server) table in main memory: minimal I/O
– Files are replicated using a primary-backup scheme; the master is kept out of the loop
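With fixed 64 MB chunks, finding the chunks that hold a byte range is plain integer arithmetic. The sketch below assumes 64 MB means 64 * 1024 * 1024 bytes; the in-memory table, the server names, and the function names are all hypothetical.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as on the slide


def chunks_for_range(offset, length):
    """Return the indexes of the chunks that cover [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))


# Hypothetical master table: (file name, chunk index) -> replica servers.
chunk_table = {
    ("/foo/bar", 0): ["cs1", "cs2", "cs3"],
    ("/foo/bar", 1): ["cs2", "cs3", "cs4"],
}


def servers_for(file_name, offset, length):
    """Look up the replica servers for every chunk touched by the range."""
    return [chunk_table[(file_name, i)] for i in chunks_for_range(offset, length)]


# A 2-byte read that straddles the first chunk boundary touches two chunks.
print(servers_for("/foo/bar", CHUNK_SIZE - 1, 2))
```

Because the table is small and held in memory, the lookup needs no disk I/O on the master; the data itself is then read from the chunk servers.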

HDFS??
Hadoop's Distributed File System (HDFS) is designed to reliably store very large files across machines in a large cluster.
It is inspired by the Google File System.
Hadoop DFS stores each file as a sequence of blocks; all blocks in a file except the last block are the same size.
Blocks belonging to a file are replicated for fault tolerance.
The block size and replication factor are configurable per file.
Files in HDFS are "write once" and have strictly one writer at any time.
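The rule that every block except the last has the same size can be sketched as follows. The 64 MB block size and replication factor of 3 used in the example are illustrative values, since both are configurable per file; the function names are hypothetical.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks that make up a file of file_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # only the last block may be smaller
    return sizes


def raw_storage(file_size, replication):
    """Total space consumed across the cluster when every block is replicated."""
    return file_size * replication


# Sizes are in MB to keep the numbers readable.
print(split_into_blocks(200, 64))
print(raw_storage(200, 3))
```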

Hadoop Distributed File System – Goals:
• Store large data sets
• Cope with hardware failure
• Emphasize streaming data access

From GFS to HDFS
Terminology differences:
– GFS master = Hadoop namenode
– GFS chunkservers = Hadoop datanodes
Functional differences:
– No file appends in HDFS (planned feature)
– HDFS performance is (likely) slower

HDFS Architecture
[Figure: HDFS architecture, adapted from (Ghemawat et al., SOSP 2003)]
– Components: Application with HDFS client; HDFS namenode holding the file namespace (e.g. /foo/bar -> block 3df2); HDFS datanodes, each on top of a Linux file system
– Client to namenode: (file name, block id); namenode to client: (block id, block location)
– Namenode to datanode: instructions to datanode; datanode to namenode: datanode state
– Client to datanode: (block id, byte range); datanode to client: block data
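The read path in the figure can be sketched with two small classes. This is a toy in-memory model, not the Hadoop API: the class names, method names, and the use of a block index in the client request are assumptions made for illustration.

```python
class DataNode:
    """Stores block contents; in HDFS these live on the local Linux file system."""

    def __init__(self):
        self.blocks = {}  # block id -> bytes

    def read(self, block_id, start, end):
        # Serves (block id, byte range) requests with block data.
        return self.blocks[block_id][start:end]


class NameNode:
    """Holds metadata only; no file data passes through it."""

    def __init__(self):
        self.namespace = {}  # file name -> ordered list of block ids
        self.locations = {}  # block id -> list of DataNode replicas

    def lookup(self, file_name, block_index):
        block_id = self.namespace[file_name][block_index]
        return block_id, self.locations[block_id]


def client_read(namenode, file_name, block_index, start, end):
    # Step 1: ask the namenode where the block lives.
    block_id, replicas = namenode.lookup(file_name, block_index)
    # Step 2: fetch the bytes directly from one of the datanodes.
    return replicas[0].read(block_id, start, end)


dn1, dn2 = DataNode(), DataNode()
dn1.blocks["3df2"] = dn2.blocks["3df2"] = b"hello hdfs"
nn = NameNode()
nn.namespace["/foo/bar"] = ["3df2"]
nn.locations["3df2"] = [dn1, dn2]
print(client_read(nn, "/foo/bar", 0, 0, 5))
```

Keeping the namenode out of the data path is what lets a single metadata server coordinate many datanodes.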

Namenode Responsibilities
Managing the file system namespace:
– Holds file/directory structure, metadata, file-to-block mapping, access permissions, etc.
Coordinating file operations:
– Directs clients to datanodes for reads and writes
– No data is moved through the namenode
Maintaining overall health:
– Periodic communication with the datanodes
– Garbage collection

Cloud???
Cloud storage is a model of networked online storage where data is stored in virtualized pools of storage.
Companies operate large data centers, and people who need their data hosted buy or lease storage capacity from them.
Cloud storage services may be accessed through a web service application programming interface (API), a cloud storage gateway, or a Web-based user interface.
It is difficult to pin down a canonical definition of cloud storage architecture, but object storage is reasonably analogous.

Multi-tenancy
Basic SaaS maturity model:
1. ad-hoc / custom
2. configurable single tenant
3. configurable multi-tenant
4. configurable multi-tenant (scalable)

Ad-hoc / customizable instances
Each customer has their own custom version of the software
Represents an enterprise data center where there are multiple instances and versions of the software
Each customer has their own binaries, as well as their own dedicated processes for running the application
Disadvantage: difficult to manage, since each customer needs their own management support

Configurable instances
All customers share the same version of the software (one copy for each customer)
Advantage: easy management, since there is a single version of the software to maintain

Configurable multi-tenant efficient instances
All customers share the same version of the software (only a single copy shared among all customers)
Advantage: easy management, since only a single instance is running

Configurable multi-tenant efficient instances (scalable)
All customers share the same version of the software (only a single copy shared among all customers)
Software is hosted on a cluster of computers, which allows the capacity of the system to scale almost limitlessly
Thus both the number of customers and the capacity can increase
Example: Gmail, Yahoo Mail, etc.
Disadvantage: shared storage problem

Share vs isolate:
– business model (can I monetise?)
– architectural model (can I do it?)
– operational model (can I guarantee SLAs?)
access control
meta-data

Authentication
Unlike traditional computer systems, the tenant specifies the valid users, and the cloud service provider authenticates them
Two basic approaches are used:
– Centralized authentication
– Decentralized authentication

Authentication (contd..)
Centralized authentication:
– Authentication is performed using a centralized user database
– The cloud admin gives the tenant admin rights to manage user accounts for that tenant
– Multiple (two) sign-on service
– Given the self-service nature of the cloud, it is more generally used
Decentralized authentication:
– Each tenant maintains their own user database, and needs to deploy a federation service that interfaces between that tenant’s authentication framework and the cloud system’s authentication service
– Single sign-on service

Resource sharing
The two major resources that need to be shared are storage and servers
Sharing storage resources (two types):
– File system
– Databases
Since file system storage is a well-known mechanism, we restrict our discussion to database storage

Database
There are two methods of sharing data in a single database:
– Dedicated tables per tenant
– Shared table
Dedicated tables per tenant:
– Each tenant stores their data in a separate set of tables, different from other tenants
– Example: www.mygarage.com portal
– Shows the way auto repair stores may each store their table as a separate file

Dedicated tables per tenant:
Three separate tables with the same columns (Car license | Service | Cost), one for each tenant:
– Best garage
– Friendly garage
– Honest garage
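A minimal sketch of the dedicated-tables approach, using Python's built-in `sqlite3` module with an in-memory database. The table names, helper functions, and sample rows are invented for illustration; only the column layout comes from the slide.

```python
import sqlite3

# One table per tenant, all with the same columns.
TENANTS = ["best_garage", "friendly_garage", "honest_garage"]

conn = sqlite3.connect(":memory:")
for tenant in TENANTS:
    conn.execute(
        f'CREATE TABLE "{tenant}" (car_license TEXT, service TEXT, cost REAL)'
    )


def add_repair(tenant, car_license, service, cost):
    # Table names cannot be bound as parameters, so check against the known list.
    if tenant not in TENANTS:
        raise ValueError(f"unknown tenant: {tenant}")
    conn.execute(
        f'INSERT INTO "{tenant}" VALUES (?, ?, ?)', (car_license, service, cost)
    )


def repairs_for(tenant):
    if tenant not in TENANTS:
        raise ValueError(f"unknown tenant: {tenant}")
    return conn.execute(
        f'SELECT car_license, service, cost FROM "{tenant}"'
    ).fetchall()


# Hypothetical sample rows.
add_repair("best_garage", "ABC-123", "oil change", 40.0)
add_repair("honest_garage", "XYZ-789", "brake pads", 120.0)
print(repairs_for("best_garage"))
```

Isolation comes from the table boundary itself: a query against one tenant's table cannot return another tenant's rows.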

Shared table:
The data for all the tenants is stored in the same table, in different rows.
One of the columns in the table identifies the tenant to which a particular row belongs
It is more space efficient than the previous approach
An auxiliary table, called a metadata table, stores information about the tenants

Shared table (contd..)
Data table 1 (only the Tenant ID values are filled in):
Tenant ID | Car license | Repair | Cost
1
2
2
1
3
2
Metadata table 1:
Tenant ID | Data
1 | Best garage
2 | Friendly garage
3 | Honest garage
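The shared table and its metadata table can be sketched with `sqlite3`. The tenant IDs and garage names follow the slide; the car licenses, repairs, and costs are made-up sample values, and the SQL table and function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE metadata (tenant_id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE repairs (
        tenant_id   INTEGER REFERENCES metadata(tenant_id),
        car_license TEXT,
        repair      TEXT,
        cost        REAL
    );
    """
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [(1, "Best garage"), (2, "Friendly garage"), (3, "Honest garage")],
)
# Rows of all tenants are interleaved in one table (hypothetical values).
conn.executemany(
    "INSERT INTO repairs VALUES (?, ?, ?, ?)",
    [
        (1, "ABC-123", "oil change", 40.0),
        (2, "XYZ-789", "brake pads", 120.0),
        (2, "JKL-456", "tyre rotation", 25.0),
        (1, "MNO-321", "battery", 90.0),
        (3, "PQR-654", "wheel alignment", 60.0),
        (2, "STU-987", "oil change", 45.0),
    ],
)


def repairs_for(tenant_id):
    # Every query must filter on the tenant ID column to keep tenants isolated.
    return conn.execute(
        "SELECT car_license, repair, cost FROM repairs "
        "WHERE tenant_id = ? ORDER BY rowid",
        (tenant_id,),
    ).fetchall()


print(repairs_for(2))
```

Unlike dedicated tables, isolation here depends on every query carrying the tenant filter.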

Data customization
It is important for the cloud infrastructure to support customization of the stored data, since different tenants are likely to want to store different data in their tables
In the dedicated table method, each tenant has their own table, and can therefore have a different schema
The difficulty is with the shared table approach. Three methods are used:
– Pre-allocated columns
– Name-value pair
– XML method

Pre-allocated columns
Space is reserved in the tables for custom columns, which tenants can use to define new columns
Salesforce.com reserves 500 columns
Some of the tenants may not use these columns
Disadvantage: there could be a lot of wasted space

Pre-allocated columns
Data table 1 (only the Tenant ID values are filled in):
Tenant ID | Car license | Service | Cost | Custom1 | Custom2
1
2
2
1
3
2
Metadata table 1:
Tenant ID | Tenant name | Custom1 name | Custom1 type
1 | Best garage | Service rating | int
2 | Friendly garage | Service manager | string
3 | Honest garage | |
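One way to read the two tables above: the data table stores every custom value in a generic column, and the metadata table says what that column means for each tenant. The sketch below uses `sqlite3`; the sample data rows, the `CASTS` mapping, and the function name are assumptions, while the metadata rows follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE metadata (
        tenant_id INTEGER PRIMARY KEY, tenant_name TEXT,
        custom1_name TEXT, custom1_type TEXT
    );
    CREATE TABLE data (
        tenant_id INTEGER, car_license TEXT, service TEXT, cost REAL,
        custom1 TEXT, custom2 TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        (1, "Best garage", "Service rating", "int"),
        (2, "Friendly garage", "Service manager", "string"),
        (3, "Honest garage", None, None),
    ],
)
# Hypothetical data rows; unused custom columns stay NULL (the wasted space).
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "ABC-123", "oil change", 40.0, "4", None),
        (2, "XYZ-789", "brake pads", 120.0, "Asha", None),
        (3, "JKL-456", "tyre rotation", 25.0, None, None),
    ],
)

CASTS = {"int": int, "string": str}


def records_for(tenant_id):
    """Return the tenant's rows with Custom1 renamed and typed per the metadata."""
    name, type_name = conn.execute(
        "SELECT custom1_name, custom1_type FROM metadata WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchone()
    records = []
    for car_license, raw in conn.execute(
        "SELECT car_license, custom1 FROM data WHERE tenant_id = ?", (tenant_id,)
    ):
        record = {"car_license": car_license}
        if name is not None and raw is not None:
            record[name] = CASTS[type_name](raw)
        records.append(record)
    return records


print(records_for(1))
```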

Name-value pair
The standard table has an extra column that is a pointer to a table of name-value pairs, which holds the additional custom fields for a record
The name-value pair table is also called a pivot table
This method overcomes the storage wastage of the previous method

Name-value pair (contd..)
Data table 1 (only the Tenant ID and pointer values are filled in):
Tenant ID | Car license | Service | Cost | Name-value pair record
1 | | | | 275
2
2
1
3
2
Data table 2:
Name-value pair | Name ID | Value
275 | 15 | 5.5
Metadata table 1:
Tenant ID | Data
1 | Best garage
2 | Friendly garage
3 | Honest garage
Metadata table 2:
Name ID | Name | Type
15 | Service rating | int
 | Service manager | string
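Resolving a record's custom fields in the name-value scheme takes a join between the name-value table and the table of field names. The sketch below uses `sqlite3`; record 275, Name ID 15, and the value 5.5 follow the slide, while the SQL table names, the other column values, and the function name are assumptions. Values are kept as text and are not cast according to the Type column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE data1 (
        tenant_id INTEGER, car_license TEXT, service TEXT, cost REAL,
        nv_record INTEGER              -- pointer into data2, NULL if unused
    );
    CREATE TABLE data2 (nv_record INTEGER, name_id INTEGER, value TEXT);
    CREATE TABLE metadata2 (name_id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    """
)
# The car licenses, services, and costs are hypothetical.
conn.execute("INSERT INTO data1 VALUES (1, 'ABC-123', 'oil change', 40.0, 275)")
conn.execute("INSERT INTO data1 VALUES (2, 'XYZ-789', 'brake pads', 120.0, NULL)")
conn.execute("INSERT INTO data2 VALUES (275, 15, '5.5')")
conn.execute("INSERT INTO metadata2 VALUES (15, 'Service rating', 'int')")


def custom_fields(car_license):
    """Follow the pointer from a data1 row to its named custom values."""
    rows = conn.execute(
        """
        SELECT m.name, d2.value
        FROM data1 AS d1
        JOIN data2 AS d2 ON d2.nv_record = d1.nv_record
        JOIN metadata2 AS m ON m.name_id = d2.name_id
        WHERE d1.car_license = ?
        """,
        (car_license,),
    ).fetchall()
    return dict(rows)


print(custom_fields("ABC-123"))
```

Rows without custom fields store only a NULL pointer, which is where the space saving over pre-allocated columns comes from; the price is the extra joins on every read.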
OpenStack – a cloud computing operating system

9 core components of OpenStack (Havana)
Nova - Compute Service
Swift - Object Storage Service
Glance - Image Service
Keystone - Identity Service
Horizon - UI (Dashboard) Service
Neutron (formerly Quantum) - Network connectivity Service
Cinder - Block Storage Service
Ceilometer - Metering/Monitoring Service (billing, benchmarking, scalability, and statistics purposes)
Heat - Orchestration Service (orchestrates multiple composite cloud applications)
OpenStack conceptual architecture

Table 1.1. OpenStack current services (Havana)
Service | Project name | Description
Dashboard | Horizon | Enables users to interact with OpenStack services to launch an instance, assign IP addresses, set access controls, and so on.
Compute | Nova | Provisions and manages large networks of virtual machines on demand.
Networking | Neutron | Enables network connectivity as a service among interface devices managed by other OpenStack services, usually Compute. Enables users to create and attach interfaces to networks. Has a pluggable architecture that supports many popular networking vendors and technologies.
Storage
Object Storage | Swift | Stores and gets files. Does not mount directories like a file server.
Block Storage | Cinder | Provides persistent block storage to guest virtual machines.
Shared services
Identity Service | Keystone | Provides authentication and authorization for the OpenStack services. Also provides a service catalog within a particular OpenStack cloud.
Image Service | Glance | Provides a registry of virtual machine images. Compute uses it to provision instances.
Metering/Monitoring Service | Ceilometer | Monitors and meters the OpenStack cloud for billing, benchmarking, scalability, and statistics purposes.
Higher-level services
Orchestration Service | Heat | Orchestrates multiple composite cloud applications by using either the native HOT template format or the AWS CloudFormation template format, through both an OpenStack-native REST API and a CloudFormation-compatible Query API.

Summary
– Capacity management
– Introduction to PAAS (Drupal, Wolf frameworks, force.com), 5 Principles of UI Design by AWS
– RAID (Redundant Array of Independent Disks)
– MapReduce - distributed programming framework, Pig, Hive
– Distributed File System (GFS, HDFS), cloud storage
– Multi-tenancy, 4 levels of multi-tenancy
– Cloud security
– OpenStack – a cloud computing operating system