dharmesh-kakadia
What we do
● Run an Indian-languages search engine
● Research
  ○ Information Extraction
  ○ Information Retrieval
  ○ Information Access
  ○ Virtualization and Cloud
● Users of
  ○ OpenStack
  ○ Hadoop
  ○ and many other FOSS projects
Problems
● Provisioning
  ○ Ad hoc
  ○ Time-consuming
  ○ Unmanaged
● User Management
  ○ No resource accounting
  ○ Access control
  ○ Usage restrictions
● Storage
  ○ Data reliability
  ○ Duplication
More Problems...
● Cluster
  ○ Terrible resource utilization
  ○ New deployments take too much time
  ○ Data redundancy
  ○ Non-optimal deployments
● Academic
  ○ No cloud platform for experimentation
  ○ No large-scale, sandboxed resource provisioning for students
OpenStack (KVM)
● 7 compute nodes (8 GB RAM, quad-core)
● 1 nova-volume (2 TB, RAID-1)
Swift
● 3 storage nodes (2 TB each)
OpenStack (LXC)
● 16 compute nodes (6 GB RAM, dual-core)
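The inventory above can be turned into aggregate capacity with a few lines of arithmetic. This is a sketch that is not part of the original deck. It assumes "quad-core" and "dual core" mean 4 and 2 cores per node. It also assumes Swift runs with three replicas per object, a common configuration that the slides do not state, so usable Swift space is taken as raw space divided by three, ignoring filesystem overhead.

```python
# Figures transcribed from the inventory above.
# Assumption: "quad-core" = 4 cores and "dual core" = 2 cores per node.
CLUSTERS = {
    "OpenStack (KVM)": {"nodes": 7, "ram_gb": 8, "cores": 4},
    "OpenStack (LXC)": {"nodes": 16, "ram_gb": 6, "cores": 2},
}


def aggregate(spec: dict) -> tuple:
    """Return (total RAM in GB, total physical cores) for a homogeneous cluster."""
    return spec["nodes"] * spec["ram_gb"], spec["nodes"] * spec["cores"]


for name, spec in CLUSTERS.items():
    ram_gb, cores = aggregate(spec)
    print(f"{name}: {ram_gb} GB RAM, {cores} cores")

# Swift: 3 storage nodes x 2 TB raw each.
# Assumption: 3 replicas per object (not stated on the slides), so usable
# space is raw / replicas, ignoring filesystem and ring overhead.
SWIFT_NODES, TB_PER_NODE, REPLICAS = 3, 2, 3
raw_tb = SWIFT_NODES * TB_PER_NODE
usable_tb = raw_tb / REPLICAS
print(f"Swift: {raw_tb} TB raw, about {usable_tb:g} TB usable")
```

Under these assumptions it reports 56 GB RAM and 28 cores for the KVM cluster, 96 GB RAM and 32 cores for the LXC cluster, and about 2 TB usable out of 6 TB raw for Swift.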
Provisioning
● Pre-configured images to get started quickly
● VMs of any capacity, available at any time (even 2 a.m. on a Sunday)
● Snapshots
User Management
● Resource restrictions using quotas
● Project-based collaboration and private resources
● Usage monitoring
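Quota-based restriction amounts to checking each request against per-project limits and then recording the usage. The class below is a hypothetical, simplified model written for illustration. It is not Nova's quota code, and the names `ProjectQuota`, `can_launch`, `launch` and `release` are made up. It limits instance count, cores and RAM, which are among the resources that Nova project quotas cover.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectQuota:
    """Hypothetical per-project quota model (not Nova's implementation)."""
    instances: int
    cores: int
    ram_mb: int
    used: dict = field(default_factory=lambda: {"instances": 0, "cores": 0, "ram_mb": 0})

    def can_launch(self, cores: int, ram_mb: int) -> bool:
        # A request is allowed only if every resource stays within its limit.
        return (self.used["instances"] + 1 <= self.instances
                and self.used["cores"] + cores <= self.cores
                and self.used["ram_mb"] + ram_mb <= self.ram_mb)

    def launch(self, cores: int, ram_mb: int) -> None:
        if not self.can_launch(cores, ram_mb):
            raise RuntimeError("quota exceeded")
        # Record usage so later requests (and usage reports) see it.
        self.used["instances"] += 1
        self.used["cores"] += cores
        self.used["ram_mb"] += ram_mb

    def release(self, cores: int, ram_mb: int) -> None:
        # Give resources back when a VM is deleted.
        self.used["instances"] -= 1
        self.used["cores"] -= cores
        self.used["ram_mb"] -= ram_mb


# Example: a student project limited to 2 VMs, 4 cores and 4 GB of RAM.
quota = ProjectQuota(instances=2, cores=4, ram_mb=4096)
quota.launch(cores=2, ram_mb=2048)
quota.launch(cores=2, ram_mb=2048)
print(quota.can_launch(cores=1, ram_mb=512))  # False: at the instance and core limits
```

A hard check like this is what lets one shared cluster host many projects without any single project starving the others. The `used` counters double as simple per-project usage data.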
Storage
This wasn't easy. We experimented with:
● nova-volume
● Swift (Diablo)
● GlusterFS
● Swift (Folsom) (current)
Storage
● Hadoop-compatible distributed storage
● Glance image store
● Desktop backup utility using CloudFuse
● Data reliability
● No more data fragmentation
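Swift's reliability comes from storing several replicas of each object on different devices, chosen through a consistent-hashing "ring". The sketch below imitates that idea for the three storage nodes listed earlier. As in Swift, the partition is taken from the top bits of an MD5 hash of the object path. Replica placement, however, is a plain round-robin stand-in for Swift's zone- and weight-aware ring builder. The node names, the account name `AUTH_demo`, the partition power and the replica count of 3 are all assumptions. Real Swift also mixes a secret hash-path prefix/suffix into the hash, which is omitted here.

```python
import hashlib
import struct

# Assumptions for illustration only: hypothetical node names, a small
# partition power, and three replicas (one per storage node).
NODES = ["storage1", "storage2", "storage3"]
PART_POWER = 8            # 2**8 = 256 partitions
REPLICAS = 3


def partition_for(account: str, container: str, obj: str) -> int:
    """Map an object path to a partition via the top PART_POWER bits of its MD5.

    Real Swift also mixes a secret hash-path prefix/suffix into the hash;
    that is omitted here.
    """
    path = f"/{account}/{container}/{obj}".encode("utf-8")
    top32 = struct.unpack_from(">I", hashlib.md5(path).digest())[0]
    return top32 >> (32 - PART_POWER)


def nodes_for(partition: int) -> list:
    """Place each replica of a partition on a distinct node.

    Round-robin is a stand-in for Swift's zone- and weight-aware ring builder.
    """
    if REPLICAS > len(NODES):
        raise ValueError("need at least one node per replica")
    return [NODES[(partition + r) % len(NODES)] for r in range(REPLICAS)]


part = partition_for("AUTH_demo", "backups", "home.tar.gz")
print(part, nodes_for(part))
```

Under these assumptions every partition lands on all three nodes, so losing any single node still leaves two copies of every object.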
OpenStack in Academia
● Research
  ○ Inter-cloud migration
  ○ Inter-cloud scheduling
  ○ Performance evaluation
● Resource provisioning for course assignments and projects
  ○ 3 courses
  ○ 350+ students
  ○ 20+ projects
HadoopStack
● Big Data processing on demand
● The entire Big Data ecosystem: the Hadoop family, Spark, Mahout, R
● Multi-cloud: OpenStack and AWS