Analytical Compute Grid (ACG): Elastic “Big Data” Infrastructure, by Natasha Gajic | Big Data on Open Cloud | 6/20/22

Rackspace Analytical Compute Grid (ACG)

DESCRIPTION

Rackspace’s Enterprise Business Intelligence (EBI) group was looking for a cost-effective way to support the reporting and information needs of its internal users, which include business and operations personnel. The group also wanted to scale out new infrastructure to meet increasing business demands, house growing amounts of data, and customize data collection, while moving away from its legacy data warehouse solution. To do this, Rackspace built the Analytical Compute Grid (ACG) using Hadoop, Cassandra, and PostgreSQL on an OpenStack cloud. Read more about it in this presentation.

Citation preview

Page 1: Rackspace Analytical Compute Grid (ACG)

April 12, 2023

Analytical Compute Grid (ACG)

Elastic “Big Data” Infrastructure

by Natasha Gajic

Big Data on Open Cloud

Page 2: Rackspace Analytical Compute Grid (ACG)

RACKSPACE® HOSTING | WWW.RACKSPACE.COM

Rackspace’s EBI Environment

Current Environment

• Windows and Linux operating systems
• Oracle and Microsoft database solutions
• Microsoft and Oracle replication technology
• SSIS
• Informatica
• Dedicated servers
• Rapid data set growth

“Big Data” Problem

• Cost of purchasing additional licenses
• Time required to set up new hardware
• Increased demand for DBA resources
• System performance
• System scalability
• Capacity

Page 3: Rackspace Analytical Compute Grid (ACG)

Analytical Compute Grid (ACG) Features

• Host ever-growing set of data
• Quick data collection and retrieval
• Rapid scalability
• Ease of maintenance
• Provide standard data access API

Page 4: Rackspace Analytical Compute Grid (ACG)

Analytical Compute Grid (ACG) Features

•Ability to provide variety of storage types:

• Columnar

• Relational

• HDFS

•Enable users to select optimal storage type for information collected

•Leverage Rackspace® Private Cloud powered by OpenStack® and open source technology

Page 5: Rackspace Analytical Compute Grid (ACG)

Analytical Compute Grid (ACG) Quality Attributes

Page 6: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

High Level Architecture

Page 7: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack® 

Page 8: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Image

Page 9: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Database Engine Selection

Columnar Cassandra

Relational PostgreSQL

HDFS Hadoop

Page 10: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Node

Page 11: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Node

Page 12: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Node

Page 13: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Node

Page 14: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Controller

Page 15: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Controller

Page 16: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Controller

Page 17: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

API

Page 18: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Indexing Structure

Page 19: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Indexing Structure

Page 20: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Indexing Structure

What is ACG Indexing Structure?

• System entry point

• Set of pointers ultimately addressing database entities

Page 21: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Indexing Structure

Where is Indexing Structure Located?

• It is a part of ACG, so it resides on Open Cloud
• ACG Controller manages Indexing Structure

Page 22: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Indexing Structure

What Does the ACG Indexing Structure Enable?

• Splitting of large data sets across many instances
• Query parallelization
• Controlled data store size
• Optimal data store configuration
• Uniform access to data residing in various storage types
• System scalability, as it expands horizontally and vertically to address an ever-growing data set
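The slides do not show how the indexing structure is implemented. As a rough illustration only, the sketch below treats it as an entry point holding pointers from a data set name to the data store instances that contain its rows, caps the size of each store, and fans a query out to all stores in parallel. The class names, the row cap, and the in-memory stores are all assumptions made for this example, not part of ACG.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """Hypothetical stand-in for one ACG data store instance."""
    name: str
    rows: list = field(default_factory=list)

    def query(self, predicate):
        return [row for row in self.rows if predicate(row)]


class IndexingStructure:
    """Entry point that maps a data set name to the stores holding its rows."""

    def __init__(self, max_rows_per_store):
        self.max_rows_per_store = max_rows_per_store  # controlled store size
        self.pointers = {}  # data set name -> list of DataStore

    def insert(self, dataset, row):
        stores = self.pointers.setdefault(dataset, [])
        # Open a new store when there is none yet or the last one is full.
        if not stores or len(stores[-1].rows) >= self.max_rows_per_store:
            stores.append(DataStore(f"{dataset}-{len(stores)}"))
        stores[-1].rows.append(row)

    def query(self, dataset, predicate):
        stores = self.pointers.get(dataset, [])
        if not stores:
            return []
        # Fan the query out to every store in parallel, then merge the parts.
        with ThreadPoolExecutor(max_workers=len(stores)) as pool:
            parts = pool.map(lambda store: store.query(predicate), stores)
            return [row for part in parts for row in part]


index = IndexingStructure(max_rows_per_store=3)
for value in range(10):
    index.insert("monitoring", {"value": value})

store_count = len(index.pointers["monitoring"])
evens = index.query("monitoring", lambda row: row["value"] % 2 == 0)
```

With a cap of 3 rows per store, the 10 inserted rows are split across 4 stores, and the caller still issues a single query against the index rather than addressing each store.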

Page 23: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Quality Attributes

Page 24: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Quality Attributes - Performance

Rackspace® Private Cloud powered by OpenStack®

• Creates an ACG node in 30 seconds
• Creates ACG nodes concurrently
• Resizes ACG nodes by adding CPUs

Page 25: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Quality Attributes - Performance

ACG

Indexing structure and controlled data set size allow for:
• Quick data distribution
• Query parallelization

Page 26: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Quality Attributes – Availability

Rackspace® Private Cloud powered by OpenStack®

Rapidly replace failed ACG nodes

Page 27: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Quality Attributes – Availability

ACG

Deploys data store native availability mechanisms (replication, data distribution…)

Page 28: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Quality Attributes – Maintainability

Rackspace® Private Cloud powered by OpenStack®

Adding ACG nodes expands:
• Storage capacity
• CPU power
• Memory

No DBA or system administrator activity required

Page 29: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Quality Attributes – Maintainability

ACG

Controlled data set size enables:
• Optimal and stable data store configuration
• Reduced demand for managing data store objects
• Stable query execution plans

Page 30: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Quality Attributes – Flexibility

ACG

Variety of storage types:
• Columnar – Cassandra: time series data
• Relational – PostgreSQL: relational data
• HDFS – Hadoop: unstructured data

Ability to select optimal storage type for individual use case
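The pairing of data kinds and storage types on this slide can be read as a small lookup rule. The sketch below is only an illustration of that reading: the function name, the string labels, and the error handling are assumptions for the example, and the slides do not say how ACG's Rule Engine actually makes this choice.

```python
from enum import Enum


class StorageType(Enum):
    COLUMNAR = "Cassandra"
    RELATIONAL = "PostgreSQL"
    HDFS = "Hadoop"


def select_storage(data_kind: str) -> StorageType:
    """Map a data kind named on the slide to its storage type."""
    rules = {
        "time series": StorageType.COLUMNAR,
        "relational": StorageType.RELATIONAL,
        "unstructured": StorageType.HDFS,
    }
    try:
        return rules[data_kind.strip().lower()]
    except KeyError:
        # Hypothetical behavior: refuse data kinds the slide does not cover.
        raise ValueError(f"no storage rule for data kind: {data_kind!r}") from None


choice = select_storage("Time Series")
print(choice.name, choice.value)
```

Keeping the rule in one table means a new use case only needs a new entry, which matches the stated goal of picking the storage type per use case.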

Page 31: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Quality Attributes – Usability

ACG

Standard interfaces:
• SQL language
• JDBC API
• ODBC

ACG Management Console

ACG Monitoring Console

Loader utility implementing:
• Bulk Loader
• Insert Loader

Page 32: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Current State

Page 33: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Current State

ACG Controller

• ACG Manager
• Rule Engine
• Node Manager
• ACG Management Console
• ACG Monitoring

Columnar Implementation

• Data Store Controller
• JDBC extended to work with supercolumns
• Loader integrated with Informatica

Relational Implementation

• Data Store Controller
• JDBC driver extended with distributed query rewrite
• Loader integrated with Informatica
• ODBC (In Progress)

HDFS Implementation

•Will start soon
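The slides mention a JDBC driver "extended with distributed query rewrite" but give no detail. One common form of such a rewrite, shown here purely as an assumed example, is turning an aggregate that cannot be merged directly (AVG) into per-node partial aggregates (SUM and COUNT) that can. The sketch uses Python's built-in sqlite3 with in-memory databases as stand-ins for relational ACG nodes; the table and column names are hypothetical.

```python
import sqlite3


def make_node(values):
    """Create an in-memory database standing in for one relational node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metrics (latency_ms REAL)")
    conn.executemany("INSERT INTO metrics VALUES (?)", [(v,) for v in values])
    return conn


def distributed_avg(nodes):
    """Answer SELECT AVG(latency_ms) FROM metrics across all nodes.

    Per-node averages cannot simply be averaged again, so the query is
    rewritten to fetch SUM and COUNT from each node and combine them.
    """
    total, count = 0.0, 0
    for node in nodes:
        node_sum, node_count = node.execute(
            "SELECT COALESCE(SUM(latency_ms), 0), COUNT(*) FROM metrics"
        ).fetchone()
        total += node_sum
        count += node_count
    return total / count if count else None


nodes = [make_node([10, 20]), make_node([30]), make_node([40, 50, 60])]
average = distributed_avg(nodes)
print(average)
```

Averaging the three per-node averages (15, 30, 50) would give about 31.7, while the rewritten query gives the correct overall value of 35.0 for the six rows.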

Page 34: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Rackspace Use Case

Page 35: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Rackspace Use Case

Subject

• Complex availability calculation sourcing 3 months of monitoring data and creating 1 billion records in the initial calculation

Page 36: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Rackspace Use Case

Environment 1

• Data warehouse on a Microsoft SQL Server database
• SSIS data loading
• SQL Server with 24 CPUs and 250 GB RAM was dedicated to the initial calculation
• SQL Server stored procedure performed the calculation
• Source and result are stored in a traditional data warehouse structure

Page 37: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Rackspace Use Case

Environment 2

• ACG running two Cassandra clusters, 4 nodes each
• Informatica with Cassandra bulk loader
• Each ACG node has 2 CPUs and 8 GB RAM
• Java program running on an instance with 4 CPUs and 8 GB RAM
• Source and result are stored in a columnar structure suitable for time series data

Page 38: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Rackspace Use Case - Result

Calculation Duration

• Microsoft SQL Server calculation lasted 5 days
• ACG calculation completed in 3.5 hours

Storage Size

• Microsoft SQL Server: 500 GB
• ACG: 20 GB

Complexity of the Calculation

• A columnar data store is optimal for time series data. Sourcing from the columnar data store resulted in a relatively simple Java calculation process compared to the SQL Server stored procedure.
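To put the reported figures side by side, the short calculation below derives the ratios from the numbers on the use case slides. It assumes "5 days" means 120 hours of elapsed time and that the ACG environment consisted only of the eight Cassandra nodes plus the one instance running the Java program; the slides do not state either point explicitly.

```python
# Figures taken from the use case slides.
sql_server_hours = 5 * 24  # "5 days", assumed to be elapsed time
acg_hours = 3.5
sql_server_storage_gb = 500
acg_storage_gb = 20

speedup = sql_server_hours / acg_hours
storage_ratio = sql_server_storage_gb / acg_storage_gb

# Environment 2: two Cassandra clusters of 4 nodes, each node with 2 CPUs
# and 8 GB RAM, plus one instance with 4 CPUs and 8 GB RAM for the Java program.
acg_nodes = 2 * 4
acg_cpus = acg_nodes * 2 + 4
acg_ram_gb = acg_nodes * 8 + 8

print(f"speedup: {speedup:.1f}x")
print(f"storage reduction: {storage_ratio:.0f}x")
print(f"ACG total: {acg_cpus} CPUs, {acg_ram_gb} GB RAM")
```

Under these assumptions the ACG run was roughly 34 times faster and used 25 times less storage, on 20 CPUs and 72 GB RAM in total, compared with the 24 CPUs and 250 GB RAM of the dedicated SQL Server.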

Page 39: Rackspace Analytical Compute Grid (ACG)

ACG on Rackspace® Private Cloud powered by OpenStack®

Rackspace Use Case - Conclusion

Selecting the optimal data store for the use case resulted in:

• Substantial performance improvement
• Reduced storage demand
• Simplified processes
• Ability to process terabytes of data per day close to real-time and on-demand
• Improved trending and reporting, which:
  • enhances support capabilities
  • improves Rackspace customer experience
• Significant cost reduction

Page 40: Rackspace Analytical Compute Grid (ACG)

RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218

US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM

RACKSPACE® HOSTING | © RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM

Page 41: Rackspace Analytical Compute Grid (ACG)

ACG UI

Page 42: Rackspace Analytical Compute Grid (ACG)

ACG UI

Page 43: Rackspace Analytical Compute Grid (ACG)

ACG UI