Designing Big Data Systems Like a Pro

Smart Decisions: An Architecture Design Game

Humberto Cervantes, Serge Haziyev, Olha Hrytsay, Rick Kazman

September 2015

Presenters

Dr. Rick Kazman is a Professor at the University of Hawaii and a Principal Researcher at the Software Engineering Institute of Carnegie Mellon University (SEI). His primary research interests are software architecture, design and analysis tools, software visualization, and software engineering economics. Rick has created several highly influential methods and tools for architecture analysis, including the ATAM (Architecture Tradeoff Analysis Method).

Dr. Humberto Cervantes is a professor at Universidad Autónoma Metropolitana–Iztapalapa in Mexico City. His primary research interests include software architecture design methods and their adoption in industrial settings. Dr. Cervantes is also a consultant for software development companies in topics related to software architecture. He holds the Software Architecture Professional and ATAM Evaluator certificates from the SEI.

Serge Haziyev is VP of Software Architecture at SoftServe. Serge has more than 17 years of experience in enterprise-level solutions, including Big Data, SaaS/Cloud, SOA, and carrier-grade telecommunication services. He specializes in software architecture methodologies, architectural patterns, and software development practices for large and complex projects in multiple industry verticals, including healthcare.

Olha Hrytsay works as a BI/DW consultant at SoftServe, Inc., a leading global outsourced product and application development company. Olha has more than seven years of experience in building business intelligence, data warehousing, and big-data solutions for a number of global companies in the network security, health care, and finance business domains. Her current activities at SoftServe include leading the BI Center of Excellence as well as design and implementation of data warehousing, data visualization, and analytics solutions.


Agenda

Game Motivation

Game Motivation

This game illustrates the essentials of architecture design using an iterative method such as Attribute-Driven Design (ADD).

You will be competing against other software architects (or other teams) from rival companies, so you need to make smart design decisions or else your competitors will leave you behind!


Past Game Events

- SATURN 2015
- Architecture Gathering 2014
- SEI ACE Educators Workshop 2015

Game Inventory

1. Playing cards
2. Game scenario
3. Game board
4. Dice
5. Markers
6. Scorecard

Download materials at: www.smartdecisionsgame.com

Brief ADD Method Introduction

Design Concepts Catalog

Game Rules

ADD Step 1: Review Inputs

Let’s start by reviewing the inputs to the design process…

Input Requirements

1. Functional Requirements
2. Quality Attributes
3. Constraints

Game Rules

The game is played in rounds, each of which represents one design iteration.

For each round the game provides:
- Iteration goal (i.e. selected drivers)
- Element to refine

ADD Step 2: Establish iteration goal by selecting drivers

ADD Step 3: Choose one or more elements of the system to refine


Let’s Start!


Iteration 1 Goal: Logically Structure The System

Drivers for the iteration:
- Ad-Hoc Analysis
- Real-time Analysis
- Unstructured data processing
- Scalability
- Cost Economy

Element to refine: Big Data System

Game Rules

You will make the design decision of selecting design concepts:
- Reference architectures*
- Technology families*
- Specific technologies

* In the game these are considered a type of pattern

ADD Step 4: Choose one or more design concepts that satisfy the selected drivers

Game Rules: Design Concepts Cards

Each card shows:
- Name and type of design concept
- Influence on drivers

Card types: Technologies, Patterns

Iteration 1: Select 1 Reference Architecture Card

Alternatives:
• Extended Relational
• Pure Non-Relational
• Data Refinery
• Lambda Architecture

Fill The Scorecard

Record design decisions in (a), for example: Lambda Architecture

Fill (b) by adding the points for the drivers considered for the iteration, in this case:
- Ad-Hoc Analysis (2.5)
- Real-time Analysis (3)
- Unstructured data processing (3)
- Scalability (3)
- Cost Economy (3)

Each star on a card = 1 point

2.5 + 3 + 3 + 3 + 3 = 14.5

Some iterations require you to draw two cards. For these iterations you will need to:
- Record the name of both design concepts
- Add the points for both of the cards

Please note that some drivers may not be associated with both cards, for example:
- Performance (for Family and Technology)
- Compatibility (for Family)
- Reliability (for Technology)

In these cases, you only count points for the drivers that are associated with the card.
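The counting rule above can be sketched in a few lines of Python. This is only an illustration, assuming a card is modeled as a mapping from driver name to points; the function name `score_card` is hypothetical, the Lambda Architecture values come from the Iteration 1 scorecard example, and the family and technology card values are illustrative rather than copied from real cards.

```python
def score_card(card_points, iteration_drivers):
    """Sum a card's points over the drivers selected for the iteration.

    Drivers that the card does not list contribute nothing.
    """
    return sum(card_points[d] for d in iteration_drivers if d in card_points)


# Values from the Iteration 1 scorecard example (Lambda Architecture card).
lambda_card = {
    "Ad-Hoc Analysis": 2.5,
    "Real-time Analysis": 3,
    "Unstructured data processing": 3,
    "Scalability": 3,
    "Cost Economy": 3,
}
iteration_1_drivers = list(lambda_card)
lambda_score = score_card(lambda_card, iteration_1_drivers)

# Two-card iteration: illustrative point values, not taken from real cards.
family_card = {"Performance": 2, "Compatibility": 3}
technology_card = {"Performance": 2, "Reliability": 3}
two_card_drivers = ["Performance", "Compatibility", "Reliability"]
two_card_score = (
    score_card(family_card, two_card_drivers)
    + score_card(technology_card, two_card_drivers)
)

print(lambda_score)    # 14.5
print(two_card_score)  # 10
```

Reliability is skipped for the family card and Compatibility for the technology card, because neither card lists that driver.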

Introduction

ADD Step 5: Instantiate elements, allocate responsibilities and define interfaces

ADD Step 6: Sketch views and record design decisions

You will:
- Record the design decision
- Throw two dice to simulate how well you instantiate your selected design concepts

Fill The Scorecard

Roll two dice once, add or subtract points according to the dice table, and fill in (c).

[Table: dice roll outcomes]

Example scorecard so far:
- (a) Lambda Architecture
- (b) 2.5 + 3 + 3 + 3 + 3 = 14.5
- (c) +2

Introduction

We will review the decisions together. The first iteration will be reviewed now but the rest will be reviewed at the end.

ADD Step 7: Perform Analysis of Current Design and Review Iteration goal and Design Objective


Iteration 1 Review

Drivers scored: Ad-Hoc Analysis, Real-time Analysis, Unstructured data processing, Scalability, Cost Economy

| Design decision | Driver points | Bonus points | Comments |
|---|---|---|---|
| Extended Relational | 3+2+2+2+1=10 | -4 | This reference architecture is less appropriate for this solution, mostly because of cost and real-time analysis limitations. |
| Pure Non-Relational | 2+2.5+3+3+3=13.5 | | This reference architecture is closer to the goal than the others, except Lambda Architecture. |
| Lambda Architecture (Hybrid) | 2.5+3+3+3+3=14.5 | +2 | This is the most appropriate reference architecture for this solution! Of the provided reference architectures, Lambda Architecture promises the largest number of benefits, such as access to real-time and historical data at the same time. |
| Data Refinery (Hybrid) | 3+1+3+2+1=10 | -4 | This reference architecture is less appropriate for this solution, mostly because of cost and real-time analysis limitations. |
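As a quick check of the review, the alternatives can be ranked by driver points plus bonus points. The sketch below uses the figures from the Iteration 1 review; it assumes a blank bonus cell means 0 and leaves out the dice roll, which differs per player.

```python
# (design decision, driver points, bonus points) from the Iteration 1 review.
# Assumption: a blank bonus cell is treated as 0.
review = [
    ("Extended Relational", [3, 2, 2, 2, 1], -4),
    ("Pure Non-Relational", [2, 2.5, 3, 3, 3], 0),
    ("Lambda Architecture", [2.5, 3, 3, 3, 3], 2),
    ("Data Refinery", [3, 1, 3, 2, 1], -4),
]

# Total before the dice roll: sum of driver points plus bonus.
ranked = sorted(
    ((name, sum(points) + bonus) for name, points, bonus in review),
    key=lambda entry: entry[1],
    reverse=True,
)

for name, total in ranked:
    print(f"{name}: {total}")
```

Lambda Architecture comes out first with 16.5, followed by Pure Non-Relational with 13.5; the two remaining alternatives tie at 6.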


Fill The Scorecard

Add bonus points, if any, and fill in (d).

Sum the points and calculate the total for the iteration in (e).

Example scorecard:
- (a) Lambda Architecture
- (b) 2.5 + 3 + 3 + 3 + 3 = 14.5
- (c) +2
- (d) +2
- (e) 18.5
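The scorecard arithmetic for one iteration can be written as a small record type. This is a minimal sketch; the class and field names are hypothetical and simply mirror columns (a) through (e) of the scorecard.

```python
from dataclasses import dataclass


@dataclass
class ScorecardRow:
    decision: str         # (a) design decision(s) recorded for the iteration
    driver_points: float  # (b) sum of points for the iteration's drivers
    dice_points: float    # (c) points added or subtracted after the dice roll
    bonus_points: float   # (d) bonus points awarded during the review

    @property
    def total(self) -> float:
        """(e) total for the iteration."""
        return self.driver_points + self.dice_points + self.bonus_points


row = ScorecardRow("Lambda Architecture", 14.5, 2, 2)
print(row.total)  # 18.5
```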

Game Scenario: Big Data System

Data source: Web Servers
• Hundreds of servers
• Massive logs from multiple sources

24/7 Operations, Support Engineers, Developers: Real-time Dashboard (UC-1, UC-2)
• Real-time monitoring
• Full-text search

Management: Static Reports (UC-3)
• Historical static reports
• Available through BI corporate tool

Data Scientists/Analysts: Ad-Hoc Reports (UC-4)
• Raw and aggregated historical data
• Ad-hoc analysis
• Human-time queries

Use cases:
- UC-1: Monitor online services
- UC-2: Troubleshoot online service issues
- UC-3: Provide management reports
- UC-4: Provide ad-hoc data analytics

Big Data System: Quality Attributes and Constraints


1st Decision: Lambda Architecture

[Diagram: Lambda Architecture. Elements shown: Data Stream; Batch Layer (Master Dataset, Pre-Computing); Serving Layer (Batch Views); Speed Layer (Real-time Views); Query & Reporting]

Source: http://lambda-architecture.net/
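To make the roles of the layers concrete, here is a toy, in-memory Python sketch of the Lambda Architecture idea. It is an illustration only and not part of the game materials: the class `LambdaSketch`, its method names, and the per-server event counting are all assumptions, and a real system would also have to handle events that arrive while a batch run is in progress.

```python
from collections import Counter


class LambdaSketch:
    """Toy model: count log events per server across batch and speed layers."""

    def __init__(self):
        self.master_dataset = []        # append-only store of raw events
        self.batch_view = Counter()     # precomputed from the master dataset
        self.realtime_view = Counter()  # events seen since the last batch run

    def ingest(self, server):
        """Data stream: each event reaches both the batch and speed layers."""
        self.master_dataset.append(server)
        self.realtime_view[server] += 1

    def run_batch(self):
        """Batch layer: recompute the view from all raw data, then reset
        the real-time view, whose events are now covered by the batch view."""
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, server):
        """Query: merge the historical (batch) and recent (real-time) counts."""
        return self.batch_view[server] + self.realtime_view[server]


system = LambdaSketch()
for server in ["web-1", "web-2", "web-1"]:
    system.ingest(server)
system.run_batch()
system.ingest("web-1")         # arrives after the batch run
print(system.query("web-1"))   # 3
```

The query sees both the two events already folded into the batch view and the one recent event held only in the real-time view.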

Lambda Architecture: Design Iterations

- Iteration 2: Refine Data Stream element
- Iteration 3: Refine Master Dataset element
- Iteration 4: Refine Batch Views element
- Iteration 5: Refine Real-time Views element

Iteration 2: Data Stream Technology Alternatives

Iteration 2 Review

Family card: score Performance and Compatibility

| Design decision | Driver points | Bonus points | Comments |
|---|---|---|---|
| Data Collector | 2+3=5 | +2 | Additional bonus is added for extensibility |
| Distributed Message Broker | 3+1=4 | | |

Technology card: score Performance and Reliability

| Design decision | Driver points | Bonus points | Comments |
|---|---|---|---|
| Apache Flume | 2+2=4 | | |
| Logstash | 2+2=4 | | |
| Fluentd | 2+3=5 | | |
| RabbitMQ | 2+2=4 | | |
| Apache Kafka | 3+2=5 | +2 | Additional bonus for easier deployment and configuration compared with other alternatives |
| Amazon SQS | 0 | | Disqualified due to deployment constraint (support On-premise and Cloud) |
| Apache ActiveMQ | 2+2=4 | | |
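The two-card scoring for this iteration can be explored with a short Python sketch. It uses the driver and bonus points from the Iteration 2 review and rests on two assumptions that the slides do not state: that a technology must be paired with its own family, and that Flume, Logstash, and Fluentd belong to the Data Collector family while the rest are message brokers. Blank bonus cells are treated as 0, and the dice roll is left out.

```python
# Family totals: driver points plus bonus from the Iteration 2 review.
families = {
    "Data Collector": 5 + 2,
    "Distributed Message Broker": 4,
}

# Technology -> (assumed family, driver points plus bonus).
# None marks a disqualified technology.
technologies = {
    "Apache Flume": ("Data Collector", 4),
    "Logstash": ("Data Collector", 4),
    "Fluentd": ("Data Collector", 5),
    "RabbitMQ": ("Distributed Message Broker", 4),
    "Apache Kafka": ("Distributed Message Broker", 5 + 2),
    "Amazon SQS": ("Distributed Message Broker", None),  # deployment constraint
    "Apache ActiveMQ": ("Distributed Message Broker", 4),
}

# Combined score for every valid family + technology pair.
pairs = [
    (family, tech, families[family] + points)
    for tech, (family, points) in technologies.items()
    if points is not None
]

best = max(pairs, key=lambda pair: pair[2])
print(best)  # ('Data Collector', 'Fluentd', 12)
```

Under these assumptions the Data Collector family paired with Fluentd scores 12, one point ahead of Distributed Message Broker with Apache Kafka at 11, so the dice roll can still change the outcome.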


Game Result Sample

[Diagram: Lambda Architecture with the selected design concepts. Elements shown: Data Stream; Batch Layer (Master Dataset, Pre-Computing); Serving Layer (Batch Views); Speed Layer (Real-time Views); Query & Reporting]

QUESTIONS & ANSWERS

E-mail your questions to

[email protected]


OUR NEXT WEBINARS

- Oct 13: Outsourcing: Been there, done that, didn’t work out
- Nov 11: Software Application Management