Designing Big Data Systems Like a Pro
Smart Decisions: An Architecture Design Game
Humberto Cervantes, Serge Haziyev, Olha Hrytsay, Rick Kazman
September 2015
Presenters
Dr. Rick Kazman is a Professor at the University of Hawaii and a Principal Researcher at the Software Engineering Institute of Carnegie Mellon University (SEI). His primary research interests are software architecture, design and analysis tools, software visualization, and software engineering economics. Rick has created several highly influential methods and tools for architecture analysis, including the ATAM (Architecture Tradeoff Analysis Method).
Dr. Humberto Cervantes is a professor at Universidad Autónoma Metropolitana–Iztapalapa in Mexico City. His primary research interests include software architecture design methods and their adoption in industrial settings. Dr. Cervantes is also a consultant for software development companies in topics related to software architecture. He holds the Software Architecture Professional and ATAM Evaluator certificates from the SEI.
Serge Haziyev is VP of Software Architecture at SoftServe. Serge has more than 17 years of experience in enterprise-level solutions including Big Data, SaaS/Cloud, SOA and carrier-grade telecommunication services. He specializes in software architecture methodologies, architectural patterns and software development practices for large and complex projects in multiple industry verticals, including healthcare.
Olha Hrytsay works as a BI/DW consultant at SoftServe, Inc., a leading global outsourced product and application development company. Olha has more than seven years of experience in building business intelligence, data warehousing, and big-data solutions for a number of global companies in the network security, health care, and finance business domains. Her current activities at SoftServe include leading the BI Center of Excellence as well as design and implementation of data warehousing, data visualization, and analytics solutions.
Game Motivation
This game intends to illustrate the essentials of architecture design using an iterative method such as ADD.
You will be competing against other software architects (or other teams) from rival companies, so you need to make smart design decisions or else your competitors will leave you behind!
Past Game Events
SATURN 2015
Architecture Gathering 2014
SEI ACE Educators Workshop 2015
Game Inventory
1. Playing cards
2. Game scenario
3. Game board
4. Dice
5. Markers
6. Scorecard
Download materials at: www.smartdecisionsgame.com
Game Rules
The game is played in rounds which represent the iterations.
For each round the game provides:
- Iteration goal (i.e. selected drivers)
- Element to refine
ADD Step 2: Establish iteration goal by selecting drivers
ADD Step 3: Choose one or more elements of the system to refine
Iteration 1 Goal: Logically Structure The System
Drivers for the iteration:
- Ad-Hoc Analysis
- Real-time Analysis
- Unstructured data processing
- Scalability
- Cost Economy
Element to refine: Big Data System
Game Rules
You will make the design decision of selecting design concepts:
- Reference architectures*
- Technology families*
- Specific technologies
* In the game, these are considered a type of pattern
ADD Step 4: Choose one or more design concepts that satisfy the selected drivers
Game Rules: Design Concepts Cards
Name and type of design concept
Influence on drivers
Technologies Patterns
Iteration 1 Goal: Logically Structure The System
Select 1 Reference Architecture Card
Drivers for the iteration:
- Ad-Hoc Analysis
- Real-time Analysis
- Unstructured data processing
- Scalability
- Cost Economy
Alternatives:
• Extended Relational
• Pure Non-Relational
• Data Refinery
• Lambda Architecture
Element to refine: Big Data System
Fill The Scorecard
Fill (b) by adding the points for the drivers considered for the iteration, in this case:
- Ad-Hoc Analysis (2.5)
- Real-time Analysis (3)
- Unstructured data processing (3)
- Scalability (3)
- Cost Economy (3)
Each symbol on a card = 1 point
2.5+3+3+3+3=14.5
Record design decisions in (a), in this case: Lambda Architecture
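The calculation for field (b) can be sketched in a few lines of Python. This is only an illustration of the arithmetic shown above; the names `lambda_card`, `iteration_drivers` and `driver_points` are hypothetical and not part of the game materials.

```python
# Driver points for the Lambda Architecture card, as listed in the slide.
lambda_card = {
    "Ad-Hoc Analysis": 2.5,
    "Real-time Analysis": 3,
    "Unstructured data processing": 3,
    "Scalability": 3,
    "Cost Economy": 3,
}

# Drivers selected for iteration 1.
iteration_drivers = [
    "Ad-Hoc Analysis",
    "Real-time Analysis",
    "Unstructured data processing",
    "Scalability",
    "Cost Economy",
]


def driver_points(card, drivers):
    """Sum the card's points for the drivers considered in this iteration."""
    return sum(card[d] for d in drivers if d in card)


points_b = driver_points(lambda_card, iteration_drivers)
print(points_b)  # 14.5, the value recorded in field (b)
```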
Some iterations require you to draw two cards. For these iterations you will need to:
- Record the name of both design concepts
- Add the points for both of the cards
Please note that some drivers may not be associated with both cards, for example:
- Performance (for Family and Technology)
- Compatibility (for Family)
- Reliability (for Technology)
In these cases, you only count points for the drivers that are associated with the card
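A minimal sketch of the two-card rule follows. The point values are the ones the Iteration 2 review gives for Distributed Message Broker (3+1) and Apache Kafka (3+2); reading each sum in the order of the drivers named for that card type is an assumption, and `CARD_DRIVERS` and `card_points` are hypothetical names.

```python
# Which drivers count for each card type, per the rule above.
CARD_DRIVERS = {
    "family": ("Performance", "Compatibility"),
    "technology": ("Performance", "Reliability"),
}


def card_points(card_type, points):
    """Count only the drivers that are associated with this type of card."""
    return sum(points.get(driver, 0) for driver in CARD_DRIVERS[card_type])


# Assumed mapping of the review sums onto drivers (order is an assumption).
family_card = {"Performance": 3, "Compatibility": 1}   # Distributed Message Broker
technology_card = {"Performance": 3, "Reliability": 2}  # Apache Kafka

total_points = (card_points("family", family_card)
                + card_points("technology", technology_card))
print(total_points)  # 9
```

Compatibility is never looked up on the technology card and Reliability is never looked up on the family card, which is the point of the rule.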
Introduction
ADD Step 5: Instantiate elements, allocate responsibilities and define interfaces
ADD Step 6: Sketch views and record design decisions
You will:
- Record the design decision
- Throw two dice to simulate how well you instantiate your selected design concepts
Fill The Scorecard
Roll two dice once, add or subtract points according to the following table, and fill (c).
Example: Lambda Architecture, driver points 2.5+3+3+3+3=14.5, dice result +2
Introduction
We will review the decisions together. The first iteration will be reviewed now but the rest will be reviewed at the end.
ADD Step 7: Perform Analysis of Current Design and Review Iteration goal and Design Objective
Iteration 1 Review
Design decision | Driver points | Bonus points | Comments
Extended Relational | 3+2+2+2+1=10 | -4 | This reference architecture is less appropriate for this solution mostly because of cost and real-time analysis limitation
Pure Non-Relational | 2+2.5+3+3+3=13.5 | | This reference architecture is closer to the goal than the others except Lambda Architecture
Lambda Architecture (Hybrid) | 2.5+3+3+3+3=14.5 | +2 | This is the most appropriate reference architecture for this solution! From the provided reference architectures Lambda Architecture promises the largest number of benefits, such as access to real-time and historical data at the same time.
Data Refinery (Hybrid) | 3+1+3+2+1=10 | -4 | This reference architecture is less appropriate for this solution mostly because of cost and real-time analysis limitation
Drivers scored: Ad-Hoc Analysis, Real-time Analysis, Unstructured data processing, Scalability, Cost Economy
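The review figures above can be checked and ranked with a short script. This is a sketch only: it treats the missing bonus for Pure Non-Relational as 0 (an assumption), leaves out the dice result, and uses the hypothetical names `review` and `review_score`.

```python
# Driver points and bonus points from the Iteration 1 review.
review = {
    "Extended Relational": ([3, 2, 2, 2, 1], -4),
    "Pure Non-Relational": ([2, 2.5, 3, 3, 3], 0),  # no bonus listed; 0 assumed
    "Lambda Architecture (Hybrid)": ([2.5, 3, 3, 3, 3], 2),
    "Data Refinery (Hybrid)": ([3, 1, 3, 2, 1], -4),
}


def review_score(points, bonus):
    """Driver points plus bonus points (the dice result is not included)."""
    return sum(points) + bonus


# Rank the alternatives from best to worst.
ranking = sorted(review, key=lambda name: review_score(*review[name]), reverse=True)
for name in ranking:
    print(name, review_score(*review[name]))
```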
Fill The Scorecard
Lambda Architecture: driver points 2.5+3+3+3+3=14.5, dice result +2
Add bonus points, if any, and fill (d): +2
Sum the points and calculate the total for the iteration in (e): 14.5+2+2=18.5
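One way to picture a scorecard row is as a small record whose total (e) is derived from the other fields. The class below is a hypothetical sketch, not part of the game materials; the field names are assumptions and only the (a) to (e) labels come from the scorecard.

```python
from dataclasses import dataclass


@dataclass
class ScorecardRow:
    decision: str         # (a) design decision(s) recorded for the iteration
    driver_points: float  # (b) sum of the points for the iteration's drivers
    dice_points: float    # (c) points added or subtracted after rolling the dice
    bonus_points: float   # (d) bonus points awarded during the review

    @property
    def total(self) -> float:
        """(e) total for the iteration."""
        return self.driver_points + self.dice_points + self.bonus_points


row = ScorecardRow("Lambda Architecture", 14.5, 2, 2)
print(row.total)  # 18.5
```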
Game Scenario: Big Data System
Data source: Web Servers
• Hundreds of servers
• Massive logs from multiple sources
24/7 Operations, Support Engineers, Developers: Real-time Dashboard (UC-1, 2)
• Real-time monitoring
• Full-text search
Management: Static Reports (UC-3)
• Historical static reports
• Available through BI corporate tool
Data Scientists/Analysts: Ad-Hoc Reports (UC-4)
• Raw and aggregated historical data
• Ad-hoc analysis
• Human-time queries
Use cases:
UC 1 - Monitor online services
UC 2 - Troubleshoot online service issues
UC 3 - Provide management reports
UC 4 - Provide ad-hoc data analytics
1st Decision: Lambda Architecture
[Figure: Lambda Architecture diagram showing the Batch Layer, Serving Layer and Speed Layer, with the elements Master Dataset, Data Stream, Real-time Views, Pre-Computing, Batch Views, and Query & Reporting]
Source: http://lambda-architecture.net/
Lambda Architecture: Design Iterations
[Figure: the Lambda Architecture diagram with elements numbered 2 to 5 for the iterations that refine them]
Iteration 2 – Refine Data Stream element
Iteration 3 – Refine Master Dataset element
Iteration 4 – Refine Batch Views element
Iteration 5 – Refine Real-time Views element
Iteration 2 Review
Family card: score Performance and Compatibility
Design decision | Driver points | Bonus points | Comments
Data Collector | 2+3=5 | +2 | Additional bonus is added for extensibility
Distributed Message Broker | 3+1=4 | |
Technology card: score Performance and Reliability
Design decision | Driver points | Bonus points | Comments
Apache Flume | 2+2=4 | |
Logstash | 2+2=4 | |
Fluentd | 2+3=5 | |
RabbitMQ | 2+2=4 | |
Apache Kafka | 3+2=5 | +2 | Additional bonus for easier deployment and configuration compared with the other alternatives
Amazon SQS | 0 | | Disqualified due to the deployment constraint (support On-premise and Cloud)
Apache ActiveMQ | 2+2=4 | |
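The technology scores, including the disqualification, can be tallied as in the sketch below. Empty bonus cells are treated as 0 (an assumption), the dice result is left out, and the names `technologies` and `technology_score` are hypothetical.

```python
# (name, driver points, bonus points, disqualified) from the Iteration 2 review.
technologies = [
    ("Apache Flume", [2, 2], 0, False),
    ("Logstash", [2, 2], 0, False),
    ("Fluentd", [2, 3], 0, False),
    ("RabbitMQ", [2, 2], 0, False),
    ("Apache Kafka", [3, 2], 2, False),
    ("Amazon SQS", [], 0, True),  # fails the on-premise and cloud constraint
    ("Apache ActiveMQ", [2, 2], 0, False),
]


def technology_score(points, bonus, disqualified):
    """A disqualified card scores 0; otherwise driver points plus bonus."""
    return 0 if disqualified else sum(points) + bonus


scores = {name: technology_score(points, bonus, dq)
          for name, points, bonus, dq in technologies}
best = max(scores, key=scores.get)
print(best, scores[best])  # Apache Kafka 7
```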
Game Result Sample
[Figure: the Lambda Architecture diagram (Batch Layer, Serving Layer, Speed Layer; Master Dataset, Data Stream, Real-time Views, Pre-Computing, Batch Views, Query & Reporting) illustrating a sample game result]