How we design data architecture
Mate Gulyas (GULYÁS MÁTÉ)
CTO & Co-Founder
@gulyasm
ARCHITECTURE?
●CODE ARCHITECTURE
●GENERAL INFRASTRUCTURE
●DATA INFRASTRUCTURE
ON THE NEXT EPISODE OF BIG DATA...
1. WHAT DO WE DESIGN FOR?
2. OUR STORY, OUR FAILURES
WHAT DO WE DESIGN FOR?
●SCALABILITY
●MAINTAINABILITY
●COST
SCALABILITY AND MAINTAINABILITY ARE RESULTS OF GOOD DESIGN
WHAT DO WE REALLY DESIGN FOR?
●SIMPLICITY
●RESILIENCY
●SMALL ITERATIONS
●SELF SERVICE
SIMPLICITY
SIMPLE THINGS SCALE WELL
SIMPLICITY
SIMPLE THINGS ARE EASY TO UNDERSTAND
SIMPLICITY
BORING TECHNOLOGY IS GOOD TECHNOLOGY
SMALL ITERATIONS
THE UNKNOWNS
●THE UNKNOWNS
●THE UNKNOWN UNKNOWNS
SMALL ITERATIONS
END RESULT
SELF SERVICE
YOUR SOFTWARE/IT INFRASTRUCTURE IMPACTS THE WHOLE ORGANIZATION
ENBRITE.LY DATA PLATFORM
Luigi TOOLS
Luigi + enbrite.ly extensions = Gabo Luigi
WORKFLOW ENGINE
Tools we created
GABO LUIGI
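A workflow engine in the Luigi style describes a pipeline as tasks that declare what they depend on and what output they produce; a task whose output already exists counts as complete and is skipped on the next run. The sketch below imitates that idea using only the Python standard library. It is not Luigi's API and not Gabo Luigi: the `Task` base class, the `build` function, and the two example tasks are hypothetical names chosen for illustration.

```python
import tempfile
from pathlib import Path


class Task:
    """Hypothetical stand-in for a workflow task (not Luigi's real class)."""

    def __init__(self, workdir: Path):
        self.workdir = workdir

    def requires(self):
        """Tasks that must be complete before this one runs."""
        return []

    def output(self) -> Path:
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self) -> bool:
        # A task is complete when its output file already exists.
        return self.output().exists()


class ExtractEvents(Task):
    def output(self):
        return self.workdir / "events.txt"

    def run(self):
        # Made-up sample data, one event name per line.
        self.output().write_text("click\nview\nclick\n")


class CountEvents(Task):
    def requires(self):
        return [ExtractEvents(self.workdir)]

    def output(self):
        return self.workdir / "counts.txt"

    def run(self):
        events = (self.workdir / "events.txt").read_text().split()
        counts = {name: events.count(name) for name in sorted(set(events))}
        self.output().write_text("\n".join(f"{k}={v}" for k, v in counts.items()))


def build(task, executed=None):
    """Run dependencies first and skip complete tasks; return names of tasks that ran."""
    executed = [] if executed is None else executed
    if task.complete():
        return executed
    for dependency in task.requires():
        build(dependency, executed)
    task.run()
    executed.append(type(task).__name__)
    return executed


workdir = Path(tempfile.mkdtemp())
first_run = build(CountEvents(workdir))   # runs both tasks, dependency first
second_run = build(CountEvents(workdir))  # nothing to do, outputs already exist
result = (workdir / "counts.txt").read_text()
```

Because completeness is derived from the output, re-running a failed pipeline only redoes the missing steps, which is one reason this style of engine suits small, repeatable iterations.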
Spark TOOLS
●0.5-4 TB daily data
●1-10B events
●Ad-hoc batch queries: 20 TB data
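A rough back-of-envelope reading of these volumes, as a sketch only. It assumes that the event counts are per day, that the low and high ends of the two ranges belong together, and that TB means a decimal terabyte; none of these is stated on the slide.

```python
TB = 10**12                 # assumption: decimal terabytes
SECONDS_PER_DAY = 86_400

daily_bytes_low, daily_bytes_high = 0.5 * TB, 4 * TB
daily_events_low, daily_events_high = 1e9, 10e9   # assumption: events per day

# Average event size on a quiet day and on a busy day.
bytes_per_event_low_day = daily_bytes_low / daily_events_low
bytes_per_event_high_day = daily_bytes_high / daily_events_high

# Average ingest rate on the busiest day (real peaks would be higher).
events_per_second_busy_day = round(daily_events_high / SECONDS_PER_DAY)

# How many days of raw data a 20 TB ad-hoc query would span.
adhoc_days_min = 20 * TB / daily_bytes_high
adhoc_days_max = 20 * TB / daily_bytes_low

print(bytes_per_event_low_day, bytes_per_event_high_day)
print(events_per_second_busy_day, adhoc_days_min, adhoc_days_max)
```

Under these assumptions an event averages a few hundred bytes, and a 20 TB ad-hoc query covers somewhere between about a week and several weeks of raw data.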
Spark TOOLS
●SPENT 3 MONTHS OPTIMIZING IT
●20+ NODE CLUSTERS
●UNIT TESTS
AWS TOOLS
●16 services
●110+ machines
●1-4 EMR clusters (1-20 nodes)
●100TB+ on S3
●All clients have separate infrastructure
HOW WE GOT HERE?
2014: MONOLITHIC PYTHON ANALYTICS
2015 JAN: EVALUATE BIG DATA TECHNOLOGIES
2015 SEP: STARTED WORK ON DP
2016 FEB: DP PRODUCTION READY
2016 JUL: SAAS DP
HAVE FUN!
PRACTICE AT HOME
WE ARE HIRING!