Slides from Alistair Hann's presentation at the 2014 All Your Base Conference in Oxford, UK.
Synopsis: Skyscanner performs 200 million searches every month, and generated $7bn of downstream revenue last year. Scaling from zero brought many challenges, in the context of a continually growing variety of databases and hardware. Alistair will talk about how that rapid change has shaped Skyscanner’s data architecture and how they moved from just using SQL Server to a range of hardware and data technologies (Postgres, Couchbase, ElasticSearch, Hadoop).
(Unfortunately the animations were destroyed by the upload.)
SomeSQL: Scaling in a changing world of databases and hardware
Alistair Hann, CTO, Skyscanner
Buzzwords
Web 2.0
Year of Mobile
Big Data
NoSQL
Scaling the live pricing cache
Website Native Apps APIs and White Labels
Traditional Airlines Budget Airlines Online Travel Agencies
Prices +Timetables
Data Collection Services
1) Which websites should we show?
2) What prices do we already have cached?
3) Live update what we still need.
4) Clean up and save the new data
5) Return the prices to the user.
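The five steps above can be read as a cache-aside lookup. The following is a minimal Python sketch under stated assumptions, not Skyscanner's implementation: the cache is a plain dict, the list of websites (step 1) is passed in already decided, and `live_pricing`, `fetch_live` and the rounding used as "clean up" are all hypothetical names and choices made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cache key: (website, route). The cached value is a price.
Key = Tuple[str, str]


def live_pricing(
    route: str,
    websites: List[str],
    cache: Dict[Key, float],
    fetch_live: Callable[[str, str], float],
) -> Dict[str, float]:
    """Return one price per website for a route, updating the cache."""
    prices: Dict[str, float] = {}
    missing: List[str] = []

    # Steps 1-2: for each website we want to show, use the cached price if any.
    for site in websites:
        key = (site, route)
        if key in cache:
            prices[site] = cache[key]
        else:
            missing.append(site)

    # Step 3: live update only what the cache could not answer.
    for site in missing:
        price = fetch_live(site, route)
        # Step 4: clean up (here: round to 2 decimals) and save the new data.
        price = round(price, 2)
        cache[(site, route)] = price
        prices[site] = price

    # Step 5: return the prices to the user.
    return prices
```

Only the websites missing from the cache trigger a live fetch, which is the point of keeping the key/value cache in front of the data collection services.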
Live Pricing Service
Cached Prices (key/value)
2 bn itineraries and quotes
270 GB table
250 GB indices
2,000 quotes per second
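A rough back-of-envelope reading of these figures, as a Python sketch. It assumes that "2 bn" is the row count of the cached table and that GB means 10^9 bytes; neither assumption is stated on the slide.

```python
# Figures from the slide; the units interpretation is an assumption.
ROWS = 2_000_000_000          # 2 bn itineraries and quotes
TABLE_BYTES = 270 * 10**9     # 270 GB table
INDEX_BYTES = 250 * 10**9     # 250 GB indices

table_bytes_per_row = TABLE_BYTES / ROWS   # 135.0
index_bytes_per_row = INDEX_BYTES / ROWS   # 125.0
index_to_table_ratio = INDEX_BYTES / TABLE_BYTES

print(f"table bytes per row: {table_bytes_per_row}")
print(f"index bytes per row: {index_bytes_per_row}")
print(f"index size as a fraction of table size: {index_to_table_ratio:.2f}")
```

Under these assumptions the indices take almost as much space as the data itself, which is typical of a key/value workload held in a relational table.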
What we really needed
Consistency
Horizontal Scaling
Elasticity
Persistence
Speed
Resilience
Simplicity
Live Pricing Service
Cached Prices (key/value)
Beyond key value
Couchbase – Map Reduce Views
{ "website": { "published": true, "id": "affd", ... }, "office_id": "1", "city_id": "AUHA", "raw_data": [...], "address": "closing_time": "00:00", "routenodeid": "9618", "type": "office" }
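Couchbase views are defined by a map function and an optional reduce function over JSON documents like the one above (in Couchbase itself these are written in JavaScript). The Python sketch below only mimics the idea on in-memory dicts; `map_office` and `run_view` are hypothetical names, and the choice to count published offices per `city_id` is an illustrative assumption, not a view shown in the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_office(doc: dict) -> Iterator[Tuple[str, int]]:
    """Map step: emit (city_id, 1) for every published office document."""
    if doc.get("type") == "office" and doc.get("website", {}).get("published"):
        yield doc["city_id"], 1


def run_view(docs: Iterable[dict]) -> Dict[str, int]:
    """Group emitted values by key, then reduce each group by summing."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_office(doc):
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}
```

The result is a secondary index keyed on a field inside the document, which is what takes the store "beyond key value".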
What about the hardware?
Disk for VMs
cf. 250,000 IOPS on Fusion-io
Standard $0.03 / GB
Glacier $0.01 / GB
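To make the price gap concrete, here is a small Python sketch using the two per-GB rates from the slide. The slide does not state the billing period, and the labels presumably refer to cloud storage tiers; the 1,000 GB archive size and the `storage_cost_dollars` helper are hypothetical, chosen only for illustration.

```python
# Rates from the slide, held in whole cents to avoid float rounding noise.
STANDARD_CENTS_PER_GB = 3   # $0.03 / GB
GLACIER_CENTS_PER_GB = 1    # $0.01 / GB


def storage_cost_dollars(size_gb: int, cents_per_gb: int) -> float:
    """Cost in dollars of storing size_gb at the given per-GB rate."""
    return size_gb * cents_per_gb / 100


archive_gb = 1000  # hypothetical archive size
standard = storage_cost_dollars(archive_gb, STANDARD_CENTS_PER_GB)  # 30.0
glacier = storage_cost_dollars(archive_gb, GLACIER_CENTS_PER_GB)    # 10.0
print(f"Standard: ${standard:.2f}, Glacier: ${glacier:.2f}")
```

At these rates the archive tier costs one third of the standard tier for the same volume, which is why it suits a long-term archive that is rarely read.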
Quote Bus staging
UK1
Thrift
long-term archive
GZIP
queryable hierarchical
LZO
queryable flat
filter
Loader
GZIP
Quote Bus
UK2
Thrift
Loader
hierarchical
flat
Hadoop cluster or Elastic MapReduce
analysts query
load
analytical tools
feed
export
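The diagram labels above distinguish hierarchical from flat queryable data and a GZIP-compressed long-term archive. The Python sketch below illustrates those two ideas only. It is built on assumptions: the record shape (`route`, `quotes`, `website`, `price`) is hypothetical, JSON lines stand in for the Thrift serialization named on the slide, and `flatten` and `archive_gzip` are illustrative names rather than parts of the real loader.

```python
import gzip
import io
import json
from typing import Dict, Iterator, List


def flatten(itinerary: Dict) -> Iterator[Dict]:
    """Turn one hierarchical itinerary into one flat row per quote."""
    for quote in itinerary["quotes"]:
        yield {
            "route": itinerary["route"],
            "website": quote["website"],
            "price": quote["price"],
        }


def archive_gzip(rows: List[Dict]) -> bytes:
    """Serialise rows as JSON lines and GZIP-compress them in memory."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as handle:
        for row in rows:
            handle.write((json.dumps(row) + "\n").encode("utf-8"))
    return buffer.getvalue()
```

The flat form repeats the parent fields on every row, trading storage for simpler queries; the compressed archive keeps the full history cheaply.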
The death of the data warehouse
Fluentd
Graphite
Fluentd
Stitched events
Stitched events
Operational metrics
reporting
Kafka
Errors
Raw JSON events
Elastic MapReduce
Raw events
Trigger and view materialization
Indexes on the data
A distributed database…
Some things don’t change
CC Images courtesy of chaya760 on Flickr
We still face the same challenges
RAM and Disk i/o concerns
Administration
Security
Data insert and retrieval
Monitoring and alerting
Performance optimization
The report of my death was an exaggeration
Elastic Search
NoSQL
Microsoft SQL Server
Relational vs NoSQL
Edinburgh: Quartermile One, 15 Lauriston Place, Edinburgh EH3 9EN
Glasgow: 5th floor, 151-155 St Vincent St, Glasgow G2 5NW
Singapore: No. 08-01&04 & 09-04, 8th floor, Robinson Point, 39 Robinson Rd, Singapore
Beijing: Level 19, Tower E2, Oriental Plaza, No. 1 East Chang An Avenue, Dong Cheng District, Beijing 100738
Miami: 1395 Brickell Ave, Suite 900, Miami, Florida 33131
Barcelona: Torre NN, Calle Tarragona, 157, 4a Planta, Barcelona, 08014
thank you