Scaling Databases with DBIx::Router

Perrin HarkinsWe Also Walk Dogs

What is DBIx::Router?

Load-balancing

Failover

Sharding

Transparent

(Mostly.)

Why would you need this?

Web and app servers are easy to scale

Just add another dozen boxes

Why would you need this?

Databases not so much

Big iron

Commercial clustering solutions

Human sacrifice

Advice from Experts

Brad Fitzpatrick,“Inside LiveJournal’s Backend”

Cal Henderson,“Building Scalable Websites”

Jeremy Zawodny and Derek J Balling,

“High Performance MySQL”

Caching

Keep your hot data in a fast cache

You get this one for free, thanks to DBI::Gofer

Can use Cache::FastMmap, memcached, etc. through CHI

Read-only CopiesReplication to local server or remote slaves

Be careful of replication lag

(from MySQL 5.1 docs)

ShardingLarge data is split across multiple machines

Users A-L, M-Z

Logs by month

Consistent hashing algorithm

May involve directory server

JOINs are now the programmer’s problem

This is going to mean a custom database layer

And rewriting all your old code to use it

DBIx::Router tries to separate this plumbing

Enter DBI::Gofer

“A scalable stateless proxy architecture for DBI”

Bundles up requests, sends them over a transport, executes them, sends back results

Shoulders of Giants

Used by shopzilla.com and petfinder.com to pool connections

Lots of good stuff like caching, timeouts, tracing

DBIx::Router is a Gofer transport

But it executes the calls locally

Shoulders of Giants

CPAN makes everything easy

SQL::Statement parses SQL (!)

Config::Any solves the XML problem

Shoulders of Giants

How does it work?

DataSources

Individual (DSNs) or Group

Group handles failover

Also load-balancing

Single DataSource{ name => 'Master1', dsn => 'dbi:Pg:dbname=lolcats', user => ‘icanhas’, password => ‘ch33zeburger’,},

Group DataSource

{ name => 'ReadCluster', class => 'random', datasources => [ ‘Slave1’, ‘Slave2’ ],},

Group DataSource

{ name => 'ReadCluster', class => 'roundrobin', datasources => [ ‘Slave1’, ‘Slave2’ ],},

Group DataSource

{ name => 'ReadCluster', class => 'roundrobin', datasources => [ ‘Slave1’, ‘Slave2’ ], failover => 1, timeout => 8,},

Group DataSource

{ name => 'ReadWriteCluster', class => 'repeater', datasources => [ ‘Master1’, ‘Master2’ ],},

Group DataSource{ name => 'StoreShards', class => 'shard', type => 'list', table => 'orders', column => 'store_id', shards => [ { values => [ 1, 3, 5 ], datasource => 'EastCoast', }, { values => [ 2, 4, 6 ], datasource => 'WestCoast', }, ],},

Subclass DataSources

Custom auth schemes for the paranoid

Ye olde insane load-balancing scheme

Shards with a directory server

Map queries to DataSources

Organized in RuleLists

Most specific to least

Can fall back to pass-through

Rule: regex{ class => 'regex', datasource => 'ReadCluster', match => ['^ \s* SELECT \b '], not_match => ['\b FOR \s+ UPDATE \b '],},

Rule: readonly{ class => 'readonly', datasource => 'ReadCluster',},

Rule: parser{ class => 'parser', match => [ { structure => 'tables', operator => 'all', tokens => ['order_history’] },},

Operators: all, any, none, only

Structures: command, tables, columns

Rule: not{ class => 'not', rule => { class => ‘readonly’ } datasource => 'Master1',},

Rule: default{ class => 'default', datasource => 'Master1',},

{ datasources => [ { name => 'Master1', dsn => 'dbi:mysql:dbname=lolcats', user => undef, password => undef, }, { name => 'Slave1', dsn => 'dbi:mysql:dbname=zomg1', user => undef, password => undef, }, { name => 'Slave2', dsn => 'dbi:mysql:dbname=zomg2', user => undef, password => undef, }, { name => 'ReadCluster', class => 'random', datasources => [ 'Slave1', 'Slave2', ], failover => 1, timeout => 8, }, ], rules => [ { class => 'readonly', datasource => 'ReadCluster', }, { class => 'default', datasource => 'Master1', }, ],}

How do you run it?

DBI_AUTOPROXY=”dbi:Gofer: \ transport=DBIx::Router; \ conf=/path/to/conf.pl”

$dbh = DBI->connect("dbi:Gofer: \ transport=DBIx::Router; \ conf=/path/to/conf.pl; \ dsn=$original_dsn", $user, $passwd, \%attributes);

What’s the bad news?

AutoCommit

But Tim plans to fix that

No streaming results

Also fixable

Failover and sharding are tough to generalize

StatusHosted on Google Code

Mostly there, but some bits need work

Sharding

Failover

Needs user feedback

Needs more tests

Future Directions

Make routing decisions optionally sticky

Support explicit hints in method call attributes

Helps with sharding

Workaround for tricky queries that fool SQL::Parser

Thanks!http://code.google.com/p/dbix-router/

What else is out there?

Mostly faux DBD drivers

DBD::Multi

DBD::Multiplex

DBIx::HA

DBI::Role

Scaling Databases with DBIx::Router

Technology

Scaling Multicore Databases via Constrained Parallel Executionjinyang/pub/ic3-sigmod16.pdf · Scaling Multicore Databases via Constrained Parallel Execution Zhaoguo Wang, Shuai Mu,

Demystifying DBIx::Class

CS15-319 / 15-619 Cloud Computingmsakr/15619-f14/recitations/Recitation9_21Oct20… · Horizontal Scaling in Databases •Scenario Your company has a popular wiki-based website, which

Introduction to DBIx::Lite - Kyoto.pm tech talk #2

Scaling Multicore Databases via Constrained Parallel ExecutionScaling Multicore Databases via Constrained Parallel Execution TR2016-981 Zhaoguo Wang, Shuai Mu, Yang Cui, Han Yi †,

IoT ppt - Module 4€¦ · IoT Data Analytics Challenges Scaling problems : Continually send data, relational databases can grow incredibly large very quickly. Volatility of data:

March 3 - 6, 2019 Mesa, Arizona Archive...–Density scaling –Speed scaling –Power scaling –Cost scaling •CPU architectures are not scaling –Processor frequency scaling ended

Text Databases. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Data Mining Multimedia Databases Text databases Image and video

databases: Oracle - Dell...Reliable Datagram Sockets (RDS) Over InfiniBand can provide a horizontally scalable, high-performance alternative to traditional vertical scaling for enterprises

Scaling SQL and NoSQL Databases in the Cloud

DBIx-DataModel v2.0 in detail

HopsFS: Scaling Hierarchical File System Metadata Using ... - HopsFS: Scaling Hierarchic… · HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases Salman Niazi,

Data Science. Topics databases and data architectures databases in the real world –scaling, data quality, distributed machine learning/data mining/statistics

Mysql Cluster Scaling Web Databases Webinar Aug 26

Cloud Scalability Patterns · Scaling Up == Vertical Scaling Scaling Out == Horizontal Scaling •Architectural Decision –ig decision… hard to change. Scaling Up: Scaling the

Scaling and Hardware Provisioning for Databases (Lessons Learned … · 2017-10-02 · Scaling and Hardware Provisioning for Databases (Lessons Learned at Wikipedia) 19 Sometimes

Pedology Spatial Scaling for Digital Soil Mappingsmartdigiag.com/downloads/journal/malone2013_3.pdf · Pedology soil information databases can be achieved efficiently, and with quantifiable

Mostly Lazy DBIx::Class Testing

SomeSQL at Skyscanner - Scaling in a changing world of databases and hardware

DBIx::Class vs. DBix::DataModel