Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Database Integrity in Django: Safely Handling Critical Data
in Distributed Systems
Dealing with concurrency, money, and log-structured data in the Django ORM
Nick Sweeting (@theSquashSH)
Slides: github.com/pirate/django-concurrency-talk
NickSwee)ngTwi-er:theSquashSH|Github:pirateCo-Founder/[email protected]
WebuiltOddSlingers.com,afast,cleanonlinepoker
experiencemadewithDjango+Channels&React/Redux.Welearnedalotaboutdatabaseintegrityalongtheway.
Background
Disclaimer:Iamnotadistributedsystemsexpert.
Ijustthinkthey'reneat.
Itallstartswithasinglesalamislice.
Itendswithmillionsofdollarsmissing.
Dealingwithmoney float,Decimal,andmath
Avoidingconcurrencylinearizingwritesinaqueue
Dealingwithconcurrency transac)ons,locking,compare-and-swaps
Schemadesignlog-structureddata,minimizinglocks
Thebiggerpicture codelayout,storagelayer,NewSQLdatabases
Dealing With Money
float,Decimal,&math
Losingtrackoffrac)onalcents(akasalamislicing)
I know you're tempted,don't even try it...salami slicers all
get caught eventually
floatvsDecimal
>>>0.1+0.2==0.3False
RoundinginPython3
>>>round(1.5)
>>>round(2.5)
2
2
wat.
Dealingwithmoney float,Decimal,andmath
Avoidingconcurrencylinearizingwritesinaqueue
Dealingwithconcurrency transac)ons,locking,compare-and-swaps
Schemadesignlog-structureddata,minimizinglocks
Thebiggerpicture codelayout,storagelayer,NewSQLdatabases
Avoid Concurrency
Eliminatethedragons.
?
Whatisadistributedsystem?EveryDjangoapp.
>EachDjangorequesthandlerisaseparatethread
>Whathappenswhentwothreadstrytodosomethingcri?calatthesame?me?e.g.updateauser'sbalance
Otherthreadswri)ngwillbreakthebank
$0 instead of $-100
TheChallengeDealingwithconcurrentwriteconflicts
Asolu?on:removetheconcurrency
Strictlyorderallstatemuta)onsby)mestamp
Linearizeallthewritesintoasinglequeue
transactions = [ # timestamp condition action
(1523518620, "can_deposit(241)" , "deposit_usd(241, 50)"), (1523518634, "balance_gt(241, 50)", "buy_chips(241, 50)"), ]
Ifonlyonechangehappensata)me,
noconflic)ngwritescanoccur.
Executethewrites1by1inadedicatedprocess
while True: ts, condition, action = transaction_queue.pop() if eval(condition): eval(action)
(usingaRedisQueue,orDrama)q,Celery,etc.)
Don'tletanyotherprocessestouchthesametables.
Allchecks&writesarenowlinearized.
Ifyouvalueyoursanity,linearizecri)caltransac)onsintoasinglequeue
wheneverpossible.
Don'tevenwatchtherestofthetalk,juststopnow,really,youprobablydon'tneedconcurrency...
Eliminateconcurrencyatallcosts.
Dealingwithmoney float,Decimal,andmath
Avoidingconcurrencylinearizingwritesinaqueue
Dealingwithconcurrency transac)ons,locking,compare-and-swaps
Schemadesignlog-structureddata,minimizinglocks
Thebiggerpicture codelayout,storagelayer,NewSQLdatabases
Dealing With Concurrency
transac)ons,locking,compare-and-swaps
Iwarnedyouaboutthedragons...
❤
!
❤
❤
ToolstheORMprovides
>Atomictransac)onstransac?on.atomic()
>LockingModel.objects.select_for_update()
>Compare-and-swaps.filter(val=expected).update(val=new)
AtomicTransac)ons
with transaction.atomic(): thing = SomeModel.objects.create(...) other_thing = SomeModel.objects.create(...) if error_condition(...): raise Exception('Rolls back entire transaction')
Excep)onsrollbacktheen)retransac)onblock.
NeitherobjectwillbesavedtotheDB.
Transac)onscanbenested.
RowLocking
with transaction.atomic(): to_update = SomeModel.objects.select_for_update().filter(id=thing.id) ... to_update.update(val=new)
.select_for_update()allowsyoutolockrows
Lockingpreventsotherthreadsfromchangingtherowun)ltheendofthecurrenttransac)on,
whenthelockisreleased.
(pessimis)cconcurrency)
Atomiccompare-and-swaps
last_changed = obj.modified
...SomeModel.objects.filter(id=obj.id, modified=last_changed).update(val=new_val)
Onlyupdatesifthedbrowisunchangedbyotherthreads.>anymodifiedobjindbwilldifferfromourstalein-memoryobjts>filter()wontmatchanyrows,update()fails>overwri)ngnewerrowindbwithstaledataisprevented
Thisisveryhardtogetright,lockingisbe-erfor90%ofusecases!
(op)mis)cconcurrency)
HybridSolu)on
last_changed = obj.modified
... read phaseSomeModel.objects.select_for_update().filter(id=obj.id, modified=last_changed)... write phase
Bestofbothworlds>lockingislimitedtowrite-phaseonly>noneedforcomplexmul)-modelcompare-and-swaps
MVCCisusedinternallybyPostgreSQL
Alterna)ve:SQLgap-lockingw/filterqueryonindexedcol.
(op)mis)cconcurrency+pessimis)corMul)versionConcurrencyControl)
Dealingwithmoney float,Decimal,andmath
Avoidingconcurrencylinearizingwritesinaqueue
Dealingwithconcurrency transac)ons,locking,compare-and-swaps
Schemadesignlog-structureddata,minimizinglocks
Thebiggerpicture codelayout,storagelayer,NewSQLdatabases
Schema Design
log-structureddata,minimizinglocks
Whatislog-structureddata?Append-onlytablesvsmutabletables
>MutableexampleUser.balance=100
>Log-structuredexample(immutable,append-only)User.balance=()=>sum(BalanceTransfer.objects.filter(user=user).values_list('amt',flat=True))
Log-structuredstorageisafounda?onalbuildingblockofsafe,distributedsystems.
-Providesstrictorderingofwrites-Immutablelogofeverychange
-Abilitytoreverttoanypointin)me
See:redux,CouchDB,Redis
Butlog-structuredtablesmakelockinghard...
we'dhavetolocktheen)reBalanceTransfertabletopreventconcurrentprocessesfromaddingnewtransfersthatchangethetotal.
Howelsecanwepreventconcurrentwritesfromchanginga
user'sbalance?
Becauseanynewrowaddedcanchangethetotal,
Storeatotalseparatelyfromthelog,requiretheybeupdatedtogether
class UserBalance(models.model): user = models.OneToOneField(User) total = models.DecimalField(max_digits=20, decimal_places=2)
Asingle-rowlockmustnowbeobtainedonthetotalbeforeaddingnewBalanceTransferrowsforthatuser.
Fullexampleusinglocking
def send_money(src, dst, amt): with transaction.atomic(): # Lock balance rows, preventing other threads from making changes src_bal = UserBalance.objects.select_for_update().filter(id=src) dst_bal = UserBalance.objects.select_for_update().filter(id=dst) if src_bal[0].total < amt: raise Exception('Not enough balance to complete transaction')
# Update the totals and add a BalanceTransfer log row together BalanceTransfer.objects.create(src=src, dst=dst, amt=amt) src_bal.update(total=F('total') - amt) dst_bal.update(total=F('total') + amt)
Sidebenefit:noneedtoscanen)reBalanceTransfertableanymoretogetauser'sbalance
Log-structureddataisgreat,but...
itrequirescarefulthoughtto:
-minimizedetrimentalwhole-tablelocking-accessaggregatevalueswithoutscanning
Dealingwithmoney float,Decimal,andmath
Avoidingconcurrencylinearizingwritesinaqueue
Dealingwithconcurrency transac)ons,locking,compare-and-swaps
Schemadesignlog-structureddata,minimizinglocks
Thebiggerpicture codelayout,storagelayer,NewSQLdatabases
The bigger picture
codelayout,storagelayer,NewSQLdatabases
Whathappenswhenthebu-erfliesflipyourbits?
CodeLayout
>Onlyperformwritesviahelperfuncs,neverupdatemodelsdirectly>Putalltransac)onsinonefileforeasieraudi)ng&tes)ng
banking/transactions.py:def transfer_money(src, dst, amt): with transaction.atomic(): ...
def merge_accounts(user_a, user_b): with transaction.atomic(): ...
def archive_account(user): with transaction.atomic(): ...
from banking.transactions import transfer_money
...
OFFSITE BACKUPS. OFFSITE BACKUPS. SET UP YOUR OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. TEST YOUR OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS. OFFSITE BACKUPS.
HardwareLayerConcerns
>Bitflipsarecommon,useECCRAM(andZFS!)
>Thedatabasecan'tguaranteedataintegrityonitsown
>Streamingreplica)onorsnapshotstodooffsite-backups
>Synchronizingclocksbetweensystemsisveryhard
DatabaseIsola)onLevelsInsomemodes,par)altransac)onstatecanleakintootherthreads.
DATABASES = { 'OPTIONS': { 'isolation_level':psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE,
Highlycomplextopic,muchmoreinfocanbefoundelsewhere...
It'spossibletouseaseparatedatabasewithahigherisola)onlevelforcri)caldata
with transaction.atomic(using='default'): with transaction.atomic(using='banking'): MyModel_one(...).save(using='default') MyModel_two(...).save(using='banking')
Djangosupportstransac)onsacrossmul)pledatabases.
WhattheFutureLooksLike
Serializable,distributedSQLwithoutsharding.
SQLontopof>key:valstoreontopof>
rao-basedlog-structuredstorage
CockroachDB&TiDBworkwithPython
Dealingwithmoney float,Decimal,andmath
Avoidingconcurrencylinearizingwritesinaqueue
Dealingwithconcurrency transac)ons,locking,compare-and-swaps
Schemadesignlog-structureddata,minimizinglocks
Thebiggerpicture codelayout,storagelayer,NewSQLdatabases
The End
Keytakeaways:don'tstopworrying,butloveatomic()
Specialthanksto:DjangoCoreTeam&Contributors,
PyGothamOrganizers,AndrewGodwin,Aphyr,TylerNeely
FinalDisclaimer:I'mnotqualifiedtotellyouhowtodesignyourdistributedsystem.Getaprofessionalforthat.
Icanonlyshowyouchallengesandsolu)onsI'vediscoveredinmypersonal
adventureswithDjango.Pleaseletmeknowifyouhavecorrec)ons!
Monadicalishiringandtakinginvestmentrightnow!contactme:talks@swee)ng.me
Q&A
Slides & Info: github.com/pirate/django-concurrency-talk