How Atlassian Scales Bitbucket Data Center on AWS
FELIX HAEHNEL
DEVELOPER • ATLASSIAN
CHRIS SZMAJDA
SENIOR TEAM LEAD • ATLASSIAN
Bitbucket Data Center
Performance at scale
Auto-scaling clusters in AWS
Bitbucket Data Center @ Atlassian
Smart Mirrors
Disaster recovery
High availability and failover
Instant scalability
Scaling Bitbucket @ Atlassian · Clustering in AWS · Smart Mirrors · Disaster recovery
The Challenge: Scaling Bitbucket @ Atlassian
Why scale @ Atlassian?
Scale @ Atlassian
1,000s of users: More Atlassians using Bitbucket every day.
100,000s of builds: Continuous integration hammering Bitbucket around the clock.
Terabytes of data: Repositories and forks and moar, mostly on one central instance.
Global Atlassian teams
Sydney
San Francisco
Austin
Manila
Tokyo
Amsterdam
London
Gdańsk
[Chart: Repository data over time, in GBytes (values ranging from about 1,650 to 2,330)]
OMG CI load!
1,000s of agents
100,000s of builds
Petabytes served
Pull Request workflow
1,000s of repos
100,000s of PRs
500,000+ comments
Downtime
The Sky’s the Limit: Clustering in AWS
Clustering
What makes up a Data Center deployment?
Load balancer
Cluster nodes
Database
File Server (NFS)
Elasticsearch
[Diagram: Bitbucket Data Center deployment model]
Simple but powerful
No lock-in stacks.
Takes advantage of your infrastructure.
Scales as you grow.
Data Center, in AWS?
Why AWS?
Managed services: Saves time.
Elastic scale: Grow from micro to massive in minutes.
Total Cost of Ownership: Cheaper in the long run.
AWS Services
Auto Scaling: Auto Scaling Group, Elastic Load Balancer.
Relational Database Service (RDS): Automatic upgrades, Read Replicas, Multiple Availability Zones, Scale.
Elasticsearch Service (ES): Code and repository search, Scale, Security.
Elastic Block Store (EBS):
Zero Downtime Backup: Snapshot in under a second. Restore in minutes.
Performance: Low latency. Up to 16 Terabytes. Up to 20,000 IOPS.
Disaster Recovery: Copy snapshots offsite.
[Diagram: Bitbucket DC AWS deployment model: Amazon ELB, Auto Scaling Group, Amazon RDS, NFS Server, EBS volume, Amazon Elasticsearch]
AWS Availability Zones
Isolated data centers: Typically several miles apart.
Low latency: Connected by fast network links.
Within one Region: Equally visible across the whole Region.
Why more than one AZ?
[Diagram: ELB, Auto Scaling Group, RDS, EBS, and Elasticsearch Service spanning Availability Zones us-east-1a and us-east-1b]
Availability Zones for Bitbucket DC
Use two or more AZs.
ELB: Enable cross-zone load balancing.
RDS: Enable Multi-AZ deployment.
Auto Scaling Group: Deploy two or more nodes.
Elasticsearch Service: Enable Zone Awareness.
File server: Snapshot often. Consider Cloud NAS products for AWS.
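The checklist above can be turned into an automated review of a stack description. The sketch below is hypothetical: the `Deployment` fields and the 24-hour "snapshot often" threshold are illustrative assumptions, not AWS or Atlassian settings, and it inspects a plain Python object rather than calling any AWS API.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical description of a Bitbucket DC stack; field names are illustrative only.
    availability_zones: list
    elb_cross_zone: bool
    rds_multi_az: bool
    cluster_nodes: int
    es_zone_awareness: bool
    hours_since_last_snapshot: float


def az_checklist(d, max_snapshot_age_hours=24.0):
    """Return the checklist items this deployment does not yet satisfy."""
    problems = []
    if len(set(d.availability_zones)) < 2:
        problems.append("Use two or more AZs.")
    if not d.elb_cross_zone:
        problems.append("Enable cross-zone load balancing.")
    if not d.rds_multi_az:
        problems.append("Enable Multi-AZ deployment.")
    if d.cluster_nodes < 2:
        problems.append("Deploy two or more nodes.")
    if not d.es_zone_awareness:
        problems.append("Enable Zone Awareness.")
    # Assumed threshold: "often" is taken to mean at least once a day here.
    if d.hours_since_last_snapshot > max_snapshot_age_hours:
        problems.append("Snapshot often.")
    return problems


single_az = Deployment(["us-east-1a"], True, False, 4, True, 2.0)
print(az_checklist(single_az))  # ['Use two or more AZs.', 'Enable Multi-AZ deployment.']
```

Counting distinct zones with `set()` means listing the same AZ twice still fails the first check.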
What about scale?
Scaling in AWS
Elastic scale: AWS makes it easy to scale up capacity.
More nodes = greater capacity: Larger clusters can serve bigger workloads.
Load tests: Representing 10,000s of typical users and build agents.
[Chart: Throughput (TPS, 10 to 70) vs. concurrent requests (0 to 280) for clusters of 1, 2, 4, 8, and 12 nodes]
Scaling in AWS: Pro tips
Choose the right instances: Different instance types have different balances of CPU, memory, and I/O. Choose wisely.
IOPS = Happiness ☺: Bitbucket’s largest workload is Git. Give it enough IOPS.
Scale up, then out: Scale vertically with bigger instances first, then scale horizontally with more nodes.
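"Scale up, then out" can be read as a simple ordering rule. This is a hypothetical sketch: the size ladder and the `next_step` helper are illustrative, and real instance families differ in their CPU, memory, and I/O balance, so the ladder would need to be chosen per workload.

```python
# Hypothetical size ladder for one instance family; real families, sizes,
# and their CPU/memory/I/O balance vary, so check before choosing.
SIZES = ["large", "xlarge", "2xlarge", "4xlarge"]


def next_step(size, nodes, max_nodes):
    """Return the next (size, nodes) when more capacity is needed.

    Scale up first (a bigger instance), and only scale out (another node)
    once the largest size on the ladder is reached.
    """
    i = SIZES.index(size)
    if i + 1 < len(SIZES):
        return SIZES[i + 1], nodes   # scale up: vertical
    if nodes < max_nodes:
        return size, nodes + 1       # scale out: horizontal
    return size, nodes               # already at the configured ceiling


state = ("large", 2)
for _ in range(4):
    state = next_step(*state, max_nodes=8)
print(state)  # ('4xlarge', 3)
```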
How did we do it?
How we did it
Loose coupling: Minimizes node-to-node communication.
Stateless nodes: Don’t maintain any state that can’t be reconstructed from other sources.
Highly tuned caches: Keep “hot” data in fast local storage. Extremely light JVM memory footprint.
Storage optimizations: Minimize use of the shared file system.
How we did it
Source Code Management cache: Intercepts Git requests and caches the responses on each node (in AWS, on an Instance Store device).
Saves CPU and memory: Every cache hit saves a Git process.
Saves NFS traffic: Takes load off the shared file system.
Cluster aware: Lightweight cache consistency between nodes.
[Diagram: a Git request misses the SCM cache, runs git against NFS, and the response is cached; another Git request gets the cached response; a Git push sends invalidations to the cache. CI agents generate many Git requests: lotsa Git requests, lotsa cached responses.]
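The SCM cache can be modeled in a few lines to show why a cache hit saves a Git process and why pushes must invalidate every node. This is a simplified, hypothetical sketch and not Bitbucket's implementation: the class names, the `(repository, request)` cache key, and the `run_git` callable that stands in for spawning `git` against NFS are all assumptions.

```python
class ScmCacheNode:
    """One cluster node's local cache of Git responses, keyed by (repository, request)."""

    def __init__(self, name, run_git):
        self.name = name
        self.run_git = run_git   # callable (repo, request) -> response; stands in for spawning git
        self.entries = {}
        self.git_processes = 0

    def handle(self, repo, request):
        key = (repo, request)
        if key in self.entries:
            return self.entries[key]   # cache hit: no Git process, no NFS read
        self.git_processes += 1
        response = self.run_git(repo, request)
        self.entries[key] = response
        return response

    def invalidate(self, repo):
        # Drop every cached response for a repository whose contents changed.
        for key in [k for k in self.entries if k[0] == repo]:
            del self.entries[key]


class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def push(self, repo):
        # A push changes the repository, so every node is told to invalidate.
        for node in self.nodes:
            node.invalidate(repo)


store = {"repo": "v1"}


def fake_git(repo, request):
    return f"{request}@{store[repo]}"


a, b = ScmCacheNode("a", fake_git), ScmCacheNode("b", fake_git)
cluster = Cluster([a, b])
a.handle("repo", "clone")
a.handle("repo", "clone")            # second call is a cache hit
store["repo"] = "v2"
cluster.push("repo")
print(a.handle("repo", "clone"), a.git_processes)  # clone@v2 2
```

Each node keeps its own entries; only invalidations cross the cluster, which matches the "lightweight cache consistency" idea above.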
That’s great!
But how do I deploy it all?
Deploying in AWS
Auto Scaling Group
ELB
RDS instance
Elasticsearch
EC2 file server
Scaling policies
Security Group
VPC
Subnets
EBS volume
Provisioned IOPS
Instance Stores
IAM Role
Listeners
Manually create … ?
AWS CloudFormation
Deploy in a few clicks: Create, update, and delete stacks from the AWS Console or CLI.
One template = many resources: One JSON file that describes an entire stack of AWS resources: ELB, ASG, EC2, EBS, RDS, ES, …
[Diagram: AWS CloudFormation stack: Amazon ELB, Auto Scaling Group, Amazon RDS, NFS Server, EBS volume, Amazon Elasticsearch]
CloudFormation template (excerpt):
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Atlassian Bitbucket Data Center",
  "Metadata": { ... },
  "Parameters": { ... },
  "Resources": {
    "ClusterNodeGroup": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",
      ...
    },
    ...
  }
}
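Because a CloudFormation template is plain JSON, it can also be generated from code. The sketch below builds a skeleton shaped like the excerpt above; the `ClusterNodeMin`/`ClusterNodeMax` parameter names are hypothetical, and a deployable Auto Scaling group would also need a launch configuration or launch template and subnets, which are omitted.

```python
import json


def bitbucket_template(min_nodes=1, max_nodes=4):
    """Return a skeleton CloudFormation template as a dict.

    Hypothetical parameter names; not a complete, deployable stack.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Atlassian Bitbucket Data Center",
        "Parameters": {
            # CloudFormation parameter defaults are given as strings.
            "ClusterNodeMin": {"Type": "Number", "Default": str(min_nodes)},
            "ClusterNodeMax": {"Type": "Number", "Default": str(max_nodes)},
        },
        "Resources": {
            "ClusterNodeGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    # Ref resolves to the parameter value at stack creation time.
                    "MinSize": {"Ref": "ClusterNodeMin"},
                    "MaxSize": {"Ref": "ClusterNodeMax"},
                },
            },
        },
    }


print(json.dumps(bitbucket_template(), indent=2))
```

Once completed with the missing resources, the printed JSON could be saved to a file and passed to `aws cloudformation create-stack --stack-name <stack-name> --template-body file://template.json`.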
Bitbucket Amazon Machine Image (AMI)
Based on Amazon Linux.
Any version of Bitbucket.
Multi-role: “Bitbucket Server”, “Cluster node”, “File server”.
Public in all AWS Regions.
https://bitbucket.org/atlassian/atlassian-aws-deployment
AWS Quick Start
Single click deploy: Complete CloudFormation template for Bitbucket Data Center and VPC.
Best practices: Designed by Amazon solutions architects for security and scale.
Free credits! Amazon is offering free credits with all Atlassian AWS Quick Starts.
https://atlassian.com/aws
Mirror, mirror on the wall: Smart Mirrors
[Map: the “Prod” instance in N. Virginia serving teams in Austin, San Francisco, Tokyo, Gdańsk, and Sydney]
[Chart: clone time in minutes (0 to 100) for FishEye/Crucible, JIRA, Confluence, and Bamboo, comparing Smart Mirror (AU) with Bitbucket Server]
How much faster is a mirror?
On average, over 8x faster to clone!
Deploying Mirrors in AWS
Setup Wizard
AWS CloudFormation: Launch Bitbucket with application.mode=mirror.
That’s great!
But how does it work?
How it works
[Animation: the primary Bitbucket instance and a mirror]
What makes Smart Mirroring smart?
Forks optimized: Saves disk space on the primary Bitbucket instance.
Mirrors fetch only once: Saves disk space and network bandwidth on the mirror.
[Diagram: a repository and its forks on the primary, and the mirrored forks on the mirror]
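The "fetch only once" idea for forks comes down to forks sharing most of their Git objects with the repository they were forked from. The sketch below is a hypothetical model, not Bitbucket's mirroring protocol: it assumes the repository and its forks share one object store on the mirror, so only objects the mirror has not seen before are transferred.

```python
class MirrorFamilyStore:
    """Hypothetical model: a repository and its forks share one object store on the
    mirror, so an object that several forks contain is transferred only once."""

    def __init__(self):
        self.objects = set()   # object IDs already stored on the mirror
        self.transferred = 0   # objects fetched from the primary so far

    def sync(self, fork_objects):
        """Fetch only the objects this mirror does not already have."""
        missing = set(fork_objects) - self.objects
        self.transferred += len(missing)
        self.objects |= missing
        return len(missing)


family = MirrorFamilyStore()
family.sync({"a1", "b2", "c3"})         # upstream repository: 3 objects fetched
family.sync({"a1", "b2", "c3", "d4"})   # fork with one extra commit: 1 object fetched
print(family.transferred)               # 4
```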
How it works
Git LFS optimized: Large files fetched only when needed.
Each large file fetched only once.
Saves disk space and bandwidth.
[Diagram: a repository's large files on the primary Bitbucket instance, and the mirrored large files on the mirror]
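Fetching large files "only when needed" and "only once" is a lazy, memoized lookup. In this hypothetical sketch, `LfsMirrorCache` and the `fetch_from_primary` callable are illustrative names, and object IDs are plain strings; it is not the Git LFS or Bitbucket API.

```python
class LfsMirrorCache:
    """Hypothetical model of a mirror's Git LFS store: a large file is fetched from
    the primary only when a client first asks for it, then served locally."""

    def __init__(self, fetch_from_primary):
        self.fetch_from_primary = fetch_from_primary   # callable oid -> bytes
        self.store = {}
        self.fetches = 0

    def get(self, oid):
        if oid not in self.store:                      # fetched only when needed...
            self.store[oid] = self.fetch_from_primary(oid)
            self.fetches += 1                          # ...and only once
        return self.store[oid]


primary = {"sha256:aa": b"big", "sha256:bb": b"bigger"}
cache = LfsMirrorCache(primary.__getitem__)
cache.get("sha256:aa")
cache.get("sha256:aa")
print(cache.fetches, len(cache.store))  # 1 1; sha256:bb was never requested, so never fetched
```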
Mirrors can be for CI, too
“Build” mirrors: Take load off your primary Bitbucket instance.
[Diagram: CI agents served by a mirror of the primary Bitbucket instance]
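A build mirror helps because CI traffic is dominated by reads. The hypothetical routing rule below sends read-only Git operations to the mirror and everything else, such as pushes, to the primary; the URLs and the set of "read" operations are illustrative assumptions.

```python
# Hypothetical URLs for illustration only.
PRIMARY = "https://bitbucket.example.com/scm/proj/repo.git"
BUILD_MIRROR = "https://mirror-ci.example.com/scm/proj/repo.git"

# Read-only Git operations that a mirror can serve (assumed list).
READ_OPERATIONS = {"clone", "fetch", "ls-remote"}


def remote_for(operation):
    """Route read traffic (the bulk of CI load) to the build mirror and
    everything else, such as push, to the primary instance."""
    return BUILD_MIRROR if operation in READ_OPERATIONS else PRIMARY


for op in ("clone", "fetch", "push"):
    print(op, "->", remote_for(op))
```

With plain Git, a similar split can be set up in a single clone by keeping the mirror as the fetch URL and pointing pushes at the primary with `git remote set-url --push origin <primary-url>`, where `<primary-url>` is a placeholder.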
Always have a plan B: Disaster recovery
Disasters 101
ALWAYS have a backup.
Backing up Bitbucket
[Diagram: load balancer, cluster nodes, database, file server, Elasticsearch]
Bitbucket backup client? Only recommended for small instances.
Rsync? Still requires some downtime.
Zero Downtime Backup
Consistent, atomic snapshots of the database and the file server:
Database: Amazon RDS, or all DB vendors.
File Server: Amazon EBS, ZFS, or many FS vendors.
Backup storage: Offsite snapshots in Amazon S3.
Downtime
Backups @ Atlassian
[Diagram: backups of the primary Bitbucket instance (N. Virginia) copied offsite to Sydney]
Round the clock: Backups taken continually, sometimes minutes apart.
Backups copied offsite: Can be restored anywhere, in minutes.
AWS snapshots: Sub-second latency.
https://bitbucket.org/atlassianlabs/atlassian-bitbucket-diy-backup
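Taking backups round the clock means old snapshots pile up, so some retention rule is needed. The policy below is a hypothetical example, not the one used by the DIY backup scripts: keep every snapshot from the last `keep_hours`, and never fewer than the `keep_at_least` newest ones.

```python
from datetime import datetime, timedelta


def plan_snapshots(snapshots, now, keep_hours=24, keep_at_least=3):
    """Split snapshot timestamps into (keep, prune), each ordered newest first.

    Hypothetical retention policy: keep every snapshot from the last
    `keep_hours`, and never fewer than the `keep_at_least` newest ones.
    """
    cutoff = now - timedelta(hours=keep_hours)
    keep, prune = [], []
    for i, taken_at in enumerate(sorted(snapshots, reverse=True)):
        if taken_at >= cutoff or i < keep_at_least:
            keep.append(taken_at)
        else:
            prune.append(taken_at)
    return keep, prune


now = datetime(2024, 1, 1, 12, 0)
snaps = [now - timedelta(hours=h) for h in (1, 5, 30, 50, 70)]
keep, prune = plan_snapshots(snaps, now)
print(len(keep), len(prune))  # 3 2
```

The `keep_at_least` floor protects against pruning everything if backups stop for longer than the retention window.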
Disasters 201
Always have a standby (and ALWAYS have a backup).
Disaster recovery
[Diagram: a primary instance replicating to a standby instance]
Standby database: RDS read replica, or all DB vendors.
Standby file server: ZFS, or many FS vendors.
“Hello, my name is Integrity Checker”
Disaster recovery
https://bitbucket.org/atlassianlabs/atlassian-bitbucket-diy-backup
Yes, it does disaster recovery too.
Deploying a standby in AWS
AWS CloudFormation: Launch Bitbucket Data Center with DBMaster=<primary-db>.
Start replicating: setup-home-replication.sh, replicate-home.sh
High availability: Pro tips
Clustering: Handles isolated failures. Works in seconds.
Disaster recovery standby: Handles failure of an entire data center. Failover in minutes.
Backups: Handle failures that corrupt all your data. Restore to any point in time.
Smart Mirroring · Disaster Recovery · AWS Support and Quick Start
https://atlassian.com/aws