MongoDB is one of the fastest growing NoSQL workloads on AWS due to its simplicity and scalability, and recent product additions by the AWS team have only improved those traits. In this session, we’ll talk about various AWS offerings and how they fit together with MongoDB -- including CloudFormation, Elastic MapReduce, Route53, Elastic Beanstalk, Elastic Load Balancing, and more -- and how they can be leveraged to enhance your MongoDB experience.
MongoDB and AWS: Integrating with AWS Services
Partner Technical Solutions, MongoDB Inc.
Sandeep Parikh
#mongodb
Recap: Deployment and Availability
• MongoDB basics
• Deployment configurations
• Instance types
• Best practices
• Slides and recording: http://www.mongodb.com/presentations/mongodb-and-amazon-web-services-deploying-high-availability
Recap: Storage Configurations
• Storage options
• Simple recommendations
• Backup and restore
• Advanced configurations
• Slides and recording: http://www.mongodb.com/presentations/mongodb-and-amazon-web-services-storage-options-mongodb-deployments
Agenda
• Available Services
• Integrations
• Infrastructure
• Future Directions
• Questions
Available Services
AWS Services
[Diagram: Compute, Storage, Persistent IPs, DNS, Hadoop, Data Warehouse, Stream processing, App deployment, Orchestration, Provisioning, App services, Security, Caching]
Integrations
CloudFormation
• Simplify provisioning and deployment
• JSON-based templates
• Manage like source code
• Specify all manner of AWS components
• Bootstrap for other tools like Chef or Puppet
"Parameters" : {
  "KeyPairName" : {
    "Description" : "EC2 KeyPair to enable SSH access",
    "Type" : "String"
  },
  "SecurityGroupName" : {
    "Description" : "EC2 Security Group",
    "Type" : "String"
  },
  "InstanceType" : {
    "Type" : "String",
    "Default" : "m3.large",
    "AllowedValues" : ["m3.large", "m3.xlarge", "m3.2xlarge"],
    "Description" : "EC2 instance type"
  }
},
CloudFormation Sample
"Properties" : {
  "InstanceType" : { "Ref" : "InstanceType" },
  "ImageId" : { … },
  "SecurityGroups" : [{ "Ref" : "SecurityGroupName" }],
  "KeyName" : { "Ref" : "KeyPairName" },
  "EbsOptimized" : "true",
  "BlockDeviceMappings" : [{
    "DeviceName" : "/dev/xvdf",
    "Ebs" : {
      "VolumeSize" : "200", "Iops" : "1000",
      "VolumeType" : "io1", "DeleteOnTermination" : "false"
    }
  }]
}
CloudFormation Sample
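Because templates are plain JSON, they can be generated and checked like any other source file. What follows is a minimal Python sketch of that idea, not the tooling used for the templates in this talk. It rebuilds the two sample fragments as dictionaries and checks that every "Ref" points at a declared parameter. The helper names find_refs and unresolved_refs and the resource name MongoDBInstance are hypothetical. The ImageId value is elided on the slide, so it is left out rather than guessed.

```python
import json

# Parameters as shown in the first sample.
parameters = {
    "KeyPairName": {"Description": "EC2 KeyPair to enable SSH access", "Type": "String"},
    "SecurityGroupName": {"Description": "EC2 Security Group", "Type": "String"},
    "InstanceType": {
        "Type": "String",
        "Default": "m3.large",
        "AllowedValues": ["m3.large", "m3.xlarge", "m3.2xlarge"],
        "Description": "EC2 instance type",
    },
}

# Instance properties from the second sample (ImageId omitted, see above).
properties = {
    "InstanceType": {"Ref": "InstanceType"},
    "SecurityGroups": [{"Ref": "SecurityGroupName"}],
    "KeyName": {"Ref": "KeyPairName"},
    "EbsOptimized": "true",
    "BlockDeviceMappings": [{
        "DeviceName": "/dev/xvdf",
        "Ebs": {"VolumeSize": "200", "Iops": "1000",
                "VolumeType": "io1", "DeleteOnTermination": "false"},
    }],
}


def find_refs(node):
    """Collect every name used as {"Ref": name} anywhere under node."""
    if isinstance(node, dict):
        if set(node) == {"Ref"}:
            return {node["Ref"]}
        return set().union(*(find_refs(v) for v in node.values()))
    if isinstance(node, list):
        return set().union(*(find_refs(v) for v in node))
    return set()


def unresolved_refs(params, props):
    """Return the Ref names in props that are not declared in params."""
    return sorted(find_refs(props) - set(params))


# "MongoDBInstance" is a hypothetical logical resource name.
template = {
    "Parameters": parameters,
    "Resources": {
        "MongoDBInstance": {"Type": "AWS::EC2::Instance", "Properties": properties}
    },
}
template_json = json.dumps(template, indent=2)
```

In a real template a "Ref" may also name another resource or a pseudo parameter, so this check only covers the parameter case shown in the samples.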
CloudFormation Templates
• https://github.com/crcsmnky/aws-cfn-mongodb
• Templates to launch a single-node MongoDB deployment
• Each one implements our best practices: EBS-optimized, PIOPS, ulimit, readahead
• Used to generate AWS Marketplace instances
CloudFormation Templates
• Clone the repo
• Upload the CF template
• Instance provisioning starts
• Instance clones repo
• Instance runs setup script
• Instance provisioned and deployed
CloudFormation Tools
• https://github.com/cloudtools/troposphere
• Python package to generate CF templates
• Next versions of our templates will leverage this
• Coming soon: Replica Sets
• Coming later: Sharded Cluster
Elastic MapReduce
• Quickly deploy and run Hadoop in AWS
• Tuned distributions to run on top of EC2
• Provision deployments with any number of nodes
• Supports Spot and Reserved pricing for savings
EMR and MongoDB
• https://github.com/mongodb/mongo-hadoop
• MongoDB-Hadoop connector: bi-directional access to/from MongoDB
• Supports MapReduce, Hive, Pig, Streaming
• Read/write from MongoDB deployments or BSON backup files
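A BSON backup file, such as one written by mongodump, is a sequence of BSON documents stored back to back, each starting with its own total size as a little-endian 32-bit integer. The sketch below illustrates only that framing, using the Python standard library. It is not the connector's implementation, and the function name split_bson_documents is hypothetical.

```python
import struct


def split_bson_documents(data):
    """Yield each raw BSON document from a buffer of concatenated documents."""
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated length prefix at offset %d" % offset)
        # Each document begins with its total size (prefix included)
        # as a little-endian int32.
        (size,) = struct.unpack_from("<i", data, offset)
        # The smallest valid document is 5 bytes: 4-byte size + trailing NUL.
        if size < 5 or offset + size > len(data):
            raise ValueError("invalid document size at offset %d" % offset)
        yield data[offset:offset + size]
        offset += size


# Two hand-built documents: {} and {"a": 1} with an int32 value.
empty_doc = b"\x05\x00\x00\x00\x00"
a_doc = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"
docs = list(split_bson_documents(empty_doc + a_doc))
```

Because every document carries its own length, a reader can skip from one document to the next without decoding the fields in between.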
EMR with MongoDB
[Diagram: MongoDB, BSON files in S3, and a cluster of EMR nodes]
EMR Workflow
• Bootstrap script: MongoDB-Hadoop, MongoDB Java driver
• Copy resources: bootstrap script, MapReduce job
• Launch EMR: instance type, instance count, arguments
• MapReduce output: MongoDB, BSON in S3
• EMR logs: written to S3
$ elastic-mapreduce --create --jobflow ENRON000 \
    --instance-type m1.xlarge --num-instances 5 \
    --bootstrap-action s3://$S3_BUCKET/bootstrap.sh \
    --log-uri s3://$S3_BUCKET/enron_logs \
    --jar s3://$S3_BUCKET/enron-example.jar \
    --arg -D --arg mongo.job.input.format=com.mongodb.hadoop.BSONFileInputFormat \
    --arg -D --arg mapred.input.dir=s3n://mongo-test-data/messages.bson \
    --arg -D --arg mapred.output.dir=s3n://$S3_BUCKET/BSON_OUT \
    --arg -D --arg mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
EMR Launch
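Each Hadoop property in the launch command is passed as a repeated "--arg -D --arg key=value" group, and the key=value part has to be a single token. As a rough illustration, the Python sketch below builds those groups from a dictionary so the pairs cannot be split by stray spaces. The helper name hadoop_property_args is hypothetical; the property names and values are the ones from the command above.

```python
import shlex


def hadoop_property_args(properties):
    """Turn {key: value} into the repeated `--arg -D --arg key=value` form."""
    args = []
    for key, value in properties.items():
        # key=value must be one token, with no spaces around "=".
        pair = "%s=%s" % (key.strip(), value.strip())
        args += ["--arg", "-D", "--arg", pair]
    return args


job_properties = {
    "mongo.job.input.format": "com.mongodb.hadoop.BSONFileInputFormat",
    "mapred.input.dir": "s3n://mongo-test-data/messages.bson",
}
args = hadoop_property_args(job_properties)

# Shell-quoted form, suitable for appending to the rest of the command line.
command_tail = " ".join(shlex.quote(a) for a in args)
```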
Elastic Beanstalk
• Deploy and manage applications
• Handles provisioning, scaling, load balancing
• Built on EC2, S3, SNS, Auto Scaling
Elastic Beanstalk Architecture
[Diagram: App Server (x3), Security Group, Elastic Load Balancer, Auto Scaling Group]
Elastic Beanstalk with MongoDB
[Diagram: App Server (x3), mongos (x3), Security Group, Elastic Load Balancer, Auto Scaling Group, MongoDB]
Elastic Beanstalk with MongoDB
• Customize and configure software that your app needs (e.g. mongos)
• Install packages
• Create files
• Execute commands (before or after the app is set up)
• Control system services
• http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
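These customizations live in a configuration file under the application's .ebextensions directory, written as YAML or JSON. The Python sketch below generates a JSON version with one top-level key per option listed above. Treat every value as a placeholder assumption: the package name, the /etc/mongos.conf path and its contents, the config server hostname, and the service name are illustrative only, and installing a mongos package would also require a MongoDB yum repository that is not shown here.

```python
import json

# One top-level key per customization option. All values are placeholders.
mongos_config = {
    # Install packages (assumed package name).
    "packages": {"yum": {"mongodb-org-mongos": []}},
    # Create files (assumed path and hypothetical config server hostname).
    "files": {
        "/etc/mongos.conf": {
            "mode": "000644",
            "owner": "root",
            "group": "root",
            "content": "configdb = cfg0.example.com:27019\n",
        }
    },
    # Commands that run before the application is set up.
    "commands": {
        "01_before_app_setup": {"command": "echo 'before app setup'"}
    },
    # Commands that run after the application is set up.
    "container_commands": {
        "01_after_app_setup": {"command": "echo 'after app setup'"}
    },
    # Control system services (assumed service name).
    "services": {
        "sysvinit": {"mongos": {"enabled": True, "ensureRunning": True}}
    },
}

# Text that could be saved as, for example, .ebextensions/mongos.config.
config_text = json.dumps(mongos_config, indent=2)
```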
Infrastructure
Elastic IPs
• EC2 instances use dynamic IP addresses
• EIPs are static addresses that can be assigned to individual EC2 instances
• Unfortunately, you only get a limited number of them
Route53
• Highly available and scalable DNS service in AWS
• Hostnames can be assigned to EC2 instances, ELB instances, or S3 buckets
• DNS load balancing with weighted-round-robin
• Supports hostnames for non-AWS infrastructure
Route53 and MongoDB
• Short answer: use hostnames for all components
• With replica sets, hostnames can ease machine replacement
• With sharded clusters, hostnames can simplify config server maintenance
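To make the replica set case concrete: the replica set configuration records each member's address, so if those addresses are hostnames, a replacement machine can take over a name through a DNS change instead of a reconfiguration. The Python sketch below builds such a configuration document. The helper name replica_set_config, the set name rs0, and the example.com hostnames are all hypothetical.

```python
def replica_set_config(name, hostnames, port=27017):
    """Build a replica set config document that addresses members by hostname."""
    return {
        "_id": name,
        "members": [
            {"_id": index, "host": "%s:%d" % (hostname, port)}
            for index, hostname in enumerate(hostnames)
        ],
    }


# Hypothetical hostnames, e.g. records managed in a Route53 hosted zone.
config = replica_set_config(
    "rs0",
    ["mongo0.example.com", "mongo1.example.com", "mongo2.example.com"],
)
```

A document of this shape is what would be passed to rs.initiate() in the mongo shell.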
VPC
• Virtual Private Cloud lets you provision a logically isolated network inside AWS
• You manage all aspects of networking, including IP address ranges, subnets, and routing tables and gateways
• Can be used as an extension to an offsite data center with Hardware VPN
VPC Public and Private
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Introduction.html
• Private subnets are hidden from the outside world
• Internet Gateway and EIPs can be used for outside access
• Web tier in public subnet
• Data tier in private subnet
ElastiCache
• Distributed in-memory cache
• Backed by Memcached or Redis
• Can be a drop-in replacement for existing cache deployments
• Supports auto-discovery and read-replicas
Future Directions
Redshift
• Fully-managed petabyte-scale data warehouse service
• MongoDB not natively supported as a data source
• … So how do you get your data in?
Data Pipeline
• Process and move data between different AWS compute and storage services
• Data Pipeline handles resources, failures, and dependencies
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html
Data Pipeline with MongoDB
[Diagram: AWS Data Pipeline, MongoDB, S3, EMR or Redshift]
OpsWorks
• Complete DevOps stack
• Model and manage apps, load balancers, databases
• Uses Chef recipes
• Load or time-based scaling
• Deploying MongoDB with OpsWorks: http://blogs.aws.amazon.com/application-management/post/Tx1RB65XDMNVLUA/Deploying-MongoDB-with-OpsWorks
CloudWatch
• Monitoring for AWS resources
• Supports custom metrics
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/WhatIsCloudWatch.html
aws cloudwatch put-metric-data \
    --metric-name ResidentMemory \
    --namespace MongoDB \
    --timestamp 2014-02-14T20:30:00Z \
    --value 32 \
    --unit Gigabytes
CloudWatch Custom Metrics
Questions?
MongoDB World: New York City, June 23-25
#MongoDBWorld
See what’s next in MongoDB, including:
• MongoDB 2.6
• Sharding
• Replication
• Aggregation
http://world.mongodb.com
Save 25% with discount code 25SandeepParikh