© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pravin Pillai, Sr. Product Manager
Jon Handler, Principal Solutions Architect
October, 2015
Introduction to Amazon Elasticsearch Service
Amazon Elasticsearch Service
What to Expect from the Session
• Context: Managing your growing data
• Introducing Amazon Elasticsearch Service (Amazon ES)
• Configuring, securing, connecting, monitoring, and scaling your Amazon ES cluster
Your data is constantly growing
• Product usage
• System logs
• Customer conversations
That’s a lot of data!
“Big data is not about the data”
- Gary King, Harvard University, making the point that while data is plentiful and easy to collect, the real value is in the analytics.
So what can you do with all this data?
• Share information
• Extract insight
• Recognize patterns
• Track performance
Ultimately, make better business, technical, and operational decisions
Scenario 1: Full-text search
Knowledge Sharing Systems
• Your team is constantly generating content
• You are tasked with making this knowledge base searchable and accessible
• You need key search features including text matching, faceting, filtering, fuzzy search, auto complete, and highlighting
Scenario 2: Streaming data analytics
Intrusion detection
• You have to protect your system from attacks
• You need easy to use, yet powerful analytics and data visualization tools to detect issues in near real-time
• Easy and flexible data ingestion is important to capture information from a variety of key data sources
Scenario 3: Batch data analytics
Usage Monitoring
• You are a mobile app developer
• You have to monitor/manage users across multiple app versions
• You want to analyze and report on usage and migration between app versions
What options do you have?
How Elasticsearch can help
A powerful, real-time, distributed, open-source search and analytics engine:
• Built on top of Apache Lucene
• Schema free
• Developer friendly RESTful API
How Elasticsearch can help
Combined with Logstash and Kibana, the ELK stack provides a tool for real-time analytics and data visualization
Operating Elasticsearch is time-consuming
“Elasticsearch allows us to easily and quickly build bleeding edge big data and analytics applications using the ELK stack. By offering direct access to the Elasticsearch API while offloading administrative tasks, Amazon Elasticsearch Service gives us the manageability, flexibility and control we need.”
Sean Curtis, SVP Engineering at Major League Baseball Advanced Engineering
Introducing Amazon Elasticsearch Service
Amazon Elasticsearch Service is a managed service from AWS that makes it easy to set up, operate, and scale Elasticsearch clusters in the cloud.
Key benefits
• Easy cluster creation and configuration management
• Support for ELK
• Security with AWS IAM
• Monitoring with Amazon CloudWatch
• Auditing with AWS CloudTrail
• Integration options with other AWS services (CloudWatch Logs, Amazon DynamoDB, Amazon S3, Amazon Kinesis)
Create the cluster
AWS CLI commands
add-tags
create-elasticsearch-domain
delete-elasticsearch-domain
describe-elasticsearch-domain
describe-elasticsearch-domain-config
describe-elasticsearch-domains
list-domain-names
list-tags
remove-tags
update-elasticsearch-domain-config
aws es create-elasticsearch-domain --domain-name my-domain --elasticsearch-cluster-config InstanceType=m3.xlarge.elasticsearch,InstanceCount=3 --ebs-options EBSEnabled=true,VolumeType=gp2,VolumeSize=512
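The same request can also be expressed through an SDK. The sketch below assumes the boto3 `es` client and its `create_elasticsearch_domain` call; the helper name `build_domain_request` is hypothetical and only assembles the parameters, so nothing is sent to AWS unless the commented-out call is enabled with valid credentials.

```python
def build_domain_request(domain_name, instance_type, instance_count, volume_gb):
    """Assemble keyword arguments that mirror the CLI flags above (hypothetical helper)."""
    return {
        "DomainName": domain_name,
        "ElasticsearchClusterConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
        "EBSOptions": {
            "EBSEnabled": True,
            "VolumeType": "gp2",
            "VolumeSize": volume_gb,  # size in GB, as in the CLI example
        },
    }


request = build_domain_request("my-domain", "m3.xlarge.elasticsearch", 3, 512)

# Assumption: boto3 is installed and AWS credentials are configured.
# import boto3
# boto3.client("es").create_elasticsearch_domain(**request)
```

Keeping the parameters in a plain dictionary makes it possible to review or version the cluster configuration before any call is made.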
Amazon ES domain overview
[Diagram: an Amazon ES domain with Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, CloudTrail, and the Elasticsearch API]
• Nodes under management
• Single endpoint, REST API
• IAM integration
• CloudWatch/CloudTrail for monitoring
Scale for your workload
Online scaling operations
Data partitioning for search
[Diagram: an index made up of documents, each with an ID, partitioned into Shard 1 and Shard 2]
• Document: The unit of search
• ID: Unique identifier, one per document
• Field: Documents comprise a collection of fields
• Shard: An instance of Lucene with a portion of an index
• Index: A collection of data
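As a rough illustration of how documents end up in shards: Elasticsearch picks a shard by hashing a routing value (the document ID by default) and taking the result modulo the number of primary shards. The sketch below uses `zlib.crc32` as a stand-in hash, which is an assumption made for simplicity and not the hash function Elasticsearch itself uses; `shard_for` is a hypothetical helper name.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Return a 0-based shard number for a document ID.

    crc32 is only a stand-in here; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Spread six hypothetical document IDs over an index with two shards.
ids = ["1", "2", "3", "4", "5", "6"]
placement = {doc_id: shard_for(doc_id, 2) for doc_id in ids}
```

Because the shard is derived from the ID, the same document always lands on the same shard, which is also why the number of primary shards is chosen when the index is created.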
Deployment of indices to a cluster
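• Index 1
  • Shard 1
  • Shard 2
  • Shard 3
• Index 2
  • Shard 1
  • Shard 2
  • Shard 3
[Diagram: an Amazon ES cluster of three instances holding the primary and replica shards of both indices. Instance 1 holds copies of shards 1 and 3, Instance 2 holds copies of shards 2 and 1, and Instance 3 holds copies of shards 3 and 2.]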
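One way to picture this deployment is a simple round-robin placement in which each replica goes to the instance after the one holding its primary, so no instance ever holds both copies of a shard. This is a minimal sketch under that assumption; `place_shards` is a hypothetical helper, and the real Elasticsearch allocator weighs more factors than this.

```python
def place_shards(num_shards, num_instances):
    """Return {instance_number: [(shard_number, role), ...]} using 1-based numbers."""
    if num_instances < 2:
        raise ValueError("a replica needs a different instance than its primary")
    layout = {i: [] for i in range(1, num_instances + 1)}
    for shard in range(1, num_shards + 1):
        primary_home = (shard - 1) % num_instances + 1
        replica_home = primary_home % num_instances + 1  # next instance, wrapping around
        layout[primary_home].append((shard, "primary"))
        layout[replica_home].append((shard, "replica"))
    return layout


# One index with three shards and one replica each, on three instances.
layout = place_shards(num_shards=3, num_instances=3)
for instance, shards in sorted(layout.items()):
    print(f"Instance {instance}: {shards}")
```

With three shards on three instances, every instance ends up with one primary and one replica of a different shard, so losing a single instance leaves at least one copy of every shard available.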
Performance: single shard, single node

Instance type         Average write (EBS),  Average read (EBS)  vCPU  RAM (GB)
(EBS volume)          1000-doc _bulks
T2.micro (35GB)       - (1.3)               - (0.47)            1     1
T2.small (35GB)       - (2.6)               - (0.77)            1     2
T2.medium (35GB)      - (4.2)               - (1.3)             2     4
M3.medium (100GB)     2.95 (2.86)           1.31 (1.39)         1     3.75
M3.large (100GB)      6.35 (6.29)           2.81 (2.84)         2     7.5
M3.xlarge (100GB)     11.6 (11.6)           4.62 (5.57)         4     15
M3.2xlarge (100GB)    18.45 (18)            11.32 (12.05)       8     30
R3.large (100GB)      5.72 (5.94)           2.86 (2.88)         2     15.25
R3.xlarge (100GB)     10.8 (10.5)           5.76 (5.79)         4     30.5
R3.2xlarge (100GB)    16.8 (16.5)           11.31 (11.38)       8     61
R3.4xlarge (100GB)    19.1 (19.2)           24.05 (24.66)       16    122
R3.8xlarge (100GB)    22.2 (21.8)           44 (47.29)          32    244
I2.xlarge (100GB)     10.8 (10.8)           5.09 (5.88)         4     30.5
I2.2xlarge (100GB)    17.8 (18.1)           10.05 (10.93)       8     61
Instance type recommendations
Instance  Workload
T2        Entry point. Dev and test. OK for dedicated masters.
M3        Equal read and write volumes. Up to 5 TB of storage with EBS.
R3        Read-heavy or workloads with high query demands (e.g., aggregations).
I2        Up to 16 TB of SSD instance storage.
Secure access to your domain
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:user/susan" },
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpPut",
        "es:ESHttpPost",
        "es:CreateElasticsearchDomain",
        "es:ListDomainNames"
      ],
      "Resource": "arn:aws:es:us-east-1:###:domain/logs-domain/<index>/*"
    }
  ]
}
• Control access by user with signed requests (the Principal element)
• Allow/Deny HTTP methods and Config operations per policy (the Action element)
• Fine-grained control to the index level (the Resource element)
Secure access to your domain
And/or use IP-based access control
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpPut",
        "es:ESHttpPost",
        "es:CreateElasticsearchDomain",
        "es:ListDomainNames"
      ],
      "Resource": "arn:aws:es:us-east-1:###:domain/logs-domain/<index>/*",
      "Condition": {
        "IpAddress": { "aws:SourceIp": [ "xx.xx.xx.xx/yy" ] }
      }
    }
  ]
}
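Hand-written policy JSON is easy to get subtly wrong, for example by dropping the braces around the `Condition` block. A minimal sketch of generating the document from a dictionary is shown below; `build_access_policy` is a hypothetical helper, and the account ID and IP range are placeholder assumptions, not values from this deck.

```python
import json


def build_access_policy(resource_arn, actions, principal="*", source_ips=None):
    """Build an access policy document shaped like the examples above (hypothetical helper)."""
    statement = {
        "Sid": "",
        "Effect": "Allow",
        "Principal": {"AWS": principal},
        "Action": list(actions),
        "Resource": resource_arn,
    }
    if source_ips:
        # IP-based access control: only requests from these ranges match the statement.
        statement["Condition"] = {"IpAddress": {"aws:SourceIp": list(source_ips)}}
    return {"Version": "2012-10-17", "Statement": [statement]}


policy = build_access_policy(
    resource_arn="arn:aws:es:us-east-1:123456789012:domain/logs-domain/*",  # placeholder account ID
    actions=["es:ESHttpGet", "es:ESHttpPut", "es:ESHttpPost"],
    source_ips=["192.0.2.0/24"],  # documentation-only address range, an assumption
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Serializing with `json.dumps` guarantees that the result is at least well-formed JSON before it is attached to a domain.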
Load data
Direct access to the Elasticsearch API
$ curl -XPUT https://<endpoint>/blog -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 1 } }'

$ curl -XPOST http://<endpoint>/blog/post/1 -d '{ "author":"jon handler", "title":"Amazon ES Launch" }'

$ curl -XPOST https://<endpoint>/blog/post/_bulk --data-binary '{ "index" : { "_index" : "blog", "_type" : "post", "_id" : "2" } }
{ "title":"Amazon ES for search", "author": "pravin pillai" }
{ "index" : { "_index":"blog", "_type":"post", "_id":"3" } }
{ "title":"Analytics too", "author": "vivek sriram" }
'

$ curl -XGET 'http://<endpoint>/_search?q=ES'
{"took":16,"timed_out":false,"_shards":{"total":3,"successful":3,"failed":0},"hits":
{"total":2,"max_score":0.13424811,"hits":[
{"_index":"blog","_type":"post","_id":"1","_score":0.13424811,"_source":{"author":"jon handler", "title":"Amazon ES Launch"}},
{"_index":"blog","_type":"post","_id":"2","_score":0.11506981,"_source":{"title":"Amazon ES for search", "author": "pravin pillai"}}]}}
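The `_bulk` endpoint expects newline-delimited JSON: one action line, then one document line, with no commas between them and a newline after the last line. A minimal sketch of building such a body is below; `build_bulk_body` is a hypothetical helper, and the documents are the ones from the curl example.

```python
import json


def build_bulk_body(index, doc_type, docs_by_id):
    """Return a newline-delimited _bulk request body (hypothetical helper)."""
    lines = []
    for doc_id, doc in docs_by_id.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))  # action line
        lines.append(json.dumps(doc))     # document source line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("blog", "post", {
    "2": {"title": "Amazon ES for search", "author": "pravin pillai"},
    "3": {"title": "Analytics too", "author": "vivek sriram"},
})
print(body, end="")
```

When sending a body like this with curl, `--data-binary` is the safer flag, since `-d` strips newlines when it reads data from a file.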
Loading data using Logstash
Application nodes/Logstash forwarders -> Logstash indexer -> Amazon Elasticsearch Service
Logstash plugin for Amazon ES
https://github.com/awslabs/logstash-output-amazon_es
output {
  amazon_es {
    hosts => ["foo.us-east-1.es.amazonaws.com"]   # required
    region => "us-east-1"                         # required
    access_key => 'ACCESS_KEY'                    # optional
    secret_key => 'SECRET_KEY'                    # optional
    codec => "plain"
    workers => 1
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
Loading data using Lambda
Amazon S3 / DynamoDB / Amazon Kinesis -> AWS Lambda -> Amazon Elasticsearch Service
Lambda code snippet (node.js) for upload
var AWS = require('aws-sdk');
var creds = new AWS.EnvironmentCredentials('AWS');

function postDocumentToES(doc, context) {
  var req = new AWS.HttpRequest(endpoint);
  var signer = new AWS.Signers.V4(req, 'es');
  signer.addAuthorization(creds, new Date());
  var send = new AWS.NodeHttpClient();
  send.handleRequest(req, null, function(httpResp)...
https://github.com/awslabs/amazon-elasticsearch-lambda-samples
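The signer in the snippet above applies AWS Signature Version 4 to the request. For readers who want to see what that involves, here is a minimal standard-library sketch of the signing steps for the `es` service. The function name `sign_request`, the example host, and the credentials are placeholders, and the sketch assumes a simple path with no query string and no session token (temporary credentials, as used inside Lambda, also need an `X-Amz-Security-Token` header, which is omitted here).

```python
import datetime
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_request(method, host, path, body, access_key, secret_key, region, now, service="es"):
    """Return the headers for a SigV4-signed request (simplified, hypothetical helper)."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # Step 1: canonical request (empty query string, two signed headers).
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join(
        [method, path, "", canonical_headers, signed_headers, payload_hash]
    )

    # Step 2: string to sign, scoped to date, region, and service.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Step 3: derive the signing key and compute the signature.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return {
        "Host": host,
        "X-Amz-Date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }


headers = sign_request(
    "POST", "search-example.us-east-1.es.amazonaws.com", "/blog/post/1",
    '{"title": "Amazon ES Launch"}',
    access_key="AKIDEXAMPLE", secret_key="EXAMPLESECRET",  # placeholder credentials
    region="us-east-1", now=datetime.datetime(2015, 10, 1, 12, 0, 0),
)
```

In practice an SDK signer, as in the Lambda sample, is preferable to hand-rolled signing; the sketch is only meant to show what "signed requests" refers to.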
Export logs to Amazon ES
CloudWatch -> Amazon Elasticsearch Service
Export CloudWatch Logs
Demo
Monitor and audit
CloudWatch
CloudTrail
Monitoring
What should I monitor?
• FreeStorageSpace – monitor and alarm before the cluster runs out of space
• CPUUtilization – alarm at 80% CPU to signal the need to scale up
• ClusterStatus.yellow – check whether replication requires additional nodes
• JVMMemoryPressure – check instance type and count for sufficient resources
• MasterCPUUtilization – monitoring for master nodes is separated from data nodes
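As an illustration of the CPUUtilization recommendation, the sketch below assembles the parameters for a CloudWatch alarm at 80% CPU. It assumes the boto3 CloudWatch client's `put_metric_alarm` call and the `AWS/ES` metric namespace with `DomainName` and `ClientId` dimensions; the helper name, alarm name, period, and evaluation count are illustrative assumptions, not values from this deck.

```python
def build_cpu_alarm(domain_name, account_id, threshold_percent=80.0):
    """Assemble put_metric_alarm keyword arguments (hypothetical helper)."""
    return {
        "AlarmName": f"{domain_name}-cpu-high",  # assumed naming scheme
        "Namespace": "AWS/ES",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint, an assumption
        "EvaluationPeriods": 3,   # consecutive breaching periods, an assumption
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


alarm = build_cpu_alarm("my-domain", "123456789012")  # placeholder account ID

# Assumption: boto3 is installed and AWS credentials are configured.
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

The same shape can be reused for FreeStorageSpace by changing the metric name, threshold, and comparison operator.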
Snapshot and restore for data durability
Daily automated snapshots
• No additional charges
• Snapshots retained for 14 days
Taking manual snapshots
[Diagram: an IAM role gives Amazon ES access to a snapshot repository in Amazon S3]
Trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": { "Service": "es.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
Role policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [ "s3:ListBucket" ],
      "Effect": "Allow",
      "Resource": [ "arn:aws:s3:::bucket" ]
    },
    {
      "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "iam:PassRole" ],
      "Effect": "Allow",
      "Resource": [ "arn:aws:s3:::bucket/*" ]
    }
  ]
}
Register the bucket
curl -XPUT http://<endpoint>/_snapshot/<repo-name> -d '{ "type":"s3", "settings": { "bucket":"<bucket>", "region":"<region>", "role_arn":"<arn>" } }'
Take a snapshot
curl -XPUT http://<endpoint>/_snapshot/<repo-name>/snapshot1
Snapshot time is proportional to size.
Built-in Kibana
Application overview
Application nodes/Logstash forwarders -> Logstash indexer -> Amazon Elasticsearch Service -> Kibana UI
Securing Kibana
IAM
Proxy (optional)
IAM policy for Kibana
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": [ "es:ESHttpGet", "es:ESHttpPut", "es:ESHttpPost", "es:ESHttpHead" ],
      "Resource": [ "arn:aws:es:us-east-1:####:domain/<domain>/*" ],
      "Condition": {
        "IpAddress": { "aws:SourceIp": [ "xx.xx.xx.xx" ] }
      }
    }
  ]
}
Pay for what you use
Pay for compute and storage you use
With Amazon Elasticsearch Service, you pay only for the compute and storage resources you use. The AWS Free Tier is available to qualifying customers.
Amazon Elasticsearch Service is publicly available now!
You can use Amazon Elasticsearch Service in these regions:
• us-east-1
• us-west-1
• us-west-2
• eu-west-1
• eu-central-1
• ap-southeast-1
• ap-southeast-2
• ap-northeast-1
• sa-east-1
Wrap up
1. Elasticsearch is a tool for full-text search, analysis, and visualization of time series data that helps you get the most out of your growing data set
2. Amazon Elasticsearch Service makes it easy to deploy and manage an Elasticsearch cluster in the AWS cloud
3. Amazon Elasticsearch Service is a drop-in replacement for your existing Elasticsearch cluster
Thank you!
aws.amazon.com/elasticsearch-service