Slides from a talk that introduces the infrastructure that powers data applications. This data infrastructure ties together distributed components, systems and processes to derive value from data. Topics covered were data collection, immutable logs, scaling ETL processes and real-time analytics. Example use cases of Kafka, Storm, Elasticsearch and Amazon Redshift were presented.
Introduction to Big Data Infrastructure
Vancouver SMAC (Social, Mobile, Analytics & Cloud) Meetup
Oct 22, 2014
Ganesh Swami
www.silota.com
Hi
• Programming professionally for 10+ years
• x86 assembly, C++ STL, Boost, Boost.Python, Python
• Built emacs-wiki-blog: first blogging engine for Emacs!
What is Big Data?
• The 3 Vs: Volume, Velocity, Variety
The Big Data Zoo
Amazon Kinesis, Riak, Cassandra, Hive, Apache Spark, Apache Hadoop, Pig, Apache Storm, Kibana, Tableau, Apache Kafka, Elasticsearch, Amazon EMR, Amazon Redshift, Amazon DynamoDB, HBase
The Zoo, Organized (from data to answers)
• Ingest: Kafka, Kinesis, Flume, Scribe
• Store: S3, DynamoDB, HDFS, Redshift
• Process/Enrich: Hive/Pig/EMR, Spark, Storm
• Visualize: Tableau, Kibana
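The four stages on this slide can be chained together, with each stage's output feeding the next. The sketch below is a minimal in-memory version of that flow, not a real deployment. The event fields, the sample data and the function names `ingest`, `store`, `enrich` and `answer` are all hypothetical. Each function only stands in for the tools listed in its stage.

```python
import json
from collections import Counter

# Hypothetical raw events (JSON lines), as an app or website might send them.
RAW_EVENTS = [
    '{"user": "u1", "event": "signup", "platform": "ios"}',
    '{"user": "u2", "event": "purchase", "platform": "web"}',
    '{"user": "u1", "event": "purchase", "platform": "ios"}',
]

def ingest(lines):
    """Ingest: parse raw JSON lines into dicts (stand-in for Kafka/Kinesis/Flume)."""
    return [json.loads(line) for line in lines]

def store(events, storage):
    """Store: append events to a durable store (stand-in for S3/HDFS/Redshift)."""
    storage.extend(events)
    return storage

def enrich(events):
    """Process/Enrich: derive a new field (stand-in for Hive/Pig/Spark/Storm)."""
    return [dict(e, is_mobile=e["platform"] in ("ios", "android")) for e in events]

def answer(events):
    """Visualize: aggregate into counts that a dashboard could chart."""
    return Counter((e["event"], e["is_mobile"]) for e in events)

storage = []
result = answer(enrich(store(ingest(RAW_EVENTS), storage)))
print(result)
```

In a real stack, each boundary between these functions is usually a separate system. Keeping the stages separate like this lets each layer scale on its own.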
Data Ingestion
• Mobile Apps, Websites and Internet of Things devices feed the Ingest Layer
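The talk's topics include immutable logs, and ingest layers like this one are often built on an append-only log such as Kafka. The toy sketch below shows only the core idea. Producers append records and receive offsets. Each consumer tracks its own offset, so a real-time reader and a batch reader can consume the same data independently. The class and method names are hypothetical and are not Kafka's API.

```python
from dataclasses import dataclass, field

@dataclass
class AppendOnlyLog:
    """A toy immutable log: records are only appended, never modified."""
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at offset."""
        return self.records[offset:offset + max_records]

class Consumer:
    """Each consumer keeps its own offset, so readers do not interfere."""

    def __init__(self, log: AppendOnlyLog):
        self.log = log
        self.offset = 0

    def poll(self, max_records: int = 10) -> list:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance past what was just read
        return batch

log = AppendOnlyLog()
for source in ("mobile-app", "website", "iot-sensor"):
    log.append({"source": source})

realtime = Consumer(log)   # e.g. a stream processor
batch_etl = Consumer(log)  # e.g. a nightly loader reading the same log
print(realtime.poll(2))    # first two records
print(batch_etl.poll())    # all three records, unaffected by realtime
```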
Elasticsearch
Open-source search and analytics solution
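Elasticsearch is built on Apache Lucene, which answers full-text queries using an inverted index. An inverted index maps each term to the documents that contain it. The pure-Python sketch below shows only that idea. The whitespace tokenizer and the AND-only search are simplifying assumptions, and none of this is Elasticsearch's API.

```python
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()

class InvertedIndex:
    """Maps each term to the set of document IDs that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[int]:
        """Return IDs of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

index = InvertedIndex()
index.add(1, "user signed up on iOS")
index.add(2, "user purchased on web")
index.add(3, "iOS purchase failed")
print(index.search("user ios"))  # prints {1}
```

Lookups read only the posting sets for the query terms and never scan every document. This is what makes search fast on large collections.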
Kibana
Amazon Redshift
Petabyte-scale data warehouse solution
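Redshift stores tables column by column rather than row by row. An analytic query that aggregates a few columns therefore reads only those columns. The toy sketch below pivots row records into a columnar layout and computes a GROUP BY sum that touches only two of the three columns. The table and its fields are hypothetical sample data.

```python
# Row-oriented layout: each record's fields are stored together.
rows = [
    {"user": "u1", "country": "CA", "revenue": 30},
    {"user": "u2", "country": "US", "revenue": 45},
    {"user": "u3", "country": "CA", "revenue": 25},
]

def to_columns(records):
    """Pivot row records into one list per column (column-oriented layout)."""
    columns = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns

def revenue_by_country(cols):
    """Like SELECT country, SUM(revenue) ... GROUP BY country; reads two columns only."""
    totals = {}
    for country, revenue in zip(cols["country"], cols["revenue"]):
        totals[country] = totals.get(country, 0) + revenue
    return totals

columns = to_columns(rows)
print(revenue_by_country(columns))  # prints {'CA': 55, 'US': 45}
```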
What is Silota?
• Building blocks of analytics
• A simple REST API
  • to ingest
  • to analyze
  • to export
• Based on Kafka, Storm and Elasticsearch
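In a Kafka, Storm and Elasticsearch stack, one common real-time analytics step is counting events per time window as they stream off the log. The per-window counts can then be indexed for querying. The sketch below does this in plain Python, not with Storm's API. The 60-second tumbling window, the `ts`/`event` fields and the function names are assumptions made for illustration. A Storm bolt would update such counts incrementally per tuple rather than over a list.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed tumbling-window size for this sketch

def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Align a Unix timestamp to the start of its tumbling window."""
    return ts - ts % size

def count_per_window(events, size: int = WINDOW_SECONDS) -> dict:
    """Count events per (window start, event name)."""
    counts = defaultdict(int)
    for event in events:
        key = (window_start(event["ts"], size), event["event"])
        counts[key] += 1
    return dict(counts)

events = [
    {"ts": 1000, "event": "signup"},
    {"ts": 1015, "event": "signup"},
    {"ts": 1075, "event": "purchase"},
]
print(count_per_window(events))  # prints {(960, 'signup'): 2, (1020, 'purchase'): 1}
```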
Silota vs. Mixpanel
• Mixpanel is for product people
  • great UI
  • cookie-cutter analysis for verticals (gaming, e-commerce)
• Silota is an API
  • lower-level, full power
  • first-class API: responses, pagination, errors, etc.
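A first-class API treats pagination and errors as part of its contract. The sketch below shows one common pattern, cursor-based pagination with an error envelope. It is entirely hypothetical. The endpoint stand-in `fake_list_events`, the `data`/`next_cursor`/`error` fields and `APIError` are illustrative names, not Silota's actual API.

```python
from typing import Iterator, Optional

class APIError(Exception):
    """Raised when a (hypothetical) response carries an error object."""

# Hypothetical server-side data and handler, standing in for an HTTP endpoint.
ITEMS = [f"event-{i}" for i in range(7)]

def fake_list_events(cursor: Optional[int], limit: int) -> dict:
    """Return one page in a hypothetical envelope: data + next_cursor, or error."""
    if limit <= 0:
        return {"error": {"code": "invalid_limit", "message": "limit must be positive"}}
    start = cursor or 0
    page = ITEMS[start:start + limit]
    end = start + len(page)
    return {"data": page, "next_cursor": end if end < len(ITEMS) else None}

def iterate_events(limit: int = 3) -> Iterator[str]:
    """Follow next_cursor until the server signals the last page."""
    cursor = None
    while True:
        response = fake_list_events(cursor, limit)
        if "error" in response:
            raise APIError(response["error"]["message"])
        yield from response["data"]
        cursor = response["next_cursor"]
        if cursor is None:
            return

print(list(iterate_events()))  # all seven items, fetched three per page
```

An opaque cursor lets the server change how pages are computed without breaking clients. A structured error object gives clients something to branch on instead of parsing message text.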