Introduction to Big Data Infrastructure
Vancouver SMAC (Social, Mobile, Analytics & Cloud) Meetup
Oct 22, 2014
www.silota.com
Ganesh Swami
Hi
• Programming professionally for 10+ years
• x86 assembly, STL, boost, python-boost, python
• Built emacs-wiki-blog: first blogging engine for Emacs!
What is Big Data?
The 3 Vs: Volume, Velocity, Variety
The Big Data Zoo
• Amazon Kinesis, Riak, Cassandra, Hive
• Apache Spark, Apache Hadoop, Pig, Apache Storm
• Kibana, Tableau, Apache Kafka
• Elasticsearch, Amazon EMR, Redshift
• DynamoDB, HBase
The Zoo Organized
• Ingest: Kafka, Kinesis, Flume, Scribe
• Store: S3, DynamoDB, HDFS, Redshift
• Process/Enrich: Hive/Pig/EMR, Spark, Storm
• Visualize: Tableau, Kibana
Data → Answers
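To make the four stages concrete, here is a minimal in-memory sketch of an ingest, store, process, visualize pipeline. Every function is a stand-in invented for illustration (a Python list instead of S3 or HDFS, a Counter instead of Spark or Hive); it only shows how data moves from one stage to the next.

```python
import json
from collections import Counter

def ingest(raw_lines):
    """Ingest: parse raw JSON lines into event dicts, skipping malformed input."""
    for line in raw_lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # a real ingest layer would log or dead-letter this line

def store(events):
    """Store: keep events in a list (stand-in for S3, HDFS or a database)."""
    return list(events)

def process(stored):
    """Process/Enrich: count events per type."""
    return Counter(e["type"] for e in stored if "type" in e)

def visualize(counts):
    """Visualize: render one text bar per event type, most frequent first."""
    return [f"{name:<8} {'#' * n}" for name, n in counts.most_common()]

# Hypothetical sample input: three valid events and one malformed line.
raw = ['{"type": "click"}', '{"type": "view"}', 'not json', '{"type": "click"}']
counts = process(store(ingest(raw)))
chart = visualize(counts)
print("\n".join(chart))
```

Each stage only depends on the output of the previous one, which is why the tools in each column can be swapped independently.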
Data Ingestion
Ingest Layer
Mobile Apps
Websites
Internet of Things
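One common job of an ingest layer is to wrap events from very different sources in a single envelope before they are queued. The sketch below assumes a made-up envelope format (the field names id, source, received_at and payload are not from any particular product) and serializes each event as one JSON line.

```python
import json
import time
import uuid

# Assumed source labels, matching the three producers listed above.
ALLOWED_SOURCES = {"mobile", "web", "iot"}

def make_envelope(source, payload, received_at=None):
    """Wrap a source-specific payload in a common envelope (hypothetical format)."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return {
        "id": str(uuid.uuid4()),  # unique id so downstream stages can de-duplicate
        "source": source,
        "received_at": time.time() if received_at is None else received_at,
        "payload": payload,
    }

# Hypothetical sample events, one per source type.
events = [
    make_envelope("mobile", {"screen": "home"}, received_at=1413936000.0),
    make_envelope("web", {"url": "/pricing"}, received_at=1413936001.0),
    make_envelope("iot", {"sensor": "temp", "value": 21.5}, received_at=1413936002.0),
]

# One JSON document per line is a convenient format to hand to a queue or log.
lines = [json.dumps(e, sort_keys=True) for e in events]
```

Keeping the source-specific fields inside payload means the envelope stays stable when a new kind of producer is added.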
Elasticsearch
Open-source search and analytics solution
Kibana
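As an illustration of the "analytics" side of Elasticsearch, the sketch below builds a search request body that counts events per type, the kind of aggregation a dashboard panel is typically built on. It only constructs the JSON body and sends nothing; the field names received_at and type, and the helper name events_per_type_query, are assumptions for the example.

```python
import json

def events_per_type_query(since, field="type", size=10):
    """Build an Elasticsearch search body: filter by time, then bucket by a field."""
    return {
        "size": 0,  # return no individual hits, only the aggregation
        "query": {"range": {"received_at": {"gte": since}}},
        "aggs": {"by_type": {"terms": {"field": field, "size": size}}},
    }

# "now-1d" uses Elasticsearch date math: everything from the last 24 hours.
body = events_per_type_query("now-1d")
print(json.dumps(body, indent=2))
```

The body would be sent to an index's _search endpoint; the response then carries one bucket per distinct value of the chosen field.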
Amazon Redshift
Petabyte-scale data warehouse solution
What is Silota?
• Building blocks of analytics
• A simple REST API
  • to ingest
  • to analyze
  • to export
• Based on Kafka, Storm and Elasticsearch
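A REST API with ingest, analyze and export operations might be called along the following lines. This is purely a sketch: the base URL, the paths /events, /queries and /exports/latest, the bearer-token header and the request bodies are all hypothetical and are not taken from Silota's actual API. The code only builds the requests and does not send them.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.example.com/v1"  # hypothetical endpoint, not a real service

def build_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated JSON request."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = Request(f"{BASE_URL}{path}", data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

# One request per operation; paths and payloads are made up for illustration.
ingest_req = build_request("POST", "/events", "KEY", {"type": "click"})
analyze_req = build_request("POST", "/queries", "KEY", {"count_by": "type"})
export_req = build_request("GET", "/exports/latest", "KEY")

for req in (ingest_req, analyze_req, export_req):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to urllib.request.urlopen would perform the call against a real server.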
Silota vs. Mixpanel
• Mixpanel for product people
  • great UI
  • cookie-cutter analysis for verticals (gaming, e-commerce)
• Silota is an API
  • more low-level, full-power
  • first-class API: responses, pagination, errors, etc.
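What well-defined responses, pagination and errors buy the caller can be sketched with a small client loop. The response shape used here (status, results, next_cursor, error) is an assumption made for the example, and fake_fetch stands in for a real HTTP call.

```python
class ApiError(Exception):
    """Raised when a page comes back with a non-200 status."""
    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status

def iter_results(fetch_page):
    """Yield results across all pages, following cursors until none is left."""
    cursor = None
    while True:
        response = fetch_page(cursor)
        if response["status"] != 200:
            raise ApiError(response["status"], response.get("error", "unknown error"))
        yield from response["results"]
        cursor = response.get("next_cursor")
        if cursor is None:
            return

# Hypothetical two-page result set keyed by cursor.
PAGES = {
    None: {"status": 200, "results": [1, 2], "next_cursor": "p2"},
    "p2": {"status": 200, "results": [3], "next_cursor": None},
}

def fake_fetch(cursor):
    """Stand-in for an HTTP call; unknown cursors produce an error response."""
    return PAGES.get(cursor, {"status": 404, "error": "no such page"})

all_results = list(iter_results(fake_fetch))
print(all_results)
```

Because errors arrive in a predictable shape, the caller can turn them into one exception type instead of parsing ad hoc messages.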