Introduction to Big Data Infrastructure


Slides from a talk that introduces the infrastructure powering data applications. Data infrastructure ties together the distributed components, systems, and processes needed to derive value from data. The talk covers data collection, immutable logs, scaling ETL processes, and real-time analytics, with example use cases of Kafka, Storm, Elasticsearch, and Amazon Redshift.
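
The slides themselves do not spell out what an immutable log is, so here is a minimal sketch of the idea, assuming a plain in-memory list. The class `AppendOnlyLog` and its methods are hypothetical names made up for illustration; this is not the Kafka API. Records are only ever appended, each gets a sequential offset, and every consumer keeps track of its own read position.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory stand-in for an immutable log (not the Kafka API)."""

    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end and return its offset; existing records never change."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, Any]]:
        """Return up to max_records (offset, record) pairs starting at the given offset."""
        chunk = self._records[offset:offset + max_records]
        return [(offset + i, record) for i, record in enumerate(chunk)]


log = AppendOnlyLog()
log.append({"event": "signup", "user": "u1"})
log.append({"event": "click", "user": "u1"})
log.append({"event": "click", "user": "u2"})

# Two independent consumers read the same log from their own offsets.
# The offsets are illustrative: one consumer starts from the beginning,
# the other has already processed the first two records.
search_batch = log.read(offset=0)
warehouse_batch = log.read(offset=2)
```

Because reading never removes anything, a consumer added later can replay the log from offset 0 without affecting the others.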

Text of Introduction to Big Data Infrastructure

  • 1. Introduction to Big Data Infrastructure
      • Vancouver SMAC (Social, Mobile, Analytics & Cloud) Meetup
      • Oct 22, 2014
      • Ganesh

  • 2. Hi
      • Programming professionally for 10+ years
      • x86 assembly, STL, boost, python-boost, python
      • Built emacs-wiki-blog: first blogging engine for Emacs!

  • 3. What is Big Data?

  • 4. What is Big Data?
      • 3V: Volume, Velocity, Variety

  • 5. The Big Data Zoo
      • Amazon Kinesis, Riak, Cassandra, Hive, Apache Spark, Apache Hadoop, Pig, Apache Storm, Kibana, Tableau, Apache Kafka, Elasticsearch, Amazon EMR, Redshift, DynamoDB, HBase

  • 6. The Zoo Organized (Data → Answers)
      • Ingest: Kafka, Kinesis, Flume, Scribe
      • Store: S3, DynamoDB, HDFS, Redshift
      • Process/Enrich: Hive/Pig/EMR, Spark, Storm
      • Visualize: Tableau, Kibana

  • 7. Data Ingestion
      • Ingest Layer
      • Mobile Apps, Websites, Internet of Things

  • 8. Elasticsearch
      • Open-source search and analytics solution

  • 9. Kibana

  • 10. Amazon Redshift
      • Petabyte-scale data warehouse solution

  • 11. What is Silota
      • Building blocks of analytics
      • A simple REST API: to ingest, to analyze, to export
      • Based on Kafka, Storm and Elasticsearch

  • 12. Silota -vs- Mixpanel
      • Mixpanel: for product people; great UI; cookie-cutter analysis for verticals (gaming, e-commerce)
      • Silota is an API: more low-level, full-power; first-class API: responses, pagination, errors, etc.

  • 13. Keep in Touch!
      • Ganesh
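
The four stages on slide 6 (Ingest, Store, Process/Enrich, Visualize) can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the function names, the "source,event" line format, and the sample data are all hypothetical, and plain Python lists stand in for the real systems named on the slide.

```python
from collections import Counter
from typing import Dict, Iterable, List


def ingest(raw_lines: Iterable[str]) -> List[Dict[str, str]]:
    """Ingest: parse raw 'source,event' lines into dicts, skipping malformed ones."""
    events = []
    for line in raw_lines:
        parts = [part.strip() for part in line.split(",")]
        if len(parts) == 2 and all(parts):
            events.append({"source": parts[0], "event": parts[1]})
    return events


def store(events: List[Dict[str, str]], table: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Store: append events to a list standing in for a durable store."""
    table.extend(events)
    return table


def process(table: List[Dict[str, str]]) -> Counter:
    """Process/Enrich: aggregate the number of events per source."""
    return Counter(row["source"] for row in table)


def visualize(counts: Counter) -> List[str]:
    """Visualize: render one text bar per source, sorted by source name."""
    return [f"{source:<8} {'#' * n} {n}" for source, n in sorted(counts.items())]


# Hypothetical sample input; "bad line" is dropped at the ingest stage.
raw = ["mobile,open", "web,click", "iot,reading", "web,click", "bad line", "mobile,purchase"]

table = store(ingest(raw), [])
counts = process(table)
chart = visualize(counts)
print("\n".join(chart))
```

Keeping each stage as a separate function mirrors the slide's layout: any one stage could be swapped for a real system without touching the others.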