Copyright © 2014 Big Data Partnership Ltd. All rights reserved. Big Data Concepts Masterclass: A crash course for executives and managers. @BigDataExperts

Big Data Concepts Masterclass


DESCRIPTION

Here are the slides from a presentation delivered by Big Data Partnership (@BigDataExperts). This masterclass on Big Data Concepts is an hour-long version of the one-day course run by Big Data Partnership (http://www.bigdatapartnership.com/wp-content/uploads/2013/11/BigData-Concepts.pdf). "This one-day masterclass is an executive briefing on Big Data designed for senior management and business leaders to learn about Big Data concepts and familiarise themselves with the business and technology trends and opportunities. Includes extensive guidance in applying the right economic, technological and business criteria to the evaluation of Big Data adoption in your organisation and how it can help you meet business goals, dispelling the myths around Big Data, and find out what it is and is not, how to get the biggest benefit for your organisation and guidance on the best of breed approach to initiate a Big Data programme." If you have any questions or would like to learn more about big data (including the consultancy, training and support we offer), please get in touch at contact [at] bigdatapartnership dot com.

Citation preview

  • 1. Big Data Concepts Masterclass: A crash course for executives and managers. @BigDataExperts

2. Who We Are?
3. Agenda. Three big questions: 1. What is Big Data? 2. Why should I care? 3. Where do I start?
4. 1. What is Big Data?
5. What is Big Data? 1. New technology: Volume, Variety, Velocity. 2. New philosophy: Value of data; Taming Voracity; Becoming data-driven; Empirical approach: Data Science. 3. 1 + 2 = Business Transformation.
6. What is Big Data? New technology drivers.
7. Volume: The Information Revolution. We are living in an Information Revolution. The data accumulated over the last 2 years (1 ZB) dwarfs the entire prior record of human civilization. Social media, smart sensors, server logs, finance, e-mail.
8. Volume: Why can't I just make it bigger? Legacy database, experiencing huge growth in data volumes. [Figure: Scale Up. A large application database or data warehouse; cost climbs from $ / GB to $$$$ / GB as data volume grows (TB, then ???). Labels: Data Volume, Performance, Cost.]
9. Variety: Why won't it load my data? Businesses are increasingly moving beyond relational data. 80% of enterprise data is unstructured. The rise of social media data integrated with other enterprise data leaves us with the problem of handling complex graph data. Machine-generated data such as log data is often semi-structured.
Often as datasets get much larger, it is more efficient to leave them in their original format and store them that way than to transform everything into a normalised relational schema.
10. Velocity: Why can't I capture everything? All single-server information systems have limits on throughput. The only question is whether you hit that limit or not. If you do, your options are limited unless you have a distributed system to capture the data as it arrives. Distributed systems which are designed in an appropriate way can scale linearly to accept increasing data throughput rates, effectively lifting the cap on capture throughput. In today's high data intensity applications, this is becoming ever more important.
11. Cheat Sheet: Big Data Jargon. Hadoop: Open-source framework for storing and processing large data sets. Uses clusters of commodity hardware to tackle the big data challenge in an affordable way. Designed to cope with failures automatically. Can be scaled out from one server to thousands of machines (Scale Out).
12. Cheat Sheet: Big Data Jargon. NoSQL: Means "Not Only SQL". Refers to most databases which post-date the SQL era. Some support SQL, or SQL-derived languages. May be capable of handling Big Data (a Distributed System), or may be limited to a single server. Often represent data in more flexible ways than spreadsheets, e.g. a map of many Item=>Value pairs, or a graph of many items and the relationships between them.
13. What is Big Data? New philosophy.
14. What's Data Science all about?
Data + Science. Science: theory + experiment => evidence => insight. Science: the empirical method = evidence-based approach. Never based on assumptions or intuition. The Data Science movement, particularly in the context of Big Data, is all about making business data-driven and empirical.
15. What's Data Science all about? Before: analysts used intuition and domain knowledge to draw conclusions from statistics. Unfortunately, statistics can be easily manipulated, as we often see in the media. "There are lies, damned lies, and statistics" (Mark Twain). Critical evaluation of data empirically is key to avoiding bias. More modern techniques such as Bayesian statistics can help to remove subjective bias. Machine Learning methods can remove the human element almost entirely.
16. Data Science + Big Data. More data + limited compute resource => more aggressive sampling => less accurate results. Improve accuracy of results + limited compute resource => more complex models => less accountable results. All data + scalable compute resource => no sampling => more accurate results. All data + scalable compute resource => less complex models => more accountable results. Often quoted as "more data trumps smarter algorithms" (Google).
17. What is Big Data? Business Transformation.
18. Why do we need to change? New technology: disruptive. New philosophy: challenging to existing processes. Business transformation: new strategy, new roles (Big Data Strategy, Big Data Engineer, Big Data Architect, Data Scientist, Chief Data Officer).
19. Why do we need a Big Data strategy? Any major change programme needs a strategy to steer it. 1.
Everyone will be pulling in the same direction. 2. Performance can be measured against the strategy later. 3. Target outcomes will be clearly defined. 4. The business will understand the need for the programme.
20. Why can't I just "build it and they will come"?
21. Do I need a Data Scientist?
22. Understanding the Data Scientist role. Data Analyst: SAS, SPSS; Excel, possibly R; Relational Databases; SQL; some training in statistics; education in IT or Business; happy with table or spreadsheet formatted data. Data Scientist: Statistics; SAS, SPSS; R; Relational Databases; SQL; education in Maths or Physics; happy with any data formats and data varieties; Machine Learning; Big Data; NoSQL, Hadoop, Cassandra.
23. Understanding the Data Scientist role. Because they are a scientist, their job is to explore and discover within your business data. 1. Access to all data => break down information siloes. 2. Tools to explore => big data computing infrastructure. 3. Freedom to explore and discover => changes to policy and team structure.
24. Use Case #101: Data Lake. Consolidation of data siloes for combined analysis, online archival, and free-range Data Science exploration. Often begins as a POC. Value could take a long time to emerge, and could be difficult to plan or predict (unknown unknowns). ROI analysis: value uplift from new insight should be greater than cost of big data implementation + cost of data source integration + cost of staffing the Data Science team.
25. Where is Big Data heading? Big Data is here to stay. Data volumes are not going to decrease!
We see data processing becoming increasingly commoditised. Vendor proliferation + it is simply a matter of mechanics. We see Machine Learning becoming far more widespread. More complex relationships are harder for humans to identify. We see Data Science permeating a much wider range of businesses and taking over as the next boom industry. The 24-hour global economy makes being data-driven increasingly valuable. Investment in Big Data technology is a solid foundation, but investment in Machine Learning and Data Science expertise will really put you at the front of the pack.
26. 2. Why should I care?
27. Why should I care? 1. Quality of insight. 2. Time to insight. 3. Competitive advantage.
28. Quality of Insight.
29. Recap: Data Science + Big Data. More data + limited compute resource => more aggressive sampling => less accurate results. Improve accuracy of results + limited compute resource => more complex models => less accountable results. All data + scalable compute resource => no sampling => more accurate results. All data + scalable compute resource => less complex models => more accountable results.
30. Data Science + Big Data. More accurate results => better decisions => more efficiency or revenue. More accountable results => better traceability => less risk + regulatory compliance. Clear, quantifiable business outcomes. Use Case #102.
31. Use Case #102: Migrate & scale existing analytic models. Identify existing analytic models which suffer from sampling of input data, or overly complex models.
Migrate to a big data platform, scale out to the whole dataset and/or simplify the model. Can go directly to POV with real measurable business value. Rapid turnaround for POV if models are not too difficult to migrate to your chosen platform. ROI analysis: value uplift from use case = value from improved model accuracy - cost of migration work.
32. Case Studies.
33. Case Study: Retail. Predicting fashion trends for retailers. Client: Global publisher providing fashion insight & trend analysis for customers. Wanted superior market intelligence to inform crucial retail buyer decisions. Challenges: Consume vast amounts of unstructured data from the web. Make accurate, actionable predictions from the data. Use cases: Large-scale parallel data processing of unstructured data from uncontrolled sources. Predictive analytics & machine learning. Used big data ecosystem technologies (Hadoop, Hive, Pig) to collect, process, and transform the data and serve the front-end. Outcome: Platform successfully launched Sept 2013. Opened up a new business stream, as this was a new product.
34. Case Study: Music Industry. Digital music service play analytics, recommendations, and royalties. Client: Leading online music streaming service. Music listening habits of millions of users, measured across millions of tracks. Challenges: Connecting datasets from different application systems, too large for a traditional database. Generating actionable reports and recommendations. Use cases: Reporting, and royalty charge computation. Generating recommendations for users to help them find new music. Outcome: Richer information about users in a shorter time frame. Lower overheads, and for less money than the previous system = significant operational efficiency improvements.
35.
Case Study: M2M. Machine-to-machine data across various industries. M2M data = telemetry collected from industrial machines (e.g. production line robots, power plants, aircraft engines, ...). Can be analysed to increase the efficiency of those machines individually, or to optimise many of them as a collective system. GE conducted a detailed study of the impact of a 1% improvement in productivity across different industries, as a result of Machine Data Analytics with big data technology.
36. Case Study: M2M. Machine-to-machine data across various industries. Use cases: Asset management & predictive maintenance: aggregate view across geography, machines, components, parts; deliver optimal number of parts to the right location at the right time; minimise parts inventory held, and maintenance costs; predictive analytics to replace parts before failure. Supply chain optimisation: RFID & smart sensors; deliver goods at the optimal time, e.g. fresh produce; monitor state of goods in transit, adjust logistics in real-time. Transport fleet optimisation: interconnected vehicles know their own + other vehicles' location; optimise routing to find the most efficient system-level solution.
37. 3. Where do I start?
38. Life cycle of a Big Data programme. Stages: Education (1), Analysis (2), Discovery (3), Prototype (4), Implementation (5), Evolution (6). Company Strategy; Big Data Strategy.
39.
Cheat Sheet: POC vs POV. Proof of Concept: Select a use case to illustrate with. Sample, or mock up, data on a smaller scale. Build a scaled-down version of the full use case. Prove the technology can deliver as intended for the use case, and can scale to the full dataset, preferably through repeatable, automated unit tests. Proof of Value: Select a use case to illustrate with. Sample, or partition, real data to a smaller scale. Build a scaled-down version of the full use case. Prove the technology can deliver business value from insight generated for the use case. Document implementation cost vs business value uplift rigorously.
40. Business Transformation.
41. Business Transformation. 1. Make sure there is an owner of data across the organisation; Chief Data Officer is an ideal role for this if you can do it. 2. Organise your Data Scientists so they are best placed to support the business goals: one central team, one team per analysis type, or one person dedicated to each business unit, for instance. 3. Make sure IT is able to make the data available to those individuals in the right way (sandpit, right tools, access etc.).
42. Integrating with the enterprise Data Warehouse. Systems like Hadoop are not a full replacement for a Data Warehouse. There are overlapping qualities. Hadoop is not transactional, nor does it support fine-grained access to data. Hadoop is fundamentally a batch-oriented system, so mixed workloads are not easily supported. Best practice is to use Hadoop to complement an existing Data Warehouse. Hadoop can offload cold or rarely accessed data to act as an online archive. Hadoop can offload expensive ETL processing.
Hadoop can efficiently generate aggregations/summaries, and export these to the Data Warehouse for enterprise use. Keep only the highest-value data in the Data Warehouse.
43. Summary. Dispelled myths: Big Data is only about technology; Big Data is only relevant to technologists; Hadoop is magic; Hadoop is an unknown black box. A Big Data approach can help with problems which may combine Volume, Variety, Velocity. A Big Data approach is in demand because it is helping to increase business value and reduce time to insight. Data Science is key to getting full value from a Big Data platform.
44. Further Training. Apache Hadoop 2.0 Developing Java Applications (4 days); Apache Hadoop 2.0 Development for Data Analysts (4 days); Apache Hadoop 2.0 Operations Management (3 days); MapR Hadoop Fundamentals of Administration (3 days); Apache Cassandra DevOps Fundamentals (3 days); Apache Hadoop Masterclass (1 day); Big Data Concepts Masterclass (1 day); Machine Learning at scale with Apache Mahout (1 day).
45. Contact Details. Tim Seears, CTO, Big Data Partnership. [email protected] @BigDataExperts