Big Data Management: What’s New, What’s Different and What You Need to Know
Today’s Featured Presenter
Matt Aslett, Research Director, Data Platforms and Analytics, 451 Research
As Research Director, Matt has overall responsibility for the data platforms and analytics research coverage, which includes operational and analytic databases, Hadoop, grid/cache, stream processing, search-based data platforms, data integration, data quality, data management, analytics, and advanced analytics. Matt's own primary area of focus includes data management, reporting and analytics, and exploring how the various data platform and analytics technology sectors are converging in the form of next-generation data platforms.
Agenda
• Big Data Management – Matt Aslett, 451 Research
• SnapLogic Overview
• SnapLogic Demonstration
– Ravi Dharnikota, Head of SnapLogic Enterprise Architecture
• Q&A
Copyright (C) 2016 451 Research LLC
Big Data Management
Matt Aslett, Research Director
451 Research is a leading IT research & advisory company
• Founded in 2000
• 250+ employees, including over 100 analysts
• 1,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers
• 50,000+ IT professionals, business users and consumers in our research community
• Over 52 million data points published each quarter and 4,500+ reports published each year
• 2,000+ technology & service providers under coverage
• 451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group
• Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia
Research & Data
Advisory
Events
Go 2 Market
Big data and beyond
• ‘V’ is for various things (volume, velocity, variety…), but the Vs do not define big data
• To understand the trends driving ‘big data’, 451 Research looked beyond the nature of the data to what enterprises wanted to do with it:
  – Totality: storing and processing all data (or as much as is economically viable)
  – Exploration: schema-free approaches to analyzing data to identify new patterns
  – Frequency: more frequent analysis of data to enable real-time decision making
‘Big data’ is primarily driven by economics, not data
“Big data is what happened when the cost of keeping information became less than the cost of throwing it away.”
– George Dyson
• ‘Big data’ is the realization of competitive advantage based on the fact that it is now more economically feasible to store and process data that was previously ignored because of the cost and functional limitations of traditional data management technologies in handling its volume, velocity and variety
• Moved from storing 1% of data for 60 days in an EDW @ $100,000/TB
• to storing 100% of data for a year in Hadoop @ $900/TB
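To make the economics concrete, the short calculation below applies the two per-TB prices quoted above to a hypothetical 500 TB of total data. It is a back-of-the-envelope sketch, not 451 Research's model: the 500 TB volume is an assumption, and the calculation treats $/TB as a flat price, ignoring the difference between 60-day and one-year retention.

```python
# Back-of-the-envelope cost comparison using the per-TB prices from the slide.
# ASSUMPTIONS (not from the source): a hypothetical 500 TB of total data, and
# $/TB treated as a flat price that ignores the 60-day vs one-year retention gap.

EDW_COST_PER_TB = 100_000   # $/TB in the enterprise data warehouse (slide figure)
HADOOP_COST_PER_TB = 900    # $/TB in Hadoop (slide figure)


def storage_cost(total_tb: float, fraction_kept: float, cost_per_tb: float) -> float:
    """Cost of keeping `fraction_kept` of `total_tb` at a flat price per TB."""
    return total_tb * fraction_kept * cost_per_tb


total_tb = 500  # hypothetical total data volume

edw = storage_cost(total_tb, 0.01, EDW_COST_PER_TB)       # keep 1% in the EDW
hadoop = storage_cost(total_tb, 1.0, HADOOP_COST_PER_TB)  # keep 100% in Hadoop

print(f"EDW, 1% of data:      ${edw:,.0f}")
print(f"Hadoop, 100% of data: ${hadoop:,.0f}")
print(f"Per-TB price ratio:   {EDW_COST_PER_TB / HADOOP_COST_PER_TB:.0f}x")
```

Under these assumptions, keeping all of the data in Hadoop costs slightly less than keeping 1% of it in the EDW: the roughly 111x per-TB price gap more than offsets the 100x increase in volume.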
The evolution of enterprise analytics
[Figure: analytics stages plotted against complexity: Reporting (what happened?), Analysis (why did it happen?), Descriptive (what is happening?), Predictive (what will happen?), Prescriptive (influence what happens). Other labels: visualization, statistical modeling, machine learning; IT-driven, user-driven, automated.]
• Data sources: from structured, historical data in RDBMS to multi-structured, historical and real-time data in RDBMS, Hadoop, NoSQL and stream processing
Source: 451 Research, Total Data Analytics 2016
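The step from looking back (reporting, descriptive) to looking forward (predictive) can be illustrated with a toy example. The sketch below uses hypothetical monthly sales figures: it computes a descriptive summary, then a one-month-ahead forecast from an ordinary least-squares trend line. A straight-line fit is only the simplest stand-in for the statistical modeling and machine learning techniques named in the figure.

```python
# Hypothetical monthly sales figures (illustrative data, not from the source).
sales = [120.0, 135.0, 150.0, 160.0, 178.0, 190.0]

# Reporting / descriptive: what happened?
total = sum(sales)
average = total / len(sales)


def linear_forecast(ys: list[float], steps_ahead: int = 1) -> float:
    """Predictive (simplest possible model): fit y = a + b*x by ordinary least
    squares over month indices 0..n-1, then extrapolate `steps_ahead` months."""
    n = len(ys)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    a = y_mean - b * x_mean
    return a + b * (n - 1 + steps_ahead)


forecast = linear_forecast(sales)
print(f"total={total}, average={average}, next month≈{forecast:.1f}")
```

For these figures the descriptive summary is a total of 933 and an average of 155.5, and the trend line predicts about 204.4 for the seventh month.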
EDW vs Hadoop (Schema-on-write vs schema-on-read)
Source: https://www.flickr.com/photos/wbaiv/16510090506/
Source: https://www.flickr.com/photos/notbrucelee/5696238930/
Schema-on-write
• Pre-prepared
• Single-purpose
• Some assembly required
• Inflexible
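Schema-on-write means the structure is fixed before loading, and every record is checked and converted as it is written. The Python sketch below illustrates this with a hypothetical `Order` schema and `load` function, both invented for illustration: rows that do not fit are rejected up front, and fields outside the schema are discarded. That is what makes the stored result pre-prepared but inflexible.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical fixed target schema, declared before any data is loaded.
@dataclass(frozen=True)
class Order:
    order_id: int
    customer: str
    amount: float
    order_date: date


def load(raw_rows: list[dict]) -> tuple[list[Order], list[tuple[dict, str]]]:
    """Schema-on-write: each row is validated and converted on the way in.
    Rows that do not fit the schema are rejected and never stored."""
    table, rejected = [], []
    for row in raw_rows:
        try:
            table.append(Order(
                order_id=int(row["order_id"]),
                customer=str(row["customer"]),
                amount=float(row["amount"]),
                order_date=date.fromisoformat(row["order_date"]),
            ))
        except (KeyError, ValueError, TypeError) as exc:
            rejected.append((row, repr(exc)))
    return table, rejected


rows = [
    {"order_id": "1", "customer": "acme", "amount": "19.99", "order_date": "2016-03-01"},
    {"order_id": "2", "customer": "globex", "amount": "n/a", "order_date": "2016-03-02"},
    # The extra "coupon" field has no place in the schema and is silently dropped.
    {"order_id": "3", "customer": "initech", "amount": "5", "order_date": "2016-03-02",
     "coupon": "SPRING"},
]
table, rejected = load(rows)
```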
Schema-on-read
• Flexible
• Reusable
• Some imagination required*
• Multi-purpose
*Instructions available if desired
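Schema-on-read inverts this: raw records are stored exactly as received, and each query applies its own interpretation when it reads them. In the sketch below, the `raw_store` list and the `read` helper are hypothetical stand-ins for files in a data lake and a query engine; the same raw events serve two different purposes without being reloaded or restructured.

```python
import json
from typing import Iterator

# Raw events are kept exactly as received: no schema is enforced on write.
raw_store = [
    '{"user": "alice", "page": "/home", "ms": 120}',
    '{"user": "bob", "page": "/pricing", "ms": 340, "referrer": "ad"}',
    '{"user": "alice", "event": "signup"}',
]


def read(store: list[str], fields: list[str]) -> Iterator[dict]:
    """Schema-on-read: each query decides which fields it needs at read time.
    A record that lacks a requested field yields None instead of failing."""
    for line in store:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}


# Two different "schemas" applied to the same raw data at query time.
page_views = [r for r in read(raw_store, ["user", "page", "ms"]) if r["page"]]
signups = [r["user"] for r in read(raw_store, ["user", "event"]) if r["event"] == "signup"]
```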
Photo: Myrabella / Wikimedia Commons, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=11263585
Hadoop-based data lakes
• The concept of the data lake has taken off in recent years, with the Apache Hadoop data-processing framework serving as the unified repository into which raw data is landed from multiple sources and made available to multiple users for multiple purposes.
• Beware the data swamp
Source: https://www.flickr.com/photos/lofink/4501610335/
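One common way to keep a lake from turning into a swamp is to land raw data unchanged, but in a predictable layout and with basic metadata recorded next to it. The sketch below assumes a hypothetical `raw/<source>/<yyyy-mm-dd>/` directory convention and a JSON sidecar file per landed object; both are illustrative choices rather than a Hadoop or HDFS requirement, and a local temporary directory stands in for the lake.

```python
import hashlib
import json
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake_root: Path, source: str, payload: bytes, day: date, name: str) -> Path:
    """Write raw bytes unchanged to raw/<source>/<yyyy-mm-dd>/<name> (hypothetical
    layout) plus a metadata sidecar, so the file can be found and trusted later."""
    target_dir = lake_root / "raw" / source / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_bytes(payload)  # no parsing or transformation on landing
    meta = {
        "source": source,
        "landed_on": day.isoformat(),
        "bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),  # detects later tampering/corruption
    }
    (target_dir / (name + ".meta.json")).write_text(json.dumps(meta, indent=2))
    return target


# Usage against a throwaway local directory standing in for the lake.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    landed = land_raw(root, "crm", b"id,name\n1,acme\n", date(2016, 3, 1), "accounts.csv")
    landed_rel = landed.relative_to(root).as_posix()
    meta = json.loads((landed.parent / "accounts.csv.meta.json").read_text())
```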
Data governance, data preparation and the data lake
• Data needs to be filtered, processed, treated and managed to make it suitable for multiple analytics use cases.
• Data governance: data catalog, data security, data lineage, data inventory, data quality, data pipelines
• Data preparation: data discovery, data cleansing, data harmonization, data enrichment, data matching, collaboration
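Several of the governance capabilities listed above (data catalog, data inventory, data lineage) can be illustrated with a small in-memory structure. The `Catalog` and `DatasetEntry` classes below are hypothetical, a sketch of the idea rather than any product's API: each registered dataset records its owner, a description and its upstream inputs, and lineage is recovered by walking those inputs breadth-first.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    owner: str
    description: str
    inputs: list[str] = field(default_factory=list)  # upstream datasets (lineage)


class Catalog:
    """Toy in-memory data catalog: an inventory of datasets plus their lineage."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Refuse datasets whose inputs are unknown, so lineage is never broken.
        missing = [i for i in entry.inputs if i not in self._entries]
        if missing:
            raise ValueError(f"unregistered inputs for {entry.name}: {missing}")
        self._entries[entry.name] = entry

    def lineage(self, name: str) -> list[str]:
        """All upstream datasets of `name`, nearest first, without duplicates."""
        seen: set[str] = set()
        order: list[str] = []
        queue = list(self._entries[name].inputs)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.add(current)
                order.append(current)
                queue.extend(self._entries[current].inputs)
        return order


catalog = Catalog()
catalog.register(DatasetEntry("raw_orders", "sales-it", "Orders as landed from the ERP"))
catalog.register(DatasetEntry("raw_customers", "crm-team", "Customer master from the CRM"))
catalog.register(DatasetEntry("clean_orders", "data-stewards", "Validated orders", ["raw_orders"]))
catalog.register(DatasetEntry("revenue_by_customer", "analytics", "Revenue report",
                              ["clean_orders", "raw_customers"]))
upstream = catalog.lineage("revenue_by_customer")
```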
Data governance, data preparation and the data lake
[Diagram labels: data lake; data-as-a-service; applications, partners, suppliers, IT; data governance (data lineage, data inventory, data catalog, data security, data quality, data pipelines); self-service data preparation (data discovery, data cleansing, data harmonization, data enrichment, data matching, collaboration); data stewards; advanced analytics: data scientists; self-service analytics: senior executives, business analysts, data analysts]
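Data cleansing, harmonization and matching often work together: values are tidied up, mapped onto a common vocabulary, and then used to find records that describe the same entity. The sketch below uses invented customer records from two hypothetical source systems and an assumed country-code mapping. It matches on exact cleansed keys only; production matching tools typically add fuzzy or probabilistic matching on top.

```python
import re

# Hypothetical records describing customers in two different source systems.
crm = [{"name": "  Acme Corp. ", "country": "USA"},
       {"name": "Globex", "country": "United Kingdom"}]
erp = [{"name": "ACME CORP", "country": "us"},
       {"name": "Initech", "country": "U.S."}]

# Harmonization: map the different spellings onto one code list (assumed mapping).
COUNTRY_CODES = {"usa": "US", "us": "US", "u.s.": "US", "united kingdom": "GB", "uk": "GB"}


def cleanse(record: dict) -> dict:
    """Cleansing + harmonization: strip punctuation and extra whitespace,
    lowercase the name, and standardize the country to a code."""
    name = re.sub(r"[^\w ]", "", record["name"]).strip().lower()
    name = re.sub(r"\s+", " ", name)
    country = COUNTRY_CODES.get(record["country"].strip().lower(), "UNKNOWN")
    return {"name": name, "country": country}


def match(left: list[dict], right: list[dict]) -> list[tuple[dict, dict]]:
    """Matching: pair records whose cleansed (name, country) keys are equal."""
    index = {(r["name"], r["country"]): r for r in map(cleanse, right)}
    return [(rec, index[(rec["name"], rec["country"])])
            for rec in map(cleanse, left)
            if (rec["name"], rec["country"]) in index]


matches = match(crm, erp)
```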
Hadoop and other animals
Recommendations
• Before embarking on data lake projects, enterprises should seriously consider their data governance and management requirements, to ensure that the functionality needed to turn the concept into reality is available.
• For flexibility and agility, employ data management approaches and technologies that abstract data processing pipelines from the execution environment.
• Look for data integration and transformation technologies that execute natively, taking advantage of the underlying engine (e.g. Spark, YARN).
• Seek out data management and integration technologies that enable consumption and transformation of large volumes of structured and unstructured data.
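The second recommendation, abstracting data processing pipelines from the execution environment, can be sketched as a separation between a pipeline definition (what to do) and an executor (where and how to run it). In the Python sketch below, `Pipeline`, `LocalExecutor` and `ParallelExecutor` are hypothetical names; a thread pool stands in for a distributed engine such as Spark, and none of this reflects Spark's or YARN's actual APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Protocol

# A step transforms one record (a dict) into another.
Step = Callable[[dict], dict]


class Pipeline:
    """Declares *what* to do: an ordered list of record-level steps.
    It holds no reference to where it will run."""

    def __init__(self, *steps: Step):
        self.steps = steps

    def apply(self, record: dict) -> dict:
        for step in self.steps:
            record = step(record)
        return record


class Executor(Protocol):
    """Decides *where and how* a pipeline runs over partitioned input."""

    def run(self, pipeline: Pipeline, partitions: list[list[dict]]) -> list[dict]: ...


class LocalExecutor:
    """Runs every partition sequentially in the current process."""

    def run(self, pipeline: Pipeline, partitions: list[list[dict]]) -> list[dict]:
        return [pipeline.apply(r) for part in partitions for r in part]


class ParallelExecutor:
    """Runs partitions concurrently on a thread pool (stand-in for a cluster engine)."""

    def __init__(self, workers: int = 4):
        self.workers = workers

    def run(self, pipeline: Pipeline, partitions: list[list[dict]]) -> list[dict]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # pool.map preserves partition order, so output order matches LocalExecutor.
            results = pool.map(lambda part: [pipeline.apply(r) for r in part], partitions)
            return [r for part in results for r in part]


# Usage: one pipeline definition (hypothetical 20% VAT step), two execution environments.
pipeline = Pipeline(
    lambda r: {**r, "amount": float(r["amount"])},
    lambda r: {**r, "vat": round(r["amount"] * 0.2, 2)},
)
partitions = [[{"amount": "10"}], [{"amount": "2.5"}, {"amount": "4"}]]
local = LocalExecutor().run(pipeline, partitions)
parallel = ParallelExecutor(workers=2).run(pipeline, partitions)
```

Because the pipeline never refers to its executor, the same definition runs unchanged on either backend; targeting another engine would only require another class with the same `run` method.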
Thank you
@maslett
www.451research.com
SnapLogic Elastic Integration: Accelerate Your Integration. Accelerate Your Business.
“We can do more in two hours with SnapLogic than we could in two days with traditional solutions.”
Big Data and hybrid cloud environments are making yesterday’s approaches to integration obsolete
SnapLogic: Unified Platform for Data and Application Integration
• Anything: apps | data | APIs | things
• Anytime: batch | streaming | real-time
• Anywhere: on prem | cloud | hybrid
SnapLogic in the Modern Data Fabric: Ingest, Transform, Deliver
[Diagram tiers:]
• Source: on-prem applications, relational databases, cloud applications, NoSQL databases, web logs, Internet of Things
• Store & process: data warehouses & data marts (e.g., HANA); big data and data lakes
• Consume
• Data integration and transformation: ingest (from the sources into store & process) and deliver (onward to consumers)
Modern Architecture: Hybrid and Elastic Execution
• Streams: no data is stored/cached
• Secure: 100% standards-based
• Elastic: scales out & handles data and app integration use cases
[Diagram labels: cloud-based Designer, Manager, Dashboard; three execution nodes; firewall; databases, on-prem apps, big data, cloud apps and data; separate metadata and data flows]
SnapLogic “respects data’s gravity.”
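The "Streams: no data is stored/cached" point describes records flowing through a pipeline one at a time rather than being staged in full. A generic way to express that idea in Python is a chain of generators, sketched below with hypothetical `read_csv`, `transform` and `write_jsonl` functions and in-memory stand-ins for the source and target. This illustrates the streaming concept only and is not how SnapLogic implements its pipelines.

```python
import csv
import io
import json
from typing import Iterable, Iterator, TextIO


def read_csv(lines: Iterable[str]) -> Iterator[dict]:
    """Source: yields one parsed row at a time."""
    yield from csv.DictReader(lines)


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    """Transform: converts each row as it flows past; nothing is buffered."""
    for row in rows:
        yield {"id": int(row["id"]), "name": row["name"].strip().title()}


def write_jsonl(records: Iterable[dict], sink: TextIO) -> int:
    """Target: writes each record as soon as it arrives; returns the count."""
    count = 0
    for record in records:
        sink.write(json.dumps(record) + "\n")
        count += 1
    return count


source = io.StringIO("id,name\n1, acme corp\n2,globex \n")
sink = io.StringIO()
# Nothing is read until write_jsonl pulls records through the chain.
written = write_jsonl(transform(read_csv(source)), sink)
```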
SnapLogic Demonstration
Discussion
Matt Aslett, Research Director, Data Platforms and Analytics, 451 Research
Ravi Dharnikota, Head of Enterprise Architecture, SnapLogic