hadoop-summit
To The Cloud and Back: A Look at Hybrid Analytics
Keith Manthey
© Dell EMC 2016 - All rights reserved
CIO challenge: next-generation infrastructures are needed
Traditional
ITIL-based IT processes
Client-server scale-up apps
Infrastructure resiliency
Coexisting IT paradigms
Cloud-native
DevOps-based IT processes
Distributed scale-out apps
Application resiliency
Modern IT: Traditional and Cloud-native
[Diagram: Modern IT spanning on-premises and off-premises; components available both on and off premises: Edge, Cloud-Enabled Storage, Private, Public Hosted, Data Lake, Core Data Center]
Through 2020, IDC predicts that spending on cloud-based big data analytics will grow 4.5x faster than spending on on-premises solutions
By 2020, IDC predicts that the use of big data analytics for actionable intelligence will double from today's level
Hybrid analytics, spanning on-premises and cloud, will become a dominant use of the technology
More specifically: moving data from on-premises to the cloud and back
The journey to here
DATA LAKE SOLUTION FOR EDW MODERNISATION
[Diagram: existing sources (ERP, CRM) and new sources (clickstream, web & social, geolocation, sensor & machine, server logs) flow into the Hortonworks Data Platform (Hadoop core, data services, operational services, data management) on commodity compute, with Isilon as the shared storage layer, serving business analytics, visualization & dashboards, and IT applications. Numbered flows: (1) active archive, (2) ETL/ELT offload, (3) enrich with new data types, (4) multi-protocol access (NFS, SMB, HTTP, Swift), (5) enterprise-grade data management. Legend: current data flow, new data flow, offload.]
DATA LAKE BENEFITS
1. Active Archive: optimise Enterprise Data Warehouse (EDW) storage by archiving cold data while still analysing it as needed
2. ETL Offload: improve EDW performance by offloading ETL processing to Hadoop
3. Semi/Unstructured Data Analytics: increase confidence in business decisions with new data sources
4. Multi-protocol Access: enable applications to access and update Hadoop data using NFS, SMB, HTTP, Swift and other file/object-based access methods
5. Data Management: enterprise-grade data management at Hadoop economics
Unique to Isilon
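To make the active-archive benefit (item 1) concrete, here is a minimal sketch, assuming "cold" is defined purely by time since last access. The `Partition` type, the `select_cold` helper, the 180-day threshold, and the sample partitions are all hypothetical; a real deployment would take access statistics from the EDW itself and would then move the selected data into the data lake, which this sketch does not do.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Partition:
    """Hypothetical EDW table partition with its size and last access time."""
    name: str
    size_gb: float
    last_accessed: datetime


def select_cold(partitions, now, cold_after=timedelta(days=180)):
    """Return partitions not accessed within `cold_after`, oldest access first.

    These are the candidates to archive out of the EDW; the 180-day
    default is an assumed threshold, not a recommendation.
    """
    cold = [p for p in partitions if now - p.last_accessed > cold_after]
    return sorted(cold, key=lambda p: p.last_accessed)


if __name__ == "__main__":
    now = datetime(2016, 6, 1)
    parts = [
        Partition("sales_2014", 800.0, datetime(2015, 1, 10)),
        Partition("sales_2015", 900.0, datetime(2015, 11, 2)),
        Partition("sales_2016", 300.0, datetime(2016, 5, 30)),
    ]
    cold = select_cold(parts, now)
    print([p.name for p in cold])        # ['sales_2014', 'sales_2015']
    print(sum(p.size_gb for p in cold))  # 1700.0
```

The threshold is the only policy knob here; archived partitions stay queryable in the data lake, so a conservative value mainly trades EDW capacity for query convenience.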
ISILON MOMENTUM
[Timeline: 2003 to 2015]
• >12,000 Clusters in the Field
• Approaching 1PB Average Cluster Capacity
• 1.3EB of Hadoop Usage
• 7,000+ Customers Worldwide
• 1,300 New Customers in 2015
• #1 in Data Lakes
• >1,200 HDFS Customers
• Scale-Out NAS Leader
A look forward
Ability to:
• Enact policy-based data movement
• Migrate data from on-prem to cloud and vice versa
• Leverage hybrid analytics to greatest effect
To The Cloud and Back
ISILON SD
Limits: 100 Gb/s
1 Petabyte (1 million gigabytes)
~1 Day to transfer one way
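The ~1 day figure follows from simple arithmetic: 1 PB is 8 × 10^15 bits, and at 100 Gb/s that takes 8 × 10^4 s, about 22 hours. The helper below reproduces the estimate; `transfer_time_seconds` is a hypothetical name, and its `efficiency` parameter is an assumption added to show how protocol overhead or a shared link stretches the figure.

```python
PB = 10**15   # decimal petabyte: 1 million gigabytes
GBPS = 10**9  # 1 gigabit per second


def transfer_time_seconds(size_bytes, link_bps, efficiency=1.0):
    """Seconds needed to move `size_bytes` over a link of `link_bps` bits/s.

    `efficiency` is an assumed fraction of line rate actually achieved
    (protocol overhead, contention); 1.0 means the full line rate.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bps * efficiency)


if __name__ == "__main__":
    ideal = transfer_time_seconds(1 * PB, 100 * GBPS)
    print(f"{ideal:,.0f} s = {ideal / 3600:.1f} h")  # 80,000 s = 22.2 h
    derated = transfer_time_seconds(1 * PB, 100 * GBPS, efficiency=0.7)
    print(f"{derated / 3600:.1f} h at 70% efficiency")  # 31.7 h at 70% efficiency
```

At an assumed 70% of line rate, the same petabyte takes about 31.7 hours each way, so a full round trip exceeds two days.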
To The Cloud and Back
ISILON SD
Data Silos
On Prem: only files
Cloud: only files
• Strong bandwidth; typically hand-scripted
• Expensive extraction; limit outbound data movement
Our Research
Problem Domain
1. File movement automation in place today is labor-intensive and fraught with peril.
2. Non-prescriptive (unselective) file movement out of the cloud will be extremely expensive, with limited value in return.
3. Most file movements may not return to exactly the same target location. For example:
   Location 1: File 1 → Location 2: File 1, which begets File 2 → Location 3: File 2
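Item 3 implies that a file's history, not just its current location, has to be known if movement back from the cloud is to be prescriptive. Below is a minimal provenance sketch, assuming every move or derivation is recorded as a link to its parent record; `FileRecord`, `move`, `derive`, and `lineage` are hypothetical names for illustration, not a OneFS API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    """One file at one location; `parent` links to the record it came from."""
    name: str
    location: str
    how: str = "created"  # "created", "moved", or "derived"
    parent: Optional["FileRecord"] = None


def move(rec: FileRecord, location: str) -> FileRecord:
    """Same file, new location."""
    return FileRecord(rec.name, location, "moved", rec)


def derive(rec: FileRecord, new_name: str) -> FileRecord:
    """New file produced from `rec` at the same location."""
    return FileRecord(new_name, rec.location, "derived", rec)


def lineage(rec: FileRecord) -> list:
    """Walk parent links back to the origin; return (how, name, location), oldest first."""
    chain = []
    node = rec
    while node is not None:
        chain.append((node.how, node.name, node.location))
        node = node.parent
    return chain[::-1]


if __name__ == "__main__":
    # The example from item 3: File 1 moves to Location 2, begets File 2,
    # and File 2 then lands at Location 3.
    f1 = FileRecord("File 1", "Location 1")
    f2 = move(derive(move(f1, "Location 2"), "File 2"), "Location 3")
    for step in lineage(f2):
        print(step)
```

With such a record, a policy can tell that File 2 at Location 3 is new data derived in the path, rather than a copy of File 1 that should go back to Location 1.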
What if…
1. There was a way to move files by policy (i.e., rule-based) to various locations, including to the cloud and back.
   1. For example: only target net-new files created in the cloud in certain directories for movement.
2. The rule-based file movement could allow for multiple targets and/or destinations for files.
   1. For example: move net-new files in one directory to a single target, and net-new files in a second directory to three targets.
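The rule-based movement described in the "What if…" list can be sketched as a pure planning step: given the files at a source, the set already handled, and a list of rules, compute which net-new files go to which targets. This is a minimal illustration under stated assumptions, not how OneFS or CloudPools implements policies: `MoveRule` and `plan_moves` are hypothetical, "net new" means "not in a set of previously moved paths", and a rule matches any file under its directory, including subdirectories.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class MoveRule:
    """Hypothetical rule: net-new files under `source_dir` go to every target listed."""
    source_dir: str
    targets: tuple


def plan_moves(files, rules, already_moved):
    """Return {target: sorted list of paths} for one policy run.

    files: POSIX paths currently present at the source (e.g. in the cloud).
    already_moved: paths handled by earlier runs; everything else counts as net new.
    A file matched by several rules is sent to the union of their targets.
    """
    plan = {}
    for path in files:
        if path in already_moved:
            continue  # not net new, skip
        parents = PurePosixPath(path).parents
        for rule in rules:
            if PurePosixPath(rule.source_dir) in parents:
                for target in rule.targets:
                    plan.setdefault(target, set()).add(path)
    return {target: sorted(paths) for target, paths in plan.items()}


if __name__ == "__main__":
    rules = [
        MoveRule("/cloud/ingest", ("onprem-a",)),                         # one target
        MoveRule("/cloud/results", ("onprem-a", "onprem-b", "archive")),  # three targets
    ]
    files = ["/cloud/ingest/a.csv", "/cloud/ingest/b.csv",
             "/cloud/results/r1.parquet", "/cloud/tmp/scratch.bin"]
    print(plan_moves(files, rules, already_moved={"/cloud/ingest/a.csv"}))
```

The sample rules mirror the two examples in the list: one directory with a single target and another with three. Files under no rule (here `/cloud/tmp`) never leave the source, which keeps outbound movement prescriptive.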
OneFS in the Cloud
Our Journey to Here
1. OneFS is already a software-defined asset.
   1. SDEdge is a software-defined storage offering.
   2. CloudPools is a software feature in SDEdge that uses the public cloud (Azure, AWS, Google) as a backup target.
2. OneFS has been loaded and run in the public cloud, as noted in the previous slide.
3. Dell EMC has a long history of data mobility across our product suite (replication, backups, etc.).
4. Dell EMC has a long history of policy-based file movement features.
Futures
1. We are exploring what OneFS, with all its features and its ability to move data up to the cloud and back, would look like.
2. What about a cloud environment containing a OneFS daemon that allowed policy-based file movement?
3. File movement to and from the cloud environment (up and back) could be controlled by an Isilon cluster.
Future State?
[Diagram: ISILON SD on-prem (on-prem files, movement policies) linked to the cloud (cloud files, OneFS daemon)]
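One way to read the proposed split (policies owned by the on-prem Isilon cluster, a daemon running in the cloud) is sketched below. Everything here is hypothetical: `PolicySet`, `CloudDaemon`, and `OnPremController` are illustrative stand-ins, policies are plain path-prefix rules, and the "push" is an in-process method call rather than any real protocol. The point it shows is that selection happens next to the data in the cloud, so only files a policy names would incur outbound transfer, which speaks to the expensive-extraction concern above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PolicySet:
    """Versioned movement policies: (path prefix, on-prem targets) pairs. Hypothetical format."""
    version: int
    rules: tuple = ()


@dataclass
class CloudDaemon:
    """Stand-in for the hypothetical OneFS daemon running next to the cloud files."""
    policies: PolicySet = field(default_factory=lambda: PolicySet(version=0))
    sent: set = field(default_factory=set)

    def receive(self, policies: PolicySet) -> bool:
        """Accept a pushed policy set only if it is newer than the one held."""
        if policies.version <= self.policies.version:
            return False
        self.policies = policies
        return True

    def outbound(self, cloud_files):
        """Select (file, target) pairs in the cloud; only these files would leave it."""
        moves = []
        for path in sorted(cloud_files):
            if path in self.sent:
                continue  # already sent back on an earlier run
            # dict.fromkeys de-duplicates targets while keeping rule order
            targets = list(dict.fromkeys(
                t for prefix, ts in self.policies.rules
                if path.startswith(prefix) for t in ts))
            if targets:
                self.sent.add(path)
                moves.extend((path, t) for t in targets)
        return moves


@dataclass
class OnPremController:
    """Stand-in for the on-prem Isilon cluster that owns and pushes the policies."""
    daemon: CloudDaemon
    version: int = 0

    def push(self, rules) -> bool:
        """Bump the policy version and send the new rules to the cloud daemon."""
        self.version += 1
        return self.daemon.receive(PolicySet(self.version, tuple(rules)))


if __name__ == "__main__":
    daemon = CloudDaemon()
    controller = OnPremController(daemon)
    controller.push([("/results/", ("isilon-prod",))])
    print(daemon.outbound(["/results/r1.csv", "/scratch/tmp.bin"]))
    # [('/results/r1.csv', 'isilon-prod')]
    print(daemon.outbound(["/results/r1.csv", "/results/r2.csv"]))
    # [('/results/r2.csv', 'isilon-prod')]
```

Versioning the policy set lets the daemon ignore stale or replayed pushes, and tracking what was already sent keeps each run limited to net-new files.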