
Non-Stop Hadoop for Hortonworks


DESCRIPTION

In this webinar, we'll:
• Examine the key drivers and use cases for high availability, performance and scalability for Apache Hadoop.
• Walk through an overview of a reference architecture for a Non-Stop Hadoop implementation.
• Show how you can get started with Non-Stop Hadoop with the Hortonworks Data Platform.


© Hortonworks Inc. 2013

Modern Data Architecture …for Non-Stop Hadoop


Your Presenters


• Jagane Sundar (@jagane)
  – CTO of Big Data at WANdisco
  – Co-founder of AltoStor and former Director of Engineering in Yahoo’s Hadoop group
  – Managed Hadoop 0.20.204 release for Yahoo

• Rohit Bakhshi (@Rohit2b)
  – Product Management at Hortonworks
  – Focus on HDP Platform Services, Hadoop Core and Windows enablement
  – Enjoys live jazz and espresso


Today’s Topics

• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop in the MDA
• WANdisco’s role in the MDA
• Q&A


Existing Data Architecture


[Diagram: existing data architecture. APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications. DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP). SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs). OPERATIONAL TOOLS: Manage & Monitor. DEV & DATA TOOLS: Build & Test.]


Existing Data Architecture


[Diagram: existing data architecture. APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications. DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP). SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs).]

Data growth (Source: IDC)

• 2.8 ZB in 2012
• 85% from New Data Types
• 15x Machine Data by 2020
• 40 ZB by 2020
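The two IDC volume figures on this slide imply a compound annual growth rate, and a short calculation makes it explicit. The following is a minimal sketch, assuming the 2012 and 2020 figures are comparable year-end totals; the function name `implied_annual_growth` is illustrative and does not come from the slides.

```python
def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Return the compound annual growth rate implied by two data-volume figures."""
    return (end_zb / start_zb) ** (1.0 / years) - 1.0


# IDC figures quoted on the slide: 2.8 ZB in 2012 and 40 ZB by 2020.
rate = implied_annual_growth(2.8, 40.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")
```

Under those assumptions the implied rate is a little over 39% per year, which means data volume comes close to doubling every two years.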


Modern Data Architecture Enabled


[Diagram: modern data architecture. APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications. DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP). SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured). OPERATIONAL TOOLS: Manage & Monitor. DEV & DATA TOOLS: Build & Test.]


Drivers of Hadoop Adoption


• Architectural: A Modern Data Architecture
  – Complement your existing data systems: the right workload in the right place

• New Business Applications: Types of Big Data
  – CRM, ERP
  – Server log
  – Clickstream
  – Sentiment/Social
  – Machine/Sensor
  – Geo-locations


Opportunity in types of data

1. Sentiment: Understand how your customers feel about your brand and products, right now

2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website

3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines

4. Geographic: Analyze location-based data to manage operations where they occur

5. Server Logs: Research logs to diagnose process failures and prevent security breaches

6. Unstructured (text, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents


Requirements for Hadoop Adoption

3 Requirements for Hadoop’s Role in the Modern Data Architecture

• Integrated: Interoperable with existing data center investments
• Key Services: Platform, operational and data services essential for the enterprise
• Skills: Leverage your existing skills: development, operations, analytics


Requirements for Enterprise Hadoop

1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills: development, analytics, operations
3. Integrated: Engineered with existing data center investments

[Diagram: Hortonworks Data Platform (HDP). Layers: OPERATIONAL SERVICES, DATA SERVICES, CORE, PLATFORM SERVICES; deployable on OS/VM, Cloud, Appliance. Components shown: HDFS, YARN, MapReduce, Tez, Hive & HCatalog, Pig, HBase, Sqoop, Flume, NFS, WebHDFS (Load & Extract), Knox*, Oozie, Ambari, Falcon*. Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots.]


Requirements for Enterprise Hadoop


1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills: development, analytics, operations
3. Integration: Engineered with existing data center investments

[Diagram: DEVELOP (Collect, Process, Build); ANALYZE (Explore, Query, Deliver); OPERATE (Provision, Manage, Monitor).]


Familiar and Existing Tools


1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills: development, analytics, operations
3. Integration: Interoperable with existing data center investments

[Diagram: DEVELOP (Collect, Process, Build); ANALYZE (Explore, Query, Deliver); OPERATE (Provision, Manage, Monitor). Tool shown: BusinessObjects BI.]


Requirements for Enterprise Hadoop

3. Integration: Engineered with existing data center investments

Integrated with:
• Applications: Business Intelligence, Developer IDEs, Data Integration
• Systems: Data Systems & Storage, Systems Management
• Platforms: Operating Systems, Virtualization, Cloud, Appliances

[Diagram: modern data architecture. APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications. DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP). SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured). OPERATIONAL TOOLS: Manage & Monitor. DEV & DATA TOOLS: Build & Test.]


WANdisco in the Modern Data Architecture


[Diagram: modern data architecture with an INFRASTRUCTURE layer. APPLICATIONS: BusinessObjects BI. DATA SYSTEM: RDBMS, EDW, MPP, HANA. SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured). OPERATIONAL TOOLS. DEV & DATA TOOLS. INFRASTRUCTURE.]


Non-Stop Hadoop for Hortonworks


• Non-stop technology delivers continuous uptime with no data loss

• One Hadoop cluster across data centers at any distance

• Eliminates the bottleneck of a single active NameNode

• Automatic backup, failover and recovery within and across data centers

• LAN-speed read and write


Today’s Topics

• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• WANdisco’s role in the MDA
• Q&A


© WANdisco 2013

WANdisco Background

• WANdisco: Wide Area Network Distributed Computing
  – Enterprise-ready, high-availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability

• Leader in tools for software engineers: Subversion
  – Apache Software Foundation sponsor

• Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND)

• US patent for active-active replication technology granted, November 2012

• Global locations
  – San Ramon (CA)
  – Chengdu (China)
  – Tokyo (Japan)
  – Boston (MA)
  – Sheffield (UK)
  – Belfast (UK)


Customers


WANdisco

• Overarching theme: we’re enabling global protection against:
  – Data loss
  – Downtime
  – Loss of Intellectual Property
  – Loss of revenue/time to market
  – Falling behind the competition


Non-Stop Hadoop: Extending HDFS across Data Centers

• Single HDFS that spans multiple Data Centers across the world

• Provides 100% Uptime for Hadoop

• Built as an extension on top of Apache Hadoop HDFS

• 100% HDFS / 100% compatibility with Hadoop applications
  – Applications run unmodified

• Applications can run in any Data Center

• Not Simple Mirroring or a Copy


Distributed Coordination Engine: WANdisco DConE

• WANdisco’s patented WAN-capable Paxos implementation
  – Mathematically proven
  – Provides distributed coordination of file system metadata
    • Active-Active (all locations)
    • Create, Modify, Delete
    • Share nothing (no leader)

• No restrictions on distance between data centers
  – US patent granted for time-independent implementation of Paxos

• Not based on SAN block device synchronization such as EMC SRDF
  – SAN block replication has distance limits resulting from the inability of file systems such as NTFS and ext4 to tolerate long RTTs to block storage
  – Possible distribution of corrupted blocks
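The coordination idea on this slide (every location accepts create, modify and delete operations, and all locations agree on a single order for them) can be illustrated with a small replicated-state-machine sketch. This is not DConE and not Paxos: the agreement step is replaced by a plain in-process list, and the names `Op`, `AgreedLog` and `MetadataReplica` are hypothetical. It only shows why replicas that apply the same agreed sequence end up with the same namespace.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Op:
    """A metadata operation proposed by any location (hypothetical type)."""
    kind: str          # "create", "modify" or "delete"
    path: str
    data: str = ""


@dataclass
class AgreedLog:
    """Stand-in for the agreement layer: one globally ordered list of operations.

    A real system would reach this order through a consensus protocol;
    here it is simply a local list, which is an assumption of the sketch.
    """
    entries: list = field(default_factory=list)

    def propose(self, op: Op) -> int:
        self.entries.append(op)
        return len(self.entries) - 1   # position in the agreed order


@dataclass
class MetadataReplica:
    """One location's copy of the file system namespace."""
    name: str
    namespace: dict = field(default_factory=dict)
    applied: int = 0                   # number of log entries applied so far

    def catch_up(self, log: AgreedLog) -> None:
        # Apply every not-yet-applied operation strictly in agreed order.
        for op in log.entries[self.applied:]:
            if op.kind in ("create", "modify"):
                self.namespace[op.path] = op.data
            elif op.kind == "delete":
                self.namespace.pop(op.path, None)
            self.applied += 1


log = AgreedLog()
dc1, dc2 = MetadataReplica("dc1"), MetadataReplica("dc2")

log.propose(Op("create", "/logs/a", "v1"))
dc1.catch_up(log)                      # dc1 applies early, dc2 lags behind
log.propose(Op("modify", "/logs/a", "v2"))
log.propose(Op("delete", "/logs/a"))
log.propose(Op("create", "/logs/b", "x"))
dc1.catch_up(log)
dc2.catch_up(log)                      # dc2 replays the full sequence later

print(dc1.namespace == dc2.namespace)
```

Because each replica applies the same operations in the same order, a location that falls behind converges to the same state once it replays the log, regardless of when it catches up.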


Apache Hadoop


Non-Stop Hadoop over WAN Continuous availability


Non-Stop Hadoop over WAN Unlimited performance and scalability


Non-Stop Hadoop over WAN Automated failover and recovery


Non-Stop Hadoop

• Architecture
  – Non-Intrusive: Not Simple Mirroring or a Copy
  – Does not modify Apache Hadoop
  – Runs on HDP 2 and later

• Provides 100% Uptime for Hadoop
  – Provides Continuous Availability of HDFS Data
  – Guarantees 100% Uptime of HDFS During all 4 Categories of Failures

• Enables HDFS to be Deployed Globally Across the WAN
  – Extends HDFS Across Multiple Data Centers
  – Unifies the HDFS Namespace
  – Exceeds Business Continuity Requirements for SLAs and Compliance

• Load Balances NameNode Traffic for Increased Scalability


DEMO


Use Cases for Non-Stop Hadoop with Hortonworks

• Disaster Recovery
  – Data is as current as possible (no periodic synchronizations)
  – Virtually zero downtime to recover from regional data center failure
  – Regulatory compliance

• Load Balancing

• Multi Data Center Ingest
  – Information doesn’t need to be sent to one DC and then copied back to the other using DistCp
  – Parallel ingest methods don’t require redirected data streams

• Global MapReduce
  – Global Click Stream Analysis
  – Global Log Analysis
  – Etc.

• Maximize Resource Utilization
  – All data centers can be used to run different jobs concurrently


Non-Stop Hadoop for Hortonworks: Key Takeaways

• Non-Stop Hadoop makes Hadoop Enterprise/Production Ready

• Load balancing eliminates the bottleneck of a single NameNode

• Active-Active replication solves the Hadoop high availability issue

• No job restarts or lost time for NameNode failures (Continuous Availability)

• Single HDFS across multiple data centers
  – No out-of-sync issues
  – No Load Balancer maintenance problems

• Data Centers can be located at any distance from each other

• If any Data Center fails, applications can be run on any other replicated Data Center

• If a Data Center is completely lost, any other replica of that Data Center can be used to restore it


Next Steps:

• More about Non-Stop Hadoop for Hortonworks: http://www.wandisco.com/hadoop/non-stop-hadoop-hortonworks

• Get started on Hadoop with Hortonworks Sandbox: http://hortonworks.com/hadoop-tutorial/

• Try Non-Stop Hadoop for Hortonworks. Contact us: [email protected]