
Big Data Cloud Meetup - Jan 24 2013 - Zettaset


This document contains confidential, proprietary and trade secret information and is subject to certain legal protection. You may not review, copy, or distribute this information unless you are a designated recipient, and have prior written authorization from Zettaset, Inc.

Creating a Secure Hadoop Initiative

Securing the Big Data Ecosystem


About Me

• CTO Zettaset, Inc. – Big Data Hadoop Company

– Founded 2007

• Distributed Computing Guy

– Have been since college

• Security Guy

– Founder SPI Dynamics (sold to HP, 2007)

– Internet Security Systems, Prof. Services

– Security First Network Bank, Security Guru


Zettaset Enables Enterprise-Ready Hadoop

• Zettaset Orchestrator™ automates Hadoop installation and cluster management with an enterprise-ready solution for Big Data deployments

– Enterprise-class: hardened for security, high availability, and performance

– Dramatically lowers operational expenses: reduces IT resource requirements

– Simple to deploy: accelerates time to value from weeks to hours

– Eliminates unnecessary dependencies on professional services

– Works with any Apache Hadoop distribution

3 © 2012 Zettaset, Inc. | Proprietary and Confidential


Zettaset Orchestrator: Making Hadoop Clusters Enterprise-Ready


What is Big Data?

• Great Question

• It’s not a number; people define it differently.

• The majority define it as a scalability issue:

– “The inability to continue storing and processing data the way that you’ve been storing and processing data.”


Exponential Data Growth = Big Data


Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study

Estimated Global Data Volume:

• 2011: 1.8 Zettabytes

• 2015: 7.9 Zettabytes

The world's information doubles every two years

Over the next 10 years:

• The number of servers worldwide will grow by 10x

• The amount of information managed by enterprise data centers will grow by 50x

• The number of “files” enterprise data centers handle will grow by 75x
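The two volume estimates above can be checked against the "doubles every two years" rule of thumb. This minimal Python sketch uses only the slide's figures and assumes simple exponential growth between 2011 and 2015. It computes the growth factor, the doubling time those two points imply, and what a strict two-year doubling would predict for 2015.

```python
import math

# Estimated global data volume in zettabytes, as quoted on the slide
# (EMC, based on the 2011 IDC Digital Universe Study).
volume_2011_zb = 1.8
volume_2015_zb = 7.9
years = 2015 - 2011

# Assume exponential growth: V(t) = V0 * 2 ** (t / T_double)
# Solving for the doubling time: T_double = t / log2(V / V0)
growth_factor = volume_2015_zb / volume_2011_zb
doubling_time_years = years / math.log2(growth_factor)

# Volume a strict "doubles every two years" rule would predict for 2015.
predicted_2015_zb = volume_2011_zb * 2 ** (years / 2)

print(f"growth factor 2011->2015: {growth_factor:.2f}x")
print(f"implied doubling time: {doubling_time_years:.2f} years")
print(f"two-year doubling predicts {predicted_2015_zb:.1f} ZB for 2015")
```

With these inputs the implied doubling time comes out slightly under two years. A strict two-year doubling would predict about 7.2 ZB for 2015 rather than 7.9 ZB, so the two statements on the slide are roughly, not exactly, consistent.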


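Hadoop Distribution Landscape

| Distribution & Core Components | CDH3u4 | CDH4u0 | HDP v1.0 | Apache Bigtop v0.3 | MapR M3 | MapR M5 | BigInsights 1.4 BE | BigInsights 1.4 EE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Apache Hadoop | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| HDFS | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Fuse-DFS | ✓ | ✓ | ✓ | ✓ | - | - | - | - |
| MapReduce | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| MapReduce 2 | - | ✓ | - | - | - | - | - | - |
| Hadoop Common | ✓ | ✓ | ✓ | ✓ | - | - | ✓ | ✓ |
| Apache Hive | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Apache Pig | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Apache HBase | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Apache Zookeeper | ✓ | ✓ | ✓ | ✓ | - | - | ✓ | ✓ |
| Apache Ambari | - | - | ✓ | - | - | - | - | - |
| Apache Templeton | - | - | ✓ | - | - | - | - | - |
| Apache Flume | ✓ | ✓ | - | ✓ | ✓ | ✓ | ✓ | ✓ |
| Apache Sqoop | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | - | - |
| Apache Mahout | ✓ | ✓ | - | ✓ | ✓ | ✓ | - | - |
| Apache Whirr | ✓ | ✓ | - | ✓ | ✓ | ✓ | - | - |
| Apache Oozie | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Apache Lucene | - | - | - | - | - | - | ✓ | ✓ |
| Apache Derby | - | - | - | - | - | - | ✓ | ✓ |
| Apache Avro | - | - | - | - | - | - | ✓ | ✓ |
| Hue | ✓ | ✓ | - | - | - | - | - | - |
| BigInsights Apps | - | - | - | - | - | - | - | ✓ |
| Hadoop Management | | | | | | | | |
| Nagios | - | - | ✓ | - | - | - | - | - |
| Ganglia | - | - | ✓ | - | - | - | - | - |
| Zettaset Orchestrator™ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Cloudera Manager | ✓ | ✓ | - | - | - | - | - | - |
| MapR Manager | - | - | - | - | ✓ | ✓ | - | - |
| BigInsights web console | - | - | - | - | - | - | - | ✓ |
| BigInsights simple console | - | - | - | - | - | - | ✓ | - |

Legend: Proprietary / Open Source Apache Hadoop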


What Is the Current State of Security?

• Another Great Question

• Minimal work has been done in this field

• Currently not a huge community focus

• Everyone feels like it’s been addressed by adding Kerberos to the system

Don’t tell InfoSec people that Kerberos has fixed everything!


Why Not Tell Them That?

• You will give them an aneurysm.

• Kerberos is “brushed on” security, NOT “baked in” security.

• Kerberos does NOT address compliance issues around data (HIPAA, GLBA, PCI, etc.).

Nothing around encryption, nothing around best practices.


Hadoop: What’s Missing?

• All Hadoop distros are constrained by the limitations of the Apache open source components

• Not written to support hardened security, compliance, encryption, policy enablement, and risk management

• Not written with high availability, service management, and monitoring in mind


Current State of Hadoop Security

• Existing security for Apache-based Hadoop distributions does not meet enterprise requirements for supporting regulatory compliance mandates such as HIPAA and SOX

• Security breaches can have serious consequences, e.g., releasing sensitive information, damaging the brand, compromising competitive advantage, or sparking litigation

• Hadoop’s security mechanism provides mutual authentication of users and services via SASL and Kerberos, but this has limitations: on its own it does not address encryption or compliance requirements


Enterprise-Class Hadoop Security

Addresses the security gaps and vulnerabilities that exist in all Apache-based Hadoop distributions

• Hardened to address access control, policy, compliance, and risk management

• Support for Lightweight Directory Access Protocol (LDAP) and Active Directory (AD), enabling Hadoop clusters to integrate seamlessly with existing security policies within the enterprise environment

• Centralized configuration management, logging, and auditing, which maintains control of ingress and egress points in the cluster and enables Hadoop clusters to meet compliance requirements for reporting and forensics

• Role-based access control (RBAC), which significantly improves the user authentication process and enables Kerberos to be run against all components of a big data ecosystem, not just Hadoop
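The RBAC and auditing bullets above can be illustrated with a minimal Python sketch of the general technique. It is not Zettaset's implementation. The users, roles, resource names (such as `hive:sales`), and the in-memory audit log are all hypothetical stand-ins. A real cluster would take group membership from LDAP/AD and write audit records to durable, centralized storage.

```python
from datetime import datetime, timezone

# Hypothetical role definitions: role -> set of (resource, action) permissions.
ROLE_PERMISSIONS = {
    "analyst": {("hive:sales", "select")},
    "data_engineer": {("hive:sales", "select"), ("hdfs:/data/raw", "write")},
    "admin": {("hdfs:/data/raw", "write"), ("hdfs:/data/raw", "delete")},
}

# Hypothetical user -> roles assignments (e.g. mirrored from LDAP/AD groups).
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"data_engineer"},
}

# Stand-in for a centralized audit trail used for reporting and forensics.
AUDIT_LOG = []


def is_allowed(user, resource, action):
    """Grant access if any of the user's roles holds the permission.

    Every decision, allowed or denied, is appended to the audit log.
    Unknown users have no roles and are therefore denied.
    """
    allowed = any(
        (resource, action) in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "resource": resource,
        "action": action,
        "allowed": allowed,
    })
    return allowed
```

For example, `is_allowed("alice", "hive:sales", "select")` returns `True`, while `is_allowed("alice", "hdfs:/data/raw", "write")` returns `False`. Both decisions end up in `AUDIT_LOG`. Permissions attach to roles rather than to individual users, so revoking access means editing one role or one group membership instead of many per-user rules.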


Defining Big Data Use Case

• What is your use case?

• What are you trying to accomplish?

• What data are you going to be storing?

• What are you going to do after you store it?

This will define your Security Threat Model and how you protect your data.


[Figure: Big Data Production System. Inputs: log files, alerts, transactions, etc. Data types: structured data, semi-structured data, unstructured data.]


[Figure: Big Data Landscape (Version 2.0), © Matt Turck (@mattturck) and Shivon Zilis (@shivonz), Bloomberg Ventures. Category labels: Infrastructure; Analytics; Applications; Cross Infrastructure / Analytics; Data Sources; Open Source Projects; Hadoop Related; NoSQL Databases; NewSQL Databases; MPP Databases; Cloud Deployment; Cluster Services; Management / Monitoring; Storage; Security; Collection / Transport; Framework; Query / Data Flow; Data Access; Coordination / Workflow; Statistical Tools; Statistical Computing; Machine Learning; Real-Time; Big Data Search; Analytics Solutions; Analytics Services; Data Visualization; SMB Analytics; Crowdsourced Analytics; IT Analytics; Location / People / Events; Social Media; Sentiment Analysis; Industry Applications; Ad Optimization; Marketing; Publisher Tools; Application Service Providers; Personal Data; Data Marketplaces; Crowdsourcing.]


What Is a Threat Model?

• Threat modeling is based on the notion that any system or organization has assets of value worth protecting, and that these assets have certain vulnerabilities.

• Internal or external threats exploit these vulnerabilities to damage the assets, and appropriate security countermeasures exist to mitigate the threats.

• A threat model helps assess the probability, potential harm, and priority of attacks, and thus helps minimize or eradicate the threats.
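One common way to turn "probability" and "potential harm" into a priority is a qualitative risk score, risk = likelihood × impact, with each factor on a 1 to 5 scale. The Python sketch below ranks threats that way. The scoring scheme is a widespread convention rather than something the slide prescribes, and the three Hadoop threats and their scores are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    asset: str
    likelihood: int  # analyst's estimate: 1 (rare) .. 5 (almost certain)
    impact: int      # analyst's estimate: 1 (negligible) .. 5 (severe)

    @property
    def risk(self) -> int:
        # Qualitative risk score in the range 1..25.
        return self.likelihood * self.impact


def prioritize(threats: list[Threat]) -> list[Threat]:
    """Order threats from highest to lowest risk score."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)


# Hypothetical threats and scores, for illustration only.
THREATS = [
    Threat("Stolen DataNode disk exposes HDFS blocks", "Data at rest", 2, 5),
    Threat("Insider pulls PII through Hive", "Customer records", 3, 5),
    Threat("NameNode outage stops all jobs", "Cluster availability", 3, 3),
]

if __name__ == "__main__":
    for t in prioritize(THREATS):
        print(f"{t.risk:>2}  {t.name} [{t.asset}]")
```

The scores are only as good as the estimates behind them. The value of the exercise lies in making those estimates explicit so they can be challenged and revised.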


Approaches to Threat Modeling

There are three general approaches to threat modeling:

• Attacker-centric

– Attacker-centric threat modeling starts with an attacker and evaluates their goals and how they might achieve them. The attacker's motivations are often considered, for example, "The NSA wants to read this email," or "Jon wants to copy this DVD and share it with his friends." This approach usually starts from either entry points or assets.

• Software-centric

– Software-centric threat modeling (also called 'system-centric,' 'design-centric,' or 'architecture-centric') starts from the design of the system, and attempts to step through a model of the system, looking for types of attacks against each element of the model. This approach is used in threat modeling in Microsoft's Security Development Lifecycle.

• Asset-centric

– Asset-centric threat modeling involves starting from assets entrusted to a system, such as a collection of sensitive personal information.
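The software-centric approach is often carried out with STRIDE-per-element, a technique associated with Microsoft's SDL. The system is drawn as a data-flow diagram, and each element is checked against the STRIDE categories (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege) that typically apply to its type. The Python sketch below enumerates candidate threats that way. The four Hadoop elements are hypothetical. The mapping follows the commonly published STRIDE-per-element chart but omits the conditional repudiation entry for data stores that hold logs.

```python
from enum import Enum


class Stride(Enum):
    SPOOFING = "S"
    TAMPERING = "T"
    REPUDIATION = "R"
    INFORMATION_DISCLOSURE = "I"
    DENIAL_OF_SERVICE = "D"
    ELEVATION_OF_PRIVILEGE = "E"


# STRIDE categories usually considered for each data-flow-diagram element
# type (data stores holding audit logs may also warrant REPUDIATION).
APPLICABLE = {
    "external_entity": {Stride.SPOOFING, Stride.REPUDIATION},
    "process": set(Stride),
    "data_store": {Stride.TAMPERING, Stride.INFORMATION_DISCLOSURE,
                   Stride.DENIAL_OF_SERVICE},
    "data_flow": {Stride.TAMPERING, Stride.INFORMATION_DISCLOSURE,
                  Stride.DENIAL_OF_SERVICE},
}

# Hypothetical elements of a small Hadoop data-flow diagram.
ELEMENTS = [
    ("Client application", "external_entity"),
    ("NameNode", "process"),
    ("HDFS blocks on DataNode disks", "data_store"),
    ("Client -> DataNode block transfer", "data_flow"),
]


def enumerate_threats(elements):
    """Yield one (element, category) pair per threat to review."""
    order = list(Stride)
    for name, kind in elements:
        # Sort by enum declaration order so the output is deterministic.
        for category in sorted(APPLICABLE[kind], key=order.index):
            yield name, category


if __name__ == "__main__":
    for element, category in enumerate_threats(ELEMENTS):
        label = category.name.replace("_", " ").title()
        print(f"{category.value}  {label:<24} {element}")
```

Each printed line is a question for the design review, for example "could a client spoof its identity to the NameNode?", not a confirmed vulnerability.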


Bottom Line

• Identify any threats to the confidentiality, availability and integrity of the data and the application based on the data access control matrix that your application should be enforcing

• Assign risk values and determine the risk responses

• Determine the countermeasures to implement based on your chosen risk responses

• Continually update the threat model based on the emerging security landscape.
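The first three steps above can be sketched in Python under simple assumptions. The access control matrix is a hypothetical table of which actions each role may perform on each dataset. Every cell the matrix denies is treated as a candidate threat, tagged with the property (confidentiality, integrity, availability) it would compromise if enforcement failed. A risk score is then mapped to a response using illustrative thresholds. Transferring risk, for example through insurance or a contract, is another standard response that this sketch does not model.

```python
from itertools import product

# Hypothetical access control matrix: (role, dataset) -> permitted actions.
# Any combination not listed here is supposed to be denied.
ACCESS_MATRIX = {
    ("analyst", "clickstream"): {"read"},
    ("etl", "clickstream"): {"read", "write"},
    ("etl", "patient_records"): {"write"},
    ("clinician", "patient_records"): {"read"},
}

# Property put at risk if a denied action were performed anyway.
CIA_IMPACT = {
    "read": {"confidentiality"},
    "write": {"integrity"},
    "delete": {"integrity", "availability"},
}


def identify_threats(roles, datasets, actions):
    """List each denied (role, dataset, action) with the CIA properties at stake."""
    threats = []
    for role, dataset, action in product(roles, datasets, actions):
        if action not in ACCESS_MATRIX.get((role, dataset), set()):
            threats.append((role, dataset, action, CIA_IMPACT[action]))
    return threats


def risk_response(score):
    """Map a likelihood x impact score (1..25) to a response.

    The thresholds are illustrative assumptions, not a standard.
    """
    if score >= 20:
        return "avoid"     # e.g. keep that data out of the cluster entirely
    if score >= 8:
        return "mitigate"  # e.g. encrypt, tighten roles, add monitoring
    return "accept"        # document it and revisit when the model is updated


threats = identify_threats(
    roles=["analyst", "etl", "clinician"],
    datasets=["clickstream", "patient_records"],
    actions=["read", "write", "delete"],
)
print(f"{len(threats)} denied combinations the application must enforce")
```

Deriving threats from the matrix keeps the threat model tied to what the application is actually supposed to enforce. When roles or datasets change, rerunning the enumeration is one concrete way to keep the model current.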


Summary

• All existing Apache-based Hadoop distributions have functional limitations which constrain enterprise adoption

• Zettaset Orchestrator is addressing the enterprise-level gaps in security, high availability, performance, and manageability that exist in all Apache-based Hadoop distributions

• Orchestrator is a universal management and control software layer that can sit on top of any Hadoop distribution (distro-agnostic)

• Orchestrator fills the Service Management gaps that exist in all Hadoop distributions and cluster deployments, and makes Hadoop ready for broader enterprise adoption


Thank You!