3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Who Am I?
Vinay Shukla
• Product Management: Spark for 2.5+ years, Hadoop for 3+ years
• Recovering programmer
• Blog at www.vinayshukla.com
• Twitter: @neomythos
• Addicted to yoga, hiking, and coffee
• Minor contributor to Apache Zeppelin
Security: Rings of Defense
• Perimeter Level Security: network security (e.g. firewalls)
• Data Protection: wire encryption, HDFS TDE/DARE, others
• Authentication: Kerberos, Knox (other gateways)
• OS Security
• Authorization: Apache Ranger/Sentry, HDFS permissions, HDFS ACLs, HBase ACLs
Key to Spark Security
Spark processes data in memory; it does not store data.
Context: Spark Deployment Modes
• Spark on YARN
  – yarn-cluster: the Spark driver (SparkContext) runs in the YARN Application Master (AM)
  – yarn-client: the Spark driver (SparkContext) runs locally on the client
• Spark Shell and Spark Thrift Server run in yarn-client mode only
[Figure: YARN-Client vs. YARN-Cluster, each showing the Client, Spark Driver, App Master, and Executor]
Spark on YARN
[Figure: numbered flow (steps 1 to 4) in which John Doe runs Spark Submit against a Hadoop cluster; components shown are the YARN RM, Node Manager, Spark AM, Executor, and HDFS]
Spark – Security – Four Pillars
• Authentication
• Authorization
• Audit
• Encryption
Spark leverages Kerberos on YARN. Ensure the network is secure.
Authentication: Kerberos Primer
[Figure: Client, KDC, NameNode (NN), DataNode (DN), and the client's Kerberos ticket cache]
1. kinit: log in and get a Ticket Granting Ticket (TGT)
2. Client stores the TGT in the ticket cache
3. Get a NameNode Service Ticket (NN-ST)
4. Client stores the NN-ST in the ticket cache
5. Read/write file given the NN-ST and file name; returns block locations, block IDs, and Block Access Tokens if access is permitted
6. Read/write block given the Block Access Token and block ID
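The six primer steps can be traced with a toy, in-memory model. This is only a sketch: the `Kdc`, `NameNode`, and `DataNode` classes, the string-shaped tickets, and the file and block names are all hypothetical stand-ins invented for illustration. No real Kerberos or HDFS API is involved, and block locations are left out for brevity.

```python
class Kdc:
    """Hypothetical stand-in for the Key Distribution Center."""

    def login(self, principal):
        # Step 1: kinit returns a Ticket Granting Ticket (TGT).
        return f"TGT:{principal}"

    def service_ticket(self, tgt, service):
        # Step 3: a service ticket is only issued against a TGT.
        if not tgt.startswith("TGT:"):
            raise PermissionError("a TGT is required")
        principal = tgt.split(":", 1)[1]
        return f"ST:{service}:{principal}"


class NameNode:
    """Hypothetical NameNode mapping file names to block IDs."""

    def __init__(self, files):
        self.files = files

    def open(self, nn_st, file_name):
        # Step 5: NN-ST plus file name yields block IDs and access tokens.
        if not nn_st.startswith("ST:NN:"):
            raise PermissionError("a NameNode service ticket is required")
        return [(block, f"BAT:{block}") for block in self.files[file_name]]


class DataNode:
    """Hypothetical DataNode holding block contents."""

    def __init__(self, blocks):
        self.blocks = blocks

    def read(self, block_id, token):
        # Step 6: a block is served only with its Block Access Token.
        if token != f"BAT:{block_id}":
            raise PermissionError("invalid Block Access Token")
        return self.blocks[block_id]


def read_file(principal, file_name, kdc, nn, dn):
    cache = {}  # the client's ticket cache
    cache["TGT"] = kdc.login(principal)                      # steps 1 and 2
    cache["NN-ST"] = kdc.service_ticket(cache["TGT"], "NN")  # steps 3 and 4
    located = nn.open(cache["NN-ST"], file_name)             # step 5
    return b"".join(dn.read(block, token) for block, token in located)  # step 6


kdc = Kdc()
nn = NameNode({"/data/a.txt": ["blk_1", "blk_2"]})
dn = DataNode({"blk_1": b"hello ", "blk_2": b"world"})
result = read_file("johndoe", "/data/a.txt", kdc, nn, dn)
print(result)
```

The point of the model is the dependency chain: the NameNode call fails without an NN-ST, and the DataNode call fails without the matching Block Access Token.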
Kerberos authentication within Spark
[Figure: numbered flow (steps 1 to 7) between John Doe, the KDC, and the Spark AM and NameNode (NN) in the Hadoop cluster]
• John Doe runs kinit
• Get a service ticket (ST) for Spark
• Use the Spark ST to submit the Spark job
• Spark gets a NameNode (NN) service ticket
• YARN launches Spark executors using John Doe’s identity
• Executor reads from HDFS using John Doe’s delegation token
Spark + X (Source of Data)
[Figure: numbered flow (steps 1 to 7) between John Doe, the KDC, and the Spark AM and data source X in the Hadoop cluster]
• John Doe runs kinit
• Get a Service Ticket (ST) for Spark
• Use the Spark ST to submit the Spark job
• Spark gets a service ticket for X
• YARN launches Spark executors using John Doe’s identity
• Executor reads from X using John Doe’s delegation token
Spark – Kerberos - Example
kinit -kt /etc/security/keytabs/johndoe.keytab [email protected]
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
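When the two commands above need to be scripted, they can be assembled as argument lists. The following is a minimal sketch, not part of the original deck: the helper names are made up, and the principal `johndoe@EXAMPLE.COM` and the jar path `lib/spark-examples.jar` are hypothetical placeholders (the slide's principal is redacted and its jar path is a shell glob that would have to be resolved first).

```python
import shlex


def kinit_command(keytab, principal):
    """Build the kinit call that logs in from a keytab."""
    return ["kinit", "-kt", keytab, principal]


def spark_submit_command(jar, main_class, app_args, num_executors=3,
                         driver_memory="512m", executor_memory="512m",
                         executor_cores=1, master="yarn-cluster"):
    """Build a spark-submit argument list with the flags shown on the slide."""
    return [
        "./bin/spark-submit",
        "--class", main_class,
        "--master", master,
        "--num-executors", str(num_executors),
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        jar,
        *[str(arg) for arg in app_args],
    ]


# Hypothetical principal and jar path, chosen only for illustration.
login = kinit_command("/etc/security/keytabs/johndoe.keytab",
                      "johndoe@EXAMPLE.COM")
submit = spark_submit_command("lib/spark-examples.jar",
                              "org.apache.spark.examples.SparkPi", [10])

print(shlex.join(login))
print(shlex.join(submit))
```

On a host with a Kerberos client and a Spark installation, each list could be passed to `subprocess.run(..., check=True)`, running `login` first so that the ticket cache is populated before the job is submitted.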
Spark – Authorization
[Figure: John Doe, the KDC, Ranger, a YARN cluster with nodes A, B, and C, and HDFS]
• Client gets a service ticket for Spark
• Use the Spark ST to submit the Spark job
• Get a NameNode (NN) service ticket
• Executors read from HDFS
• Ranger decides: Can John launch this job? Can John read this file?
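Encryption: Spark – Communication Channels
[Figure: Spark Submit, RM, NM, Shuffle Service, AM/Driver, executors Ex 1 to Ex N, and the data source, connected by the channels listed below]
• Control/RPC
• Shuffle data
• Shuffle block transfer
• Read/write data (data source)
• FS: broadcast, file download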
Spark Communication Encryption Settings
• Control/RPC: spark.authenticate = true; leverage YARN to distribute keys
• Shuffle block transfer: spark.authenticate.enableSaslEncryption = true
• Shuffle data: NM > Ex leverages YARN-based SSL
• Read/write data: depends on the data source; for HDFS, RPC encryption (RC4 | 3DES), or SSL for WebHDFS
• FS (broadcast, file download): spark.ssl.enabled = true
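The three Spark properties named on this slide can be kept in one place and rendered either as `spark-defaults.conf` lines or as `--conf` flags for spark-submit. This is a sketch under assumptions: the helper names are made up, only the properties from the slide are included, and SSL in particular needs further keystore-related settings that the slide does not list.

```python
# Only the three Spark properties named on the slide are set here.
ENCRYPTION_SETTINGS = {
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.ssl.enabled": "true",
}


def render_spark_defaults(settings):
    """Render settings as whitespace-separated spark-defaults.conf lines."""
    return "\n".join(f"{key} {value}" for key, value in sorted(settings.items()))


def as_submit_flags(settings):
    """Render the same settings as repeated --conf key=value arguments."""
    flags = []
    for key, value in sorted(settings.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


conf_text = render_spark_defaults(ENCRYPTION_SETTINGS)
submit_flags = as_submit_flags(ENCRYPTION_SETTINGS)
print(conf_text)
```

Keeping the settings in a single mapping makes it harder for the cluster-wide defaults and per-job overrides to drift apart.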
Gotchas with Spark Security
• Client > Spark Thrift Server > Spark Executors: no identity propagation on the 2nd hop
  – Lowers security; forces STS to run as the Hive user to read all data
  – Use SparkSQL via the shell or programmatic API
  – https://issues.apache.org/jira/browse/SPARK-5159
• Spark + HBase with Kerberos
  – Issue fixed in Spark 1.4 (SPARK-6918)
• Spark Streaming + Kafka + Kerberos
  – Issues fixed in HDP 2.4.x
  – No SSL support yet
• Spark jobs > 72 hours
  – Delegation token not renewed before Spark 1.4
• Spark shuffle: only SASL, no SSL support
How can I get Row/Column/Masking with SparkSQL?
Hopefully you went to “Fine Grained Security for Hive & Spark” yesterday
Key Features: Spark Column Security with LLAP
Fine-Grained Column Level Access Control for SparkSQL.
Fully dynamic policies per user. Doesn’t require views.
Use Standard Ranger policies and tools to control access and masking policies.
Flow:
1. SparkSQL gets data locations, known as “splits”, from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
3. Spark gets a modified query plan based on the dynamic security policy.
4. Spark reads data from LLAP. Filtering / masking is guaranteed by the LLAP server.
[Figure: Spark Client, HiveServer2 (authorization), Hive Metastore (data locations, view definitions), LLAP (data read, filter pushdown), and Ranger Server (dynamic policies), with arrows numbered 1 to 4]
Example: Per-User Row Filtering by Region in SparkSQL
Original query:
SELECT * FROM CUSTOMERS WHERE total_spend > 10000
Query rewrites based on dynamic Ranger policies:
• Spark User 1 (West Region), dynamic rewrite:
SELECT * FROM CUSTOMERS WHERE total_spend > 10000 AND region = 'west'
• Spark User 2 (East Region), dynamic rewrite:
SELECT * FROM CUSTOMERS WHERE total_spend > 10000 AND region = 'east'
LLAP data access:
User ID  Region  Total Spend
1        East    5,131
2        East    27,828
3        West    55,493
4        West    7,193
5        East    18,193
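The effect of the per-user rewrite can be checked against the five sample rows on this slide. The sketch below is plain Python, not SparkSQL or Ranger: the `ROW_FILTER_POLICY` mapping and the user names `spark_user_1` and `spark_user_2` are hypothetical stand-ins for a Ranger row-filter policy, and region matching is assumed to be case-insensitive because the slide writes the region as "East" in the table but "east" in the rewritten query.

```python
# Rows copied from the "LLAP Data Access" table on the slide.
CUSTOMERS = [
    {"user_id": 1, "region": "East", "total_spend": 5131},
    {"user_id": 2, "region": "East", "total_spend": 27828},
    {"user_id": 3, "region": "West", "total_spend": 55493},
    {"user_id": 4, "region": "West", "total_spend": 7193},
    {"user_id": 5, "region": "East", "total_spend": 18193},
]

# Hypothetical per-user policy table standing in for Ranger.
ROW_FILTER_POLICY = {
    "spark_user_1": "west",
    "spark_user_2": "east",
}


def run_query(user, min_spend=10000):
    """Apply the user's own predicate plus the policy's region filter."""
    region = ROW_FILTER_POLICY[user]  # a missing policy raises KeyError
    return [
        row for row in CUSTOMERS
        if row["total_spend"] > min_spend
        and row["region"].lower() == region  # assumed case-insensitive match
    ]


east_ids = [row["user_id"] for row in run_query("spark_user_2")]
west_ids = [row["user_id"] for row in run_query("spark_user_1")]
print(east_ids, west_ids)
```

Both users issue the same query, yet the East user sees only customers 2 and 5 and the West user sees only customer 3, because the region predicate is added on their behalf.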
Interacting with Spark
[Figure: Zeppelin, Spark-Shell, Spark Thrift Server, and a REST server, each shown with a driver, in front of the executors (Ex) of Spark on YARN]
Apache Zeppelin: Authentication + SSL
[Figure: numbered flow (steps 1 to 3) involving John Doe, a firewall, an SSL connection, LDAP, and the executors (Ex) of Spark on YARN]
Zeppelin + Livy E2E Security
[Figure: John Doe authenticates to Zeppelin through LDAP; Zeppelin's Ispark Group Interpreter calls the Livy APIs with SPNego (Kerberos); Livy reaches Spark on YARN over Kerberos/RPC]
Job runs as John Doe
Apache Zeppelin: Authorization
• Notebook-level authorization
• Grant permissions (Owner, Reader, Writer) to users/groups on notebooks
• LDAP group integration just got merged (ZEPPELIN-946)
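The Owner/Reader/Writer model can be illustrated with a small permission check. This is a hypothetical sketch rather than Zeppelin's actual code or API: the `NotebookPermissions` class and the user and group names are invented, and it assumes that owners can also write and read, and that writers can also read.

```python
from dataclasses import dataclass, field


@dataclass
class NotebookPermissions:
    """Hypothetical per-notebook permission lists of users and groups."""
    owners: set = field(default_factory=set)
    writers: set = field(default_factory=set)
    readers: set = field(default_factory=set)

    @staticmethod
    def _matches(allowed, user, groups):
        # A principal matches by user name or by any of its groups.
        return user in allowed or bool(allowed & set(groups))

    def can_manage(self, user, groups=()):
        return self._matches(self.owners, user, groups)

    def can_write(self, user, groups=()):
        # Assumption: owners implicitly hold write access.
        return self._matches(self.owners | self.writers, user, groups)

    def can_read(self, user, groups=()):
        # Assumption: owners and writers implicitly hold read access.
        allowed = self.owners | self.writers | self.readers
        return self._matches(allowed, user, groups)


perms = NotebookPermissions(owners={"johndoe"},
                            writers={"analysts"},
                            readers={"auditors"})
print(perms.can_write("alice", ["analysts"]))
print(perms.can_write("bob", ["auditors"]))
```

Passing group names alongside the user is what an LDAP group integration would make possible: a permission granted to `analysts` applies to every user whose directory entry lists that group.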