Apache Spark & Apache Zeppelin: Enterprise Security for production deployments
Vinay Shukla
Director, Product Management
Dec 8, 2016
Twitter: @neomythos
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thank You
whoami
• Product Management: Spark for 2.5+ years, Hadoop for 3+ years
• Blog at www.vinayshukla.com
• Twitter: @neomythos
• Addicted to Yoga, Hiking, & Coffee
• Smallest contributor to Apache Zeppelin
Programmer > Product Management > Programmer > Product Management
What are the enterprise security requirements?
• Spark user should be authenticated
• Integrate with corporate LDAP/AD
• Allow only authorized users access
• Audit all access
• Protect data both in motion & at rest
• Make security easy to manage
• …
Security: Rings of Defense
Perimeter Level Security
• Network Security (i.e. Firewalls)
Data Protection
• Wire encryption
• HDFS TDE/DARE
• Others
Authentication
• Kerberos
• Knox (Other Gateways)
OS Security
Authorization
• Apache Ranger/Sentry
• HDFS Permissions
• HDFS ACLs
• YARN ACL
Interacting with Spark
[Diagram: Zeppelin, Spark-Shell, Spark Thrift Server, and a REST Server, each with its own Driver, working against executors (Ex) in Spark on YARN]
Context: Spark Deployment Modes
• Spark on YARN
– Spark driver (SparkContext) in YARN AM (yarn-cluster)
– Spark driver (SparkContext) in local client process (yarn-client)
• Spark Shell & Spark Thrift Server run in yarn-client only
[Diagram: YARN-Client vs. YARN-Cluster, each showing Client, App Master, Spark Driver, and Executor]
Spark on YARN
[Diagram: John Doe runs Spark Submit against a Hadoop Cluster containing YARN RM, Node Manager, Spark AM, Executor, and HDFS; steps numbered 1 to 4]
Spark – Security – Four Pillars
Authentication Authorization Audit Encryption
Spark leverages Kerberos on YARN
Ensure network is secure
Authenticate users with AD/LDAP
[Diagram: John Doe, AD/LDAP, KDC, and a Hadoop Cluster containing Spark AM and NN; steps numbered 1 to 7, with these labels]
• kinit
• Get service ticket for Spark
• Use Spark ST, submit Spark job
• Spark gets NameNode (NN) service ticket
• YARN launches Spark executors using John Doe’s identity
• Executor reads from HDFS using John Doe’s delegation token
Spark – Kerberos - Example
kinit -kt /etc/security/keytabs/johndoe.keytab johndoe@EXAMPLE.COM
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
Allow only authorized users access to Spark jobs
[Diagram: John Doe, KDC, Ranger/Sentry, a YARN Cluster (A, B, C), and HDFS, with these labels]
• Client gets service ticket for Spark
• Use Spark ST, submit Spark job
• Get NameNode (NN) service ticket
• Executors read from HDFS
• Ranger/Sentry: Can John launch this job? Can John read this file?
Secure data in motion: Wire Encryption with Spark
[Diagram: Spark Submit, RM, NM, Shuffle Service, AM/Driver, executors Ex 1 to Ex N, and a Data Source, connected by these channels]
• Shuffle Data
• Control/RPC
• Shuffle Block Transfer
• Read/Write Data
• FS – Broadcast, File Download
Spark Communication Encryption Settings
• Shuffle Data: NM > Ex leverages YARN-based SSL
• Control/RPC: spark.authenticate = true. Leverage YARN to distribute keys
• Shuffle Block Transfer: spark.authenticate.enableSaslEncryption = true
• Read/Write Data: depends on data source; for HDFS, RPC (RC4 | 3DES) or SSL for WebHDFS
• FS – Broadcast, File Download: spark.ssl.enabled = true
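As a minimal sketch, the three Spark properties named on this slide could be collected in one place and rendered either as spark-defaults.conf lines or as --conf arguments for spark-submit. Only the property keys come from the slide; the helper names are hypothetical, and settings for the data source and YARN shuffle channels are not covered here.

```python
# Property keys are taken from the slide; the helpers are illustrative only.
ENCRYPTION_SETTINGS = {
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.ssl.enabled": "true",
}


def to_defaults_conf(settings):
    """Render settings as whitespace-separated spark-defaults.conf lines."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


def to_submit_args(settings):
    """Render settings as a flat list of --conf key=value arguments."""
    args = []
    for key, value in settings.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(to_defaults_conf(ENCRYPTION_SETTINGS))
    print(" ".join(to_submit_args(ENCRYPTION_SETTINGS)))
```

Keeping the keys in a single mapping makes it harder for the cluster defaults and ad hoc job submissions to drift apart.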
Sharp Edges with Spark Security
• SparkSQL: only coarse-grained access control today
• Client -> Spark Thrift Server -> Spark Executors: no identity propagation on the 2nd hop
– Lowers security, forces STS to run as Hive user to read all data
– Use SparkSQL via shell or programmatic API
– https://issues.apache.org/jira/browse/SPARK-5159
• Spark Streaming + Kafka + Kerberos
– No SSL support yet
• Spark Shuffle: only SASL, no SSL support
• Spark Shuffle: no encryption for spill to disk or intermediate data
SparkSQL: Fine grained security
Key Features: Spark Column Security with LLAP
Fine-Grained Column Level Access Control for SparkSQL.
Fully dynamic policies per user. Doesn’t require views.
Use Standard Ranger policies and tools to control access and masking policies.
Flow:
1. SparkSQL gets data locations known as “splits” from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
3. Spark gets a modified query plan based on the dynamic security policy.
4. Spark reads data from LLAP. Filtering / masking is guaranteed by the LLAP server.
[Diagram: Spark Client, HiveServer2 (Authorization), Hive Metastore (Data Locations, View Definitions), LLAP (Data Read, Filter Pushdown), and Ranger Server (Dynamic Policies); arrows numbered 1 to 4 as in the flow]
Example: Per-User Row Filtering by Region in SparkSQL
Original Query:
SELECT * from CUSTOMERS WHERE total_spend > 10000

LLAP Data Access
User ID   Region   Total Spend
1         East      5,131
2         East     27,828
3         West     55,493
4         West      7,193
5         East     18,193

Query Rewrites based on Dynamic Ranger Policies

Spark User 1 (West Region)
Dynamic Rewrite:
SELECT * from CUSTOMERS WHERE total_spend > 10000 AND region = 'west'

Spark User 2 (East Region)
Dynamic Rewrite:
SELECT * from CUSTOMERS WHERE total_spend > 10000 AND region = 'east'
Fine grained Security to SparkSQL
http://bit.ly/2bLghGz
http://bit.ly/2bTX7Pm
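The per-user rewrite on this slide can be simulated in a few lines of Python. This is only a sketch of the outcome: in the real flow HiveServer2 and Ranger produce the modified plan and LLAP enforces it. The policy table and the user names `spark_user_1` and `spark_user_2` are hypothetical stand-ins; the rows are the ones shown on the slide.

```python
# Rows copied from the slide's CUSTOMERS table: (user_id, region, total_spend).
CUSTOMERS = [
    (1, "East", 5131),
    (2, "East", 27828),
    (3, "West", 55493),
    (4, "West", 7193),
    (5, "East", 18193),
]

ORIGINAL_QUERY = "SELECT * FROM CUSTOMERS WHERE total_spend > 10000"

# Hypothetical policy table standing in for Ranger's per-user row filters.
ROW_FILTER_POLICIES = {"spark_user_1": "west", "spark_user_2": "east"}


def rewrite_query(query, user):
    """Append the user's region predicate, mirroring the dynamic rewrite."""
    region = ROW_FILTER_POLICIES[user]
    return f"{query} AND region = '{region}'"


def visible_rows(user, min_spend=10000):
    """Evaluate the rewritten query against the in-memory rows."""
    region = ROW_FILTER_POLICIES[user]
    return [
        row for row in CUSTOMERS
        if row[2] > min_spend and row[1].lower() == region
    ]


if __name__ == "__main__":
    for user in ROW_FILTER_POLICIES:
        print(rewrite_query(ORIGINAL_QUERY, user))
        print(visible_rows(user))
```

Both users submit the same query text, yet each sees only the rows for their own region, which is the point of applying the filter on the server side rather than through per-user views.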
Apache Zeppelin Security
Apache Zeppelin: Authentication + SSL
[Diagram: John Doe, a Firewall, SSL, LDAP, and executors (Ex) in Spark on YARN; steps numbered 1 to 3]
Security in Apache Zeppelin?
Zeppelin leverages Apache Shiro for authentication/authorization
Example Shiro.ini
# =======================
# Shiro INI configuration
# =======================

[main]
## LDAP/AD configuration

[users]
# The 'users' section is for simple deployments
# when you only need a small number of statically-defined
# set of User accounts.

[urls]
# The 'urls' section is used for url-based security
#
Edit with Ambari or your favorite text editor
Apache Zeppelin: AD Authentication
Configure Zeppelin to use AD

[main]
activeDirectoryRealm = org.apache.zeppelin.server.ActiveDirectoryGroupRealm
activeDirectoryRealm.systemUsername = XXXXX
activeDirectoryRealm.systemPassword = XXXXXXXXXXXXXXXXX
activeDirectoryRealm.searchBase = DC=hdpqa,DC=Example,DC=com
activeDirectoryRealm.url = ldap://hdpqa.example.com:389
activeDirectoryRealm.principalSuffix = @hdpqa.example.com
activeDirectoryRealm.groupRolesMap = "CN=hdpdv_admin,DC=hdpqa,DC=example,DC=com":"admin"
activeDirectoryRealm.authorizationCachingEnabled = true
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
cacheManager = org.apache.shiro.cache.MemoryConstrainedCacheManager
securityManager.cacheManager = $cacheManager
securityManager.sessionManager = $sessionManager
securityManager.sessionManager.globalSessionTimeout = 86400000
shiro.loginUrl = /api/login
Apache Zeppelin: LDAP Authentication
Configure Zeppelin to use LDAP

[main]
ldapRealm = org.apache.zeppelin.server.LdapGroupRealm
ldapRealm = org.apache.shiro.realm.ldap.JndiLdapRealm
ldapRealm.contextFactory.environment[ldap.searchBase] = DC=hdpqa,DC=example,DC=com
ldapRealm.userDnTemplate = uid={0},OU=Accounts,DC=hdpqa,DC=example,DC=com
ldapRealm.contextFactory.url = ldaps://hdpqa.example.com:636
ldapRealm.contextFactory.authenticationMechanism = simple
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
securityManager.sessionManager = $sessionManager
# 86,400,000 milliseconds = 24 hours
securityManager.sessionManager.globalSessionTimeout = 86400000
shiro.loginUrl = /api/login
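Two values in this configuration are easy to check by hand. The `{0}` token in `userDnTemplate` is replaced with the login name to form the DN used for the LDAP bind, and `globalSessionTimeout` is expressed in milliseconds. The sketch below illustrates both; the function names are hypothetical, and the substitution does no DN escaping, so it is an illustration rather than a reimplementation of the realm.

```python
from datetime import timedelta

# Values copied from the slide's shiro.ini example.
USER_DN_TEMPLATE = "uid={0},OU=Accounts,DC=hdpqa,DC=example,DC=com"
GLOBAL_SESSION_TIMEOUT_MS = 86400000


def user_dn(username, template=USER_DN_TEMPLATE):
    """Substitute the login name for the {0} token (no DN escaping here)."""
    return template.replace("{0}", username)


def session_timeout(milliseconds=GLOBAL_SESSION_TIMEOUT_MS):
    """Convert the millisecond timeout into a timedelta for readability."""
    return timedelta(milliseconds=milliseconds)


if __name__ == "__main__":
    print(user_dn("johndoe"))
    print(session_timeout())
```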
Don’t want passwords in clear in shiro.ini?
• Create an entry for the AD credential
– Zeppelin leverages the Hadoop Credential API
– hadoop credential create activeDirectoryRealm.systemPassword -provider jceks:///etc/zeppelin/conf/credentials.jceks
Enter password:
Enter password again:
activeDirectoryRealm.systemPassword has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
• Make credentials.jceks readable only by the Zeppelin user
– chmod 400, so that only the Zeppelin process can read it and no other user can access the credentials
• Edit shiro.ini
activeDirectoryRealm.systemPassword -provider jceks:///etc/zeppelin/conf/credentials.jceks
Want to connect to LDAP over SSL?
• Change protocol to ldaps in shiro.ini
ldapRealm.contextFactory.url = ldaps://hdpqa.example.com:636
• If LDAP is using a self-signed certificate, import the certificate into the truststore of the JVM running Zeppelin
echo -n | openssl s_client -connect ldap.example.com:636 | \
    sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/examplecert.crt
keytool -import -keystore $JAVA_HOME/jre/lib/security/cacerts \
    -storepass changeit -noprompt -alias mycert -file /tmp/examplecert.crt
Zeppelin + Livy E2E Security
[Diagram: John Doe, LDAP, Zeppelin (Ispark Group Interpreter), Livy, Spark, and Yarn, with these labels]
• LDAP/LDAPS
• Livy APIs
• SPNego: Kerberos
• Kerberos/RPC
• Job runs as John Doe
Apache Zeppelin: Authorization
Authorization in Zeppelin
• Note level authorization
– Grant permissions (Owner, Reader, Writer) to users/groups on Notes
– LDAP Group integration
• Zeppelin UI Authorization
– Allow only admins to configure interpreter
– Configured in shiro.ini

Authorization at Data Level
• For Spark with Zeppelin > Livy > Spark
– Identity propagation: jobs run as end-user
• For Hive with Zeppelin > JDBC interpreter
• Shell Interpreter
– Runs as end-user

[urls]
/api/interpreter/** = authc, roles[admin]
/api/configurations/** = authc, roles[admin]
/api/credential/** = authc, roles[admin]
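To make the effect of these three URL rules concrete, here is a simplified Python sketch of the decision they encode: a request to a protected path succeeds only for an authenticated user holding the admin role. The matcher below handles only patterns ending in `/**` by prefix comparison; it is an assumption-laden stand-in for illustration, not Shiro's actual path matcher or filter chain.

```python
# Rules copied from the [urls] section on the slide: (pattern, required role).
URL_RULES = [
    ("/api/interpreter/**", "admin"),
    ("/api/configurations/**", "admin"),
    ("/api/credential/**", "admin"),
]


def required_role(path, rules=URL_RULES):
    """Return the role required for path, or None if no rule matches."""
    for pattern, role in rules:
        prefix = pattern[: -len("/**")]  # simplified: only '/**' suffixes
        if path == prefix or path.startswith(prefix + "/"):
            return role
    return None


def is_allowed(path, authenticated, roles):
    """Apply 'authc, roles[...]' semantics for the rules in this sketch."""
    role = required_role(path)
    if role is None:
        # No rule in this sketch covers the path; a real shiro.ini would
        # normally have a catch-all entry deciding this case.
        return True
    return authenticated and role in roles


if __name__ == "__main__":
    print(is_allowed("/api/interpreter/setting", True, {"admin"}))
    print(is_allowed("/api/interpreter/setting", True, {"analyst"}))
```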
Map admin role to AD Group
Allows mapped AD group access to Configure Interpreters
[main]
activeDirectoryRealm = org.apache.zeppelin.server.ActiveDirectoryGroupRealm
activeDirectoryRealm.systemUsername = XXXXX
activeDirectoryRealm.systemPassword = XXXXXXXXXXXXXXXXX
activeDirectoryRealm.searchBase = DC=hdpqa,DC=Example,DC=com
activeDirectoryRealm.url = ldap://hdpqa.example.com:389
activeDirectoryRealm.principalSuffix = @hdpqa.example.com
activeDirectoryRealm.groupRolesMap = "CN=hdpdv_admin,DC=hdpqa,DC=example,DC=com":"admin"
activeDirectoryRealm.authorizationCachingEnabled = true
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
cacheManager = org.apache.shiro.cache.MemoryConstrainedCacheManager
securityManager.cacheManager = $cacheManager
securityManager.sessionManager = $sessionManager
securityManager.sessionManager.globalSessionTimeout = 86400000
User reports: Can’t see interpreter Page
• Zeppelin has URL-based access control enabled
• User does not have the role
• Or role incorrectly mapped

[main]
activeDirectoryRealm = org.apache.zeppelin.server.ActiveDirectoryGroupRealm
activeDirectoryRealm.systemUsername = XXXXX
activeDirectoryRealm.systemPassword = XXXXXXXXXXXXXXXXX
activeDirectoryRealm.searchBase = DC=hdpqa,DC=Example,DC=com
activeDirectoryRealm.url = ldap://hdpqa.example.com:389
activeDirectoryRealm.principalSuffix = @hdpqa.example.com
activeDirectoryRealm.groupRolesMap = "CN=hdpdv_admin,DC=hdpqa,DC=example,DC=com":"admin"
activeDirectoryRealm.authorizationCachingEnabled = true
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
cacheManager = org.apache.shiro.cache.MemoryConstrainedCacheManager
securityManager.cacheManager = $cacheManager
securityManager.sessionManager = $sessionManager
securityManager.sessionManager.globalSessionTimeout = 86400000
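When chasing a "role incorrectly mapped" report, it can help to check offline which roles a user's AD groups would resolve to under the `groupRolesMap` value. The sketch below parses the quoted `"group DN":"role"` pairs and looks up a list of group DNs. The function names are hypothetical, and comparing DNs case-insensitively is an assumption of this sketch; whether the realm itself does so is not stated on the slide.

```python
import re

# Value copied from activeDirectoryRealm.groupRolesMap on the slide.
GROUP_ROLES_MAP = '"CN=hdpdv_admin,DC=hdpqa,DC=example,DC=com":"admin"'


def parse_group_roles_map(value):
    """Parse '"<group DN>":"<role>"' pairs into a dict keyed by lower-cased DN."""
    pairs = re.findall(r'"([^"]+)"\s*:\s*"([^"]+)"', value)
    return {dn.lower(): role for dn, role in pairs}


def roles_for(user_groups, mapping):
    """Return the set of roles granted by the user's group DNs."""
    return {
        mapping[group.lower()]
        for group in user_groups
        if group.lower() in mapping
    }


if __name__ == "__main__":
    mapping = parse_group_roles_map(GROUP_ROLES_MAP)
    print(roles_for(["CN=hdpdv_admin,DC=hdpqa,DC=example,DC=com"], mapping))
```

An empty result for a user who should be an admin points at the group DN in the map rather than at the URL rules.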
User reports: Livy interpreter fails to run with access error
Ensure Livy has ability to proxy user
Ensure Livy has impersonation enabled
In /etc/livy/conf/livy.conf:
livy.impersonation.enabled true
Edit HDFS core-site.xml via Ambari:
<property>
  <name>hadoop.proxyuser.livy_qa.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.livy_qa.hosts</name>
  <value>*</value>
</property>
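The two proxy-user properties follow a fixed naming pattern, `hadoop.proxyuser.<service user>.groups` and `hadoop.proxyuser.<service user>.hosts`, so they can be generated for whatever account Livy runs as. The sketch below builds them with Python's standard XML library; the function names are hypothetical, and `livy_qa` is simply the account name used on the slide.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(service_user, groups="*", hosts="*"):
    """Build the two core-site.xml <property> elements for a proxy user."""
    props = []
    for suffix, value in (("groups", groups), ("hosts", hosts)):
        prop = ET.Element("property")
        name = ET.SubElement(prop, "name")
        name.text = f"hadoop.proxyuser.{service_user}.{suffix}"
        ET.SubElement(prop, "value").text = value
        props.append(prop)
    return props


def render(props):
    """Serialize the property elements, one per line."""
    return "\n".join(ET.tostring(p, encoding="unicode") for p in props)


if __name__ == "__main__":
    print(render(proxyuser_properties("livy_qa")))
```

The wildcard values mirror the slide; passing narrower `groups` and `hosts` values limits which users Livy may impersonate and from which machines.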
Apache Zeppelin: Credentials
Credentials in Zeppelin
• LDAP/AD account
– Zeppelin leverages Hadoop Credential API
• Interpreter Credentials
– Not solved yet
This is still an open issue
Thank You
Vinay Shukla @neomythos