Practical Kerberos with Apache HBase
Josh Elser
HBaseCon East
2016/09/26
© Hortonworks Inc. 2011 – 2016. All Rights Reserved


Page 1: Practical Kerberos with Apache HBase


Page 2: Practical Kerberos with Apache HBase


Engineer at Hortonworks, Member of the Apache Software Foundation

Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™

ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™

These names are trademarks or registered trademarks of the Apache Software Foundation.

Page 3: Practical Kerberos with Apache HBase


… but today we’re talking about Kerberos!

- “The Madness beyond the Gate” [1]

- An exploration in black magic and voodoo

- The word most accompanied with expletives

1: https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/kerberos_the_madness.html

Page 4: Practical Kerberos with Apache HBase


What this talk won’t be...

3dom via https://www.flickr.com/photos/steve_l/6042206137/in/album-72157629289333057/, CC-BY-NC

Page 5: Practical Kerberos with Apache HBase


Introduction to Kerberos

⬢ “Kerberos is a network authentication protocol. It is designed to provide strong authentication for client/server applications by using secret-key cryptography” [1]

⬢ MIT Kerberos is one implementation
– Heimdal is another
– We’re talking about MIT Kerberos

⬢ Authentication over a computer network
– Not authorization
– No data privacy

1: http://web.mit.edu/kerberos/

Page 6: Practical Kerberos with Apache HBase


Introduction to Kerberos

⬢ Key Distribution Center (KDC)
– Centralized server which grants Kerberos “tickets”
– The “trusted third party” of the security model

⬢ Users are defined by a “principal”
– primary[/instance]@REALM
– A human: [email protected]
– A service: hbase/[email protected]
– [email protected] is distinct from elserj/[email protected]
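
The primary[/instance]@REALM layout can be illustrated with a small parser. This is a minimal sketch, not code from HBase or Hadoop: the class name `KerberosPrincipalParts` and the example principals are hypothetical, and it ignores the escaping rules that real Kerberos names allow.

```java
// Splits a principal of the form primary[/instance]@REALM into its parts.
final class KerberosPrincipalParts {
    final String primary;
    final String instance; // null when the principal has no instance
    final String realm;

    private KerberosPrincipalParts(String primary, String instance, String realm) {
        this.primary = primary;
        this.instance = instance;
        this.realm = realm;
    }

    static KerberosPrincipalParts parse(String principal) {
        int at = principal.lastIndexOf('@');
        if (at <= 0 || at == principal.length() - 1) {
            throw new IllegalArgumentException("Expected name@REALM: " + principal);
        }
        String name = principal.substring(0, at);
        String realm = principal.substring(at + 1);
        int slash = name.indexOf('/');
        String primary = slash < 0 ? name : name.substring(0, slash);
        String instance = slash < 0 ? null : name.substring(slash + 1);
        if (primary.isEmpty() || (instance != null && instance.isEmpty())) {
            throw new IllegalArgumentException("Empty primary or instance: " + principal);
        }
        return new KerberosPrincipalParts(primary, instance, realm);
    }

    public static void main(String[] args) {
        // Hypothetical principals, for illustration only.
        String[] examples = {"alice@EXAMPLE.COM", "hbase/rs1.example.com@EXAMPLE.COM"};
        for (String example : examples) {
            KerberosPrincipalParts p = parse(example);
            System.out.println(p.primary + " | " + p.instance + " | " + p.realm);
        }
    }
}
```

Two principals that share a primary but differ in instance parse to different values, which is why they are treated as separate identities.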

Page 7: Practical Kerberos with Apache HBase


Introduction to Kerberos

⬢ Principals are identified by a secret shared with the KDC
– A normal password
– A keytab file (non-plaintext “password”, suitable for non-interactive logins)

⬢ Kerberos Ticket obtained from the KDC by using your secret
– Tickets expire
– Tickets are renewable*

[Diagram: Client (Password/Keytab), Server (Keytab), and KDC; the Client makes an Authenticated RPC to the Server.]

Page 8: Practical Kerberos with Apache HBase


Interacting with Kerberos

⬢ kadmin (or kadmin.local)
– Command-line interface for administrators to create, modify, delete principals.

⬢ kinit
– A command-line tool to obtain a ticket for a principal
– Places the ticket in a file on disk in a well-known location called a “ticket cache”
• Default location on Linux: /tmp/krb5cc_$(id -u `whoami`)
– The ticket cache is read-write protected for the user only (e.g. chmod 600)
– Can obtain a ticket for any principal using a password or keytab
– Ticket caches can hold multiple tickets

⬢ klist
– Lists the contents of the current user’s ticket cache
– Can list the keys in a keytab file
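
When a login unexpectedly finds no ticket, it helps to know which cache file is being consulted. The sketch below computes the default Linux location described above, and assumes the `KRB5CCNAME` environment variable (which MIT Kerberos tools honor) overrides it. The class name `TicketCacheLocator` and the uid of 1000 are hypothetical.

```java
// Works out which ticket cache path would be used for a given user.
final class TicketCacheLocator {
    /**
     * @param krb5ccname value of the KRB5CCNAME environment variable, or null if unset
     * @param uid        numeric id of the user (what `id -u` prints)
     */
    static String resolve(String krb5ccname, long uid) {
        if (krb5ccname != null && !krb5ccname.isEmpty()) {
            // "FILE:/path" names a file-based cache; other cache types
            // (which are not file paths) are returned unchanged.
            return krb5ccname.startsWith("FILE:")
                    ? krb5ccname.substring("FILE:".length())
                    : krb5ccname;
        }
        return "/tmp/krb5cc_" + uid;
    }

    public static void main(String[] args) {
        long uid = args.length > 0 ? Long.parseLong(args[0]) : 1000L; // hypothetical uid
        System.out.println(resolve(System.getenv("KRB5CCNAME"), uid));
    }
}
```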

Page 9: Practical Kerberos with Apache HBase


Benefits of Kerberos

⬢ Building a secure, network-based authentication system is very hard

⬢ Functions on non-trusted networks
– Security for multi-tenant systems, protect against malicious and non-malicious users

⬢ Leveraged across the Apache Hadoop “Stack”

⬢ Widely integrated externally
– Operating systems and programming languages

⬢ Can integrate with Active Directory

Apache Hadoop is a registered trademark of the Apache Software Foundation

Page 10: Practical Kerberos with Apache HBase


Promises

It’s simple, you just get your Kerberos ticket, use HBase and it knows who you are!

[elserj@localhost] $ kinit elserj
Password for [email protected]:
[elserj@localhost] $ hbase com.hortonworks.hbase.MyMapReduceJob /user/elserj/my-big-data.txt
…
Success!
[elserj@localhost] $

Page 11: Practical Kerberos with Apache HBase


Reality

[elserj@localhost] $ kinit elserj
Password for [email protected]:
[elserj@localhost] $ hbase com.hortonworks.hbase.MyMapReduceJob /big-data.txt
...
2016-09-26 14:03:11,549 FATAL [main] ipc.AbstractRpcClient (RpcClientImpl.java:run(709)) - SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
[elserj@localhost] $

( °□°╯ )╯︵┻━┻

Page 12: Practical Kerberos with Apache HBase


Ok, let’s figure out what went wrong?

What should I search for?

[Word cloud: RPC, SASL, GSSAPI, JGSS, UGI, JAAS, KDC, JCE, Token, Ticket, Voldemort, “Bars near me open now”, Cthulhu, Kerberos]

Page 13: Practical Kerberos with Apache HBase


How JVM-based applications can obtain Kerberos tickets

⬢ Extract a ticket from the local ticket cache for a principal
– hbase shell or hdfs dfs -ls /

⬢ UserGroupInformation Hadoop API (UGI)
– UserGroupInformation.loginUserFromKeytab(String, String)
– UserGroupInformation.loginUserFromKeytabAndReturnUGI(String, String)

⬢ javax.security.auth.Subject with Krb5LoginModule
– The APIs which UserGroupInformation uses under the covers

⬢ Automatic login via JAAS
– “Java Authentication and Authorization Service”, a Java implementation of the Pluggable Authentication Module (PAM) framework (RFC 86.0)
– Configuration file, specified via Java system properties
– Each “block” uses an identifier to denote login details for a specific system
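
A JAAS “block” can also be built in code instead of a configuration file, which makes its contents easier to see. The sketch below uses only JDK classes and the standard `Krb5LoginModule` options for a keytab login. The class name `KeytabJaasConfiguration`, the principal, and the keytab path are hypothetical, and no login is attempted because that would need a reachable KDC and a real keytab.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

// Programmatic equivalent of one JAAS configuration "block" for a keytab login.
final class KeytabJaasConfiguration extends Configuration {
    private final String principal;
    private final String keytabPath;

    KeytabJaasConfiguration(String principal, String keytabPath) {
        this.principal = principal;
        this.keytabPath = keytabPath;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        // The same entry is returned for every block name in this sketch.
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");   // read the secret from a keytab
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");
        options.put("doNotPrompt", "true"); // never fall back to a password prompt
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }

    public static void main(String[] args) {
        Configuration conf = new KeytabJaasConfiguration(
            "hbase/rs1.example.com@EXAMPLE.COM",   // hypothetical principal
            "/path/to/hbase.service.keytab");      // hypothetical keytab path
        AppConfigurationEntry entry = conf.getAppConfigurationEntry("Client")[0];
        System.out.println(entry.getLoginModuleName() + " " + entry.getOptions());
        // A real login would hand this to
        // new LoginContext("Client", null, null, conf) and call login().
    }
}
```

The block name passed to `getAppConfigurationEntry` plays the role of the identifier mentioned above; `Client` is used here because it is the name ZooKeeper clients look up by default.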

Page 14: Practical Kerberos with Apache HBase


HBase Service Logins

⬢ HBase services are daemons; they always use a keytab to log in

⬢ Principal and keytab are specified in hbase-site.xml for each service

⬢ A JAAS configuration file is also provided for Apache ZooKeeper client authentication
– Necessary for authenticated ZooKeeper access (HBase-only ACLs)

⬢ HBase services automatically perform logins/renewals as necessary
– Anyone who tells you that they need to “kinit for HBase to work” doesn’t know what they’re talking about.

Apache ZooKeeper is a trademark of the Apache Software Foundation

Page 15: Practical Kerberos with Apache HBase


HBase Clients

⬢ HBase clients will use a variety of mechanisms for authentication
– Interactive use: ticket-cache
– Automated tasks/Daemons: UGI with keytab

⬢ Reminder: Kerberos tickets expire
– Clients must implement renewal logic
– UGI provides an API to do this

⬢ Typically, UGI is the way to go
– Concise and well-understood
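
The renewal decision itself is simple arithmetic on the ticket's start and end times. The sketch below is a standalone illustration, not UGI's implementation: the class name `TicketRenewalPolicy` and the 80% threshold are assumptions chosen for the example. In a Hadoop application the actual re-login would go through UGI (for instance its `checkTGTAndReloginFromKeytab()` method) rather than hand-written code.

```java
// Decides when a long-running client should refresh its Kerberos ticket.
final class TicketRenewalPolicy {
    private final int renewPercent; // portion of the lifetime after which to renew

    TicketRenewalPolicy(int renewPercent) {
        if (renewPercent <= 0 || renewPercent >= 100) {
            throw new IllegalArgumentException("renewPercent must be between 1 and 99");
        }
        this.renewPercent = renewPercent;
    }

    /** Earliest time (epoch millis) at which the ticket should be refreshed. */
    long renewAtMillis(long startMillis, long endMillis) {
        long lifetime = endMillis - startMillis;
        return startMillis + (lifetime * renewPercent) / 100;
    }

    boolean shouldRenew(long startMillis, long endMillis, long nowMillis) {
        return nowMillis >= renewAtMillis(startMillis, endMillis);
    }

    public static void main(String[] args) {
        TicketRenewalPolicy policy = new TicketRenewalPolicy(80); // assumed threshold
        long start = System.currentTimeMillis();
        long end = start + 10L * 60 * 60 * 1000; // a hypothetical 10-hour ticket
        System.out.println("Renew now? " + policy.shouldRenew(start, end, start));
        System.out.println("Renew at:  " + policy.renewAtMillis(start, end));
    }
}
```

Renewing well before expiry leaves room for a slow or briefly unavailable KDC without interrupting in-flight requests.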

Page 16: Practical Kerberos with Apache HBase


On using UserGroupInformation correctly

⬢ We mentioned two different method calls earlier for logins
– void loginUserFromKeytab(String, String)
– UserGroupInformation loginUserFromKeytabAndReturnUGI(String, String)

⬢ loginUserFromKeytab is “global”
– Syntactic-sugar to make your life easier
– Works great when the application only acts as one user

⬢ loginUserFromKeytabAndReturnUGI is “localized”
– Requires invoking “doAs(...)”
– Allows for concurrent execution as different users in one JVM

Page 17: Practical Kerberos with Apache HBase


Enter SASL: authentication framework over a transport

⬢ SASL is a framework for building RPC systems with authentication

⬢ “Simple Authentication and Security Layer” RFC-4422
– “A framework for authentication and data security in Internet protocols” [1]
– “decouples authentication mechanisms from application protocols” [1]
• Generic Security Services Application Program Interface (GSSAPI) speaks Kerberos
• DIGEST-MD5: an HTTP Digest authentication-like method (used for delegation tokens)
– Data security aka Quality of Protection (QoP)
• auth: Authentication only (default)
• auth-int: Previous, and integrity check of message content
• auth-conf: Previous, and encryption of message content

[1] https://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer
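
The three QoP levels correspond to values of the JDK's standard `javax.security.sasl.qop` property. The sketch below builds the property map a SASL client or server would be created with. The class name `SaslQop` is hypothetical, and the friendly names `authentication`, `integrity`, and `privacy` are an assumption about how a deployment labels the levels (they mirror the values commonly used for HBase's RPC protection setting).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.security.sasl.Sasl;

// Translates a human-friendly protection level into SASL negotiation properties.
final class SaslQop {
    static String toSaslQop(String protection) {
        switch (protection.toLowerCase(Locale.ROOT)) {
            case "authentication": return "auth";      // authentication only
            case "integrity":      return "auth-int";  // plus integrity checks
            case "privacy":        return "auth-conf"; // plus encryption
            default:
                throw new IllegalArgumentException("Unknown protection level: " + protection);
        }
    }

    static Map<String, String> saslProperties(String protection) {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, toSaslQop(protection));
        props.put(Sasl.SERVER_AUTH, "true"); // ask that the server authenticate to the client too
        return props;
    }

    public static void main(String[] args) {
        System.out.println(saslProperties("privacy"));
    }
}
```

A map like this is what `Sasl.createSaslClient` and `Sasl.createSaslServer` accept as their properties argument; both sides must agree on a level for the negotiation to succeed.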

Page 18: Practical Kerberos with Apache HBase


Trust on an untrusted network

⬢ A Kerberos ticket implies a valid identity, not necessarily the identity you wanted

⬢ Kerberos relies on accurate/consistent DNS as the basis for a secure RPC model
– Secure your DNS as much as your KDC

⬢ Recall the service principal from earlier
– hbase/[email protected]

⬢ The instance must be a fully-qualified domain name

⬢ Clients need to know the primary, and the instance must match DNS
– “Caused by: KrbException: Identifier doesn't match expected value (906)”
– “error Message is Server not found in Kerberos database”

Page 19: Practical Kerberos with Apache HBase


Trust on an untrusted network

[Diagram: a Client sends an RPC to “service” at svc1.hwx.com. Good DNS leads to the Trusted Service; bad DNS leads to a Rogue Service. Principals shown: service/[email protected]/[email protected]]

Without enforcement of DNS naming via SASL, a client could be maliciously sent to a rogue service without the client realizing it happened.

Page 20: Practical Kerberos with Apache HBase


Harping on DNS

⬢ DNS must be correct, consistent, and secure

⬢ Hostnames are advertised for discovery
– Also benefits multi-homed networks

⬢ Forward and Reverse DNS mappings must be accurate on every node
– `nslookup regionserver1.hbase.hwx.com` returns 10.0.0.1
– `nslookup 10.0.0.1` returns regionserver1.hbase.hwx.com

⬢ Check /etc/resolv.conf for quick troubleshooting
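
The forward and reverse check can be automated from inside the JVM, which has the advantage of using the same resolver the HBase client will use. This is a minimal sketch using `java.net.InetAddress`; the class name `DnsConsistencyCheck` is hypothetical, and the lookups are passed in as functions so the comparison can be exercised without a network.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Function;

// Checks that hostname -> address -> hostname round-trips to the same name.
final class DnsConsistencyCheck {
    static boolean isConsistent(String fqdn,
                                Function<String, String> forward,
                                Function<String, String> reverse) {
        String address = forward.apply(fqdn);
        if (address == null) {
            return false; // forward lookup failed
        }
        String name = reverse.apply(address);
        return name != null && name.equalsIgnoreCase(fqdn);
    }

    /** Forward lookup via the JVM resolver; null if the host is unknown. */
    static String forwardLookup(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    /** Reverse lookup; yields the address text itself when no name is registered. */
    static String reverseLookup(String address) {
        try {
            return InetAddress.getByName(address).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        boolean ok = isConsistent(host,
                DnsConsistencyCheck::forwardLookup,
                DnsConsistencyCheck::reverseLookup);
        System.out.println(host + (ok ? " round-trips" : " does NOT round-trip"));
    }
}
```

Running it on every node against every server hostname is one way to verify DNS rather than assume it.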

Page 21: Practical Kerberos with Apache HBase


Recap: Kerberos authentication for HBase RPCs

⬢ Client and Server both obtain Kerberos ticket
– Password or Keytab via UGI/JAAS/Ticket-Cache
– Tickets must be renewed before they expire

⬢ SASL is the framework which HBase leverages for authenticated RPCs
– GSSAPI as the SASL mechanism which can “speak” Kerberos
– QoP defines the security of the RPC data (minimum of authentication)

⬢ Fully-qualified hostnames everywhere
– Forward and reverse DNS must be consistent across all clients and servers

Page 22: Practical Kerberos with Apache HBase


The edge cases

⬢ Exceptions to how authentication works
– YARN jobs
– HBase REST and Thrift services

⬢ Not the traditional client/server model Kerberos was designed to fit
– Hundreds to thousands of tasks concurrently requiring a ticket
– Talk to HBase as a user without having that user’s credentials

⬢ Two approaches introduced to address these problems

Page 23: Practical Kerberos with Apache HBase


Delegation Tokens

⬢ As mentioned earlier, SASL supports a variety of mechanisms
– DIGEST-MD5 allows a digest-token style authentication scheme

⬢ A delegation token is a temporary “password” which can authenticate a user
– Slight compromise of security for performance

⬢ Circumvents authentication to the KDC; authentication is instead handled by HDFS or HBase

⬢ Automatically obtained during job submission and added to the job cache
– We must rely on YARN to do the right thing

If you thought Kerberos documentation for Hadoop/HBase was sparse…

Page 24: Practical Kerberos with Apache HBase


Delegation Tokens

[Diagram: delegation token (DT) flow. Nodes: Client, KDC, HBase Master, YARN ResourceManager, YARN Containers, HBase RegionServers. Edge labels: Password/Keytab, Keytab, Obtain DT, Client Ticket and DT, YARN Ticket and DT, DT.]

Page 25: Practical Kerberos with Apache HBase


Proxy Users

⬢ A proxy is some intermediate service that provides access to a backend service
– HBase Thrift and REST services

⬢ Each of these services has its own Kerberos principal and keytab used to communicate with HBase

⬢ These services are accessing HBase on behalf of another user.
– The ticket is for the service, but we want it to appear as if it is [email protected]

⬢ ProxyUsers refer to a set of configuration values in Hadoop (core-site.xml)
– hadoop.proxyuser.SERVICE.{hosts,groups,users}

⬢ Configuration-based approach to allow services to “pretend” to be a user without actually having that user’s credentials
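
The brace notation above expands to one property per suffix. The sketch below spells out the `hosts` and `groups` keys for a single service as they would appear in core-site.xml. The class name `ProxyUserSettings`, the service name `rest_service`, and the host and group values are all hypothetical; SERVICE is taken here to be the short user name the proxy service runs as.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Expands hadoop.proxyuser.SERVICE.{hosts,groups} for one proxy service.
final class ProxyUserSettings {
    static Map<String, String> forService(String serviceUser, String hosts, String groups) {
        String prefix = "hadoop.proxyuser." + serviceUser + ".";
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(prefix + "hosts", hosts);   // machines the service may impersonate from
        settings.put(prefix + "groups", groups); // groups whose members may be impersonated
        return settings;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        Map<String, String> settings =
            forService("rest_service", "gateway1.example.com", "analysts");
        for (Map.Entry<String, String> e : settings.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Keeping the host and group lists narrow limits what a compromised proxy service can do, since it can then only act as the listed users from the listed machines.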

Page 26: Practical Kerberos with Apache HBase


Proxy Users

[Diagram: proxy user flow. Nodes: Client, KDC, Proxy Server, HBase. Edge labels: Password/Keytab, Keytab, Keytab, Client Ticket, Server Ticket (Client principal).]

Proxy Servers: HBase REST, HBase Thrift, Phoenix Query Server, etc

Page 27: Practical Kerberos with Apache HBase


Kerberos authentication for HTTP-based services (SPNEGO)

⬢ The need to protect services using HTTP
– Don’t want to reuse SASL

⬢ Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO) RFC-4178
– The Negotiate HTTP header
– Built into cURL (--negotiate), most Java-based HTTP libraries, and web-browsers

⬢ Web-browsers often need special configuration to properly authenticate.
– Firefox: network.negotiate-auth.delegation-uris, network.negotiate-auth.trusted-uris
– Chrome: --auth-server-whitelist="*.domain" --auth-negotiate-delegate-whitelist="*.domain"
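
On the wire, the Negotiate exchange is a 401 response carrying a `WWW-Authenticate: Negotiate` challenge, answered by an `Authorization: Negotiate <base64 token>` request header. The sketch below shows only that header handling. The class name `NegotiateHeader` is hypothetical and the token bytes are a placeholder; a real token would come from a GSS-API context (for example `org.ietf.jgss.GSSContext.initSecContext`), which needs a valid ticket.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Recognizes a SPNEGO challenge and formats the matching Authorization header value.
final class NegotiateHeader {
    private static final String SCHEME = "Negotiate";

    static boolean isNegotiateChallenge(int status, String wwwAuthenticate) {
        return status == 401
                && wwwAuthenticate != null
                && wwwAuthenticate.trim().regionMatches(true, 0, SCHEME, 0, SCHEME.length());
    }

    static String authorizationValue(byte[] gssToken) {
        return SCHEME + " " + Base64.getEncoder().encodeToString(gssToken);
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a real GSS-API token.
        byte[] placeholder = "token".getBytes(StandardCharsets.UTF_8);
        System.out.println(isNegotiateChallenge(401, "Negotiate"));
        System.out.println("Authorization: " + authorizationValue(placeholder));
    }
}
```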

Page 28: Practical Kerberos with Apache HBase


Troubleshooting: Prerequisites

⬢ Ensure a recent version of your JVM and Hadoop
– Bugs exist in UserGroupInformation for certain JVMs (vendor+version)

⬢ Ensure that the unlimited strength Java Cryptography Extension (JCE) policy files are installed on all nodes in the cluster
– And that clients/servers are using that JVM installation!
– Required for AES-256 encryption type on Kerberos keys (which you will likely get by default)

⬢ Ensure that you have DEBUG logging enabled for HBase services
– Potentially, org.apache.hadoop.hbase.ipc=DEBUG is sufficient

⬢ Set the sun.security.krb5.debug system property to true in your application
– Or sun.security.spnego.debug for debugging SPNEGO
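
Both debug properties can be set programmatically when changing the JVM command line is inconvenient. This is a minimal sketch with a hypothetical class name `KerberosDebugFlags`; it has the same effect as passing `-Dsun.security.krb5.debug=true` to the JVM, and should run before the first Kerberos login so the setting is seen when the Kerberos classes initialize.

```java
// Turns on the JDK's Kerberos (and optionally SPNEGO) debug output.
final class KerberosDebugFlags {
    static void enable(boolean includeSpnego) {
        System.setProperty("sun.security.krb5.debug", "true");
        if (includeSpnego) {
            System.setProperty("sun.security.spnego.debug", "true");
        }
    }

    public static void main(String[] args) {
        enable(true);
        System.out.println("krb5 debug:   " + System.getProperty("sun.security.krb5.debug"));
        System.out.println("spnego debug: " + System.getProperty("sun.security.spnego.debug"));
    }
}
```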

Page 29: Practical Kerberos with Apache HBase


Troubleshooting: Tips

⬢ Remember that DNS is the cornerstone
– When reading logs, make sure that you see the expected fully-qualified domain names
– Do not assume that DNS is correct: verify it.

⬢ Determine if an RPC issue is authentication or authorization
– If you see an HBase-level error, it is likely an authorization issue
– If you only see transport/connection-setup errors, it is likely an authentication issue

⬢ Remember that tickets expire
– Cross-reference ticket lifetimes with application logs

⬢ Read the logs. Actually read them.
– A vast majority of errors can be solved with appropriate logging and JVM debugging

Page 30: Practical Kerberos with Apache HBase


Reference Material

⬢ “Hadoop and Kerberos: The Madness beyond the Gate”
– https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/index.html

⬢ Oracle documentation
– http://docs.oracle.com/javase/7/docs/technotes/guides/security/jaas/tutorials/GeneralAcnOnly.html
– https://docs.oracle.com/javase/7/docs/jre/api/security/jaas/spec/com/sun/security/auth/module/Krb5LoginModule.html

⬢ MIT Kerberos documentation
– http://web.mit.edu/kerberos/

⬢ “Explain like I’m 5: Kerberos” (great low-level Kerberos write-up)
– http://www.roguelynn.com/words/explain-like-im-5-kerberos/

⬢ KDiag: “Kerberos diagnostics for Hadoop”
– Apache Hadoop >=2.8 or https://github.com/steveloughran/kdiag

Page 31: Practical Kerberos with Apache HBase


Developing with Kerberos

⬢ Apache Directory’s Kerby project
– Great for Kerberos authentication without Hadoop in the picture
– http://directory.apache.org/kerby/downloads.html

⬢ Apache Hadoop’s MiniKDC
– Built on top of Apache Directory
– https://github.com/apache/hadoop/blob/release-2.7.3-RC2/hadoop-common-project/hadoop-minikdc/src/main/java/org/apache/hadoop/minikdc/MiniKdc.java

⬢ Support in HDFS, YARN, and HBase MiniCluster classes too

No excuse to not write tests!

Apache Directory is a trademark of the Apache Software Foundation

Page 32: Practical Kerberos with Apache HBase


Thanks!
Email: [email protected]
@josh_elser

3dom via https://www.flickr.com/photos/steve_l/6674480535/in/album-72157629289333057/, CC-BY-NC

Thanks to those who gave feedback along the way: Brandon Wilson, Bryan Bende, Michael Stack, Randy Gelhausen, Steve Loughran.