BigDataTech 2016: How to manage authorization rules on Hadoop cluster with Apache Ranger

Krzysztof Adamski

We deliver innovative IT services for the ING Group all over the world.

ING Services Polska

Building standard for new generation digital bank

Social, Harmonisation, Digitalisation, Customer Call Centres, Webservices, In the Cloud, Virtual Bank, Software as a Service, Infrastructure as a Service, Seamless, Concept of ONE, No geographical boundaries, Exception Handling, APIs, My identity, Straight through processing, Customer experience, Personalisation, Automation, Standardisation, Agile, Self Service, Mobile First, Real Time, Security, 24/7, ‘Outside in and Inside out’, Omnichannel, Zero Touch, Customer journeys, Analytics, Big Data, Digitalised branches, Cloud Platform as a service, Data Centre

People matters

Headcount: 554
Average age at ISP: 33.26
Age distribution: 20-30: 197; 31-40: 289; 41-50: 58; 50-70: 10
Breakdown: 16.43% (91) / 83.57% (463)

How secure is your cluster?

Ownership and permissions look fine…

How secure is your cluster?

That must have been a sophisticated hack…

3 x A or 4 as you wish

Hadoop authentication methods

Simple
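With simple authentication nothing verifies who the client is: the NameNode accepts the user name the client reports, and when security is disabled the standard Hadoop client takes that name from the HADOOP_USER_NAME environment variable. The sketch below only builds the command and environment that would list a directory as any user you like; it does not run them. The function name is hypothetical and it assumes the `hdfs` CLI is on the PATH.

```python
import os


def impersonation_command(user: str, path: str = "/") -> tuple[list[str], dict[str, str]]:
    """Build (argv, env) that list `path` while claiming to be `user`.

    Only effective with hadoop.security.authentication=simple; with Kerberos
    the client ignores HADOOP_USER_NAME and uses the Kerberos identity.
    """
    env = dict(os.environ)          # copy so the current process is unaffected
    env["HADOOP_USER_NAME"] = user  # identity the client will report
    argv = ["hdfs", "dfs", "-ls", path]
    return argv, env


if __name__ == "__main__":
    argv, env = impersonation_command("hdfs", "/user")
    # subprocess.run(argv, env=env, check=True) would actually execute it
    print(" ".join(argv), "as", env["HADOOP_USER_NAME"])
```

On a Kerberized cluster the variable is ignored and the identity comes from the user's Kerberos ticket, which is why Kerberos is the only one of the two methods that actually authenticates anyone.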

Kerberos

Hortonworks Ring of Defense Architecture (diagram, source: hortonworks.com)
Components: Client, Apache Knox, Active Directory, KDC, Ranger, HiveServer 2, HDFS
Request flow:
1. The client sends the original request with user ID and password to Knox
2. Knox gets a service ticket (ST) for Hive
3. Knox runs as a proxy user using the Hive ST
4. The query is submitted using the Hive ST
5. Hive gets a NameNode (NN) service ticket
6. Hive creates the MapReduce job using the NN ST
7. The client gets the query result

What is IPA?

redhat.com

AD Account mapping

redhat.com

SSSD integration

redhat.com

IPA for central UAM
• This works great for OS
• Can this be used by Hadoop?
• Can this be used by Ranger?

Installation through Ambari

hortonworks.com

HDP 2.3.4

Watch for ranger.usersync.source.impl.class property
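This property decides where Ranger usersync reads users and groups from (for example LDAP or the local Unix accounts), so it is worth confirming what Ambari actually wrote. A minimal sketch, assuming the usual Hadoop-style configuration/property/name/value XML layout; the on-disk path in the comment and the class name in the sample are assumptions based on Ranger 0.5-era packaging and should be checked on your own nodes.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text: str) -> dict[str, str]:
    """Parse a Hadoop-style *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props: dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:  # skip malformed entries without a name
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Sample content; on a node you would read the real file instead, e.g.
# /etc/ranger/usersync/conf/ranger-ugsync-site.xml (assumed path, verify locally).
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>ranger.usersync.source.impl.class</name>
    <value>org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    props = read_hadoop_conf(SAMPLE)
    print(props.get("ranger.usersync.source.impl.class", "<not set>"))
```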

Enable Ranger for HDFS

hortonworks.com

Ranger audit

• It is recommended that you store audits in Solr and HDFS, and disable audit to DB.
• Otherwise you can expect performance issues:
  • Audit is stored in a single table
  • No partitions
  • No data retention
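In Ranger 0.5 (HDP 2.3) the audit destinations are switched per plugin with properties in the plugin's audit configuration (for HDFS, the ranger-hdfs-audit config type in Ambari). A minimal sketch that builds the switches for the recommendation above and renders them as Hadoop-style XML; the property names are as found in Ranger 0.5-era plugin configs and should be verified for your version, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def audit_destinations(db: bool = False, hdfs: bool = True, solr: bool = True) -> dict[str, str]:
    """Audit destination switches; Hadoop configs expect the strings 'true'/'false'."""
    return {
        "xasecure.audit.destination.db": str(db).lower(),
        "xasecure.audit.destination.hdfs": str(hdfs).lower(),
        "xasecure.audit.destination.solr": str(solr).lower(),
    }


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Render a {name: value} dict as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Defaults follow the slide: Solr and HDFS on, DB off.
    print(to_hadoop_xml(audit_destinations()))
```

On an Ambari-managed cluster, set these through Ambari rather than editing the files on disk, since Ambari rewrites the configuration files it manages when it restarts services.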

IPA as a central UAM
• This works great for OS
• Can this be used by Hadoop? Works great for PA in IPA
• Can this be used by Ranger? Not yet. You still need to bind to LDAP.

Ranger KMS

One big advantage of encryption in HDFS is that even privileged users, such as the “hdfs” superuser, can be blocked from viewing encrypted data.
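Transparent encryption in HDFS works per encryption zone: a key is created in the KMS (here Ranger KMS) and an empty HDFS directory is bound to it. A minimal sketch that only assembles the standard CLI calls (`hadoop key create`, `hdfs crypto -createZone`, `hdfs crypto -listZones`) as argument lists; the key name and path are hypothetical, creating the key needs the corresponding Ranger KMS permissions, and creating the zone needs HDFS superuser rights.

```python
import shlex


def encryption_zone_commands(key_name: str, path: str) -> list[list[str]]:
    """CLI calls (as argv lists) that create a key and bind an encryption zone to it.

    The target directory must already exist and be empty.
    """
    return [
        ["hadoop", "key", "create", key_name],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", path],
        ["hdfs", "crypto", "-listZones"],  # check that the zone is registered
    ]


if __name__ == "__main__":
    for argv in encryption_zone_commands("finance_key", "/data/finance"):
        print(shlex.join(argv))  # or subprocess.run(argv, check=True)
```

Keeping the hdfs superuser out of the data then comes down to not granting it decrypt access to that key in the Ranger KMS policies.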

Caveats
• Ranger (the same goes for Sentry) feels like slapped-on security
• User synchronization can be very slow with many users due to architecture issues
• Doesn’t manage HDFS ACLs and requires Hive user access… defeating end-to-end security
• Vulnerability scans just kill Ranger ;)

Caveats

mysql> select count(*) from x_user;
+----------+
| count(*) |
+----------+
|       99 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from x_group;
+----------+
| count(*) |
+----------+
|       45 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from x_group_users;
+----------+
| count(*) |
+----------+
|   645697 |
+----------+
1 row in set (0.13 sec)

mysql> select sum(user_id) from (select count(distinct user_id) user_id from x_group_users group by p_group_id) temp;
+--------------+
| sum(user_id) |
+--------------+
|          603 |
+--------------+
1 row in set (1.21 sec)

mysql> delete from x_group_users where id not in (
         -- keep only the lowest id for each (p_group_id, user_id) pair
         select minid from (select min(id) as minid from x_group_users
                            group by p_group_id, user_id) as temp);
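The clean-up can be tried out on a toy table first. A minimal sketch using Python's built-in sqlite3 instead of MySQL, on a hypothetical, simplified x_group_users with only id, p_group_id and user_id (the real Ranger table has more columns). It runs the same two queries as above: the per-group distinct count that tells how many memberships are real, and the delete that keeps the lowest id for each (p_group_id, user_id) pair. The extra derived table (`as temp`) is a common workaround for MySQL refusing to read, in a subquery, from the table it is deleting from (error 1093); SQLite does not need it but accepts it. On a real Ranger database, take a backup and consider stopping usersync before deleting anything.

```python
import sqlite3


def build_demo(conn: sqlite3.Connection) -> None:
    """Create a simplified x_group_users table that contains duplicate memberships."""
    conn.execute(
        "CREATE TABLE x_group_users (id INTEGER PRIMARY KEY, p_group_id INTEGER, user_id INTEGER)"
    )
    rows = [
        (1, 10, 100), (2, 10, 100), (3, 10, 100),  # user 100 in group 10, three times
        (4, 10, 101),
        (5, 20, 100), (6, 20, 100),                # user 100 in group 20, twice
    ]
    conn.executemany("INSERT INTO x_group_users VALUES (?, ?, ?)", rows)


def real_memberships(conn: sqlite3.Connection) -> int:
    """Sum of distinct users per group, i.e. how many rows should exist."""
    return conn.execute(
        "SELECT SUM(n) FROM (SELECT COUNT(DISTINCT user_id) AS n"
        " FROM x_group_users GROUP BY p_group_id) AS temp"
    ).fetchone()[0]


def remove_duplicates(conn: sqlite3.Connection) -> int:
    """Keep the lowest id per (p_group_id, user_id); return the number of deleted rows."""
    cur = conn.execute(
        "DELETE FROM x_group_users WHERE id NOT IN ("
        " SELECT minid FROM (SELECT MIN(id) AS minid FROM x_group_users"
        " GROUP BY p_group_id, user_id) AS temp)"
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    total = conn.execute("SELECT COUNT(*) FROM x_group_users").fetchone()[0]
    print("rows:", total, "real memberships:", real_memberships(conn))  # rows: 6 real memberships: 3
    print("deleted:", remove_duplicates(conn))  # deleted: 3
```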

Make it better
• https://issues.apache.org/jira/browse/RANGER-827 usersync SSSD integration (sync explicitly specified groups)
• https://issues.apache.org/jira/browse/HADOOP-12751 allow users with domain suffix (avoid naming collisions)
• https://issues.apache.org/jira/browse/HIVE-12981 the same for Hive
• https://issues.apache.org/jira/browse/RANGER-842 PAM integrated authentication for Ranger
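The naming-collision problem behind HADOOP-12751 is easy to see on a toy model. The sketch below is not Hadoop's auth_to_local implementation: `short_name` simply cuts the principal at '@' (what a realm-stripping rule effectively does) and `qualified_name` keeps a user@domain form. Both functions, the lower-casing convention and the example principals are hypothetical.

```python
from collections.abc import Callable


def short_name(principal: str) -> str:
    """Cut the principal at '@': 'jsmith@EU.EXAMPLE.COM' -> 'jsmith'."""
    return principal.split("@", 1)[0]


def qualified_name(principal: str) -> str:
    """Keep the domain as a lower-case suffix: 'jsmith@EU.EXAMPLE.COM' -> 'jsmith@eu.example.com'."""
    user, _, realm = principal.partition("@")
    return f"{user}@{realm.lower()}" if realm else user


def collisions(principals: list[str], mapper: Callable[[str], str]) -> dict[str, list[str]]:
    """Local names that more than one principal maps to."""
    by_name: dict[str, list[str]] = {}
    for principal in principals:
        by_name.setdefault(mapper(principal), []).append(principal)
    return {name: group for name, group in by_name.items() if len(group) > 1}


if __name__ == "__main__":
    users = ["jsmith@EU.EXAMPLE.COM", "jsmith@US.EXAMPLE.COM", "anowak@EU.EXAMPLE.COM"]
    print(collisions(users, short_name))      # two different people share 'jsmith'
    print(collisions(users, qualified_name))  # {}
```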

Other upcoming features (Ranger 0.6)
• Tag-based policies
• Geolocation-based policies
• Deny and exclude policies
• Hive Metastore plugin

Some take-away tips
• Install updates on a regular basis
• Isolate your cluster from the rest of the network
• Kerberize your cluster
• Secure the user interfaces
• Set dfs.namenode.acls.enabled (enables HDFS ACLs)
• Tighten fs.permissions.umask-mode (default umask for new HDFS files and directories)
• Watch for superusers (hadoop.proxyuser settings)
• Change the OS default umask (watch for upgrades and config file permissions)
• Make sure the Hive warehouse HDFS path is protected
• Implement Ranger
• Just don’t sync your whole AD with it ;)
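Several of these tips map to configuration properties and can be checked mechanically. A minimal sketch that takes merged core-site/hdfs-site properties as a dict and flags simple authentication, disabled HDFS ACLs, a umask that leaves new files world-readable, and hadoop.proxyuser.* entries set to '*'. Defaults follow Hadoop 2.x, the umask is assumed to be in octal form (Hadoop also accepts a symbolic form, not handled here), and the function names are hypothetical.

```python
def effective_mode(base: int, umask: int) -> int:
    """Mode a new HDFS file (base 0o666) or directory (base 0o777) gets under `umask`."""
    return base & ~umask & 0o777


def check_hadoop_settings(props: dict[str, str]) -> list[str]:
    """Return human-readable warnings for a merged core-site + hdfs-site dict."""
    warnings: list[str] = []
    # Hadoop 2.x defaults are used when a property is absent.
    if props.get("hadoop.security.authentication", "simple").lower() != "kerberos":
        warnings.append("cluster is not Kerberized (hadoop.security.authentication)")
    if props.get("dfs.namenode.acls.enabled", "false").lower() != "true":
        warnings.append("HDFS ACLs are disabled (dfs.namenode.acls.enabled)")
    umask = int(props.get("fs.permissions.umask-mode", "022"), 8)  # octal form only
    if effective_mode(0o666, umask) & 0o004:  # "other" read bit on new files
        warnings.append(f"new files are world-readable (umask {umask:03o})")
    for name, value in sorted(props.items()):
        if name.startswith("hadoop.proxyuser.") and value.strip() == "*":
            warnings.append(f"{name} is '*' (no restriction)")
    return warnings


if __name__ == "__main__":
    print(oct(effective_mode(0o777, 0o027)))  # prints 0o750 (directories under umask 027)
    cluster = {
        "hadoop.security.authentication": "kerberos",
        "dfs.namenode.acls.enabled": "true",
        "fs.permissions.umask-mode": "027",
        "hadoop.proxyuser.hive.hosts": "*",
    }
    for warning in check_hadoop_settings(cluster):
        print("WARN:", warning)
```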

krzysztof.adamski@ingservicespolska.pl

@adamskikrzysiek

http://pl.linkedin.com/in/adamskikrzysztof

And yes. We are hiring
