Hadoop Elephant in Active Directory Forest
Marek Gawiński, Arkadiusz Osiński
Allegro Group
Agenda
● Goals and motivations
● Technology stack
● Architecture evolution
● Automating the integration of new servers
● Making AD users and groups visible to Linux
● Making the architecture resilient to AD service unavailability
● Auto-deployment of client software on desktops
Allegro Hadoop cluster in numbers
4 terabytes RAM
2 petabytes disk space
47 datanodes
79 projects
612 users
Goals and motivations
● Secured cluster
● Central authentication and authorisation
● Compliance for real and project users and groups
● Cluster resources available from the desktop
● Integrating new servers automatically
● Making the whole architecture resilient to AD failures and timeouts
● Auto-deployment and autoconfiguration of Hadoop client software on users' desktops
Technology stack
● Cloudera CDH5
● MIT Kerberos
● Microsoft Active Directory
● FreeIPA
● sssd
● puppet
● msktutil
● Hadoop desktop client
History - FreeIPA+FreeIPA Kerberos
[Diagram. Components: Client; Secured Hadoop cluster; FreeIPA User; Local groups management; Kerberos KDC. Arrow labels: User/pass; Kerberos Service Ticket; Check user/pass; Internal hadoop creds; Check groups.]
History - FreeIPA+own Kerberos
[Diagram. Components: Client; Secured Hadoop cluster; FreeIPA User; Local groups management; Kerberos KDC; Kerberos KDC MIT. Arrow labels: User/pass; Kerberos Service Ticket; Check user/pass; Internal hadoop creds; Check groups.]
History - FreeIPA+own Kerberos+AD
[Diagram. Components: Client; Secured Hadoop cluster; FreeIPA User; Local groups management; Kerberos KDC MIT; AD User&Groups; AD Kerberos. Arrow labels: User/pass (twice); Kerberos Service Ticket; Check user/pass (twice); Internal hadoop creds; Check groups (twice).]
Final - own Kerberos+AD
[Diagram. Components: Client; Secured Hadoop cluster; AD User&Groups; AD Kerberos; Kerberos KDC MIT. Arrow labels: User/pass; Kerberos Service Ticket; Check user/pass; Internal hadoop creds; Check groups.]
Integrating new Linux servers automatically with AD
[Diagram. Components: AD User&Groups; AD Kerberos; Msktutil; Kerberos keytab. Arrow labels: Create user; Create principal.]
Integrating new Linux servers automatically with AD
define get_ad_keytab (
  $path = '',
  ...
) {
  ...
  $realm = 'SOME_REALM'
  $pass = hiera('hadoop_prod/ad/krb_manager_pass')
  $principal = "${title}/${host}@${realm}"
  $command = "echo ${pass} | kinit _hadoop_manager@${realm}; \
    /usr/local/bin/add_ad_princ.sh ${title} ${host} ${path}; kdestroy"
  ...

msktutil -c -s $PRINCIPAL --upn $PRINCIPAL -k $KEYTAB \
  --computer-name $COMPUTER_NAME \
  --server $SERVER_KRB \
  --realm $REALM \
  -b $USER_LDAP_ROOT \
  --dont-expire-password \
  --description "\"$DESCRIPTION\"" \
  --user-creds-only
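The msktutil call above can also be assembled as an argument list rather than a shell string, which avoids the nested quoting around the description. This is only a sketch: the function name and every sample value (host name, realm, LDAP base, keytab path) are hypothetical placeholders, and only the flags that appear on the slide are used.

```python
import shlex


def build_msktutil_args(principal, keytab, computer_name, server,
                        realm, ldap_base, description):
    """Return the msktutil argument vector with the flags from the slide."""
    return [
        "msktutil", "-c",
        "-s", principal,
        "--upn", principal,
        "-k", keytab,
        "--computer-name", computer_name,
        "--server", server,
        "--realm", realm,
        "-b", ldap_base,
        "--dont-expire-password",
        # Passed as a single argv element, so no extra escaped quotes needed.
        "--description", description,
        "--user-creds-only",
    ]


# All values below are hypothetical placeholders, not the authors' settings.
args = build_msktutil_args(
    principal="hdfs/nn1.example.com",
    keytab="/etc/hadoop/hdfs.keytab",
    computer_name="nn1-hdfs",
    server="dc1.example.com",
    realm="EXAMPLE.REALM",
    ldap_base="OU=Hadoop",
    description="HDFS service on nn1",
)

# Printable form for logging; subprocess.run(args) would take the list as is.
print(shlex.join(args))
```

Passing the list straight to a process runner keeps a description containing spaces as one argument.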
Integrating new Linux servers automatically with AD
root@nn1:~# klist -ket
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 08/17/2015 13:26:45 host/[email protected] (aes256-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/[email protected] (aes128-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/[email protected] (des3-cbc-sha1)
   1 08/17/2015 13:26:45 host/[email protected] (arcfour-hmac)
   1 08/17/2015 13:26:45 host/[email protected] (camellia128-cts-cmac)
   1 08/17/2015 13:26:45 host/[email protected] (camellia256-cts-cmac)
   4 08/17/2015 13:30:23 [email protected] (arcfour-hmac)
   4 08/17/2015 13:30:23 [email protected] (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 [email protected] (aes256-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/[email protected] (arcfour-hmac)
   4 08/17/2015 13:30:23 host/[email protected] (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/[email protected] (aes256-cts-hmac-sha1-96)
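A keytab listing like this one can be checked automatically after provisioning, for example to confirm that each principal has the expected encryption types at the newest key version. The sketch below assumes the line layout shown above (klist output differs between Kerberos versions) and uses a hypothetical principal name in its sample data.

```python
import re
from collections import defaultdict

# KVNO, timestamp, principal, then the enctype in parentheses.
ENTRY = re.compile(
    r"^\s*(\d+)\s+(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+(\S+)\s+\(([^)]+)\)\s*$"
)


def enctypes_by_key(klist_output):
    """Map (principal, kvno) to the set of encryption types listed for it."""
    result = defaultdict(set)
    for line in klist_output.splitlines():
        match = ENTRY.match(line)
        if match:  # header and separator lines do not match and are skipped
            kvno, _timestamp, principal, enctype = match.groups()
            result[(principal, int(kvno))].add(enctype)
    return dict(result)


# Hypothetical sample; the principal name is a placeholder.
SAMPLE = """\
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------
   1 08/17/2015 13:26:45 host/nn1.example.com@EXAMPLE.REALM (aes256-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/nn1.example.com@EXAMPLE.REALM (arcfour-hmac)
   4 08/17/2015 13:30:23 host/nn1.example.com@EXAMPLE.REALM (aes256-cts-hmac-sha1-96)
"""

entries = enctypes_by_key(SAMPLE)
latest_kvno = max(kvno for (_principal, kvno) in entries)
```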
Integrating new Linux servers automatically with AD
Separate subtree in the AD structure
System Security Services Daemon
● Identity and authentication
● Multiple providers (FreeIPA, LDAP, AD)
● High availability for backends
● Provides PAM and NSS modules
● Caching
● > 1.11.x - stable support for AD forest auth
System Security Services Daemon
AD schema with no modifications
/etc/sssd/sssd.conf
[domain/AD.REALM]
id_provider = ad
ad_server = h1, h2, h3
ad_backup_server = hb1, hb2, hb3
auth_provider = ad
chpass_provider = ad
access_provider = ad
enumerate = False
krb5_realm = AD.REALM
ldap_schema = ad
ldap_id_mapping = True
cache_credentials = True
ldap_access_order = expire
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
fallback_homedir = /home/AD.REALM/%u
default_shell = /bin/false
ldap_referrals = false
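Since sssd.conf is INI-style, a deployment check can read it with a standard parser. The sketch below is one possible check, not part of the authors' tooling: it embeds a shortened copy of the domain section above, and the helper names are hypothetical. Interpolation is switched off because values such as `%u` would otherwise be treated as interpolation syntax by Python's parser.

```python
import configparser

# Shortened copy of the [domain/AD.REALM] section from the slide.
SSSD_CONF = """\
[domain/AD.REALM]
id_provider = ad
ad_server = h1, h2, h3
ad_backup_server = hb1, hb2, hb3
cache_credentials = True
fallback_homedir = /home/AD.REALM/%u
"""


def load_domain(text, domain):
    """Return the options of one [domain/...] section as a plain dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser[f"domain/{domain}"])


def server_list(value):
    """Split a comma-separated server option into a clean list."""
    return [item.strip() for item in value.split(",") if item.strip()]


domain = load_domain(SSSD_CONF, "AD.REALM")
primary = server_list(domain["ad_server"])
backup = server_list(domain["ad_backup_server"])
print(primary, backup)
```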
root@nn1:~# id _hc_tech_prod |tr "," "\n"
uid=1827653611(_hc_tech_prod)
gid=1827600513(domain users)
groups=1827600513(domain users)
1827652945(_gr_hc_users_common)
1827647474(_gr_hc_hadoop_prod)
1827652940(_gr_hc_project1_prod)
1827652919(_gr_hc_project2_prod)
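The large numeric IDs follow from ldap_id_mapping = True: with no POSIX attributes in the AD schema, SSSD derives each UID/GID from the object's SID by adding its relative identifier (RID) to a per-domain base. The arithmetic below is a sketch under two assumptions: that "domain users" carries the well-known RID 513, and that the range size is SSSD's default of 200000 (ldap_idmap_range_size). Under those assumptions the base and the user's RID can be read back from the output above.

```python
DOMAIN_USERS_RID = 513        # well-known RID of the "Domain Users" group
ASSUMED_RANGE_SIZE = 200_000  # assumed default of ldap_idmap_range_size


def range_base(domain_users_gid):
    """Infer the domain's ID range base from the mapped Domain Users GID."""
    return domain_users_gid - DOMAIN_USERS_RID


def rid_of(posix_id, base):
    """Recover the RID that was added to the base to produce a POSIX ID."""
    return posix_id - base


base = range_base(1827600513)         # gid of "domain users" from the slide
user_rid = rid_of(1827653611, base)   # uid of _hc_tech_prod from the slide

print(base)                       # 1827600000
print(user_rid)                   # 53611
print(base % ASSUMED_RANGE_SIZE)  # 0, consistent with the assumed range size
```

Because the mapping is computed rather than stored, every Linux host running the same SSSD settings resolves the same AD account to the same UID.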
Making the whole architecture resilient to failures
/etc/sssd/sssd.conf
[nss]
memcache_timeout = 3600
Local filesystem nss cache
Active Closest DC
Fallback servers in Remote DC
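The ordering described here (servers in the closest DC first, remote fallback servers only afterwards) can be illustrated with a small selection routine. This is an illustration of the idea only, not SSSD's actual failover algorithm; the function name and the reachability callback are hypothetical, and the server names reuse the h1..h3 / hb1..hb3 placeholders from the configuration.

```python
from typing import Callable, Iterable, Optional


def pick_server(primary: Iterable[str],
                backup: Iterable[str],
                is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable server, trying every primary before any backup."""
    for server in list(primary) + list(backup):
        if is_reachable(server):
            return server
    # Nothing reachable: a caller would fall back to locally cached entries.
    return None


primary = ["h1", "h2", "h3"]      # closest DC
backup = ["hb1", "hb2", "hb3"]    # remote DC

down = {"h1", "h2", "h3"}         # simulate the whole closest DC being down
chosen = pick_server(primary, backup, lambda s: s not in down)
print(chosen)  # hb1
```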
Auto-deployment and autoconfiguration on desktops
● Install script for the Hadoop client on desktops
● Refreshes configs from the current production environment
● Support for HDFS/YARN/Hive/Spark
[marek.gawinski:~/ALLEHADOOP] $ sh env.sh
Password for [email protected]: **************

[marek.gawinski:~/ALLEHADOOP] $ klist
Ticket cache: FILE:/tmp/krb5cc_1511317717
Default principal: [email protected]

Valid starting     Expires            Service principal
09/04/15 23:31:35  09/05/15 09:31:35  krbtgt/[email protected]
        renew until 09/11/15 23:31:33
Auto-deployment and autoconfiguration on desktops
[marek.gawinski:~/ALLEHADOOP] $ hive
hive (default)> show databases;
OK
database_name
tpch_benchmarks
...
xwing_poc
Time taken: 0.816 seconds, Fetched: 72 row(s)
hive (default)> set hive.execution.engine = tez;
hive (default)> select count(*) from table1;

[marek.gawinski:~/ALLEHADOOP] $ hdfs dfs -ls
Found 8 items
drwxr-xr-x   - marek.gawinski hadoop          0 2015-08-06 02:00 .Trash
drwxr-xr-x   - marek.gawinski hadoop          0 2015-07-28 21:01 .hiveJars
drwxr-xr-x   - marek.gawinski hadoop          0 2015-07-09 10:43 .sparkStaging
drwx------   - marek.gawinski hadoop          0 2015-05-22 02:35 .staging
drwxr-xr-x   - marek.gawinski hadoop          0 2015-08-31 13:11 oozie1
-rw-r--r--   3 marek.gawinski hadoop         43 2015-05-26 15:26 ozzietest1.hql
-rw-r--r--   3 marek.gawinski hadoop         13 2015-08-31 12:30 pwd.txt
drwxr-xr-x   - marek.gawinski hadoop          0 2015-04-16 16:21 tables
Benefits
● One standard for access control to all company resources
● Every new employee can use Hadoop right away, with no additional effort
● One password to all systems
Thank you!
Questions?