Challenges Running an NFSv4-backed OSG Cluster

Kevin Coffman, [email protected]

Center for Information Technology Integration

University of Michigan


Overview

● Basic NFSv4 in production

● Open Science Grid (OSG) Overview

● OSG Installation

● OSG Configuration

● Submitting a job!

● Authentication differences (AFS vs. NFSv4)

● Authentication futures

Basic NFSv4 file service in production

● Basic file storage

● User name mappings

● Home directories

● Kernel builds, etc.

Open Science Grid Overview

● Architecture

– Head node & worker nodes

– Core is NSF Middleware Initiative (including Globus, Condor, kx.509)

● Authentication

– X.509, kx.509, proxy certs

● No cluster file-system required

– “Home”, Base, Data, Apps, Temp, Worker node temp

OSG Installation

● New Linux kernels, new NFSv4 code, new OSG releases, repeat!

● Base installation is done solely on head node

● Credentials needed

– Root access assumed for local file system access

● Mapping machine cred now necessary

– Kerberos credentials for NFS file system access

● Name-to-UID mapping issues

– Needed tools/scripts to flush cached mappings
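A flushing script can be sketched as follows. It assumes the server keeps its NFSv4 name/ID mappings in sunrpc caches named `nfs4.idtoname` and `nfs4.nametoid` under `/proc/net/rpc`, and that writing a time to a cache's `flush` file invalidates entries last refreshed before that time; both the cache names and the directory should be checked on the kernel in use. The `base` parameter is a hypothetical addition so the script can be exercised against a scratch directory.

```python
import os
import time

# Assumed names of the server-side sunrpc caches holding NFSv4
# name<->id mappings; verify them under /proc/net/rpc on your kernel.
IDMAP_CACHES = ("nfs4.idtoname", "nfs4.nametoid")


def flush_idmap_caches(base="/proc/net/rpc", now=None):
    """Ask the kernel to drop cached id mappings.

    Writes a time (seconds since the epoch) to each cache's "flush" file,
    which asks the kernel to discard entries refreshed before that time.
    Caches that are absent (for example, nfsd not loaded) are skipped.
    Returns the list of flush files written.
    """
    stamp = int(time.time()) if now is None else int(now)
    written = []
    for name in IDMAP_CACHES:
        path = os.path.join(base, name, "flush")
        if not os.path.exists(path):
            continue
        with open(path, "w") as f:
            f.write(f"{stamp}\n")
        written.append(path)
    return written
```

Run as root on the NFS server after changing passwd or idmapd data, e.g. `flush_idmap_caches()`; stale entries are then looked up again on next use.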

OSG Configuration

● Daemons (e.g., MonALISA and Condor) on head node and worker nodes require authentication for file system access

– Keytabs

– More name to UID mapping required

● Virtual Organization (VO) accounts

– DN to UNIX account name via grid-mapfile

– Name to UID mappings required for file system access
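The two-step mapping for VO accounts (certificate DN to account name, then account name to UID) can be sketched like this. It assumes the usual grid-mapfile layout of a quoted DN followed by one or more comma-separated account names, with the first taken as the default; `dn_to_uid` and its `lookup_uid` parameter are hypothetical helpers, and the DN used in the example is made up.

```python
import shlex


def parse_grid_mapfile(text):
    """Map certificate DNs to local UNIX account names."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)  # keeps the quoted DN (with spaces) as one field
        if len(fields) < 2:
            continue  # DN with no local account: skip it
        accounts = [a for a in "".join(fields[1:]).split(",") if a]
        if accounts:
            mapping[fields[0]] = accounts[0]  # first account is the default
    return mapping


def _passwd_uid(name):
    import pwd  # Unix-only; resolves through NSS (files, LDAP, ...)
    return pwd.getpwnam(name).pw_uid


def dn_to_uid(dn, gridmap, lookup_uid=_passwd_uid):
    """Grid DN -> account name (grid-mapfile) -> numeric UID.

    Both steps must resolve identically on the head node and on every
    worker node that touches the NFSv4 file system.
    """
    return lookup_uid(gridmap[dn])
```

For example, `parse_grid_mapfile('"/DC=org/DC=example/CN=Jane Doe 123" jdoe,grid01')` yields `{"/DC=org/DC=example/CN=Jane Doe 123": "jdoe"}`.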

Submitting a job!

● Job submission uses X.509 authentication

– Need Kerberos authentication for file-system access

– Gatekeeper forks a job manager process for each job

● Job manager process runs as the original user and needs the user’s credentials

● Verified that this works as expected using AUTH_SYS, without requiring Kerberos credentials

MGRID Architecture

[Diagram: MGRID architecture. Components: User Workstation (Browser, kinit, kx509, libpkcs11); MGRID Portal (Apache with mod_ssl, mod_kx509, mod_kct, mod_jk, mod_php; Tomcat; CHEF); KDC; KCA; KCT; LDAP (authorization); GateKeeper and Resource Mgr on a Grid Resource. Links are labeled Kerberos V5, SSL (client certificate required), GSI, Kerberos, and SASL, with numbered steps 1-8.]

Grid job authentication issues

● Jobs scheduled to run in the future

● Long-running jobs (refreshing credentials)

● Combination of both (future and long-running)

● Distribution of user credentials to worker nodes for file system access
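The timing side of the first three issues can be made concrete with renewable Kerberos tickets. The sketch below is a hypothetical planner, not part of OSG or MIT Kerberos: it assumes a ticket can be renewed before it expires, that each renewal extends it to the smaller of "renewal time + lifetime" and `renew_till`, and it returns the renewal times needed to keep the ticket valid until the job ends.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    end: int         # current expiry time (epoch seconds)
    renew_till: int  # last moment the ticket may be renewed to
    lifetime: int    # lifetime granted by each renewal, in seconds


def renewal_schedule(ticket, job_end, margin=300):
    """Times at which to renew `ticket` so it stays valid until `job_end`.

    Each renewal happens `margin` seconds before the current expiry and
    extends the ticket to min(renewal time + lifetime, renew_till).
    """
    if margin >= ticket.lifetime:
        raise ValueError("margin must be shorter than the ticket lifetime")
    if job_end > ticket.renew_till:
        # Renewal cannot help; fresh credentials must reach the job somehow.
        raise ValueError("job outlives renew_till")
    times = []
    end = ticket.end
    while end < job_end:
        renew_at = end - margin
        times.append(renew_at)
        end = min(renew_at + ticket.lifetime, ticket.renew_till)
    return times
```

With a 10-hour ticket (`end=36000`, `lifetime=36000`) renewable for a week, a job ending at `86400` needs renewals at `35700` and `71400`. A job that is both scheduled far in the future and long-running can exceed `renew_till`, which is where renewal alone stops being enough.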

Authentication differences (AFS vs. NFSv4)

● Credential held by the kernel

– AFS: kernel uses tokens

– NFSv4: kernel uses GSS contexts

● When credentials are obtained

– AFS: kernel assumes tokens were obtained prior to file access (klog)

– NFSv4: kernel requests a GSS context on demand at the time of the (first) file access

● Scope

– AFS: a single token for all file servers in a cell

– NFSv4: a separate service ticket (really a GSS context) needed for each server
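The difference in scope and timing can be illustrated with a toy model. This is not kernel code: `afs_tokens` and `Nfs4GssClient` are hypothetical, and the `negotiate` callback stands in for the upcall to GSSD. AFS needs one token per cell before any access, while the NFSv4 client negotiates one context per server, only when that server is first touched.

```python
def afs_tokens(accesses):
    """AFS: one token per cell, all obtained (klog) before any file access."""
    return {cell for cell, _server in accesses}


class Nfs4GssClient:
    """Toy NFSv4 client keeping one GSS context per server, created lazily."""

    def __init__(self, negotiate):
        self._negotiate = negotiate  # stand-in for the upcall to GSSD
        self._contexts = {}          # server name -> context

    def access(self, server):
        # The first access to a server triggers context negotiation;
        # later accesses reuse the cached context.
        if server not in self._contexts:
            self._contexts[server] = self._negotiate(server)
        return self._contexts[server]
```

For accesses to two file servers `fs1` and `fs2` in one cell, `afs_tokens` returns a single cell, while `Nfs4GssClient` calls `negotiate` twice, once per server.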

Current Architecture

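[Diagram: current architecture. Client side: a user process (user space) and the kernel NFS client with its gss context cache, plus GSSD in user space. Server side: NFSD (kernel) with its gss context cache and SVCGSSD in user space. Other labeled elements: credentials on disk, keytab, and the KDC (AS and TGS). Numbered steps 1-13.]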

Authentication futures

● SPKM3

– Allows us to stay in the X.509 world

– Anonymous (DH)

  ● Certificate on server to prevent man-in-the-middle attacks

– X.509 certificates

● LIPKEY

– Built on top of SPKM3

– Allows TLS-like password authentication

Linux kernel keys support (a.k.a. keyring)

● General credential storage in-kernel

– thread-specific keyring

– process-specific keyring

– session-specific keyring (PAG-like via JOIN_SESSION_KEYRING)

● Different key types: ‘user’, ‘rpcsec_gss context’

● Create, delete, link, search, revoke, etc.

● Quotas and permissions

● Referenced by serial # and description
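The per-thread, per-process, and per-session keyrings are searched in a fixed order. The following is a simplified model, not the keyctl API: `Keyring` and `search` are hypothetical, keys are indexed by (type, description), and nested keyrings, permissions, and quotas are left out. It assumes the thread keyring is consulted first, then the process keyring, then the session keyring.

```python
class Keyring:
    """Toy keyring: keys indexed by (type, description)."""

    def __init__(self, name):
        self.name = name
        self.keys = {}

    def add(self, ktype, description, payload):
        self.keys[(ktype, description)] = payload


def search(ktype, description, thread=None, process=None, session=None):
    """Look for a key in the thread, process, then session keyring.

    Nested keyrings, permissions, and quotas are not modelled.
    """
    for ring in (thread, process, session):
        if ring is not None and (ktype, description) in ring.keys:
            return ring.keys[(ktype, description)]
    return None
```

The PAG-like behavior comes from two processes joining the same session keyring: a key added there is found by both, unless one of them shadows it with a key of the same type and description in its own process keyring.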

MIT Kerberos ccache using keyring as backing storage

● Assumes a single “active” credentials cache

● Can store more than one ccache in the same session keyring

● All user-level code

Session
 |
 +---> krb5_cc_active (key: contains 0x00004f12)
 |
 +---> /tmp/krb5cc_20010_XF45C2 (keyring: id is 0x000023cd)
 |      |
 |      +---> [email protected] (principal info)
 |      +---> krbtgt/[email protected]
 |      +---> nfs/[email protected]
 |      +---> nfs/[email protected]
 |      +---> pop/[email protected]
 |      +---> [email protected]
 |
 +---> /tmp/krb5cc_20010_umich (keyring: id is 0x00004f12)
        |
        +---> [email protected] (principal info)
        +---> krbtgt/[email protected]
        +---> imap/[email protected]
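In the layout above, `krb5_cc_active` holds 0x00004f12, the id of the `/tmp/krb5cc_20010_umich` keyring, so that is the active ccache. A minimal model of that indirection follows; `SessionKeyring` and `active_ccache` are hypothetical names, not MIT Kerberos functions, and only the ccache names and ids from the slide are reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class SessionKeyring:
    active: int                                  # payload of the krb5_cc_active key
    ccaches: dict = field(default_factory=dict)  # ccache name -> keyring id

    def active_ccache(self):
        """Name of the ccache keyring whose id krb5_cc_active holds."""
        for name, ring_id in self.ccaches.items():
            if ring_id == self.active:
                return name
        raise LookupError(f"no ccache keyring with id {self.active:#010x}")
```

Switching between ccaches then only means rewriting the single `krb5_cc_active` payload; the credentials themselves stay where they are.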

Mount using keyring support

● Mount program will use keytab to set up machine credentials in keyring

● /sbin/request-key invoked and finds machine credentials

● Context is negotiated and “rpcsec_gss context” key instantiated
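When the kernel needs a key it does not have, `/sbin/request-key` picks a handler program from `/etc/request-key.conf`, whose lines read `OP TYPE DESCRIPTION CALLOUT-INFO PROGRAM ARG...` and are tried in order. The sketch below approximates that lookup with shell-style wildcards and whole-argument substitution of the `%k` (key id) and `%S` (session keyring) macros; the `rpcsec_gss` type name and the `/usr/sbin/gss-key-handler` program in the example are hypothetical.

```python
import fnmatch


def find_handler(conf_text, op, ktype, description, callout=""):
    """Return the program argv of the first matching request-key.conf line.

    The first four fields are patterns; '*' is treated as a shell-style
    wildcard here, which approximates request-key's own matching.
    """
    wanted = (op, ktype, description, callout)
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # no program to run
        patterns, argv = fields[:4], fields[4:]
        if all(fnmatch.fnmatchcase(v, p) for v, p in zip(wanted, patterns)):
            return argv
    return None


def expand(argv, key_id, session_keyring):
    """Substitute the %k (key id) and %S (session keyring) macros."""
    subs = {"%k": str(key_id), "%S": str(session_keyring)}
    return [subs.get(arg, arg) for arg in argv]
```

For the mount case, a line such as `create rpcsec_gss * * /usr/sbin/gss-key-handler %k %S` would route every context request to one handler, which then finds the machine credentials and instantiates the key.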

User access using keyring support

● Assumes users have credentials in their keyring via kinit or PAM

– No more looking around blindly for credentials in the file system

– /sbin/request-key is invoked and finds the user’s session-specific credentials

Keyring issues

● Upcalls from asynchronous events

● Still need to tie “rpcsec_gss context” keys to Kerberos credentials

Future Architecture

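[Diagram: future architecture. Client side: a user process and a request-key handler in user space; the kernel NFS client keeps its gss context cache in the keyring. Server side: NFSD (kernel) with its gss context cache and SVCGSSD in user space. Other labeled elements: TGT, keytab, and the KDC (AS and TGS). Numbered steps 1-11.]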

Questions / Discussion

http://www.citi.umich.edu/projects

References

● Open Science Grid – http://www.opensciencegrid.org

● MonALISA – http://monalisa.cacr.caltech.edu

● Condor – http://www.cs.wisc.edu/condor

● Keyring – kernel source: /Documentation/keys.txt

Backup Slides

Krb5: Obtaining gss context

● TGT: currently stored in the file system

● Per-NFSD service ticket: currently stored in the file system

● GSSD locates user credentials by convention (/tmp/krb5cc_uid)

● Synchronizing gss_context and credential is problematic
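Locating credentials "by convention" amounts to guessing a file name. The sketch below is in the spirit of the /tmp/krb5cc_uid convention, not a description of GSSD's actual code: the acceptance of a `_<suffix>` form (as in `/tmp/krb5cc_20010_XF45C2`) and the "newest file wins" rule are assumptions, and `find_user_ccache` is a hypothetical helper.

```python
import os
import stat


def find_user_ccache(uid, ccdir="/tmp"):
    """Return the newest credentials cache for `uid` found by naming convention.

    Accepts files named krb5cc_<uid> or krb5cc_<uid>_<suffix> that are regular
    files owned by `uid`; returns None if there are none.
    """
    prefix = f"krb5cc_{uid}"
    best, best_mtime = None, None
    for name in os.listdir(ccdir):
        if name != prefix and not name.startswith(prefix + "_"):
            continue
        path = os.path.join(ccdir, name)
        st = os.lstat(path)  # lstat: do not follow symlinks planted in /tmp
        if not stat.S_ISREG(st.st_mode) or st.st_uid != uid:
            continue
        if best_mtime is None or st.st_mtime > best_mtime:
            best, best_mtime = path, st.st_mtime
    return best
```

The guesswork is the weakness the slide points at: a user with several caches, or a cache stored elsewhere, can leave the kernel's gss context out of step with the credential the user actually intends to use.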

Linux credential interface

● New system calls for kernel credential placement

● Available for upcoming PAG implementation

● Passed via upcall to GSSD

● Credential vs. gss context management no longer a problem

Linux Krb5 kernel credential

● Pass TGT to kernel as credential

● Stored in user process (PAG)

● Passed to GSSD via gss_init_sec_context upcall

● GSSD manages Krb5 NFSD service tickets

● Multiple in-kernel TGTs vs. cross-realm authentication

Client: LIPKEY with SPKM3

● Initiator

– Anonymous SPKM3 client

● Credential

– LIPKEY username and password

– sent to server encrypted in SPKM3 session key

● Context

– per <user, nfsd> LIPKEY(?) and SPKM3 gss context

Linux LIPKEY kernel credential

● LIPKEY credential (username and password) is per server.

● Not stored in kernel

● Instead, store information to be passed to GSSD, which will prompt the user for the LIPKEY password for each NFSD

Client: SPKM with X509

● Initiator

– password for long-term user X.509 private key

● Credential

– short-term proxy X509 credential and private key (grid-proxy-init)

● Context

– per <user, nfsd> SPKM gss context

Linux SPKM kernel credential

● Pass proxy (short-term) X509 credential and private key to kernel as credential

● Stored in user process (PAG)

● Passed to GSSD via gss_init_sec_context upcall

● GSSD manages CA hierarchy and credential checking