
Hadoop service changes - CERN Indico


Agenda

• Introductions

• SWAN and NXCals - Current state-of-the-art

• Automatic Deployment of NXCals software to CVMFS / SWAN

• Authentication to data-access-api

• Proposal to reorganize hardware for NXCals cluster

• Discussion on hadoop software update

• Other business
  • Adding new users
  • Support channels for Hadoop Service and SWAN Service
  • Spark Summit EU 2018

SWAN - Integrating Services


Compute Storage

Software

Isolation | local compute

EP-SFT: LCG_releases

IT-ST-FDO: CVMFS Service

IT-ST-FDO: EOS Service

IT-DB-SAS: Hadoop Service

[1] SWAN team consists of members from EP-SFT, IT-DB and IT-ST groups


SWAN Interface

http://lcginfo.cern.ch

SWAN – Architecture

[Architecture diagram. Labels: User 1, User 2, ... User n; SSO; Web portal; Container Scheduler; Spark Driver; CERN Resources: EOS (Data), CERNBox (User Files), CVMFS (Software); IT Hadoop and Spark clusters: AppMaster, Spark Worker, Python tasks]


SWAN, Spark and NXCals – Current state-of-the-art

• SWAN is fully integrated with IT Hadoop and Spark clusters
  • Publishing Hadoop (& Spark) software and cluster configuration to CVMFS
  • Possible to launch Spark executors on the cluster from a SWAN notebook
  • SparkMonitor for live monitoring and troubleshooting of Spark applications
  • Authentication, encryption, etc.

• NXCals & SWAN
  • Special configuration bundle for NXCALS & NXCals_TESTBED
  • Certificate stores to access data-access-api over SSL
  • Manual deployment of the software
    • Notification by email whenever a new release is available
    • EP-SFT software librarians deploy the new version to the dev3 release
    • SWAN service amends the configuration bundle
  • Frequent releases of NXCals software were not foreseen

Automating the deployment of NXCals software

Approach 1

• Provide a way to programmatically query NXCals version (Responsibility: NXCals team)

• Assert that the new version is working and compatible with infrastructure components (SWAN, Hadoop cluster, etc.) (Responsibility: Hadoop Service and SWAN Service)
  • This is key to avoid deploying broken / incompatible software

• Deploy the software (Responsibility: EP-SFT)

• Make Spark Configuration agnostic to NXCals software versions (Responsibility: Hadoop Service and SWAN Service)

• Prerequisite: the latest software is deployed to dev3 releases; this only works if dev3 is stable
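The first two steps of Approach 1 (query the version, then assert compatibility before deploying) could be sketched roughly as below. This is a minimal illustration, not the actual tooling: the function names `parse_version`, `newest_version` and `should_deploy` are hypothetical, and the compatibility checks are stand-ins for whatever tests the Hadoop and SWAN services would run. It assumes NXCals versions are dotted integers such as 0.1.108.

```python
from typing import Callable, Iterable, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.1.108' into (0, 1, 108) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_version(versions: Iterable[str]) -> str:
    """Pick the newest version; plain string comparison would rank 0.1.99 above 0.1.108."""
    return max(versions, key=parse_version)


def should_deploy(
    candidate: str,
    deployed: str,
    checks: Iterable[Callable[[str], bool]],
) -> bool:
    """Deploy only if the candidate is newer AND every compatibility check passes.

    `checks` is a hypothetical list of callables (e.g. a smoke test against
    the Hadoop cluster, a SWAN notebook test); each returns True on success.
    """
    if parse_version(candidate) <= parse_version(deployed):
        return False
    return all(check(candidate) for check in checks)
```

Keeping the checks as plain callables means each service can contribute its own test without the deployment step needing to know the details.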


Automating the deployment of NXCals software

Approach 2

• Download latest NXCals version from artifactory.cern.ch while setting up Spark Session (can be abstracted in our SparkConnector)

# maven repositories

--conf spark.jars.repositories=http://artifactory.cern.ch/ds-release-local,http://artifactory.cern.ch/beco-release-local,http://artifactory.cern.ch/ds-hortonworks-cache \

# release

--conf spark.jars.packages=cern.nxcals:nxcals-data-access:0.1.108 \

# python library

spark.addPyFile("http://artifactory.cern.ch/ds-release-local/cern/nxcals/nxcals-data-access-python3/0.1.108/nxcals_data_access_python3-0.1.108-py3.6.egg")

• Additional time required to start spark application for the first time

• Prerequisite:
  • Provide a way to programmatically query NXCals version (Responsibility: NXCals team)
  • Publish egg distribution on artifactory
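To make the settings above independent of a hard-coded release, the version could be passed in as a parameter. The sketch below only builds the option values and the egg URL from a version string; the helper names `nxcals_spark_options` and `nxcals_egg_url` are hypothetical, the `py3.6` tag is taken from the example above and assumed fixed, and how this would be wired into the SparkConnector is left open.

```python
ARTIFACTORY = "http://artifactory.cern.ch"

# Maven repositories listed on the slide
REPOSITORIES = [
    f"{ARTIFACTORY}/ds-release-local",
    f"{ARTIFACTORY}/beco-release-local",
    f"{ARTIFACTORY}/ds-hortonworks-cache",
]


def nxcals_spark_options(version: str) -> dict:
    """Return the Spark configuration entries for a given NXCals release."""
    return {
        "spark.jars.repositories": ",".join(REPOSITORIES),
        "spark.jars.packages": f"cern.nxcals:nxcals-data-access:{version}",
    }


def nxcals_egg_url(version: str, py_tag: str = "py3.6") -> str:
    """Return the artifactory URL of the Python egg for a given release."""
    return (
        f"{ARTIFACTORY}/ds-release-local/cern/nxcals/"
        f"nxcals-data-access-python3/{version}/"
        f"nxcals_data_access_python3-{version}-{py_tag}.egg"
    )
```

The resulting dictionary entries could then be applied with `SparkConf.set(key, value)` before the session is created, and the egg URL handed to `SparkContext.addPyFile` afterwards.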


Automating the deployment of NXCals software

Approach 3

• Publishing software to EOS? (e.g. /eos/project/s/swan/public/)


• Authentication to data-access-api
  • We are looking to eliminate manual generation of Kerberos tickets in favor of automatically generating user credentials for services (e.g. Hadoop, Spark) using proxy users. How should we proceed with authentication to data-access-api?

• Proposal to reorganize hardware for NXCals cluster

• Discussion on hadoop software update

• Other business
  • Adding new users
  • Support channels for Hadoop Service and SWAN Service
  • Spark Summit EU 2018

SWAN support channels

• Support ticket via SNOW (FE: SWAN)
• General feedback welcome to:
  • [email protected]
  • [email protected]

Hadoop and Spark support channels

• Support ticket via SNOW (FE: Hadoop and Spark support)
• General feedback welcome to:
  • [email protected]