AIR9 Analytics Environment - OpenShift · 2020. 4. 9.

©2019 Discover Financial Services • Confidential and Proprietary • Do Not Copy or Distribute | 1

AIR9 Analytics Environment

Brandon Harris / Anirudh Pathe


The opinions expressed in this presentation are those of the presenters, in their individual capacities, and not necessarily those of Discover.


DISCOVER FINANCIAL SERVICES


Discover is one of the largest direct banks in the United States, offering a broad array of products, including credit cards, personal loans, student loans, deposit products, and home equity loans.

The Discover brand is known for rewards, services, and value.

Across all direct banking products, Discover seeks to help customers meet their financial needs, and achieve brighter financial futures.

Credit Cards
▪ $144Bn Card Sales Volume
▪ $74Bn in Credit Card Receivables

Digital Banking
▪ $52Bn+ Consumer Deposits

▪ $10Bn Private Student Loans
▪ $8Bn Personal Loans


AIR9

What is it?


The Elevator Pitch

AIR9 was built for data scientists who are frustrated by the fragmentation of today’s analytics environments and the difficulty in accessing the latest tools and consistent data sets.

AIR9 provides a mechanism to jump-start analytics work by combining our cloud-scale data warehouse with the freedom of on-demand, scalable, and secure analytics environments.


The AIR9 platform allows for increased agility, decreased long-term costs, and faster time-to-value.

Past Challenges

Fragmented Environment/Restricted Capabilities

Needs of business users for the latest tools and compute capabilities cannot be achieved in our current environment.

▪ “Frequently running out of space”
▪ “Takes many months to deploy a model”
▪ “Inconsistent data between environments”
▪ “Can’t install the packages I need or get access to the latest tools”

Future Opportunities

Faster Access to Tools and Updated Environments

Gone are the days of waiting weeks or longer for the latest versions of tools; launch your own personalized environments in minutes.

Freedom to Innovate

New opportunities to use different technologies, with the ability to evaluate them without a long-term commitment to infrastructure or specific tools.

Collaboration and Centralization

Save and share datasets between different analytics tools (H2O/Python/R/SAS); a single, well-curated website with all documentation and tools displayed together.


AIR9: The Intersection of Code–Data–Compute

▪ Self-service model training environments with choice of hardware and tools
▪ Extract datasets from warehouse and use them interactively
▪ Evaluate model performance as well as model explainability
▪ Model/code promotion workflow and deployment pipeline

Data
▪ Providers
▪ Team Folders
▪ Drop Bucket

Code
▪ GitHub

Compute
▪ Hardware
▪ Software


TECHNICAL DESIGN


March 1st 2019

▪ Scalable AIR9 Application

▪ Operational SQL database

▪ 6 REST API Java services on OCP

▪ Jenkins server with 6 jobs

▪ 10 AWS Lambda functions and 3 AWS Step Machines

▪ 2 AWS SNS topics, EC2 instance hosting Python REST API service

▪ Operational SQL database

“Make everything as simple as possible, but not simpler.”

—Albert Einstein

October 1st 2019


Architecture and Integration


AIR9 Container Lifecycle

▪ Users own and manage their container’s lifecycle.
▪ Auto-termination of dormant containers.
▪ Prometheus and Instana monitor and generate container-usage metrics.

[Figure: container lifecycle diagram. States and actions shown: Submitted, Error, Running, Suspend, Resume, Terminate, Failed, Success.]
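The lifecycle and the auto-termination rule can be pictured as a small state machine. The sketch below is illustrative only: the state names come from the slide, but the exact transition edges, the action names, and the 8-hour idle limit are assumptions, not the AIR9 implementation.

```python
from datetime import datetime, timedelta
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    RUNNING = "running"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"
    ERROR = "error"


# Assumed transition table. The slide names these states and actions,
# but the edges below are a guess made for illustration.
TRANSITIONS = {
    (State.SUBMITTED, "start"): State.RUNNING,
    (State.SUBMITTED, "fail"): State.ERROR,
    (State.RUNNING, "suspend"): State.SUSPENDED,
    (State.RUNNING, "terminate"): State.TERMINATED,
    (State.SUSPENDED, "resume"): State.RUNNING,
    (State.SUSPENDED, "terminate"): State.TERMINATED,
}

# Hypothetical idle threshold; the slide does not state one.
IDLE_LIMIT = timedelta(hours=8)


def apply_action(state, action):
    """Return the next state, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed from {state.name}") from None


def reap_if_dormant(state, last_activity, now, idle_limit=IDLE_LIMIT):
    """Terminate a running or suspended container that has been idle too long."""
    idle_for = now - last_activity
    if state in (State.RUNNING, State.SUSPENDED) and idle_for >= idle_limit:
        return apply_action(state, "terminate")
    return state
```

A periodic job could call `reap_if_dormant` for each user container, using a last-activity timestamp derived from the usage metrics; how AIR9 actually detects dormancy is not described on the slide.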


AIR9 Resource Allocation

AIR9 modeling environments are over-subscribed on each node. Each environment is only guaranteed 10% of the requested CPU and RAM, and allowed to burst up to the request limit.

▪ Workloads are very bursty: short-lived and resource intensive
▪ User base is global and distributed; pods may be idle during regional off-hours
▪ Lower pod request values ensure guarantees closer to the actual node resource utilization

[Figure: a K8S node whose CPU and RAM are shared by SAS, RStudio, and H2O containers, with the remainder free. Each container has a dedicated CPU/RAM share and a burstable CPU/RAM limit.]
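The 10% guarantee maps naturally onto Kubernetes requests and limits: a pod whose requests are lower than its limits is placed in the Burstable QoS class. The sketch below shows the arithmetic only; the function names, the example sizes, and the node capacity are assumptions made for illustration, not AIR9 code.

```python
# From the slide: each environment is guaranteed 10% of what the user asks for.
GUARANTEED_FRACTION = 0.10


def pod_resources(cpu_cores, ram_gib, fraction=GUARANTEED_FRACTION):
    """Build a Kubernetes-style resources stanza for one environment.

    The limit is the full amount the user asked for; the request
    (the guaranteed share) is only a fraction of it.
    """
    return {
        "requests": {
            "cpu": f"{round(cpu_cores * 1000 * fraction)}m",
            "memory": f"{round(ram_gib * 1024 * fraction)}Mi",
        },
        "limits": {
            "cpu": f"{round(cpu_cores * 1000)}m",
            "memory": f"{round(ram_gib * 1024)}Mi",
        },
    }


def oversubscription_ratio(environments, node_cpu_cores):
    """Sum of CPU limits placed on a node divided by the node's CPU capacity."""
    return sum(cpu for cpu, _ in environments) / node_cpu_cores


def fits_on_node(environments, node_cpu_cores, node_ram_gib,
                 fraction=GUARANTEED_FRACTION):
    """Scheduler-style check: only the guaranteed shares must fit on the node."""
    cpu_guaranteed = sum(cpu * fraction for cpu, _ in environments)
    ram_guaranteed = sum(ram * fraction for _, ram in environments)
    return cpu_guaranteed <= node_cpu_cores and ram_guaranteed <= node_ram_gib


# Hypothetical example: ten environments of 8 cores / 64 GiB each
# on a 16-core, 128 GiB node.
envs = [(8, 64)] * 10
print(pod_resources(8, 64))
print(oversubscription_ratio(envs, 16))   # limits add up to 5x the node's CPU
print(fits_on_node(envs, 16, 128))        # guaranteed shares still fit
```

With full-size requests (`fraction=1.0`) the same ten environments would not fit on that node. Lowering the request trades a weaker guarantee for denser packing, which suits bursty workloads that are idle much of the time.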


BUSINESS VALUE


Value to the Analysts/Scientists

▪ Ability to provision environments on demand

▪ Collaboration within multiple tools

▪ Ability to download latest packages

▪ Centralized community

▪ Centralized Help

▪ Solve size limitations

Guiding Principles

Easy UI/UX

Abstract Tech Complexity

Centralize Help

Latest Versions


AIR9 is Driving Innovation

We are seeing techniques being tested and evaluated that had not existed previously.

▪ Distributed machine learning with H2O & Sparkling Water
▪ Neural networks predicting potential incidents within our customer web and mobile journeys
▪ LSTM models to flag consumer calls on certain topics
▪ Analyze trained models for fairness and bias


AIR9 is Driving Collaboration

There is no central analytics community at Discover.

▪ Siloed teams, making collaboration difficult.
▪ Centralizing capabilities onto a single platform allows users to communicate and help one another.
▪ A shared and consistent experience helps isolated teams feel included.


AIR9 Wins

AIR9 has created efficiencies in model training and development as well as driving tangible business results.

▪ Delivering updated tools/environments in hours, not weeks
▪ Archiving live environments for compliance and regulatory needs
▪ Faster model iteration and more diverse AI/ML technologies


User Adoption and Platform Growth

Onboarding users to the AIR9 platform is a dedicated, multi-day event that partners IT teams with our business teams. Over 60% of analytics users have been onboarded to date.

▪ Platform and Tool Training
▪ Identification and tracking of any missing capabilities
▪ Documentation is programmatically updated (via GitHub/Sphinx)

[Figure: User Adoption (% of Analytics Users) by month, January through October; vertical axis from 0 to 0.7.]


APPENDIX
