Open Science Grid Project DASH: Securing Direct MySQL Database Access for the Grid D. Malon, E. May, D. Ratnikov, A. Vaniachine Argonne National Laboratory M. Vranicar, J. Weicher PIOCON Technologies XV International Conference on Computing in High Energy and Nuclear Physics T.I.F.R., Mumbai, India February 13-17, 2006




CHEP06 Mumbai India Alexandre Vaniachine (ANL)


Databases and Grids

• Databases also play a critical role in grid middleware: file catalogues, monitoring, etc.

• Crosscutting the computational grid infrastructure, a database hyper-infrastructure emerges

• In addition to petabytes of file-based event data, high energy physics applications require access to non-event data (detector conditions, calibrations, etc.) stored in relational databases

[Figure: World-Wide Federation of Computational Grids and the Large Scale Distributed Computations Management System. Diagram labels: Workload Orchestration; File Transport; OSG, WLCG, NorduGrid; ATLAS Sites, CMS Sites, Non-LHC Sites; Cluster with Head Node, Edge Services, and Worker Nodes. Databases shown: Production DB, Monitoring DB, RFT Database, PanDA DB, Conditions DB, Meta-data DB, RLS Database.]


Project DASH

• As grid computing technologies mature, development must focus on database and grid integration

• New technologies are required to bridge the gap between data accessibility and the increasing power of grid computing used for distributed event production and processing

• The Database Access for Secure Hyperinfrastructure (DASH) project is funded by the DOE Small Business Innovation Research (SBIR) Program to build and test secure high-performance database access technology for distributed computing

www.piocon.com/DASH.php

A project of PIOCON Technologies, Inc and Argonne National Laboratory


Database Access on the Grid

Two different architectures:

• A separate middleware server does the grid authorization:

– OGSA-DAI: SOAP/XML + XML binary extensions

– Spitfire (EDG WP2): SOAP/XML text-only data transport

– Perl DBI database proxy (ALICE): SQL data transport

– Oracle 10g (separate authorization layer)

• Grid middleware is integrated in the database server process:

– Instead of surrounding the database with external secure middleware layers, the safety features are embedded inside of the code

– By pushing secure authorization into the database engine, the inefficient data transfer bottlenecks are eliminated


Embedded Security Approach

• The embedded security approach is listed among the top ten innovations in security by the panel of experts convened by Battelle:

– “The Global Cyber Net: Communications and information are the lifeblood of security. Today we enjoy a worldwide web, which is open but unsecured. In the future, we will have a global cyber net that is faster and better protected than today… Software will contain embedded safety features inside of the code rather than just surrounding it.”

http://www.battelle.org/forecasts/defense.stm


End-to-End Secure Transport

• DASH technology bridges the gap between data accessibility and the increasing power of grid computing

• To overcome database access inefficiencies inherent in a traditional middleware approach, the DASH project implements secure authorization on the transport level

• Pushing the grid authorization into the database engine eliminates the middleware message-level security layer and delivers transport-level efficiency of SSL/TLS protocols for grid applications

• The DASH proof-of-concept prototype provides Globus grid proxy certificate authorization technologies for MySQL database access control

• DASH technology brings database access efficiencies similar to the https advantages introduced in the Globus Toolkit 4.0

• The database architecture with embedded grid authorization provides a foundation for secure end-to-end data processing solutions for the grids
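The authorization step described on this slide can be sketched as follows. This is a minimal illustration and not DASH code: it assumes the server has already verified the client's proxy certificate chain during the SSL/TLS handshake, and that authorization then reduces to mapping the certificate owner's subject to a database account. The function names and the mapping table are hypothetical. The naming convention is the Globus one: a legacy proxy appends CN=proxy or CN=limited proxy to the owner's subject, and an RFC 3820 proxy appends a CN with a unique value, assumed here to be numeric.

```python
import re
from typing import Dict, Optional

# Trailing subject components that mark a proxy certificate: legacy
# Globus proxies ("proxy", "limited proxy") or an RFC 3820 proxy,
# whose CN is ASSUMED here to be purely numeric. A real user whose
# own CN is numeric would be mishandled by this simplification.
_PROXY_CN = re.compile(r"/CN=(?:proxy|limited proxy|\d+)$")


def end_entity_subject(subject: str) -> str:
    """Strip trailing proxy CN components to recover the owner's subject."""
    while _PROXY_CN.search(subject):
        subject = _PROXY_CN.sub("", subject)
    return subject


def authorize(subject: str, accounts: Dict[str, str]) -> Optional[str]:
    """Return the database account mapped to this subject, or None."""
    return accounts.get(end_entity_subject(subject))


# Hypothetical mapping from certificate subjects to MySQL accounts.
ACCOUNTS = {"/DC=org/DC=example/OU=People/CN=Jane Doe": "atlas_reader"}
```

A connection presenting the subject "/DC=org/DC=example/OU=People/CN=Jane Doe/CN=proxy" would be mapped to the atlas_reader account, while an unknown subject yields None and the connection would be refused.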


Aspect-Oriented Programming

• To avoid a brittle, monolithic system DASH uses an aspect-oriented programming approach

• By localizing Globus security concerns in a software aspect, DASH achieves a clean separation of Globus Grid Security Infrastructure dependencies from the MySQL server code

• During the database server build, the AspectC++ tool automatically generates the transport-level code to support a grid security infrastructure

• www.aspectc.org
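As a rough analogy only, the separation of concerns described on this slide can be pictured with a Python decorator. AspectC++ weaves advice into C++ sources at build time, whereas the decorator below wraps a function at run time, so this shows the idea rather than the mechanism. All names are hypothetical; none come from the DASH or MySQL sources.

```python
import functools


class AuthorizationError(Exception):
    """Raised when the security concern rejects a connection."""


def grid_security(check):
    """Wrap a handler so that `check` runs first, like 'before' advice."""
    def weave(handler):
        @functools.wraps(handler)
        def advised(connection, *args, **kwargs):
            if not check(connection):
                raise AuthorizationError("grid authorization failed")
            return handler(connection, *args, **kwargs)
        return advised
    return weave


def has_valid_proxy(connection):
    # Stand-in for the real certificate checks (hypothetical field name).
    return connection.get("proxy_verified", False)


@grid_security(has_valid_proxy)
def handle_query(connection, sql):
    # The core handler contains no grid security code of its own.
    return f"executing: {sql}"
```

The point of the design is that handle_query can be maintained without any knowledge of the security check, and the check can be changed without touching the handler.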


Automatic Code Generation

[Figure: automatic code generation. Diagram labels: source files cbk.c, grid.ah, vio.c, tls.c; inputs Globus GSI code, OpenSSL Transport Level Security code, MySQL database server code, DASH grid security aspects code; output Auto-generated grid-enabled MySQL database server code.]


AOP is the Next ‘Big Thing’

A 2001 paper on Aspect Oriented Programming is among the Top 10 Downloads from ACM’s Digital Library

• Paper by our collaborators from Illinois Institute of Technology

ATLAS experience with AOP was first reported at the previous conference, CHEP04


Testing New Functionalities

• Prototype servers built with DASH technology are being tested at ANL, BNL, CERN and U Geneva

• We thank:

– Jason Smith (BNL)

– Yuri Smirnov (BNL)

– Frederik Orellana (U Geneva)

Among the new functionalities are:

• Check for the proxy expiration time

• Host name checking (to reject impersonation)
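The two checks listed on this slide might look roughly like the sketch below. It is an assumption-laden illustration, not the DASH implementation: the function names are invented, the expiry time is taken in the textual format that certificate tools print for notAfter, and the host name comparison is a plain case-insensitive match without wildcard support.

```python
import ssl
import time


def proxy_seconds_left(not_after, now=None):
    """Seconds until the proxy expires.

    `not_after` uses the certificate time format,
    for example "Jan  5 09:34:43 2018 GMT".
    """
    if now is None:
        now = time.time()
    return ssl.cert_time_to_seconds(not_after) - now


def proxy_is_current(not_after, now=None):
    """True while the proxy has not yet expired."""
    return proxy_seconds_left(not_after, now) > 0


def host_matches(certificate_hosts, peer_host):
    """Case-insensitive exact match of the peer against certificate names."""
    peer = peer_host.rstrip(".").lower()
    return any(peer == name.rstrip(".").lower() for name in certificate_hosts)
```

A server would refuse the connection when proxy_is_current returns False, or when host_matches finds no certificate name equal to the host the peer claims to be.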


Packaging Challenge

• Initial response from our beta-testers suggested that, because of the dependencies on the Globus GSI libraries, the preferred distribution would be the static build

• However, tests showed that static builds work best on platforms (Linux distributions) very close to that of the build machine

• We experienced unexpected sensitivities to minor variations in the glibc library version

• We are now addressing that issue by developing a dynamic build that will have the static Globus GSI and OpenSSL libraries built in


Scalability Challenge

• The chaotic nature of opportunistic grid computations results in variations in daily production rates

• Database services capacities should be adequate for peak demand

• Large-scale world-wide distributed simulations performed by the ATLAS Collaboration show steady progress in grid computing

[Figure: ATLAS production rate in jobs/day (vertical axis 0 to 14000) by month, July through May, for LCG/CondorG, LCG/Original, NorduGrid and Grid3. Marked periods: Data Challenge 2 (long jobs period), Data Challenge 2 (short jobs period), Rome Production (mix of jobs).]


Why Dynamic Deployment?

• The high level of sharing of computational resources achieved on grids results in increased fluctuations in demand for database services, because of the chaotic nature of shared resource availability

• Static services deployment requires over-capacity

• Opportunistic production on non-LCG sites requires database services deployment on-demand

• To provide on-demand database services capability for Open Science Grid, the Edge Services Framework activity builds the DASH mysql-gsi database server into the virtual machine image, which is dynamically deployed via Globus Virtual Workspaces
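The over-capacity argument can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the required database capacity scales linearly with the number of jobs per day; the daily job counts are invented and are not read from the production plot.

```python
def static_overcapacity(daily_jobs):
    """Average unused fraction of a static deployment sized for the peak."""
    peak = max(daily_jobs)
    mean = sum(daily_jobs) / len(daily_jobs)
    return 1 - mean / peak


# Hypothetical daily job counts, chosen only to show strong fluctuations.
SAMPLE_JOBS_PER_DAY = [2000, 4000, 12000, 6000, 1000]
```

With these invented numbers the mean is 5000 jobs/day against a peak of 12000, so a deployment sized for the peak would leave about 58% of its capacity unused on average. On-demand deployment aims to avoid paying for that gap.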


Edge Services

• Services executing on the edge of the public and private network

[Figure: a site with edge services for CDF, CMS, ATLAS and a Guest VO, alongside the SE and CE, in front of the compute nodes and storage nodes.]

• See CHEP06 contribution id # 214

http://indico.cern.ch/contributionDisplay.py?contribId=214&sessionId=7&confId=048


Synergistic Collaboration

CMS & ATLAS collaborate in OSG ESF Activity

http://www.opensciencegrid.org/esf

To achieve the ESF proof-of-concept milestone:

• The first ESF VM was deployed by CMS

• The first ESF service on that VM was by ATLAS:

– Grid-enabled MySQL database built by the DASH project

• To access the server, the grid job used a proxy certificate (instead of the clear-text passwords hardwired in the scripts that are distributed world-wide)
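One practical consequence is that a job script no longer needs to carry a credential at all; it only has to find the proxy file that the grid middleware placed on the worker node. The sketch below follows the usual Globus convention (the X509_USER_PROXY environment variable, falling back to /tmp/x509up_u followed by the numeric user id). How the DASH client actually locates the proxy is not stated in these slides, and the function name is hypothetical.

```python
import os


def find_proxy_path(environ=None, uid=None):
    """Locate the user's grid proxy file following the Globus convention."""
    if environ is None:
        environ = os.environ
    explicit = environ.get("X509_USER_PROXY")
    if explicit:
        return explicit
    if uid is None:
        uid = os.getuid()  # available on POSIX systems only
    return f"/tmp/x509up_u{uid}"
```

The returned path would then be handed to the database client as its certificate and key file, so nothing secret needs to be written into the distributed scripts.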


Collaboration Benefits

Celebrating ESF proof-of-concept milestone at Supercomputing 2005


Globus Folder at SC05

http://www.globus.org/alliance/events/sc05


Complementary Project

• A new collaborative project with the Globus team has just started at Argonne:

– to grid-enable the PostgreSQL database

• Both DASH and the new project target technology integration with OGSA-DAI

• Please contact us if you are interested in contributing to these projects


OGSA-DAI Complementarity

• Neil P Chue Hong, OGSA-DAI Status Summary, Third OGSA-DAI Users Group Meeting, 6/1/2005

• Through our continued interactions with the OGSA-DAI team we have established working relationships to achieve technological compatibility


Additional Benefits

• Direct access to database servers unleashes a broad range of vendor-specific server capabilities for data processing applications: distributed XA transactions, binary data transport, etc.

• Grid proxy certificate technology opens technical opportunities to enable fine-grained delegation of rights for access control (attribute certificates)

• Grid-enabled relational database server technology has the potential for application beyond the domain of high energy physics, and is of interest to bioinformatics and other data-intensive sciences


DASH Technologies

• AspectC++: http://www.aspectc.org

• Globus: http://www.globus.org

• MySQL: http://www.mysql.org

DASH Collaborators and Early Adopters

• Open Science Grid Edge Services Framework: http://www.opensciencegrid.org/esf

• ATLAS Distributed Database Services: http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/services

• Illinois Institute of Technology (IIT) Concurrent Programming Research Group: http://www.iit.edu/~concur

DASH Outreach: Presentations at the Conferences and Workshops

• Supercomputing 2005, November 12-18, 2005, Washington State Convention and Trade Center, Seattle, Washington, USA: http://osg-docdb.opensciencegrid.org/cgi-bin/ShowDocument?docid=307

• First DIALOGUE Workshop: Applications-Driven Issues in Data Grids, August 1-2, 2005, The Ohio State University, Columbus, Ohio, USA: http://www.datagrids.org/ws/docs/High-performanceDatabaseAccess.ppt