Managing and Scheduling Data Placement (DaP) Requests

Tevfik Kosar
Computer Sciences Department
University of Wisconsin-Madison
kosart@cs.wisc.edu
http://www.cs.wisc.edu/condor

Outline

› Motivation

› DaP Scheduler

› Case Study: DAGMan

› Conclusions

Demand for Storage

› Applications require access to ever-larger amounts of data
  • Database systems
  • Multimedia applications
  • Scientific applications
    • E.g., High Energy Physics & Computational Genomics
    • Currently terabytes, soon petabytes, of data

Is remote access good enough?

› Huge amounts of data (mostly on tape)
› Large number of users
› Distance / low bandwidth
› Different platforms
› Scalability and efficiency concerns
=> Middleware is required

Two approaches

› Move the job/application to the data
  • Less common
  • Insufficient computational power at the storage site
  • Not efficient
  • Does not scale

› Move the data to the job/application

Move data to the Job

[Diagram: data moves from a huge tape library (terabytes) through a remote staging area, across the WAN to a local storage area (e.g., local disk, NeST server), and across the LAN to the compute cluster.]

Main Issues

› 1. Insufficient local storage area

› 2. The CPU should not sit idle waiting for I/O

› 3. Crash Recovery

› 4. Different Platforms & Protocols

› 5. Make it simple

Data Placement Scheduler (DaPS)

› Intelligently manages and schedules data placement (DaP) activities/jobs

› DaPS is to DaP jobs what Condor is to computational jobs

› Just submit a batch of DaP jobs and relax

DaPS Architecture

[Diagram: DaPS clients send requests to the DaPS server, which accepts them into a queue, schedules them, and executes them against local and remote storage (GridFTP, NeST, SRB, and SRM servers, and local disk), using get and put operations through a buffer or third-party transfers between servers.]
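The Accept, Sched., and Exec. stages in the diagram can be pictured as a queue with a retry loop. The sketch below is a toy model under stated assumptions, not the DaPS server: `DapRequest`, `DapServer`, `max_active`, and the one-retry budget are all hypothetical, and the executor is a stub standing in for the protocol-specific transfer code.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class DapRequest:
    """Hypothetical in-memory form of one DaP job (e.g. a parsed ClassAd)."""
    job_id: int
    job_type: str                      # Reserve, Release, Transfer or Stage
    attrs: Dict[str, str] = field(default_factory=dict)
    retries_left: int = 1              # assumed retry budget for a failed request


class DapServer:
    """Toy model of the Accept -> Sched. -> Exec. stages (hypothetical)."""

    def __init__(self, execute: Callable[[DapRequest], bool], max_active: int = 2):
        self.queue: Deque[DapRequest] = deque()
        self.execute = execute          # stub for the protocol-specific work
        self.max_active = max_active    # hypothetical cap on requests per round
        self.done: List[int] = []
        self.failed: List[int] = []

    def accept(self, request: DapRequest) -> None:
        # Accept: a client request is only queued; nothing runs yet.
        self.queue.append(request)

    def schedule(self) -> List[DapRequest]:
        # Sched.: take up to max_active requests from the head of the queue.
        batch = []
        while self.queue and len(batch) < self.max_active:
            batch.append(self.queue.popleft())
        return batch

    def run_until_empty(self) -> None:
        # Exec.: run each scheduled batch; a failed request goes back to the
        # tail of the queue while it has retries left, otherwise it is failed.
        while self.queue:
            for request in self.schedule():
                if self.execute(request):
                    self.done.append(request.job_id)
                elif request.retries_left > 0:
                    request.retries_left -= 1
                    self.queue.append(request)
                else:
                    self.failed.append(request.job_id)


attempts: Dict[int, int] = {}


def flaky_execute(request: DapRequest) -> bool:
    # Simulated executor: job 2 fails on its first attempt only.
    attempts[request.job_id] = attempts.get(request.job_id, 0) + 1
    return not (request.job_id == 2 and attempts[request.job_id] == 1)


server = DapServer(flaky_execute, max_active=2)
for job_id in range(1, 4):
    server.accept(DapRequest(job_id, "Transfer"))
server.run_until_empty()
```

Requeueing a failed request at the tail, rather than retrying it at once, lets the other queued placements proceed; it is one simple way to approach the crash-recovery issue listed among the main issues.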

DaPS Client Interface

› Command line: dap_submit <submit file>

› API: dapclient_lib.a (library) and dapclient_interface.h (header)
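A client can reach the command-line interface by writing its DaP jobs to a file and invoking `dap_submit` on it. This sketch assumes the submit file simply holds one job per line in the bracketed notation shown on the slides (real ClassAd syntax quotes string values, which the slides omit); the file name `reserve.dap` and the helper functions are hypothetical. The functions exported by `dapclient_lib.a` are not listed here, so the sketch does not attempt the C API.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, List


def to_classad(attrs: Dict[str, str]) -> str:
    """Render one DaP job in the '[ Name = value; ... ]' notation of the slides."""
    body = " ".join(f"{name} = {value};" for name, value in attrs.items())
    return f"[ {body} ]"


def write_submit_file(path: Path, jobs: List[Dict[str, str]]) -> Path:
    # Assumption: a submit file holds the job ClassAds, one per line.
    path.write_text("\n".join(to_classad(job) for job in jobs) + "\n")
    return path


def submit(path: Path, dry_run: bool = True) -> List[str]:
    """Build, and unless dry_run is set, run 'dap_submit <submit file>'."""
    cmd = ["dap_submit", str(path)]
    if not dry_run:
        subprocess.run(cmd, check=True)   # requires dap_submit on the PATH
    return cmd


reserve_job = {"Type": "Reserve", "Server": "nest://turkey.cs.wisc.edu",
               "Size": "100MB", "reservation_no": "1"}

with tempfile.TemporaryDirectory() as tmp:
    submit_file = write_submit_file(Path(tmp) / "reserve.dap", [reserve_job])
    contents = submit_file.read_text()
    command = submit(submit_file)   # dry run: only builds the command line
```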

DaP jobs

› Defined as ClassAds

› Currently four types: Reserve, Release, Transfer, Stage
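Reserve and Release jobs address the first of the main issues, insufficient local storage: space is claimed before data arrives and returned once it is no longer needed. A minimal sketch of that bookkeeping, assuming a single local storage area with sizes in whole megabytes; `LocalStorageLedger` is a hypothetical name, and the accounting actually done by DaPS or a NeST server may differ.

```python
from typing import Dict


class LocalStorageLedger:
    """Hypothetical bookkeeping for Reserve/Release jobs on one storage area."""

    def __init__(self, capacity_mb: int):
        self.capacity_mb = capacity_mb
        self.reservations: Dict[int, int] = {}   # reservation_no -> size in MB

    @property
    def free_mb(self) -> int:
        return self.capacity_mb - sum(self.reservations.values())

    def reserve(self, reservation_no: int, size_mb: int) -> bool:
        # A Reserve succeeds only for a new reservation number whose size fits
        # in the free space, so later Transfer/Stage jobs cannot overflow it.
        if reservation_no in self.reservations or size_mb > self.free_mb:
            return False
        self.reservations[reservation_no] = size_mb
        return True

    def release(self, reservation_no: int) -> None:
        # A Release returns the space; releasing twice is harmless.
        self.reservations.pop(reservation_no, None)


ledger = LocalStorageLedger(capacity_mb=250)
first = ledger.reserve(1, 100)    # fits: 150 MB left
second = ledger.reserve(2, 200)   # does not fit yet, so this job must wait
ledger.release(1)
third = ledger.reserve(2, 200)    # fits once reservation 1 is released
```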

DaP Job ClassAds

[
  Type = Reserve;
  Server = nest://turkey.cs.wisc.edu;
  Size = 100MB;
  reservation_no = 1;
  ...
]

[
  Type = Transfer;
  Src_url = srb://ghidorac.sdsc.edu/kosart.condor/x.dat;
  Dst_url = nest://turkey.cs.wisc.edu/kosart/x.dat;
  reservation_no = 1;
  ...
]
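Because both jobs share `reservation_no = 1`, the Transfer can be checked against the Reserve that covers its destination. The sketch below parses only the simplified notation used on this slide (it is not a general ClassAd parser) and flags Transfer jobs whose destination host has no matching reservation; `parse_ads`, `check_reservations`, and the matching rule are assumptions for illustration.

```python
import re
from typing import Dict, List
from urllib.parse import urlparse

SLIDE_ADS = """
[ Type = Reserve; Server = nest://turkey.cs.wisc.edu; Size = 100MB;
  reservation_no = 1; ... ]
[ Type = Transfer; Src_url = srb://ghidorac.sdsc.edu/kosart.condor/x.dat;
  Dst_url = nest://turkey.cs.wisc.edu/kosart/x.dat; reservation_no = 1; ... ]
"""


def parse_ads(text: str) -> List[Dict[str, str]]:
    """Parse the simplified '[ Name = value; ... ]' notation from the slide.

    Values stay raw strings; entries without '=' (the '...' elisions) are skipped.
    """
    ads = []
    for body in re.findall(r"\[(.*?)\]", text, flags=re.S):
        ad = {}
        for entry in body.split(";"):
            if "=" in entry:
                name, value = entry.split("=", 1)
                ad[name.strip()] = value.strip()
        ads.append(ad)
    return ads


def check_reservations(ads: List[Dict[str, str]]) -> List[str]:
    """Report Transfer jobs whose destination host has no matching Reserve job."""
    reserved = {ad["reservation_no"]: urlparse(ad["Server"]).netloc
                for ad in ads if ad.get("Type") == "Reserve"}
    problems = []
    for ad in ads:
        if ad.get("Type") != "Transfer":
            continue
        host = urlparse(ad["Dst_url"]).netloc
        if reserved.get(ad.get("reservation_no")) != host:
            problems.append(f"no matching reservation for {ad['Dst_url']}")
    return problems


ads = parse_ads(SLIDE_ADS)
```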

Supported Protocols

› Currently supported: FTP, GridFTP, NeST (chirp), SRB (Storage Resource Broker)

› Coming soon: SRM (Storage Resource Manager), GDMP (Grid Data Management Pilot)
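Each URL in a DaP job names its protocol through its scheme, so a handler can be chosen per URL. In this sketch the `nest` and `srb` schemes follow the slides' examples, `gsiftp` is the scheme conventionally used for GridFTP URLs, and the rule that only GridFTP-to-GridFTP transfers go third-party (everything else is staged as a get followed by a put) is a hypothetical policy, not documented DaPS behavior.

```python
from typing import List
from urllib.parse import urlparse

# Assumed scheme -> protocol table for the currently supported protocols.
PROTOCOLS = {"ftp": "FTP", "gsiftp": "GridFTP", "nest": "NeST", "srb": "SRB"}

# Hypothetical policy: protocols used here for server-to-server transfers.
THIRD_PARTY = {"GridFTP"}


def protocol_of(url: str) -> str:
    scheme = urlparse(url).scheme
    if scheme not in PROTOCOLS:
        raise ValueError(f"unsupported protocol: {scheme!r}")
    return PROTOCOLS[scheme]


def plan_transfer(src_url: str, dst_url: str) -> List[str]:
    """Return the steps for one Transfer job (a sketch, not DaPS's logic)."""
    src, dst = protocol_of(src_url), protocol_of(dst_url)
    if src == dst and src in THIRD_PARTY:
        return [f"third-party {src}: {src_url} -> {dst_url}"]
    # Otherwise go through a local buffer: get with one protocol, put with the other.
    return [f"get via {src}: {src_url}", f"put via {dst}: {dst_url}"]


steps = plan_transfer("srb://ghidorac.sdsc.edu/kosart.condor/x.dat",
                      "nest://turkey.cs.wisc.edu/kosart/x.dat")
```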

Case Study: DAGMan

[Diagram: DAGMan reads a .dag file describing jobs A, B, C, and D and submits them to the Condor job queue.]

Current DAG structure

› All jobs are assumed to be computational jobs

[Diagram: Job A is the parent of Jobs B and C, which are both parents of Job D.]
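DAGMan submits a node only after all of its parents have completed; in a .dag file this diamond is declared with lines such as `PARENT A CHILD B C`. The sketch groups nodes into waves that could run together, as a simplified model that ignores failures, retries, and throttling; `ready_waves` is a hypothetical helper.

```python
from typing import Dict, List


def ready_waves(parents: Dict[str, List[str]]) -> List[List[str]]:
    """Group DAG nodes into waves; each node's parents finish in earlier waves."""
    remaining = {node: set(ps) for node, ps in parents.items()}
    finished: set = set()
    waves = []
    while remaining:
        # A node is ready once every one of its parents has finished.
        wave = sorted(n for n, ps in remaining.items() if ps <= finished)
        if not wave:
            raise ValueError("cycle detected")
        waves.append(wave)
        finished.update(wave)
        for n in wave:
            del remaining[n]
    return waves


# The diamond from the slide: A -> {B, C} -> D.
diamond = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
waves = ready_waves(diamond)
```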

Current DAG structure

› If data transfer to/from remote sites is required, this is performed via pre- and post-scripts attached to each job.

[Diagram: the same DAG, with a PRE script and a POST script attached to Job B.]
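With this approach the transfers are invisible to the scheduler: they happen inside scripts that bracket the job. The sketch is a simplified model of one node (a failed PRE skips the job; when a POST script exists, its result decides the node's outcome); `run_node` and its callables are hypothetical, and DAGMan's exact rules cover more cases than shown.

```python
from typing import Callable, List, Optional

Step = Callable[[], bool]   # returns True on success


def run_node(job: Step, pre: Optional[Step] = None, post: Optional[Step] = None,
             log: Optional[List[str]] = None) -> bool:
    """Simplified model of one DAG node with PRE/POST scripts.

    Any staging done inside pre/post is opaque here: the node only
    reports success or failure.
    """
    log = log if log is not None else []
    if pre is not None:
        log.append("pre")
        if not pre():
            return False        # failed PRE: the job itself never runs
    log.append("job")
    ok = job()
    if post is not None:
        log.append("post")
        ok = post()             # POST result decides the node's outcome
    return ok
```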

New DAG structure

› Add DaP jobs to the DAG structure

[Diagram: instead of PRE and POST scripts around Job B, the DAG contains DaP jobs for Reserve in & out, Transfer in, Transfer out, Release in, and Release out around Job B.]
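Written as explicit dependencies, Job B's data handling becomes ordinary DAG nodes that DaPS can schedule on its own. The parent relationships below are one plausible reading of the slide, not a documented ordering: reserve space, bring the input in, run the job, then release the input space and ship the output, releasing the output space last. The sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to produce a valid execution order; `expand_with_dap` is a hypothetical helper.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set


def expand_with_dap(job: str) -> Dict[str, Set[str]]:
    """Return node -> parents for one compute job wrapped in DaP jobs (assumed order)."""
    return {
        "Reserve in & out": set(),
        "Transfer in": {"Reserve in & out"},
        job: {"Transfer in"},
        "Release in": {job},          # input space is no longer needed
        "Transfer out": {job},
        "Release out": {"Transfer out"},
    }


graph = expand_with_dap("Job B")
# TopologicalSorter takes a mapping of node -> predecessors.
order = list(TopologicalSorter(graph).static_order())
position = {node: i for i, node in enumerate(order)}
```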

New DAGMan Architecture

[Diagram: DAGMan reads the .dag file and submits computational jobs (A, B, C, D) to the Condor job queue and DaP jobs (X, Y) to the DaPS job queue.]
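The split in this diagram amounts to routing each ready node by its type. A hypothetical sketch, assuming each node is tagged either `Compute` or with one of the four DaP job types; `Node` and `route` are illustrative names, not DAGMan's.

```python
from dataclasses import dataclass
from typing import Dict, List

DAP_TYPES = {"Reserve", "Release", "Transfer", "Stage"}


@dataclass
class Node:
    name: str
    kind: str   # "Compute" for an ordinary Condor job, otherwise a DaP job type


def route(ready: List[Node]) -> Dict[str, List[str]]:
    """Hand each ready DAG node to the queue that can run it (hypothetical)."""
    queues: Dict[str, List[str]] = {"condor": [], "daps": []}
    for node in ready:
        if node.kind == "Compute":
            queues["condor"].append(node.name)
        elif node.kind in DAP_TYPES:
            queues["daps"].append(node.name)
        else:
            raise ValueError(f"unknown node kind: {node.kind!r}")
    return queues


queues = route([Node("A", "Compute"), Node("X", "Transfer"),
                Node("C", "Compute"), Node("Y", "Reserve")])
```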

Conclusion

› More intelligent management of remote data transfer and staging to
  • increase local storage utilization
  • maximize CPU throughput

Future Work

› Enhanced interaction with DAGMan

› Data-level management instead of file-level management

› Possible integration with Kangaroo to keep the network pipeline full

Thank You for Listening & Questions

› For more information, drop by my office anytime
  • Room: 3361, Computer Science & Stats. Bldg.
  • Email: kosart@cs.wisc.edu