36
Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

Condor Project Computer Sciences Department University of Wisconsin-Madison

Running Map-Reduce Under Condor

Page 2: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Cast of thousands

› Mihai Pop › Michael Schatz ›  Dan Sommer

h University of Maryland Center for Computational Biology

›  Faisal Khan, Ken Hahn UW ›  David Schwartz, LMCG

Page 3: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

In 2003…

http://labs.google.com/papers/gfs.html

http://labs.google.com/papers/mapreduce.html

Page 4: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Page 5: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Page 6: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Shortly thereafter…

Page 7: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Two main Hadoop parts

Page 8: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

For more detail

CondorWeek 2009 talk Dhruba Borthakur

http://www.cs.wisc.edu/condor/CondorWeek2009/condor_presentations/borthakur-hadoop_univ_research.ppt

Page 9: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Page 10: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

HDFS overview

› Making POSIX distributed file system go fast is easy…

Page 11: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

HDFS overview

›  …If you get rid of the POSIX part ›  Remove

h Random access h Support for small files h authentication h In-kernel support

Page 12: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

HDFS Overview

›  Add in h Data replication

•  (key for distributed systems) h Command line utilities

Page 13: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

HDFS Architecture

Page 14: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

HDFS Condor Integration

› HDFS Daemons run under master h Management/control

›  Added HAD support for namenode

›  Added host based security

Page 15: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Condor HDFS: II

File transfer support

transfer_input_files = hfds://…

Spool in hdfs

Page 16: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Map Reduce

Page 17: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Shell hackers map reduce

›  grep tag input | sort | uniq –c | grep

Page 18: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

MapReduce lingo for the native Condor speaker

›  Task tracker startd/starter

›  Job tracker condor_schedd

Page 19: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Map Reduce under Condor

›  Zeroth law of software engineering

›  Job tracker/task tracker must be managed! h Otherwise very bad things happen

Page 20: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Hadoop on Demand w/Condor

Page 21: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Map Reduce as overlay

›  Parallel Universe job ›  Starts job tracker on rank 0 ›  Task trackers everywhere else › Open Question:

h Run more small jobs, or fewer bigger › One job tracker per user (i.e. per job)

Page 22: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

On to real science…

›  David Schwartz, matchmaker

Mihai Pop

Page 23: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Contrail – MR genome assembly

http://sourceforge.net/apps/mediawiki/contrail-bio/index.php

Page 24: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Genome assembly

Page 25: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

DNA

3 Billion base pairs

Sequencing machines only read small reads at a time

Page 26: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Already done this?

Page 27: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

High throughput sequencers

Page 28: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Contrail Scalable Genome Assembly with MapReduce ›  Genome: African male NA18507 (Bentley et al., 2008) ›  Input: 3.5B 36bp reads, 210bp insert (SRA000271) ›  Preprocessor: Quality-Aware Error Correction

.

Cloud Surfing Error Correction Compressed Initial

N Max N50

>10B 27 27

>1 B 303 bp

< 100 bp

5.0 M 14,007 650 bp

4.2 M 20,594 923 bp

In Progress

Resolve Repeats

Page 29: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Running it under Condor

› Used CHTC B-240 cluster

›  ~100 machines h 8 way nehalem cpu h 12 Gb total h 1 disk partition dedicated to HDFS h HDFS running under condor master

Page 30: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Running it on Condor

› Used the MapReduce PU overlay ›  Started with Fruit Flies ›  … ›  And it crashed ›  Zeroth law of software engineering

h Version mismatch ›  Debugging…

Page 31: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Debugging

›  After a couple of debugging rounds

›  Fruit Fly sequenced!!

h On to humans!

Page 32: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Cardinality

› How many slots per task tracker? h Task tracker, like schedd multi-slots

› One machine h 8 cores h 1 disk h 1 memory system

› How many mappers per slot

Page 33: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

More MR under Condor

› More debugging, NPEs › Updated MR again ›  Some performance regressions › One power outage

›  12 weeks later…

Page 34: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Success!

Page 35: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Page 36: Running Map-Reduce Under Condor · 2012-11-01 · Condor Project Computer Sciences Department University of Wisconsin-Madison Running Map-Reduce Under Condor

www.cs.wisc.edu/Condor

Conclusions

›  Job trackers must be managed! h Glide-in is more than Condor on batch

› Hadoop – more than just MapReduce

› HDFS – good partner for Condor ›  All this stuff is moving fast