Cluster Computing Applications for Bioinformatics
Thurs., Aug. 9, 2007
Introduction to cluster computing
Working with Linux operating systems
Overview of bioinformatics applications
Introduction
Damian Christey
Professional Technologist
Departments of Mathematics and Biology
Cluster Computing
High Availability (HA)
High Performance (HPC)
Specialized software
Highly parallel
Beowulf: commodity hardware, Open Source software
Biology Cluster Hardware
12 nodes, 2 processors per node
Dual core 1GHz Opteron
8 GB RAM each
Gigabit ethernet
2TB RAID storage
GNU/Linux
Free, Open Source, Unix-based operating system
Rocks cluster management system: http://www.rocksclusters.org/
CentOS: http://centos.org/ (derived from Red Hat: http://www.redhat.com/)
Why Linux?
Cheap
Reliable and Scalable
Customizable
Unix philosophy
Text processing
Accessing the Cluster
Monitoring - http://alba.as.wvu.edu/ganglia
Secure Shell: ssh -X [email protected] on Mac OS or Linux
Windows users can download SSH and an X server from: http://cygwin.com/
File transfer (SFTP): http://www.winscp.com/ for Windows, http://cyberduck.ch/ for Mac
qrsh – command to get a shell on a node
Unix Filesystem
Tree with a single root: /
Folders may be physically stored on separate devices or on different machines
/home/bob : Bob’s files
/opt/Bio : Bioinformatics programs
/share/bio : shared data, genome libraries
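To see where a folder physically lives, something like the following sketch may help. It assumes a POSIX shell with df available; where_is is a hypothetical helper name, and the three cluster paths from the slide may not exist on your machine.

```shell
# where_is DIR: report which device or network mount holds DIR.
# (where_is is a hypothetical helper written for this example.)
where_is() {
    if [ -d "$1" ]; then
        # df -P prints a header plus one line for the filesystem holding
        # the directory; that line starts with the device or remote
        # export and ends with the mount point.
        df -P "$1" | tail -n 1
    else
        echo "$1: not present on this machine"
    fi
}

# All of these hang off the single root "/", yet each may be stored
# on a different disk or on a different machine.
for dir in / /home/bob /opt/Bio /share/bio; do
    where_is "$dir"
done
```

On a cluster, shared folders will often show a remote machine in the first column, while / shows a local disk.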
Unix Permissions
3x3 matrix: (owner, group, other) by (read, write, execute)
chgrp biouser file : change the group to which the file belongs
chmod g+w file : give the group write permission to your file
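The two commands above can be tried safely on a scratch file. This is a minimal sketch: results.txt is a made-up file name, and because the biouser group may not exist outside the cluster, the example assumes your own primary group as a stand-in.

```shell
# Work in a scratch directory so nothing real is modified.
workdir=$(mktemp -d)
file="$workdir/results.txt"
echo "example data" > "$file"

# Start from a known state: owner read/write, group and other read-only.
chmod 644 "$file"
before=$(ls -l "$file" | cut -c1-10)

# Assumption: use the numeric ID of your own primary group in place of
# the slide's "biouser" group, which may not exist on this machine.
group=$(id -g)
chgrp "$group" "$file"

# Give the group write permission.
chmod g+w "$file"
after=$(ls -l "$file" | cut -c1-10)

echo "before: $before"   # -rw-r--r--
echo "after:  $after"    # -rw-rw-r--
```

The ten characters read as a file type followed by three read/write/execute triplets for owner, group, and other; only the middle triplet changes.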
Text Processing
cat file : dump the contents of file to standard output
head, tail : output the first / last n lines of file
grep : return lines matching pattern in input or file
grep -v : invert match
| : pipe output of one program to another
> : redirect output to a file (overwriting it)
>> : append output to the end of a file
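These tools combine naturally on sequence files. The sketch below assumes a small FASTA file, seqs.fa, invented for illustration; wc and tr are standard utilities not listed on the slide.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical three-record FASTA file, made up for this example.
cat > seqs.fa <<'EOF'
>seq1 sample A
ACGTACGT
>seq2 sample B
GGGTTTAA
>seq3 sample A
TTTTCCCC
EOF

# head / tail: the first two lines and the last line of the file.
head -n 2 seqs.fa
tail -n 1 seqs.fa

# grep piped into wc: count the header lines (those starting with ">").
n_records=$(grep '^>' seqs.fa | wc -l | tr -d ' ')

# grep -v: keep only the lines that are NOT headers,
# and redirect them into a new file with ">".
grep -v '^>' seqs.fa > bases.txt

# ">" creates or overwrites headers.txt; ">>" then appends to it.
grep 'sample A' seqs.fa > headers.txt
grep 'sample B' seqs.fa >> headers.txt

echo "records: $n_records"
cat headers.txt
```

Note that the pattern '^>' is quoted, so the shell passes the > to grep rather than treating it as a redirect.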
Sequencing and Assembly Software
Phred - reads DNA sequencing trace files, calls bases, and assigns quality values
Phrap - assembles shotgun DNA sequence data
Consed - viewing, editing, and finishing sequence assemblies created with phrap
Artemis - genome viewer and annotation tool
Sequence Analysis and Screening Software
(WU, NCBI, MPI) BLAST - find regions of local similarity between sequences
ClustalW, T-Coffee, MUSCLE - multiple sequence alignment
RepeatMasker - screens for interspersed repeats and low complexity sequences
RepeatScout, PILER - de novo repeat finders
EMBOSS - assorted analysis tools
Phylogenetics Software
PHYLIP, PAUP* - packages for inferring phylogenies (evolutionary trees)
MrBayes - Bayesian inference of phylogeny
Structure - model-based clustering method for inferring population structure