
Unix Essentials - Massachusetts Institute of Technology
barc.wi.mit.edu/education/hot_topics/unix_essentials/Unix_Essentials_Oct2014.pdf
Oct 10, 2012


Unix Essentials

Bingbing Yuan

Next Hot Topics: Unix – Beyond Basics (Mon Oct 20th at 1pm)

Objectives

• Unix Overview

• Whitehead Resources

• Unix Commands

• BaRC Resources

• LSF


Objectives: Hands-on

• Parsing Human Body Index (HBI) array data

Goal: Process a large data file to extract important information: find genes of interest, sort expression values, and subset the data for further investigation.

Advantages of Unix

• Processing files with thousands, or millions, of lines

How many reads are in my fastq file?

Sort by gene name or expression values

• Many programs run on Unix only

Command-line tools

• Automate repetitive tasks or commands

Scripting

• Other software, such as Excel, is not able to handle large files efficiently

• Open Source
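The question "How many reads are in my fastq file?" can be answered in one line. The following is a minimal sketch, assuming every read occupies exactly four lines (no wrapped sequences); the file name reads.fq and its contents are made up for the illustration.

```shell
# Build a tiny example FASTQ file (hypothetical data) in a temporary directory.
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n@read3\nGGCC\n+\nIIII\n' > "$workdir/reads.fq"

# Count the lines, then divide by 4 because each read is assumed to span 4 lines.
n_lines=$(wc -l < "$workdir/reads.fq")
n_reads=$((n_lines / 4))
echo "reads: $n_reads"
```

The same two commands work unchanged on a file with millions of reads, which is where a spreadsheet would struggle.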


Scientific computing resources


Shared packages/programs

https://tak.wi.mit.edu

Installed packages/programs

Request new packages/programs

Login

• Requesting a tak account

http://iona.wi.mit.edu/bio/software/unix/bioinfoaccount.php

• Windows

PuTTY or Cygwin

Xming: set up X-windows for graphical display

• Macs

Access through Terminal


Connecting to tak for Windows

Command prompt: user@tak ~$

Log in to tak for Mac

ssh -Y [email protected]

Unix Commands

• General syntax: command options arguments

Command

Options or switches (zero or more)

Arguments (zero or more)

Example: uniq -c myFile.txt

Options can be combined

ls -l -a or ls -la

• Manual (man) page

man uniq

• One line description

whatis ls
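The command / options / arguments pattern can be tried out safely in a scratch directory. This is a small sketch; the directory and the contents of myFile.txt are made up for the example.

```shell
# Work in a temporary directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir"
printf 'geneA\ngeneA\ngeneB\n' > myFile.txt

# command = uniq, option = -c, argument = myFile.txt
uniq -c myFile.txt

# Separate options and combined options produce the same listing.
separate=$(ls -l -a)
combined=$(ls -la)
[ "$separate" = "$combined" ] && echo "ls -l -a and ls -la agree"
```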


Unix Directory Structure

/ (root)
    home/
        jdoe/              (/home/jdoe)
    dev/
    bin/
    nfs/
        BaRC_Public/
    lab/
        solexa_public/
        solexa_lodish/
        page/              (/lab/page)
    . . .

Accessing Shared Resources at Whitehead

• Unix

/nfs/BaRC_Public

/lab/solexa_public

/lab/page

• Windows (access using Start Menu Search)

\\wi-files1\BaRC_Public

\\wi-files1\fink_lab

\\wi-files2\page

\\wi-htdata\solexa_public

• Macs (access using Go > Connect to Server…)

cifs://wi-files1/BaRC_Public

cifs://wi-htdata/solexa_public


Where’s my lab’s share?

• http://wi-inside.wi.mit.edu/departments/it/services/filestorage/labshares


Directory Contents

• List files/directories

ls lists the contents of a directory

ls -l includes additional info (e.g. permissions, time stamp)

Options:

-l long listing

-h human readable

thiruvil@tak /nfs/BaRC_Public$ ls -l

total 4740

drwxrwxr-x 5 gbell barc 4096 2012-03-16 15:56 apps/

drwxrwxr-x 4 gbell barc 4096 2011-10-18 09:48 BaRC_code/

drwxrwxrwx 5 gbell barc 4096 2012-09-17 15:03 Bartel_Lab/

drwxrwsrwx 3 gbell barc 4096 2012-05-04 16:17 Cheeseman_Lab/

drwxrwsrwx 3 byuan barc 4096 2010-11-23 14:22 chip_seq/

drwxrwsrwx 2 gbell barc 4096 2012-02-21 16:26 CMT/

-rw-r--r-- 1 gbell barc 192568 2012-10-10 10:14 du.20121010a.txt

Permissions Owner Group Size (bytes) Time Stamp File or directory


Permissions

drwxrwxr-x

Type: directory (d), symbolic link (l), or regular file (-), followed by three permission sets: User, Group, Others

r read

w write

x execute

• Use chmod to change permissions

user (u), group (g), others (o), all (a)

chmod u+x foo.pl (user can execute)

chmod g-w foo.pl (group can’t write)

Permission denied error

thiruvil@tak /nfs/BaRC_Public$ ls -l myFile.txt

-rw-r--r-- 1 thiruvil barc 0 2012-10-10 13:32 myFile.txt

thiruvil@tak /nfs/BaRC_Public$ chmod g+w myFile.txt

thiruvil@tak /nfs/BaRC_Public$ ll myFile.txt

-rw-rw-r-- 1 thiruvil barc 0 2012-10-10 13:32 myFile.txt
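The chmod examples above can be reproduced on a throwaway file. This is a sketch only: foo.sh is a hypothetical script name, and the permissions shown by the first ls -l depend on your umask.

```shell
# Create a scratch file in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir"
printf '#!/bin/sh\necho hello\n' > foo.sh

ls -l foo.sh                          # before: usually -rw-r--r--, depending on umask
chmod u+x foo.sh                      # user can now execute
chmod g-w foo.sh                      # group cannot write
ls -l foo.sh                          # after: the user triplet now ends in x

# Keep just the 10-character permission string for inspection.
perms=$(ls -l foo.sh | cut -c1-10)
echo "permissions: $perms"
```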


Navigating in Unix

• pwd print working directory

byuan@tak ~$ pwd

/home/byuan

• cd change directory

cd fink_lab # if you are in /lab

cd to home directory

cd ~

cd to directory above

cd ..

cd to a specific directory

cd /nfs/BaRC_Public

• No such file or directory error
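A short session tying these commands together, as a sketch run in a temporary directory; the lab/fink_lab layout is recreated there only for illustration and is not the real /lab.

```shell
# Recreate a small directory layout to move around in.
workdir=$(mktemp -d)
mkdir -p "$workdir/lab/fink_lab"

cd "$workdir/lab"        # cd to a specific directory (absolute path)
cd fink_lab              # relative path: works because we are in lab
here=$(pwd)
echo "now in: $here"

cd ..                    # up to the directory above
parent=$(pwd)
echo "now in: $parent"

cd ~ && pwd              # back to the home directory

# A misspelled or missing directory produces the error named above.
cd no_such_dir 2>/dev/null || echo "cd: no_such_dir: No such file or directory"
```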


Organizing Files and Directories

• Commands

mkdir make a directory

mkdir my_foo

rmdir remove a directory (must be empty)

rmdir my_foo

mv move or rename a file/directory

mv myOldFile myNewFile

cp copy a file

cp myOldFile myNewFile

rm remove or delete a file

rm myFile
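The five commands can be combined into one short sequence. This is a sketch in a temporary directory; myCopy is a name introduced here for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir my_foo                      # make a directory
echo "example" > myOldFile        # create a small file to work with
cp myOldFile myCopy               # copy: both files now exist
mv myOldFile my_foo/myNewFile     # move and rename in one step
ls my_foo                         # shows myNewFile
rm my_foo/myNewFile               # delete the file so the directory is empty
rmdir my_foo                      # succeeds only because my_foo is now empty
```

Note that rm and rmdir delete immediately; there is no trash folder to recover from.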


Unix Tips

• Use ↑ (up arrow) to reuse previous commands

• Ctrl-c: stop a process that is running

• Tab-completion:

– Complete commands/file names

• Unix is case-sensitive


Getting Files

• Getting files or directories

Files

wget http://www.broadinstitute.org/igv/projects/downloads/IGV_2.1.17.zip

Directories from (outside) servers

scp -r [email protected]:/broad/lab/works .

(Un)Compressing Files

• .gz file

Compress: gzip expression.txt (replaces the file with expression.txt.gz)

Uncompress: gunzip expression.txt.gz

• .tar.gz file

Compress: tar -czvf myFiles.tar.gz myFiles

Uncompress: tar -xzvf myFiles.tar.gz

Options

-c create an archive (files to archive, archive from files)

-x extract an archive (archive to files, files from archive)

-f FILE name of archive

-v be verbose, list all files being archived/extracted

-z create/extract archive with gzip/gunzip

• View compressed files using: zmore, zgrep
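A round trip through gzip and tar, as a sketch in a temporary directory with made-up expression values. The rm -r step is added here only so that the extraction visibly recreates the directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'TMEM131\t5.2\nGAPDH\t11.8\n' > expression.txt

gzip expression.txt                    # replaces the file with expression.txt.gz
zgrep TMEM expression.txt.gz           # search without uncompressing first
gunzip expression.txt.gz               # restores expression.txt

mkdir myFiles
cp expression.txt myFiles/
tar -czvf myFiles.tar.gz myFiles       # archive and compress a whole directory
rm -r myFiles                          # remove the original directory
tar -xzvf myFiles.tar.gz               # extract it again from the archive
```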

Editing a File

• Command-line editors

pico

nano

emacs (emacs -nw)

vi

• Graphical editors (Windows users need an X-windows emulator)

Note: may not be part of standard installation

nedit

gedit

xemacs

• Put an & at the end of the command line to run a graphical editor in the background, so that you can continue to use the terminal window

e.g. gedit myFile.txt &

Viewing a File

• Display a file page by page

more myFile.txt

Use: Enter to scroll line by line, space for next page, and q to quit

• Display first 15 lines of a file

head -15 myFile.txt

• Display last 15 lines of a file

tail -15 myFile.txt

• Show all contents of a file

cat myFile.txt

Show hidden characters (^M or carriage return)

cat -A myFile.txt

• Display number of lines in a file

wc -l myFile.txt
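A quick way to try the viewing commands is on a generated file. This sketch uses seq to create 100 numbered lines and the -n form of head and tail (head -n 3 is equivalent to head -3); windows.txt is a made-up file with Windows-style line endings.

```shell
workdir=$(mktemp -d)
cd "$workdir"
seq 1 100 > myFile.txt                 # 100 lines: 1, 2, ..., 100

head -n 3 myFile.txt                   # first 3 lines
tail -n 3 myFile.txt                   # last 3 lines
n_lines=$(wc -l < myFile.txt)          # number of lines in the file
echo "lines: $n_lines"

# Files saved on Windows often end each line with a carriage return (^M).
printf 'geneA\r\ngeneB\r\n' > windows.txt
n_cr=$(grep -c "$(printf '\r')" windows.txt)
echo "lines containing a carriage return: $n_cr"
```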

Output Redirection and Piping

• Write output of a command to file

Write to output file

sort myFile.txt > myFile_sorted.txt

Overwrite output file, even when the shell's noclobber option is set

sort myFile.txt >| myFile_sorted.txt

Append to output file

sort myFile.txt >> myFile_sorted.txt

• Piping “|”: use output of one command as input for another command

sort myFile.txt | more
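The difference between writing, appending, and piping shows up clearly on a three-line file. A minimal sketch with made-up gene names:

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'geneC\ngeneA\ngeneB\n' > myFile.txt

sort myFile.txt > myFile_sorted.txt      # write: creates or overwrites the file (3 lines)
sort myFile.txt >> myFile_sorted.txt     # append: the file now holds 6 lines
total=$(wc -l < myFile_sorted.txt)
echo "lines after append: $total"

# Pipe: the output of sort becomes the input of head.
first=$(sort myFile.txt | head -n 1)
echo "first gene alphabetically: $first"
```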

Parsing a File: cut

• Select columns of interest

cut -f 9,12-15 myGeneValues.txt > col_9.12to15.txt

Options:

-f output only these fields

-d field delimiter (default: tab)
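Both cut options can be seen on a small table. This is a sketch with made-up data; the three-column layout and the file names col_1and3.txt and myGeneValues.csv are assumptions for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Tab-delimited table: gene, chromosome, expression value (made-up data).
printf 'gene\tchr\tvalue\nTMEM131\tchr2\t5.2\nGAPDH\tchr12\t11.8\n' > myGeneValues.txt

# Keep columns 1 and 3; tab is the default delimiter, so -d is not needed.
cut -f 1,3 myGeneValues.txt > col_1and3.txt
cat col_1and3.txt

# For a comma-separated file, -d names the delimiter.
printf 'TMEM131,chr2,5.2\nGAPDH,chr12,11.8\n' > myGeneValues.csv
cut -d ',' -f 2-3 myGeneValues.csv
```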

Parsing a File: sort and uniq

• Sort on column(s)

sort -k 3,3 myGeneExpression.txt | more

Options:

-n numerical sort

-r reverse

-k pos1,pos2 start a key at pos1, end it at pos2

• Get only unique entries

ensure file is sorted before running uniq

uniq mySortedGenes.txt > myUniqGenes.txt

Options:

-c count entries

-d only print duplicated entries
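Sorting by an expression column and counting repeated gene names might look like the sketch below. The data and the file name genes.txt are invented for the example; column 3 is assumed to hold the expression value.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'TMEM131\tchr2\t5.2\nGAPDH\tchr12\t11.8\nACTB\tchr7\t10.1\n' > myGeneExpression.txt

# Sort on column 3 only, numerically, highest value first.
sort -k 3,3 -n -r myGeneExpression.txt
top_gene=$(sort -k 3,3 -n -r myGeneExpression.txt | head -n 1 | cut -f 1)
echo "highest expression: $top_gene"

# uniq only collapses adjacent duplicates, so sort first.
printf 'GAPDH\nACTB\nGAPDH\nTMEM131\nGAPDH\n' > genes.txt
sort genes.txt | uniq -c               # number of occurrences of each gene
dups=$(sort genes.txt | uniq -d)       # genes that occur more than once
echo "duplicated: $dups"
```

Without -n, sort compares the values as text, so 10.1 and 11.8 would be ordered before 5.2.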

Regular Expressions

• Pattern matching

• Easier to search

• Commonly used regular expressions

Regular Expression    Matches
.                     Any single character
*                     Zero or more of the preceding character (in the shell, * is a wildcard matching any characters)
^                     Beginning of a line
$                     End of a line

Example (shell wildcard): list all txt files

ls *.txt
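The anchors and the single-character match can be checked with grep, and the shell wildcard with ls. A sketch on made-up gene names and file names:

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'TMEM131\nTMEM9B\nTMEM49\nMYTMEM1\nGAPDH\n' > geneList.txt
touch notes.txt results.txt image.png

starts=$(grep -c '^TMEM' geneList.txt)       # lines beginning with TMEM
ends9=$(grep -c '9$' geneList.txt)           # lines ending with 9
anychar=$(grep -c '^TMEM.B$' geneList.txt)   # TMEM, any one character, then B
echo "start with TMEM: $starts, end with 9: $ends9, match TMEM.B: $anychar"

# Shell wildcard, expanded by the shell before ls runs.
ls *.txt
n_txt=$(ls *.txt | wc -l)
```

Quoting the pattern in single quotes keeps the shell from interpreting characters such as * and $ before grep sees them.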

Searching Within a File

• grep (global regular expression print)

• Find words, or patterns, occurring in lines of a file

grep TMEM geneList.txt

TMEM131

TMEM9B

TMEM14C

TMEM66

TMEM49

Options:

-v select non-matching lines

-i ignore case

-n print line number

Example: get TMEM genes whose names do not end with 9

grep TMEM geneList.txt | grep -v "9$" | more
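The three options can be compared side by side on a small list. This is a sketch; the gene list, including the lowercase entry tmem14c, is made up to show the effect of -i.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'TMEM131\nGAPDH\nTMEM9B\ntmem14c\nTMEM49\n' > geneList.txt

grep -n TMEM geneList.txt            # matching lines with their line numbers
grep -i tmem geneList.txt            # ignore case: also finds tmem14c
grep -v TMEM geneList.txt            # lines that do NOT contain TMEM

# Chain two greps: TMEM genes (any case) whose line does not end with 9.
not9=$(grep -i TMEM geneList.txt | grep -v '9$')
echo "$not9"
```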

BaRC Resources

• jura.wi.mit.edu


BaRC SOP

http://barcwiki.wi.mit.edu/wiki/SOPs


BaRC Scripts


Running Scripts on Unix

• Perl bed2gff.pl

• R run_rma_customCDF.R

• Python myScript.py

• Matlab matlab -nodesktop -nosplash -r myScript

• Java Archive (JAR) java -Xmx1000m -jar /usr/local/share/IGVTools/igv.jar


Running Programs/Tools on Unix

• bedtools

bedtools intersect -a myGenes1.bed -b myGenes2.bed

Other utilities: http://code.google.com/p/bedtools/wiki/Usage

• samtools

samtools view myFile.bam

Other utilities: http://samtools.sourceforge.net/samtools.shtml

• Fastx toolkit

fastx_quality_stats -i mySeq.fastq -o fastxStats_mySeq

• FastQC

fastqc mySeq.fastq

• BLAST

blastp -task blastp -db myProtDB.fa -query myProt.fa -out out.txt

Commonly Used Data Locations at Whitehead

Location              Description
/nfs/genomes          Genome data: gff, gtf, fasta, bowtie indexed files, blat indexed file, etc. for several organisms
/nfs/seq/Data         Sequence data, including blast databases, for several organisms
/nfs/BaRC_datasets    Large (array/NGS) datasets: HBI, HBM 2.0

Scientific computing resources


LSF cluster jobs


https://tak.wi.mit.edu


Load Sharing Facility (LSF) Cluster

• More computing power

• Multiple jobs running at the same time


LSF Commands

• bsub to submit jobs

bsub wc -l reads.fq

bsub "sort foo.txt > sorted.txt"

Options:

-e error file

-o standard out file

-m machine

-u email address

• bjobs to view your jobs

bjobs

• bkill to kill a job

bkill 237878
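A submission that uses the -o and -e options could look like the sketch below. It is guarded so that it only calls bsub where LSF is installed; the file names job.out and job.err are assumptions chosen for the example, and foo.txt is a made-up input.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'geneC\ngeneA\ngeneB\n' > foo.txt

# Quote the whole command so the redirect is part of the job,
# instead of being applied to bsub's own output in the login shell.
job='sort foo.txt > sorted.txt'

if command -v bsub >/dev/null 2>&1; then
    # -o and -e send the job's standard output and errors to files.
    bsub -o job.out -e job.err "$job"
    bjobs                              # list your pending and running jobs
else
    echo "bsub not found; on an LSF cluster this would run:"
    echo "bsub -o job.out -e job.err \"$job\""
fi
```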

Further Reading

• BaRC: Unix Info

http://iona.wi.mit.edu/bio/education/unix_intro.php

• LSF Cluster (incl. examples)

http://iona.wi.mit.edu/bio/bioinfo/docs/LSF_help.php

• Whitehead IT Scientific Computing Tutorials

http://wi-inside.wi.mit.edu/departments/it/services/scientificcomputing/scitutorials