I Workshop on command-line tools (day 1)
Center for Applied Genomics
Children's Hospital of Philadelphia
February 12-13, 2015


Arguments

Come after the name of the program

Example:
cat file.txt (1 argument)
cut -f2 file.txt (2 arguments)

The number of spaces between arguments doesn't matter:
cut -f2     file.txt

man - command manual

man <command>

man cat
man echo
man awk

which - which command is being called

which <command>

which cat
which echo
which awk

some tips (i)

Use <Tab> to auto-complete your commands or file/directory names

To browse previous commands, use the ↑ and ↓ arrows on your keyboard

some tips (ii)

The command history will return a list of your last commands

Use ! to run the last command starting with…
Example:
!grep
This will run the last command starting with grep

Special characters (i)

^ : beginning of line
$ : end of line or beginning of variable name
? : any character (with one occurrence)
* : any character (with 0 or more occurrences)
# : start comments
[ ] : define sets of characters

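Special characters (ii)

" " : define strings (variables inside are expanded)
' ' : define strings (contents are kept literally)
- : start a parameter
` ` : define commands (the command is run and replaced by its output)
; : separate commands
| : "pipe" commands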

Special characters (iii)

~ : home directory
/ : separate internal directories
\ : escape character
  \n : new line (Linux and current macOS)
  \r : carriage return (new line on classic Mac OS)
  \t : tab
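A minimal sketch of how a few of these characters behave in a POSIX shell. The variable names and sample strings are made up for illustration and do not come from the workshop material.

```shell
# Hypothetical variable, used only for this illustration
name="CAG"

double="Hello, $name"     # double quotes: $name is expanded
single='Hello, $name'     # single quotes: the text is kept literally
escaped="Hello, \$name"   # backslash escapes the $ inside double quotes

# Backticks run a command and capture its output;
# the | sends the output of echo to tr
lower=`echo "$name" | tr 'A-Z' 'a-z'`

# \n inside the printf format produces three separate lines
lines=`printf 'C\nA\nG\n' | wc -l | tr -d ' '`

# ; separates commands written on the same line
echo "$double"; echo "$single"; echo "$escaped"
echo "$lower $lines"
```

The `tr -d ' '` step only strips the padding that some versions of `wc` add around the number.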

First steps

pwd # where am I?

whoami # who am I?

id <your_username> # what can I do?

date # what time/day is it?

cat - concatenate and print text files

cat file1.txt file2.txt > output.txt
cat *.bed > all.bed

cat -n : shows line numbers
cat -e : shows non-printing characters

echo - write to the standard output

echo Hello, CAG!
echo -e : interprets escape characters
echo -e "C\tA\tG"
echo -e "C\nA\nG"
echo -n : prints and doesn't go to a new line
echo -n "CAG"; echo "123"
echo "CAG"; echo "123"

Redirect output or errors (i)

echo "bla" > bla.txt
echo "ble" > ble.txt
cat bla.txt ble.txt > BLs.txt
echo "bli" >> BLs.txt
echo "blo" > blo.txt
cat blo.txt >> BLs.txt

Redirect output or errors (ii)

cat -n BLs.txt
cat blu.txt >> BLs.txt 2> error.txt
cat error.txt
cat blublu.txt >> BLs.txt 2>> error.txt
cat error.txt
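The difference between `>`, `>>`, `2>` and `2>>` can be sketched as follows. This is an illustrative example that runs in a temporary directory (created with `mktemp -d`, an assumption not used in the slides) so that no real files are touched.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

echo "bla" > BLs.txt     # > creates the file (or overwrites it)
echo "ble" > BLs.txt     # overwrites: "bla" is gone
echo "bli" >> BLs.txt    # >> appends: the file now has 2 lines

# blu.txt does not exist, so cat writes an error message;
# 2> sends that message to error.txt instead of the screen
cat blu.txt >> BLs.txt 2> error.txt || true

# 2>> appends a second error message to the same file
cat blublu.txt >> BLs.txt 2>> error.txt || true

cat -n BLs.txt
cat -n error.txt
```

The `|| true` is only there so that the sketch keeps going after the deliberate failures.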

ls - list files in directories (i)

ls : list files of current directory
ls workshop : list files in directory workshop
ls -l : in long format
ls -t : list files sorted by time modified
ls -1 : force output to be one entry per line
ls -S : list files sorted by size

ls - list files in directories (ii)

ls -r : reverse the sorting
ls -a : list hidden files (which begin with a dot)
ls -h : show file size human-readable
ls -G : colors output

We can combine options:
ls -lhrt

ssh - secure shell (access remote servers) (i)

ssh <user>@<server>

ssh -t : force a terminal for the remote command; ssh exits when the command finishes
ssh [email protected]

ssh [email protected] -t top

ssh [email protected] -t ls -lh

ssh [email protected] -t ls -lh > my_home_on_respub.txt

ssh - secure shell (access remote servers) (ii)

ssh -p <port> : access a specific port on server

ssh -X : open a session with graphic/display options (if you need to open a graphical program on a remote server; e.g. IGV).

alias - "shortcut" for commands

alias <alias> : show what a specific alias stands for

alias ll # ll is not a real command. =)

alias resp='ssh [email protected]'

resp

df - report file system disk space usage

df -h : human-readable

du - estimate file space usage

du -h : human-readable

mkdir - make directory

mkdir bioinfo_files
mkdir workshop_text_files
mkdir workshop123
mkdir -p 2015/February/12
# Suggestion:
# Create names that make sense

cd - change working directory

cd bioinfo_files
cd .. # go to directory above
cd ~ # go to home directory
cd - # go to previous directory

rmdir - remove empty directories

rmdir workshop123
rmdir 2015 # it will return an error (the directory is not empty)
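A small sketch of why the second `rmdir` fails, using the same directory names inside a temporary directory (the `mktemp -d` working directory and the `result` variable are assumptions made for this illustration).

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

mkdir workshop123
mkdir -p 2015/February/12   # -p also creates the missing parent directories

rmdir workshop123           # works: the directory is empty

# Fails: 2015 still contains February/12
if rmdir 2015 2> /dev/null; then
    result="removed"
else
    result="not removed"
fi
echo "rmdir 2015: $result"

# Removing from the innermost directory outwards does work
rmdir 2015/February/12 2015/February 2015
```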

mv - move files and directories

mv bl?.txt workshop_text_files
mv BLs.txt old_file.txt
mv workshop_text_files workshop_files

cp - copy files and directories

cp old_file.txt workshop_files
cp error.txt error_copy.txt

# To copy directories with their contents,
# use -r (recursive)
cp -r workshop_files bioinfo_files/
# Now, try...
cp -r workshop_files/ bioinfo_files/

scp - secure copy files and directories between different servers

# Similar to "cp" (in this case, we're uploading)

scp *.txt [email protected]:~/

# To copy directories with their contents,

# use -r (recursive)

scp -r w* [email protected]:~/

# Downloading

scp [email protected]:~/*.txt .

rm - remove files and directories

rm old_file.txt error_copy.txt

# Use -r (recursive) to remove
# directories and their contents
rm -r bioinfo_files/workshop_files/
rm -r 2015

ln - make links (pointers) of files
(it's good to avoid multiple copies)

# hard links keep working if the original
# files are removed
ln workshop_files/old_file.txt hard.txt

# symbolic links break if the original
# files are removed
ln -s workshop_files/old_file.txt symbolic.txt

testing links

echo "hard" >> hard.txt
echo "symbolic" >> symbolic.txt
head hard.txt symbolic.txt
head workshop_files/old_file.txt
rm workshop_files/old_file.txt
head hard.txt symbolic.txt
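The outcome of this test can be sketched in a self-contained way. The example below recreates the files in a temporary directory; the file content and the `hard_content` and `symbolic_state` variables are assumptions made for this illustration.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"
mkdir workshop_files
echo "original" > workshop_files/old_file.txt

ln workshop_files/old_file.txt hard.txt          # hard link
ln -s workshop_files/old_file.txt symbolic.txt   # symbolic link

rm workshop_files/old_file.txt

# The hard link still points to the data
hard_content=$(cat hard.txt)

# -e follows the link, so it is false when the target is gone
if [ -e symbolic.txt ]; then
    symbolic_state="ok"
else
    symbolic_state="broken"
fi

echo "hard.txt contains: $hard_content"
echo "symbolic.txt is: $symbolic_state"
```

The data stays on disk as long as at least one hard link refers to it, whereas the symbolic link only stores the path of the removed file.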

wget - network downloader

wget www.ime.usp.br/~llima/XHMM_results.tar.bz2

wget -c : continue (for incomplete downloads)

wget http://bio.ime.usp.br/llima/GWAS.tar.gz
# after 10%, press Ctrl+C
wget -c http://bio.ime.usp.br/llima/GWAS.tar.gz

tar - archiving

Create an archive:
tar -cvf newfile.tar file1 file2 dir1 dir2
tar -cvf BLs.tar bla.txt ble.txt blo.txt
tar -cvzf BLs.tar.gz bla.txt ble.txt blo.txt

Parameters: c (create), v (verbose), z (gzip), f (file)

tar - archiving

Extract from an archive:
tar -xvzf GWAS.tar.gz
tar -xvjf XHMM_results.tar.bz2

Parameters: x (extract), v (verbose), f (file), z (gzip), j (bzip2)
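A round trip (create, inspect, extract) can be sketched with small test files. Two options below are not covered in the slides and are included as assumptions about what is convenient here: `-t` lists the contents of an archive, and `-C` extracts into a given directory.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"
echo "bla" > bla.txt
echo "ble" > ble.txt

# c (create), z (gzip), f (file)
tar -czf BLs.tar.gz bla.txt ble.txt

# t (list): show what is inside without extracting
listing=$(tar -tzf BLs.tar.gz)
echo "$listing"

# x (extract) into a separate directory chosen with -C
mkdir extracted
tar -xzf BLs.tar.gz -C extracted
cat extracted/bla.txt extracted/ble.txt
```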

gzip - zip files

ls -lh adhd.ped
gzip adhd.ped
ls -lh adhd.ped.gz
# to unzip, run "gunzip adhd.ped.gz"

zcat - cat for zipped files

zcat adhd.ped.gz # Ctrl+C to stop

less - file visualization

less DATA.xcnv

Use arrows (←↑→↓) to navigate the file

Type / to search

file slicing - head, tail, cut

head - first lines

# first 20 lines
head -n 20 DATA.xcnv

# all lines, excluding last 2
# (on Linux, not Mac)
head -n -2 DATA.xcnv

tail - last lines

# last 20 lines
tail -n 20 DATA.xcnv

# from line 2 to the end
tail -n +2 DATA.xcnv

cut - get specific columns of file

# fields 1 to 3 and 6

cut -f 1-3,6 DATA.xcnv

# other examples

cut -f1 adhd.ped

cut -f1 -d' ' adhd.ped # delimiter = space

# other delimiters: comma, tab, etc.

cut -d, -f1-2 …

cut -d$'\t' -f5,7,9 … # bash syntax for a tab; tab is also the default delimiter
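A short sketch of the delimiter options on toy data. The file names and their contents are made up for this illustration. Since `cut` expects a single character as delimiter, the tab is produced here with `printf`, which also works in shells that do not support `$'\t'`.

```shell
# Work in a throwaway directory with two toy files
workdir=$(mktemp -d)
cd "$workdir"
printf 'chr1\t100\t200\tDEL\n' > toy.tsv
printf 'chr1,100,200,DEL\n' > toy.csv

# Tab is the default delimiter, so -d is not needed here
tab_fields=$(cut -f 1,4 toy.tsv)

# Comma as delimiter
comma_fields=$(cut -d, -f 1-2 toy.csv)

# An explicit tab, stored in a variable
tab=$(printf '\t')
explicit_tab=$(cut -d "$tab" -f 2 toy.tsv)

echo "$tab_fields"
echo "$comma_fields"
echo "$explicit_tab"
```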

Using "|" (pipe) to join commands

cut -f 1-3,6 DATA.xcnv | head -n 1
cut -f 1-3,6 DATA.xcnv | less

zcat adhd.ped.gz | less

# Compare (same result? same time?)
zcat adhd.ped.gz | cut -f1 -d' ' | head
zcat adhd.ped.gz | head | cut -f1 -d' '

column - columnate lists

# using white spaces to separate
# and fill columns
column -t DATA.xcnv

column -s # choose separator

sort - sort lines of text files

sort DATA.xcnv
sort -k : choose specific field
sort -n : numeric-sort
sort -r : reverse

# Exercise: show the top 10 CNVs with
# the most targets (column 8)

uniq - report or filter out repeated lines in a file

cut -f1 DATA.xcnv | sort | uniq

# reporting counts of each line
cut -f5 DATA.xcnv | sort | uniq -c

wc - word, line, character and byte count

wc -l : number of lines
wc -w : number of words
wc -m : number of characters

cut -f5 DATA.xcnv | sort | uniq | wc -l

head -n1 DATA.xcnv | cut -f1 | wc -m
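The pipelines above call sort before uniq because uniq only collapses repeated lines that are adjacent. A minimal sketch on made-up sample names (not taken from DATA.xcnv):

```shell
# Hypothetical list of sample names with non-adjacent repeats
samples=$(printf 'S2\nS1\nS2\nS1\nS2\n')

# Without sort, no two equal lines are adjacent: nothing is collapsed
unsorted=$(echo "$samples" | uniq | wc -l | tr -d ' ')

# With sort, equal lines become adjacent and uniq keeps one of each
sorted=$(echo "$samples" | sort | uniq | wc -l | tr -d ' ')

# uniq -c adds the counts; a second, numeric reverse sort ranks them
counts=$(echo "$samples" | sort | uniq -c | sort -n -r)

echo "distinct lines without sort: $unsorted"
echo "distinct lines with sort: $sorted"
echo "$counts"
```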

More exercises

1. What are the top 10 samples with the most CNVs?
2. What are the top 5 largest CNVs?
3. What are the top 15 directories using the most space?

vi/vim (text editor) (i)

vi text_file.txt (open "text_file.txt")
i - start editing mode (remember "insert")
ESC - leave editing mode
:w - save file ("write")
:q - quit
:x - save (write) and quit

vi/vim (text editor) (ii)

u - undo
:30 - go to line number 30
:syntax on - syntax highlighting
^ - go to beginning of line
$ - go to end of line

vi/vim (text editor) (iii)

dd - delete current line
d2↓ - delete current line and 2 lines below
yy - copy current line
y3↓ - copy current line and 3 lines below
p - paste lines below current line

grep - finds words/patterns in a file (i)

grep word file.txt

Options:
grep -w : find the whole word
grep -c : returns the number of lines found
grep -f : specifies a file with a list of words
grep -o : returns only the match

grep - finds words/patterns in a file (ii)

grep -A 2 : also show 2 lines after
grep -B 3 : also show 3 lines before
grep -v : shows lines without pattern
grep --color : colors the match
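A few of these options can be compared on a small toy file. The file name and its contents are made up for this illustration and do not follow the format of DATA.xcnv.

```shell
# Work in a throwaway directory with a toy file
workdir=$(mktemp -d)
cd "$workdir"
printf 'sample1 DEL chr1\nsample2 DUP chr1\nsample3 DEL chr10\nsample4 DUP chr2\n' > toy.txt

# -c counts the matching lines
del_lines=$(grep -c DEL toy.txt)

# Without -w, chr1 also matches inside chr10
chr1_any=$(grep -c chr1 toy.txt)

# With -w, only the whole word chr1 matches
chr1_word=$(grep -c -w chr1 toy.txt)

# -v keeps the lines that do NOT contain the pattern
not_del=$(grep -v DEL toy.txt | wc -l | tr -d ' ')

# -o prints only the matching part of each line
first_match=$(grep -o 'sample[0-9]' toy.txt | head -n 1)

echo "$del_lines $chr1_any $chr1_word $not_del $first_match"
```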

Exercises

1. How many CNVs are located on chromosome 1?
2. How many deletions are there?
3. Which samples end with the character M?
4. Which samples end with the character M or F?
5. How many samples do not have NN in the name?