26
Install and run external command line softwares Yanbin Yin 1

Install and run external command line softwares

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Install and run external command line softwares

Install and run external command line softwares

YanbinYin

1

Page 2: Install and run external command line softwares

Homework#8Createafolderunderyourhomecalledhw8

Changedirectorytohw8

DownloadEscherichia_coli_K_12_substr__MG1655_uid57779faa filefromNCBIandcopyitasmg1655.faa

DownloadEscherichia_coli_O157_H7_uid57781allfaa files fromNCBIandconcatenatethemaso157h7.faa

Createunix one-linerstocounthowmanyproteinsarethereinthetwonewlymadefaa files

Createunix one-linerstodothefollowing:

- BLASTPsearchofmg1655.faavs.o157h7.faaandfindouthowmanymg1655.faaproteinshavehitsino157h7.faa(useE-value<1e-5tofilterhits)

- BLASTPsearchofo157h7.faavs.mg1655.faaandfindouthowmanyo157h7.faaproteinshavehitsinmg1655.faa(useE-value<1e-5tofilterhits)

2

Writeareport(inwordorppt)toincludealltheoperationsandscreenshots.

DueonNov21(sendbyemail,ifthereare2+files,puttheminazipfile;includeyourlastnameinthefilename)

Page 3: Install and run external command line softwares

Programs/toolsweoftenuse

• BLAST• FASTA• HMMER• EMBOSS• lftp• bioperl• R

• Clustalw• MAFFT• MUSCLE• SRAtoolkit• weblogo• PhyML• FastTree• RaxML• USEARCH• ...

http://gacrc.uga.edu/3

Page 4: Install and run external command line softwares

4

OnWINDOWS,youdownloadsomesoftwares usuallyinexeformat.Youdoubleclickonsomeiconandwindowsinstallerwilldotherestforyou.Usuallyyouwillbeasked:whereyouwanttohavethisprograminstalledto(C:\WINDOWS\Programs\)

OnLinux,it’sverydifferent.Youdownloadthesoftwarepackage,usuallyincompressedformat(.gz or.zip).Youhavetotypeinaseriesofcommandstoinstallit,oftenwriteittoasystemfolder(e.g./usr/bin),whichyoudon’thaveaccessunlessyouaretherootuser

Page 5: Install and run external command line softwares

Linux-basedprogramtypes

• SourcecodesinC,C++,Java,Fortranetc.– Needtobecompiledbeforeexecutethecommand

• Precompiledexecutables orbinarycodes• Sourcecodesinscriptinglanguages(perl,python,Retc.)– Canexecutedirectly

Manyprogramsneeddependencies(ifordertoinstallprogramA,youneedinstallBfirst…)

5

Page 6: Install and run external command line softwares

Onyourownubuntu machine…• Youaretheroot(administrator)andusingthesudo

commandyoucaninstallanythingyouwantintothesystemdirectory(/usr/bin/,/bin/,/lib/etc.)– apt-get(AdvancedPackagingTool)candomanyinstallationsfor

youfromsourceorbinarycodeshttps://help.ubuntu.com/12.04/serverguide/apt-get.html

• OnSer,youarenottherootandyoucanonlyinstallthingsunderyourhomeusingthe“hard”way– Download->unpack->install->editPATHenvironmentalvariable– Makesureyoucreatefoldersforeachtools,e.g.

yourhome/tools/fasta

6

NotonSer

Page 7: Install and run external command line softwares

InstallBLASTonyourownmachine[testifinstalled]sudo apt-get install blast2[testifinstalled]blastall[whereitinstalled]which blastall[ifyouwanttouninstall]sudo apt-get remove blast2[testifit’sgone]blastall

[newversionofblast]sudo apt-get install ncbi-blast+

7

NotonSer

Page 8: Install and run external command line softwares

Useapt-gettoinstall

lftp,emboss,hmmer,bioperl,clustaw,muscle,Rsudo apt-get install xxx

[totestifinstalled,typeinthecommand]

8

Page 9: Install and run external command line softwares

OnyourownMAC

http://www.macobserver.com/tmo/article/install_the_command_line_c_compilers_in_os_x_lionInstallXcode,thenCcompiler,thenyoucaninstallMacporthttp://www.macports.org/install.phpWithmacport,youcaninstallwget:sudo port install wgetlftp: sudo port install lftphmmer:sudo port install hmmeremboss:sudo port install embossR:sudo port install Rblast:http://www.blaststation.com/freestuff/en/howtoNCBIBlastMac.html

9

http://www.digimantra.com/howto/apple-aptget-command-mac/http://superuser.com/questions/173088/apt-get-on-mac-os-x

Page 10: Install and run external command line softwares

10

Installprogramsusingthehardway(yourareNOTtheroot)

Usuallywhenabioinformaticstoolisreleased,thetoolpackagealsoincludesareadme,install ormanualfileinadditiontotheprogramitself.

Inmostcases,therearemorethanonefilesintheprogramfolder.

Sometimesyouwillseefilesinmultipledifferentlanguages.

Thereadmeorinstallfilewilltellyouhowtoinstallthetool.

Basicsteps:downloadthepackage->unpack(untar,unzip)->readthereadme->install

Page 11: Install and run external command line softwares

InstallBLAST executable programBLAST source is written C and C program needs to be compiled to get executable programBut here we will download executables of BLAST

lftp ftp.ncbi.nih.gov:/blast/executables/LATEST

get ncbi-blast-2.7.1+-x64-linux.tar.gz

bye

Now you returned to Serls -l

mkdir tools

cd tools

mkdir blast

mv ../ncbi-blast-2.5.0+-x64-linux.tar.gz blast

cd blast

Unpack the tar balltar -zxf ncbi-blast-2.7.1+-x64-linux.tar.gz

ls -l

cd ncbi-blast-2.7.1+/bin

ls -l

./blastp -h

pwd 11

Page 12: Install and run external command line softwares

12

Youareatyourhomenano .bashrc

Addthefollowinginthebeginningofthefileexport PATH="$HOME/tools/blast/ncbi-blast-2.7.1+/bin:$PATH"

Nowexitfromnano anddothefollowing

. .bashrcblastp -h

Now return to your homecdblastp -hwhich blastpYou will be told that the blast version is BLAST 2.2.25+, and blastp in your current path is /usr/bin/blastp. This is an older version I installed two years ago, NOT the one you just installed. You have to give the full path to run the blast version in your home

/home/yyin/tools/blast/ncbi-blast-2.7.1+/bin/blastp

What if you don’t want to type this long path? You can add it to a hidden file in your home called .bashrc where Shell auto execute every time you log in: Shell will be notified that when you call blastp, go to that folder to find it.

Page 13: Install and run external command line softwares

EnvironmentvariableAnenvironmentvariableisanamedobjectthatcontainsdatausedbyoneormoreapplications.Thevalueofanenvironmentalvariablecanforexamplebethelocationofallexecutablefilesinthefilesystem,thedefaulteditorthatshouldbeused,orthesystemlocalesettings.UsersnewtoLinuxmayoftenfindthiswayofmanagingsettingsabitunmanageable.However,environmentvariablesprovidesasimplewaytoshareconfigurationsettingsbetweenmultipleapplicationsandprocessesinLinux.

env tolistallbuilt-inenvironmentvariable

PATH isaveryimportantenvironmentvariable.Thissetsthepaththattheshellwouldbelookingatwhenithastoexecuteanyprogram.Itwouldsearchinallthedirectoriesthataredefinedinthevariable.Rememberthatentriesareseparatedbya':'.Youcanaddanynumberofdirectoriestothislist.

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

13

Page 14: Install and run external command line softwares

Run BLAST in command line mode

14

http://www.ncbi.nlm.nih.gov/books/NBK52640/

Page 15: Install and run external command line softwares

15

WhyyoudoneedtorunBLASTincommandlineterminal?

- NCBI’sserverdoesnothaveadatabasethatyouwanttosearch

- MillionsofusersareusingNCBIBLASTservertoo

- Yourquerysethasmorethanonesequenceorevenagenome

- TheBLASToutputcouldbeprocessedandusedasinputforotherLinuxsoftwares

Page 16: Install and run external command line softwares

16

Fordoingblastsearch

1. Determinewhatisyourqueryanddatabase,andwhatisthecommand

2. Formatyourdatabase

3. Runtheblastcommand(whatE-valuecutoff,whatoutputformat)

4. Viewtheresult

5. Parsetheresult(hitIDs,hitsequences,alignment,etc.)

TherearetwoversionsofBLAST-blast2(legacyblast)-blast+(thisiswhatyouinstalledinyourhome/tools/)

https://www.biostars.org/p/9480/http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

Page 17: Install and run external command line softwares

BLAST2(legacyblast)blastall - | less-p # specify blastp, blastn, blastx, tblastn, tblastx

More commands in blast package

formatdb (format database)megablast (faster version of blastn)rpsblast (protein seq vs. CDD PSSMs)impala (PSSM vs protein seq)bl2seq (two sequence blast)blastclust (given a fasta seq file, cluster them based on sequence similarity)blastpgp (psi-blast, iterative distant homolog search)

17

QueryProtein

DNA

DatabaseProteinDNA

http://www.ncbi.nlm.nih.gov/books/NBK1763/pdf/ch4.pdf

Page 18: Install and run external command line softwares

blastall options-pprogramname-ddatabasefilename(textfasta sequencefile)-i queryfilename-ee-valuecutoff(showhitslessthanthecutoff)-moutputformat-ooutputfilename(youcanalsouse>)-Ffilterlow-complexityregionsinquery-vnumberofone-linedescriptiontobeshown-bnumberofalignmenttobeshown-anumberofprocesserstobeused

18

Page 19: Install and run external command line softwares

Newversionblast+,e.g.blastp

blastp -help | less

-query query filename-db databasefilename-out outputfilename-evalue e-valuecutoff-outfmt outputformat-num_descriptions

-num_alignments

19

blastpblastntblastnblastxtblastx

Atyourhomels -l tools/ncbi-blast-2.5.0+/bin/22executablefiles(commands)

http://www.ncbi.nlm.nih.gov/books/NBK1763/pdf/CmdLineAppsManual.pdf

Page 20: Install and run external command line softwares

20

Exercise1:findCesA/Csl homologoussequenceincharophytic greenalgalgenome

Klebsormidium flaccidum genomewassequenced,analyzedandpublishedinhttp://www.ncbi.nlm.nih.gov/pmc/articles/PMC4052687/.WewanttofindouthowmanyCesA/Csl genesthisgenomehas.

NotethattherawDNAreadsareavailableinGenBank,buttheassembledgenomeandpredictedgenemodelsarenot

GotoMethodsoftheabovepaperandfindthelinktothewebsitethatprovidesthegenomeandannotationdata

Wget thepredictedproteinsequencefilewget -q http://www.plantmorphogenesis.bio.titech.ac.jp/~algae_genome_project/klebsormidium/kf_download/131203_kfl_initial_genesets_v1.0_AA.fasta

Makeindexfilesforthedatabasemakeblastdb -in 131203_kfl_initial_genesets_v1.0_AA.fasta -parse_seqids -dbtype 'prot’

Checkhowmanyproteinshavebeenpredictedless 131203_kfl_initial_genesets_v1.0_AA.fasta | grep '>' | wc

Page 21: Install and run external command line softwares

21

Wget ourquerysetcp /home/yyin/cesa-pr.fa .

Dotheblastp searchblastp -query cesa-pr.fa -db 131203_kfl_initial_genesets_v1.0_AA.fasta -out cesa-pr.fa.out -outfmt 11

Checkouttheoutputfileless cesa-pr.fa.out

Changetheoutputformattoatab-delimitedfileblast_formatter -archive cesa-pr.fa.out -outfmt 6 | less

FiltertheresultusingE-valuecutoffblast_formatter -archive cesa-pr.fa.out -outfmt 6 | awk '$11<1e-10' | less

blast_formatter -archive cesa-pr.fa.out -outfmt 6 | awk '$11<1e-10' | cut -f2 | less

Page 22: Install and run external command line softwares

22

Howdoyouextractthesequencesoftheblasthits?

DATABASEQuerySeqs HitIDs

HitSeqs

Furtheranalyses:MultiplealignmentPhylogeny,etc.

Linuxcmds

blastdbcmd

blast

blast_formatter

Linuxcmds

Page 23: Install and run external command line softwares

23

SavethehitIDsblast_formatter -archive cesa-pr.fa.out -outfmt 6 | awk '$11<1e-10' | cut -f2 | sort -u > cesa-pr.fa.out.id

Retrievethefasta sequencesofthehitsblastdbcmd -db 131203_kfl_initial_genesets_v1.0_AA.fasta -entry_batch cesa-pr.fa.out.id | less

blastdbcmd -db 131203_kfl_initial_genesets_v1.0_AA.fasta -entry_batch cesa-pr.fa.out.id | sed 's/>lcl|/>/' | sed 's/ .*//' | less

blastdbcmd -db 131203_kfl_initial_genesets_v1.0_AA.fasta -entry_batch cesa-pr.fa.out.id | sed 's/>lcl|/>/' | sed 's/ .*//’ > cesa-pr.fa.out.id.fa

Wget theannotationfilefromtheJapanwebsitewget -q http://www.plantmorphogenesis.bio.titech.ac.jp/~algae_genome_project/klebsormidium/kf_download/131203_gene_list.txt

Checkwhatourgenesareannotatedtobegrep -f cesa-pr.fa.out.id 131203_gene_list.txt

Page 24: Install and run external command line softwares

24

Ifaprogram(e.g.BLAST)runssolongonaremoteLinuxmachinethatitwon’tfinishbeforeyouleaveforhome…

Orifyousomehowwanttorestartyourlaptop/desktopwhereyouhaveaPuttysessionisrunning(Windows)orashellterminalisrunning(Ubuntu)…

Inanycase,youhavetoclosetheterminalsession(orhaveitbeautomaticallyterminatedbytheserver).Ifthishappens,yourprogramwillbeterminatedwithout finishing.Ifyouexpectyourprogramwillrunforaverylongtime,e.g.longerthan10hours,youmayput“nohup”beforeyourcommand;thisensuresthatevenifyouclosetheterminal,theprogramwillstillruninthebackgrounduntilitisfinishedandyoucanloginagainthenextdaytochecktheoutput.Forexample:

nohup blastp -query yeast.aa -db yeast.aa -out yeast.aa.ava.out-outfmt 6 &

Youwillgetanadditionalfilenohup.out intheworkingfolderandthisfilewillbeemptyifnothingwronghappened.

Page 25: Install and run external command line softwares

25

Youareatyourhomecd toolslftp emboss.open-bio.org

Insideembosslftp sitecd pub/EMBOSSget emboss-latest.tar.gz

Byetoexitfromlftpbye

NowyouarebacktoyourhomeonSertar zxf emboss-latest.tar.gz

./configure –prefix=$HOME/tools/emboss

make

make install

Ifyouaretheroot,onecommandcandoalltheaboveforyousudo apt-get install emboss

Testifembosshasbeeninstalled:

water

Exercise2:emboss

Page 26: Install and run external command line softwares

Changesequenceformatseqret –help

seqret -sequence /home/yyin/Unix_and_Perl_course/Data/GenBank/E.coli.genbank -outseqE.coli.genbank.fa -sformat genbank -osformat fasta

Calculatesequencelengthinfoseq –help

infoseq -sequence /home/yyin/cesa-pr.fa -name –length -only

Morecommandexamples:needle –helpwater –helpfuzznuc –helppepstats -helppepinfo –help

26

plotorf -helptranseq -helpprettyseq –help