Introduction to Bioinformatics
Shifra Ben-Dor
Bioinformatics Unit, Life Sciences Core Facilities

Introduction to Bioinformatics - dors.weizmann.ac.il/course/introbioinfo/intro_for_course-19.pdf



Lecture Outline:

•  Technical Course Items

•  Sequences

•  Databases

– This week and next week


The technical stuff

The course is made up of one lecture and an optional exercise session each week.

The exercise sessions are not mandatory; they are there to help. Demonstrations of the programs will be done in both the lectures and the exercise sessions. The exercise sessions are an opportunity for you to do the assignment with somebody there to ask for help if you get stuck.


The technical stuff

If you are planning on coming to the exercise sessions, please send me an email:

[email protected]


The Technical Stuff

The course website is where you can find the syllabus, lecture notes, assignments, and links to the various programs taught and to relevant literature. It is also where we put announcements and updates. http://dors.weizmann.ac.il/course/introbioinfo/


The technical stuff

•  This course is built for biologists

•  Background will be given on various topics as needed, but basic knowledge of B.Sc.-level biology is taken for granted

•  If you need help with the biology, contact me


Requirements for a grade

•  You are required to do all of the assignments and a final project

•  The course grade is computed as follows: 60% final project, 40% assignments


Assignments

•  You have two weeks to hand in each assignment

•  Assignments are to be handed in at the Wolfson lecture hall, by the end of the lecture (11:00)

•  If for any reason you need/want an extension, talk to me BEFORE the assignment is due

•  An assignment handed in late or not at all will get a 0


Assignments

•  You may consult with a friend while doing the assignment; however, all work must be handed in individually. If we find copying, the grade will be divided among the number of students handing in the same answer sheet.

•  Assignments should be printed and handed in. Electronic submission (e-mail) will NOT be accepted.


Final Project

•  The final project will be given at the beginning of July.

•  It will be due on August 9.

•  Late projects will not be accepted

•  There is NO possibility to correct projects

•  If evidence is found of shared work, there will be no course grade


Announcements, Updates…

•  Any news will be announced in the lectures and updated on the website

•  What is said in the lecture hall is the final word, unless specified otherwise


If you have questions, comments, suggestions or complaints, please contact us. The earlier the better!


Course Staff

Main Lecturer: Shifra Ben-Dor

Teaching assistants: Irit Orr, Bareket Dassa


How to contact us:

Shifra: [email protected]

Telephone: 2470 or (08) 934-2470. We don’t bite!


What is bioinformatics?


What will we cover in this course?


What won’t we cover in this course?

•  Detailed structural analysis of proteins
•  Algorithm Development
•  High throughput methods
•  In-depth phylogenetics or evolutionary biology
•  In-depth systems biology
•  siRNA, miRNA
•  Promoter Analysis


Skepticism and computers


The biological thinking has to be done by YOU


Lecture Outline:

•  Technical Course Items

•  Sequences

•  Databases

– This week and next week


What “units of information” do we deal with in bioinformatics?

•  DNA
•  RNA
•  Protein
•  Sequence
•  Structure
•  Evolution
•  Pathways
•  Interactions
•  Mutations


Examples of biological data used in bioinformatics

•  DNA (Genome)

•  RNA (Transcriptome)

•  Protein (Proteome)


DNA: Raw DNA Sequence

•  Coding or not coding?

•  Parse into genes?

•  Other important genomic elements?

•  4 bases: AGCT

atggcaattaaaattggtatcaatggttttggtcgtatcggccgtatcgtattccgtgcagcacaacaccgtgatgacattgaagttgtaggtattaacgacttaatcgacgttgaatacatggcttatatgttgaaatatgattcaactcacggtcgtttcgacggcactgttgaagtgaaagatggtaacttagtggttaatggtaaaactatccgtgtaactgcagaacgtgatcca
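A first, very rough approach to "coding or not coding?" is to look for open reading frames. The sketch below is only an illustration, not a gene finder. It scans the three forward reading frames for the longest stretch that starts with ATG and ends at a stop codon (TAA, TAG, TGA). It ignores the reverse strand, introns and any minimum length. The function name `longest_orf` is hypothetical.

```python
# Stop codons of the standard genetic code
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Longest ATG...stop stretch in the three forward frames ("" if none is closed)."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # position of the first ATG not yet closed by a stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


print(longest_orf("ccATGAAATTTTAGgg"))  # ATGAAATTTTAG
```

The sequence shown on the slide reads in frame from its ATG but has no stop codon within the displayed part. The function would therefore return an empty string for it, which is a reminder that a fragment can be coding without containing a complete ORF.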


DNA/RNA sequences

•  Genes are encoded in genomic sequences.

•  Genes are transcribed into pre-mRNAs (including coding, intronic, 5’ and 3’ untranslated regions).

•  Pre-mRNAs are spliced (introns removed) into mRNAs, which are translated into proteins.

•  mRNAs are copied into cDNAs (in the lab).
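The splicing step can be shown on a toy example. This sketch assumes the exon coordinates are already known and are given as 0-based, half-open (start, end) pairs on the forward strand. Real annotation files often use 1-based inclusive coordinates instead. The function names and the toy sequence are hypothetical. Splicing keeps the exons in order and drops the introns. The cDNA is written here in the same (sense) orientation as the mRNA, in DNA letters.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments in genomic order, dropping the introns between them."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def to_rna(dna: str) -> str:
    """Write a DNA-sense sequence in RNA letters (T -> U)."""
    return dna.upper().replace("T", "U")


# Toy pre-mRNA in DNA letters: exon1 + intron (GT...AG) + exon2
pre_mrna = "ATGGCC" + "GTAAGTCTAG" + "TTTTAA"
exons = [(0, 6), (16, 22)]

cdna = splice(pre_mrna, exons)  # sense-strand DNA copy of the spliced mRNA
mrna = to_rna(cdna)
print(cdna)  # ATGGCCTTTTAA
print(mrna)  # AUGGCCUUUUAA
```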


Figure: gene structure from genomic DNA (promoter, TSS, ATG, exons 1-4, stop, polyA site, TTS) to pre-mRNA (exons 1-4, ATG, stop, polyA site) to mature mRNA (cap, 5’UTR, CDS from ATG to stop, 3’UTR, polyA). Modified from Zhang MQ. Nat Rev Genet. 2002 Sep;3(9):698-709.


Sources of mRNAs

•  Experimental
– Clone new gene
– “Clone” gene from database
– RNA-Seq

•  Database
– “Typical” cDNA
– Full length cDNA
– EST (Expressed Sequence Tag)
– Short read sequences


Figure: mRNA (5’ mG cap, AAAA tail), full length cDNA, and typical cDNA, drawn with TTTT and tag labels.


RNA, cDNA, and ESTs

Figure: an mRNA (exon1, exon2, exon3), its cDNA and cDNA clone, and two ESTs.

GenBank ESTs (Expressed Sequence Tags): ~8,700,000 human ESTs; ~4,850,000 mouse ESTs

Adapted with permission from Adam Sartiel


Uses of ESTs

- prediction of coding regions
- detection of alternative splicing
- clustering to form “genes”

Problems with clustering:
- incomplete coverage breaks genes up
- gene families
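Clustering ESTs into "genes" can be sketched with a deliberately naive rule. This rule is an assumption for illustration, not the procedure of any particular EST database. Two ESTs join the same cluster if they share an exact k-mer (k = 8 here, an arbitrary choice), and clusters are merged transitively with a union-find structure. The toy reads and names are made up. The same rule shows both problems listed above. ESTs from one gene that do not overlap stay in separate clusters (incomplete coverage). ESTs from related genes that happen to share a k-mer are merged (gene families).

```python
from collections import defaultdict


def cluster_ests(ests: dict[str, str], k: int = 8) -> list[set[str]]:
    """Group ESTs that share at least one exact k-mer, merging groups transitively."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # Remember the first EST containing each k-mer; link later ESTs to it
    first_seen: dict[str, str] = {}
    for name, seq in ests.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_seen:
                union(name, first_seen[kmer])
            else:
                first_seen[kmer] = name

    clusters: dict[str, set[str]] = defaultdict(set)
    for name in ests:
        clusters[find(name)].add(name)
    return list(clusters.values())


ests = {
    "est1": "AAAACCCCGGGG",
    "est2": "CCCCGGGGTTTT",  # shares the 8-mer CCCCGGGG with est1
    "est3": "ACGTACGTACGA",  # shares no 8-mer with the others
}
print(sorted(sorted(c) for c in cluster_ests(ests)))  # [['est1', 'est2'], ['est3']]
```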


Problems with ESTs

- low copy number genes

- rare tissues

- mistakes

- enrichment of 3’ ends of genes

- incomplete coverage of genes


Next Generation Sequencing

•  Generally short reads (though longer-read technologies are now becoming available)

•  Read lengths range from 20-25 bp to 75-100 bp to 150 bp

•  Can be 3’ end only

•  Can be paired or single-end reads


Figure: fragment read, paired end reads, mate pair, and contigs or “transcripts”.


EST vs Read

•  ESTs have longer continuous sequence, so they are better for seeing gene structure (alternative splicing)

•  Short reads generally have higher accuracy

•  Neither can give a picture of a whole gene


Protein

•  20 letter alphabet: ACDEFGHIKLMNPQRSTVWY, but not BJOUXZ

•  Strings of ~300 aa in an average protein (e.g. bacteria)

•  Proteins are divided into domains

LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQNLVIMGKKTWFSILNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQNAVIMGKKTWFSIISLIAALAVDRVIGMENAMPWNLPADLAWFKRNTLDKPVIMGRHTWESITAFLWAQDRNGLIGKDGHLPWHLPDDLHYFRAQTVGKIMVVGRRTYESF
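A set difference is one quick way to check a sequence against the 20-letter alphabet above. This is a small sketch, and the function name is hypothetical. Letters outside the 20 do occur in database files (for example B, Z or X, which are used as ambiguity codes), and such a check flags them.

```python
# The 20 standard one-letter amino acid codes listed on the slide
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def nonstandard_letters(protein: str) -> set[str]:
    """Letters in a protein sequence that are not among the 20 standard codes."""
    return set(protein.upper()) - STANDARD_AA


print(nonstandard_letters("MLLPWATSAPGLAWG"))  # set()
print(sorted(nonstandard_letters("MKBZXU")))   # ['B', 'U', 'X', 'Z']
```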


Protein

•  Proteome of an Organism
•  2D gels
•  Mass Spec
•  2D Structure
•  3D Structure
•  4D Structure (interactions)


Lecture Outline:

•  Technical Course Items

•  Sequences

•  Databases


Databases: Outline

•  Introduction
– Data and Database types
– Database components

•  Data Formats
•  Sample databases
•  How to text search databases


Figure: kinds of biological information, from nucleotide sequence (e.g. AAGTGCCACTGCATAAATGACCATGAGTGGGCACCGGTAAGGGAGGGTGATGCTATCTGGTCTGAAG) to genes, mRNA, protein primary sequence, protein 3D structure, protein function, and SNPs. Example protein function annotations:

– “Acts as a tumor suppressor in many tumor types. Induces growth arrest or apoptosis depending on the physiological circumstances or cell type, but both activities are involved in tumor suppression.”

– “Involved in the transport of chloride ions. Defects in CFTR are the cause of cystic fibrosis. It is the most common genetic disease in the Caucasian population, with a prevalence of about 1 in 2000 live births. CF, an autosomal recessive disorder, is a common generalized disorder of exocrine gland function.”


What do we want from databases?

All of these have databases and tools that were created to work with them


Information retrieval from sequence databases

Biological databases contain enormous amounts of data.

•  Databases need to be well annotated.
•  Databases need to be easily searched.
•  Data found in databases should be easily retrieved.
•  Data in databases should be in standard formats.


Integrated Information Retrieval

•  Many databases contain logical relations between specific entries.

•  One interface connecting many biological databases.

•  For example: a database that connects protein sequence, protein domain, protein structure and reference databases (InterPro).

•  Another example: connections between reference, protein sequence, DNA sequence, and structure databases (Entrez).
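Entrez can also be queried programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL using the documented `db`, `term` and `retmax` parameters. The query itself is just an example, and the function name is hypothetical. Actually sending requests requires network access and is subject to NCBI's usage guidelines, such as request rate limits.

```python
from urllib.parse import urlencode

# Base URL of NCBI's Entrez Programming Utilities (E-utilities)
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch request that lists the IDs of entries in `db` matching `term`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


url = esearch_url("protein", "p53[Gene Name] AND human[Organism]", retmax=5)
print(url)

# Sending the request needs network access, e.g.:
# from urllib.request import urlopen
# print(urlopen(url).read().decode())
```

The IDs returned by ESearch can then be passed to other E-utilities (for example EFetch) to retrieve the linked records.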


A Database (figure): a set of entries, each identified by an accession number and made up of fields, with internal links (to other entries in the same database) and external links (to other databases).

Entries:
000001 tumor protein p53
000002 breast cancer 1, early onset
000003 breast cancer 1, early onset

Fields of entry 000001:
Chromosomal location: 17p13.1
DNA sequence:
mRNA sequence:
Protein sequence:
Protein function:
brain - liver - lung -
Interacts with genes: 000365, 025783, 004674 (internal links)
Protein structure: PDB 1OLG, 1OLH, 1SAE (external links)

Slide provided by Dr. Vered Caspi
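The entry/field/link structure in the figure can be modelled with a small data class. This is a hypothetical sketch, not how any real database stores its data. The class and function names and the `"PDB:1OLG"` link notation are assumptions. It also shows why entries are identified by accession number rather than by name: entries 000002 and 000003 have the same name but are different entries.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One database entry: an accession number, a name and named fields."""
    accession: str
    name: str
    fields: dict[str, str] = field(default_factory=dict)
    internal_links: list[str] = field(default_factory=list)  # accessions in this database
    external_links: list[str] = field(default_factory=list)  # e.g. "PDB:1OLG"


database = {
    "000001": Entry(
        accession="000001",
        name="tumor protein p53",
        fields={"Chromosomal location": "17p13.1"},
        internal_links=["000365", "025783", "004674"],
        external_links=["PDB:1OLG", "PDB:1OLH", "PDB:1SAE"],
    ),
    "000002": Entry("000002", "breast cancer 1, early onset"),
    "000003": Entry("000003", "breast cancer 1, early onset"),
}


def follow_internal_links(db: dict[str, Entry], accession: str) -> list[str]:
    """Linked accessions, flagging those not present in this toy database."""
    return [acc if acc in db else f"{acc} (not loaded)" for acc in db[accession].internal_links]


print(database["000001"].fields["Chromosomal location"])  # 17p13.1
print(follow_internal_links(database, "000001")[0])       # 000365 (not loaded)
```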


Core Data and Annotation

Databases generally have (at least) two types of data:

Core data: the data the database was generated to organize

Annotation: extra information that rounds out our picture of the core data

For example, in a genome database, the sequence is the core data, and the location of genes is the annotation.
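The genome example can be made concrete. In this sketch the core data is a made-up toy sequence, and the annotation is a list of hypothetical gene records given in 1-based, inclusive coordinates (an assumed convention). The annotation does not change the sequence. It only says where things are on it.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Annotation: a named region on the core sequence (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int


# Core data: the genome sequence itself (a made-up toy sequence)
genome = "GGGATGAAATAGCCCATGTTTTGACC"

# Annotation: where the (hypothetical) genes lie on that sequence
annotation = [Feature("geneA", 4, 12), Feature("geneB", 16, 24)]


def features_at(position: int) -> list[str]:
    """Names of annotated features covering a 1-based position."""
    return [f.name for f in annotation if f.start <= position <= f.end]


def feature_sequence(f: Feature) -> str:
    """Extract the core-data sequence described by one annotation record."""
    return genome[f.start - 1:f.end]


print(features_at(5))                   # ['geneA']
print(feature_sequence(annotation[1]))  # ATGTTTTGA
```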


Database Issues

•  Printed journals vs. databases

•  Direct submission to databases (e.g. GenBank, PDB)

•  Archival vs. curated databases

•  Databases that publish experimental results of large genomic centers

•  Public vs. private databases


For Example: Classification of Genomic Databases

Database scope
– Many genomes
– One genome
– One subject
– One gene

Information source
– Direct submission from the scientific community
– Scientific literature
– Genome center’s experimental results
– Other databases

Information type
– Mapping
– Sequence & annotation
– Protein structure & function
– Variations
– Comparative genomics
– Gene networks

Slide provided by Dr. Vered Caspi


User Interface

•  Database search
– free text
– field-specific
– sequence-based

•  Database output
– text
– graphics
– dynamic
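Free-text and field-specific searches can return different results for the same query word. The entries in this sketch are made up, and the function names are hypothetical. A free-text search for "p53" also finds an entry that only mentions p53 in a comment, while restricting the search to the name field does not.

```python
# Toy entries, made up for illustration
entries = [
    {"accession": "000001", "name": "tumor protein p53", "organism": "human"},
    {"accession": "000002", "name": "breast cancer 1, early onset", "organism": "human"},
    {"accession": "000005", "name": "protein X", "organism": "mouse",
     "comment": "interacts with p53"},
]


def free_text_search(entries: list[dict[str, str]], query: str) -> list[str]:
    """Accessions of entries that contain the query in any field."""
    q = query.lower()
    return [e["accession"] for e in entries if any(q in v.lower() for v in e.values())]


def field_search(entries: list[dict[str, str]], field: str, query: str) -> list[str]:
    """Accessions of entries whose given field contains the query."""
    q = query.lower()
    return [e["accession"] for e in entries if q in e.get(field, "").lower()]


print(free_text_search(entries, "p53"))      # ['000001', '000005']
print(field_search(entries, "name", "p53"))  # ['000001']
```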


Data Formats

There are many data formats used for sequences (both nucleic acid and amino acid):

•  Fasta Format
•  GenBank Format
•  Fastq Format

•  (EMBL Format)


Fasta Format

•  Simplest format

•  Least information

•  Starts with a > and the sequence name, on one line

•  The sequence in plain text follows


>OB2T2GTGACAACATGTACAGCTGTGAGCGGTGTAAGAAGCTGCGGAACGGAGTGAAGTACTGCAAAGTCCTGCGGTTGCCCGAGATCCTGTGCATTCACCTAAAGCGCTTTCGGCACGAGGTGATGTACTCATTCAAGATCAACAGCCACGTCTCCTTGCCCTCGAGGGGCTCGACCTGCGCCCCTTCCTTGCCAAGGAGTGCACATCCCAGATCACCACCTACGACCTCCTCTCGGTCATCTGCCACCACGGCACGGCAGGCA


>TNRC_HUMAN P36941 (tumor necrosis factor c receptor)
MLLPWATSAPGLAWGPLVLGLFGLLAASQPQAVPPYASENQTCRDQEKEYYEPQHRICCSRCPPGTYVSAKCSRIRDTVCATCAENSYNEHWNYLTICQLCRPCDPVMGLEEIAPCTSKRKTQCRCQPGMFCAAWALECTHCELLSDCPPGTEAELKDEVGKGNNHCVPCKAGHFQNTSSPSARCQPHTRCENQGLVEAAPGTAQSDTTCKNPLEPLPPEMSGTMLMLAVLLPLAFFLLLATVFSCIWKSHPSLCRKLGSLLKRRPQGEGPNPVAGSWEPPKAHPYFPDLVQPLLPISGDVSPVSTGLPAAPVLEAGVPQQQSPLDLTREPQLEPGEQSQVAHGTNGIHVTGGSMTITGNIYIYNGPVLGGPPGPGDLPATPEPPYPIPEEGDPGPPGLSTPHQEDGKAWHLAETEHCGATPSNRGPRNQFITHD
>TNRC_MOUSE P50284 lymphotoxin-beta receptor precursor
MRLPRASSPCGLAWGPLLLGLSGLLVASQPQLVPPYRIENQTCWDQDKEYYEPMHDVCCSRCPPGEFVFAVCSRSQDTVCKTCPHNSYNEHWNHLSTCQLCRPCDIVLGFEEVAPCTSDRKAECRCQPGMSCVYLDNECVHCEEERLVLCQPGTEAEVTDEIMDTDVNCVPCKPGHFQNTSSPRARCQPHTRCEIQGLVEAAPGTSYSDTICKNPPEPGAMLLLAILLSLVLFLLFTTVLACAWMRHPSLCRKLGTLLKRHPEGEESPPCPAPRADPHFPDLAEPLLPMSGDLSPSPAGPPTAPSLEEVVLQQQSPLVQARELEAEPGEHGQVAHGANGIHVTGGSVTVTGNIYIYNGPVLGGTRGPGDPPAPPEPPYPTPEEGAPGPSELSTPYQEDGKAWHLAETETLGCQDL
>TNR1_RAT P22934 tumor necrosis factor receptor 1 precursor (p60)
MGLPIVPGLLLSLVLLALLMGIHPSGVTGLVPSLGDREKRDNLCPQGKYAHPKNNSICCTKCHKGTYLVSDCPSPGQETVCEVCDKGTFTASQNHVRQCLSCKTCRKEMFQVEISPCKADMDTVCGCKKNQFQRYLSETHFQCVDCSPCFNGTVTIPCKEKQNTVCNCHAGFFLSGNECTPCSHCKKNQECMKLCLPPVANVTNPQDSGTAVLLPLVIFLGLCLLFFICISLLCRYPQWRPRVYSIICRDSAPVKEVEGEGIVTKPLTPASIPAFSPNPGFNPTLGFSTTPRFSHPVSSTPISPVFGPSNWHNFVPPVREVVPTQGADPLLYGSLNPVPIPAPVRKWEDVVAAQPQRLDTADPAMLYAVVDGVPPTRWKEFMRLLGLSEHEIERLELQNGRCLREAHYSMLEAWRRRTPRHEATLDVVGRVLCDMNLRGCLENIRETLESPAHSSTTHLPR
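Because Fasta carries so little information, a reader for it is short. This sketch is not the parser of any particular program. It takes the name to be the header text up to the first space (one common convention), joins sequence lines of any length, and raises an error on a repeated name instead of silently overwriting it. The function names are hypothetical. The example input uses shortened versions of two of the records above.

```python
def _add(records: dict[str, str], name: str, chunks: list[str]) -> None:
    """Store one record, refusing duplicate names."""
    if name in records:
        raise ValueError(f"duplicate FASTA name: {name!r}")
    records[name] = "".join(chunks)


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {name: sequence}; the name is the header up to the first space."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                _add(records, name, chunks)
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines may have any length
    if name is not None:
        _add(records, name, chunks)
    return records


example = """>TNRC_HUMAN P36941 (tumor necrosis factor c receptor)
MLLPWATSAPGLAWGPLVLGLFGLL
AASQPQAVPPYASENQTC
>TNR1_RAT P22934 tumor necrosis factor receptor 1 precursor (p60)
MGLPIVPGLLLSLVLLALLMGIHPSG
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```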


Known Issues with Fasta Format

•  Different programs treat the header line differently:
– Some read 10 characters, some 30
– Some read until the first space

•  Make sure you have unique names!!!

•  Header lines should be under 80 characters
•  Length of sequence line can differ
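The "unique names" warning can be checked before running a program. This sketch imitates the two header-reading behaviours listed above (cut at the first space, or keep only the first N characters) and reports names that collide. The headers and function names are hypothetical. In this example, names that are unique when read to the first space all collapse to the same 10-character name.

```python
from collections import Counter
from typing import Optional


def read_names(headers: list[str], max_chars: Optional[int] = None) -> list[str]:
    """Names as a program might read them: up to the first space, or the first max_chars characters."""
    texts = [h.lstrip(">") for h in headers]
    if max_chars is None:
        return [(t.split() or [""])[0] for t in texts]
    return [t[:max_chars] for t in texts]


def duplicate_names(names: list[str]) -> list[str]:
    """Names that occur more than once."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


headers = [">contig_0001_liver", ">contig_0001_brain", ">contig_0002_liver"]
print(duplicate_names(read_names(headers)))                # []
print(duplicate_names(read_names(headers, max_chars=10)))  # ['contig_000']
```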


Fastq Format

Four lines:
1 – starts with @ and is a unique identifier
2 – the actual sequence
3 – starts with a + and can have the identifier again
4 – the quality of the bases

@SRR2976060.1 1 length=202
NAAGCTCTCACCCATGGAGACCAAGGCGATTAGGGTTTTTCTCTTCGCTCTCCTCCT
+SRR2976060.1 1 length=202
#1=DDFFFHHHHHJJJEIJJJJJIJJJJFHGJIIJ9DHIIIJJJJGIIJJJGIIIJJ
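The four-line layout makes Fastq easy to read in fixed steps of four lines. Reading by line position is also safer than searching for "@", because "@" can appear in a quality string. This sketch assumes the common Phred+33 quality encoding, where score = ASCII code minus 33. Older files may use a different offset. The function name is hypothetical. The test record is the one shown above.

```python
def parse_fastq(text: str) -> list[tuple[str, str, list[int]]]:
    """Parse 4-line FASTQ records into (identifier, sequence, Phred quality scores)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text should contain a multiple of 4 non-empty lines")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at non-empty line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ in {header}")
        # Assumes Phred+33 encoding: score = ASCII code of the character minus 33
        scores = [ord(ch) - 33 for ch in qual]
        records.append((header[1:], seq, scores))
    return records


record = """@SRR2976060.1 1 length=202
NAAGCTCTCACCCATGGAGACCAAGGCGATTAGGGTTTTTCTCTTCGCTCTCCTCCT
+SRR2976060.1 1 length=202
#1=DDFFFHHHHHJJJEIJJJJJIJJJJFHGJIIJ9DHIIIJJJJGIIJJJGIIIJJ
"""
identifier, seq, scores = parse_fastq(record)[0]
print(identifier)             # SRR2976060.1 1 length=202
print(seq[:5], scores[:5])    # NAAGC [2, 16, 28, 35, 35]
```

The first base, N, has the lowest score in the record (2, encoded as "#"). This is the expected pattern for an uncalled base.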


GenBank Format

•  Divided into three parts:
– Information lines
– Feature table
– Sequence
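The three parts of a GenBank record are separated by keywords at the start of a line. The feature table begins at FEATURES, the sequence follows ORIGIN, and "//" ends the record. This sketch splits one record on those keywords and strips the position numbers and spaces from the sequence lines. The toy record and function name are made up for illustration. Real records have many more information lines and features.

```python
def split_genbank(record: str) -> dict[str, str]:
    """Split one GenBank flat-file record into information lines, feature table and sequence."""
    info, features, seq_parts = [], [], []
    section = "info"
    for line in record.splitlines():
        if line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "sequence"
            continue  # the ORIGIN line itself carries no sequence
        elif line.startswith("//"):
            break  # end of record
        if section == "info":
            info.append(line)
        elif section == "features":
            features.append(line)
        else:
            # Sequence lines: a position number followed by blocks of 10 bases
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    return {"info": "\n".join(info), "features": "\n".join(features),
            "sequence": "".join(seq_parts)}


toy = """LOCUS       TOY0001                   12 bp    DNA     linear   SYN
DEFINITION  Hypothetical toy record.
FEATURES             Location/Qualifiers
     CDS             1..12
ORIGIN
        1 atgaaatttt ag
//
"""
print(split_genbank(toy)["sequence"])  # atgaaattttag
```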


EMBL sequence format

RN   [2]
RA   Wirsel S.G.R., Leibinger W., Mendgen K.W.;
RT   "Genetic diversity of fungi associated with common reed (Phragmites
RT   australis)";
RL   Unpublished.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..581
FT                   /db_xref="taxon:112223"
FT                   /organism="ascomycota sp. 4/97-9"
FT                   /isolate="4/97-9"
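In EMBL format, every line starts with a two-letter line code (RN, RA, RT, RL, FT, ...), and the data begins at column 6. XX lines act as spacers. This sketch groups lines by their code, which is enough to pull out, for example, the reference title or the feature lines. The function name is hypothetical, and real EMBL files contain many more line types than the excerpt used here.

```python
from collections import defaultdict


def group_embl_lines(text: str) -> dict[str, list[str]]:
    """Group EMBL flat-file lines by their two-letter line code."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        code = line[:2]
        if code == "XX":
            continue  # XX lines are spacers with no content
        groups[code].append(line[5:].rstrip())  # data starts at column 6
    return dict(groups)


sample = '''RN   [2]
RA   Wirsel S.G.R., Leibinger W., Mendgen K.W.;
RT   "Genetic diversity of fungi associated with common reed (Phragmites
RT   australis)";
RL   Unpublished.
XX
FT   source          1..581
FT                   /organism="ascomycota sp. 4/97-9"
'''
groups = group_embl_lines(sample)
print(" ".join(groups["RT"]))
```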
