Biopython: Overview, State of the Art and Outlook Sebastián Bassi [email protected] Twitter: @sbassi

Page 1: Biopython: Overview, State of the Art and Outlook

Biopython: Overview, State of the Art and Outlook

Sebastián Bassi [email protected]

Twitter: @sbassi

Page 2: Biopython: Overview, State of the Art and Outlook

A few words about Python:

Python is a general-purpose, high-level, dynamic programming language.

It supports multiple programming paradigms (OOP, imperative and functional programming).

It features a fully dynamic type system and automatic memory management.

Page 3: Biopython: Overview, State of the Art and Outlook

Python features

●Easy to learn
●Easy to read (looks like pseudocode)
●Interpreted (compiled to a VM bytecode, it is fast to program)
●High level data structures (lists, dictionaries, sets and more)
●Multiplatform (from supercomputers to phones)
●Batteries included philosophy
●Extensive 3rd party libraries
●Free (as in freedom and as in beer)
●Strong community

Page 4: Biopython: Overview, State of the Art and Outlook

Read a file, load an array and sort it (VB)

Dim i, j, Array_Used As Integer
Dim MyArray() As String
Dim InBuffer, Temp As String

Array_Used = 0
ReDim MyArray(50)

'open a text file here . . .
Do While Not EOF(file_no)
    Line Input #file_no, MyArray(Array_Used)
    Array_Used = Array_Used + 1
    If Array_Used = UBound(MyArray) Then
        ReDim Preserve MyArray(UBound(MyArray) + 50)
    End If
Loop

'simple bubble sort
For i = Array_Used - 1 To 0 Step -1
    For j = 1 To i
        If MyArray(j - 1) > MyArray(j) Then
            'swap
            Temp = MyArray(j - 1)
            MyArray(j - 1) = MyArray(j)
            MyArray(j) = Temp
        End If
    Next
Next

Page 5: Biopython: Overview, State of the Art and Outlook

Read a file, load an array and sort it (Python)

# Open the file handle
file_object = open(FILENAME)
# Read all lines and store them in a list
lista = file_object.readlines()
# Sort the list
lista.sort()
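
The same task can also be written so that the file is closed automatically. The following is a minimal sketch, not part of the original slide: the function name `read_sorted_lines` and the sample file content are illustrative assumptions.

```python
import os
import tempfile


def read_sorted_lines(path):
    """Return the lines of a text file, sorted alphabetically."""
    # The with-block closes the file even if reading fails.
    with open(path) as handle:
        # Strip the trailing newline from every line.
        lines = [line.rstrip("\n") for line in handle]
    return sorted(lines)


# Usage with a throwaway file (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("pear\napple\nmango\n")

sorted_lines = read_sorted_lines(tmp.name)
os.remove(tmp.name)
print(sorted_lines)
```

Using `sorted()` returns a new list and leaves the original untouched, whereas `list.sort()` on the slide sorts in place; either is fine for this task.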

Page 6: Biopython: Overview, State of the Art and Outlook

What can be done with Python?

from pylab import *
from data_helper import get_daily_data

intc, msft = get_daily_data()
delta1 = diff(intc.open)/intc.open[0]
# size in points ^2
volume = (15*intc.volume[:-2]/intc.volume[0])**2
close = 0.003*intc.close[:-2]/0.003*intc.open[:-2]
scatter(delta1[:-1], delta1[1:], c=close, s=volume, alpha=0.75)
ticks = arange(-0.06, 0.061, 0.02)
xticks(ticks)
yticks(ticks)
xlabel(r'$\Delta_i$', fontsize=20)
ylabel(r'$\Delta_{i+1}$', fontsize=20)
title('Volume and percent change')
grid(True)
show()

Page 10: Biopython: Overview, State of the Art and Outlook

Robots (made in Argentina)

www.robotia.com.ar

Page 11: Biopython: Overview, State of the Art and Outlook

Biopython

A set of freely available Python tools for bioinformatics and molecular biology

Features include:

●Parsing bioinformatics files into Python structures
●A sequence class to store sequences, ids and features
●Interface to popular bioinformatics programs (clustalw, blast, primer3 and more)
●Tools for performing common operations on DNA/protein sequence (translation, transcription, Tm, weight)
●Code to deal with alignments
●Integration with other languages via BioCorba

Page 12: Biopython: Overview, State of the Art and Outlook

Biopython in the lab (real world usage)

Page 13: Biopython: Overview, State of the Art and Outlook

Contributions to Biopython

Code:

●Tm function
●LCC function
●Two checksum functions in Bio.SeqUtils.CheckSum

Other:

●Feedback
●Bug reporting
●Testing (BLAST, SFF files, BioSQL)

Page 14: Biopython: Overview, State of the Art and Outlook

Sequence class

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> seq_1=Seq('GATCGATGGGCCTATATAGGA', IUPAC.unambiguous_dna)
>>> rna_1 = seq_1.transcribe()
>>> str(rna_1)
'GAUCGAUGGGCCUAUAUAGGA'
>>> rna_1.translate()
Seq('DRWAYIG', IUPACProtein())
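
To make clear what the two calls above compute, here is a plain-Python sketch of transcription and translation with the standard genetic code. It is not Biopython's implementation; the helper names `transcribe` and `translate` and the table layout are assumptions chosen for illustration.

```python
# Standard genetic code, codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate an RNA string codon by codon; '*' marks a stop codon."""
    dna = rna.upper().replace("U", "T")
    # Ignore a trailing partial codon, if any.
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


rna_1 = transcribe("GATCGATGGGCCTATATAGGA")
protein_1 = translate(rna_1)
print(rna_1)      # GAUCGAUGGGCCUAUAUAGGA
print(protein_1)  # DRWAYIG
```

The output matches the interactive session on the slide. Unlike the Seq class, this sketch does no alphabet checking and handles only unambiguous bases.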

Page 15: Biopython: Overview, State of the Art and Outlook

Run a BLAST search

from Bio.Blast import NCBIStandalone as BLAST
r, e = BLAST.blastall(b_exe, 'blastn', b_db, f_in,
                      gap_open='3', gap_extend='2', wordsize=20,
                      expectation=1e-50, alignments=1, descriptions=1,
                      align_view='0', html='F')

Parse a BLAST result

from Bio.Blast import NCBIXML
for rec in NCBIXML.parse(r):
    for align in rec.alignments:
        for hsp in align.hsps:
            print hsp.query_start, hsp.query_end
            print hsp.sbjct_start, hsp.sbjct_end
            if hsp.identities>90:
                print align.title
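
The nested loop above walks records, alignments and HSPs. As a rough illustration of the same filtering idea without Biopython, the sketch below reads a BLAST-style XML fragment with the standard library. The XML sample is hand-written, made-up data that mimics the NCBI BLAST XML element names, and `hits_above` is a hypothetical helper, not a Biopython function.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment that mimics the NCBI BLAST XML layout (made-up data).
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>Hypothetical cloning vector</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>120</Hsp_query-to>
              <Hsp_hit-from>301</Hsp_hit-from>
              <Hsp_hit-to>420</Hsp_hit-to>
              <Hsp_identity>118</Hsp_identity>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>Hypothetical short match</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_query-from>5</Hsp_query-from>
              <Hsp_query-to>40</Hsp_query-to>
              <Hsp_hit-from>10</Hsp_hit-from>
              <Hsp_hit-to>45</Hsp_hit-to>
              <Hsp_identity>30</Hsp_identity>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def hits_above(xml_text, min_identities=90):
    """Return the titles of hits having an HSP with more than min_identities."""
    root = ET.fromstring(xml_text)
    titles = []
    for hit in root.iter("Hit"):
        for hsp in hit.iter("Hsp"):
            if int(hsp.findtext("Hsp_identity")) > min_identities:
                titles.append(hit.findtext("Hit_def"))
                break  # one qualifying HSP is enough for this hit
    return titles


strong_hits = hits_above(SAMPLE_XML)
print(strong_hits)
```

Unlike the slide's loop, this sketch reports each hit at most once and uses only the hit definition as its title.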

Page 16: Biopython: Overview, State of the Art and Outlook

Typical bioinformatic problems and Biopython (1/3)

Problem: Sequence manipulation in batch
Tool: SeqRecord and SeqIO

Problem: Filtering vector contamination
Tool: SeqRecord, SeqIO, NCBIXML and NCBIStandalone

Problem: Searching for primers
Tool: Emboss.Applications

Problem: Calculate melting temperature
Tool: SeqUtils
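
As an illustration of the melting temperature problem, the sketch below applies the Wallace rule, Tm = 2(A+T) + 4(G+C), a rough approximation intended for short oligonucleotides. It is a stand-in chosen for brevity and is not claimed to be the method used by the Biopython Tm function; the name `wallace_tm` is hypothetical.

```python
def wallace_tm(oligo):
    """Estimate Tm in degrees Celsius with the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = oligo.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


# A 12-mer with 4 A/T and 8 G/C bases.
tm = wallace_tm("GATCGATGGGCC")
print(tm)  # 40
```

For longer sequences or precise work, thermodynamic (nearest-neighbor) methods are generally preferred over this rule of thumb.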

Page 17: Biopython: Overview, State of the Art and Outlook

Typical bioinformatic problems and Biopython (2/3)

Problem: Introduce mutations with restrictions
Tool: Restriction and Data.CodonTable

Problem: Extract information from alignment
Tool: Clustalw.MultipleAlignCL

Problem: Get a substitution matrix from an alignment
Tool: Align.AlignInfo and SubsMat

Problem: Parse structural data
Tool: PDB.PDBParser

Page 18: Biopython: Overview, State of the Art and Outlook

Typical bioinformatic problems and Biopython (3/3)

Problem: Calculate linkage disequilibrium
Tool: PopGen.GenePop

Problem: Running SIMCOAL2
Tool: PopGen.SimCoal

Problem: Data persistence (in relational database)
Tool: BioSQL

Problem: Retrieve data from Entrez
Tool: Entrez.efetch

Page 19: Biopython: Overview, State of the Art and Outlook

Outlook for Biopython

Current version: 1.53 (December 2009)

For 1.54:

●Updated multiple sequence alignment object
●Bio.Phylo module
●Bio.SeqIO support for Standard Flowgram Format (SFF) files

Next:

●Extending Bio.PDB (GSoC grant)
●Support Python 3

Page 20: Biopython: Overview, State of the Art and Outlook

Additional Resources

Biopython website: www.biopython.org

Documentation: biopython.org/wiki/Category:Wiki_Documentation

Cock PJ, et al. “Biopython: freely available Python tools for computational molecular biology and bioinformatics”. Bioinformatics 2009 Jun 1; 25(11) 1422-3. doi:10.1093/bioinformatics/btp163 pmid:19304878.

Bassi S (2007) A Primer on Python for Life Science Researchers. PLoS Comput Biol 3(11): e199. doi:10.1371/journal.pcbi.0030199

Book: “Python for Bioinformatics” http://tinyurl.com/biopython and www.py4bio.com

Mailing lists:

Users: http://lists.open-bio.org/mailman/listinfo/biopython/

Developers: http://lists.open-bio.org/mailman/listinfo/biopython-dev

Python in Argentina: www.python.org.ar

Page 21: Biopython: Overview, State of the Art and Outlook

Thank you!