asociacion-argentina-de-bioinformatica-y-biologia-computacional
A few words about Python:
Python is a general-purpose, high-level, dynamic programming language.
It supports multiple programming paradigms (OOP, imperative and functional programming).
It features a fully dynamic type system and automatic memory management.
● Easy to learn
● Easy to read (looks like pseudocode)
● Interpreted (compiled to VM bytecode; fast to program)
● High-level data structures (lists, dictionaries, sets and more)
● Multiplatform (from supercomputers to phones)
● "Batteries included" philosophy
● Extensive third-party libraries
● Free (as in freedom and as in beer)
● Strong community
Python features
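As a small illustration of the high-level data structures listed above, the following sketch counts bases with a dictionary, collects the distinct bases in a set, and ranks them in a list. The sequence and all variable names are made up for this example.

```python
# Hypothetical example sequence, chosen only for illustration.
sequence = "GATCGATGGGCCTATATAGGA"

# Dictionary: map each base to the number of times it occurs.
counts = {}
for base in sequence:
    counts[base] = counts.get(base, 0) + 1

# Set: the distinct bases present in the sequence.
alphabet = set(sequence)

# List: bases ordered from most to least frequent (ties broken alphabetically).
ranking = sorted(counts, key=lambda base: (-counts[base], base))

print(counts)
print(sorted(alphabet))
print(ranking)
```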
Read a file, load an array and sort it (VB)
Dim i, j, Array_Used As Integer
Dim MyArray() As String
Dim InBuffer, Temp As String
Array_Used = 0
ReDim MyArray(50)
'open a text file here . . .
Do While Not EOF(file_no)
    Line Input #file_no, MyArray(Array_Used)
    Array_Used = Array_Used + 1
    If Array_Used = UBound(MyArray) Then
        ReDim Preserve MyArray(UBound(MyArray) + 50)
    End If
Loop
'simple bubble sort
For i = Array_Used - 1 To 0 Step -1
    For j = 1 To i
        If MyArray(j - 1) > MyArray(j) Then
            'swap
            Temp = MyArray(j - 1)
            MyArray(j - 1) = MyArray(j)
            MyArray(j) = Temp
        End If
    Next
Next
Read a file, load an array and sort it (Python)
# Open the filehandle
file_object = open(FILENAME)
# Read all lines and store them in a list
lista = file_object.readlines()
# Sort the list
lista.sort()
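A slightly more defensive variant of the snippet above, offered as a sketch: the `with` statement closes the file even if reading fails, and `sorted` returns a new list instead of sorting in place. The helper name `read_sorted_lines` and the sample file contents are assumptions made for this example.

```python
import os
import tempfile


def read_sorted_lines(path):
    """Return the lines of a text file as a sorted list (hypothetical helper)."""
    with open(path) as file_object:
        # Iterating over a file object yields its lines, newline included.
        return sorted(file_object)


# Usage: write a small throwaway file, then read it back sorted.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("pear\napple\nmango\n")

result = read_sorted_lines(tmp.name)
os.remove(tmp.name)
print(result)
```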
What can be done with Python?
from pylab import *
from data_helper import get_daily_data

intc, msft = get_daily_data()
delta1 = diff(intc.open)/intc.open[0]

# size in points ^2
volume = (15*intc.volume[:-2]/intc.volume[0])**2
close = 0.003*intc.close[:-2]/0.003*intc.open[:-2]
scatter(delta1[:-1], delta1[1:], c=close, s=volume, alpha=0.75)

ticks = arange(-0.06, 0.061, 0.02)
xticks(ticks)
yticks(ticks)

xlabel(r'$\Delta_i$', fontsize=20)
ylabel(r'$\Delta_{i+1}$', fontsize=20)
title('Volume and percent change')
grid(True)
show()
Robots (made in Argentina)
www.robotia.com.ar
Biopython
A set of freely available Python tools for bioinformatics and molecular biology
Features include:
● Parsing bioinformatics files into Python structures
● A sequence class to store sequences, ids and features
● Interface to popular bioinformatics programs (clustalw, blast, primer3 and more)
● Tools for performing common operations on DNA/protein sequences (translation, transcription, Tm, weight)
● Code to deal with alignments
● Integration with other languages via BioCorba
Biopython in the lab (real world usage)
Contributions to Biopython
Code:
● Tm function
● LCC function
● Two checksum functions in Bio.SeqUtils.CheckSum
Other:
● Feedback
● Bug reporting
● Testing (BLAST, SFF files, BioSQL)
Sequence class
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> seq_1 = Seq('GATCGATGGGCCTATATAGGA', IUPAC.unambiguous_dna)
>>> rna_1 = seq_1.transcribe()
>>> str(rna_1)
'GAUCGAUGGGCCUAUAUAGGA'
>>> rna_1.translate()
Seq('DRWAYIG', IUPACProtein())
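To show what transcribe() and translate() compute in the session above, here is a plain-Python sketch that does not use Biopython. The function names and the way the table is built are written for this example only. The table is the standard genetic code. The sketch ignores ambiguous bases and drops an incomplete trailing codon, and it is not meant to reflect how Biopython implements these methods.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, in TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to RNA: replace every T with U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate codon by codon; a trailing partial codon is dropped."""
    dna = rna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


rna = transcribe("GATCGATGGGCCTATATAGGA")
print(rna)             # GAUCGAUGGGCCUAUAUAGGA
print(translate(rna))  # DRWAYIG
```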
Run a BLAST search
from Bio.Blast import NCBIStandalone as BLAST
r, e = BLAST.blastall(b_exe, 'blastn', b_db, f_in,
                      gap_open='3', gap_extend='2', wordsize=20,
                      expectation=1e-50, alignments=1, descriptions=1,
                      align_view='0', html='F')
Parse a BLAST result
from Bio.Blast import NCBIXML
for rec in NCBIXML.parse(r):
    for align in rec.alignments:
        for hsp in align.hsps:
            print hsp.query_start, hsp.query_end
            print hsp.sbjct_start, hsp.sbjct_end
            if hsp.identities>90:
                print align.title
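The loop above walks records, alignments and HSPs, and reports hits with more than 90 identical positions. The sketch below applies the same filter with only the standard library, to make the structure being traversed visible. The XML fragment is made up and heavily trimmed, the helper name `strong_hits` is hypothetical, and the element names are those of NCBI's BLAST XML output as understood here; check them against a real report before relying on them.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed BLAST XML; a real report has many more fields.
SAMPLE_XML = """<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit>
  <Hit_def>hypothetical vector sequence</Hit_def>
  <Hit_hsps>
    <Hsp>
      <Hsp_query-from>5</Hsp_query-from>
      <Hsp_query-to>104</Hsp_query-to>
      <Hsp_hit-from>200</Hsp_hit-from>
      <Hsp_hit-to>299</Hsp_hit-to>
      <Hsp_identity>98</Hsp_identity>
    </Hsp>
  </Hit_hsps>
</Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""


def strong_hits(xml_text, min_identities=90):
    """Yield (title, query_start, query_end) for HSPs above the cutoff."""
    root = ET.fromstring(xml_text)
    for hit in root.iter("Hit"):
        title = hit.findtext("Hit_def")
        for hsp in hit.iter("Hsp"):
            # Same test as the slide: number of identical positions > cutoff.
            if int(hsp.findtext("Hsp_identity")) > min_identities:
                yield (title,
                       int(hsp.findtext("Hsp_query-from")),
                       int(hsp.findtext("Hsp_query-to")))


hits = list(strong_hits(SAMPLE_XML))
print(hits)
```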
Typical bioinformatic problems and Biopython (1/3)
Problem: Sequence manipulation in batch
Tool: SeqRecord and SeqIO
Problem: Filtering vector contamination
Tool: SeqRecord, SeqIO, NCBIXML and NCBIStandalone
Problem: Searching for primers
Tool: Emboss.Applications
Problem: Calculate melting temperature
Tool: SeqUtils
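For the melting-temperature problem, a rough idea of the calculation can be given with the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is only a coarse estimate for short oligonucleotides. This is a swapped-in simplification for illustration; the SeqUtils function mentioned in these slides may use a different and more accurate method. The function name `wallace_tm` is chosen here.

```python
def wallace_tm(primer):
    """Rule-of-thumb melting temperature (degrees Celsius) for a short oligo."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    # Wallace rule: 2 degrees per A/T, 4 degrees per G/C.
    return 2 * at + 4 * gc


print(wallace_tm("GATCGATGGGCCTATATAGGA"))
```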
Typical bioinformatic problems and Biopython (2/3)
Problem: Introduce mutations with restrictions
Tool: Restriction and Data.CodonTable
Problem: Extract information from alignment
Tool: Clustalw.MultipleAlignCL
Problem: Get a substitution matrix from an alignment
Tool: Align.AlignInfo and SubsMat
Problem: Parse structural data
Tool: PDB.PDBParser
Typical bioinformatic problems and Biopython (3/3)
Problem: Calculate linkage disequilibrium
Tool: PopGen.GenePop
Problem: Running SIMCOAL2
Tool: PopGen.SimCoal
Problem: Data persistence (in relational database)
Tool: BioSQL
Problem: Retrieve data from Entrez
Tool: Entrez.efetch
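As background for the linkage disequilibrium problem listed above, the basic two-locus quantities can be sketched in a few lines: D = p(AB) - p(A)p(B), and r squared = D squared / (p(A)(1 - p(A))p(B)(1 - p(B))). This is a textbook calculation written for illustration, with made-up haplotype counts and a hypothetical function name. It does not reproduce what PopGen.GenePop computes.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) from counts of the four two-locus haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    # r squared is undefined when either locus is monomorphic.
    r_squared = d * d / denominator if denominator > 0 else float("nan")
    return d, r_squared


# Made-up counts: 50 AB, 10 Ab, 10 aB and 30 ab haplotypes.
d, r2 = linkage_disequilibrium(50, 10, 10, 30)
print(round(d, 3), round(r2, 3))
```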
Outlook for Biopython
Current version: 1.53 (December 2009)
For 1.54:
● Updated multiple sequence alignment object
● Bio.Phylo module
● Bio.SeqIO support for Standard Flowgram Format (SFF) files
Next:
● Extending Bio.PDB (GSoC grant)
● Support Python 3
Additional Resources
Biopython website: www.biopython.org
Documentation: biopython.org/wiki/Category:Wiki_Documentation
Cock PJ, et al. “Biopython: freely available Python tools for computational molecular biology and bioinformatics”. Bioinformatics 2009 Jun 1; 25(11) 1422-3. doi:10.1093/bioinformatics/btp163 pmid:19304878.
Bassi S (2007) A Primer on Python for Life Science Researchers. PLoS Comput Biol 3(11): e199. doi:10.1371/journal.pcbi.0030199
Book: “Python for Bioinformatics” http://tinyurl.com/biopython and www.py4bio.com
Mailing list:
Users: http://lists.open-bio.org/mailman/listinfo/biopython/
Developers: http://lists.open-bio.org/mailman/listinfo/biopython-dev
Python in Argentina: www.python.org.ar
Thank you!