Page 1: APIs and Synthetic Biology

1

The API

Uri Laserson | @laserson | [email protected] May 2014

Page 2: APIs and Synthetic Biology

2

The API, or how to make your computational collaborators love you

Uri Laserson | @laserson | [email protected] May 2014

Page 3: APIs and Synthetic Biology

3

The API, or how to make your computational collaborators love you, and also some perspectives on engineering biology and immunology

Uri Laserson | @laserson | [email protected] May 2014

Page 4: APIs and Synthetic Biology

4

Page 5: APIs and Synthetic Biology

5

NCBI Sequence Read Archive (SRA)

Today…1.14 petabytes

One year ago…609 terabytes

Page 6: APIs and Synthetic Biology

For every “-ome” there’s a “-seq”

Genome: DNA-seq

Transcriptome: RNA-seq, FRT-seq, NET-seq

Methylome: Bisulfite-seq

Immunome: Immune-seq

Proteome: PhIP-seq, Bind-n-seq

Page 7: APIs and Synthetic Biology

7

Crappy academic code

    counts_dict = {}
    for chain in vdj.parse_VDJXML(inhandle):
        try:
            counts_dict[chain.junction] += 1
        except KeyError:
            counts_dict[chain.junction] = 1

    for count in counts_dict.itervalues():
        print >>outhandle, np.int_(count)

Page 8: APIs and Synthetic Biology

8

Crappy academic code

    counts_dict = {}
    for chain in vdj.parse_VDJXML(inhandle):
        try:
            counts_dict[chain.junction] += 1
        except KeyError:
            counts_dict[chain.junction] = 1

    for count in counts_dict.itervalues():
        print >>outhandle, np.int_(count)

vs.

    SELECT count(*) FROM antibodies GROUP BY junction
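The contrast on this slide can be sketched in current Python. This is an illustration only, not the speaker's code: the `junctions` list is made-up sample data, the `antibodies` table name is borrowed from the SQL on the slide, and an in-memory SQLite database stands in for whatever store would really hold the reads.

```python
import sqlite3
from collections import Counter

# Hypothetical junction sequences standing in for parsed immune chains.
junctions = ["TGTGCA", "TGTGCG", "TGTGCA", "TGTACC", "TGTGCA"]

# Imperative version: Counter replaces the try/except KeyError loop.
loop_counts = Counter(junctions)

# Declarative version: the database does the grouping.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE antibodies (junction TEXT)")
conn.executemany("INSERT INTO antibodies VALUES (?)", [(j,) for j in junctions])
sql_counts = dict(
    conn.execute("SELECT junction, count(*) FROM antibodies GROUP BY junction")
)
conn.close()

print(sorted(loop_counts.values()))
print(sorted(sql_counts.values()))
```

Both versions produce the same counts; the SQL version states what is wanted and leaves the bookkeeping to the engine.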

Page 9: APIs and Synthetic Biology

9

What is an API?

Page 10: APIs and Synthetic Biology

10

What is an API?

• Application Programming Interface
• Contract (between machines)
• Specifications for:
  1. Procedures and methods
  2. Data structures/messages
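The two halves of that contract can be written down directly. The sketch below is a hypothetical illustration: `Receptor`, `ReceptorStore`, and `InMemoryStore` are invented names, not part of any library mentioned in the talk.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Receptor:
    """Data-structure half of the contract (hypothetical fields)."""
    receptor_id: str
    junction: str


class ReceptorStore(Protocol):
    """Procedure half of the contract: what callers may rely on."""
    def get(self, receptor_id: str) -> Receptor: ...
    def count(self) -> int: ...


class InMemoryStore:
    """One possible implementation; callers never depend on its internals."""
    def __init__(self, receptors):
        self._by_id = {r.receptor_id: r for r in receptors}

    def get(self, receptor_id: str) -> Receptor:
        return self._by_id[receptor_id]

    def count(self) -> int:
        return len(self._by_id)


def describe(store: ReceptorStore, receptor_id: str) -> str:
    # Written against the contract only, so any conforming store works.
    r = store.get(receptor_id)
    return f"{r.receptor_id}: {r.junction} ({store.count()} total)"


store = InMemoryStore([Receptor("r1", "TGTGCA"), Receptor("r2", "TGTACC")])
print(describe(store, "r1"))
```

Swapping `InMemoryStore` for a database-backed class would not require touching `describe`, which is the point of specifying the interface separately.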

Page 11: APIs and Synthetic Biology

11

Stripe API

Page 12: APIs and Synthetic Biology

12

Stripe API

Page 13: APIs and Synthetic Biology

13

Java API

    public interface List<E> {
        int size();
        boolean isEmpty();
        boolean contains(Object o);
        boolean add(E e);
        void add(int index, E element);
        boolean remove(Object o);
    }

Page 14: APIs and Synthetic Biology

14

Python DB API v2.0 (PEP 249)

http://legacy.python.org/dev/peps/pep-0249/
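PEP 249 is a useful case because the standard library's `sqlite3` module implements it. In the sketch below, `junction_counts` uses only calls the specification defines (`cursor`, `execute`, `fetchall`, `close`), so in principle it should work with a connection from any conforming driver. The table and rows are invented sample data.

```python
import sqlite3  # standard-library driver that implements DB-API 2.0


def junction_counts(conn):
    """Count rows per junction using only PEP 249 calls."""
    cur = conn.cursor()
    cur.execute(
        "SELECT junction, count(*) FROM antibodies "
        "GROUP BY junction ORDER BY junction"
    )
    rows = cur.fetchall()
    cur.close()
    return rows


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE antibodies (name TEXT, junction TEXT)")
# sqlite3 uses the "qmark" parameter style; other drivers may use another.
cur.executemany(
    "INSERT INTO antibodies VALUES (?, ?)",
    [("a1", "TGTGCA"), ("a2", "TGTGCA"), ("a3", "TGTACC")],
)
conn.commit()
rows = junction_counts(conn)
conn.close()

print(sqlite3.apilevel, rows)
```

The parameter placeholder style is one of the few things the specification lets drivers choose, which is why it is called out in the comment.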

Page 15: APIs and Synthetic Biology

15

Why use an API?

• Encapsulation/interfaces/abstraction
• Loose-coupling of components
• Reusable services
• Service-oriented architecture

Page 16: APIs and Synthetic Biology

16

LinkedIn’s Loose Coupling Architecture

Page 17: APIs and Synthetic Biology

17

LinkedIn’s Loose Coupling Architecture

Page 18: APIs and Synthetic Biology

18

(If This Then That)

Stitching APIs together

https://ifttt.com/recipes#popular

Page 19: APIs and Synthetic Biology

19

Page 20: APIs and Synthetic Biology

20

IMGT

Page 21: APIs and Synthetic Biology

21

IMGT “Spec”

http://www.imgt.org/IMGTScientificChart/

Page 22: APIs and Synthetic Biology

22

IMGT’s API is an FTP site

Page 23: APIs and Synthetic Biology

23

IMGT does not have an API

    def __initVQUESTform(self):
        # get form
        request = urllib2.Request(
            'http://imgt.cines.fr/IMGT_vquest/vquest?livret=0&Option=humanIg')
        response = urllib2.urlopen(request)
        forms = ClientForm.ParseResponse(
            response,
            form_parser_class=ClientForm.XHTMLCompatibleFormParser,
            backwards_compat=False)
        response.close()
        form = forms[0]

        # fill out base part of form - Synthesis view with no extra options - TEXT
        form['l01p01c03'] = ['inline']
        form['l01p01c07'] = ['2. Synthesis']
        form['l01p01c05'] = ['TEXT']  # may need to be 'TEXT'
        form['l01p01c09'] = ['60']
        form['l01p01c35'] = ['F+ORF+ in-frame P']
        form['l01p01c36'] = ['0']
        form['l01p01c40'] = ['1']  # ['1'] for searching with indels
        form['l01p01c25'] = ['default']
        ...

Page 24: APIs and Synthetic Biology

24

Haussler and genomics services

Page 25: APIs and Synthetic Biology

25

Google Genomics API

Page 26: APIs and Synthetic Biology

26

Google Genomics API

Page 27: APIs and Synthetic Biology

27

Flask/Bottle web server example

    from bottle import route  # assumes Bottle; Flask would use @app.route

    @route("/receptor/<id>")
    def lookup_receptor(id):
        # get the raw read
        pass

    @route("/sample/<sample_id>")
    def sample_summary(sample_id):
        # impl for getting sample information; can return:
        # * summary of repertoire information
        #   (num reads, VDJ distribution, etc.)
        # * demographic info
        pass

    @route("/sample/<sample_id>/common_junctions")
    def common_junctions(sample_id):
        # impl for getting the most common CDR3s
        pass
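To make the handler stubs concrete without depending on a web framework, here is a minimal sketch of what their bodies might return. The `READS` list and its field names are hypothetical sample data; a real service would query a database and the functions would sit under the route decorators.

```python
import json
from collections import Counter

# Hypothetical in-memory data standing in for a real read store.
READS = [
    {"id": "r1", "sample_id": "s1", "junction": "TGTGCA"},
    {"id": "r2", "sample_id": "s1", "junction": "TGTGCA"},
    {"id": "r3", "sample_id": "s1", "junction": "TGTACC"},
    {"id": "r4", "sample_id": "s2", "junction": "TGTGGG"},
]


def lookup_receptor(id):
    """Possible body for /receptor/<id>: return the raw read as JSON."""
    for read in READS:
        if read["id"] == id:
            return json.dumps(read)
    return json.dumps({"error": "not found"})


def sample_summary(sample_id):
    """Possible body for /sample/<sample_id>: a small repertoire summary."""
    reads = [r for r in READS if r["sample_id"] == sample_id]
    return json.dumps({"sample_id": sample_id, "num_reads": len(reads)})


def common_junctions(sample_id, n=10):
    """Possible body for /sample/<sample_id>/common_junctions."""
    counts = Counter(
        r["junction"] for r in READS if r["sample_id"] == sample_id
    )
    return json.dumps(counts.most_common(n))


print(common_junctions("s1"))
```

Each function returns a JSON string, so a client needs to know only the URL pattern and the response shape, not how the data is stored.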

Page 28: APIs and Synthetic Biology

28

Genomics ETL has converged on standards

biochemistry => .fastq => short read alignment => .bam => genotype calling => .vcf => analysis

Page 29: APIs and Synthetic Biology

29

VCF

    ##fileformat=VCFv4.1
    ##fileDate=20090805
    ##source=myImputationProgramV3.1
    ##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
    ##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
    ##phasing=partial
    ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
    ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
    ##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
    ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
    ##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
    ##FILTER=<ID=q10,Description="Quality below 10">
    ##FILTER=<ID=s50,Description="Less than 50% of samples have data">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
    ##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
    #CHR POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
    20 14370 rs605 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
    20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
    20 1110696 rs604 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.6 GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
    20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
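Because the format is specified, a data line can be taken apart mechanically. The sketch below parses the first record from the slide. `parse_vcf_line` is a hypothetical helper, not a library function. It splits on whitespace because the slide text has lost its tabs (real VCF is tab-delimited), and it assumes a numeric QUAL value.

```python
def parse_vcf_line(line, sample_names):
    """Split one VCF data line into fixed fields, INFO, and per-sample calls."""
    fields = line.split()
    chrom, pos, vid, ref, alt, qual, flt, info, fmt = fields[:9]

    # INFO is ';'-separated; bare keys such as DB are flags.
    info_dict = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_dict[key] = value if value else True

    # FORMAT names the ':'-separated subfields of each sample column.
    keys = fmt.split(":")
    samples = {
        name: dict(zip(keys, call.split(":")))
        for name, call in zip(sample_names, fields[9:])
    }
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),
        "qual": float(qual),
        "filter": flt,
        "info": info_dict,
        "samples": samples,
    }


line = (
    "20 14370 rs605 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 "
    "GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,."
)
record = parse_vcf_line(line, ["NA00001", "NA00002", "NA00003"])
print(record["info"]["AF"], record["samples"]["NA00002"]["GT"])
```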

Page 30: APIs and Synthetic Biology

30

What about immune data?

biochemistry => .fastq => short read alignment => .bam => genotype calling => .vcf => analysis

immune receptor alignment => .???

Page 31: APIs and Synthetic Biology

31

Multiple models for same types: VDJFasta

    sub new {
        my ($class) = @_;
        my $self = {};
        $self->{filename} = "";
        $self->{headers} = [];
        $self->{sequence} = [];
        $self->{germline} = [];
        $self->{nseqs} = 0;
        $self->{mids} = {};

        $self->{accVsegQstart} = {}; # example: 124
        $self->{accVsegQend} = {}; # example: 417
        $self->{accJsegQstart} = {};
        $self->{accJsegQend} = {};
        $self->{accDsegQstart} = {};

Page 32: APIs and Synthetic Biology

32

Multiple models for same types: vdj

    class ImmuneChain(SeqRecord):
        def cdr3(self):
            return len(self.junction)

        def num_mutations(self):
            aln = self.letter_annotations['alignment']
            return aln.count('S') + aln.count('I')

        def v(self):
            return self.__getattribute__('V-REGION') \
                .qualifiers['allele'][0]

        def v_seq(self):
            return self.__getattribute__('V-REGION') \
                .extract(self.seq.tostring())

Page 33: APIs and Synthetic Biology

33

Interoperability/services depend on being able to communicate data

Page 34: APIs and Synthetic Biology

34

CSV

    9 CCTG_PRCONS=IGHC1_R1_IGM unproductive Homsap IGHV5-51*01 F, or Homsap IGHV5-51*03 F Homsap IGHJ4*02 F Homsap
    12 GGGG_PRCONS=IGHC3_R1_IGA productive Homsap IGHV3-11*01 F Homsap IGHJ1*01 F Homsap IGHD2-2*03 F .......
    13 CTTC_PRCONS=IGHC5_R1_IGG unproductive Homsap IGHV1-2*02 F Homsap IGHJ5*02 F Homsap IGHD5-18*01 F .......
    18 ACTT_PRCONS=IGHC3_R1_IGA productive Homsap IGKV3-15*01 F, or Homsap IGKV3D-15*01 F or Homsap IGKV3D-15*02 P Homsap
    20 GGAC_PRCONS=IGHC5_R1_IGG productive Homsap IGHV4-61*02 F Homsap IGHJ4*02 F Homsap IGHD1-26*01 F .......
    25 TCGT_PRCONS=IGHC2_R1_IGD productive Homsap IGHV3-23*01 F, or Homsap IGHV3-23*04 F or Homsap IGHV3-23D*01 F Homsap
    26 GGTG_PRCONS=IGHC5_R1_IGG productive Homsap IGHV4-34*01 F, or Homsap IGHV4-34*02 F or Homsap IGHV4-34*08 F Homsap
    28 GTGA_PRCONS=IGHC5_R1_IGG productive Homsap IGHV1-46*01 F, or Homsap IGHV1-46*02 F or Homsap IGHV1-46*03 F Homsap
    31 ACCC_PRCONS=IGHC1_R1_IGM productive Homsap IGHV3-9*01 F, or Homsap IGHV3-9*02 F Homsap IGHJ3*02 F Homsap
    36 GCAA_PRCONS=IGHC1_R1_IGM productive Homsap IGHV3-9*01 F, or Homsap IGHV3-9*02 F Homsap IGHJ2*01 F Homsap
    39 GCAA_PRCONS=IGHC1_R1_IGM productive Homsap IGHV3-7*01 F Homsap IGHJ6*02 F Homsap IGHD1-7*01 F .......
    40 GGGT_PRCONS=IGHC1_R1_IGM productive Homsap IGHV4-34*01 F, or Homsap IGHV4-34*02 F or Homsap IGHV4-34*08 F Homsap
    42 TAGG_PRCONS=IGHC5_R1_IGG productive Homsap IGHV4-39*01 F, or Homsap IGHV4-39*05 F Homsap IGHJ4*02 F Homsap
    47 CAAA_PRCONS=IGHC1_R1_IGM productive Homsap IGHV3-15*01 F, or Homsap IGHV3-15*02 F Homsap IGHJ6*02 F Homsap
    48 AGAA_PRCONS=IGHC5_R1_IGG unproductive Homsap IGHV3-30*04 F, or Homsap IGHV3-30-3*01 F or Homsap IGHV3-30-3*02 F or Ho
    52 GCAG_PRCONS=IGHC1_R1_IGM productive Homsap IGHV3-23*01 F, or Homsap IGHV3-23*04 F or Homsap IGHV3-23D*01 F Homsap
    53 AACC_PRCONS=IGHC3_R1_IGA productive Homsap IGHV3-30*02 F Homsap IGHJ4*02 F Homsap IGHD5-18*01 F .......

Page 35: APIs and Synthetic Biology

35

XML

    <ImmuneChain>
      <c>IGHD</c>
      <barcode>RL014</barcode>
      <j_start_idx>389</j_start_idx>
      <seq>TTGTGGCTATTTTAAA ... CTCGGACT</seq>
      <descr>003699_0091_0140</descr>
      <tag>coding</tag>
      <clone>IGHV3-43_IGHJ4|387</clone>
      <j>IGHJ4*02</j>
      <v_end_idx>314</v_end_idx>
      <v>IGHV3-43*01</v>
      <junction>TGTGCAAAAGATAATCT ... TCTTTGACTACTGG</junction>
      <d>IGHD5-24*01</d>
    </ImmuneChain>
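A record like this is easy to read with the standard library, although every value arrives as text. The sketch below parses a subset of the fields from the slide (the elided sequences are left out) and converts the two index fields by hand. The choice of which fields are integers is an assumption based on their names.

```python
import xml.etree.ElementTree as ET

xml_text = """
<ImmuneChain>
  <barcode>RL014</barcode>
  <j_start_idx>389</j_start_idx>
  <j>IGHJ4*02</j>
  <v_end_idx>314</v_end_idx>
  <v>IGHV3-43*01</v>
  <d>IGHD5-24*01</d>
</ImmuneChain>
""".strip()

root = ET.fromstring(xml_text)

# Flatten the child elements into a dict of tag -> text.
chain = {child.tag: child.text for child in root}

# XML carries no type information, so numeric fields are converted explicitly.
for key in ("j_start_idx", "v_end_idx"):
    chain[key] = int(chain[key])

print(chain["v"], chain["j_start_idx"] - chain["v_end_idx"])
```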

Page 36: APIs and Synthetic Biology

36

JSON

    {
      "v": "IGHV4-39*02",
      "seq": "CCTATCCCCCTGTGTGCCTT ... CTCCACCAAG",
      "num_mutations": 43,
      "name": "HG2DXMN01CY8UH",
      "letter_annotations": {
        "alignment": "..............S....S....3333333333333333........S.."
      },
      "junction_nt": "GCGAGGGGCCGATGGGACTTTTATTACATGGACGTC",
      "j": "IGHJ6*03",
      "annotations": {
        "usearch_90_cluster": "6277",
        "experiment_date": "20120119",
        "donor": "17517",
        "sample_type": "memory_B_cells",
        "source": "SeqWright",
        "tags": ["revcomp", "coding"],
        "taxonomy": []
      },
      "d": "IGHD3-10*01",
      "features": [
        {
          "strand": 1,
          "type": "V-REGION",
          "location": [51, 356],
          "qualifiers": {
            "CDR_length": ["[10.7.2]"],
            "codon_start": ["1"],
            "gene": ["IGHV4-39"],
            "allele": ["IGHV4-39*02"]
          }
        },
        ...
      ]
    }

http://www.json.org/
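JSON keeps numbers, lists, and nesting intact, so a record survives a round trip without hand-written conversion. The sketch below uses a trimmed copy of the record from the slide; the elided fields are omitted.

```python
import json

# Trimmed copy of the record above; elided fields are omitted.
record = {
    "v": "IGHV4-39*02",
    "j": "IGHJ6*03",
    "d": "IGHD3-10*01",
    "num_mutations": 43,
    "junction_nt": "GCGAGGGGCCGATGGGACTTTTATTACATGGACGTC",
    "annotations": {"donor": "17517", "tags": ["revcomp", "coding"]},
    "features": [{"type": "V-REGION", "strand": 1, "location": [51, 356]}],
}

# Serialize to text and read it back, as a client of a service would.
text = json.dumps(record, sort_keys=True)
restored = json.loads(text)

# Nested structures and numeric types come back unchanged.
v_region = next(f for f in restored["features"] if f["type"] == "V-REGION")
start, end = v_region["location"]
print(restored["num_mutations"], end - start)
```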

Page 37: APIs and Synthetic Biology

37

JSON

    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000000" }, "annotations" : { "D-REGION" : "IGHD3-10*01", "accessions" : "HG2DXMN01CY8
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000001" }, "annotations" : { "D-REGION" : "IGHD3-9*01", "accessions" : "HG2DXMN01A3VH
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000002" }, "annotations" : { "D-REGION" : "IGHD3-10*01", "accessions" : "HG2DXMN01BC6
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000003" }, "annotations" : { "D-REGION" : "IGHD6-19*01", "accessions" : "HG2DXMN01DYU
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000004" }, "annotations" : { "D-REGION" : "IGHD6-19*01", "accessions" : "HG2DXMN01A8F
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000005" }, "annotations" : { "D-REGION" : "IGHD3-9*01", "accessions" : "HG2DXMN01BDI2
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000006" }, "annotations" : { "D-REGION" : "IGHD6-19*01", "accessions" : "HG2DXMN01BS2
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000007" }, "annotations" : { "D-REGION" : "IGHD6-19*01", "accessions" : "HG2DXMN01DLL
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000008" }, "annotations" : { "D-REGION" : "IGHD6-25*01", "accessions" : "HG2DXMN01BLF
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000009" }, "annotations" : { "D-REGION" : "IGHD3-3*01", "accessions" : "HG2DXMN01D4TL
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c617230800000a" }, "annotations" : { "D-REGION" : "IGHD3-10*01", "accessions" : "HG2DXMN01BU6
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c617230800000b" }, "annotations" : { "D-REGION" : "IGHD2-2*03", "accessions" : "HG2DXMN01BIMG
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c617230800000c" }, "annotations" : { "D-REGION" : "IGHD3-3*01", "accessions" : "HG2DXMN01BM9Z
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c617230800000d" }, "annotations" : { "D-REGION" : "IGHD2-2*03", "accessions" : "HG2DXMN01BH9Q
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c617230800000e" }, "annotations" : { "D-REGION" : "IGHD6-19*01", "accessions" : "HG2DXMN01BR3

Page 38: APIs and Synthetic Biology

38

Binary formats

• Protobuf, Thrift, or Avro
• Flexible data model
  • All common primitive types (e.g. int, double, string)
  • Support nested types, including arrays and maps
• Efficient binary encoding
• Code generation for many languages (binary compatible)
• Support for schema evolution
• Support IDL for data types and services
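A rough feel for why a schema-driven binary encoding is compact: when both sides share the schema, field names need not travel with every message. The sketch below uses Python's `struct` module with a hand-rolled, hypothetical three-field layout. It is not the Protobuf, Thrift, or Avro wire format, only an illustration of the idea.

```python
import json
import struct

# Three integer fields borrowed from the records shown earlier.
record = {"v_end_idx": 314, "j_start_idx": 389, "num_mutations": 43}

# Hypothetical fixed schema: three unsigned 16-bit integers, little-endian.
SCHEMA = struct.Struct("<HHH")
FIELDS = ("v_end_idx", "j_start_idx", "num_mutations")

# Binary form: values only, in schema order.
binary = SCHEMA.pack(*(record[name] for name in FIELDS))

# Text form: field names are repeated inside every message.
text = json.dumps(record).encode("utf-8")

# The shared schema is what lets the reader reattach names to values.
decoded = dict(zip(FIELDS, SCHEMA.unpack(binary)))
print(len(binary), len(text), decoded == record)
```

The real frameworks add what this sketch lacks: variable-length and nested types, optional fields, and rules for evolving the schema without breaking old readers.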

Page 39: APIs and Synthetic Biology

39

Thrift example: Twitter

    service Twitter {
      void ping();
      bool postTweet(1:Tweet tweet);
      TweetSearchResult searchTweets(1:string query);
    }

    struct Tweet {
      1: required i32 userId;
      2: required string userName;
      3: required string text;
      4: optional Location loc;
      16: optional string language = "english"
    }
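The required/optional distinction in the `Tweet` struct is what makes schema evolution workable: a reader can accept a message written before an optional field existed. The Python sketch below mimics that behavior with a dataclass. It is not code generated by Thrift; the fields of `Location` and the helper `tweet_from_dict` are hypothetical, since the slide does not define them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    # Hypothetical fields; the slide does not show the Location struct.
    latitude: float
    longitude: float


@dataclass
class Tweet:
    user_id: int                     # 1: required i32 userId
    user_name: str                   # 2: required string userName
    text: str                        # 3: required string text
    loc: Optional[Location] = None   # 4: optional Location loc
    language: str = "english"        # 16: optional string language


def tweet_from_dict(message: dict) -> Tweet:
    """Build a Tweet; required keys raise KeyError, optional ones default."""
    loc = message.get("loc")
    return Tweet(
        user_id=message["userId"],
        user_name=message["userName"],
        text=message["text"],
        loc=Location(**loc) if loc else None,
        language=message.get("language", "english"),
    )


# A message from an older writer that predates the optional fields.
old_message = {"userId": 1, "userName": "laserson", "text": "hello"}
tweet = tweet_from_dict(old_message)
print(tweet.language, tweet.loc)
```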

Page 40: APIs and Synthetic Biology

40

Thrift example: Immune receptor

    cd ~/repos/kiwi
    thrift --gen java kiwi-format/src/main/resources/thrift/kiwi.thrift
    thrift --gen py:new_style kiwi-format/src/main/resources/thrift/kiwi.thrift

See: https://github.com/laserson/kiwi

Page 41: APIs and Synthetic Biology

41

Questions?

Page 42: APIs and Synthetic Biology

42

Biological parts specifications

• Library of parts with well-characterized input-output characteristics

• In total, similar to API spec

Canton, Nat. Biotech. 26: 787 (2008)

Page 43: APIs and Synthetic Biology

43

Engineering signaling pathways at inputs/outputs

Lim, Nat. Rev. Mol. Cell 11: 393 (2010)

Page 44: APIs and Synthetic Biology

44

Bottom-up genetic circuit design

Brophy, Nature Meth. 11: 508 (2014)

Page 45: APIs and Synthetic Biology

45

Bottom-up genetic circuit design

Brophy, Nature Meth. 11: 508 (2014)

Page 46: APIs and Synthetic Biology

46

Predict composability of genetic elements

Kosuri, PNAS 110: 14024 (2013)

• 114 promoters x 111 RBS

“…rather than relying on prediction or standardization, we can screen synthetic libraries for desired behavior.”

Page 47: APIs and Synthetic Biology

47

ZFN => TALEN => CRISPR/Cas

ZFN end: least addressable, most expensive to create

CRISPR/Cas end: most addressable, cheapest to create

Page 48: APIs and Synthetic Biology

48

Addressability for precision nanoscale engineering

Douglas, NAR 37: 5001 (2009)

Page 49: APIs and Synthetic Biology

49

Addressability for precision nanoscale engineering

Douglas, Nature 459: 414 (2009)

Page 50: APIs and Synthetic Biology

50

Evolution for encapsulation: an evolved electronic thermometer

http://www.genetic-programming.com/hc/thermometer.html

Page 51: APIs and Synthetic Biology

51

Lycopene synthesis optimization

Wang, Nature 460: 894 (2009)

Page 52: APIs and Synthetic Biology

52

Evolutionary encapsulation for signaling pathway engineering

Peisajovich, Science 328: 368 (2010)

Page 53: APIs and Synthetic Biology

53

Evolutionary encapsulation for signaling pathway engineering

Peisajovich, Science 328: 368 (2010)

Page 54: APIs and Synthetic Biology

54

Genetic isolation with Re.coli

Lajoie, Science 342: 357 (2013)

Page 55: APIs and Synthetic Biology
Page 56: APIs and Synthetic Biology

So far, we discussed antibody-only data analysis

Page 57: APIs and Synthetic Biology

Antigen-only data generation

Page 58: APIs and Synthetic Biology

Larman, Nat. Biotech. 29: 535 (2011)

Ben Larman

Steve Elledge

Agilent OLS array

Page 59: APIs and Synthetic Biology

59

Phage immunoprecipitation sequencing (PhIP-seq)

Page 60: APIs and Synthetic Biology

60

PhIP-seq proof-of-principle

[Figure: Patient A Replica 1 vs. Patient A Replica 2, both axes in log10(-log10 P-value); labeled points: SAPK4, NOVA1, TGIF2LX]

Page 61: APIs and Synthetic Biology

61

‘Forward vaccinology’

Page 62: APIs and Synthetic Biology

62

‘Reverse vaccinology’

Page 63: APIs and Synthetic Biology

63

‘Immunization without vaccination’

Page 64: APIs and Synthetic Biology

64

Encapsulation for cancer immunotherapy through TMG processing

Tran, Science 344: 641 (2014)

Page 65: APIs and Synthetic Biology

65

Other examples?

Page 66: APIs and Synthetic Biology

66

Conclusions

• The API perspective helps organize and communicate data

• Use sane file formats if possible:
  • JSON for lightweight work
  • Thrift/Avro for heavyweight serialization/communication
• Decouple data modeling from implementation details
• Biological engineering: what abstractions are available?
• Evolution as nature’s encapsulator

Page 67: APIs and Synthetic Biology

67

