Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems
Lisa Sach-Peltason, Data Science, pRED Informatics, Roche Basel


Presented by Lisa Sach-Peltason (Roche, Basel) at 2014 Bio-IT World


Page 1: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Informatics

Bridging the gap between small-molecule and large-molecule systems

Lisa Sach-Peltason

Data Science, pRED Informatics, Roche Basel

Page 2: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Therapeutics – An Emerging Modality

US FDA approved drugs (2009-2011)

Small molecule: 34
Protein: 9
Monoclonal antibody: 8
Peptide: 8
Natural product: 6
Amino acid: 5
Steroid: 2
Nucleoside: 1
Enzyme: 1
Macrocycle: 1
Other: 1

Adapted from Albericio & Kruger; Future Med. Chem. (2012), 4(12), 1527-1531.

Page 3: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Therapeutics – An Emerging Modality

Saladin et al.; IDrugs (2009), 12(12), 779-784.

[Figure: Therapeutic categories of peptide candidates entering clinical trials (1980-2007)]

Page 4: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Therapeutics – Opportunities

Modality | Selectivity | Generation | Intracellular access | Delivery | Action | Oral delivery
Small molecules | Low to high | synthetic | High | all routes | Antagonist / Agonist | Yes
Peptides | High | synthetic or recombinant | Possible | i.v. / s.c.; non-parenteral delivery feasible | Agonist / Antagonist | Potential
Biologics | High | recombinant | Low | i.v. / s.c. | Antagonist / Agonist | No

Proven Advantages of Peptides

• Efficacy at extracellular targets, especially for polar or shallow binding pockets

• Rapid optimization

• Low off-target pharmacology

• High target selectivity

* Reflects current status; future potential for peptide antagonists, e.g., PPIs

Page 5: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptides at Roche

▪ Growing asset of internal and external peptide compounds

• Global Roche compound DB: >25,000 compounds registered with PEPTIDE flag (of 3.9M total)

• Increasing demand for informatics infrastructure and support for peptide projects

[Figure: combination chart "Peptides in IRCI 2003-2013", plotted per quarter from Q1 2003 to Q3 2013. Axes: new registrations (0 to 993) and total no. of peptides (15,000 to 26,200).]

Page 6: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Therapeutics – Informatics Challenges

Cheminformatics (molecule graphs)

• Similarity searching
• SAR analysis, visualization
• Property prediction
• Small-molecule registration

Bioinformatics (sequences)

• Sequence searching
• Alignment
• Sequence analysis

Peptide informatics

• Size, complexity
• Non-standard residues
• Chemical modifications
• No format standards

Figure adapted from J. H. Jensen, ChemAxon European UGM, 2012

Page 7: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Data Capture Challenges

Peptide sequence format

▪ IUPAC-IUB Nomenclature and Symbolism for Amino Acids and Peptides (“3AA”, 1983)

• 3-letter code for standard and common non-standard amino acids

• Symbolism for representing amino acid sequences

Example: H-Asp-Arg-Val-DTyr-Ile-His-Pro-Phe-OH

• N-terminal specification: H- (alternatives: Ac-, Boc-, …)
• Residue: three-letter symbol, e.g. Asp
• Stereoconfiguration: D prefix, e.g. DTyr
• Separator / peptide bond: -
• C-terminal specification: -OH (alternatives: -NH2, -H, …)
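The notation on this slide can be split mechanically into its parts. The following is a minimal sketch, not the parser used at Roche: the allowed terminal groups are an assumption taken from the examples on the slide, residues are assumed to be plain three-letter symbols with an optional D prefix, and `parse_linear` and `Residue` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed terminal groups, taken from the slide's examples; extend as needed.
N_TERMINI = {"H", "Ac", "Boc"}
C_TERMINI = {"OH", "NH2"}

# Optional "D" stereo prefix followed by a three-letter residue symbol.
RESIDUE = re.compile(r"(D?)([A-Z][a-z]{2})")


@dataclass(frozen=True)
class Residue:
    symbol: str  # three-letter symbol without stereo prefix, e.g. "Tyr"
    stereo: str  # "D" if prefixed, otherwise "L" (assumed default)


def parse_linear(sequence: str):
    """Split 'H-Asp-...-OH' into (n_terminus, residues, c_terminus)."""
    tokens = sequence.strip().split("-")
    if len(tokens) < 3:
        raise ValueError("expected N-terminus, at least one residue, C-terminus")
    n_term, *middle, c_term = tokens
    if n_term not in N_TERMINI:
        raise ValueError(f"unknown N-terminal group: {n_term!r}")
    if c_term not in C_TERMINI:
        raise ValueError(f"unknown C-terminal group: {c_term!r}")
    residues = []
    for token in middle:
        match = RESIDUE.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized residue token: {token!r}")
        residues.append(Residue(match.group(2), "D" if match.group(1) else "L"))
    return n_term, residues, c_term


n_term, residues, c_term = parse_linear("H-Asp-Arg-Val-DTyr-Ile-His-Pro-Phe-OH")
```

Splitting on the separator only works while residue tokens contain no hyphen themselves, so symbols with substituents in parentheses would need a real tokenizer.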

Page 8: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Data Capture Challenges

How to capture non-standard sequence elements?

▪ Residue symbols

• L-Norvaline: Nva (discouraged by IUPAC but commonly used)
• L-2-Aminovaleric acid? Avl (IUPAC)
• L-2-Aminopentanoic acid? Ape (IUPAC)

▪ Modified amino acids

• L-4-Benzoylphenylalanine: 4Bpa
• Phe(4-Bz) (systematic; avoids a combinatorial explosion of symbols)
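One way to cope with competing symbols for the same residue is an alias table that maps every known symbol to a single preferred one. The sketch below assumes "Nva" and "Phe(4-Bz)" as the preferred symbols purely for illustration; the slide does not say which symbol Roche chose, and `build_alias_map` and `normalize` are hypothetical names.

```python
# Hypothetical preferred symbols and their aliases (symbols from the slide).
PREFERRED = {
    "Nva": {"Avl", "Ape"},
    "Phe(4-Bz)": {"4Bpa"},
}


def build_alias_map(preferred):
    """Map every alias (and the preferred symbol itself) to the preferred symbol."""
    alias_map = {}
    for canonical, aliases in preferred.items():
        for name in {canonical, *aliases}:
            # An alias pointing at two different residues would be ambiguous.
            if alias_map.get(name, canonical) != canonical:
                raise ValueError(f"ambiguous alias: {name!r}")
            alias_map[name] = canonical
    return alias_map


ALIASES = build_alias_map(PREFERRED)


def normalize(residues):
    """Replace known aliases; symbols without an entry pass through unchanged."""
    return [ALIASES.get(symbol, symbol) for symbol in residues]


normalized = normalize(["Ala", "Avl", "4Bpa", "Ape"])
```

Rejecting ambiguous aliases at build time keeps the table usable as a lookup during registration.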

Page 9: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Data Capture Challenges

How to capture non-standard sequence elements?

▪ Cyclic peptides

▪ Cross-links (disulfide bridges within or across chains, isopeptide bonds, …)

[Figure: chemical structure drawing of a cyclic peptide]

(IUPAC recommendation, depiction rather than text)

cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]

H-Cys(1)-Tyr-Ile-Gln-Asn-Cys(1)-Pro-Leu-Gly-NH2

(IUPAC)

SMILES-like notation; see also Biochemfusion’s PLN
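Both text notations on this slide can be read by a short script. The sketch below assumes that a number in parentheses after a residue, as in Cys(1), marks one end of a cross-link and that each label occurs exactly twice; that reading is an interpretation of the example, and `split_cyclo` and `find_cross_links` are hypothetical helper names.

```python
import re
from collections import defaultdict

# A residue symbol followed by a numeric cross-link label, e.g. "Cys(1)".
LABELLED = re.compile(r"([A-Za-z0-9]+)\((\d+)\)")


def split_cyclo(sequence):
    """Return the residue tokens of 'cyclo[...]', or None if not head-to-tail cyclic."""
    match = re.fullmatch(r"cyclo\[(.+)\]", sequence)
    return match.group(1).split("-") if match else None


def find_cross_links(sequence):
    """Return {label: (first_position, second_position)} with 1-based residue positions."""
    residues = sequence.split("-")[1:-1]  # drop the terminal groups
    positions = defaultdict(list)
    for index, token in enumerate(residues, start=1):
        match = LABELLED.fullmatch(token)
        if match:
            positions[int(match.group(2))].append(index)
    for label, where in positions.items():
        if len(where) != 2:
            raise ValueError(f"cross-link label {label} must occur exactly twice")
    return {label: tuple(where) for label, where in positions.items()}


links = find_cross_links("H-Cys(1)-Tyr-Ile-Gln-Asn-Cys(1)-Pro-Leu-Gly-NH2")
ring = split_cyclo("cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]")
```

For the example sequence, `links` pairs residues 1 and 6, the two labelled cysteines.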

Page 10: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Data Inventory

Digest Roche peptides with NextMove’s Sugar&Splice

[Figure: bar chart "Top 50 monomer frequencies of 23k Roche peptides", frequency axis from 0 to 26,000]

Standard AA (without Gly and Pro): 93%

Top 50 monomers: 98%
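A coverage figure of this kind can be computed by counting monomer occurrences across all peptides. The sketch below assumes the percentages are shares of all monomer occurrences, which the slide does not state explicitly; the toy data and the function name `monomer_coverage` are made up for illustration.

```python
from collections import Counter


def monomer_coverage(peptides, top_n):
    """Return the top_n most frequent monomers and their share of all occurrences."""
    counts = Counter(monomer for peptide in peptides for monomer in peptide)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no monomers to count")
    top = counts.most_common(top_n)
    covered = sum(count for _, count in top)
    return top, covered / total


# Toy input: each peptide is a list of monomer symbols.
toy_peptides = [
    ["Ala", "Gly", "Ala"],
    ["Ala", "Orn"],
    ["Gly", "Nva"],
]
top, share = monomer_coverage(toy_peptides, top_n=2)
```

In the toy data the two most frequent monomers account for 5 of 7 occurrences.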

Page 11: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Data Inventory

Monomer library

Roche Peptide Building Blocks

• ~200 manually curated templates

• Up to 600 monomers extracted from Roche peptides

• Direct cartridge with normalization & uniqueness check

Structure | ID | Short Name | Chemical Name | Category | CAS | Roche Number
(structure) | Ala | A | L-Alanine | L-AA | 56-41-7 | ROxyz
(structure) | Fmoc | Fmoc | 9-Fluorenylmethoxycarbonyl | SAG | |

Sequence registration

Peptide drawing
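The idea of a monomer library with a uniqueness check can be illustrated in a few lines. This is a sketch only: the slide describes a chemical database cartridge, whereas the code below uses an in-memory dictionary, and `structure_key` is a hypothetical stand-in for a normalized structure identifier. The field names follow the table columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monomer:
    monomer_id: str
    short_name: str
    chemical_name: str
    category: str
    structure_key: str  # hypothetical normalized structure identifier


class MonomerLibrary:
    """In-memory registry enforcing unique IDs and unique structures."""

    def __init__(self):
        self._by_id = {}
        self._by_structure = {}

    def register(self, monomer):
        key = monomer.structure_key.strip()  # placeholder for real normalization
        if monomer.monomer_id in self._by_id:
            raise ValueError(f"duplicate ID: {monomer.monomer_id}")
        if key in self._by_structure:
            existing = self._by_structure[key].monomer_id
            raise ValueError(f"structure already registered as {existing}")
        self._by_id[monomer.monomer_id] = monomer
        self._by_structure[key] = monomer

    def lookup(self, monomer_id):
        return self._by_id[monomer_id]


library = MonomerLibrary()
library.register(Monomer("Ala", "A", "L-Alanine", "L-AA", "KEY-ALA"))
```

Checking both the ID and the structure key prevents the same building block from entering the library under two names.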

Page 12: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Sequence Information

Harmonizing peptide registration

Compound registration system

• Draw structure from local monomer templates
• Enter sequence manually in the LINEAR STRUCTURE DESCRIPTION field: H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly- ... -OH
• PEPTIDE comment
• No format standards or validation

Page 13: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Sequence Information

Harmonizing peptide registration

Synchronize drawing templates with monomer library

Automatic sequence generation & validation

Consistent structure and sequence information

Atoms and bonds

• Chemical identification
• Novelty check
• (Sub-)Structure searches

Sequence

• Depiction
• Visual comparison
• Sequence searches

Tools for data analysis

Building block library

Compound registration system

• LINEAR STRUCTURE DESCRIPTION field: H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly- ... -OH
• PEPTIDE comment

Page 14: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Drawing

Central template management in Accelrys Draw

Roche Peptide Building Blocks

• ~200 manually curated templates

• Categories: L-AA, D-AA, nS-AA, Linkers, Attachments, Resins

Accelrys Draw Add-In

• Download templates to Draw

• Regular check for updates

• Register new templates via Sequence Template Manager

• Validate new templates

Page 15: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Sequence Information

Sequence generation with NextMove’s Sugar&Splice

▪ Computational perception of peptide sequence from chemical structure

• Output of sequence in standard format

• Lookup of non-standard names in building block library

▪ Pipeline Pilot wrapper with easy-to-use web interface for registration

▪ Maintenance procedure for batch registration and validation

• Check for peptides with empty/outdated sequences and update

• Process legacy peptides and complete sequence information

[Figure: chemical structure drawing, processed by Sugar & Splice with the building block library, yields the sequence below]

cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]
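The maintenance check for empty or outdated sequences could look roughly like the sketch below. It does not call Sugar & Splice or Pipeline Pilot: `perceive` is a hypothetical callable standing in for whatever tool derives a sequence from a structure, and the record layout is assumed for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# compound ID -> (structure, stored sequence or None); layout is an assumption.
Records = Dict[str, Tuple[str, Optional[str]]]


def find_stale_sequences(records: Records, perceive: Callable[[str], str]) -> Dict[str, str]:
    """Return {compound_id: fresh_sequence} for empty or outdated stored sequences."""
    updates = {}
    for compound_id, (structure, stored) in records.items():
        fresh = perceive(structure)
        if not stored or stored != fresh:
            updates[compound_id] = fresh
    return updates


# Toy stand-in for sequence perception: a lookup table keyed by structure.
fake_perception = {
    "structure-1": "H-Ala-Gly-OH",
    "structure-2": "H-Gly-Nva-OH",
    "structure-3": "H-Ala-Orn-NH2",
}
records = {
    "C1": ("structure-1", "H-Ala-Gly-OH"),  # up to date
    "C2": ("structure-2", None),            # empty
    "C3": ("structure-3", "H-Ala-Orn-OH"),  # outdated
}
updates = find_stale_sequences(records, fake_perception.__getitem__)
```

Returning the proposed updates instead of writing them back leaves room for a review or validation step before the registration system is changed.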

Page 16: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Peptide Sequence Information

Interface to biologics landscape

▪ Sequence-based analysis tools

• Sequence alignment, BLAST database search, …

• Conversion to standard FASTA via Sugar & Splice:

– Remove cycles and cross-links

– Replace non-standard residues by X or the closest natural analog

– Convert D-amino acids to L form

▪ Data exchange with biologics research

• HELM format for macromolecule representation

• Shared dictionary for peptide building blocks

• Conversion to HELM via Sugar & Splice

Input sequence: cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]

HELM: PEPTIDE1{L.[dF].P.V.[Orn].L.[dF].P.V.[Orn]}$PEPTIDE1,PEPTIDE1,10:R2-1:R1$$$

FASTA: LFPVXLFPVX
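The two conversions shown for the cyclic example can be reproduced with a small script. This is an illustrative sketch of the rules listed on the slide, not the Sugar & Splice implementation: it handles only head-to-tail cyclo[...] input, maps every residue outside the 20 standard amino acids to X, and recognizes the D prefix only on standard residues. All function names are hypothetical.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def split_residue(token):
    """Return (symbol, is_d) for tokens such as 'Phe' or 'DPhe'."""
    if token.startswith("D") and token[1:] in THREE_TO_ONE:
        return token[1:], True
    return token, False


def parse_cyclo(sequence):
    match = re.fullmatch(r"cyclo\[(.+)\]", sequence)
    if match is None:
        raise ValueError("expected cyclo[...] notation")
    return [split_residue(token) for token in match.group(1).split("-")]


def to_fasta(residues):
    """Drop the cycle, map D to L, replace non-standard residues by X."""
    return "".join(THREE_TO_ONE.get(symbol, "X") for symbol, _ in residues)


def to_helm(residues):
    """Build a single-chain HELM string with a head-to-tail connection."""
    monomers = []
    for symbol, is_d in residues:
        one_letter = THREE_TO_ONE.get(symbol)
        if one_letter is None:
            monomers.append("[" + symbol + "]")      # non-standard monomer
        elif is_d:
            monomers.append("[d" + one_letter + "]")  # D-amino acid
        else:
            monomers.append(one_letter)
    chain = "PEPTIDE1{" + ".".join(monomers) + "}"
    connection = "PEPTIDE1,PEPTIDE1," + str(len(monomers)) + ":R2-1:R1"
    return chain + "$" + connection + "$$$"


residues = parse_cyclo("cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]")
fasta = to_fasta(residues)
helm = to_helm(residues)
```

The FASTA form discards stereochemistry, the cycle, and the identity of ornithine, which is why the slide treats it as an interface for sequence tools only, while HELM keeps all three.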

Page 17: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Summary & Benefits

Re-use and adapt small-molecule tools and systems

Ensure consistent structure and sequence information

Interface to large-molecule world

Benefits

• Maximized data value & quality through harmonized sequence information

• Enable automated sequence searches & analysis for synthetic peptides

• Time savings for peptide drawing, registration and analysis

• Future prospect: store sequence information within the molecular structure

Compound registration system

H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly- ... -OH

Page 18: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Acknowledgments

Discovery Chemistry

Konrad Bleicher

Eric Kitas

Kersten Klar

Betty Hennequin

Katja Ostmann

Adrian Schäublin

Patrick Studer-Schriber

pRED Informatics

Fausto Agnetti

Gerd Blanke

Gunther Dörnen

Sébastien Fournier

Werner Gotzeina

Peter Hilty

Ralf Horstmöller

Dieter Imark

Frederic Klein

Stefan Klostermann

Francesca Milletti

Denis Ribaud

Jörg Schmiedle

Daniel Stoffler

Klaus Weymann

Steering Committee

Alexander Alanine

Margret Assfalg

Ralph Haffner

Harald Mauser

Martin Stahl

Accelrys

François Culot

Jonas Danielsson

James Jack

Georgios Rafeletos

NextMove Software

Roger Sayle

Page 19: Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

Doing now what patients need next