
Introduction to Bioinformatics
English Courses for Graduate Students

http://1.51.212.243/bioinfo.html

Dr. rer. nat. Jing Gong

Cancer Research Center

School of Medicine, Shandong University

2011.11.02


Chapter 5

Tree


Evolution

This article was used to argue against creationism and its variant, intelligent design.


Evolution

Charles Darwin (1809-1882), "On the Origin of Species" (1859)


How to Study Evolutionary History

1. The most direct evidence comes from fossils. However, fossils are scattered, incomplete and unsystematic.

2. Comparative morphology and comparative anatomy: these determine the general framework of evolution, but many details remain controversial.


3. Computational molecular evolution: phylogenetic trees.

Linus Pauling advanced the theory of molecular evolution in 1964.

Goal: to investigate the phylogenetic relationships between species based on certain molecular characteristics shared across these species.

The evolutionary process takes place at the molecular level: DNA, RNA and protein.

Basic assumptions: 1) Nucleic acid and protein sequences contain all the information about the evolutionary history of species; 2) Molecular clock: the rate of evolutionary change (e.g., the number of amino acid differences) of a given protein is approximately constant over time and across lineages. => The more similar two homologous proteins are, the more recently they diverged from their common ancestor.
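The molecular-clock reasoning above can be turned into simple arithmetic. The sketch below is only an illustration: it assumes two already-aligned protein sequences, uses the simple Poisson correction d = -ln(1 - p) for multiple substitutions at one site, and needs a substitution rate that you would have to supply yourself. The function names, toy sequences and the rate of 1e-9 substitutions per site per year are hypothetical.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of differing residues over aligned columns without gaps."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free aligned columns")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def poisson_distance(p: float) -> float:
    """Correct p for multiple hits: d = -ln(1 - p) substitutions per site."""
    if p >= 1.0:
        raise ValueError("sequences too divergent for the Poisson correction")
    return -math.log(1.0 - p)


def divergence_time(d: float, rate_per_site_per_year: float) -> float:
    """Molecular clock: both lineages change at the same rate, so d = 2 * rate * t."""
    return d / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical); they differ at 2 of 10 positions.
    seq_a = "MKTAYIAKQR"
    seq_b = "MKTAYLAKQK"
    p = p_distance(seq_a, seq_b)
    d = poisson_distance(p)
    t = divergence_time(d, 1e-9)  # hypothetical rate
    print(f"p = {p:.2f}, d = {d:.4f}, t = {t:.3g} years")
```

The factor of 2 reflects that changes accumulate independently along both lineages since their common ancestor; real analyses would use a substitution model and a calibrated rate.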


Kinds of Homologous Genes

Orthologs: Orthologs are homologues separated by speciation, the process during which a common ancestor gives rise to two subgroups that slowly drift away from their common genetic makeup to become distinct species. Orthologs usually have similar functions and structures.

Paralogs: Paralogs are homologues separated by a duplication event, meaning that within a genome, a gene was duplicated. One of the duplicates may have kept the original function while the other duplicate could have acquired a new function.

Xenologs: Xeno is a Greek word that means “foreigner.” Xenologs result from lateral transfer between two organisms, that is, a direct transfer of DNA between two species. This means that one of the species contains a gene that does not share the history of the genome into which it is inserted. This is often seen between pathogenic bacteria and humans.


Phylogenetic Tree

What can phylogenetic trees do for you?

Determining the closest relatives of the organism you are interested in, for a given protein/gene: For instance, if you have a ribosomal RNA from a new bacterium, you can place it on a phylogenetic tree computed with all known ribosomal RNAs. This can tell you what this bacterium really is or who its closest relatives are.

Discovering the function of a new protein/gene: If the closest relatives of your new protein/gene are well characterized, their functions can be transferred to your protein/gene.

Retracing the origin of a gene: Most genes within a genome travel together through evolutionary time. However, from time to time, individual genes may jump from one species to another. Phylogenetic trees are a great way to reveal such events, which are called horizontal (or lateral) transfers.


Phylogenetic Tree

Concepts:

leaf / outer node

branch / lineage

inner node

root

Letters represent different species or a certain protein/DNA from different species.
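To make the terms leaf, branch, inner node and root concrete, here is a minimal sketch that reads a tree written in Newick format and counts its leaves and inner nodes. Newick is a common plain-text tree notation; using it here is an assumption, since the slide does not name a file format. The parser is hypothetical and handles only simple strings (names and optional branch lengths, no quoted labels or comments).

```python
def parse_newick(text: str) -> dict:
    """Parse a simple Newick string into nested dicts: name, length, children."""
    text = "".join(text.split())  # drop whitespace
    pos = 0

    def parse_node() -> dict:
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos] or None
        length = None  # stays None in a cladogram without branch lengths
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()


def leaves(node: dict) -> list:
    """Outer nodes (leaves) are nodes without children, listed left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaves(child)]


def count_inner_nodes(node: dict) -> int:
    """Inner nodes, including the root, are the nodes that have children."""
    if not node["children"]:
        return 0
    return 1 + sum(count_inner_nodes(child) for child in node["children"])


if __name__ == "__main__":
    root = parse_newick("((A:0.1,B:0.2):0.05,(C:0.3,D:0.1):0.05);")
    print(leaves(root), count_inner_nodes(root), len(root["children"]))
```

The returned top-level dict is the root; each child entry carries the length of the branch leading to it.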


Kinds of Tree Branches

All the above trees represent the same evolutionary relationships.

Cladogram: branch lengths do not mean anything.

Change-based phylogram: branch lengths indicate the number of evolutionary changes.

Time-based phylogram: inner nodes indicate branching time points.


Phylogenetic Tree


Topology of Phylogenetic Tree


Choosing the Right Sequences for the Right Tree

When you build a phylogenetic tree, you assume that the sequences you are comparing share a common ancestor (that is, they are similar enough). You then need to compute a multiple sequence alignment of the sequences. Should you do this on the protein or on the DNA sequences?

If your DNA sequences are more than 70% identical: You can make a DNA multiple sequence alignment.

If your DNA sequences are less than 70% identical: If your sequences code for proteins, it is much safer to translate them into proteins and to build the multiple sequence alignment with the proteins. If your sequences are too similar at the protein level, you can thread the DNA sequences back onto the protein alignment using pal2nal: http://www.bork.embl.de/pal2nal/.
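The idea behind threading DNA back onto a protein alignment can be sketched in a few lines: each aligned amino acid is replaced by its codon and each gap by three gap characters. This is a hypothetical, simplified illustration of the principle, not the pal2nal program. Unlike pal2nal, it does not check that the codons actually translate into the aligned residues, and it assumes coding sequences without stop codons.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map one gapped protein sequence back onto its coding DNA sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = [aa for aa in aligned_protein if aa != "-"]
    if len(residues) != len(codons):
        raise ValueError("number of residues and codons differ")
    codon_iter = iter(codons)
    out = []
    for aa in aligned_protein:
        # A gap column in the protein alignment becomes a three-base gap.
        out.append("---" if aa == "-" else next(codon_iter))
    return "".join(out)


if __name__ == "__main__":
    protein_alignment = {"seq1": "MK-V", "seq2": "MKLV"}
    coding_sequences = {"seq1": "ATGAAAGTT", "seq2": "ATGAAGCTGGTC"}
    for name, row in protein_alignment.items():
        print(name, thread_codons(row, coding_sequences[name]))
```

Because the result keeps whole codons in register, the DNA alignment inherits the reliability of the protein alignment.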

In practice, unless your sequences are almost identical, it is easier to keep working at the protein level. This may not be as accurate as working with DNA sequences, but, in most cases, you can expect the results to be reasonably good.
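The 70% rule of thumb requires measuring identity between aligned sequences. Below is a minimal sketch, assuming the sequences are already aligned and defining identity as identical residues over columns where neither sequence has a gap (other definitions exist and give different numbers). The helper names are hypothetical.

```python
from itertools import combinations


def percent_identity(seq1: str, seq2: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def choose_alignment_level(aligned_dna: list, threshold: float = 70.0) -> str:
    """Stay at the DNA level only if every pair of sequences exceeds the threshold."""
    lowest = min(percent_identity(a, b) for a, b in combinations(aligned_dna, 2))
    return "DNA" if lowest > threshold else "protein"


if __name__ == "__main__":
    dna = ["ACGTACGTAC", "ACGTACGTTT", "ACGAACGAAC"]
    print(percent_identity(dna[0], dna[1]), choose_alignment_level(dna))
```

Using the lowest pairwise identity is a conservative choice: one divergent sequence is enough to push the whole alignment to the protein level.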


If you select all the paralogous members of a large human gene family, your gene tree tells the story of this gene family only. You can use it only to reconstruct the chain of duplications that led from a single ancestral gene to the current situation.

If you select a group of orthologous genes from different species, the gene tree you get looks very much like a species tree, which lets you reconstruct the speciation events that occurred while the species you are looking at (or their ancestors) were diverging.


Algorithms of Tree Reconstruction

Maximum Parsimony (MP): Closely related sequences, accurate, at most 12 sequences.

Distance (Neighbor Joining, NJ): Distantly/closely related sequences, not very accurate.

Maximum Likelihood (ML): Distantly related sequences, very accurate.

Speed:

Distance > Maximum Parsimony > Maximum Likelihood
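Of the three families above, the distance approach is the easiest to sketch. Below is a compact neighbor-joining implementation in plain Python, written for illustration only: it is not the code used by ClustalW or any other program, it does not guard against negative branch lengths, and the taxon names and distance matrix in the usage example are a small toy dataset. It returns an unrooted tree in Newick format.

```python
from itertools import combinations


def neighbor_joining(names: list, distances: list) -> str:
    """Build an unrooted tree with the neighbor-joining algorithm; returns Newick."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    # d[x][y]: current distance between clusters x and y (keys are Newick strings).
    d = {a: {b: float(distances[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r(x): sum of distances from x to every other node.
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        # Pick the pair minimising Q(x, y) = (n - 2) * d(x, y) - r(x) - r(y).
        best = None
        for x, y in combinations(nodes, 2):
            q = (n - 2) * d[x][y] - r[x] - r[y]
            if best is None or q < best[0]:
                best = (q, x, y)
        _, a, b = best
        # Branch lengths from the new inner node u to a and b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distances from u to every remaining node.
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three nodes at a single central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    print(neighbor_joining(taxa, matrix))
```

The quadratic pair search at each step is what makes distance methods fast compared with parsimony and likelihood, which must evaluate many candidate trees.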


Preparing Your Multiple Sequence Alignment

Computing your multiple sequence alignment:

ClustalW2: http://www.ebi.ac.uk/Tools/msa/clustalw2/

MUSCLE: http://www.ebi.ac.uk/Tools/msa/muscle/

T-coffee: http://tcoffee.crg.cat/

Removing bad columns that affect tree quality:

1. Make sure there are as many gap-free columns as possible.
2. Remove the extremities of your multiple alignment.
3. Remove the gap-rich regions of your alignment.
4. Be sure to keep the most informative blocks.

Before using your MSA for building a tree, you must make sure that it is as accurate as possible.


1. Make sure there are as many gap-free columns as possible. Gaps cause trouble for most phylogeny reconstruction methods. Some programs, such as ClustalW, can use a complete-deletion technique to ignore every column that contains a gap.

columns to remove


2. Remove the poorly aligned terminal regions of your multiple alignment. The N-terminus and the C-terminus tend to be poorly conserved, and therefore poorly aligned. You can safely remove them.

columns to remove


3. Remove the gap-rich regions of your alignment. Internal, gap-rich regions in a multiple sequence alignment often correspond to loops. Even if your program returns an alignment for these regions, that alignment is not necessarily meaningful.

columns to remove


4. Be sure to keep the most informative blocks. The ideal multiple alignment for building a tree would be a high-quality alignment of sequences with a low level of identity, so that each position contains a trace of the family history.

columns to keep
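Rules 1 to 3 above can be partially automated. The hypothetical helper below (its name and the default 50% gap threshold are assumptions, not part of any program mentioned here) trims the alignment ends up to the first and last gap-free columns (rule 2) and drops internal columns whose gap fraction exceeds a threshold (rules 1 and 3). Setting the threshold to 0 reproduces complete deletion. Rule 4, keeping the informative blocks, still needs your own judgment.

```python
def clean_alignment(rows: list, max_gap_fraction: float = 0.5) -> list:
    """Trim gappy alignment ends and drop gap-rich columns.

    rows: aligned sequences of equal length, with gaps written as '-'.
    """
    ncol = len(rows[0])
    if any(len(row) != ncol for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    # Fraction of sequences that have a gap in each column.
    gap_frac = [sum(row[i] == "-" for row in rows) / len(rows) for i in range(ncol)]
    gap_free = [i for i, f in enumerate(gap_frac) if f == 0]
    if not gap_free:
        return ["" for _ in rows]
    # Rule 2: drop the terminal regions before the first and after the last gap-free column.
    first, last = gap_free[0], gap_free[-1]
    # Rules 1 and 3: keep only columns whose gap fraction is at most the threshold.
    keep = [i for i in range(first, last + 1) if gap_frac[i] <= max_gap_fraction]
    return ["".join(row[i] for i in keep) for row in rows]


if __name__ == "__main__":
    msa = ["--MKV-LLA--", "MSMKVQLLAGG", "--MRVQ-LAG-"]
    print(clean_alignment(msa))                        # gap-rich columns removed
    print(clean_alignment(msa, max_gap_fraction=0.0))  # complete deletion
```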


How to Delete Columns with Microsoft Word

While pressing the Alt key on your keyboard, use the mouse to select entire columns in your alignment.

When you’ve selected everything you want to remove, press the Delete key to remove the selected block.


Computing Your Tree

ClustalW is, above all, a multiple-sequence-alignment package, but some ClustalW servers let you use this program to produce phylogenetic trees.

The guide tree is NOT a phylogenetic tree!


EMBL ClustalW2 http://www.ebi.ac.uk/Tools/phylogeny/clustalw2_phylogeny


clustalw.aln

sequences.fasta


This tree is much more accurate than a guide tree, and it clearly shows the genetic relationship between the hippopotamus and the whale, as postulated by Higgins and Grauer a few years ago.


A phylogram is a phylogenetic tree whose branch lengths are proportional to the amount of character change. In a cladogram, the branch lengths do not represent any change.


Different tree representations can be obtained by choosing different display options.

The easiest way to save your tree is to make a screen capture with the Print Screen (Prt Sc) key on your keyboard. You can then paste this image into your favorite application (PowerPoint, Paint, etc.).


Case Study


Case 1: How does SIGIRR inhibit the TLR4 and TLR7 signaling pathways?

Case 2: Homology modeling of Toll-like receptor ectodomains.


Case 1

How does SIGIRR inhibit the signaling pathways of the Toll-like receptors TLR4 and TLR7?


[Figure: domain structure of a TLR: leucine-rich repeats (LRRs) forming the ectodomain (ECD), the transmembrane domain and the TIR domain]

Background: Structure of Toll-like receptors (TLRs)

TLRs belong to the Toll-like receptor/interleukin-1 receptor (TLR/IL-1R) superfamily, which is defined by the presence of a conserved cytoplasmic Toll/interleukin-1 receptor (TIR) domain connected to an ectodomain through a single transmembrane stretch. Their ectodomains consist of 16–28 leucine-rich repeats (LRRs).


TLR signaling pathways

These LRRs provide a variety of structural frameworks for binding protein and non-protein ligands, including lipopolysaccharide (LPS), lipopeptide, CpG DNA, flagellin, and double- and single-stranded RNA.


TLRs recognize their ligands in dimeric form.

Crystal structures of TLR ECD-ligand-ECD complexes have been determined for:

human TLR2-1, mouse TLR3-3, human TLR4-4 and mouse TLR2-6.


Upon receptor activation, an intracellular TIR signaling complex forms between the receptor and downstream adaptor TIR domains. MyD88 (myeloid differentiation primary response protein 88) was the first intracellular adaptor molecule characterized among all known adaptors in TLR signaling. It consists of an N-terminal death domain (DD) separated from its C-terminal TIR domain by a linker sequence. When recruited to the receptor complex, MyD88 also forms a dimer through DD-DD and TIR-TIR domain interactions. Through its DD, MyD88 can recruit IRAKs (IL-1RI-associated protein kinases) to continue signaling and, finally, to induce nuclear factor-kB (NF-kB), leading to the expression of type I interferons.

[Figure: domain structure of MyD88: death domain (DD) and TIR domain]

[Figure: comparison of a TLR and SIGIRR (single immunoglobulin interleukin-1 receptor-related molecule): the TLR has leucine-rich repeats (LRRs) and a Toll/interleukin-1 receptor (TIR) domain; SIGIRR has a single immunoglobulin (Ig) domain and a TIR domain with a 73 AA C-terminal tail]

SIGIRR (single immunoglobulin interleukin-1 receptor-related molecule), also known as TIR8, was initially identified as an Ig domain-containing receptor of the TLR/IL-1R superfamily. Both the extracellular and intracellular domains of SIGIRR differ from those of other Ig domain-containing receptors: its single extracellular Ig domain does not support ligand binding, and its intracellular TIR domain cannot activate NF-kB. Moreover, the TIR domain of SIGIRR extends beyond that of typical TLR/IL-1R superfamily members by >73 amino acids at the C-terminus (the C-tail).


Instead, SIGIRR acts as an endogenous inhibitor of MyD88-dependent TLR and IL-1R signaling. This was shown by overexpressing SIGIRR in Jurkat or HepG2 cells, which substantially reduced LPS-, CpG DNA- or IL-1-induced activation of NF-kB. SIGIRR has therefore attracted considerable research interest because of its regulatory function in cancer-related inflammation and autoimmunity.

For example, systemic lupus erythematosus (SLE) is caused by TLR7-mediated induction of type I interferons. Compared with wild-type mice, Sigirr-deficient mice develop excessive lymphoproliferation when introduced into the context of a lupus susceptibility gene.

Although the significance of SIGIRR has been widely acknowledged, its inhibition mechanism remains unclear owing to a lack of structural information.

[Figure: B6lpr/lpr Sigirr-/- mouse compared with B6lpr/lpr Sigirr+/+ mouse]

Lech et al., JEM, 2008


SIGIRR construct | Binds to TLR4 | Inhibits signaling
Full-length | yes | yes
ΔN | yes | yes
ΔC | yes | yes
ΔTIR | no | no

Mutagenesis studies investigated three deletion mutants of SIGIRR: ΔN (lacking the extracellular Ig domain), ΔTIR (lacking the intracellular TIR domain) and ΔC (lacking the C-tail of the TIR domain, with deletion of residues 313–410).

The results showed that only the TIR domain (excluding the C-tail part) is necessary for SIGIRR to inhibit TLR4 signaling.

Nevertheless, detailed structural interaction mechanisms of SIGIRR’s TIR domain are still missing.


Qin et al., 2005 JBC


Objective: to find a structural explanation for these TIR-TIR interactions.

1. Structure prediction of the TIR domains of TLRs, MyD88 and SIGIRR.
2. Structure analysis/docking.


Hypothesis: SIGIRR blocks the molecular interface of TLR4 and MyD88 via its TIR domain


Step 1: Model construction

The amino acid sequences of the target proteins (human TLR4, TLR7, MyD88 and SIGIRR) were extracted from the NCBI protein database.

Three-dimensional models of TLR4, TLR7, MyD88 and SIGIRR (without the C-tail) were constructed by homology modeling. Because the target proteins are homologous, four common templates were obtained via a BLAST search against the Protein Data Bank (PDB): TLR1 (1FYV), TLR2 (1FYW), TLR10 (2J67) and IL-1RAPL (1T3G).

In the secondary structure-aided alignments for homology modeling, the average target-template sequence similarity was 51.7% for TLR4, 50.4% for TLR7, 44.5% for MyD88 and 42.7% for SIGIRR.

A multiple sequence alignment of each target with the templates was generated with MUSCLE and analyzed with Jalview. Because the TIR domain consists of well-organized alternating β-strands and α-helices, the alignments were adjusted manually according to the secondary structure information to improve alignment quality. The secondary structure of each target was predicted by PSIPRED.


The resulting structures exhibit a typical TIR domain conformation, in which a central five-stranded parallel β-sheet (βA–βE) is surrounded by a total of five α-helices (αA–αE) on both sides. Loops are named after the secondary structure elements they connect; for example, the BB-loop connects β-strand B and α-helix B. The structure of NSF-N was identified as a template for SIGIRR's C-tail through protein threading.

Crystal structure of IL-1RAPL (1T3G)

To improve model quality, ModLoop was used to rebuild the coordinates of low-quality loop regions. Finally, the model quality assessment programs ProQ, ModFOLD and MetaMQAP were used to evaluate the candidate models and select the most reliable one.


The BB-loop and αE of TLR4, TLR7 and MyD88, along with the BB-loop of SIGIRR, may be important to ensure binding specificity achieved by different combinations of TIRs during signaling.


The surface charge distributions (APBS electrostatics) of the BB-loops and αE helices are shown, with red indicating negative charge and blue indicating positive charge.

Accordingly, all BB-loops can be divided into two self-complementary parts: the N-terminal part (upper region of the BB-loops) is negatively charged, whereas the C-terminal part (lower region of the BB-loops) is positively charged. The αE helices, by contrast, are predominantly positive.
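APBS computes electrostatic potentials by solving the Poisson-Boltzmann equation on a 3D structure; that cannot be reproduced in a few lines. As a much cruder, sequence-only proxy for the charge split described above, the hypothetical sketch below counts Lys/Arg as +1 and Asp/Glu as -1 (treating His as neutral and ignoring terminal charges, both simplifying assumptions) for the N- and C-terminal halves of a loop. The example loop sequence is made up and is not a real TLR or SIGIRR BB-loop.

```python
POSITIVE = set("KR")  # Lys, Arg counted as +1 (His treated as neutral, a simplification)
NEGATIVE = set("DE")  # Asp, Glu counted as -1


def net_charge(segment: str) -> int:
    """Approximate net charge of a peptide segment at neutral pH."""
    return sum(aa in POSITIVE for aa in segment) - sum(aa in NEGATIVE for aa in segment)


def half_charges(loop: str) -> tuple:
    """Net charge of the N-terminal and C-terminal halves of a loop."""
    mid = len(loop) // 2
    return net_charge(loop[:mid]), net_charge(loop[mid:])


if __name__ == "__main__":
    hypothetical_loop = "EDDGPRKR"
    print(half_charges(hypothetical_loop))
```

A negative first half and a positive second half would be consistent with the self-complementary pattern, but only a structure-based calculation such as APBS shows how the charges are actually exposed on the surface.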


Step 2: Protein-protein docking

Unrestrained pairwise model docking included eight complexes of TIR domains: TLR4-TLR4, TLR7-TLR7, MyD88-MyD88, TLR4 dimer-MyD88 dimer (tetramer), TLR7 dimer-MyD88 dimer (tetramer), TLR4-SIGIRR, TLR7-SIGIRR and MyD88-SIGIRR. We used GRAMM-X and ZDOCK, which are widely accepted rigid-body protein-protein docking programs, to predict and assess the interactions between these complexes.

The buried surface areas of the dimer models were calculated with the Protein Interfaces, Surfaces and Assemblies service (PISA) at the European Bioinformatics Institute (EBI).
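The buried surface area follows from three solvent-accessible surface area (SASA) values: the two isolated partners and the complex. The sketch below shows only this arithmetic; computing the SASA values themselves requires a structure tool. As we understand PISA's documentation, its reported interface area is this buried total divided by two, but check the convention before comparing numbers from different programs. The SASA values in the example are hypothetical.

```python
def buried_surface_area(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Solvent-accessible surface area (Å^2) that becomes buried on complex formation."""
    buried = sasa_a + sasa_b - sasa_complex
    if buried < 0:
        raise ValueError("complex SASA exceeds the sum of the monomer SASAs")
    return buried


def interface_area(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Buried area divided by two, i.e. the interface area per partner."""
    return buried_surface_area(sasa_a, sasa_b, sasa_complex) / 2.0


if __name__ == "__main__":
    # Hypothetical SASA values (Å^2) for two TIR domains and their dimer.
    print(buried_surface_area(8000.0, 7500.0, 14300.0), interface_area(8000.0, 7500.0, 14300.0))
```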


Step 3: Hypothesis model construction

From a large number of docking results, we established the following model of SIGIRR inhibiting the TLR7 signaling pathway.


Lech et al., 2005 J. Pathol.


Likewise, from a large number of docking results, we established a model of SIGIRR inhibiting the TLR4 signaling pathway.


Step 4: Conclusion

In summary, we propose a residue-level structural framework for how SIGIRR inhibits the TLR4 and TLR7 signaling pathways. These results were obtained by computer modeling and are expected to facilitate the design of further site-directed mutagenesis experiments to clarify the regulatory role of SIGIRR in inflammatory and innate immune responses.

Inhibition of the Toll-like receptors TLR4 and 7 signaling pathways by SIGIRR: a computational approach

J. Struct. Biol., 2010, 169:323-330

IF: 4.06; SCI citations: 5

Jing Gong, Tiandi Wei, Robert W. Stark, Ferdinand Jamitzky, Wolfgang M. Heckl, Hans-Joachim Anders, Maciej Lech and Shaila C. Rössle.


Case 2: Homology modeling of Toll-like receptor ectodomains


TLR sequences

So far, about 3000 protein sequences of different TLRs from different species have been deposited in primary protein databases, and the number continues to grow.

… …


LRR identification

[Figure: LRR identification in TLR ectodomains; for example, the ECD of human TLR3 contains 23 LRRs plus 2 N-/C-terminal (N/CT) LRRs, while other ectodomains contain different numbers of LRRs]

LRR consensus: LxxLxLxxNxLxxLxxxxFxxLxx

Example sequence windows matched against the consensus:

PTNITVLNLTHNQLRRLPAANFTR

NITVLNLTHNQLRRLPAANFTRY

PTNITVLNLTHNQLRRLPAA
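The consensus can be used directly as a search pattern. The minimal sketch below is an illustration under two stated assumptions (our choices, not necessarily what the authors did): it uses only the first 11 positions of the consensus (LxxLxLxxNxL), and it lets each L position also match I, V, M or F. It converts the consensus into a regular expression and reports all, possibly overlapping, matches.

```python
import re


def consensus_to_regex(consensus: str, hydrophobic: str = "LIVMF") -> str:
    """Translate an LRR consensus such as 'LxxLxLxxNxL' into a regular expression.

    'x' matches any residue, 'L' matches any of the given hydrophobic residues,
    and any other letter must match exactly.
    """
    parts = []
    for symbol in consensus:
        if symbol == "x":
            parts.append(".")
        elif symbol == "L":
            parts.append(f"[{hydrophobic}]")
        else:
            parts.append(re.escape(symbol))
    return "".join(parts)


def find_motifs(sequence: str, consensus: str) -> list:
    """Return (start, matched window) for every, possibly overlapping, match."""
    # A lookahead lets consecutive matches overlap.
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    print(find_motifs("PTNITVLNLTHNQLRRLPAANFTRY", "LxxLxLxxNxL"))
```

In the slide sequence, the only match starts at the isoleucine (0-based position 3), which is why allowing I at the first L position matters here.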


TollML database

Structural motifs (3 levels):

1. Domains of each TLR: Signal Peptide (SP), Ectodomain (ECD), Transmembrane Domain (TD), TIR Domain
2. LRRs of each ECD
3. Segments of each LRR: Highly Conserved Segment (HCS), Variable Segment (VS), Inserted Segment (IS)

2734 sequences (as of 2011/08/01)


Construction pipeline

[Figure: construction pipeline annotated at three levels: domains, LRRs and segments]

[Figure: plot of amino acids (y-axis) by position (x-axis)]

LRR Finder

Main algorithm: a position-specific weight matrix of LRR motifs

[Figure: example scan of … LPTNLTVLMLLHNQLRRLPAANFTRYSQLTSLDVGFNT …; candidate windows score 3.800, 1.054 and 2.232 and are classified as LRR (Yes) or not (No) by a score cutoff]
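The slides describe LRR Finder only as a position-specific weight matrix with a score cutoff. The following is a generic sketch of that technique, not LRR Finder's actual matrix, scoring or filter: it builds log-odds weights from a few aligned example motifs (a hypothetical toy training set) with a pseudocount and a uniform amino acid background, scores every window of a sequence as the mean per-position log-odds, and reports windows at or above a cutoff. Because this scoring scale differs from LRR Finder's, the cutoff of 1.0 used here is arbitrary.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pswm(aligned_motifs: list, pseudocount: float = 1.0) -> list:
    """Log-odds weight matrix: one {amino acid: weight} dict per motif position."""
    length = len(aligned_motifs[0])
    background = 1.0 / len(AMINO_ACIDS)  # uniform background frequency
    total = len(aligned_motifs) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = Counter(motif[pos] for motif in aligned_motifs)
        matrix.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                       for aa in AMINO_ACIDS})
    return matrix


def score_window(matrix: list, window: str) -> float:
    """Mean log-odds weight over the window (unknown residues contribute 0)."""
    return sum(column.get(aa, 0.0) for column, aa in zip(matrix, window)) / len(matrix)


def scan(matrix: list, sequence: str, cutoff: float) -> list:
    """Return (start, window, score) for every window scoring at or above the cutoff."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_window(matrix, window)
        if score >= cutoff:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    # Hypothetical toy training set of aligned 11-residue LRR-like segments.
    training = ["LTSLDLSNNKL", "LKELDLSGNQL", "LQVLNLSHNRL", "ITVLNLTHNQL", "LEYLDLSYNKI"]
    pswm = build_pswm(training)
    for start, window, score in scan(pswm, "PTNITVLNLTHNQLRRLPAANFTRY", cutoff=1.0):
        print(start, window, round(score, 3))
```

Raising the cutoff trades sensitivity for specificity, which is exactly the trade-off the table of LRR Finder results reports.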

[Figure: sensitivity and specificity plotted against the cutoff score]

Cutoff | Sensitivity | Specificity | Specificity (with filter)
1.5 | 0.942 | 0.852 | 0.914
1.6 | 0.933 | 0.882 | 0.930
1.7 | 0.924 | 0.902 | 0.953
1.8 | 0.916 | 0.916 | 0.959
1.9 | 0.907 | 0.935 | 0.972
2.0 | 0.886 | 0.954 | 0.981
2.1 | 0.868 | 0.970 | 0.987
2.2 | 0.858 | 0.981 | 0.991
2.3 | 0.842 | 0.988 | 0.994
2.4 | 0.822 | 0.992 | 0.996
2.5 | 0.805 | 0.994 | 0.997
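Sensitivity and specificity are ratios of classification counts, and a table like the one above can be used to compare cutoffs. The sketch below shows the two definitions and, as one illustrative criterion that the slides do not mention, picks the cutoff maximizing Youden's index J = sensitivity + specificity - 1 on the values copied from the table. This is not necessarily how the authors chose their cutoff; the count values in the usage lines are hypothetical.

```python
# Values copied from the table above: cutoff -> (sensitivity, specificity, specificity with filter).
LRR_FINDER_TABLE = {
    1.5: (0.942, 0.852, 0.914),
    1.6: (0.933, 0.882, 0.930),
    1.7: (0.924, 0.902, 0.953),
    1.8: (0.916, 0.916, 0.959),
    1.9: (0.907, 0.935, 0.972),
    2.0: (0.886, 0.954, 0.981),
    2.1: (0.868, 0.970, 0.987),
    2.2: (0.858, 0.981, 0.991),
    2.3: (0.842, 0.988, 0.994),
    2.4: (0.822, 0.992, 0.996),
    2.5: (0.805, 0.994, 0.997),
}


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true LRRs that are detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-LRR windows that are correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


def best_cutoff_by_youden(table: dict, with_filter: bool = False) -> float:
    """Cutoff maximising Youden's J = sensitivity + specificity - 1."""
    spec_index = 2 if with_filter else 1
    return max(table, key=lambda c: table[c][0] + table[c][spec_index] - 1.0)


if __name__ == "__main__":
    print(sensitivity(90, 10), specificity(95, 5))  # hypothetical counts
    print(best_cutoff_by_youden(LRR_FINDER_TABLE), best_cutoff_by_youden(LRR_FINDER_TABLE, True))
```

Youden's index weights missed LRRs and false hits equally; if false positives are more costly, a higher cutoff may be preferable.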

[Figure: the same example with an additional filter: … LPTNLTVLMLLHNQLRRLPAANFTRYSQLTSLDVGFNT …; scores 3.800, 1.054 and 2.232 are classified Yes, No and No]


This database is freely available at http://tollml.lrz.de. Any internet user can search and download data from the database, but only registered users can define and save labels for arbitrary entries.

TollML: a database of toll-like receptor structural motifs

J. Mol. Model., 2010, 16(7):1283-1289

IF: 2.34; SCI citations: 3

Jing Gong, Tiandi Wei, Ning Zhang, Ferdinand Jamitzky, Wolfgang M. Heckl, Shaila C. Rössle and Robert W. Stark


2010/11


Construction pipeline


Every LRR structure can be viewed with the online molecular viewer Jmol.


To simplify homology modeling, a similarity search was implemented. For an LRR of unknown structure, it returns the structures of the most similar LRRs. First, a global pairwise sequence alignment, with its sequence identity, is generated between the target LRR and each LRR in the user-selected set. Then, the most similar LRRs are returned as template candidates, ranked by sequence identity.
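The similarity search just described can be sketched as follows. This is an illustrative Needleman-Wunsch global alignment with a simple match/mismatch/linear-gap scheme, followed by ranking of template candidates by identity. The scores, the identity definition (identical non-gap columns divided by alignment length) and the toy LRR library are assumptions; the slides do not specify the database's actual scoring.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple:
    """Needleman-Wunsch global alignment with a linear gap penalty."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical, non-gap columns over the full alignment length."""
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def rank_templates(target: str, library: dict, top: int = 3) -> list:
    """Return the top (identity, name) pairs, highest identity first."""
    scored = [(identity(*global_align(target, seq)), name) for name, seq in library.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]


if __name__ == "__main__":
    toy_library = {"lrr_1": "LTSLDLSNNKL", "lrr_2": "LKELDLSGNQL", "lrr_3": "PAAGRRW"}
    print(rank_templates("LTSLDLSNNKL", toy_library))
```

Real pipelines usually score protein alignments with a substitution matrix such as BLOSUM62 and affine gap penalties; the simple scheme here only keeps the example short.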


LRRML contains individual three-dimensional LRR structures with manual structural annotations. It provides a useful resource for homology modeling and structural analysis of LRR proteins. This database is freely available at http://tollml.lrz.de.

LRRML: a conformational database and an XML description of leucine-rich repeats (LRRs)

BMC Struct. Biol., 2008, 8:47

IF: 3.06; SCI citations: 3

Tiandi Wei, Jing Gong*, Ferdinand Jamitzky, Wolfgang M. Heckl, Robert W. Stark and Shaila C. Rössle

*corresponding author


In mammals, 13 TLRs have been identified. Protein sequences are available for a number of mammalian species. Using these sequences, a complete molecular phylogenetic analysis and a phylogenetic tree of the known TLRs were reported. According to this tree, mammalian TLRs can be divided into six subfamilies: TLR1, 2, 6 and 10 belong to the TLR1 subfamily; TLR3 constitutes the TLR3 subfamily; TLR4 constitutes the TLR4 subfamily; TLR5 constitutes the TLR5 subfamily; TLR7, 8 and 9 compose the TLR7 subfamily; and TLR11, 12 and 13 belong to the TLR11 subfamily.


Since the crystal structure of the human TLR3 ECD was first reported in 2005, four crystal structures of receptor-ligand complexes have been determined.

They are: the human TLR2-1 heterodimer, the mouse TLR3 homodimer, the human TLR4 homodimer and the mouse TLR2-6 heterodimer.


Compared with the small number of crystal structures, about 3000 protein sequences of different TLRs from different species are known. Because X-ray crystallography remains time-consuming and some proteins are very difficult to crystallize, computational methods are used for fast, large-scale structure prediction from sequences. Currently, the most accurate protein structure prediction method is homology modeling.


When applying homology modeling to the TLR ectodomains, we encountered a problem: the sequence identity between each target and the full-length templates (the crystal structures listed above) is well below 30%, because TLR ectodomains contain different numbers and arrangements of LRRs. The phylogenetic tree also reflects this divergence. As a result, we could not obtain a proper model.

To solve this problem we developed an LRR template assembly approach with the help of both TollML and LRRML databases.


Flowchart of the LRR template assembly approach


[Figure: TLR3 ECD models built by the threading method, with full-length templates and by LRR assembly, compared with the crystal structure]


Superimposition of the model (blue) and crystal structure (orange) of TLR3 at the two ligand interaction regions. Global root mean square deviation: 1.96 Å and 1.90 Å.


Zhang et al., 2009.


If the root mean square deviation between a model and a structure is < 3 Å, the model is very good and can be used to perform ligand-docking and molecular replacement.
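An RMSD between a model and a crystal structure is only meaningful after the two coordinate sets have been optimally superimposed. The sketch below shows one standard way to do that, the Kabsch algorithm, using NumPy. It assumes you have already extracted matching atoms (for example, one Cα per aligned residue) from both structures in the same order; that extraction step, and the choice of atoms behind the values quoted on these slides, are outside this example.

```python
import numpy as np


def kabsch_rmsd(model, reference) -> float:
    """RMSD (same units as the input, e.g. Å) after optimal rigid superposition."""
    p = np.asarray(model, dtype=float)
    q = np.asarray(reference, dtype=float)
    if p.ndim != 2 or p.shape != q.shape or p.shape[1] != 3:
        raise ValueError("expected two N x 3 coordinate arrays of equal size")
    # Remove translation by centring both point sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the 3 x 3 covariance matrix (Kabsch algorithm).
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    sign = np.sign(np.linalg.det(vt.T @ u.T))  # avoid an improper rotation (reflection)
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    ref = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 2.0, 0.0],
                    [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
    angle = 0.7
    rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
    moved = ref @ rz.T + np.array([3.0, -2.0, 5.0])  # rotated and shifted copy
    print(kabsch_rmsd(moved, ref))  # close to zero
```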


Average target-template sequence identity >= 45%


Superimposition of the model (green) and crystal structure (orange) of TLR6. Global root mean square deviation: 1.94 Å; ligand-binding region: 1.18 Å.


These models can be used to perform ligand-docking studies or to design mutagenesis experiments to investigate TLR ligand-binding mechanisms, and thus help to develop new TLR agonists and antagonists that have therapeutic significance for infectious diseases.

A leucine-rich repeat assembly approach for homology modeling of human TLR5-10 and mouse TLR11-13 ectodomains.

J. Mol. Model., 2011, 17(1):27-36

IF: 2.34; SCI citations: 3

Tiandi Wei, Jing Gong*, Ferdinand Jamitzky, Wolfgang M. Heckl, Shaila C. Rössle and Robert W. Stark

*corresponding author


Exam Thesis

Topic: What can bioinformatics do for you?

Language: English

Word count: 1000-2000

Deadline: 2011/11/30

Submit to: gongjing@sdu.edu.cn


Format:

1. The following word processor file formats are acceptable for the thesis: Microsoft Word (.doc), Rich Text Format (RTF) and Portable Document Format (PDF).

2. Choose a legible font and use double line spacing. The font size should be between 11 pt and 12 pt, with standard margins.

3. Use hard returns only to end headings and paragraphs, not to rearrange lines.

4. Number all references consecutively, in square brackets, in the order in which they are cited in the text, followed by any cited in tables or legends.

5. Number all pages.

6. Greek and other special characters may be included. If you are unable to reproduce a particular special character, type out the name of the symbol in full.


Thank you very much for your attention!
