Organizational Heterogeneity of Human Genome

Significant variation of recombination rate of 100 kbp sequences within GC ranges

Svetlana Frenkel, Valery Kirzhner, Abraham Korol

Department of Evolutionary and Environmental Biology, Institute of Evolution

University of Haifa

Some aspects of intra-genome heterogeneity

• Varying gene density
• Clusters of tissue-specific and housekeeping genes
• Linkage disequilibrium (LD) blocks
• Mutation and recombination rates
• Conserved and ultraconserved segments
• Localization of inversions, deletions, insertions and duplications

Genome Heterogeneity: GC content

From: Costantini, M., Clay, O., Auletta, F., Bernardi, G. (2006) An isochore map of human chromosomes. Genome Res., 16, 536-541.

From: UHN Microarray Centre's CpG Island Database http://data.microarrays.ca/cpg/index.htm

The level of redness denotes the relative number of CpG islands that can be located on the chromosome in that region

Genome Signature (Samuel Karlin et al., 1997)

Local:
• preliminary searches of candidates for gene alignment
• detecting candidate regulatory signals
• detecting promoter regions
• detecting repetitive elements
• genomic duplications
• horizontal gene transfer

Genome-wide:
• phylogenetic analysis
• species recognition
• whole-genome sequence comparisons

Linguistic-like methods

• Detecting all “words” up to a certain maximal length
• Characterizing the sequence “vocabulary”
• Scoring the occurrences of fixed-length “words” from a predefined “vocabulary”
• Comparison of “word” frequencies obtained from different sequences
• Comparison of the “vocabularies” of different sequences

Compositional Spectra Analysis

Compositional Spectra

A linguistic-like method of genome analysis based on occurrences of “words” in the A, C, G, T alphabet.

The compositional spectrum (CS) is measured as a histogram of imperfect word occurrences.

From: V. Kirzhner et al., 2002-2005
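The histogram of imperfect word occurrences can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the `max_mismatches` parameter, and the use of Hamming distance as the definition of an "imperfect" match are all assumptions made for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def word_counts(sequence, words, max_mismatches=1):
    """Count imperfect occurrences of each word in the sequence.

    A window counts as an occurrence of a word when it differs from the
    word in at most `max_mismatches` positions (assumed definition).
    All words are assumed to have the same length.
    """
    counts = [0] * len(words)
    if not words:
        return counts
    k = len(words[0])
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        for idx, word in enumerate(words):
            if hamming(window, word) <= max_mismatches:
                counts[idx] += 1
    return counts


def compositional_spectrum(sequence, words, max_mismatches=1):
    """Normalise the occurrence counts into a frequency histogram."""
    counts = word_counts(sequence, words, max_mismatches)
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    toy_words = ["ACG", "TTT"]  # hypothetical word set
    print(word_counts("ACGTACGTAC", toy_words, max_mismatches=1))
    print(compositional_spectrum("ACGTACGTAC", toy_words, max_mismatches=2))
```

The brute-force scan is quadratic in practice (every window against every word) and is only meant to make the definition concrete; a real analysis of 100 kbp segments would need an indexed search.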

Methods: calculation of distances

[Figure: compositional spectra F(Si, W), F(S’i, W), F(Sj, W), F(S’j, W), F(Si, W’), F(Sj, W’) and the distances d’1, d’2 between them]

Distance measures:
• Manhattan (city block) distance
• Spearman rank correlation ρ (d = 1 − ρ)
• Kendall distance τ

d = min(di, d’i, dj, d’j)
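As a rough illustration of the first measure, the Manhattan (city block) distance between two compositional spectra is the sum of absolute differences of their word frequencies. The sketch below assumes that both spectra are lists of frequencies computed over the same word set in the same order; the function name is hypothetical.

```python
def manhattan_distance(spectrum_a, spectrum_b):
    """City block distance between two spectra over the same word set."""
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must be built on the same word set")
    return sum(abs(a - b) for a, b in zip(spectrum_a, spectrum_b))


if __name__ == "__main__":
    # Two toy spectra over a hypothetical three-word set.
    print(manhattan_distance([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))
```

For spectra normalised to sum to 1, this distance ranges from 0 (identical spectra) to 2 (no shared words).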

Methods: Detection of Organizational Pattern groups of segments

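[Figure: matrix of distances between genome segments (axes: genome segment number; color scale: low to high) with the clustering tree. Labels: relative distance between two clusters; maximal distance between segments.]

Neighbor-Joining clustering with “adaptive cutoff”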

Analysis of Organizational Pattern groups of segments

Significant variation of evolutionary features of 100 kbp sequences within GC ranges

Testing for potential association between the genome-wide distribution of organizational patterns (OPs) and various evolutionary and structural features reveals inter-OP heterogeneity in SNP and indel frequency, recombination rate, number of segmental duplications, size of linkage disequilibrium blocks, and proportion of evolutionarily conserved sequence.

Estimation of heterogeneity between OP groups

0.22 8.8×10⁻⁵ 8.8×10⁻⁵ 8.8×10⁻⁵ 8.8×10⁻⁵ 8.8×10⁻⁵ 8.8×10⁻⁵ 8.8×10⁻⁵ 8.8×10⁻⁵ 0.03 1.9×10⁻³ 0.01 0.11 3.9×10⁻³

• Kruskal–Wallis non-parametric rank test
• 10,000 segment reshuffles to estimate the test critical value
• FDR correction for multiple comparisons
• Reshuffled sequences within every segment as control

2.3 5.1 86.1 48.6 81.9 35.7 21.0 26.0 46.7 36.6 13.6 15.7 15.5 16.9
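The testing procedure listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: it assumes that "segment reshuffles" means randomly reassigning segments to OP groups while keeping group sizes fixed, it omits the tie correction of the H statistic, and it uses the Benjamini-Hochberg procedure as the FDR correction. All function names are hypothetical.

```python
import random
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def permutation_p_value(groups, n_reshuffles=10000, seed=0):
    """Empirical p-value: share of reshuffles with H >= observed H."""
    rng = random.Random(seed)
    observed = kruskal_wallis_h(groups)
    pooled = list(chain.from_iterable(groups))
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_reshuffles):
        rng.shuffle(pooled)  # reassign segments to groups at random
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if kruskal_wallis_h(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_reshuffles + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, each feature (for example recombination rate) would give one list of values per OP group, one permutation p-value per GC range, and the resulting set of p-values would then be passed through `benjamini_hochberg`.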

Detecting the words related to recombination rate

GC, %   Average RR in the compared OPGs   Proportion of correct classifications of segments to OP groups, %
        low RR     high RR                all words    set of 47 words    set of 8 words
35      0.82       0.93                   98.60        98.62              76.03
36      0.62       1.16                   98.40        96.56              82.34
37      0.83       1.28                   94.10        93.88              80.47
38      0.80       1.46                   99.58        99.17              98.33
39      0.91       1.59                   97.32        97.32              96.55
40      0.96       1.50                   100.0        100.0              100.0
41      1.13       1.81                   98.80        98.50              98.50
42      1.05       1.80                   100.0        100.0              99.62
43      1.29       1.99                   97.48        96.98              95.46
44      1.44       1.83                   99.01        99.21              98.81
45      1.35       2.06                   100          98.93              98.22
46      1.30       1.88                   98.53        98.53              97.35
47      1.15       1.74                   94.62        94.61              91.48
48      1.33       2.04                   98.78        98.77              97.55

Oligonucleotides that showed high importance in more than half of the OPG comparisons when classifying 100 kbp segments by high and low recombination rate

Columns: Top 10 = appeared in the list of the 10 most important variables (times); Top 1 = appeared as the most important variable (times). Each previously described pattern is shown aligned above the oligonucleotide.

Oligonucleotide   GC, %   Top 10   Top 1   Previously described pattern   Reference
CAGCCAGGTT        60      11       4       -CCNCCNTNNCCNC-                Myers et al. 2008
                                           -CAGCCAGGTT----
GACCGGACTG        70      10       1       ---CCTCCCT--                   Myers et al. 2005
                                           -GACCGGACTG-
                                           -CCNCCNTNNCCNC-                Myers et al. 2008
                                           ---GACCGGACTG--
CGCCGGGACT        80      10       3       -CCNCCNTNNCCNC-                Myers et al. 2008
                                           --CGCCGGGACT---
GCGTAGGCTA        60      9        0       -CCNCCNTNNCCNC-                Myers et al. 2008
                                           ---GCGTAGGCTA--
TGGGCCCGGC        90      8        4       n/a
GGCGTGCGCG        90      8        1       -GGNGGNAGGGG-                  Zheng et al. 2010
                                           -GGCGTGCGCG--
                                           -CCNCCNTNNCCNC-                Myers et al. 2008
                                           ---GGCGTGCGCG--
CCCGGTATCG        70      8        0       -CCNCCNTNNCCNC-                Myers et al. 2008
                                           --CCCGGTATCG---
GCCCTTTCCT        60      7        0       ---CCTCCCT--                   Myers et al. 2005
                                           -GCCCTTTCCT-
                                           -CCNCCNTNNCCNC-                Myers et al. 2008
                                           ---GCCCTTTCCT--
                                           -CCTCCCTNNCCAC-                Myers et al. 2008
                                           ---GCCCTTTCCT--

Functionally related genes tend to reside in organizationally similar genomic regions

Genes underlying the GO enrichment of the four organizational pattern clusters that showed the most significant GO enrichments:

• L2-a cluster is enriched in the “mitochondrion”, “intracellular non-membrane-bounded organelle”, “nuclear envelope” and “ribonucleoprotein complex” GO terms;
• L2-h cluster is enriched in the “G-protein-coupled receptor protein signaling pathway” and “sensory perception of smell” GO terms;
• H1-i cluster is enriched in the “epithelial cell differentiation” and “epithelium development” GO terms;
• H2-a cluster is enriched in the “skeletal system development” GO term.

Paz A, Frenkel S, Snir S, Kirzhner V, Korol A. 2014. BMC Genomics 15:252.

Thank you for your attention

Acknowledgments

Dr. Valery Kirzhner, Prof. Abraham Korol, Prof. Edward Trifonov, Dr. Arnon Paz and Dr. Zeev Frenkel

This work was supported by:
• The Israeli Ministry of Immigrant Absorption
• The Israel Council for Higher Education

Calculating compositional spectra

…AGTAGTTACACTACTATAGTGACGACTCCATCGTCGTCGAGAACGTACCTTCTATATCCAAGGTACTACACTCGCGACCG

3676 CTACTATAGT

…CTACTATAGT CTACTAAAGT CTAGTAAAGT CTAGTAAAGT CTAGTAACGT CGCCTAAAGT CCACTAAGGT

256 × 3676 = 941056 (86.7%)

Additional slide

Spearman's rank correlation coefficient rho

Spearman's rank correlation coefficient is a non-parametric measure of correlation. ρ is given by:

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$$

where:
• Di = xi − yi = the difference between the ranks of corresponding values Xi and Yi, and
• n = the number of values in each data set (same for both sets).
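A minimal Python sketch of this coefficient, assuming there are no tied values (the rank-difference formula is exact only in that case). The function names are illustrative; the distance d = 1 − ρ follows the definition given on the distances slide.

```python
def simple_ranks(values):
    """1-based ranks of the values; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_rho(x, y):
    """Spearman's rho from rank differences (no ties assumed)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two data sets of equal size, n >= 2")
    rank_x, rank_y = simple_ranks(x), simple_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def spearman_distance(x, y):
    """Distance between two spectra, d = 1 - rho."""
    return 1 - spearman_rho(x, y)
```

With this definition the distance is 0 for identically ordered spectra and 2 for exactly reversed ones.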

The Kendall tau distance

The Kendall tau distance is a metric that counts the number of pairwise disagreements between two lists. The larger the distance, the more dissimilar the two lists are.

The Kendall tau distance between two lists τ1 and τ2 is

$$K(\tau_1, \tau_2) = \left|\{(i, j) : i < j,\ (\tau_1(i) < \tau_1(j) \wedge \tau_2(i) > \tau_2(j)) \vee (\tau_1(i) > \tau_1(j) \wedge \tau_2(i) < \tau_2(j))\}\right|$$

where τ1(i) and τ2(i) are the ranks of element i in the two lists.

K(τ1,τ2) equals 0 if the two lists are identical and n(n − 1) / 2 (where n is the list size) if one list is the reverse of the other. The Kendall tau distance is often normalized by dividing by n(n − 1) / 2, so that a value of 1 indicates maximum disagreement. The normalized Kendall tau distance therefore lies in the interval [0,1].
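The pairwise-disagreement count can be written directly from this definition. The sketch below is a straightforward O(n²) illustration with hypothetical function names; it takes the two lists as rank vectors (the rank of each element in each list) and assumes no tied ranks.

```python
from itertools import combinations


def kendall_tau_distance(ranks_a, ranks_b):
    """Number of element pairs ordered differently by the two rankings."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must have the same length")
    return sum(
        1
        for i, j in combinations(range(len(ranks_a)), 2)
        if (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j]) < 0
    )


def normalized_kendall_tau_distance(ranks_a, ranks_b):
    """Distance divided by n(n - 1) / 2, so the result lies in [0, 1]."""
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two elements")
    return kendall_tau_distance(ranks_a, ranks_b) / (n * (n - 1) / 2)
```

A pair contributes to the count exactly when the two rank differences have opposite signs, which is the disagreement condition in the formula.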
