Morphological and Functional Statistical Features of DNA Molecules

Paul Dan CRISTEA
Bio-Medical Engineering Center, “Politehnica” University of Bucharest
Spl. Independentei 313, 060042 Bucharest, Romania
Phone: +40-21-4114437, +40-745-117062
e-mail: [email protected]

Simpozionul Naţional de Electrotehnică Teoretică (National Symposium on Theoretical Electrical Engineering)
22-23 October 2004, Bucharest, UPB
Anniversary edition dedicated to the centenary of the birth of the great Romanian scientist Remus Răduleţ

Source file: snet.elth.pub.ro/snet2004/Cd/circ/circ_O6.pdf (2004. 11. 18.)

PAPER OUTLINE

1. Introduction - DNA Structure

2. Nucleotide Representation

3. Phase Analysis of Genomic Signals

4. Reorienting DNA Segments

5. HIV Genomic Signal Analysis

6. Representability of Data

7. Conclusions

1. DNA Structure

[Figure: chemical structure of a double-stranded DNA segment. Each sugar-phosphate backbone runs in the 5' to 3' direction, with the 5' and 3' ends marked on both strands; the bases are joined by hydrogen bonds in the pairs A-T and G-C.]

Nucleoside Triphosphate

[Figure: structure of a nucleoside triphosphate. A deoxyribose ring with carbons 1' to 5' carries the nitrogenous base at 1' and the triphosphate group at the 5' end; the 3' end, the phosphoester bond and the phosphoanhydride bonds are marked.]

ATP, ADP, AMP

Adenosine triphosphate (ATP): adenine attached to deoxyribose and three phosphate groups

Adenosine diphosphate (ADP)

Adenosine monophosphate (AMP)

[Figure: conversions among ATP, ADP and AMP, with arrows labelled Krebs cycle, hydrolysis and pyrophosphate.]

2. NUCLEOTIDE REPRESENTATION

Nucleotides: A (adenine), C (cytosine), G (guanine), T (thymine)

Two-nucleotide classes:

R (purines): A, G

Y (pyrimidines): C, T

S (strong link): C, G

W (weak link): A, T

M (amino): A, C

K (keto): G, T

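Vector mapping of IUPAC symbols

[Figure: tetrahedral vector mapping of the IUPAC symbols.]

Vector representation of FASTA

Nucleotide vectors, with unit vectors $\hat{\imath}, \hat{\jmath}, \hat{k}$ along the three axes:

$$
\begin{aligned}
\vec{a} &= \hat{\imath} + \hat{\jmath} + \hat{k}, \\
\vec{c} &= -\hat{\imath} + \hat{\jmath} - \hat{k}, \\
\vec{g} &= -\hat{\imath} - \hat{\jmath} + \hat{k}, \\
\vec{t} &= \hat{\imath} - \hat{\jmath} - \hat{k}.
\end{aligned}
$$

Two-nucleotide classes:

$$
\begin{aligned}
\vec{w} &= \frac{\vec{a} + \vec{t}}{2} = \hat{\imath}, &\quad \vec{s} &= \frac{\vec{c} + \vec{g}}{2} = -\hat{\imath}, \\
\vec{m} &= \frac{\vec{a} + \vec{c}}{2} = \hat{\jmath}, &\quad \vec{k} &= \frac{\vec{g} + \vec{t}}{2} = -\hat{\jmath}, \\
\vec{r} &= \frac{\vec{a} + \vec{g}}{2} = \hat{k}, &\quad \vec{y} &= \frac{\vec{c} + \vec{t}}{2} = -\hat{k}.
\end{aligned}
$$

Three-nucleotide classes:

$$
\begin{aligned}
\vec{b} &= \frac{\vec{c} + \vec{g} + \vec{t}}{3} = -\frac{\vec{a}}{3}, &\quad \vec{d} &= \frac{\vec{a} + \vec{g} + \vec{t}}{3} = -\frac{\vec{c}}{3}, \\
\vec{h} &= \frac{\vec{a} + \vec{c} + \vec{t}}{3} = -\frac{\vec{g}}{3}, &\quad \vec{v} &= \frac{\vec{a} + \vec{c} + \vec{g}}{3} = -\frac{\vec{t}}{3}.
\end{aligned}
$$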
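The class relations can be checked numerically. The sketch below is only an illustration: it assumes the axis convention x = weak - strong, y = amino - keto, z = purine - pyrimidine, and the names `VEC` and `class_vector` are hypothetical, not taken from the slides.

```python
from fractions import Fraction

# Assumed tetrahedral coordinates (x = weak - strong, y = amino - keto,
# z = purine - pyrimidine); illustrative names, not from the slides.
VEC = {
    "a": (1, 1, 1),
    "c": (-1, 1, -1),
    "g": (-1, -1, 1),
    "t": (1, -1, -1),
}


def class_vector(members):
    """Exact mean of the nucleotide vectors that belong to an IUPAC class."""
    count = len(members)
    return tuple(
        Fraction(sum(VEC[m][axis] for m in members), count) for axis in range(3)
    )


if __name__ == "__main__":
    for name, members in [("w", "at"), ("s", "cg"), ("m", "ac"),
                          ("k", "gt"), ("r", "ag"), ("y", "ct"),
                          ("b", "cgt"), ("d", "agt"), ("h", "act"), ("v", "acg")]:
        print(name, [str(x) for x in class_vector(members)])
```

Each two-nucleotide class lands on a coordinate axis, and each three-nucleotide class is minus one third of the excluded nucleotide's vector, because the four nucleotide vectors sum to zero.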

Quadrantal complex mapping of FASTA symbols

[Figure: quadrantal complex mapping of the FASTA symbols.]

Complex representation of FASTA

A = 1 + j, C = -1 - j, G = -1 + j, T = 1 - j

W = 1, S = -1, R = j, Y = -j

K = M = N = 0

$$
B = \tfrac{1}{3}(-1 - j), \qquad V = \tfrac{1}{3}(-1 + j), \qquad D = \tfrac{1}{3}(1 + j), \qquad H = \tfrac{1}{3}(1 - j)
$$
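A minimal sketch of how a FASTA string could be turned into a complex genomic signal with this mapping. The class values are computed as the average of their member nucleotides, which reproduces the values listed for this slide; `COMPLEX_MAP`, `IUPAC_CLASSES`, `class_value` and `genomic_signal` are illustrative names, not part of the original material.

```python
# Quadrantal complex mapping of the four nucleotides (Python writes j as 1j).
COMPLEX_MAP = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}

# Standard IUPAC ambiguity classes and their member nucleotides.
IUPAC_CLASSES = {
    "W": "AT", "S": "CG", "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def class_value(symbol):
    """Complex value of a symbol: the mean of its member nucleotides."""
    members = IUPAC_CLASSES.get(symbol, symbol)
    return sum(COMPLEX_MAP[m] for m in members) / len(members)


def genomic_signal(sequence):
    """Map a nucleotide string to a list of complex samples."""
    return [class_value(symbol) for symbol in sequence.upper()]


if __name__ == "__main__":
    print(genomic_signal("ACGTWRN"))
```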

Codon tetrahedron representation

$$
\vec{x} = 2^2\,\vec{b}_2 + 2^1\,\vec{b}_1 + 2^0\,\vec{b}_0, \qquad \vec{b}_i \in \{\vec{a}, \vec{c}, \vec{g}, \vec{t}\},\; i = 0, 1, 2
$$

[Figure: codon tetrahedron with axes x = Weak - Strong, y = Amino - Keto, z = Purine - Pyrimidine and vertices Adenine, Cytosine, Guanine, Thymine. The nested scales a, 2a, 4a (and likewise c, g, t) locate the codons, for example AAA, AAC, AAG, AAT.]

Codon complex representation

$$
x = 2^2\,b_2 + 2^1\,b_1 + 2^0\,b_0, \qquad b_i \in \{a, c, g, t\},\; i = 0, 1, 2
$$

[Figure: codon complex plane with Re = W - S (weak bonds vs. strong bonds) and Im = R - Y (purines vs. pyrimidines), axis marks at 4, -4, 4j and -4j, and quadrants Adenine, Cytosine, Guanine, Thymine. Example codons AAA, AAG (Lysine) and AAC, AAT (Asparagine).]
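The weighted sum can be illustrated with a short sketch. It assumes that b_2 is the first nucleotide of the codon (the most significant one), which is how the example codons AAA, AAC, AAG, AAT cluster in the figure; `codon_value` is a hypothetical helper name.

```python
from itertools import product

COMPLEX_MAP = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}


def codon_value(codon):
    """x = 4*b2 + 2*b1 + b0, assuming b2 is the first base of the codon."""
    b2, b1, b0 = (COMPLEX_MAP[n] for n in codon.upper())
    return 4 * b2 + 2 * b1 + b0


if __name__ == "__main__":
    for codon in ("AAA", "AAC", "AAG", "AAT"):
        print(codon, codon_value(codon))
    # Every codon gets its own point on the 8 x 8 grid of odd coordinates.
    all_values = {codon_value("".join(c)) for c in product("ACGT", repeat=3)}
    print(len(all_values))
```

Because each nucleotide contributes ±1 to both the real and the imaginary part, the binary weights 4, 2, 1 make the 64 codons occupy 64 distinct points.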

Genetic Code

[Figure: the 64 codons plotted in the Adenine, Cytosine, Guanine and Thymine regions, each labelled with the amino acid it encodes.]

Ala - Alanine
Arg - Arginine
Asn - Asparagine
Asp - Aspartic acid
Cys - Cysteine
Gln - Glutamine
Glu - Glutamic acid
Gly - Glycine
Ile - Isoleucine
His - Histidine
Leu - Leucine
Lys - Lysine
Met - Methionine
Phe - Phenylalanine
Pro - Proline
Ser - Serine
Thr - Threonine
Trp - Tryptophan
Tyr - Tyrosine
Val - Valine
Ter - Terminator

Genetic Code Table

[Figure: genetic code table in the complex plane, with Re = W-S, Im = R-Y, quadrants A, G, C, T and the amino acids marked at the positions of their codons.]

3. PHASE ANALYSIS OF GENOMIC SIGNALS

The phase of a complex number is a periodic quantity: the complex number does not change when any multiple of 2π is added to or subtracted from its phase. To remove the ambiguity, the standard mathematical convention restricts the phase of a complex number to the domain (-π, π], which covers every possible orientation of the associated vector in the complex plane exactly once.

For the genomic signals obtained by using the nucleotide complex mapping in the figure, the phases of the nucleotide representations have the values $\left\{-\frac{3\pi}{4}, -\frac{\pi}{4}, \frac{\pi}{4}, \frac{3\pi}{4}\right\}$ radians.

The cumulated phase is the sum of the phases of the complex numbers in a sequence, from the first element up to the current element:

$$
\theta_c = \frac{\pi}{4}\left[\,3\,(n_G - n_C) + (n_A - n_T)\,\right],
$$

where nA, nC, nG and nT are the numbers of adenine, cytosine, guanine and thymine nucleotides in the sequence from the first to the current location.

The slope sc of the cumulated phase along the DNA strand at a certain location is:

$$
s_c = \frac{\pi}{4}\left[\,3\,(f_G - f_C) + (f_A - f_T)\,\right],
$$

where fA, fC, fG and fT are the nucleotide occurrence frequencies.
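The two ways of obtaining the cumulated phase, a running sum of per-nucleotide phases and the closed form in terms of nucleotide counts, can be compared in a few lines. This is a sketch under the quadrantal mapping A = 1 + j, C = -1 - j, G = -1 + j, T = 1 - j; the function names are illustrative.

```python
import math

# Phases of the quadrantal mapping, restricted to (-pi, pi].
PHASE = {
    "A": math.pi / 4,
    "G": 3 * math.pi / 4,
    "C": -3 * math.pi / 4,
    "T": -math.pi / 4,
}


def cumulated_phase(sequence):
    """Running sum of nucleotide phases, one value per position."""
    total = 0.0
    values = []
    for nucleotide in sequence.upper():
        total += PHASE[nucleotide]
        values.append(total)
    return values


def cumulated_phase_from_counts(sequence):
    """Closed form (pi/4) * [3 (nG - nC) + (nA - nT)] for the whole sequence."""
    s = sequence.upper()
    return math.pi / 4 * (
        3 * (s.count("G") - s.count("C")) + (s.count("A") - s.count("T"))
    )


if __name__ == "__main__":
    seq = "GATTACAGGC"
    print(cumulated_phase(seq)[-1], cumulated_phase_from_counts(seq))
```

Dividing the final value by the sequence length gives an estimate of the slope sc over that stretch.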

The unwrapped phase is the corrected phase of the elements in a sequence, such that the absolute value of the difference between the phase of each element and the phase of its preceding element is kept smaller than π. The unwrapped phase for a sequence of nucleotides is:

$$
\theta_u = \frac{\pi}{2}\,(n_+ - n_-),
$$

where n+ is the number of positive transitions A→G, G→C, C→T, T→A, and n- is the number of negative transitions A→T, T→C, C→G, G→A.

The exact neutral transitions A↔A, C↔C, G↔G, T↔T and the “on the average” neutral transitions A→C, C→A, G→T, T→G do not contribute to the unwrapped phase.

The slope su of the variation of the unwrapped phase along a DNA strand is given by the relation:

$$
s_u = \frac{\pi}{2}\,(f_+ - f_-),
$$

where f+ and f- are the frequencies of the positive and negative transitions.

A statistically constant slope of the unwrapped phase corresponds to a statistically helicoidal wrapping of the complex representations of the nucleotides along the DNA strand. The step of the helix, i.e., the spatial period over which the helix completes a turn, is:

$$
L = \frac{2\pi}{s_u}
$$
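The transition-counting rule translates directly into code. The sketch below counts positive and negative transitions, evaluates (π/2)(n+ - n-), and computes the helix step from a slope; taking the absolute value of the slope in `helix_step` is an assumption made here so that the step is a positive length, and all names are illustrative.

```python
import math

# Transitions that advance the phase by +pi/2 or -pi/2 in the quadrantal mapping.
POSITIVE = {"AG", "GC", "CT", "TA"}
NEGATIVE = {"AT", "TC", "CG", "GA"}


def transition_counts(sequence):
    """Return (n_plus, n_minus) for consecutive nucleotide pairs."""
    s = sequence.upper()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    n_plus = sum(pair in POSITIVE for pair in pairs)
    n_minus = sum(pair in NEGATIVE for pair in pairs)
    return n_plus, n_minus


def unwrapped_phase(sequence):
    """(pi/2) * (n_plus - n_minus), measured from the first nucleotide."""
    n_plus, n_minus = transition_counts(sequence)
    return math.pi / 2 * (n_plus - n_minus)


def helix_step(slope):
    """Number of base pairs per full turn for a slope given in rad/bp."""
    return 2 * math.pi / abs(slope)


if __name__ == "__main__":
    print(transition_counts("AGCTA"), unwrapped_phase("AGCTA"))
    print(helix_step(-0.057))
```

For example, the path A→G→C→T→A consists of four positive transitions and completes one full turn of 2π.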

[Figure: cumulated and unwrapped phase along the concatenated contigs of Homo sapiens chr. 11. su = 0.0667 rad/bp (min: 0.047 rad/bp = 2.7 degree/bp, max: 0.120 rad/bp = 6.9 degree/bp).]

[Figure: Homo sapiens chr 1, all contigs (phase 3, length 228507674 bp).]

Phase Analysis of the Genome of Escherichia coli K12

Origin of replication: 3,923,500 bp. Cumulated phase minimum: 3,923,225 bp.

Terminus of replication: 1,588,800 bp. Cumulated phase maximum: 1,550,413 bp.

Phase Analysis of the Genome of Bacillus subtilis

[Figure: cumulated phase and unwrapped phase along the chromosome of BS (NC-000964).]

su = -0.057 rad/bp, L = 110 bp,
sc+ = 0.106 rad/bp, l+ ≈ 1.945 Mbp, sc- = -0.0965 rad/bp, l- ≈ 2.270 Mbp,

$$
\Delta f^{+}_{RhK} = 6.75\ \%/\text{bp}, \qquad \Delta f^{-}_{RhK} = -6.14\ \%/\text{bp}
$$

Phase Analysis of the Genome of Yersinia pestis

[Figure: cumulated and unwrapped phase along the genome of Yersinia pestis.]

4. REORIENTING DNA SEGMENTS

[Figure: initial state.]

[Figure: hypothetical segment reversal.]

[Figure: joint direction reversal and strand switching.]

[Figure: effect of segment reversal and strand switching on positive and negative nucleotide-to-nucleotide transitions.]
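How reversal and strand switching act on the transition counts can be explored with a small experiment. This is an illustrative sketch, not the author's procedure: it uses the positive and negative transition sets defined for the unwrapped phase, and the helper names are hypothetical.

```python
POSITIVE = {"AG", "GC", "CT", "TA"}
NEGATIVE = {"AT", "TC", "CG", "GA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transition_counts(sequence):
    """Return (n_plus, n_minus) for consecutive nucleotide pairs."""
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    return (sum(p in POSITIVE for p in pairs), sum(p in NEGATIVE for p in pairs))


def reverse(sequence):
    """Direction reversal only: read the same strand backwards."""
    return sequence[::-1]


def complement(sequence):
    """Strand switching only: A<->T and C<->G, same reading order."""
    return sequence.translate(COMPLEMENT)


def reverse_complement(sequence):
    """Joint direction reversal and strand switching."""
    return complement(sequence)[::-1]


if __name__ == "__main__":
    seq = "AGCTAGGA"
    print("original          ", transition_counts(seq))
    print("reversed          ", transition_counts(reverse(seq)))
    print("complemented      ", transition_counts(complement(seq)))
    print("reverse complement", transition_counts(reverse_complement(seq)))
```

Under these definitions, reversal alone or complementing alone exchanges the positive and negative counts, while the two operations applied together leave them unchanged.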

[Figure: phase analysis of the 4288 concatenated re-oriented coding regions of the complete genome of E. coli.]

[Figure: phase analysis of the concatenated re-oriented coding regions of the complete genome of Yersinia pestis.]

[Figure: phase analysis of the complete genome and of the concatenated re-oriented 1839 coding regions of Aeropyrum pernix K.]

Agrobacterium tumefaciens strain C58 Cereon, circular chromosome (accession number AE007869)

Complete sequence: 2,841,581 bp

2722 re-oriented coding regions: 2,539,788 bp

Agrobacterium tumefaciens strain C58 Cereon, linear chromosome (accession number AE007870)

Complete sequence: 2,074,782 bp

1834 re-oriented coding regions: 1,897,575 bp

Agrobacterium tumefaciens strain C58, plasmid AT (accession number AE007872)

Complete sequence: 542,869 bp

547 re-oriented coding regions: 465,906 bp

Agrobacterium tumefaciens strain C58 Cereon, plasmid Ti (accession number AE007871)

Complete sequence: 214,233 bp

198 re-oriented coding regions: 188,667 bp

5. REPRESENTABILITY OF DATA

When can data be represented properly by a line?

[Figure: a signal drawn on a pixel grid, annotated with Px, Py and Vy and with the characteristic $V_y / P_y = f(P_x)$.]

Sx - samples on a screen (Nx pixels)

Px - pixel width in number of samples

Sy - data span on a screen (Ny pixels)

Py - pixel height in data units

Pixel size for well-fitted screens

- Horizontal screen size in data samples = sequence length: $S_x = L$

- Horizontal pixel size in data samples (pixel width): $P_x = \dfrac{S_x}{N_x}$

- Vertical screen size in data units = signal span in the screen: $S_y = \max_{i \in I_S} s[i] - \min_{i \in I_S} s[i]$, with $I_S = \{1, \dots, S_x\}$

- Vertical pixel size in data units (pixel height): $P_y = \dfrac{S_y}{N_y}$

Data scattering ratio

- Signal span for a pixel: $V_y(h) = \max_{i \in I_{Ph}} s[i] - \min_{i \in I_{Ph}} s[i]$, with $I_{Ph} = \{(h-1)P_x + 1, \dots, hP_x\}$, $h = 1, \dots, N_x$

- Data scattering ratio for pixel h, h = 1…Nx: $Q(h) = \dfrac{V_y(h)}{P_y}$

- Average vertical signal span on a pixel: $\tilde{V}_y = \operatorname{mean}_{h = 1, \dots, N_x} V_y(h)$

- Average data scattering ratio for pixel: $\tilde{Q} = \dfrac{\tilde{V}_y}{P_y}$
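For a single well-fitted screen these definitions amount to a few max/min operations. The following sketch assumes that the signal length is an exact multiple of the number of horizontal pixels and that the signal is not constant (otherwise the pixel height would be zero); `pixel_spans` and `average_scattering_ratio` are illustrative names.

```python
def pixel_spans(signal, pixel_width):
    """V_y(h): max minus min of the samples that fall into each pixel column."""
    spans = []
    for start in range(0, len(signal) - pixel_width + 1, pixel_width):
        column = signal[start:start + pixel_width]
        spans.append(max(column) - min(column))
    return spans


def average_scattering_ratio(signal, n_x, n_y):
    """Average data scattering ratio for one screen of n_x by n_y pixels.

    Assumes len(signal) is a multiple of n_x and the signal is not constant.
    """
    pixel_width = len(signal) // n_x            # P_x = S_x / N_x
    screen_span = max(signal) - min(signal)     # S_y
    pixel_height = screen_span / n_y            # P_y = S_y / N_y
    spans = pixel_spans(signal, pixel_width)
    mean_span = sum(spans) / len(spans)         # average V_y over the pixels
    return mean_span / pixel_height


if __name__ == "__main__":
    ramp = list(range(16))
    print(pixel_spans(ramp, 4))
    print(average_scattering_ratio(ramp, n_x=4, n_y=4))
```

A ratio near or below 1 means that the samples inside a pixel column spread over about one pixel vertically, so a thin line is a fair picture of the data at that resolution.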

Representability Diagram: average data scattering vs. pixel width

$$
\tilde{Q} = \frac{\tilde{V}_y}{P_y} = f(P_x)
$$

- Step k of doubling the pixel size: $k = 1, \dots, k_{\max}$

- Initial pixel width: $P_x^{(1)} = 1$; pixel width at step k: $P_x^{(k)} = 2^{k-1}$

- Initial screen size in data samples: $S_x^{(1)} = N_x$

- Screen size at step k: $S_x^{(k)} = 2^{k-1} N_x$

- Last step: $k_{\max} = \log_2 \dfrac{L}{N_x}$, $S_x^{(k_{\max})} = L$, with $L = 2^m$ and $N_x = 2^s$
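The doubling procedure can be sketched as a loop over pixel widths 1, 2, 4, and so on, for as long as one screen still fits into the signal. In this sketch the pixel height at each step is taken from the average span of the well-fitted screens, which is an assumption about how several screens are combined; the function names are illustrative and a non-constant signal is assumed.

```python
def mean_span(signal, width):
    """Mean of (max - min) over consecutive non-overlapping windows of `width` samples."""
    windows = [signal[i:i + width]
               for i in range(0, len(signal) - width + 1, width)]
    return sum(max(w) - min(w) for w in windows) / len(windows)


def representability_diagram(signal, n_x, n_y):
    """Return (pixel_width, average scattering ratio) pairs for widths 1, 2, 4, ..."""
    points = []
    pixel_width = 1
    while pixel_width * n_x <= len(signal):
        screen_width = pixel_width * n_x                      # S_x at this step
        pixel_height = mean_span(signal, screen_width) / n_y  # average P_y
        ratio = mean_span(signal, pixel_width) / pixel_height
        points.append((pixel_width, ratio))
        pixel_width *= 2
    return points


if __name__ == "__main__":
    ramp = list(range(64))
    for width, ratio in representability_diagram(ramp, n_x=4, n_y=4):
        print(width, round(ratio, 4))
```

Plotting the ratio against the pixel width on a logarithmic horizontal axis gives the diagram; for a linear ramp the ratio starts at 0 for one sample per pixel and rises towards N_y/N_x as the pixels widen.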

Monotonous Signals: Representability Best Case

- Size of well-fitted screens: $S_y^{(k)}(j) = s\big[j\,S_x^{(k)}\big] - s\big[(j-1)\,S_x^{(k)} + 1\big]$

- Average size of pixels: $\tilde{P}_y^{(k)} = \dfrac{\tilde{S}_y^{(k)}}{N_y}$

- Signal span per pixel: $V_y^{(k)}(h) = s\big[h\,P_x^{(k)}\big] - s\big[(h-1)\,P_x^{(k)} + 1\big]$

Monotonous Signals: Representability Diagram

Average data scattering ratio:

$$
\tilde{Q}^{(k)} = \frac{N_y}{N_x} \cdot
\frac{s[L] - s[1] - \big(N_P^{(k)} - 1\big)\,\operatorname{mean}\big(d_{\downarrow P_x^{(k)}}\big)}
     {s[L] - s[1] - \big(N_S^{(k)} - 1\big)\,\operatorname{mean}\big(d_{\downarrow S_x^{(k)}}\big)}
$$

where $N_P^{(k)}$ is the total number of pixels needed to represent the input sequence s[i], i = 1, ..., L, for a horizontal pixel size $P_x^{(k)} = 2^{k-1}$; $N_S^{(k)} = N_P^{(k)} / N_x$ is the total number of screens necessary to represent the data at resolution k; and $\operatorname{mean}\big(d_{\downarrow D}\big)$ is the average of the absolute values of the signal variation between successive samples, d[i] = s[i+1] - s[i], down-sampled with step D.

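If the digital signal can be considered as resulting from the sampling of a continuous and differentiable function, and if both $N_P^{(k)}$ and $N_S^{(k)}$ are large enough, then:

$$
\operatorname{mean}\big(d_{\downarrow S_x^{(k)}}\big) \approx \operatorname{mean}\big(d_{\downarrow P_x^{(k)}}\big) \approx \frac{s[L] - s[1]}{L - 1}
$$

so that the representability characteristic is approximately the same for any monotonous signal:

$$
\tilde{Q}^{(k)} = \frac{\tilde{V}_y^{(k)}}{\tilde{P}_y^{(k)}} = \frac{N_y}{N_x} \cdot \frac{P_x^{(k)} - 1}{P_x^{(k)} - \dfrac{1}{N_x}}
$$

Asymptotic value:

$$
\tilde{Q}^{(k)} \xrightarrow[\;2^{k-1} \gg 1\;]{} \frac{N_y}{N_x}
$$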
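The step from the general ratio to the closed form can be written out explicitly. The derivation below is a sketch that introduces the shorthand $\Delta = s[L] - s[1]$ and assumes that the sequence length is an exact multiple of the screen width, so that $N_P^{(k)} = L / P_x^{(k)}$ and $N_S^{(k)} = L / (N_x P_x^{(k)})$; both are assumptions made here for illustration.

```latex
% Shorthand (assumed): \Delta = s[L] - s[1], P = P_x^{(k)}, N_P = L/P, N_S = L/(N_x P).
\begin{aligned}
\Delta - (N_P - 1)\,\frac{\Delta}{L - 1}
  &= \Delta\,\frac{L - N_P}{L - 1}
   = \Delta\,\frac{L\,(P - 1)}{P\,(L - 1)}, \\[4pt]
\Delta - (N_S - 1)\,\frac{\Delta}{L - 1}
  &= \Delta\,\frac{L - N_S}{L - 1}
   = \Delta\,\frac{L\,(N_x P - 1)}{N_x P\,(L - 1)}, \\[4pt]
\tilde{Q}^{(k)}
  &\approx \frac{N_y}{N_x} \cdot \frac{P - 1}{P} \cdot \frac{N_x P}{N_x P - 1}
   = \frac{N_y}{N_x} \cdot \frac{P - 1}{P - \dfrac{1}{N_x}}.
\end{aligned}
```

The signal-dependent factor $\Delta$ and the length $L$ cancel, which is why the characteristic does not depend on the particular monotonous signal.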

[Figure: representability diagram for monotonous signals.]

[Figure: Q = f(Px) for monotonous signals.]

[Figure: 3D representability diagram for monotonous signals.]

Uniformly distributed random signals: Representability Practical Worst Case

$$
\tilde{Q}^{(k)} = \frac{\tilde{V}_y^{(k)}}{\tilde{P}_y^{(k)}} = N_y\,\frac{P_x^{(k)} - 1}{P_x^{(k)} + 1}
$$

Asymptotic value:

$$
\tilde{Q}^{(k)} \xrightarrow[\;2^{k-1} \gg 1\;]{} N_y
$$

The expectation of ordered uniformly distributed random variables

Let Xh, h = 1, ..., n, be n statistically independent random variables uniformly distributed on the interval [0, 1], and {xh : h = 1, ..., n} the values of an instantiation.

Find the expectation of the random variable X whose value x is equal to the kth element of {xh}, ordered starting with the smallest in magnitude.

The cumulative distribution function of x, defined as the probability that ξ, the kth of the ordered random variables, is less than or equal to x, can be computed as the sum of the probabilities of the disjoint cases:

$$
P(x) = P[\xi \le x] = \sum_{h=k}^{n} \binom{n}{h}\, x^{h} (1 - x)^{n-h}
$$

where h variables are smaller than or equal to x, i.e., have values in the interval [0, x], while the other n-h variables exceed x, i.e., have values in the interval (x, 1], with h being at least k.

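The expectation of X results from the chain of relations:

$$
\operatorname{mean}(x) = \int_0^1 x\,\frac{dP(x)}{dx}\,dx = 1 - \int_0^1 P(x)\,dx
= 1 - \sum_{h=k}^{n} \binom{n}{h}\, B(h+1,\, n-h+1)
= 1 - \frac{n - k + 1}{n + 1} = \frac{k}{n + 1},
$$

where B is Euler’s beta function.

The average of the difference between the maximum and the minimum in the set of n random variables is given by:

$$
\operatorname{mean}(x_{\max} - x_{\min}) = \operatorname{mean}(x_{\max}) - \operatorname{mean}(x_{\min}) = \frac{n - 1}{n + 1}.
$$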
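The chain of relations can be verified with exact rational arithmetic. This sketch evaluates the sum of binomial coefficients times Euler beta values for integer arguments; `beta_int`, `expected_kth_smallest` and `expected_range` are illustrative names.

```python
from fractions import Fraction
from math import comb, factorial


def beta_int(p, q):
    """Euler beta function B(p, q) for positive integers: (p-1)!(q-1)!/(p+q-1)!."""
    return Fraction(factorial(p - 1) * factorial(q - 1), factorial(p + q - 1))


def expected_kth_smallest(n, k):
    """Expectation of the kth smallest of n independent uniform [0, 1] variables."""
    tail = sum(comb(n, h) * beta_int(h + 1, n - h + 1) for h in range(k, n + 1))
    return 1 - tail


def expected_range(n):
    """Expectation of max minus min for n independent uniform [0, 1] variables."""
    return expected_kth_smallest(n, n) - expected_kth_smallest(n, 1)


if __name__ == "__main__":
    print(expected_kth_smallest(5, 2))   # k / (n + 1)
    print(expected_range(8))             # (n - 1) / (n + 1)
```

Each term of the sum equals 1/(n + 1), so the tail contributes (n - k + 1)/(n + 1). With n equal to the number of samples in a pixel, the expected range (n - 1)/(n + 1) is the factor that appears in the scattering ratio of a uniformly distributed random signal.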

[Figure: representability diagram for a uniformly distributed random signal.]

[Figure: 3D representability diagram for a uniformly distributed random signal.]

[Figure: representability of sine signals.]

Representability of Genomic Phase Signals

[Figure: representability diagram for contig NT 004424 of Homo sapiens chromosome 1.]

[Figure: 3D representability diagram of the unwrapped phase of the Homo sapiens chr. 1 contig NT 004424.]

[Figure: representability diagram for the cumulated and unwrapped phase of Homo sapiens chr. 21, contig NT 011515.]

[Figure: representability diagram of the unwrapped phase for the first 1048576 bp from the circular chromosome of Agrobacterium tumefaciens (AE008688).]

[Figure: 3D representability diagram of the unwrapped phase for the first 1048576 bp from the circular chromosome of Agrobacterium tumefaciens (AE008688).]

6. CONCLUSIONS

• DNA sequences can be converted into genomic signals by using a nucleotide complex representation derived from the nucleotide tetrahedral representation.

• There are large scale second order statistical regularities that generalize Chargaff’s law: the difference between the frequency of occurrence of positive nucleotide-to-nucleotide transitions (A→G, G→C, C→T, T→A) and that of negative transitions (the opposite ones) along a strand of DNA tends to be small, constant, and taxon and chromosome specific.

• The reorientation of DNA segments involves the simultaneous reversal of the order and the complementing of the nucleotides (A with T and C with G) in the inverse coding regions.

• The regularity shown by the nucleotide sequences obtained after concatenating the reoriented coding regions suggests the existence of a putative primary ancestral genomic material having a quite uniform large scale statistical structure. This feature is observed only in the chromosomes and is not found in the plasmids.

• Combining DNA segments of opposite orientation to generate certain slopes of the phase, i.e., certain densities of the first and second order repartition of nucleotides, has an important functional role at the level of the chromosome, most probably linked to the crossing-over / recombination process, the identification of interacting regions of chromosomes, and the separation of the species.

• The cumulated phase and unwrapped phase can be represented adequately as simple graphic lines at very small and very large scales, while at medium scales (thousands to tens of thousands of base pairs) statistical descriptions have to be used.