Higher-order organization of interactions in human chromosomes: 1D sequence → 3D structure → topologically associated domain (TAD) via network community detection. Sang Hoon Lee, School of Physics, Korea Institute for Advanced Study, http://newton.kias.re.kr/~lshlj82. @ Indianapolis, 20 June, 2017


Page 1

Higher-order organization of interactions in human chromosomes: 1D sequence → 3D structure → topologically associated domain (TAD) via network community detection

Sang Hoon Lee School of Physics, Korea Institute for Advanced Study

http://newton.kias.re.kr/~lshlj82

@ Indianapolis, 20 June, 2017

Page 2

[Figure; legend on next page. Source: Cell 159, 1665–1680, December 18, 2014, p. S10, © 2014 Elsevier Inc.]

Page 3

details unknown, in many aspects

Page 4

Hi-C: the interaction map of chromatin loci

Figure 1. We Used In Situ Hi-C to Map over 15 Billion Chromatin Contacts across Nine Cell Types in Human and Mouse, Achieving 1 kb Resolution in Human Lymphoblastoid Cells

(A) During in situ Hi-C, DNA-DNA proximity ligation is performed in intact nuclei.

(B) Contact matrices from chromosome 14: the whole chromosome, at 500 kb resolution (top); 86–96 Mb/50 kb resolution (middle); 94–95 Mb/5 kb resolution (bottom). Left: GM12878, primary experiment; Right: biological replicate. The 1D regions corresponding to a contact matrix are indicated in the diagrams above and at left. The intensity of each pixel represents the normalized number of contacts between a pair of loci. Maximum intensity is indicated in the lower left of each panel.

(C) We compare our map of chromosome 7 in GM12878 (last column) to earlier Hi-C maps: Lieberman-Aiden et al. (2009), Kalhor et al. (2012), and Jin et al. (2013).

(D) Overview of features revealed by our Hi-C maps. Top: the long-range contact pattern of a locus (left) indicates its nuclear neighborhood (right). We detect at least six subcompartments, each bearing a distinctive pattern of epigenetic features. Middle: squares of enhanced contact frequency along the diagonal (left) indicate the presence of small domains of condensed chromatin, whose median length is 185 kb (right). Bottom: peaks in the contact map (left) indicate the presence of loops (right). These loops tend to lie at domain boundaries and bind CTCF in a convergent orientation.

See also Figure S1, Data S1, I–II, and Tables S1 and S2.

Cell 159, 1665–1680, December 18, 2014, © 2014 Elsevier Inc., p. 1667

matrix axes: locus i, locus j

value: the interaction frequency between loci i and j, or physical proximity

E. Lieberman-Aiden et al., Science 326, 289 (2009): ~1 Mb resolution

S. S. P. Rao et al., Cell 159, 1665 (2014): ~1 kb resolution

Page 5

320×480 resolution (163 ppi)

1080×1920 resolution (401 ppi)

Page 6

topologically associated domains (TADs)

Page 7

adjacency matrix of a weighted network whose nodes are the loci and the weights are the interaction frequency

Page 8

detecting the topologically associated domains (TADs) ≡ detecting the community structures in networks
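The mapping from a contact map to a weighted network can be sketched in a few lines. This is an illustrative Python sketch, not code from the talk: the function name `contact_map_to_edges` and the toy 4-locus matrix are hypothetical.

```python
def contact_map_to_edges(contacts):
    """Turn a symmetric contact matrix into a weighted edge list.

    contacts[i][j] is the interaction frequency between loci i and j.
    Returns (i, j, weight) triples with i < j and weight > 0.
    """
    n = len(contacts)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            w = contacts[i][j]
            if w > 0:
                edges.append((i, j, w))
    return edges


# Toy 4-locus map (made-up numbers): loci 0-1 and loci 2-3 interact strongly.
toy = [
    [0, 9, 1, 0],
    [9, 0, 2, 1],
    [1, 2, 0, 8],
    [0, 1, 8, 0],
]
edges = contact_map_to_edges(toy)
```

Each locus becomes a node and each nonzero matrix entry becomes one weighted edge, so a community detection routine can be applied to the edge list directly.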


Page 9

adjacency matrix

[Figure: sparsity plot of an adjacency matrix, both axes from 0 to 100, nz = 2730; parameters p1=0.5, p2=0.05, p3=0.5; pS=0, dS=0]

community structures in networks

“modularity” (the objective function to be maximized)

review papers: M. A. Porter, J.-P. Onnela, and P. J. Mucha, Not. Am. Math. Soc. 56, 1082 (2009); S. Fortunato, Phys. Rep. 486, 75 (2010).

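$$Q = \frac{1}{2m} \sum_{i \neq j} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right) \delta(g_i, g_j)$$

Newman-Girvan null model term: $k_i k_j / 2m$

$A = \{A_{ij}\}$: the adjacency matrix

$k_i = \sum_j A_{ij} = \sum_j A_{ji}$

$g_i$: the community to which node $i$ belongs

$2m = \sum_{i \neq j} A_{ij} = \sum_i k_i$

$\gamma$: the resolution parameter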
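As a worked illustration of the modularity defined above, the following Python sketch evaluates Q for a given partition. It is a minimal sketch rather than the implementation used in the talk: the function name `modularity`, the toy matrix, and the name `gamma` for the resolution parameter are assumptions made for this example.

```python
def modularity(A, labels, gamma=1.0):
    """Modularity Q of a partition, summed over pairs i != j.

    A      : symmetric weighted adjacency matrix (list of lists)
    labels : labels[i] is the community g_i of node i
    gamma  : resolution parameter multiplying the null-model term
    """
    n = len(A)
    # Strength k_i, excluding the diagonal so that sum(k) equals 2m.
    k = [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]
    two_m = sum(k)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and labels[i] == labels[j]:
                q += A[i][j] - gamma * k[i] * k[j] / two_m
    return q / two_m


# Toy 4-node network (made-up weights) with two obvious blocks.
toy = [
    [0, 9, 1, 0],
    [9, 0, 2, 1],
    [1, 2, 0, 8],
    [0, 1, 8, 0],
]
q_split = modularity(toy, [0, 0, 1, 1], gamma=1.0)
q_single = modularity(toy, [0, 0, 0, 0], gamma=1.0)
```

On this toy network the two-block partition scores higher than the single community at `gamma = 1.0`, while at a small value such as `gamma = 0.1` the single community scores higher, which is the sense in which the parameter tunes the resolution.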

Page 10

TAXONOMIES OF NETWORKS FROM COMMUNITY STRUCTURE, PHYSICAL REVIEW E 86, 036104 (2012)

that all edges are antiferromagnetic at resolution $\lambda = \Lambda_{\max}$ and thereby forces each node into its own community.

III. MESOSCOPIC RESPONSE FUNCTIONS (MRFS)

To describe how a network disintegrates into communities as the value of $\lambda$ is increased from $\Lambda_{\min}$ to $\Lambda_{\max}$ [see Fig. 1(a) for a schematic], one needs to select summary statistics. There are many possible ways to summarize such a disintegration process, and we focus on three diagnostics that characterize fundamental properties of network communities. First, we use the value of the Hamiltonian $H(\lambda)$ (1), which is a scalar quantity closely related to network modularity and quantifies the energy of the system [13,14]. Second, we calculate a partition entropy $S(\lambda)$ to characterize the community size distribution. To do this, let $n_k$ denote the number of nodes in community $k$ and define $p_k = n_k/N$ to be the probability to choose a node from community $k$ uniformly at random. This yields a (Shannon) partition entropy of $S(\lambda) = -\sum_{k=1}^{\eta(\lambda)} p_k \log p_k$, which quantifies the disorder in the associated community size distribution. Third, we use the number of communities $\eta(\lambda)$.

[Figure panels (a), (b), (c). Panel labels: ξ = 0, η = 1; ξ = 0.2, η = 8; ξ = 0.4, η = 12; ξ = 0.6, η = 17; ξ = 0.8, η = 24; ξ = 1, η = 34. Horizontal axis: ξ from 0 to 1. Curves: Heff, Seff, ηeff. Legend: ferromagnetic links, nonlinks, antiferromagnetic links.]

FIG. 1. (Color online) (a) Schematic of some of the ways that a network can break up into communities as the value of $\lambda$ (or $\xi$) is increased. (b) Zachary Karate Club network [23] for different values of the effective fraction of antiferromagnetic edges $\xi$. All interactions are either ferromagnetic or antiferromagnetic; i.e., for the values of $\xi$ that we used, there are no neutral interactions. We color edges in blue if the corresponding interactions are ferromagnetic, and we color them in red if the interactions are antiferromagnetic. We color the nodes based on community affiliation. (c) The $H_{\rm eff}$, $S_{\rm eff}$, and $\eta_{\rm eff}$ MRFs, and the interaction matrix $J$ for different values of $\xi$. We color elements of the interaction matrix by depicting the absence of an edge in white, ferromagnetic edges in blue (dark gray), and antiferromagnetic edges in red (light gray).

Because we need to normalize $H$, $S$, and $\eta$ to compare them across networks, we define an effective energy

$$H_{\rm eff}(\lambda) = \frac{H(\lambda) - H_{\min}}{H_{\max} - H_{\min}} = 1 - \frac{H(\lambda)}{H_{\min}}, \quad (4)$$

where $H_{\min} = H(\Lambda_{\min})$ and $H_{\max} = H(\Lambda_{\max})$; an effective entropy

$$S_{\rm eff}(\lambda) = \frac{S(\lambda) - S_{\min}}{S_{\max} - S_{\min}} = \frac{S(\lambda)}{\log N}, \quad (5)$$

where $S_{\min} = S(\Lambda_{\min})$ and $S_{\max} = S(\Lambda_{\max})$; and an effective number of communities

$$\eta_{\rm eff}(\lambda) = \frac{\eta(\lambda) - \eta_{\min}}{\eta_{\max} - \eta_{\min}} = \frac{\eta(\lambda) - 1}{N - 1}, \quad (6)$$

where $\eta_{\min} = \eta(\Lambda_{\min}) = 1$ and $\eta_{\max} = \eta(\Lambda_{\max}) = N$.

Some networks contain a small number of entries $\Lambda_{ij}$ that are orders of magnitude larger than most other entries. For example, in the network of Facebook friendships at Caltech [21,22], 98% of the $\Lambda_{ij}$ entries are less than 100, but 0.02% of them are larger than 8000. These large $\Lambda_{ij}$ values arise when two low-strength nodes become connected. Using the null model $P_{ij} = k_i k_j/(2m)$, the interaction between two nodes $i$ and $j$ becomes antiferromagnetic when $\lambda > A_{ij}/P_{ij} = 2mA_{ij}/(k_i k_j)$. If a network has a large total edge weight but both $i$ and $j$ have small strengths compared to other nodes in the network, then $\lambda$ needs to be large to make the interaction antiferromagnetic. In prior studies, network community structure has been investigated at different mesoscopic scales by considering plots of various diagnostics as a function of the resolution parameter $\lambda$ [13,14,17]. In the present example, such plots would be dominated by interactions that require large resolution-parameter values to become antiferromagnetic. To overcome this issue, we define the effective fraction of antiferromagnetic edges

$$\xi = \xi(\lambda) = \frac{\ell_A(\lambda) - \ell_A(\Lambda_{\min})}{\ell_A(\Lambda_{\max}) - \ell_A(\Lambda_{\min})} \in [0,1], \quad (7)$$

where $\ell_A(\lambda)$ is the total number of antiferromagnetic interactions for the given value of $\lambda$. In other words, it is the number of $\Lambda_{ij}$ elements that are smaller than $\lambda$. Thus, $\ell_A(\Lambda_{\min})$ is the largest number of antiferromagnetic interactions for which a network still forms a single community, and the effective number of antiferromagnetic interactions $\xi(\lambda)$ is the number of antiferromagnetic interactions (normalized to the unit interval) in excess of $\ell_A(\Lambda_{\min})$. The function $\xi(\lambda)$ increases monotonically in $\lambda$.

Sweeping $\lambda$ from $\Lambda_{\min}$ to $\Lambda_{\max}$ corresponds to sweeping the value of $\xi$ from 0 to 1. (One can think of $\lambda$ as a continuous variable and $\xi$ as a discrete variable that changes with events.) As we perform such sweeping for a given network, the number of communities increases from $\eta(\xi = 0) = 1$ to $\eta(\xi = 1) = N$ and yields a vector $[H_{\rm eff}(\xi), S_{\rm eff}(\xi), \eta_{\rm eff}(\xi)]$ whose components we call the mesoscopic response functions (MRFs) of that network. (We also sometimes refer to the vector itself as an MRF.) Because $H_{\rm eff} \in [0,1]$, $S_{\rm eff} \in [0,1]$, $\eta_{\rm eff} \in [0,1]$, and $\xi \in [0,1]$ for every network, we can compare the MRFs across networks and use them to identify groups of networks with similar mesoscopic structures. In Fig. 1(b), we show the Zachary Karate Club network [23] for different values of

036104-3
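Two of the diagnostics in the excerpt above, the partition entropy and the number of communities, depend only on the community labels, so their normalized forms in Eqs. (5) and (6) are easy to sketch. The Python below is an illustrative sketch, not code from the cited paper or the talk, and the function names are hypothetical. The effective energy of Eq. (4) is left out because it needs the Hamiltonian.

```python
import math
from collections import Counter


def partition_entropy(labels):
    """Shannon entropy S = -sum_k p_k log p_k with p_k = n_k / N."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def effective_entropy(labels):
    """S_eff = S / log N, as in Eq. (5); needs at least two nodes."""
    return partition_entropy(labels) / math.log(len(labels))


def effective_num_communities(labels):
    """eta_eff = (eta - 1) / (N - 1), as in Eq. (6); needs at least two nodes."""
    return (len(set(labels)) - 1) / (len(labels) - 1)


# Four nodes split evenly into two communities.
labels = [0, 0, 1, 1]
s_eff = effective_entropy(labels)
eta_eff = effective_num_communities(labels)
```

Both quantities run from 0 for a single community to 1 when every node is its own community, which is what makes them comparable across networks of different sizes.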

Page 11

community detection with tunable resolution

method: I. S. Jutla, L. G. S. Jeub, and P. J. Mucha, GenLouvain (generalized Louvain) version 2.1 [November, 2016]; ref) http://netwiki.amath.unc.edu/GenLouvain/GenLouvain

original version: V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, JSTAT 2008 (10), P10008.

time complexity: O(n log n)

data: the Hi-C map in S. S. P. Rao et al., Cell 159, 1665 (2014), for the human B-lymphoblastoid cell (GM12878)

(data) resolution: 1 kb, … , 1 Mb available. We are using intrachromosomal interactions with the 100 kb and 10 kb resolutions.

data size: 5.6 GB, 152 GB

ref) http://www.curetoday.com/tumor/childhood/treatment/cdr0000258001

normalization: the scheme introduced in P. A. Knight and D. Ruiz, IMA J. Numer. Anal. 33, 1029 (2013), the same one used in Cell 159, 1665 (2014):

$\tilde{A}_{ij} = c_i A_{ij} c_j$ such that $\sum_i \tilde{A}_{ij} = \sum_j \tilde{A}_{ij} = 1$, with the locus-specific correction factor $\{c_i\}$
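The balancing condition above can be illustrated with a small iterative rescaling. The sketch below is not the Knight-Ruiz algorithm itself, which is a faster Newton-type method. It is a simpler square-root row-sum iteration that reaches the same fixed point for a symmetric matrix with positive entries. The function name `balance_symmetric`, the tolerance, and the toy matrix are assumptions made for this example.

```python
def balance_symmetric(A, tol=1e-10, max_iter=1000):
    """Find factors c so that c_i * A_ij * c_j has unit row and column sums.

    A is a symmetric matrix with positive entries (list of lists).
    Returns (balanced_matrix, c).
    """
    n = len(A)
    c = [1.0] * n
    for _ in range(max_iter):
        # Row sums of the currently rescaled matrix.
        r = [c[i] * sum(A[i][j] * c[j] for j in range(n)) for i in range(n)]
        if max(abs(x - 1.0) for x in r) < tol:
            break
        # Damped update: divide each factor by the square root of its row sum.
        c = [c[i] / r[i] ** 0.5 for i in range(n)]
    balanced = [[c[i] * A[i][j] * c[j] for j in range(n)] for i in range(n)]
    return balanced, c


# Toy symmetric contact matrix (made-up numbers).
raw = [
    [4.0, 2.0, 1.0],
    [2.0, 3.0, 1.0],
    [1.0, 1.0, 5.0],
]
balanced, factors = balance_symmetric(raw)
row_sums = [sum(row) for row in balanced]
```

Because the same factor is applied to row i and column i, the balanced matrix stays symmetric, so it can still be read as the adjacency matrix of an undirected weighted network.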

Page 12

goal: providing a systematic way to detect TAD with tunable resolution

[Figure panels: γ = 0.6, γ = 1, γ = 1.4]


Page 14

Source: CTCF - Wikipedia, https://en.wikipedia.org/wiki/CTCF (2017. 4. 6.)

CTCF

Transcriptional repressor CTCF, also known as 11-zinc finger protein or CCCTC-binding factor, is a transcription factor that in humans is encoded by the CTCF gene.[3][4] CTCF is involved in many cellular processes, including transcriptional regulation, insulator activity, V(D)J recombination[5] and regulation of chromatin architecture.[6]

Discovery

CCCTC-Binding factor or CTCF was initially discovered as a negative regulator of the chicken c-myc gene. This protein was found to be binding to three regularly spaced repeats of the core sequence CCCTC and thus was named CCCTC binding factor.[7]

Function

The primary role of CTCF is thought to be in regulating the 3D structure of chromatin.[6] CTCF binds together strands of DNA, thus forming chromatin loops, and anchors DNA to cellular structures like the nuclear lamina.[8] It also defines the boundaries between active and heterochromatic DNA.

Since the 3D structure of DNA influences the regulation of genes, CTCF's activity influences the expression of genes. CTCF is thought to be a primary part of the activity of insulators, sequences that block the interaction between enhancers and promoters. CTCF binding has also been shown to both promote and repress gene expression. It is unknown whether CTCF affects gene expression solely through its looping activity, or if it has some other, unknown, activity.[6]

Observed activity


The binding of CTCF has been shown to have many effects, which are enumerated below. In each case, it is unknown if CTCF directly evokes the outcome or if it does so indirectly (in particular through its looping role).

Transcriptional regulation

The protein CTCF plays a heavy role in repressing the insulin-like growth factor 2 gene, by binding to the H-19 imprinting control region (ICR) along with differentially-methylated region-1 (DMR1) and MAR3.[9][10]

Insulation

Binding of targeting sequence elements by CTCF can block the interaction between enhancers and promoters, therefore limiting the activity of enhancers to certain functional domains. Besides acting as enhancer blocking, CTCF can also act as a chromatin barrier[11] by preventing the spread of heterochromatin structures.

Regulation of chromatin architecture

CTCF physically binds to itself to form homodimers,[12] which causes the bound DNA to form loops.[13] CTCF also occurs frequently at the boundaries of sections of DNA bound to the nuclear lamina.[8] Using chromatin immuno-precipitation (ChIP) followed by ChIP-seq, it was found that CTCF localizes with cohesin genome-wide and affects gene regulatory mechanisms and the higher-order chromatin structure.[14]

Regulation of RNA splicing

CTCF binding has been shown to influence mRNA splicing.[15]

DNA binding

CTCF binds to the consensus sequence CCGCGNGGNGGCAG (in IUPAC notation).[16] This sequence is defined by 11 zinc finger motifs in its

benchmark: the same sequence region (different cell type, though) in R. E. Boulos et al., Phys. Rev. Lett. 111, 118102 (2013).

remarkable linear gradient of the average replication fork polarity (the mean orientation of the processing replication machinery) [15–17]. These replication domains coincide with a remarkable gene arrangement [14] with, in particular, an overrepresentation of highly expressed genes close to domain borders [18]. In fact, replication domain borders appear to be specified by a region (∼200 kb) of open and transcriptionally active chromatin [16,19] that is significantly enriched in insulator DNA-binding proteins such as CTCF (the CCCTC-binding factor) [16]. Putative replication origins at domain borders are thus associated with distinctive attributes that make these origins key features of the replication-associated organization of the genome, qualifying them as "master" replication origins [19]. We focused our analysis on the coupling between the structural data and the replication domain organization in the human genome. We have performed the analysis of interaction data obtained from high-throughput chromosome conformation capture (Hi-C) technology [2] by mainly concentrating on the intra- and interchromosomal contact maps obtained in the human erythroid cell line K562 (100 kb resolution maps with GEO accession number GSE18199). These Hi-C contact maps are positively defined and symmetric and so can be represented and analyzed using graph theory [20]. We consider the Hi-C contact matrix as the adjacency matrix of a weighted graph, where the vertices $v_i$ are the 100 kb DNA loci and the edges are weighted according to the number of Hi-C binary interactions. Because the number of intrachromosome interactions decreases very fast when increasing the separation $s$ between the loci ($\sim s^{-1}$) [2,20], the weighted network amounts to focus on interactions between loci separated by short genomic distances (≲ 10 Mb) over which contact probabilities are the highest. Alternatively, the non-weighted version of the network takes equally into account short-range and long-range interactions within a chromosome. In this case, we optionally remove from the data all binary interactions that are present only once (t = 1) or twice (t = 2), as some of these may well be attributed to experimental noise (t = 0 corresponds to no thresholding).

In Fig. 1 is shown a Hi-C contact matrix [Fig. 1(b)] corresponding to intrachromosome interactions on a 12 Mb fragment of human chromosome 10 where four replication domains were identified [Fig. 1(a)] in K562 as U-shaped patterns in the mean replication timing (MRT) profile of this cell line [16,17]. As sketched by the dashed squares in Fig. 1(b), these four U domains likely correspond to four matrix-square blocks of enriched interactions. This observation suggests that MRT U domains correspond to some spatial compartmentalization into self-interacting structural chromatin units where the bordering early initiation zones prevent cross talk between these domains [16]. To quantify the importance of these U-domain borders in the Hi-C contact interaction graph, we perform a statistical analysis over the 876 U domains (≤ 3 Mb) identified in K562 [16]. We also consider 140 additional "split domains" of size ≥ 3 Mb whose borders have similar gene organization and chromatin structure as replication domain borders [9].
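The thresholding step described in the excerpt above, which produces the non-weighted network, can be sketched as follows. This is an illustrative Python sketch, not code from the cited paper. The function name `unweighted_adjacency` is hypothetical, and the rule used here (keep a pair only when its count exceeds t) is an assumption about how the thresholds t = 0, 1, 2 are applied.

```python
def unweighted_adjacency(counts, t=0):
    """Binary adjacency matrix from a matrix of Hi-C interaction counts.

    A pair of loci is linked when its count is strictly greater than t,
    so t = 0 keeps every observed interaction (no thresholding).
    """
    n = len(counts)
    return [[1 if counts[i][j] > t else 0 for j in range(n)] for i in range(n)]


# Toy count matrix (made-up numbers) for three loci.
counts = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
adj_t0 = unweighted_adjacency(counts, t=0)
adj_t1 = unweighted_adjacency(counts, t=1)
adj_t2 = unweighted_adjacency(counts, t=2)
```

Raising t removes the rarest contacts first, which trades some real long-range interactions for less experimental noise.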

FIG. 1 (color online). (a) MRT profile (thick black curve) [16,24] from early 0 to late 1 along a 12 Mb fragment of human chromosome 10 in K562. The horizontal colored bars correspond to the four replication domains identified as MRT U-shaped patterns [16] (red segments, 200 kb borders; dark blue segments, 400 kb center; light blue segments, interior). CTCF enrichment profile (thin purple curve) (ENCODE release 3, March 2010) [25]. (b) Corresponding intrachromosome Hi-C contact matrix [2]. Each pixel represents the total number of interactions between pairs of 100 kb loci. The dashed squares delimit interactions within the four U domains. (c) Stationary configuration obtained for this 12 Mb fragment when using the 2D particle model to layout the chromosome 10 interaction graph [Fig. 3(a)]. Vertices are colored according to their position relative to replication domains: the border is represented in red, the center in dark blue, the interior in light blue, and the exterior in black. The represented edges correspond to connections between, respectively, replication domain borders (red symbols and lines) and centers (dark blue symbols and lines) with their neighbors distant from more than 4 Mb. The contact threshold t = 2 (see the text).

PRL 111, 118102 (2013), PHYSICAL REVIEW LETTERS, week ending 13 September 2013, 118102-2

Page 15

benchmark: the same sequence region (different cell type, though) in R. E. Boulos et al., Phys. Rev. Lett. 111, 118102 (2013).

[Figure: chr10, normalized values; both axes: position (100 kb), from 820 to 920; color scale from 10^-5 to 10^0; panels: γ = 2, γ = 5, γ = 10, γ = 20]

remarkable linear gradient of the average replication fork polarity (the mean orientation of the processing replication machinery) [15–17]. These replication domains coincide with a remarkable gene arrangement [14] with, in particular, an overrepresentation of highly expressed genes close to domain borders [18]. In fact, replication domain borders appear to be specified by a region (~200 kb) of open and transcriptionally active chromatin [16,19] that is significantly enriched in insulator DNA-binding proteins such as CTCF (the CCCTC-binding factor) [16]. Putative replication origins at domain borders are thus associated with distinctive attributes that make these origins key features of the replication-associated organization of the genome, qualifying them as “master” replication origins [19]. We focused our analysis on the coupling between the structural data and the replication domain organization in the human genome. We have performed the analysis of interaction data obtained from high-throughput chromosome conformation capture (Hi-C) technology [2] by mainly concentrating on the intra- and interchromosomal contact maps obtained in the human erythroid cell line K562 (100 kb resolution maps with GEO accession number GSE18199). These Hi-C contact maps are positively defined and symmetric and so can be represented and analyzed using graph theory [20]. We consider the Hi-C contact matrix as the adjacency matrix of a weighted graph, where the vertices $v_i$ are the 100 kb DNA loci and the edges are weighted according to the number of Hi-C binary interactions. Because the number of intrachromosome interactions decreases very fast when increasing the separation $s$ between the loci ($\sim s^{-1}$) [2,20], the weighted network amounts to focusing on interactions between loci separated by short genomic distances ($\lesssim$ 10 Mb) over which contact probabilities are the highest.
Alternatively, the non-weighted version of the network takes equally into account short-range and long-range interactions within a chromosome. In this case, we optionally remove from the data all binary interactions that are present only once ($t = 1$) or twice ($t = 2$), as some of these may well be attributed to experimental noise ($t = 0$ corresponds to no thresholding).
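The two graph constructions described in this excerpt can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy contact matrix, the choice to drop self-contacts, and the reading of the threshold as "keep an edge only if the contact count exceeds t" are all assumptions.

```python
import numpy as np

def weighted_adjacency(contacts):
    """Weighted graph: contact counts are used directly as edge weights."""
    A = np.array(contacts, dtype=float)
    np.fill_diagonal(A, 0.0)  # assumption: self-contacts are ignored
    return A

def binary_adjacency(contacts, t=0):
    """Unweighted graph: an edge exists if the contact count exceeds t.

    Assumption: t = 0 keeps every observed contact, t = 1 drops contacts
    seen once, t = 2 drops contacts seen once or twice.
    """
    C = np.asarray(contacts)
    A = (C > t).astype(int)
    np.fill_diagonal(A, 0)
    return A

# Hypothetical 4-locus symmetric contact matrix (counts of binary interactions).
contacts = np.array([[0, 5, 1, 0],
                     [5, 0, 2, 1],
                     [1, 2, 0, 7],
                     [0, 1, 7, 0]])

A_w = weighted_adjacency(contacts)
A_t0 = binary_adjacency(contacts, t=0)
A_t2 = binary_adjacency(contacts, t=2)
```

With the threshold raised to 2, only the two strongest pairs of the toy matrix remain connected, which is the intended noise-filtering effect.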

Fig. 1 shows a Hi-C contact matrix [Fig. 1(b)] corresponding to intrachromosome interactions on a 12 Mb fragment of human chromosome 10 where four replication domains were identified [Fig. 1(a)] in K562 as U-shaped patterns in the mean replication timing (MRT) profile of this cell line [16,17]. As sketched by the dashed squares in Fig. 1(b), these four U domains likely correspond to four matrix-square blocks of enriched interactions. This observation suggests that MRT U domains correspond to some spatial compartmentalization into self-interacting structural chromatin units where the bordering early initiation zones prevent cross talk between these domains [16]. To quantify the importance of these U-domain borders in the Hi-C contact interaction graph, we perform a statistical analysis over the 876 U domains ($\le$ 3 Mb) identified in K562 [16]. We also consider 140 additional “split domains” of size $\ge$ 3 Mb whose borders have similar gene organization and chromatin structure as replication domain borders [9].

FIG. 1 (color online). (a) MRT profile (thick black curve) [16,24] from early 0 to late 1 along a 12 Mb fragment of human chromosome 10 in K562. The horizontal colored bars correspond to the four replication domains identified as MRT U-shaped patterns [16] (red segments, 200 kb borders; dark blue segments, 400 kb center; light blue segments, interior). CTCF enrichment profile (thin purple curve) (ENCODE release 3, March 2010) [25]. (b) Corresponding intrachromosome Hi-C contact matrix [2]. Each pixel represents the total number of interactions between pairs of 100 kb loci. The dashed squares delimit interactions within the four U domains. (c) Stationary configuration obtained for this 12 Mb fragment when using the 2D particle model to lay out the chromosome 10 interaction graph [Fig. 3(a)]. Vertices are colored according to their position relative to replication domains: the border is represented in red, the center in dark blue, the interior in light blue, and the exterior in black. The represented edges correspond to connections between, respectively, replication domain borders (red symbols and lines) and centers (dark blue symbols and lines) with their neighbors more than 4 Mb away. The contact threshold is $t = 2$ (see the text).

PRL 111, 118102 (2013), Physical Review Letters, week ending 13 September 2013, p. 118102-2

Page 16

the biological factors (curves) vs the community boundaries (points)

[Plot, chr3: GM12878 CTCF signal (GM12878-Ctcf-StdRaw, amean) vs. position (10 kb, 0 to 20000); community boundaries shown for raw and normalized data at γ = 0.6, 1.0, 1.4]

[Heat map, chr10, normalized values: position (100 kb), 820–920, on both axes; color scale 10^-5 to 10^0]

Page 17

the biological factors (curves) vs the community boundaries (points)

[Plots, chr3: GM12878 signal tracks (amean) vs. position (10 kb, 0 to 20000), one panel each for H3k79me2, H3k27ac, H3k9ac, H3k4me3, H3k36me3 (StdSig) and CTCF (StdRaw); community boundaries shown for raw and normalized data at γ = 0.6, 1.0, 1.4]

CTCF - Wikipedia, https://en.wikipedia.org/wiki/CTCF (2017. 4. 6.)

CTCF

Transcriptional repressor CTCF, also known as 11-zinc finger protein or CCCTC-binding factor, is a transcription factor that in humans is encoded by the CTCF gene.[3][4] CTCF is involved in many cellular processes, including transcriptional regulation, insulator activity, V(D)J recombination[5] and regulation of chromatin architecture.[6]

Discovery

CCCTC-binding factor or CTCF was initially discovered as a negative regulator of the chicken c-myc gene. This protein was found to be binding to three regularly spaced repeats of the core sequence CCCTC and thus was named CCCTC binding factor.[7]

Function

The primary role of CTCF is thought to be in regulating the 3D structure of chromatin.[6] CTCF binds together strands of DNA, thus forming chromatin loops, and anchors DNA to cellular structures like the nuclear lamina.[8] It also defines the boundaries between active and heterochromatic DNA.

Since the 3D structure of DNA influences the regulation of genes, CTCF's activity influences the expression of genes. CTCF is thought to be a primary part of the activity of insulators, sequences that block the interaction between enhancers and promoters. CTCF binding has also been shown to both promote and repress gene expression. It is unknown whether CTCF affects gene expression solely through its looping activity, or if it has some other, unknown, activity.[6]

Observed activity

The binding of CTCF has been shown to have many effects, which are enumerated below. In each case, it is unknown if CTCF directly evokes the outcome or if it does so indirectly (in particular through its looping role).

Transcriptional regulation

The protein CTCF plays a heavy role in repressing the insulin-like growth factor 2 gene, by binding to the H-19 imprinting control region (ICR) along with differentially-methylated region-1 (DMR1) and MAR3.[9][10]

Insulation

Binding of targeting sequence elements by CTCF can block the interaction between enhancers and promoters, therefore limiting the activity of enhancers to certain functional domains. Besides acting as an enhancer blocker, CTCF can also act as a chromatin barrier[11] by preventing the spread of heterochromatin structures.

Regulation of chromatin architecture

CTCF physically binds to itself to form homodimers,[12] which causes the bound DNA to form loops.[13] CTCF also occurs frequently at the boundaries of sections of DNA bound to the nuclear lamina.[8] Using chromatin immuno-precipitation (ChIP) followed by ChIP-seq, it was found that CTCF localizes with cohesin genome-wide and affects gene regulatory mechanisms and the higher-order chromatin structure.[14]

Regulation of RNA splicing

CTCF binding has been shown to influence mRNA splicing.[15]

DNA binding

CTCF binds to the consensus sequence CCGCGNGGNGGCAG (in IUPAC notation).[16] This sequence is defined by 11 zinc finger motifs in its

Histone Modifications - What is Epigenetics?, http://www.whatisepigenetics.com/histone-modifications/ (2017. 4. 6.)

Histone Modifications

Schematic representation shows the organization and packaging of genetic material. Nucleosomes are represented by DNA (grey) wrapped around eight histone proteins, H2A, H2B, H3, and H4 (colored circles). N-terminal histone tails (blue) are shown protruding from H3 and H4.

A histone modification is a covalent post-translational modification (PTM) to histone proteins which includes methylation, phosphorylation, acetylation, ubiquitylation, and sumoylation. The PTMs made to histones can impact gene expression by altering chromatin structure or recruiting histone modifiers. Histone proteins act to package DNA, which wraps around the eight histones, into chromosomes. Histone modifications act in diverse biological processes such as transcriptional activation/inactivation, chromosome packaging, and DNA damage/repair. In most species, histone H3 is primarily acetylated at lysines 9, 14, 18, 23, and 56, methylated at arginine 2 and lysines 4, 9, 27, 36, and 79, and phosphorylated at ser10, ser28, Thr3, and Thr11. Histone H4 is primarily acetylated at lysines 5, 8, 12 and 16, methylated at arginine 3 and lysine 20, and phosphorylated at serine 1. Thus, quantitative detection of various histone modifications would provide useful information for a better understanding of epigenetic regulation of cellular processes and the development of histone modifying enzyme-targeted drugs.

Histone Acetylation/Deacetylation

Histone acetylation occurs by the enzymatic addition of an acetyl group (COCH3) from acetyl coenzyme A. The process of histone acetylation is tightly involved in the regulation of many cellular processes including

Pages 18–19

comparison between the boundary points and CTCF peaks

[Diagram: three tracks along the sequence, each marked with its boundary points]

communities by Louvain (our method)

TADs from Rao et al., Cell (2014).

CTCF peak
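One way to turn a community assignment along the sequence into boundary points can be sketched as below. The slides do not state the exact rule, so the definition used here (a boundary is any bin whose community label differs from that of the preceding bin) and the name `boundary_points` are assumptions made for illustration only.

```python
def boundary_points(labels):
    """Return the indices i at which bin i and bin i - 1 carry different labels.

    labels: community label of each consecutive genomic bin (e.g. 10 kb bins).
    """
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

# Hypothetical labels for nine consecutive bins. Communities from a
# general-purpose method need not be contiguous, so label 0 reappears at the end.
labels = [0, 0, 0, 1, 1, 2, 2, 2, 0]
bp = boundary_points(labels)
```

The same function can be applied to an externally defined domain annotation once it is expressed as one label per bin, which puts both boundary sets on the same footing for comparison.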

Pages 20–25

sensitivity and specificity: 10 kb resolution

[Plot, chr1: TPR (sensitivity) and TNR (specificity), 0 to 1, vs. γ (log scale, 10^-1 to 10^2). Curves: Rao et al. (TPR, TNR); CTCF peak (1) (TPR, TNR); Rao et al., randomized (TPR, TNR); CTCF peak (1), randomized (TPR, TNR); Rao et al. vs CTCF peak (1) (TPR, TNR)]

randomly chosen boundary points (1000 realizations)

“sensitivity” (true positive rate: TPR) = TP / (TP + FN)

“specificity” (true negative rate: TNR) = TN / (TN + FP)

$BP$ = the set of boundary points by Louvain (our method)

$BP^c$ = the set of nonempty sites $\setminus BP$

$BP_{\mathrm{ext}}$ = the set of boundary points by Rao et al. or the CTCF peaks

$BP^c_{\mathrm{ext}}$ = the set of nonempty sites $\setminus BP_{\mathrm{ext}}$

true positive (TP) = $|BP \cap BP_{\mathrm{ext}}|$

true negative (TN) = $|BP^c \cap BP^c_{\mathrm{ext}}|$

false positive (FP) = $|BP \cap BP^c_{\mathrm{ext}}|$

false negative (FN) = $|BP^c \cap BP_{\mathrm{ext}}|$

comparison between the boundary points and peaks

[Diagram: three tracks along the sequence, each marked with its boundary points]

communities by Louvain (our method)

TADs from Rao et al., Cell (2014).

CTCF peak data
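The set definitions on this slide translate directly into set operations. The sketch below assumes that boundary points and peaks are given as collections of bin indices and that both are restricted to the nonempty sites; the function names and the toy data are hypothetical.

```python
def confusion_counts(bp, bp_ext, nonempty_sites):
    """Return (TP, TN, FP, FN) for boundary set bp against reference bp_ext."""
    sites = set(nonempty_sites)
    bp = set(bp) & sites          # assumption: only nonempty sites are counted
    bp_ext = set(bp_ext) & sites
    bp_c = sites - bp             # complement of BP within the nonempty sites
    bp_ext_c = sites - bp_ext     # complement of BP_ext within the nonempty sites
    tp = len(bp & bp_ext)
    tn = len(bp_c & bp_ext_c)
    fp = len(bp & bp_ext_c)
    fn = len(bp_c & bp_ext)
    return tp, tn, fp, fn

def sensitivity_specificity(bp, bp_ext, nonempty_sites):
    """Return (TPR, TNR); a rate is NaN when its denominator is zero."""
    tp, tn, fp, fn = confusion_counts(bp, bp_ext, nonempty_sites)
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    tnr = tn / (tn + fp) if tn + fp else float("nan")
    return tpr, tnr

# Hypothetical example on ten nonempty sites.
sites = range(10)
bp = {2, 5, 7}          # boundaries from community detection
bp_ext = {2, 6, 7, 9}   # reference boundaries or peaks
counts = confusion_counts(bp, bp_ext, sites)
tpr, tnr = sensitivity_specificity(bp, bp_ext, sites)
```

Because boundary points are sparse along the sequence, TN dominates the counts and the specificity tends to be close to 1 by construction, which is one reason to look at precision as well.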

Pages 26–27

comparison between the boundary points and peaks

[Diagram: three tracks along the sequence, each marked with its boundary points]

communities by Louvain (our method)

TADs from Rao et al., Cell (2014).

CTCF peak (from Prof. Per Stenberg @ Umeå University)

precision: 10 kb resolution

[Plot, chr1: PPV (precision), 0 to 0.45, vs. γ (log scale, 10^-1 to 10^3). Curves: Rao et al.; CTCF peak (1); Rao et al., randomized; CTCF peak (1), randomized; Rao et al. vs CTCF peak (1)]

“precision” (positive predictive value: PPV) = TP / (TP + FP)

$BP$ = the set of boundary points by Louvain (our method)

$BP^c$ = the set of nonempty sites $\setminus BP$

$BP_{\mathrm{ext}}$ = the set of boundary points by Rao et al. or the CTCF peaks

$BP^c_{\mathrm{ext}}$ = the set of nonempty sites $\setminus BP_{\mathrm{ext}}$

true positive (TP) = $|BP \cap BP_{\mathrm{ext}}|$

true negative (TN) = $|BP^c \cap BP^c_{\mathrm{ext}}|$

false positive (FP) = $|BP \cap BP^c_{\mathrm{ext}}|$

false negative (FN) = $|BP^c \cap BP_{\mathrm{ext}}|$

randomly chosen boundary points (1000 realizations)
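The precision and its randomized baseline can be sketched as follows. The slide only says that boundary points are chosen at random over 1000 realizations; drawing them uniformly without replacement from the nonempty sites, with the same number of points as the detected set, is an assumption here, as are the function names and the toy data.

```python
import random

def precision(bp, bp_ext):
    """PPV = TP / (TP + FP) = |BP ∩ BP_ext| / |BP|, assuming BP lies in the nonempty sites."""
    bp, bp_ext = set(bp), set(bp_ext)
    return len(bp & bp_ext) / len(bp) if bp else float("nan")

def randomized_precision(n_boundaries, bp_ext, nonempty_sites,
                         realizations=1000, seed=0):
    """Mean PPV of n_boundaries randomly placed boundary points."""
    rng = random.Random(seed)
    sites = sorted(nonempty_sites)
    values = [precision(rng.sample(sites, n_boundaries), bp_ext)
              for _ in range(realizations)]
    return sum(values) / len(values)

# Hypothetical example on ten nonempty sites.
bp = {2, 5, 7}
bp_ext = {2, 6, 7, 9}
ppv = precision(bp, bp_ext)
baseline = randomized_precision(len(bp), bp_ext, range(10))
```

Under this sampling scheme the expected baseline is simply the fraction of nonempty sites that belong to the reference set, so a detected PPV is informative only to the extent that it exceeds that fraction.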

Their Arrowhead + HiCCUPS method is very slow: $O(n^4) \to O(n^2)$ by dynamic programming

$O(n \log n)$

Page 28

other null-model terms in the modularity than Newman-Girvan?

or “communities.” Intuitively, a community consists of a set of nodes that are connected among one another more densely than they are to nodes in other communities. A popular way to identify community structure is to optimize a quality function, which can be used to measure the relative densities of intra-community connections versus inter-community connections. See Refs. 16, 20, and 23 for recent reviews on network community structure and Refs. 24–27 for discussions of various caveats that should be considered when optimizing quality functions to detect communities.

One begins with a network of $N$ nodes and a given set of connections between those nodes. In the usual case of single-layer networks (e.g., static networks with only one type of edge), one represents a network using an $N \times N$ adjacency matrix $A$. The element $A_{ij}$ of the adjacency matrix indicates a direct connection or “edge” from node $i$ to node $j$, and its value indicates the weight of that connection. The quality of a hard partition of $A$ into communities (whereby each node is assigned to exactly one community) can be quantified using a quality function. The most popular choice is modularity [16,20,21,28,29]

$$Q_0 = \sum_{ij} \left[ A_{ij} - \gamma P_{ij} \right] \delta(g_i, g_j), \qquad (1)$$

where node $i$ is assigned to community $g_i$, node $j$ is assigned to community $g_j$, the Kronecker delta $\delta(g_i, g_j) = 1$ if $g_i = g_j$ and it equals 0 otherwise, $\gamma$ is a resolution parameter (which we will call a structural resolution parameter), and $P_{ij}$ is the expected weight of the edge connecting node $i$ to node $j$ under a specified null model. The choice $\gamma = 1$ is very common, but it is important to consider multiple values of $\gamma$ to examine groups at multiple scales [16,30,31]. Maximization of $Q_0$ yields a hard partition of a network into communities such that the total edge weight inside of modules is as large as possible (relative to the null model and subject to the limitations of the employed computational heuristics, as optimizing $Q_0$ is NP-hard [16,20,32]).
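Eq. (1) can be evaluated for a given hard partition in a few lines. The sketch below takes the null-model matrix as an argument, so any choice of $P_{ij}$ can be plugged in; the example uses the Newman-Girvan form $k_i k_j / 2m$. The function names and the toy graph (two triangles joined by one edge) are illustrative assumptions, and note that Eq. (1) as written carries no $1/2m$ prefactor.

```python
import numpy as np

def modularity_q0(A, P, labels, gamma=1.0):
    """Q_0 = sum_ij [A_ij - gamma * P_ij] * delta(g_i, g_j), as in Eq. (1)."""
    A = np.asarray(A, dtype=float)
    P = np.asarray(P, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]  # Kronecker delta as a boolean mask
    return float(((A - gamma * P) * same).sum())

def newman_girvan(A):
    """Null model P_ij = k_i k_j / (2m) for an undirected adjacency matrix."""
    k = np.asarray(A, dtype=float).sum(axis=1)
    return np.outer(k, k) / k.sum()  # k.sum() equals 2m

# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
P = newman_girvan(A)

q_split = modularity_q0(A, P, [0, 0, 0, 1, 1, 1], gamma=1.0)
q_single = modularity_q0(A, P, [0, 0, 0, 0, 0, 0], gamma=1.0)
q_split_g2 = modularity_q0(A, P, [0, 0, 0, 1, 1, 1], gamma=2.0)
```

Raising $\gamma$ penalizes large communities more strongly, which is why the two-triangle split scores lower at $\gamma = 2$ than at $\gamma = 1$ and why finer partitions become favorable at larger $\gamma$.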

Recently, the null model in the quality function (1) has been generalized so that one can consider sets of $L$ adjacency matrices, which are combined to form a rank-3 adjacency tensor $A$ that can be used to represent time-dependent or multiplex networks. One can thereby define a multilayer modularity (also called “multislice modularity”) [3]

$$Q = \frac{1}{2\mu} \sum_{ijlr} \left\{ \left( A_{ijl} - \gamma_l P_{ijl} \right) \delta_{lr} + \delta_{ij}\, \omega_{jlr} \right\} \delta(g_{il}, g_{jr}), \qquad (2)$$

where the adjacency matrix of layer $l$ has components $A_{ijl}$, the element $P_{ijl}$ gives the components of the corresponding layer-$l$ matrix for the optimization null model, $\gamma_l$ is the structural resolution parameter of layer $l$, the quantity $g_{il}$ gives the community assignment of node $i$ in layer $l$, the quantity $g_{jr}$ gives the community assignment of node $j$ in layer $r$, the element $\omega_{jlr}$ gives the connection strength (i.e., an “interlayer coupling parameter,” which one can call a temporal resolution parameter if one is using the adjacency tensor to represent a time-dependent network) from node $j$ in layer $r$ to node $j$ in layer $l$, the total edge weight in the network is $\mu = \frac{1}{2} \sum_{jr} \kappa_{jr}$, the strength (i.e., weighted degree) of node $j$ in layer $l$ is $\kappa_{jl} = k_{jl} + c_{jl}$, the intra-layer strength of node $j$ in layer $l$ is $k_{jl} = \sum_i A_{ijl}$, and the inter-layer strength of node $j$ in layer $l$ is $c_{jl} = \sum_r \omega_{jlr}$.

Equivalent representations that use other notation can, of course, be useful. For example, multilayer modularity can be recast as a set of rank-2 matrices describing connections between the set of all nodes across layers [e.g., for spectral partitioning [29,33,34]]. One can similarly generalize $Q$ for higher-rank tensors, which one can use when studying community structure in networks that are both time-dependent and multiplex, through appropriate specification of inter-layer coupling tensors.

B. Network diagnostics

To characterize multilayer community structure, we compute four example diagnostics for each hard partition: the modularity $Q$, the number of modules $n$, the mean community size $s$ (which is equal to the number of nodes in the community and is proportional to $1/n$), and the stationarity $\zeta$ [35]. To compute $\zeta$, we calculate the autocorrelation function $U(t, t+m)$ of two states of the same community $G(t)$ at $m$ time steps (i.e., $m$ network layers) apart

$$U(t, t+m) \equiv \frac{|G(t) \cap G(t+m)|}{|G(t) \cup G(t+m)|}, \qquad (3)$$

where $|G(t) \cap G(t+m)|$ is the number of nodes that are members of both $G(t)$ and $G(t+m)$, and $|G(t) \cup G(t+m)|$ is the number of nodes in the union of the community at times $t$ and $t+m$. Defining $t_0$ to be the first time step in which the community exists and $t'$ to be the last time in which it exists, the stationarity of a community is [35]

$$\zeta \equiv \frac{\sum_{t=t_0}^{t'-1} U(t, t+1)}{t' - t_0}. \qquad (4)$$

This gives the mean autocorrelation over consecutive time steps [36].
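Eqs. (3) and (4) amount to averaging a Jaccard-type overlap between consecutive states of one community. A minimal sketch, assuming each state of the community is given as a set of node identifiers and the list runs over consecutive time steps from the first to the last step in which the community exists (function names and data are hypothetical):

```python
def autocorrelation(g_t, g_tm):
    """U(t, t+m) = |G(t) ∩ G(t+m)| / |G(t) ∪ G(t+m)|, as in Eq. (3)."""
    g_t, g_tm = set(g_t), set(g_tm)
    union = g_t | g_tm
    return len(g_t & g_tm) / len(union) if union else float("nan")

def stationarity(states):
    """Mean of U(t, t+1) over consecutive time steps, as in Eq. (4).

    states: node sets of the same community at consecutive time steps.
    """
    if len(states) < 2:
        raise ValueError("need at least two time steps")
    steps = [autocorrelation(states[i], states[i + 1])
             for i in range(len(states) - 1)]
    return sum(steps) / len(steps)  # len(steps) equals t' - t_0

# Hypothetical community that gains node 4 and then loses node 1.
states = [{1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4}]
zeta = stationarity(states)
```

A community whose membership never changes has stationarity 1, and one that shares no nodes between consecutive layers has stationarity 0.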

In addition to these diagnostics, which are defined using the entire multilayer community structure, we also compute two example diagnostics on the community structures of the component layers: the mean single-layer modularity $\langle Q_s \rangle$ and the variance $\mathrm{var}(Q_s)$ of the single-layer modularity over all layers. The single-layer modularity $Q_s$ is defined as the static modularity quality function, $Q_s = \sum_{ij} [A_{ij} - \gamma P_{ij}]\, \delta(g_i, g_j)$, computed for the partition $g$ that we obtained via optimization of the multilayer modularity function $Q$. We have chosen to use a few simple ways to help characterize the time series for $Q_s$, though of course other diagnostics can also be informative.

C. Data sets

We illustrate dynamic network null models using two example network ensembles: (1) 75-time-layer brain networks drawn from each of 20 human subjects and (2) behavioral networks with about 150 time layers drawn from each of 22 human subjects. Importantly, the use of network


ref) D. S. Bassett, M. A. Porter, N. F. Wymbs, S. T. Grafton, J. M. Carlson, and P. J. Mucha, Chaos 23, 013142 (2013).

clustering techniques have been developed to identify communities, and they have yielded insights in the study of the committee structure in the United States Congress [18], functional groups in protein interaction networks [19], functional modules in brain networks [4], and more. A particularly successful technique for identifying communities in networks [16,20] is optimization of a quality function known as “modularity” [21], which recently has been generalized for detecting communities in time-dependent and multiplex networks [3].

Modularity optimization allows one to algorithmically partition a network’s nodes into communities such that the total connection strength within groups of the partition is more than would be expected in some null model. However, modularity optimization always yields a network partition (into a set of communities) as an output whether or not a given network truly contains modular structure. Therefore, application of subsequent diagnostics to a network partition is potentially meaningless without some comparison to benchmark or null-model networks. That is, it is important to establish whether the partition(s) obtained appear to represent meaningful community structures within the network data or whether they might have reasonably arisen at random. Moreover, robust assessment of network organization depends fundamentally on the development of statistical techniques to compare structures in a network derived from real data to those in appropriate models (see, e.g., Ref. 22). Indeed, as the constraints in null models and network benchmarks become more stringent, it can become possible to make stronger claims when interpreting organizational structures such as community structure.

In the present paper, we examine null models in time-dependent networks and investigate their use in the algorithmic detection of cohesive, dynamic communities in such networks (see Fig. 2). Indeed, community detection in temporal networks necessitates the development of null models that are appropriate for such networks. Such null models can help provide bases of comparison at various stages of the community-detection process, and they can thereby facilitate the principled identification of dynamic structure in networks. Indeed, the importance of developing null models extends beyond community detection, as such models make it possible to obtain statistically significant estimates of network diagnostics.

Our dynamic network null models fall into two categories: optimization null models, which we use in the identification of community structure; and post-optimization null models, which we use to examine the identified community structure. We describe how these null models can be selected in a manner appropriate to known features of a network’s construction, identify potentially interesting network scales by determining values of interest for structural and temporal resolution parameters, and inform the choice of representative partitions of a network into communities.

II. METHODS

A. Community detection

Community-detection algorithms provide ways to decompose a network into dense groups of nodes called “modules”

FIG. 1. An important property of many real-world networks is community structure, in which there exist cohesive groups of nodes such that a network has stronger connections within such groups than it does between such groups. Community structure often changes in time, which can lead to the rearrangement of cohesive groups, the formation of new groups, and the fragmentation of existing groups.

FIG. 2. Methodological considerations important in the investigation of dynamic community structure in temporal networks. (A) Depending on the system under study, a single network layer (which is represented using an ordinary adjacency matrix with an extra index to indicate the layer) might by definition only allow edges from some subset of the complete set of node pairs, as is the case in the depicted chain-like graph. We call such a situation partial connectivity. (B) Although the most common optimization null model employs random graphs (e.g., the Newman-Girvan null model, which is closely related to the configuration model [1,16]), other models can also provide important insights into network community structure. (C) After determining a set of partitions that maximize the modularity $Q$ (or a similar quality function), it is interesting to test whether the community structure is different from, for example, what would be expected with a scrambling of time layers (i.e., a temporal null model) or node identities (i.e., a nodal null model) [4].


As described in more detail in Ref. 13, we construct anensemble of 66 behavioral networks from 22 individuals and3 experimental conditions. These networks represent a set offinger movements in the same simple motor learning experi-ment from which we constructed the brain networks in dataset 1. Subjects were instructed to press a sequence of buttonscorresponding to a sequence of 12 pseudo-musical notesshown to them on a screen.

Each node represents an interval between consecutivebutton presses. A single network layer consists of N¼ 11nodes (i.e., there is one interval between each pair of notes),which are connected in a chain via weighted, undirectededges. In Ref. 13, we examined the phenomenon of motor“chunking,” which is a fascinating but poorly understoodphenomenon in which groups of movements are made withsimilar inter-movement durations. (This is similar to remem-bering a phone number in groups of a few digits or groupingnotes together as one masters how to play a song.) For eachexperimental trial l and each pair of inter-movement inter-vals i and j, we define the weight of an edge connectinginter-movement i to inter-movement j as the normalized sim-ilarity in inter-movement durations. The normalized similar-ity between nodes i and j is defined as

$$\rho_{ijl} = \frac{\bar{d}_l - d_{ijl}}{\bar{d}_l}, \tag{6}$$

where $d_{ijl}$ is the absolute value of the difference of lengths of the ith and jth inter-movement time intervals in trial l and $\bar{d}_l$ is the maximum value of $d_{ijl}$ in trial l. These weights yield the elements $W_{ijl}$ of a weighted, undirected multilayer network W. Because finger movements occur in series, inter-movement i is connected in time to inter-movement $i \pm 1$ but not to any other inter-movements $i + n$ for $|n| \neq 1$.

To encode this conceptual relationship as a network, we set all non-contiguous connections in W to 0 and thereby construct a weighted, undirected chain network A. In Fig. 3(b), we show an example trial layer from A for a single subject in this experimental data. We couple layers of A to one another with weight $\omega_{jlr}$, which gives the connection strength between node j in experimental trial r and node j in trial l. In a given instantiation of the network, we again let $\omega_{jlr} \equiv \omega \in [0.1, 40]$ be identical for all nodes j for all connections between nearest-neighbor layers. (Again, $\omega_{jlr} = 0$ in all other cases.) Because inter-movement nodes are ordered, one can apply community-detection algorithms to identify communities of nodes in sequence. Each community represents a motor “chunk.”
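The construction of one chain layer can be sketched as follows. This is a minimal illustration, assuming the inter-movement durations of a single trial are available as a plain list; the function name `chain_layer` and the sample durations are hypothetical and do not come from the paper.

```python
def chain_layer(durations):
    """Build one layer of the chain network from inter-movement durations.

    durations: list of N inter-movement time intervals for one trial.
    Returns an N x N list of lists A with A[i][j] = rho_ij for |i - j| == 1
    and 0 elsewhere, where rho_ij = (d_max - d_ij) / d_max as in Eq. (6).
    """
    n = len(durations)
    # d_ij: absolute difference between the i-th and j-th durations
    d = [[abs(durations[i] - durations[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in d)
    if d_max == 0:
        raise ValueError("all durations are equal; Eq. (6) is undefined")
    # Only contiguous pairs (i, i + 1) receive a weight; all other entries stay 0
    a = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w = (d_max - d[i][i + 1]) / d_max
        a[i][i + 1] = w
        a[i + 1][i] = w
    return a


# Hypothetical durations (in seconds) for a trial with five intervals
layer = chain_layer([0.30, 0.32, 0.60, 0.58, 0.31])
```

In this toy trial the first two intervals have nearly equal durations, so the edge between them is close to 1, whereas the edge between the second and third intervals is close to 0.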

III. RESULTS

A. Modularity-optimization null models

After constructing a multilayer network A with elements $A_{ijl}$, it is necessary to select an optimization null model P in Eq. (2). The most common modularity-optimization null model used in undirected, single-layer networks is the Newman-Girvan null model (Refs. 16, 20, 21, 28, and 29)

$$P_{ij} = \frac{k_i k_j}{2m}, \tag{7}$$

where $k_i = \sum_j A_{ij}$ is the strength of node i and $m = \frac{1}{2} \sum_{ij} A_{ij}$. The definition (7) can be extended to multilayer networks using

$$P_{ijl} = \frac{k_{il} k_{jl}}{2 m_l}, \tag{8}$$

where $k_{il} = \sum_j A_{ijl}$ is the strength of node i in layer l and $m_l = \frac{1}{2} \sum_{ij} A_{ijl}$. Optimization of Q using the null model (8) identifies partitions of a network into groups that have more connections (in the case of binary networks) or higher connection densities (in the case of weighted networks) than would be expected for the distribution of connections (or connection densities) expected in a null model. We use the notation $A_l$ for the layer-l adjacency matrix composed of elements $A_{ijl}$ and the notation $P_l$ to denote the layer-l null-model matrix with elements $P_{ijl}$. See Fig. 4(a) for an example layer $A_l$ from a multilayer behavioral network and Fig. 4(b) for an example instantiation of the Newman-Girvan null model $P_l$.
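As a small illustration of Eq. (7), the sketch below evaluates the Newman-Girvan null model for one symmetric layer stored as a list of lists. The function name and the 4-node unit-weight chain are hypothetical examples, not data from the paper.

```python
def newman_girvan_null(a):
    """Return P with P[i][j] = k_i * k_j / (2m) for a symmetric weighted matrix a."""
    k = [sum(row) for row in a]   # node strengths k_i = sum_j A_ij
    two_m = sum(k)                # 2m = sum_ij A_ij
    if two_m == 0:
        raise ValueError("network has no edges")
    return [[ki * kj / two_m for kj in k] for ki in k]


# Hypothetical 4-node chain with unit weights
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
p_ng = newman_girvan_null(chain)
```

Here `p_ng[0][3]` is nonzero although nodes 0 and 3 can never be adjacent in a chain, which illustrates why this null model is a less natural choice for ordered nodes.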

1. Optimization null models for ordered node networks

FIG. 3. Network layers and community assignments from two example data sets: (A) a brain network based on correlations between blood-oxygen-level-dependent (BOLD) signals (Ref. 4) and (B) a behavioral network based on similarities in movement times during a simple motor learning experiment (Ref. 13). We use these data sets to illustrate situations with categorical nodes and ordered nodes, respectively. In the bottom panels, we show community assignments obtained using multilayer community detection for (C) the brain networks and (D) the behavioral networks.

The Newman-Girvan null model is particularly useful for networks with categorical nodes, in which a connection between any pair of nodes can occur in theory. However, when using a chain network of ordered nodes, it is useful to consider alternative null models. For example, in a network represented by an adjacency matrix $A'$, one can define

$$P_{ij} = \rho A'_{ij}, \tag{9}$$

where $\rho$ is the mean edge weight of the chain network and $A'$ is the binarized version of A, in which nonzero elements of A are set to 1 and zero-valued elements remain unaltered. Such a null model can also be defined for a multilayer network that is represented by a rank-3 adjacency tensor A. One can construct a null model P with components

$$P_{ijl} = \rho_l A'_{ijl}, \tag{10}$$

where $\rho_l$ is the mean edge weight in layer l and $A'$ is the binarized version of A. The optimization of Q using this null model identifies partitions of a network whose communities have a larger strength than the mean. See Fig. 4(c) for an example of this chain null model $P_l$ for the behavioral network layer shown in Fig. 4(a).
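Eqs. (9) and (10) can be sketched in a few lines. The code below rests on one labeled assumption: the "mean edge weight" is taken as the mean over the nonzero entries of the layer. The function names and the toy layer are hypothetical.

```python
def chain_null(a):
    """Return P with P[i][j] = rho * A'[i][j], as in Eq. (9), for one layer a.

    Assumption: rho is the mean over the nonzero entries of a, i.e., the mean
    weight of the edges that are actually present.
    """
    nonzero = [w for row in a for w in row if w != 0]
    if not nonzero:
        raise ValueError("layer has no edges")
    rho = sum(nonzero) / len(nonzero)
    # A' is the binarized layer: 1 where a is nonzero, 0 elsewhere
    return [[rho if w != 0 else 0.0 for w in row] for row in a]


def chain_null_multilayer(layers):
    """Apply Eq. (10) layer by layer: P_ijl = rho_l * A'_ijl."""
    return [chain_null(a) for a in layers]


# Hypothetical 3-node chain layer with weights 0.2 and 0.6
toy_layer = [[0.0, 0.2, 0.0],
             [0.2, 0.0, 0.6],
             [0.0, 0.6, 0.0]]
p_chain = chain_null(toy_layer)
```

With this null model, an edge contributes positively to the modularity only if its weight exceeds $\gamma \rho$, and node pairs that cannot be connected contribute nothing.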

In Fig. 4(d), we illustrate the effect that the choice of optimization null model has on the modularity values Q of the behavioral networks as a function of the structural resolution parameter. (Throughout the manuscript, we use a Louvain-like locally greedy algorithm to maximize the multilayer modularity quality function; Refs. 57 and 58.) The Newman-Girvan null model gives decreasing values of Q for $\gamma \in [0.1, 2.1]$, whereas the chain null model produces lower values of Q, which behaves in a qualitatively different

FIG. 4. Modularity-optimization null models. (A) Example layer $A_l$ from a behavioral network. (B) Newman-Girvan and (C) chain null models $P_l$ for the layer shown in panel (A). (D) Optimized multilayer modularity value Q, (E) number of communities n, and (F) mean community size s for the complete multilayer behavioral network employing the Newman-Girvan (black) and chain (red) optimization null models as a function of the structural resolution parameter $\gamma$. (G) Optimized modularity value Q, (H) number of communities n, and (I) mean community size s for the multilayer behavioral network employing chain optimization null models as a function of the effective fraction $\xi_{\mathrm{ml}}(\gamma)$ of edges that have larger weights than their null-model counterparts. We averaged the values of Q, n, and s over the 3 different 12-note sequences and C = 100 optimizations. Box plots in (D-F) indicate quartiles and 95% confidence intervals over the 22 individuals in the study. The error bars in panels (G-I) indicate a standard deviation from the mean. In some instances, this is smaller than the line width. The temporal resolution-parameter value is $\omega = 1$.


Page 29: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection


or “communities.” Intuitively, a community consists of a set of nodes that are connected among one another more densely than they are to nodes in other communities. A popular way to identify community structure is to optimize a quality function, which can be used to measure the relative densities of intra-community connections versus inter-community connections. See Refs. 16, 20, and 23 for recent reviews on network community structure and Refs. 24–27 for discussions of various caveats that should be considered when optimizing quality functions to detect communities.

One begins with a network of N nodes and a given set of connections between those nodes. In the usual case of single-layer networks (e.g., static networks with only one type of edge), one represents a network using an $N \times N$ adjacency matrix A. The element $A_{ij}$ of the adjacency matrix indicates a direct connection or “edge” from node i to node j, and its value indicates the weight of that connection. The quality of a hard partition of A into communities (whereby each node is assigned to exactly one community) can be quantified using a quality function. The most popular choice is modularity (Refs. 16, 20, 21, 28, and 29)

$$Q_0 = \sum_{ij} \left[ A_{ij} - \gamma P_{ij} \right] \delta(g_i, g_j), \tag{1}$$

where node i is assigned to community $g_i$, node j is assigned to community $g_j$, the Kronecker delta $\delta(g_i, g_j) = 1$ if $g_i = g_j$ and it equals 0 otherwise, $\gamma$ is a resolution parameter (which we will call a structural resolution parameter), and $P_{ij}$ is the expected weight of the edge connecting node i to node j under a specified null model. The choice $\gamma = 1$ is very common, but it is important to consider multiple values of $\gamma$ to examine groups at multiple scales (Refs. 16, 30, and 31). Maximization of $Q_0$ yields a hard partition of a network into communities such that the total edge weight inside of modules is as large as possible (relative to the null model and subject to the limitations of the employed computational heuristics, as optimizing $Q_0$ is NP-hard; Refs. 16, 20, and 32).
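To make Eq. (1) concrete, the sketch below evaluates $Q_0$ for a given hard partition exactly as the equation is written (i.e., without a 1/(2m) prefactor). The function name, the 4-node chain, and the two partitions are hypothetical; the null model used in the example is the Newman-Girvan form $k_i k_j / (2m)$.

```python
def modularity(a, p, g, gamma=1.0):
    """Q_0 = sum_ij [A_ij - gamma * P_ij] * delta(g_i, g_j), as in Eq. (1)."""
    n = len(a)
    return sum(a[i][j] - gamma * p[i][j]
               for i in range(n) for j in range(n)
               if g[i] == g[j])   # Kronecker delta: keep same-community pairs


# Hypothetical 4-node chain with unit weights
a = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
k = [sum(row) for row in a]                       # node strengths
two_m = sum(k)                                    # 2m
p = [[ki * kj / two_m for kj in k] for ki in k]   # Newman-Girvan null model

q_split = modularity(a, p, [0, 0, 1, 1])   # two communities of two nodes
q_all = modularity(a, p, [0, 0, 0, 0])     # everything in one community
```

Splitting the chain in the middle scores higher than placing all nodes in a single community, for which the adjacency and null-model sums cancel.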

Recently, the null model in the quality function (1) has been generalized so that one can consider sets of L adjacency matrices, which are combined to form a rank-3 adjacency tensor A that can be used to represent time-dependent or multiplex networks. One can thereby define a multilayer modularity (also called “multislice modularity”; Ref. 3)

$$Q = \frac{1}{2\mu} \sum_{ijlr} \left\{ \left( A_{ijl} - \gamma_l P_{ijl} \right) \delta_{lr} + \delta_{ij} \omega_{jlr} \right\} \delta(g_{il}, g_{jr}), \tag{2}$$

where the adjacency matrix of layer l has components $A_{ijl}$, the element $P_{ijl}$ gives the components of the corresponding layer-l matrix for the optimization null model, $\gamma_l$ is the structural resolution parameter of layer l, the quantity $g_{il}$ gives the community assignment of node i in layer l, the quantity $g_{jr}$ gives the community assignment of node j in layer r, the element $\omega_{jlr}$ gives the connection strength (i.e., an “interlayer coupling parameter,” which one can call a temporal resolution parameter if one is using the adjacency tensor to represent a time-dependent network) from node j in layer r to node j in layer l, the total edge weight in the network is $\mu = \frac{1}{2} \sum_{jr} \kappa_{jr}$, the strength (i.e., weighted degree) of node j in layer l is $\kappa_{jl} = k_{jl} + c_{jl}$, the intra-layer strength of node j in layer l is $k_{jl} = \sum_i A_{ijl}$, and the inter-layer strength of node j in layer l is $c_{jl} = \sum_r \omega_{jlr}$.

Equivalent representations that use other notation can, of course, be useful. For example, multilayer modularity can be recast as a set of rank-2 matrices describing connections between the set of all nodes across layers [e.g., for spectral partitioning (Refs. 29, 33, and 34)]. One can similarly generalize Q for higher-rank tensors, which one can use when studying community structure in networks that are both time-dependent and multiplex, through appropriate specification of inter-layer coupling tensors.
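A direct, unoptimized evaluation of Eq. (2) for a given partition might look like the sketch below. It assumes uniform coupling $\omega$ between nearest-neighbor layers only (as in the data sets described in the paper) and uses the Newman-Girvan null model in each layer; all function names and the two-layer toy network are hypothetical. It only evaluates Q and does not perform the Louvain-like optimization.

```python
def newman_girvan_null(a):
    """P_ij = k_i * k_j / (2m) for one symmetric layer."""
    k = [sum(row) for row in a]
    two_m = sum(k)
    return [[ki * kj / two_m for kj in k] for ki in k]


def multilayer_modularity(layers, nulls, g, gamma, omega):
    """Evaluate Eq. (2) for a hard partition of a multilayer network.

    layers[l][i][j]: A_ijl          nulls[l][i][j]: P_ijl
    g[l][i]: community of node i in layer l
    gamma[l]: structural resolution parameter of layer l
    omega: coupling between copies of a node in nearest-neighbor layers
           (assumed uniform; omega_jlr = 0 for |l - r| != 1)
    """
    n_layers, n = len(layers), len(layers[0])
    intra, total_weight = 0.0, 0.0       # intra-layer terms and sum_jl k_jl
    for l in range(n_layers):
        for i in range(n):
            for j in range(n):
                total_weight += layers[l][i][j]
                if g[l][i] == g[l][j]:
                    intra += layers[l][i][j] - gamma[l] * nulls[l][i][j]
    inter, total_coupling = 0.0, 0.0     # inter-layer terms and sum_jl c_jl
    for l in range(n_layers):
        for r in (l - 1, l + 1):
            if 0 <= r < n_layers:
                for j in range(n):
                    total_coupling += omega
                    if g[l][j] == g[r][j]:
                        inter += omega
    two_mu = total_weight + total_coupling   # 2 mu = sum_jl (k_jl + c_jl)
    if two_mu == 0:
        raise ValueError("network has no edges and no coupling")
    return (intra + inter) / two_mu


# Hypothetical example: two identical layers of a 4-node unit-weight chain
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
layers = [chain, chain]
nulls = [newman_girvan_null(a) for a in layers]
partition = [[0, 0, 1, 1], [0, 0, 1, 1]]
q_ml = multilayer_modularity(layers, nulls, partition, gamma=[1.0, 1.0], omega=1.0)
```

Increasing `omega` rewards partitions in which a node keeps the same community label in adjacent layers, which is why it acts as a temporal resolution parameter.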

B. Network diagnostics

To characterize multilayer community structure, we compute four example diagnostics for each hard partition: the modularity Q, the number of modules n, the mean community size s (which is equal to the number of nodes in the community and is proportional to 1/n), and the stationarity $\zeta$ (Ref. 35). To compute $\zeta$, we calculate the autocorrelation function $U(t, t+m)$ of two states of the same community $G(t)$ at m time steps (i.e., m network layers) apart

$$U(t, t+m) \equiv \frac{|G(t) \cap G(t+m)|}{|G(t) \cup G(t+m)|}, \tag{3}$$

where $|G(t) \cap G(t+m)|$ is the number of nodes that are members of both $G(t)$ and $G(t+m)$, and $|G(t) \cup G(t+m)|$ is the number of nodes in the union of the community at times t and t + m. Defining $t_0$ to be the first time step in which the community exists and $t'$ to be the last time in which it exists, the stationarity of a community is (Ref. 35)

$$\zeta \equiv \frac{\sum_{t=t_0}^{t'-1} U(t, t+1)}{t' - t_0}. \tag{4}$$

This gives the mean autocorrelation over consecutive time steps (Ref. 36).
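Eqs. (3) and (4) translate almost directly into set operations. The following is a minimal sketch, assuming that the states of one community are given as a list of node sets, one per consecutive time step from $t_0$ to $t'$; the function names and the example states are hypothetical.

```python
def autocorrelation(g_t, g_tm):
    """U(t, t+m): size of the intersection over size of the union, Eq. (3)."""
    g_t, g_tm = set(g_t), set(g_tm)
    union = g_t | g_tm
    if not union:
        raise ValueError("both community states are empty")
    return len(g_t & g_tm) / len(union)


def stationarity(states):
    """zeta: mean of U(t, t+1) over consecutive time steps, Eq. (4).

    states: list of node sets for one community, ordered from t_0 to t'.
    """
    if len(states) < 2:
        raise ValueError("need at least two time steps")
    # There are t' - t_0 consecutive pairs, which is also the denominator of Eq. (4)
    overlaps = [autocorrelation(states[t], states[t + 1])
                for t in range(len(states) - 1)]
    return sum(overlaps) / len(overlaps)


# Hypothetical community that loses one node at the last time step
zeta = stationarity([{1, 2, 3}, {1, 2, 3}, {1, 2}])
```

A community whose membership never changes has stationarity 1, and any turnover in membership lowers the value.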

In addition to these diagnostics, which are defined using the entire multilayer community structure, we also compute two example diagnostics on the community structures of the component layers: the mean single-layer modularity $\langle Q_s \rangle$ and the variance $\mathrm{var}(Q_s)$ of the single-layer modularity over all layers. The single-layer modularity $Q_s$ is defined as the static modularity quality function, $Q_s = \sum_{ij} [A_{ij} - \gamma P_{ij}] \delta(g_i, g_j)$, computed for the partition g that we obtained via optimization of the multilayer modularity function Q. We have chosen to use a few simple ways to help characterize the time series for $Q_s$, though of course other diagnostics can also be informative.

C. Data sets

We illustrate dynamic network null models using two example network ensembles: (1) 75-time-layer brain networks drawn from each of 20 human subjects and (2) behavioral networks with about 150 time layers drawn from each of 22 human subjects. Importantly, the use of network


$P_{ij} = \rho A'_{ij}$

$P_{ij} = \dfrac{k_i k_j}{2m}$

other null-model terms in the modularity than Newman-Girvan?

ref) D. S. Bassett, M. A. Porter, N. F. Wymbs, S. T. Grafton, J. M. Carlson, and P. J. Mucha, Chaos 23, 013142 (2013).

Page 30: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection


our data: not exactly (exclusively) the chain shape



Page 31: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection



taking advantage of the contact probability scaling?

or “communities.” Intuitively, a community consists of a setof nodes that are connected among one another more denselythan they are to nodes in other communities. A popular wayto identify community structure is to optimize a quality func-tion, which can be used to measure the relative densities ofintra-community connections versus inter-community connec-tions. See Refs. 16, 20, and 23 for recent reviews on networkcommunity structure and Refs. 24–27 for discussions of vari-ous caveats that should be considered when optimizing qualityfunctions to detect communities.

One begins with a network of N nodes and a given set ofconnections between those nodes. In the usual case ofsingle-layer networks (e.g., static networks with only onetype of edge), one represents a network using an N ! N adja-cency matrix A. The element Aij of the adjacency matrixindicates a direct connection or “edge” from node i to node j,and its value indicates the weight of that connection. Thequality of a hard partition of A into communities (wherebyeach node is assigned to exactly one community) can bequantified using a quality function. The most popular choiceis modularity16,20,21,28,29

Q0 ¼X

ij

½Aij $ cPij%dðgi; gjÞ ; (1)

where node i is assigned to community gi, node j is assignedto community gj, the Kronecker delta dðgi; gjÞ ¼ 1 if gi ¼ gj

and it equals 0 otherwise, c is a resolution parameter (whichwe will call a structural resolution parameter), and Pij is theexpected weight of the edge connecting node i to node junder a specified null model. The choice c ¼ 1 is very com-mon, but it is important to consider multiple values of c toexamine groups at multiple scales.16,30,31 Maximization ofQ0 yields a hard partition of a network into communitiessuch that the total edge weight inside of modules is as largeas possible (relative to the null model and subject to the limi-tations of the employed computational heuristics, as optimiz-ing Q0 is NP-hard16,20,32).

Recently, the null model in the quality function (1) hasbeen generalized so that one can consider sets of L adjacencymatrices, which are combined to form a rank-3 adjacencytensor A that can be used to represent time-dependent ormultiplex networks. One can thereby define a multilayermodularity (also called “multislice modularity”)3

Q ¼ 1

2l

X

ijlr

fðAijl $ clPijlÞdlr þ dijxjlrgdðgil; gjrÞ ; (2)

where the adjacency matrix of layer l has components Aijl,the element Pijl gives the components of the correspondinglayer-l matrix for the optimization null model, cl is the struc-tural resolution parameter of layer l, the quantity gil gives thecommunity assignment of node i in layer l, the quantity gjr

gives the community assignment of node j in layer r, the ele-ment xjlr gives the connection strength (i.e., an “interlayercoupling parameter,” which one can call a temporal resolu-tion parameter if one is using the adjacency tensor to repre-sent a time-dependent network) from node j in layer r tonode j in layer l, the total edge weight in the network isl ¼ 1

2

Pjr jjr, the strength (i.e., weighted degree) of node j in

layer l is jjl ¼ kjl þ cjl, the intra-layer strength of node j inlayer l is kjl ¼

Pi Aijl, and the inter-layer strength of node j

in layer l is cjl ¼P

r xjlr.Equivalent representations that use other notation can,

of course, be useful. For example, multilayer modularitycan be recast as a set of rank-2 matrices describing connec-tions between the set of all nodes across layers [e.g., forspectral partitioning29,33,34]. One can similarly generalize Qfor higher-rank tensors, which one can use when studyingcommunity structure in networks that are both time-dependent and multiplex, through appropriate specificationof inter-layer coupling tensors.

B. Network diagnostics

To characterize multilayer community structure, wecompute four example diagnostics for each hard partition:the modularity Q, the number of modules n, the mean com-munity size s (which is equal to the number of nodes in thecommunity and is proportional to 1/n), and the stationarityf.35 To compute f, we calculate the autocorrelation functionU(t, tþm) of two states of the same community G(t) at mtime steps (i.e., m network layers) apart

Uðt; tþ mÞ ) jGðtÞ \ Gðtþ mÞjjGðtÞ [ Gðtþ mÞj

; (3)

where jGðtÞ \ Gðtþ mÞj is the number of nodes that aremembers of both G(t) and G(tþm), and jGðtÞ [ Gðtþ mÞj isthe number of nodes in the union of the community at times tand tþm. Defining t0 to be the first time step in which thecommunity exists and t0 to be the last time in which it exists,the stationarity of a community is35

f )

Xt0$1

t¼t0Uðt; tþ 1Þ

t0 $ t0: (4)

This gives the mean autocorrelation over consecutive timesteps.36

In addition to these diagnostics, which are defined using the entire multilayer community structure, we also compute two example diagnostics on the community structures of the component layers: the mean single-layer modularity $\langle Q_s \rangle$ and the variance $\mathrm{var}(Q_s)$ of the single-layer modularity over all layers. The single-layer modularity $Q_s$ is defined as the static modularity quality function, $Q_s = \sum_{ij} [A_{ij} - \gamma P_{ij}] \delta(g_i, g_j)$, computed for the partition $g$ that we obtained via optimization of the multilayer modularity function $Q$. We have chosen to use a few simple ways to help characterize the time series for $Q_s$, though of course other diagnostics can also be informative.
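As a small worked instance of the static quality function $Q_s$, the sketch below evaluates the double sum exactly as printed (with no $1/2m$ prefactor). It assumes the Newman-Girvan null model $P_{ij} = k_i k_j / 2m$; the function name and the toy graph are illustrative only.

```python
def single_layer_modularity(adjacency, labels, gamma=1.0):
    """Sum of [A_ij - gamma * P_ij] over node pairs in the same community.

    Assumes the Newman-Girvan null model P_ij = k_i * k_j / (2m).
    """
    n = len(adjacency)
    strength = [sum(row) for row in adjacency]  # k_i
    two_m = sum(strength)                       # 2m
    q_s = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q_s += adjacency[i][j] - gamma * strength[i] * strength[j] / two_m
    return q_s


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1

labels = [0, 0, 0, 1, 1, 1]  # one community per triangle
q_s = single_layer_modularity(adjacency, labels)
print(q_s)  # approximately 5.0, i.e. 5/14 after dividing by 2m = 14
```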

C. Data sets

We illustrate dynamic network null models using two example network ensembles: (1) 75-time-layer brain networks drawn from each of 20 human subjects and (2) behavioral networks with about 150 time layers drawn from each of 22 human subjects. Importantly, the use of network


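$$P_{ij} = \rho A'_{ij}$$

$$P_{ij} = \frac{k_i k_j}{2m}$$

$$p_{\mathrm{contact}} \sim \begin{cases} s^{-3/2} & \text{(equilibrium globule)} \\ s^{-1} & \text{(fractal globule)} \end{cases}$$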

cf)

ref) E. Lieberman-Aiden et al., Science 326, 289 (2009); L. A. Mirny, Chromosome Res. 19, 37 (2011).

other null-model terms in the modularity than Newman-Girvan?

ref) D. S. Bassett, M. A. Porter, N. F. Wymbs, S. T. Grafton, J. M. Carlson, and P. J. Mucha, Chaos 23, 013142 (2013).

Page 32: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection

[Figure: PPV (precision) versus γ for chr1; PPV axis from 0 to 0.45, γ axis from 0 to 1.2. Legend: Rao et al.; CTCF peak (1); Rao et al.: randomized; CTCF peak (1): randomized; Rao et al. vs CTCF peak (1).]

of the genome inferred from Hi-C. More generally, a strong correlation was observed between the number of Hi-C reads $m_{ij}$ and the 3D distance between locus i and locus j as measured by FISH [Spearman’s r = -0.916, P = 0.00003 (fig. S3)], suggesting that Hi-C read count may serve as a proxy for distance.

Upon close examination of the Hi-C data, we noted that pairs of loci in compartment B showed a consistently higher interaction frequency at a given genomic distance than pairs of loci in compartment A (fig. S4). This suggests that compartment B is more densely packed (15). The FISH data are consistent with this observation; loci in compartment B exhibited a stronger tendency for close spatial localization.

To explore whether the two spatial compartments correspond to known features of the genome, we compared the compartments identified in our 1-Mb correlation maps with known genetic and epigenetic features. Compartment A correlates strongly with the presence of genes (Spearman’s r = 0.431, $P < 10^{-137}$), higher expression [via genome-wide mRNA expression, Spearman’s r = 0.476, $P < 10^{-145}$ (fig. S5)], and accessible chromatin [as measured by deoxyribonuclease I (DNAseI) sensitivity, Spearman’s r = 0.651, P negligible] (16, 17). Compartment A also shows enrichment for both activating (H3K36 trimethylation, Spearman’s r = 0.601, $P < 10^{-296}$) and repressive (H3K27 trimethylation, Spearman’s r = 0.282, $P < 10^{-56}$) chromatin marks (18).

We repeated the above analysis at a resolution of 100 kb (Fig. 3G) and saw that, although the correlation of compartment A with all other genomic and epigenetic features remained strong (Spearman’s r > 0.4, P negligible), the correlation with the sole repressive mark, H3K27 trimethylation, was dramatically attenuated (Spearman’s r = 0.046, $P < 10^{-15}$). On the basis of these results we concluded that compartment A is more closely associated with open, accessible, actively transcribed chromatin.

We repeated our experiment with K562 cells, an erythroleukemia cell line with an aberrant karyotype (19). We again observed two compartments; these were similar in composition to those observed in GM06990 cells [Pearson’s r = 0.732,

Fig. 4. The local packing of chromatin is consistent with the behavior of a fractal globule. (A) Contact probability as a function of genomic distance averaged across the genome (blue) shows a power law scaling between 500 kb and 7 Mb (shaded region) with a slope of -1.08 (fit shown in cyan). (B) Simulation results for contact probability as a function of distance (1 monomer ~ 6 nucleosomes ~ 1200 base pairs) (10) for equilibrium (red) and fractal (blue) globules. The slope for a fractal globule is very nearly -1 (cyan), confirming our prediction (10). The slope for an equilibrium globule is -3/2, matching prior theoretical expectations. The slope for the fractal globule closely resembles the slope we observed in the genome. (C) (Top) An unfolded polymer chain, 4000 monomers (4.8 Mb) long. Coloration corresponds to distance from one endpoint, ranging from blue to cyan, green, yellow, orange, and red. (Middle) An equilibrium globule. The structure is highly entangled; loci that are nearby along the contour (similar color) need not be nearby in 3D. (Bottom) A fractal globule. Nearby loci along the contour tend to be nearby in 3D, leading to monochromatic blocks both on the surface and in cross section. The structure lacks knots. (D) Genome architecture at three scales. (Top) Two compartments, corresponding to open and closed chromatin, spatially partition the genome. Chromosomes (blue, cyan, green) occupy distinct territories. (Middle) Individual chromosomes weave back and forth between the open and closed chromatin compartments. (Bottom) At the scale of single megabases, the chromosome consists of a series of fractal globules.
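The slopes quoted in the Fig. 4 caption are power-law exponents, so they can be read off as the slope of a straight-line fit in log-log coordinates. The following is a minimal sketch, assuming synthetic, noise-free contact probabilities; the function name `loglog_slope` and the sample genomic separations are illustrative and not taken from the paper.

```python
import math


def loglog_slope(separations, probabilities):
    """Least-squares slope of log(p) against log(s)."""
    xs = [math.log(s) for s in separations]
    ys = [math.log(p) for p in probabilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Genomic separations in base pairs, from 500 kb up to about 5.3 Mb.
separations = [5e5 * 1.3 ** i for i in range(10)]

# Idealized scaling laws (unnormalized), one per globule model.
fractal = [s ** -1.0 for s in separations]
equilibrium = [s ** -1.5 for s in separations]

slope_fractal = loglog_slope(separations, fractal)
slope_equilibrium = loglog_slope(separations, equilibrium)
print(round(slope_fractal, 6), round(slope_equilibrium, 6))
```

On real Hi-C data the fit would be restricted to the range where the scaling holds (500 kb to 7 Mb in the caption), since the exponent is not expected to be constant outside it.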


[Figure: PPV (precision) versus γ for chr1; PPV axis from 0 to 1, γ axis from 0 to 1.2. Legend: Rao et al.; CTCF peak (1); Rao et al.: randomized; CTCF peak (1): randomized; Rao et al. vs CTCF peak (1).]

different null model terms, taking advantage of the contact probability scaling

$$p_{\mathrm{contact}} \sim \begin{cases} s^{-3/2} & \text{(equilibrium globule)} \\ s^{-1} & \text{(fractal globule)} \end{cases}$$

the original Newman-Girvan null model

the equilibrium globule null model

the fractal globule null model

[Figure: PPV (precision) versus γ for chr1, panels labeled 100 kb; PPV axis from 0 to 1, logarithmic γ axis from 10^-1 to 10^3. Legend: Rao et al.; CTCF peak (1); Rao et al.: randomized; CTCF peak (1): randomized; Rao et al. vs CTCF peak (1).]

[Figure: PPV (precision) versus γ for chr1, panels labeled 10 kb; PPV axis from 0 to 0.45, logarithmic γ axis from 10^-1 to 10^3. Same legend.]

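the modularity

$$Q = \frac{1}{2m} \sum_{i \neq j} \left[ \left( A_{ij} - \gamma P^{(*)}_{ij} \right) \delta(g_i, g_j) \right]$$

cf)

$$P^{\mathrm{NG}}_{ij} = \frac{2m\, k_i k_j}{\sum_{i' \neq j'} k_{i'} k_{j'}} = \frac{2m\, k_i k_j}{(2m)(2m)} = \frac{k_i k_j}{2m}$$

$$P^{\mathrm{EG}}_{ij} = \frac{2m\, k_i k_j\, |i - j|^{-3/2}}{\sum_{i' \neq j'} k_{i'} k_{j'}\, |i' - j'|^{-3/2}}$$

$$P^{\mathrm{FG}}_{ij} = \frac{2m\, k_i k_j\, |i - j|^{-1}}{\sum_{i' \neq j'} k_{i'} k_{j'}\, |i' - j'|^{-1}}$$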
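The three null-model terms differ only in the exponent applied to the genomic separation $|i - j|$, so one helper can build all of them. The sketch below is illustrative only: the toy contact matrix, the function names, and the choice to treat the Newman-Girvan term as the exponent-0 case (normalized over $i \neq j$) are assumptions, not the author's implementation.

```python
def distance_null_model(strengths, exponent):
    """P_ij proportional to k_i * k_j * |i - j|**(-exponent), rescaled so that
    the sum over all i != j equals 2m."""
    n = len(strengths)
    two_m = sum(strengths)
    raw = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                raw[i][j] = strengths[i] * strengths[j] * abs(i - j) ** (-exponent)
    norm = sum(map(sum, raw))
    return [[two_m * value / norm for value in row] for row in raw]


def modularity(adjacency, null_model, labels, gamma=1.0):
    """Q = (1/2m) * sum over i != j of (A_ij - gamma * P_ij) * delta(g_i, g_j)."""
    n = len(adjacency)
    two_m = sum(map(sum, adjacency))
    total = sum(
        adjacency[i][j] - gamma * null_model[i][j]
        for i in range(n)
        for j in range(n)
        if i != j and labels[i] == labels[j]
    )
    return total / two_m


# Toy symmetric contact matrix for six consecutive genomic bins.
contacts = [
    [0, 5, 3, 1, 0, 0],
    [5, 0, 4, 1, 1, 0],
    [3, 4, 0, 2, 1, 1],
    [1, 1, 2, 0, 5, 3],
    [0, 1, 1, 5, 0, 4],
    [0, 0, 1, 3, 4, 0],
]
strengths = [sum(row) for row in contacts]
two_m = sum(strengths)

p_ng = distance_null_model(strengths, 0)    # no distance dependence
p_eg = distance_null_model(strengths, 1.5)  # equilibrium globule, s^(-3/2)
p_fg = distance_null_model(strengths, 1)    # fractal globule, s^(-1)

labels = [0, 0, 0, 1, 1, 1]  # two contiguous domains
for name, null in (("NG", p_ng), ("EG", p_eg), ("FG", p_fg)):
    print(name, modularity(contacts, null, labels))
```

Because each null model is rescaled to the same total weight $2m$, the distance-dependent versions shift expected contacts toward nearby bins rather than adding or removing weight overall.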

ref) E. Lieberman-Aiden et al., Science 326, 289 (2009); L. A. Mirny, Chromosome Res. 19, 37 (2011).

Page 33: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection

[Figure: PPV (precision) versus γ for chr1; PPV axis from 0 to 0.45, γ axis from 0 to 1.2. Legend: Rao et al.; CTCF peak (1); Rao et al.: randomized; CTCF peak (1): randomized; Rao et al. vs CTCF peak (1).]


[Figure: PPV (precision) versus γ for chr1; PPV axis from 0 to 1, γ axis from 0 to 1.2. Legend: Rao et al.; CTCF peak (1); Rao et al.: randomized; CTCF peak (1): randomized; Rao et al. vs CTCF peak (1).]

different null model terms, taking advantage of the contact probability scaling

$$p_{\mathrm{contact}} \sim \begin{cases} s^{-3/2} & \text{(equilibrium globule)} \\ s^{-1} & \text{(fractal globule)} \end{cases}$$

the original Newman-Girvan null model

the equilibrium globule null model

the fractal globule null model

[Figure: PPV (precision) versus γ for chr1, panels labeled 100 kb; PPV axis from 0 to 1, logarithmic γ axis from 10^-1 to 10^3. Legend: Rao et al.; CTCF peak (1); Rao et al.: randomized; CTCF peak (1): randomized; Rao et al. vs CTCF peak (1).]

better than the original null model?

[Figure: PPV (precision) versus γ for chr1, panels labeled 10 kb; PPV axis from 0 to 0.45, logarithmic γ axis from 10^-1 to 10^3. Legend: Rao et al.; CTCF peak (1); Rao et al.: randomized; CTCF peak (1): randomized; Rao et al. vs CTCF peak (1).]

→ fractal globule at the ~ Mb scale?

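the modularity

$$Q = \frac{1}{2m} \sum_{i \neq j} \left[ \left( A_{ij} - \gamma P^{(*)}_{ij} \right) \delta(g_i, g_j) \right]$$

cf)

overcompensate?

$$P^{\mathrm{NG}}_{ij} = \frac{2m\, k_i k_j}{\sum_{i' \neq j'} k_{i'} k_{j'}} = \frac{2m\, k_i k_j}{(2m)(2m)} = \frac{k_i k_j}{2m}$$

$$P^{\mathrm{EG}}_{ij} = \frac{2m\, k_i k_j\, |i - j|^{-3/2}}{\sum_{i' \neq j'} k_{i'} k_{j'}\, |i' - j'|^{-3/2}}$$

$$P^{\mathrm{FG}}_{ij} = \frac{2m\, k_i k_j\, |i - j|^{-1}}{\sum_{i' \neq j'} k_{i'} k_{j'}\, |i' - j'|^{-1}}$$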

ref) E. Lieberman-Aiden et al., Science 326, 289 (2009); L. A. Mirny, Chromosome Res. 19, 37 (2011).

Page 34: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection

comparison with the fractal globule model

generated with the conformation dependent polymerization (CDP) fractal globule model, cf) M. V. Tamm et al., Phys. Rev. Lett. 114, 178102 (2015).

time and length scales the entanglements play a crucial role and the scaling theory [2] predicts $\alpha_{\mathrm{ent}} = 1/4$.

To check the predictions of the scaling theory we carried out extensive computer simulations using the dissipative particle dynamics (DPD) technique, which is known [64,65] to correctly reflect dynamics of dense polymer systems. The polymer model we use consists of renormalized monomers with the size of order of the chromatin persistence length; the corresponding DPD time step is of order 1 nsec or more (see Ref. [47] for more details). Volume interactions between the monomers are chosen to guarantee the absence of chain self-intersections; the entanglement length is $N_e \approx 50 \pm 5$ monomer units [66]. The modeled chains have $N = 2^{18} = 262\,144$ units confined in a cubic volume with periodic boundary conditions. In a chain that is long ($N/N_e \simeq 5000$) the equilibration time by far exceeds the times accessible in computer simulation, so the choice of starting configurations plays a significant role. Here we provide a short outline of how we construct and prepare the initial states, addressing the reader to [47] for further details.

The first initial state we use is a randomized Moore curve similar to that described in Ref. [26]; it has a very distinct domain structure with flat domain walls. The second initial state is generated by a mechanism which we call “conformation-dependent polymerization in poor solvent.” This algorithm, which, to the best of our knowledge, has never been suggested before, constructs the chain conformation by consecutively adding monomer units in a way that they tend strongly to stick to the already existing part of the chain. In Ref. [47] we show that the resulting conformations show exactly the statistical characteristics expected from fractal globules, while a full account of this new algorithm will be given in Ref. [30]. In what follows, for brevity we call the globule prepared by the randomized Moore algorithm “Moore,” and one prepared by the conformation-dependent polymerization “random fractal.” As a control sample we use a standard equilibrium globule which we call “Gaussian.”

Prior to the diffusion measurements all three initial states are annealed for $\tau = 3.2 \times 10^7$ modeling steps. The statistical properties of the random fractal and Gaussian globule do not change visibly during the annealing time, while the Moore globule is evolving with domain walls roughening and its statistical characteristics (e.g., dependence of the spatial distance between monomers on the genomic distance $\langle R^2(n) \rangle$; see [47]) approaching those for the random fractal globule state.

Snapshots of conformations annealed from different initial states are shown in Fig. 1. In fractal states, contrary to the Gaussian one, fragments close along the chain tend to form domains of the same color. The states are further characterized in Fig. 2. The fractal globule curve appears very similar (but for the saturation at large $n$ due to the finite size effects) to the universal spatial size-length curve for unentangled rings discussed in Refs. [20,68]. $R^2(n)$ for the Moore state seems to approach the fractal globule curve with growing modeling time, suggesting the existence of a unique metastable fractal globule state. Fractal globules prepared by two different techniques are significantly different at first, but converge with growing simulation time, making the results obtained after annealing insensitive to the details of the initial state.

Monomer spatial displacement was measured for $t = 6.5 \times 10^7$ DPD time steps after the annealing (corresponding to ∼0.1 sec on the real time scale), with results shown in Fig. 3. Impressively, mean-square displacement for the

FIG. 1 (color online). The snapshots of globule conformations: random fractal (top), Moore (middle), and Gaussian (bottom) globules. (a) General view of the modeling cell after initial annealing. Chains are gradiently colored from blue to red. (b)–(d) The evolution of a 1000-monomer subchain conformation: (b) initial conformation at the start of measurement, (c) after $2^{18} \approx 2.5 \times 10^5$ DPD steps, (d) after $2^{26} \approx 6.5 \times 10^7$ DPD steps. The cube on the figure corresponds to the whole simulation box and has the size 46 × 46 × 46 DPD length units.

FIG. 2 (color online). Mean-square distance $\langle R^2 \rangle$ between monomers as a function of genomic distance $n$. Gaussian (green) and random fractal (red) states are stable on the modeling time scale (see Fig. 2 in Ref. [47]). Initial Moore state (black) relaxes after annealing to the blue curve, approaching the random fractal state. Inset shows the same plots in ($\langle R^2 \rangle n^{-0.8}$, $n/N_e$) coordinates used in [20].


text) and assuming that for such small displacements the role of chain connectivity is negligible, one gets the estimate of 1 nsec per DPD time step. The whole accessible timescale is then of order of 0.1 sec.

Note that this is an estimate from below, as the simulated medium is, generally speaking, more viscous than pure water, and the chain connectivity in fact does play some role in the self-diffusion of DPD beads even on the small time-scales.

II. INITIAL STATES

In our work we use three different ways to construct initial states of globules, which we describe below in detail. In all cases chains of $2^{18} = 262144$ monomers are generated in a cubic box with periodic boundary conditions; the size of the modeling box is $46 \times 46 \times 46$ reduced DPD units, making the average number density of monomers equal to $\rho = 3$ (this value is known to be especially good for modeling the dynamic properties of polymer chains). All three initial states are constructed on a cubic lattice with lattice constant equal to $3^{-1/3} \approx 0.69$ and after the construction are allowed to anneal for $2^{25} = 3.2 \times 10^7$ DPD time steps. Only after this annealing are the self-diffusion measurements started.

A. Random fractal globule

The mechanism of fractal globule formation suggested below is novel and will be discussed and characterized in full detail in [8]. Here we provide a brief overview of the idea, necessary for the reader to understand the main text and to be convinced that the initial state we are dealing with indeed has all the properties of a fractal globule.

The idea of this mechanism to design a fractal globule state, which we propose here for the first time, is based on the following considerations. Imagine a polymer chain being synthesized while being in a poor solvent, in a way that all the already synthesized part is forming a tight globule. Assume also the synthesis to be very fast as compared to the internal movements of monomers within a globule. In that way one expects that at all intermediate stages the already formed part of the globule is in a compact state. Also one expects that formation of knots and entanglements will be highly suppressed, since the new monomers are mostly added to the surface of the existing globule and cannot go through it as there are no holes left in the structure. Clearly, the conformation thus formed is very reminiscent of a fractal globule.

Figure 1: The conformation-dependent random walk. (A) On the next step the probabilities of choosing steps 3 and 5 are large compared to steps 1 and 4 ($P_4 = P_1$; $P_3 = (1 + 2A) P_1 = 20001 P_1$, $P_5 = (1 + A) P_1 = 10001 P_1$); the probabilities of steps 2 and 6 are proportional to $\varepsilon$ and are essentially zero. (B) Trapped configuration: the weights of all possible steps equal $\varepsilon$, so the steps are equiprobable.

To exploit this idea we proceed as follows. We construct the polymer conformation as a trajectory of a lattice random walk in a potential strongly attracting the walker to the places it has already visited. At each step a walker on a cubic lattice has 6 neighboring sites (see Figure 1) where it can possibly move. We postulate the probability to go to each of the possible target sites to depend on whether it was already visited, and on how many visited sites it has as its neighbors. In particular, we use the following assumptions:

$$P_i = N^{-1} \begin{cases} \varepsilon & \text{if the target site is visited,} \\ 1 + A \cdot (\text{number of visited neighbors the target site has}) & \text{if the target site is not visited,} \end{cases} \qquad N = \sum_{i=1}^{6} P_i \tag{6}$$

Here, $\varepsilon$ should be extremely small so that double visiting of the same sites is possible only if the walk gets locked (we use $\varepsilon = 10^{-9}$), while $A$ is a constant defining the strength of attraction to the existing trajectory, and should therefore be large to keep all the intermediate conformations compact. By trial and error we have found $A = 10\,000$ to work best.
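The step rule in Eq. (6) can be sketched in a few lines. This is an illustrative reading, not the authors' code: it assumes an unbounded cubic lattice (no periodic box), and it assumes that the walker's current site is not counted among the visited neighbors of a target site, which is the reading that reproduces the ratios $P_4 = P_1$ and $P_5 = (1 + A) P_1$ quoted in the Figure 1 caption. All names are hypothetical.

```python
import random

# The six unit steps on a cubic lattice.
NEIGHBOR_STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def shifted(site, step):
    """Lattice site reached from `site` by one unit step."""
    return (site[0] + step[0], site[1] + step[1], site[2] + step[2])


def step_weights(current, visited, attraction=10_000, epsilon=1e-9):
    """Unnormalized weights of the six candidate steps, following Eq. (6).

    Assumption: the current site is excluded when counting the visited
    neighbors of a target site.
    """
    weights = []
    for step in NEIGHBOR_STEPS:
        target = shifted(current, step)
        if target in visited:
            weights.append(epsilon)
            continue
        touching = 0
        for other in NEIGHBOR_STEPS:
            neighbor = shifted(target, other)
            if neighbor != current and neighbor in visited:
                touching += 1
        weights.append(1 + attraction * touching)
    return weights


def grow_chain(n_monomers, seed=0):
    """Grow a chain monomer by monomer with the conformation-dependent walk."""
    rng = random.Random(seed)
    position = (0, 0, 0)
    chain = [position]
    visited = {position}
    for _ in range(n_monomers - 1):
        weights = step_weights(position, visited)
        step = rng.choices(NEIGHBOR_STEPS, weights=weights, k=1)[0]
        position = shifted(position, step)
        chain.append(position)
        visited.add(position)
    return chain


chain = grow_chain(200, seed=1)
print(len(chain), len(set(chain)))  # chain length and number of distinct sites
```

`random.choices` normalizes the weights internally, which plays the role of the factor $N^{-1}$ in Eq. (6).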

A trajectory constructed in this way includes a finite fraction (of order of several percent) of self-intersections. However, the resulting states happen to be almost unknotted: a segment of $10^4$ monomers is reduced to a knot of less than $10^2$ monomers. For comparison, a segment of an equilibrium globule of $10^4$ monomers is reduced to a knot of $2 \cdot 10^3$ points.

γ = 0.2

γ = 0.4

γ = 0.6

applying the Louvain algorithm with $P^{\mathrm{NG}}_{ij} = \frac{k_i k_j}{2m}$

Page 35: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection

comparison with the fractal globule model

generated with the conformation dependent polymerization (CDP) fractal globule model, cf) M. V. Tamm et al., Phys. Rev. Lett. 114, 178102 (2015).

time and length scales the entanglements play a crucial role and the scaling theory [2] predicts $\alpha_{\mathrm{ent}} = 1/4$.

To check the predictions of the scaling theory we carried out extensive computer simulations using the dissipative particle dynamics (DPD) technique, which is known [64,65] to correctly reflect the dynamics of dense polymer systems. The polymer model we use consists of renormalized monomers with a size of the order of the chromatin persistence length; the corresponding DPD time step is of order 1 nsec or more (see Ref. [47] for more details). Volume interactions between the monomers are chosen to guarantee the absence of chain self-intersections; the entanglement length is $N_e \approx 50 \pm 5$ monomer units [66]. The modeled chains have $N = 2^{18} = 262\,144$ units confined in a cubic volume with periodic boundary conditions. In a chain that is long ($N/N_e \simeq 5000$) the equilibration time by far exceeds the times accessible in computer simulation, so the choice of starting configurations plays a significant role. Here we provide a short outline of how we construct and prepare the initial states, referring the reader to [47] for further details.

The first initial state we use is a randomized Moore curve similar to that described in Ref. [26]; it has a very distinct domain structure with flat domain walls. The second initial state is generated by a mechanism which we call “conformation-dependent polymerization in poor solvent.” This algorithm, which, to the best of our knowledge, has never been suggested before, constructs the chain conformation by consecutively adding monomer units in a way that they tend strongly to stick to the already existing part of the chain. In Ref. [47] we show that the resulting conformations show exactly the statistical characteristics expected from fractal globules, while a full account of this new algorithm will be given in Ref. [30]. In what follows, for brevity we call the globule prepared by the randomized Moore algorithm “Moore,” and one prepared by the conformation-dependent polymerization “random fractal.” As a control sample we use a standard equilibrium globule which we call “Gaussian.”

Prior to the diffusion measurements all three initial states are annealed for $\tau = 3.2 \times 10^7$ modeling steps. The statistical properties of the random fractal and Gaussian globule do not change visibly during the annealing time, while the Moore globule is evolving with domain walls roughening and its statistical characteristics (e.g., dependence of the spatial distance between monomers on the genomic distance $\langle R^2(n) \rangle$; see [47]) approaching those for the random fractal globule state.

Snapshots of conformations annealed from different initial states are shown in Fig. 1. In fractal states, contrary to the Gaussian one, fragments close along the chain tend to form domains of the same color. The states are further characterized in Fig. 2. The fractal globule curve appears very similar (but for the saturation at large n due to the finite size effects) to the universal spatial size-length curve for unentangled rings discussed in Refs. [20,68]. $R^2(n)$ for the Moore state seems to approach the fractal globule curve with growing modeling time, suggesting the existence of a unique metastable fractal globule state. Fractal globules prepared by two different techniques are significantly different at first, but converge with growing simulation time, making the results obtained after annealing insensitive to the details of the initial state.

Monomer spatial displacement was measured for $t = 6.5 \times 10^7$ DPD time steps after the annealing (corresponding to ∼0.1 sec on the real time scale), with results shown in Fig. 3. Impressively, mean-square displacement for the

FIG. 1 (color online). The snapshots of globule conformations: random fractal (top), Moore (middle), and Gaussian (bottom) globules. (a) General view of the modeling cell after initial annealing. Chains are gradiently colored from blue to red. (b)–(d) The evolution of a 1000-monomer subchain conformation: (b) initial conformation at the start of measurement, (c) after $2^{18} \approx 2.5 \times 10^5$ DPD steps, (d) after $2^{26} \approx 6.5 \times 10^7$ DPD steps. The cube on the figure corresponds to the whole simulation box and has the size 46 × 46 × 46 DPD length units.

FIG. 2 (color online). Mean-square distance $\langle R^2 \rangle$ between monomers as a function of genomic distance n. Gaussian (green) and random fractal (red) states are stable on the modeling time scale (see Fig. 2 in Ref. [47]). Initial Moore state (black) relaxes after annealing to the blue curve, approaching the random fractal state. Inset shows the same plots in ($\langle R^2 \rangle n^{-0.8}$, $n/N_e$) coordinates used in [20].

PRL 114, 178102 (2015), PHYSICAL REVIEW LETTERS, week ending 1 MAY 2015


text) and assuming that for such small displacements the role of chain connectivity is negligible, one gets the estimate of 1 nsec per DPD time step. The whole accessible timescale is then of the order of 0.1 sec.

Note that this is an estimate from below, as the simulated medium is, generally speaking, more viscous than pure water, and the chain connectivity in fact does play some role in the self-diffusion of DPD beads even on the small time-scales.

II. INITIAL STATES

In our work we use three different ways to construct initial states of globules, which we describe below in detail. In all cases chains of $2^{18} = 262144$ monomers are generated in a cubic box with periodic boundary conditions; the size of the modeling box is 46 × 46 × 46 reduced DPD units, making the average number density of monomers equal to $\rho = 3$ (this value is known to be especially good for modeling the dynamic properties of polymer chains). All three initial states are constructed on a cubic lattice with lattice constant equal to $3^{-1/3} \approx 0.69$ and after the construction are allowed to anneal for $2^{25} \approx 3.2 \times 10^7$ DPD time steps. Only after this annealing are the self-diffusion measurements started.

A. Random fractal globule

The mechanism of fractal globule formation suggested below is novel and will be discussed and characterized in full detail in [8]. Here we provide a brief overview of the idea, necessary for the reader to understand the main text and to be convinced that the initial state we are dealing with indeed has all the properties of a fractal globule.

The idea of this mechanism to design a fractal globule state, which we propose here for the first time, is based on the following considerations. Imagine a polymer chain being synthesized while being in a poor solvent, in a way that all the already synthesized part is forming a tight globule. Assume also the synthesis to be very fast as compared to the internal movements of monomers within a globule. In that way one expects that at all intermediate stages the already formed part of the globule is in a compact state. Also one expects that formation of knots and entanglements will be highly suppressed, since the new monomers are mostly added to the surface of the existing globule and cannot go through it, as there are no holes left in the structure. Clearly, the conformation thus formed is very reminiscent of a fractal globule.

Figure 1: The conformation-dependent random walk. (A) On the next step the probabilities of choosing steps 3 and 5 are large compared to steps 1 and 4 ($P_4 = P_1$; $P_3 = (1 + 2A)P_1 = 20001 P_1$, $P_5 = (1 + A)P_1 = 10001 P_1$); probabilities of steps 2 and 6 are proportional to $\varepsilon$ and are essentially zero. (B) Trapped configuration: weights of all possible steps equal $\varepsilon$ and are equiprobable.

To exploit this idea we proceed as follows. We construct the polymer conformation as a trajectory of a lattice random walk in a potential strongly attracting the walker to the places it has already visited. At each step a walker on a cubic lattice has 6 neighboring sites (see Figure 1) where it can possibly move. We postulate that the probability to go to each of the possible target sites depends on whether it was already visited, and on how many visited sites it has as its neighbors. In particular, we use the following assumptions:

$$P_i = N^{-1} \begin{cases} \varepsilon & \text{if the target site is visited,} \\ 1 + A \cdot \left(\text{number of visited neighbors the target site has}\right) & \text{if the target site is not visited,} \end{cases} \qquad N = \sum_{i=1}^{6} P_i \tag{6}$$

Here, $\varepsilon$ should be extremely small, so that revisiting an already visited site is possible only if the walk gets locked (we use $\varepsilon = 10^{-9}$), while A is a constant defining the strength of attraction to the existing trajectory, and should therefore be large to keep all the intermediate conformations compact. By trial and error we have found A = 10,000 to work best.

A trajectory constructed in this way includes a finite fraction (of the order of several percent) of self-intersections. However, the resulting states happen to be almost unknotted:

The segment of $10^4$ monomers is reduced to a knot of less than $10^2$ monomers. For comparison, a segment of an equilibrium globule of $10^4$ monomers is reduced to a knot of $2 \cdot 10^3$ points.
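The conformation-dependent walk of Eq. (6) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' code: the walk runs in free space rather than in the periodic 46 × 46 × 46 box, the count of visited neighbors excludes the walker's current site (which is what the weights quoted in the Figure 1 caption suggest), and the names `step_weights` and `cdp_walk` are hypothetical.

```python
import random

# Unit steps to the six nearest neighbours on a cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def add(p, d):
    """Component-wise sum of a lattice site and a step."""
    return (p[0] + d[0], p[1] + d[1], p[2] + d[2])


def step_weights(pos, visited, A=10_000.0, eps=1e-9):
    """Unnormalized weights of the six candidate steps, in the spirit of Eq. (6)."""
    weights = []
    for d in STEPS:
        target = add(pos, d)
        if target in visited:
            # Revisiting is allowed only with a vanishing weight.
            weights.append(eps)
            continue
        # Assumption: the walker's current site is not counted as a
        # visited neighbour of the target.
        k = 0
        for e in STEPS:
            nb = add(target, e)
            if nb != pos and nb in visited:
                k += 1
        weights.append(1.0 + A * k)
    return weights


def cdp_walk(n_monomers, A=10_000.0, eps=1e-9, seed=0):
    """Grow a chain of n_monomers sites; random.choices normalizes the weights."""
    rng = random.Random(seed)
    pos = (0, 0, 0)
    chain = [pos]
    visited = {pos}
    for _ in range(n_monomers - 1):
        w = step_weights(pos, visited, A, eps)
        d = rng.choices(STEPS, weights=w, k=1)[0]
        pos = add(pos, d)
        chain.append(pos)
        visited.add(pos)
    return chain


chain = cdp_walk(200)
```

When every neighbor has already been visited, all six weights equal `eps` and the step is chosen uniformly, which corresponds to the trapped configuration of panel (B).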

γ = 0.2

γ = 0.4

γ = 0.6

γ = 0.6

γ = 1

γ = 1.4

the normalized Hi-C map

applying the Louvain algorithm with $P^{\mathrm{NG}}_{ij} = \frac{k_i k_j}{2m}$
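To make the role of the resolution parameter γ concrete, here is a small sketch of the quality function that Louvain-type methods maximize, assuming the common generalized modularity $Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$ with the Newman-Girvan null model. The function name `modularity` and the toy network of two triangles joined by one edge are illustrative assumptions; the slides apply the method to Hi-C maps, not to this toy graph.

```python
def modularity(adj, labels, gamma=1.0):
    """Generalized modularity with the Newman-Girvan null term k_i k_j / (2m).

    adj    : symmetric weighted adjacency matrix (list of lists)
    labels : community label of each node
    gamma  : resolution parameter multiplying the null-model term
    """
    n = len(adj)
    k = [sum(row) for row in adj]   # (weighted) degrees
    two_m = sum(k)                  # 2m = total degree
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - gamma * k[i] * k[j] / two_m
    return q / two_m


# Toy network: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

split = [0, 0, 0, 1, 1, 1]    # one community per triangle
merged = [0, 0, 0, 0, 0, 0]   # everything in one community

q_split_1 = modularity(adj, split, gamma=1.0)
q_merged_1 = modularity(adj, merged, gamma=1.0)
q_split_low = modularity(adj, split, gamma=0.2)
q_merged_low = modularity(adj, merged, gamma=0.2)
```

In this toy case the split partition scores higher at γ = 1 while the merged one scores higher at γ = 0.2, which illustrates why smaller γ values favor larger communities (larger candidate TADs).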

Page 36: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection

. . .

applying the Louvain algorithm with $P^{\mathrm{NG}}_{ij} = \frac{k_i k_j}{2m}$

Page 37: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection

comparison with the fractal globule model

N = 8192, radius of volume exclusion = 0.2 Kuhn length

N = 8192, radius of volume exclusion = 0.2 Kuhn length, no boundary

Identified TADs = 3975 (raw), 4451 (normalized), γ = 0.2

Radius of gyration & end-to-end distance / radius of gyration

Power law of end-to-end distance / radius of gyration

1) Reference: easy to estimate (from known end-to-end distance scaling and radius of gyration scaling)

2) TAD: for ideal case, it may be calculable

It’s better to extract results from longer globules (longer than ~30000 monomers), to get more reliable scalings in the given subchain length domain ($10^1$ to $10^{3.5}$)

0 slope = structure well preserved?
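The two quantities compared on this slide can be computed directly from a list of 3D monomer positions. The sketch below is an illustration only: the sliding-window averaging over subchains and the function names are assumptions, and the straight rod is used merely because its values are known in closed form ($R_g^2 = (n^2 - 1)/12$ for n equally spaced points).

```python
def center_of_mass(points):
    """Mean position of a set of 3D points."""
    n = len(points)
    return tuple(sum(p[a] for p in points) / n for a in range(3))


def radius_of_gyration_sq(points):
    """Squared radius of gyration: mean squared distance to the center of mass."""
    c = center_of_mass(points)
    return sum(sum((p[a] - c[a]) ** 2 for a in range(3)) for p in points) / len(points)


def end_to_end_sq(points):
    """Squared distance between the first and the last point."""
    first, last = points[0], points[-1]
    return sum((last[a] - first[a]) ** 2 for a in range(3))


def subchain_averages(chain, length):
    """Average R_ee^2 and R_g^2 over all subchains of `length` monomers."""
    windows = [chain[s:s + length] for s in range(len(chain) - length + 1)]
    ree2 = sum(end_to_end_sq(w) for w in windows) / len(windows)
    rg2 = sum(radius_of_gyration_sq(w) for w in windows) / len(windows)
    return ree2, rg2


# Straight rod of 13 monomers along the x axis.
rod = [(i, 0, 0) for i in range(13)]
rod_rg2 = radius_of_gyration_sq(rod)
rod_ree2 = end_to_end_sq(rod)
ree2_5, rg2_5 = subchain_averages(rod, 5)
```

Evaluating `subchain_averages` over a range of subchain lengths and plotting the ratio of the two averages on log-log axes would give the kind of curve whose slope the slide asks about.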

generated with the conformation-dependent polymerization (CDP) fractal globule model, cf. M. V. Tamm et al., Phys. Rev. Lett. 114, 178102 (2015).

γ = 0.2, averaged over 200 samples

applying the Louvain algorithm with $P^{\mathrm{NG}}_{ij} = \frac{k_i k_j}{2m}$

detecting loops?

[Figure 6 panels (image not reproduced). Recoverable labels: number of peaks vs. corner score for peaks and random pixels; contact matrix for Chr 4, 20.55–22.55 Mb, with transitive and intransitive loops marked; fold change vs. percentage of peak loci bound, with CTCF, RAD21, SMC3, YY1, and ZNF143 labeled; number of peaks by orientation of the forward and reverse CTCF motifs, with shares of 2%, 3%, 3%, and 92%; consensus sequences CTGCCACCTNGTGG and CCACNAGGTGGCAG; Chr 1, 17.4–17.6 Mb at 1 kb resolution with CTCF, RAD21, and SMC3 tracks and the sequences 5’-GAGCAATTCCGCCCCCTGGTGGCAGATCTG-3’ and 5’-GGCGGAGACCACAAGGTGGCGC CAGATCCC-3’; schematic of loop domains and ordinary domains (110 Kb to 450 Kb) with CTCF anchors, where the arrowhead indicates motif orientation.]

Figure 6. Many Loops Demarcate Contact Domains; The Vast Majority of Loops Are Anchored at a Pair of Convergent CTCF/RAD21/SMC3 Binding Sites

(A) Histograms of corner scores for peak pixels versus random pixels with an identical distance distribution.

(B) Contact matrix for chr4:20.55 Mb–22.55 Mb in GM12878, showing examples of transitive and intransitive looping behavior.

(C) Percent of peak loci bound versus fold enrichment for 76 DNA-binding proteins.

(D) The pairs of CTCF motifs that anchor a loop are nearly all found in the convergent orientation.
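The two consensus strings printed in the figure, CTGCCACCTNGTGG and CCACNAGGTGGCAG, are reverse complements of each other, so they describe the same motif read on opposite strands. The short sketch below checks this and encodes the usual reading of "convergent" (forward motif at the upstream anchor, reverse motif at the downstream anchor); the function names and the "+"/"-" strand encoding are assumptions made for illustration.

```python
# Complement table for DNA with the ambiguity code N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A<->T, C<->G, N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_convergent(upstream_strand, downstream_strand):
    """True if the two anchor motifs point toward each other.

    Assumed encoding: '+' for the forward motif, '-' for the reverse motif;
    the upstream anchor is the one with the smaller genomic coordinate.
    """
    return upstream_strand == "+" and downstream_strand == "-"


consensus_a = "CCACNAGGTGGCAG"
consensus_b = "CTGCCACCTNGTGG"
same_motif = reverse_complement(consensus_a) == consensus_b
```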


from S. S. P. Rao et al., Cell 159, 1665 (2014)

Page 38: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection

summary and outlook

• While adjusting the resolution parameter, finding the network communities as TADs + comparing with biological factors (“metadata”): CTCF, histone modification, etc.

• Incorporating more tailor-made null model terms for TAD detection

• Trying the model Hi-C map generated from the model based on the fractal globule structure

• detecting loops by measuring the distance between the starting and ending points of TADs

• What can we learn from these various scales of TADs?
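One possible reading of the loop-detection item, sketched under explicit assumptions: communities are first cut into contiguous runs of bins along the chromosome (candidate TADs), and the Hi-C contact value between the first and last bin of each run is then read off as a proxy for how close the two ends are in space. The function names, the inclusive `(start, end, label)` convention, and the use of a contact value instead of a 3D distance are all hypothetical choices, not something stated on the slide.

```python
def contiguous_domains(labels):
    """Split per-bin community labels into (start, end, label) runs, end inclusive."""
    domains = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            domains.append((start, i - 1, labels[start]))
            start = i
    return domains


def corner_contacts(hic, domains, min_size=2):
    """Contact value between the first and last bin of each sufficiently large domain."""
    return [(s, e, hic[s][e]) for s, e, _ in domains if e - s + 1 >= min_size]


# Hypothetical example: seven bins, community labels from some partition.
labels = [0, 0, 0, 1, 1, 0, 2]
domains = contiguous_domains(labels)

# Hypothetical symmetric contact matrix, with value 10 - |i - j| off the diagonal.
hic = [[0 if i == j else 10 - abs(i - j) for j in range(7)] for i in range(7)]
corners = corner_contacts(hic, domains)
```

A domain whose corner value is high compared with other bin pairs at the same genomic separation would be a candidate loop domain in this reading.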

in collaboration with

Jae-Hyung Jeon (POSTECH)

Xavier Durang (KIAS)

Sungmin Lee (SKKU)

Ludvig Lizana (Umeå Univ.)

Per Stenberg (Umeå Univ.)

Markus Nyberg (Umeå Univ.)

Rajendra Kumar (Umeå Univ.)

sponsored by the NRF-STINT (Korea-Sweden) Research Cooperation

Yeonghoon Kim (POSTECH)

Page 39: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection

for details and unlimited discussion . . . come visit me @ the poster presentation (P1057) of the NetSci main conference, 6pm–8pm, tomorrow (. . . and anytime later)

Page 40: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection

Synthetic yeast genome’s chromosome structure: special issue of Science, March 10, 2017.

BUILDING ON NATURE’S DESIGN

In 1996, a breakthrough was achieved when the sequence of ~12 million base pairs, divided among 16 chromosomes, was reported for baker’s yeast (Saccharomyces cerevisiae). Now, some 20 years later, the Synthetic Yeast Genome Project (Sc2.0) reports on five newly constructed synthetic yeast chromosomes, advancing efforts to substantially reengineer all 16 yeast chromosomes with the goal of creating a fully synthetic eukaryotic genome.

Genomes are in constant flux: They are prone to deletions, duplications, and insertions; recombination and rearrangement; and invasion and disruption by selfish genetic elements such as transposable elements. These many changes are subject to the vagaries of natural selection, resulting in a genome organization not based on principles of efficiency or economy of space, but instead contingent on the evolutionary history of the organism.

Sc2.0 has set out to untangle, streamline, and reorganize the genetic blueprint of one of the most studied of all eukaryotic genomes. Here they report on their development, design, construction, testing, and curation principles, which may be scalable to other, larger genomes. Ultimately, researchers aspire to remove all transposons and repetitive elements, recode UAG stop codons, and move transfer RNA genes to a novel neochromosome without causing fitness defects, while simultaneously adding features to facilitate chromosome construction and manipulation. When complete, the final synthetic yeast strain will be another milestone in our ability to work with and understand the eukaryotic genome.

By Laura M. Zahn and Guy Riddihough*

*Now at Life Science Editors. Email: [email protected]

SCIENCE sciencemag.org 10 MARCH 2017 • VOL 355 ISSUE 6329 1039


INSIGHTS | PERSPECTIVES

1024 10 MARCH 2017 • VOL 355 ISSUE 6329 sciencemag.org SCIENCE

By Krishna Kannan¹ and Daniel G. Gibson¹,²

A core theme in synthetic biology, “understanding by creating,” inspired the effort to generate the first synthetic cell, JCVI-Syn1.0 (1). The project Sc2.0 is elevating this concept by attempting to create a synthetic version of a more evolved organism, Saccharomyces cerevisiae, a eukaryotic single-celled yeast. In a set of papers in this issue (2–8), scientists of the Sc2.0 project who previously constructed a single yeast chromosome (9) now report constructing five additional yeast chromosomes (more than one-third of the entire genome) (see the photo). Using a variety of phenotypic assays and structural and functional genomics techniques, the researchers observed that the synthetic chromosomes drive biological processes just like the natural, native chromosomes.

The quintessential first step toward creating a synthetic organism is the careful design of the genomic material, which ultimately controls every physiological process in the cell. Project Sc2.0 built a software framework, BioStudio, to generate chromosomal designs (2). A set of rules were applied while designing each chromosome, including removal of repetitive regions and introns (except for the HAC1 intron), recoding of TAG stop codon to TAA (allowing TAG to be repurposed), and the relocation of transfer RNA genes into a neochromosome. In addition, sites (loxPsym) were introduced throughout the chromosome at the 3ʹ ends of nonessential genes for chemically-inducible genome rearrangements (through Cre-recombinase). This allowed the selection of desired phenotypes and the examination of corresponding genotypes (synthetic chromosome rearrangement and modification by loxP-mediated evolution, or SCRaMbLE). Despite the many variations (thousands) introduced during the construc-

for large-scale energy applications. The nonperiodic layered nanophotonic structure showed good performance (6), but its required nanometer precision control of the thin films is still a challenge for scaling up to the size of meters, which is needed for even a small (kilowatt-scale) cooling system.

The unprecedented properties of a metamaterial such as negative refraction and superlensing originate from its internal structures instead of its chemical constituents (7). Because its structural unit cell is often smaller than the wavelength of interest, practical implementations of optical metamaterials have always been challenging. Zhai et al. devised a glass-polymer metamaterial in which a set of glass microspheres were randomly and uniformly dispersed in a visibly transparent polymer matrix. Because of the surface phonon-polariton Mie resonance excited at room temperature on the glass surface, this amorphous metamaterial has a maximal broadband emissivity, near the blackbody limit across the entire atmospheric window, that results in cooling of the material itself (8). Both the polymer and glass are transparent to the full solar spectrum, so the hybrid metamaterial minimally absorbs and reflects most solar energy when backed with a thin silver mirror (see the figure).

Zhai et al. demonstrated an average radiative cooling flux greater than 110 W m$^{-2}$ in a continuous 3-day field test. This energy flux is at a rate similar to that of photovoltaic solar cell energy conversion but with the great advantage of running both day and night. More impressively, the key roadblock for large area deployment of radiative cooling was removed. Because the material is amorphous and flexible, the authors developed a glass-polymer hybrid manufacturing technique to produce the microstructured metamaterial, which can be made as films several meters in length in a continuous roll-to-roll manner. Using such a scalable metamaterial, they demonstrated passive water cooling by nearly 10 Celsius degrees below ambient temperature without use of electricity.

There are still challenges yet to be addressed for the implementation of radiative cooling metamaterials into applications. Given that the cooling occurs on both sides of metamaterials, detailed thermal design will be important to maximize the cooling rate for the substrate side, and effective heat exchange strategies therefore must be developed. In addition, the IR radiation transport inside metamaterials caused by volumetric multiple scattering among the random Mie resonating glass spheres should be carefully studied so as to further maximize the total emissive power. Other issues should also be carefully investigated, such as how weather conditions negate cooling performances and how the polymer-based metamaterial maintains its performance during long-term outdoor exposure.

Although extraction of the 110 W m$^{-2}$ heat flux is a relatively low cooling rate, these designed metamaterials should find promising application for cooling large systems such as buildings in warm climates (9). Presently, air conditioning uses ~6% of all of the electricity produced in the United States, and as a result, more than 100 million metric tons of carbon dioxide are released into the atmosphere each year. The impact of such a passive radiative cooling without use of electricity for building applications alone can be immense. The broad use of radiative cooling technology not only leads to energy savings but also reduces fluorinated greenhouse gases from refrigerants used in conventional air conditioners, thus improving air quality. At higher temperatures T, passive radiative cooling can be drastically enhanced because the outgoing radiative flux is proportional to $T^4$ according to the Stefan-Boltzmann law. This scalably manufactured metamaterial may enable transformative cooling farms for power plants and data centers, which consume unsustainable amounts of water and electricity.

Although radiative cooling is promising, the better use of this waste energy can be more desirable. For example, the waste heat could be converted into electricity by using thermoelectric devices. Nevertheless, the passive radiative cooling demonstrated here unleashes the immense potential of using the cold universe as a new avenue of keeping us cool on Earth.

REFERENCES

1. Y. Zhai et al., Science 355, 1062 (2017).
2. F. D. Stacey, P. M. Davis, Physics of the Earth (Wiley, 1977).
3. R. Hillenbrand, T. Taubner, F. Keilmann, Nature 418, 159 (2002).
4. X. Lu et al., Renew. Sustain. Energy Rev. 65, 1079 (2016).
5. E. Rephaeli, A. Raman, S. Fan, Nano Lett. 13, 1457 (2013).
6. A. Raman et al., Nature 515, 540 (2014).
7. Y. Liu, X. Zhang, Chem. Soc. Rev. 40, 2494 (2011).
8. J. A. Schuller, R. Zia, T. Taubner, M. L. Brongersma, Phys. Rev. Lett. 99, 107401 (2007).
9. N. Fernandez, W. Wang, K. Alvine, S. Katipamula, Pacific Northwest National Laboratory Report no. PNNL-24904, Richland, WA (2015).

10.1126/science.aam8566

SYNTHETIC BIOLOGY

Yeast genome, by design

Scientists are inching closer to generating a synthetic eukaryotic cell

¹Synthetic Genomics, Inc., 11149 North Torrey Pines Road, La Jolla, CA 92037, USA. ²J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA 92037, USA. Email: [email protected]


Page 41: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection

Synthetic yeast genome’s chromosome structure: special issue of Science, March 10, 2017.


10 MARCH 2017 • VOL 355 ISSUE 6329 1025 SCIENCE sciencemag.org

PHOTO: STEVE GSCHMEISSNER/SCIENCE SOURCE

tion of synthetic chromosomes, these muta-

tions could still be considered “not drastic,”

that is, without radical changes to genome

size or its structural and functional organi-

zation. This conservative design could be,

in part, key to the success of functionalizing

each synthetic yeast chromosome created to

date.

The design rules were implemented in a

stepwise, hierarchical assembly of the syn-

thetic chromosomes, as previously described

(9, 10), starting with chunks built from oli-

gonucleotides (750 base pairs), which were

assembled into 2- to 3-kb minichunks and,

subsequently, megachunks of 10-kb or 30- to

60-kb DNA molecules in vitro. Each mega-

chunk (with the exception of terminal mega-

chunks) carried an auxotrophic selectable

marker at the 3ʹ end that was used to directly

swap out the wild-type chromosomal DNA.

This marker was recycled

during the swapping of the

next synthetic megachunk

with the native chromosome

(switching auxotrophies pro-

gressively by integration, or

SwAP-In). Depending on the

length of the chromosome,

as many as 33 serial SwAP-In

experiments were conducted

to generate cells carrying a

completely synthetic chro-

mosome alongside 15 other

native chromosomes. Using

this procedure, project Sc2.0

has generated six complete

yeast chromosomes (synII,

synIII, synV, synVI, synX,

synXII) and one half chro-

mosome (synIXR) (3, 10).

In what could be an impor-

tant stride toward generat-

ing a completely synthetic

yeast, Sc2.0 has initiated the

process of combining all the synthetic chro-

mosomes into a single strain by using an

endoduplication backcross process (3). Cells

carrying two and three completely synthetic

chromosomes have been generated with no

differential phenotype or genome architec-

ture compared to the wild-type cells (3, 4).

The design of the synthetic chromosomes was not perfect, but close. Few deliberate recoding events with synonymous codons (PCRTags) were included in the synthetic DNA to track the replacement of the native chromosome with synthetic parts. Although most of these “watermarks” were benign, some altered expression of genes, which modified messenger RNA (mRNA) secondary structure and resulted in a conspicuous phenotype (3). Other watermarks altered gene expression by either creating a putative site for transcription factor binding (5) or by directly affecting mRNA translation efficiency potentially due to discrepant decoding efficiency (6).

In some cases, the introduction of loxPsym sites reduced the expression of essential genes, thus creating a detrimental phenotype. Sequencing, meiotic recombination, pooled PCRTag mapping, and electrophoretic karyotyping were used to identify “bugs” (2–8). Simultaneous elimination of multiple bugs was carried out by using clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 or by using at least a single “selectable” insertion event (8).

The stepwise replacement of a native chromosome with the SwAP-In method allows for detecting adverse phenotypes of the design during several stages of the assembly of the synthetic chromosome. However, it also creates many opportunities for massive duplication and rearrangement events, most likely detected only after the construction of the entire synthetic chromosome (5, 8). Given the recent advancements in complete de novo chromosome synthesis (11, 12) and transplantation technologies (1), in the near future, one can envision the concomitant removal and replacement of native chromosomes with entire chromosomes that are designed and chemically synthesized from the bottom up. Indeed, some of these chromosome synthesis and assembly technologies were used by the Sc2.0 scientists to generate “minichunks” and “megachunks.”

Recombination between the native and the synthetic chromosomes could be avoided by introducing sufficient differences (recoding the genes, for example) in the sequence. Knowledge from techniques like in vivo selective 2′-hydroxyl acylation and profiling (13) could avoid disrupting critical mRNA structures during recoding. Complete synthesis and transformation of chromosomes could potentially be accomplished in a fraction of the time compared to the SwAP-In technology.

The scope of the design process could be expanded in future studies to elucidate genomic principles that underpin eukaryotic life. Recently, a bacterial cell was designed and synthesized with a minimal set of genes (473) to answer a basic question in biology about the smallest genomic content that could support life (14). The studies reported in this issue open the door to address a similar question pertaining to yeast. Chromosomal design could also be extended to functionally modularize the genome (organizing genes on the chromosome based on their function). This could illuminate complex and conserved regulatory mechanisms that might eventually apply to higher eukaryotes, including humans. Notably, efforts toward understanding genome organization principles and customizing genome design by minimization and modularization are underway in the yeast Kluyveromyces marxianus (15), the fastest-growing eukaryotic organism, thus greatly accelerating the genome design-build-test cycle.

Undoubtedly, progress by the Sc2.0 project will advance our understanding of basic biological processes and how the genome functions. Consequently, computational models of yeasts with highly predictable outcomes could be designed and generated with a large degree of success. Such designer organisms could be exploited as models to comprehend human diseases (8), identify disease targets, and generate therapeutics.

REFERENCES

1. D. G. Gibson et al., Science 329, 52 (2010).
2. S. M. Richardson et al., Science 355, 1040 (2017).
3. L. A. Mitchell et al., Science 355, eaaf4831 (2017).
4. G. Mercy et al., Science 355, eaaf4597 (2017).
5. Y. Wu et al., Science 355, eaaf4706 (2017).
6. W. Zhang et al., Science 355, eaaf3981 (2017).
7. Y. Shen et al., Science 355, eaaf4791 (2017).
8. Z.-X. Xie et al., Science 355, eaaf4704 (2017).
9. N. Annaluru et al., Science 344, 55 (2014).
10. J. S. Dymond et al., Nature 477, 471 (2011).
11. D. G. Gibson et al., Proc. Natl. Acad. Sci. U.S.A. 105, 20404 (2008).
12. D. G. Gibson et al., Nat. Methods 6, 343 (2009).
13. R. C. Spitale et al., Nat. Chem. Biol. 9, 18 (2013).
14. C. A. Hutchison III et al., Science 351, aad6253 (2016).
15. M. Eisenstein, Nat. Methods 14, 117 (2017).

10.1126/science.aam9739

Six synthetic chromosomes for the budding yeast drive biological processes just like their natural counterparts. S. cerevisiae has 16 chromosomes.


Published by AAAS


Page 42: Higher-order organization of interactions in human chromosomes:   1D sequence → 3D structure →   topologically associated domain (TAD) via network community detection

Synthetic yeast genome’s chromosome structure: special issue of Science, March 10, 2017.

INSIGHTS | PERSPECTIVES

1024 10 MARCH 2017 • VOL 355 ISSUE 6329 sciencemag.org SCIENCE

By Krishna Kannan¹ and Daniel G. Gibson¹,²

A core theme in synthetic biology, “understanding by creating,” inspired the effort to generate the first synthetic cell, JCVI-Syn1.0 (1). The project Sc2.0 is elevating this concept by attempting to create a synthetic version of a more evolved organism, Saccharomyces cerevisiae, a eukaryotic single-celled yeast. In a set of papers in this issue (2–8), scientists of the Sc2.0 project who previously constructed a single yeast chromosome (9) now report constructing five additional yeast chromosomes (more than one-third of the entire genome) (see the photo). Using a variety of phenotypic assays and structural and functional genomics techniques, the researchers observed that the synthetic chromosomes drive biological processes just like the natural, native chromosomes.

The quintessential first step toward creating a synthetic organism is the careful design of the genomic material, which ultimately controls every physiological process in the cell. Project Sc2.0 built a software framework, BioStudio, to generate chromosomal designs (2). A set of rules was applied while designing each chromosome, including removal of repetitive regions and introns (except for the HAC1 intron), recoding of TAG stop codons to TAA (allowing TAG to be repurposed), and the relocation of transfer RNA genes into a neochromosome. In addition, sites (loxPsym) were introduced throughout the chromosome at the 3ʹ ends of nonessential genes for chemically inducible genome rearrangements (through Cre-recombinase). This allowed the selection of desired phenotypes and the examination of corresponding genotypes (synthetic chromosome rearrangement and modification by loxP-mediated evolution, or SCRaMbLE). Despite the many variations (thousands) introduced during the construc-

for large-scale energy applications. The nonperiodic layered nanophotonic structure showed good performance (6), but the nanometer-precision control of the thin films that it requires is still a challenge for scaling up to the size of meters, which is needed for even a small (kilowatt-scale) cooling system.

The unprecedented properties of a metamaterial, such as negative refraction and superlensing, originate from its internal structures instead of its chemical constituents (7). Because its structural unit cell is often smaller than the wavelength of interest, practical implementations of optical metamaterials have always been challenging. Zhai et al. devised a glass-polymer metamaterial in which a set of glass microspheres were randomly and uniformly dispersed in a visibly transparent polymer matrix. Because of the surface phonon-polariton Mie resonance excited at room temperature on the glass surface, this amorphous metamaterial has a maximal broadband emissivity, near the blackbody limit across the entire atmospheric window, that results in cooling of the material itself (8). Both the polymer and glass are transparent to the full solar spectrum, so the hybrid metamaterial minimally absorbs and reflects most solar energy when backed with a thin silver mirror (see the figure).

Zhai et al. demonstrated an average radiative cooling flux greater than 110 W m⁻² in a continuous 3-day field test. This energy flux is at a rate similar to that of photovoltaic solar cell energy conversion but with the great advantage of running both day and night. More impressively, the key roadblock for large-area deployment of radiative cooling was removed. Because the material is amorphous and flexible, the authors developed a glass-polymer hybrid manufacturing technique to produce the microstructured metamaterial, which can be made as films several meters in length in a continuous roll-to-roll manner. Using such a scalable metamaterial, they demonstrated passive water cooling by nearly 10 Celsius degrees below ambient temperature without use of electricity.

There are still challenges yet to be addressed for the implementation of radiative cooling metamaterials into applications. Given that the cooling occurs on both sides of metamaterials, detailed thermal design will be important to maximize the cooling rate for the substrate side, and effective heat exchange strategies therefore must be developed. In addition, the IR radiation transport inside metamaterials caused by volumetric multiple scattering among the random Mie resonating glass spheres should be carefully studied so as to further maximize the total emissive power. Other issues should also be carefully investigated, such as how weather conditions negate cooling performances and how the polymer-based metamaterial maintains its performance during long-term outdoor exposure.

Although extraction of the 110 W m⁻² heat flux is a relatively low cooling rate, these designed metamaterials should find promising application for cooling large systems such as buildings in warm climates (9). Presently, air conditioning uses ~6% of all of the electricity produced in the United States, and as a result, more than 100 million metric tons of carbon dioxide are released into the atmosphere each year. The impact of such a passive radiative cooling without use of electricity for building applications alone can be immense. The broad use of radiative cooling technology not only leads to energy savings but also reduces fluorinated greenhouse gases from refrigerants used in conventional air conditioners, thus improving air quality. At higher temperatures T, passive radiative cooling can be drastically enhanced because the outgoing radiative flux is proportional to T⁴ according to the Stefan-Boltzmann law. This scalably manufactured metamaterial may enable transformative cooling farms for power plants and data centers, which consume unsustainable amounts of water and electricity.
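The T⁴ scaling mentioned above can be made concrete with a short numerical sketch. The code below is not from the article: it evaluates the Stefan-Boltzmann law for an ideal surface, and the function names, the default emissivity of 1, and the example temperatures are assumptions chosen only for illustration. It gives the gross emitted flux, which is not the same quantity as the net cooling flux of 110 W m⁻² reported in the field test, because a real surface also absorbs radiation from the atmosphere and the Sun.

```python
# Minimal sketch of the Stefan-Boltzmann law (illustrative assumptions only).
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant in W m^-2 K^-4


def blackbody_flux(temperature_k: float, emissivity: float = 1.0) -> float:
    """Gross radiative flux (W m^-2) emitted by a surface at temperature_k.

    emissivity = 1.0 is the ideal blackbody limit (an assumption here).
    """
    if temperature_k < 0:
        raise ValueError("temperature must be non-negative in kelvin")
    return emissivity * SIGMA * temperature_k ** 4


def flux_ratio(t_hot_k: float, t_ref_k: float) -> float:
    """Factor by which the emitted flux grows between two temperatures."""
    return (t_hot_k / t_ref_k) ** 4


if __name__ == "__main__":
    # Hypothetical example temperatures: roughly ambient versus a hotter surface.
    for temp in (300.0, 350.0):
        print(f"T = {temp:.0f} K -> {blackbody_flux(temp):.1f} W m^-2")
    print(f"flux ratio 350 K / 300 K = {flux_ratio(350.0, 300.0):.2f}")
```

Because the flux depends on the fourth power of T, doubling the absolute temperature multiplies the emitted flux by 16, which is why hotter systems are attractive targets for this kind of cooling.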

Although radiative cooling is promising, making better use of this waste energy may be more desirable. For example, the waste heat could be converted into electricity by using thermoelectric devices. Nevertheless, the passive radiative cooling demonstrated here unleashes the immense potential of using the cold universe as a new avenue of keeping us cool on Earth.

REFERENCES

1. Y. Zhai et al., Science 355, 1062 (2017).
2. F. D. Stacey, P. M. Davis, Physics of the Earth (Wiley, 1977).
3. R. Hillenbrand, T. Taubner, F. Keilmann, Nature 418, 159 (2002).
4. X. Lu et al., Renew. Sustain. Energy Rev. 65, 1079 (2016).
5. E. Rephaeli, A. Raman, S. Fan, Nano Lett. 13, 1457 (2013).
6. A. Raman et al., Nature 515, 540 (2014).
7. Y. Liu, X. Zhang, Chem. Soc. Rev. 40, 2494 (2011).
8. J. A. Schuller, R. Zia, T. Taubner, M. L. Brongersma, Phys. Rev. Lett. 99, 107401 (2007).
9. N. Fernandez, W. Wang, K. Alvine, S. Katipamula, Pacific Northwest National Laboratory Report no. PNNL-24904, Richland, WA (2015).

10.1126/science.aam8566

SYNTHETIC BIOLOGY

Yeast genome, by design

Scientists are inching closer to generating a synthetic eukaryotic cell

¹Synthetic Genomics, Inc., 11149 North Torrey Pines Road, La Jolla, CA 92037, USA. ²J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA 92037, USA. Email: [email protected]



PHOTO: STEVE GSCHMEISSNER/SCIENCE SOURCE

tion of synthetic chromosomes, these mutations could still be considered “not drastic,” that is, without radical changes to genome size or its structural and functional organization. This conservative design could be, in part, key to the success of functionalizing each synthetic yeast chromosome created to date.
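One of the design rules named in this Perspective, recoding TAG stop codons to TAA, can be illustrated with a toy sketch. This is not BioStudio code, whose interface is not described here. The function name, the toy sequence, and the assumption that the input is a single uppercase, in-frame coding sequence ending in its stop codon are all hypothetical and chosen only to show the idea.

```python
# Toy sketch of TAG -> TAA stop-codon recoding (illustrative, not BioStudio).
STOP_FROM = "TAG"
STOP_TO = "TAA"


def recode_stop_codon(cds: str) -> str:
    """Return cds with a terminal TAG stop codon replaced by TAA.

    Assumes cds is an uppercase, in-frame coding sequence whose last codon
    is the stop codon. TAG triplets that occur out of frame are untouched.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Split into codons so only the in-frame terminal triplet is examined.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons and codons[-1] == STOP_FROM:
        codons[-1] = STOP_TO
    return "".join(codons)


if __name__ == "__main__":
    # Hypothetical gene: ATG CTA GGC TAG (note the out-of-frame "TAG" in CTAGGC).
    toy_gene = "ATGCTAGGCTAG"
    print(recode_stop_codon(toy_gene))
```

Working codon by codon, rather than with a plain string replacement, matters because a naive replacement would also alter out-of-frame TAG triplets and change the encoded protein.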

The design rules were implemented in a stepwise, hierarchical assembly of the synthetic chromosomes, as previously described (9, 10), starting with chunks built from oligonucleotides (750 base pairs), which were assembled into 2- to 3-kb minichunks and, subsequently, megachunks of 10-kb or 30- to 60-kb DNA molecules in vitro. Each megachunk (with the exception of terminal megachunks) carried an auxotrophic selectable marker at the 3ʹ end that was used to directly swap out the wild-type chromosomal DNA. This marker was recycled during the swapping of the next synthetic megachunk with the native chromosome (switching auxotrophies progressively by integration, or SwAP-In). Depending on the length of the chromosome, as many as 33 serial SwAP-In experiments were conducted to generate cells carrying a completely synthetic chromosome alongside 15 other native chromosomes. Using this procedure, project Sc2.0 has generated six complete yeast chromosomes (synII, synIII, synV, synVI, synX, synXII) and one half chromosome (synIXR) (3, 10).
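The marker recycling in SwAP-In can be pictured as a simple schedule in which each integration round brings in one selectable marker and overwrites the marker from the previous round. The sketch below is only a bookkeeping illustration of that description. The marker names, the use of exactly two alternating markers, and the treatment of only the final megachunk as marker-free are assumptions, not details given in the text.

```python
# Bookkeeping sketch of serial SwAP-In rounds (assumptions noted in comments).
from itertools import cycle


def swap_in_schedule(n_megachunks, markers=("MARKER_A", "MARKER_B")):
    """Return a list of (round, marker_introduced, marker_replaced) tuples.

    Assumptions: markers alternate in a fixed cycle (hypothetical names), and
    only the last (terminal) megachunk carries no marker.
    """
    schedule = []
    previous = None  # marker left in the chromosome by the preceding round
    marker_cycle = cycle(markers)
    for round_no in range(1, n_megachunks + 1):
        is_terminal = round_no == n_megachunks
        introduced = None if is_terminal else next(marker_cycle)
        schedule.append((round_no, introduced, previous))
        previous = introduced
    return schedule


if __name__ == "__main__":
    for row in swap_in_schedule(4):
        print(row)
```

With 33 megachunks, the upper bound quoted above, the same function yields 33 rounds, and the final strain carries no integration marker under these assumptions.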

In what could be an important stride toward generating a completely synthetic yeast, Sc2.0 has initiated the process of combining all the synthetic chromosomes into a single strain by using an endoduplication backcross process (3). Cells carrying two and three completely synthetic chromosomes have been generated with no differential phenotype or genome architecture compared to the wild-type cells (3, 4).

The design of the synthetic chromosomes was not perfect, but close. Few deliberate recoding events with synonymous codons (PCRTags) were included in the synthetic DNA to track the replacement of the native chromosome with synthetic parts. Although most of these “watermarks” were benign, some altered expression of genes, which modified messenger RNA (mRNA) secondary structure and resulted in a conspicuous phenotype (3). Other watermarks altered gene expression by either creating a putative site for transcription factor binding (5) or by directly affecting mRNA translation efficiency potentially due to discrepant decoding efficiency (6).

In some cases, the introduction of loxPsym sites reduced the expression of essential genes, thus creating a detrimental phenotype. Sequencing, meiotic recombination, pooled PCRTag mapping, and electrophoretic karyotyping were used to identify “bugs” (2–8). Simultaneous elimination of multiple bugs was carried out by using clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 or by using at least a single “selectable” insertion event (8).

The stepwise replacement of a native chromosome with the SwAP-In method allows for detecting adverse phenotypes of the design during several stages of the assembly of the synthetic chromosome. However, it also creates many opportunities for massive duplication and rearrangement events, most likely detected only after the construction of the entire synthetic chromosome (5, 8). Given the recent advancements in complete de novo chromosome synthesis (11, 12) and transplantation technologies (1), in the near future, one can envision the concomitant removal and replacement of native chromosomes with entire chromosomes that are designed and chemically synthesized from the bottom up. Indeed, some of these chromosome synthesis and assembly technologies were used by the Sc2.0 scientists to generate “minichunks” and “megachunks.”

Recombination between the native and the synthetic chromosomes could be avoided by introducing sufficient differences (recoding the genes, for example) in the sequence. Knowledge from techniques like in vivo selective 2′-hydroxyl acylation and profiling (13) could avoid disrupting critical mRNA structures during recoding. Complete synthesis and transformation of chromosomes could potentially be accomplished in a fraction of the time compared to the SwAP-In technology.

The scope of the design process could be expanded in future studies to elucidate genomic principles that underpin eukaryotic life. Recently, a bacterial cell was designed and synthesized with a minimal set of genes (473) to answer a basic question in biology about the smallest genomic content that could support life (14). The studies reported in this issue open the door to address a similar question pertaining to yeast. Chromosomal design could also be extended to functionally modularize the genome (organizing genes on the chromosome based on their function). This could illuminate complex and conserved regulatory mechanisms that might eventually apply to higher eukaryotes, including humans. Notably, efforts toward understanding genome organization principles and customizing genome design by minimization and modularization are underway in the yeast Kluyveromyces marxianus (15), the fastest-growing eukaryotic organism, thus greatly accelerating the genome design-build-test cycle.

Undoubtedly, progress by the Sc2.0 project will advance our understanding of basic biological processes and how the genome functions. Consequently, computational models of yeasts with highly predictable outcomes could be designed and generated with a large degree of success. Such designer organisms could be exploited as models to comprehend human diseases (8), identify disease targets, and generate therapeutics.

REFERENCES

1. D. G. Gibson et al., Science 329, 52 (2010).
2. S. M. Richardson et al., Science 355, 1040 (2017).
3. L. A. Mitchell et al., Science 355, eaaf4831 (2017).
4. G. Mercy et al., Science 355, eaaf4597 (2017).
5. Y. Wu et al., Science 355, eaaf4706 (2017).
6. W. Zhang et al., Science 355, eaaf3981 (2017).
7. Y. Shen et al., Science 355, eaaf4791 (2017).
8. Z.-X. Xie et al., Science 355, eaaf4704 (2017).
9. N. Annaluru et al., Science 344, 55 (2014).
10. J. S. Dymond et al., Nature 477, 471 (2011).
11. D. G. Gibson et al., Proc. Natl. Acad. Sci. U.S.A. 105, 20404 (2008).
12. D. G. Gibson et al., Nat. Methods 6, 343 (2009).
13. R. C. Spitale et al., Nat. Chem. Biol. 9, 18 (2013).
14. C. A. Hutchison III et al., Science 351, aad6253 (2016).
15. M. Eisenstein, Nat. Methods 14, 117 (2017).

10.1126/science.aam9739

Six synthetic chromosomes for the budding yeast drive biological processes just like their natural counterparts. S. cerevisiae has 16 chromosomes.


RESEARCH ARTICLE SUMMARY

SYNTHETIC BIOLOGY

3D organization of synthetic and scrambled chromosomes

Guillaume Mercy,* Julien Mozziconacci,* Vittore F. Scolari, Kun Yang, Guanghou Zhao, Agnès Thierry, Yisha Luo, Leslie A. Mitchell, Michael Shen, Yue Shen, Roy Walker, Weimin Zhang, Yi Wu, Ze-xiong Xie, Zhouqing Luo, Yizhi Cai, Junbiao Dai, Huanming Yang, Ying-Jin Yuan, Jef D. Boeke, Joel S. Bader, Héloïse Muller,† Romain Koszul†

INTRODUCTION: The overall organization of budding yeast chromosomes is driven and regulated by four factors: (i) the tethering and clustering of centromeres at the spindle pole body; (ii) the loose tethering of telomeres at the nuclear envelope, where they form small, dynamic clusters; (iii) a single nucleolus in which the ribosomal DNA (rDNA) cluster is sequestered from other chromosomes; and (iv) chromosomal arm lengths. Hi-C, a genomic derivative of the chromosome conformation capture approach, quantifies the proximity of all DNA segments present in the nuclei of a cell population, unveiling the average multiscale organization of chromosomes in the nuclear space. We exploited Hi-C to investigate the trajectories of synthetic chromosomes within the Saccharomyces cerevisiae nucleus and compare them with their native counterparts.

RATIONALE: The Sc2.0 genome design specifies strong conservation of gene content and arrangement with respect to the native chromosomal sequence. However, synthetic chromosomes incorporate thousands of designer changes, notably the removal of transfer RNA genes and repeated sequences such as transposons and subtelomeric repeats to enhance stability. They also carry loxPsym sites, allowing for inducible genome SCRaMbLE (synthetic chromosome rearrangement and modification by loxP-mediated evolution) aimed at accelerating genomic plasticity. Whether these changes affect chromosome organization, DNA metabolism, and fitness is a critical question for completion of the Sc2.0 project. To address these questions, we used Hi-C to characterize the organization of synthetic chromosomes.

RESULTS: Comparison of synthetic chromosomes with native counterparts revealed no substantial changes, showing that the redesigned sequences, and especially the removal of repeated sequences, had little or no effect on average chromosome trajectories. Sc2.0 synthetic chromosomes have Hi-C contact maps with much smoother contact patterns than those of native chromosomes, especially in subtelomeric regions. This improved “mappability” results directly from the removal of repeated elements all along the length of the synthetic chromosomes. These observations highlight a conceptual advance enabled by bottom-up chromosome synthesis, which allows refinement of experimental systems to make complex questions easier to address. Despite the overall similarity, differences were observed in two instances. First, deletion of the HML and HMR silent mating-type cassettes on chromosome III led to a loss of their specific interaction. Second, repositioning the large array of rDNA repeats nearer to the centromere cluster forced substantial genome-wide conformational changes; for instance, inserting the array in the middle of the small right arm of chromosome III split the arm into two noninteracting regions. The nucleolus structure was then trapped in the middle between small and large chromosome arms, imposing a physical barrier between them.

In addition to describing the Sc2.0 chromosome organization, we also used Hi-C to identify chromosomal rearrangements resulting from SCRaMbLE experiments. Inducible recombination between the hundreds of loxPsym sites introduced into Sc2.0 chromosomes enables combinatorial rearrangements of the genome structure. Hi-C contact maps of two SCRaMbLE strains carrying synIII and synIXR chromosomes revealed a variety of cis events, including simple deletions, inversions, and duplications, as well as translocations, the latter event representing a class of trans SCRaMbLE rearrangements not previously observed.

CONCLUSION: This large data set is a resource that will be exploited in future studies exploring the power of the SCRaMbLE system. By investigating the trajectories of Sc2.0 chromosomes in the nuclear space, this work paves the way for future studies addressing the influence of genome-wide engineering approaches on essential features of living systems.

RESEARCH | SYNTHETIC YEAST GENOME

Mercy et al., Science 355, 1050 (2017) 10 March 2017 1 of 1

The list of author affiliations is available in the full article online.
*These authors contributed equally to this work.
†Corresponding author. Email: [email protected] (H.M.); [email protected] (R.K.)
Cite this article as G. Mercy et al., Science 355, eaaf4597 (2017). DOI: 10.1126/science.aaf4597

Synthetic chromosome organization. (A) Hi-C contact maps of synII and native (wild-type, WT) chromosome II. Red arrowheads point to filtered bins (white vectors) that are only present in the native chromosome map. kb, kilobases. (B) Three-dimensional (3D) representations of Hi-C maps of strains carrying rDNA either on synXII or native chromosome III. (C) Contact maps and 3D representations of synIXR (yellow) and synIII (pink) before (left) and after (right) SCRaMbLE. Translocation breakpoints are indicated by green and blue arrowheads.
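The Hi-C contact maps referred to in this summary are, in essence, matrices that count how often pairs of genomic bins are found in proximity. The sketch below shows only that binning step for a single chromosome. The input format (pairs of integer coordinates), the bin size, and the function name are hypothetical, and the read mapping, filtering, and normalization used in real Hi-C analyses are not represented.

```python
# Minimal sketch: build a symmetric binned contact matrix from contact pairs.
def bin_contacts(read_pairs, bin_size, n_bins):
    """Count contacts between fixed-size genomic bins.

    read_pairs: iterable of (pos_a, pos_b) base-pair coordinates (assumed
    to lie on one chromosome). Pairs falling outside the matrix are skipped.
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        if not (0 <= i < n_bins and 0 <= j < n_bins):
            continue  # ignore contacts beyond the binned region
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the map symmetric
    return matrix


if __name__ == "__main__":
    # Hypothetical contacts on a 3-kb toy region, binned at 1 kb.
    toy_pairs = [(100, 900), (1500, 1200), (200, 2500), (2600, 300)]
    for row in bin_contacts(toy_pairs, bin_size=1000, n_bins=3):
        print(row)
```

A matrix of this kind can also be read as a weighted network whose nodes are bins and whose edge weights are contact counts, which is the representation that community detection methods operate on.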

ON OUR WEBSITE: Read the full article at http://dx.doi.org/10.1126/science.aaf4597

