Supplementary Figure 1: Number and pattern of somatic mutations in 30 LCBs. (a) Number of mutations. RK308, which had a deletion in the MLH1 gene and a missense mutation in the MSH2 gene, had an exceptionally large number of somatic mutations. (b) Whole-genome substitution pattern of 30 LCBs.



Supplementary Figure 2: Pattern of rearrangement in LCBs. (a) RK067. (b) RK069. (c) RK073. (d) RK084. (e) RK108. (f) RK109. (g) RK112. (h) RK137. (i) RK138. (j) RK142. (k) RK146. (l) RK166. (m) RK182. (n) RK184. (o) RK194. (p) RK204. (q) RK208. (r) RK226. (s) RK269. (t) RK272. (u) RK279. (v) RK298. (w) RK303. (x) RK307. (y) RK308. (z) RK309. (aa) RK310. (ab) RK312. (ac) RK316. (ad) RK317.

Supplementary Figure 3: Comparison of substitution pattern between HCCs and LCBs. The proportion of C:G to T:A was higher in the LCBs, and that of T:A to C:G was lower. The hinges of the boxes indicate the first and third quartiles. The whiskers extend to the most extreme data point that is no more than 1.5 times the interquartile range from the box. (a) C:G to T:A. (b) C:G to G:C. (c) C:G to A:T. (d) T:A to C:G. (e) T:A to G:C. (f) T:A to A:T.
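
The box-plot convention described in the caption (hinges at the first and third quartiles, whiskers reaching the most extreme point within 1.5 times the interquartile range) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `box_stats`, the choice of Tukey-style hinges (median of each half, with the halves sharing the median when the sample size is odd) and the example values are all assumptions.

```python
from statistics import median


def box_stats(values, coef=1.5):
    """Return hinges, whisker ends and outliers for a box plot.

    Hinges are computed Tukey-style (an assumption): the median of the lower
    half and of the upper half of the sorted data.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # both halves include the median when n is odd
    lower_hinge = median(xs[:half])
    upper_hinge = median(xs[n - half:])
    iqr = upper_hinge - lower_hinge
    lo_fence = lower_hinge - coef * iqr
    hi_fence = upper_hinge + coef * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    outliers = [x for x in xs if x < lo_fence or x > hi_fence]
    return {
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical values, not taken from the figure.
stats = box_stats([1, 2, 3, 4, 5, 20])
print(stats)
```

With these made-up values the point 20 lies beyond the upper fence, so the upper whisker stops at 5 and 20 is reported as an outlier.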

[Figure: six box plots, panels (a) to (f), one per substitution type; x-axis: LCB and HCC; y-axis: Frequency. P-values printed in the figure: 4.1×10^-7 and 0.00016.]

Supplementary Figure 4: Comparison of substitution pattern between hepatitis-positive and -negative LCBs. In hepatitis-positive LCBs, the proportion of C:G to T:A was higher, and those of T:A to C:G and T:A to A:T were lower. The hinges of the boxes indicate the first and third quartiles. The whiskers extend to the most extreme data point that is no more than 1.5 times the interquartile range from the box. (a) C:G to T:A. (b) C:G to G:C. (c) C:G to A:T. (d) T:A to C:G. (e) T:A to G:C. (f) T:A to A:T.

[Figure: six box plots, panels (a) to (f), one per substitution type; x-axis: hepatitis-negative LCB and hepatitis-positive LCB; y-axis: Frequency. P-values printed in the figure: 0.0010, 0.0053 and 0.0082.]

Supplementary Figure 5: Principal component analysis of substitution pattern for LCBs, HCCs and other types of cancers. (a) Bladder. (b) Breast. (c) CLL. (d) Cervix. (e) Colorectum. (f) Esophageal. (g) Head and Neck. (h) Kidney Clear Cell. (i) Lung Adenocarcinoma. (j) Lung Small Cell. (k) Lung Squamous. (l) B-cell Lymphoma. (m) Medulloblastoma. (n) Melanoma. (o) Ovary. (p) Pancreas. (q) Pilocytic Astrocytoma. (r) Prostate. (s) Stomach. (t) Uterus.

[Figure: 20 scatter plots, panels (a) to (t); x-axis: PC1; y-axis: PC2; both axes run from about -4 to 4. Legend in every panel: hepatitis-negative LCB, hepatitis-positive LCB, HCC and the cancer type named in the panel title.]

Supplementary Figure 6: Mutational signatures in the LCBs identified by EMu software.

Supplementary Figure 7: Proportion of mutations per process in LCBs.

[Figure: stacked horizontal bar chart; x-axis: proportion of mutations per process (0% to 100%); one bar per sample: RK067_C01, RK069_C01, RK073_C01, RK084_C01, RK108_C01, RK109_C01, RK112_C01, RK137_C01, RK138_C01, RK142_C01, RK146_C01, RK166_C01, RK182_C01, RK184_C01, RK194_C01, RK204_C02, RK208_C01, RK226_C01, RK269_C01, RK272_C01, RK279_C01, RK298_C01, RK303_C01, RK307_C01, RK308_C01, RK309_C01, RK310_C01, RK312_C01, RK316_C01, RK317_C01; legend: signatures A, B, C, D and E.]

Supplementary Figure 8: Comparison of number of mutations per process between hepatitis-positive and -negative LCBs. (a) Signature A. (b) Signature B. (c) Signature C. (d) Signature D. (e) Signature E. (f) Signature E. Three outlier samples in the PCA plot (RK138, RK269 and RK307) were excluded from the analysis. The number of mutations per process for signature E differed significantly between hepatitis-positive and -negative LCBs. The hinges of the boxes indicate the first and third quartiles. The whiskers extend to the most extreme data point that is no more than 1.5 times the interquartile range from the box. P-values were obtained with the Wilcoxon test.
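
A two-group comparison of this kind can be sketched with a Wilcoxon rank-sum (Mann-Whitney) test. The version below is a minimal pure-Python illustration using midranks, a tie correction and the normal approximation with continuity correction; it is not necessarily the exact procedure the authors ran (the exact test used for small samples can give slightly different P-values). The function name `rank_sum_test` and the example counts are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for x, approximate two-sided P-value).
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this tie group
        mid = (i + 1 + j) / 2.0        # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = mid
        tie_term += t ** 3 - t
        i = j
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = w - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    # Continuity-corrected z score; clamp at 0 so P never exceeds 1.
    z = max((abs(u - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return u, math.erfc(z / math.sqrt(2.0))


# Hypothetical per-sample mutation counts for one signature.
hepatitis_positive = [1, 2, 3, 4, 5]
hepatitis_negative = [6, 7, 8, 9, 10]
u_stat, p_value = rank_sum_test(hepatitis_positive, hepatitis_negative)
print(u_stat, p_value)
```

The normal approximation is used here only to keep the sketch dependency-free; a statistics package would normally be preferred for real data.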

Supplementary Figure 9: Mutations of chromatin regulators in LCBs. Mutations in genes with a GO term of “chromatin modification”, “chromatin organization”, “chromatin remodeling” or “chromatin remodeling complex” are summarized. More than half of LCBs have mutations in chromatin regulators, and this result suggests that rearrangements play an important role in the mutational landscape in the LCBs.

Supplementary Figure 10: Kaplan–Meier survival curve for samples with mutations in IDH genes (n=8) and wild-type samples. IDH mutations were associated with poor disease-free survival (log-rank test, P-value = 0.038; Cox proportional hazards model, P-value = 0.016).
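
For readers unfamiliar with the Kaplan–Meier estimator, a minimal sketch follows. It assumes each subject has a follow-up time in months and an explicit flag that is 1 when the event (recurrence or death) was observed and 0 when the subject was censored. The function name `kaplan_meier`, this flag coding and the example data are assumptions for illustration; the source does not define how its own censor columns are coded.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times[i]  : follow-up time of subject i
    events[i] : 1 if the event was observed at times[i], 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # events plus censored subjects at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            surv *= 1.0 - observed / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data (months, event flag).
km_curve = kaplan_meier([2, 4, 4, 6, 8], [1, 1, 0, 1, 0])
print(km_curve)
```

Comparing two such curves, as in the figure, would additionally require a log-rank test or a Cox model, which this sketch does not cover.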

[Figure: Kaplan–Meier curves of disease-free survival; x-axis: Months (0 to 80); y-axis: Cumulative survival (0.0 to 1.0); groups: WT and IDH Mutant.]

Supplementary Figure 11: Functional screening for PCLO, XIRP2, ODZ1 and EPHA2 in liver cancer cell lines. (a) Knockdown efficiencies. (b) Cell growth assay. (c) Migration assay. (d) Invasion assay. *P-value < 0.05.

Supplementary Figure 12: Knockdown of PCLO and XIRP2 and migration/invasion assays in liver cancer cell lines. (a) Relative expression status of XIRP2 (left) and PCLO (right) in siRNA knockdown experiments. (b) Migration assay for XIRP2 in SSP-25 and Li-7 cell lines. (c) Invasion assay for PCLO in SNU-449 and RBE cell lines. Invasion and migration assays were performed in triplicate. * Deviations from both control siRNAs were significant (P-value < 0.05).

[Figure: bar charts in panels (a), (b) and (c). XIRP2 charts: SSP-25 and Li-7 cell lines with conditions siEGFP, siLUC, siXIRP2_siRNA#1 and siXIRP2_siRNA#2. PCLO charts: SNU-449 and RBE cell lines with conditions siEGFP, siLUC, siPCLO_siRNA#1 and siPCLO_siRNA#2. y-axes: relative expression status (/siEGFP) and normalized cell count/replicate. Asterisks mark significant differences.]

Supplementary Figure 13: Distribution of the proportion of mutated alleles. The proportion of mutated alleles (PMA) was standardized by the maximum PMA in each sample. Samples with ≥ 20 mutations were used in the analysis. The median of the distribution for cHCC/CC was significantly higher than that for ICC (Wilcoxon test, P-value = 0.0047), suggesting a difference in the population structure of these tumors.
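
The standardization described in the caption can be sketched as below: each sample's PMA values are divided by that sample's maximum PMA, and samples with fewer than 20 mutations are dropped. The function name `standardize_pma`, the dictionary layout and the example values are hypothetical; sorting in descending order is an assumption made to mirror a "ranking of mutations" axis.

```python
def standardize_pma(pma_by_sample, min_mutations=20):
    """Divide each PMA by the sample maximum; skip samples with too few mutations."""
    standardized = {}
    for sample, pmas in pma_by_sample.items():
        if len(pmas) < min_mutations:
            continue  # the caption keeps only samples with >= 20 mutations
        top = max(pmas)
        if top <= 0:
            continue  # cannot standardize a sample without any mutated reads
        # Descending order so that index 0 is the top-ranked mutation.
        standardized[sample] = sorted((p / top for p in pmas), reverse=True)
    return standardized


# Hypothetical samples: S1 has 20 mutations, S2 only 5.
example = {
    "S1": [0.5] * 10 + [0.25] * 10,
    "S2": [0.4, 0.3, 0.2, 0.1, 0.1],
}
result = standardize_pma(example)
print(result)
```

After standardization the largest value in every retained sample is 1, so distributions from tumors with different purity can be compared on a common scale.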

[Figure: x-axis: Ranking of mutations; y-axis: Proportion of mutated allele; groups: cHCC/CC, ICC and HCC.]

Supplementary Figure 14: Comparison of substitution pattern between hepatitis-negative ICC and cHCC/CC. In the cHCC/CCs, the proportion of C:G to T:A was higher, and those of T:A to C:G and T:A to A:T were lower. (a) C:G to T:A. (b) C:G to G:C. (c) C:G to A:T. (d) T:A to C:G. (e) T:A to G:C. (f) T:A to A:T.

[Figure: six box plots, panels (a) to (f), one per substitution type; x-axis: cHCC/CC and hepatitis-negative ICC; y-axis: Frequency. P-values printed in the figure: 0.0068, 0.033 and 0.025.]

Supplementary Figure 15: Principal component analysis (PCA) of the whole-genome substitution patterns of 30 LCBs and 60 HCCs. cHCC/CC (black), CoCC (blue), hepatitis-negative ICC (yellow), hepatitis-positive ICC (red) and HCC (gray) are shown. Hepatitis-negative ICCs diverge from the other samples. This result suggests that the influence of chronic hepatitis is stronger than that of the histological type.

[Figure: scatter plot; x-axis: PC1; y-axis: PC2; both axes run from about -4 to 6; legend: cHCC-CC, CoCC, hepatitis-negative ICC, hepatitis-positive ICC and HCC.]

Supplementary Table 1: Clinical and pathological features of 60 HCCs analyzed by whole genome sequencing. The mutation data for the 60 HCCs were deposited by RIKEN in the ICGC dataset version 8, released in March 2012 (http://icgc.org/).

ID | Age | Gender | Viral infection | TNM (a) | Tumor size (mm) | Edmondson grade | Portal vein invasion (vp) | Hepatic vein invasion (vv) | Liver fibrosis (b)
RK001_C01 | 56 | M | (-) | T2N0M0 | 33 | 2 | 0 | 0 | 0
RK002_C | 62 | M | HCV | T4N0M0 | 87 | 3 | 1 | 1 | 4
RK003_C | 71 | M | HCV | T2N0M0 | 45 | 2.5 | 0 | 0 | 3
RK004_C01 | 76 | M | HCV | T3N0M0 | 21 | 1 | 0 | 0 | 4
RK005_C | 66 | M | HBV | T2N0M0 | 21 | 2 | 0 | 0 | 0
RK006_C1 | 69 | M | HCV | T2N0M0 | 20 | 2 | 0 | 0 | 4
RK006_C1 | 69 | M | HCV | T2N0M0 | 10 | 2 | 0 | 0 | 4
RK007_C01 | 69 | M | HCV | T3N0M0 | 47 | 2.5 | 0 | 1 | 2
RK010_C | 46 | M | HBV | T3N0M0 | 24 | 2 | 0 | 0 | 4
RK012_C01 | 63 | M | HBV | T2N0M0 | 23 | 1 | 0 | 0 | 4
RK015_C | 61 | M | HCV | T1N0M0 | 18 | 2 | 0 | 0 | 2
RK016_C01 | 63 | F | HCV | T2N0M0 | 33 | 1 | 0 | 0 | 4
RK018_C01 | 63 | M | (-) | T2N0M0 | 65 | 2 | 0 | 0 | 1
RK019_C | 74 | M | HCV | T2N0M0 | 25 | 1.5 | 0 | 0 | 0
RK020_C01 | 58 | M | HBV | T3N0M0 | 50 | 2.5 | 0 | 1 | 3
RK021_C01 | 81 | M | HCV | T2N0M0 | 25 | 2.5 | 0 | 0 | 4
RK022_C01 | 73 | F | HCV | T2N0M0 | 40 | 3 | 0 | 0 | 4
RK023_C | 74 | M | HBV | T3N0M0 | 60 | 3 | 3 | 1 | 1
RK024_C | 61 | M | HBV | T3N0M0 | 30 | 2 | 0 | 1 | 2
RK025_C | 78 | F | HCV | T1N0M0 | 15 | 1.5 | 0 | 0 | 2
RK026_C01 | 58 | M | HCV | T3N0M0 | 24 | 2 | 0 | 0 | 3
RK027_C01 | 73 | M | HCV | T3N0M0 | 150 | 2 | 0 | 0 | 2
RK029_C | 61 | M | HCV | T3N0M0 | 35 | 2 | 1 | 0 | 4
RK031_C01 | 53 | M | HBV, HCV | T4N0M0 | 115 | 2.5 | 0 | 0 | 4
RK032_C01 | 64 | M | HCV | T3N0M0 | 20 | 2 | 1 | 0 | 4
RK033_C01 | 77 | M | HBV, HCV | T1N0M0 | 10 | 2.5 | 0 | 0 | 4
RK034_C | 38 | M | HBV | T1N0M0 | 18 | 2.5 | 0 | 0 | 3
RK035_C01 | 63 | M | (-) | T2N0M0 | 60 | 2.5 | 0 | 0 | 1
RK036_C01 | 73 | M | HCV | T2N0M0 | 25 | 2 | 0 | 0 | 4
RK037_C01 | 69 | M | HCV | T3N0M0 | 15 | 2 | 0 | 0 | 4
RK041_C01 | 78 | M | HCV | T3N0M0 | 30 | 3 | 4 | 1 | 3
RK042_C | 57 | M | HBV | T2N0M0 | 30 | 3 | 1 | 0 | 4
RK046_C01 | 64 | M | HCV | T3N0M0 | 27 | 2 | 0 | 0 | 4
RK046_C01 | 64 | M | HCV | T3N0M0 | 27 | 2 | 0 | 0 | 4
RK047_C01 | 77 | F | HCV | T2N0M0 | 30 | 2 | 0 | 0 | 3
RK048_C | 69 | M | HCV | T2N0M0 | 25 | 2 | 0 | 0 | 2
RK049_C01 | 62 | M | HBV | T1N0M0 | 18 | 1 | 0 | 0 | 4
RK050_C | 65 | M | HBV | T3N0M0 | 30 | 2 | 0 | 0 | 3
RK051_C01 | 74 | M | HCV | T2N0M0 | 22 | 2 | 0 | 0 | 4
RK054_C01 | 75 | M | HCV | T2N0M0 | 20 | 2 | 0 | 0 | 3
RK055_C01 | 67 | M | HCV | T2N0M0 | 8 | 1 | 0 | 0 | 4
RK056_C01 | 74 | M | HCV | T3N0M0 | 50 | 2 | 1 | 0 | 1
RK063_C | 81 | F | (-) | T3N0M0 | 70 | 2 | 0 | 0 | 0
RK068_C | 32 | F | HBV | T1N0M0 | 13 | 2 | 0 | 0 | 2
RK075_C01 | 62 | M | HBV | T2N0M0 | 17 | 1 | 0 | 0 | 3
RK079_C01 | 51 | M | HBV | T2N0M0 | 100 | 3 | 1 | 1 | 2
RK083_C01 | 71 | F | HBV | T2N0M0 | 45 | 2 | 1 | 0 | 2
RK086_C01 | 68 | M | HBV | T1N0M0 | 16 | 1 | 0 | 0 | 0
RK089_C01 | 70 | M | (-) | T4N0M0 | 52 | 2 | 1 | 0 | 2
RK092_C01 | 58 | M | HBV | T1N0M0 | 15 | 2 | 0 | 1 | 4
RK098_C01 | 69 | F | HBV | T3N0M0 | 150 | 3 | 2 | 2 | 1
RK099_C01 | 78 | M | HBV, HCV | T2N0M0 | 50 | 2 | 0 | 0 | 1
RK100_C01 | 75 | F | HCV | T4N0M0 | 50 | 2.5 | 1 | 0 | 3
RK106_C01 | 45 | F | HBV | T2N0M0 | 26 | 1 | 0 | 0 | 2
RK107_C01 | 57 | F | HBV | T2N0M0 | 26 | 2 | 0 | 0 | 4
RK126_C01 | 58 | M | HBV | T2N0M0 | 33 | 2 | 0 | 0 | 1
RK130_C01 | 42 | M | HBV | T2N0M0 | 43 | 2 | 1 | 0 | 3
RK133_C01 | 49 | M | HBV | T1N0M0 | 18 | 2 | 1 | 0 | 3
RK141_C01 | 31 | F | HBV | T2N0M0 | 60 | 2 | 0 | 0 | 1
RK209_C01 | 81 | M | HCV | T1N0M0 | 16 | 2 | 1 | 0 | 4

a; TNM staging in UICC.
b; Fibrosis in non-cancerous liver tissue is determined according to the New Inuyama Classification.

Supplementary Table 2: Summary of whole genome sequences

Sample | Sequence bases, normal (bp) | Sequence bases, cancer (bp) | Depth of coverage, normal (x) | Depth of coverage, cancer (x) | EGA Sample Accession, normal | EGA Sample Accession, cancer
RK067 | 86,956,162,900 | 97,006,299,000 | 29.0 | 32.3 | EGAN00001187542 | EGAN00001187543
RK069 | 81,732,939,800 | 92,470,837,000 | 27.2 | 30.8 | EGAN00001187546 | EGAN00001187547
RK073 | 92,426,644,400 | 122,874,644,000 | 30.8 | 41.0 | EGAN00001187552 | EGAN00001187553
RK084 | 81,123,594,600 | 104,662,707,300 | 27.0 | 34.9 | EGAN00001187568 | EGAN00001187569
RK108 | 82,495,228,700 | 94,531,606,500 | 27.5 | 31.5 | EGAN00001187612 | EGAN00001187613
RK109 | 80,157,325,000 | 103,711,673,600 | 26.7 | 34.6 | EGAN00001187614 | EGAN00001187615
RK112 | 91,538,483,200 | 86,149,126,200 | 30.5 | 28.7 | EGAN00001187618 | EGAN00001187619
RK137 | 90,007,498,000 | 111,235,218,100 | 30.0 | 37.1 | EGAN00001187650 | EGAN00001187651
RK138 | 80,372,218,700 | 145,026,165,500 | 26.8 | 48.3 | EGAN00001187652 | EGAN00001187653
RK142 | 107,051,316,800 | 110,001,474,900 | 35.7 | 36.7 | EGAN00001187658 | EGAN00001187659
RK146 | 106,902,470,600 | 155,099,856,500 | 35.6 | 51.7 | EGAN00001187664 | EGAN00001187665
RK166 | 98,499,743,500 | 97,797,826,800 | 32.8 | 32.6 | EGAN00001187690 | EGAN00001187691
RK182 | 78,283,214,800 | 109,544,357,600 | 26.1 | 36.5 | EGAN00001187707 | EGAN00001187708
RK184 | 95,745,441,000 | 117,712,541,700 | 31.9 | 39.2 | EGAN00001187709 | EGAN00001187710
RK194 | 100,951,293,100 | 124,401,206,900 | 33.7 | 41.5 | EGAN00001187712 | EGAN00001187713
RK204 | 92,369,757,500 | 91,143,974,700 | 30.8 | 30.4 | EGAN00001187714 | EGAN00001187715
RK208 | 91,156,929,500 | 105,289,017,300 | 30.4 | 35.1 | EGAN00001187716 | EGAN00001187717
RK226 | 90,013,518,300 | 113,469,169,700 | 30.0 | 37.8 | EGAN00001187719 | EGAN00001187720
RK269 | 91,434,236,200 | 101,743,552,576 | 30.5 | 33.9 | EGAN00001187721 | EGAN00001187722
RK272 | 98,585,740,800 | 130,068,546,500 | 32.9 | 43.4 | EGAN00001187723 | EGAN00001187724
RK279 | 92,823,038,200 | 210,730,083,400 | 30.9 | 70.2 | EGAN00001187725 | EGAN00001187726
RK298 | 94,729,917,200 | 130,521,746,000 | 31.6 | 43.5 | EGAN00001187727 | EGAN00001187728
RK303 | 104,753,964,000 | 126,610,303,000 | 34.9 | 42.2 | EGAN00001187729 | EGAN00001187730
RK307 | 130,923,658,200 | 93,288,090,400 | 43.6 | 31.1 | EGAN00001187731 | EGAN00001187732
RK308 | 133,917,540,200 | 101,864,586,200 | 44.6 | 34.0 | EGAN00001187733 | EGAN00001187734
RK309 | 116,478,725,200 | 100,082,859,200 | 38.8 | 33.4 | EGAN00001187735 | EGAN00001187736
RK310 | 134,696,730,800 | 93,155,950,000 | 44.9 | 31.1 | EGAN00001187737 | EGAN00001187738
RK312 | 103,855,605,000 | 167,293,765,800 | 34.6 | 55.8 | EGAN00001187739 | EGAN00001187740
RK316 | 116,683,764,800 | 149,836,996,600 | 38.9 | 49.9 | EGAN00001187741 | EGAN00001187742
RK317 | 122,068,848,400 | 209,263,579,600 | 40.7 | 69.8 | EGAN00001187744 | EGAN00001187743
Average | 98,957,851,647 | 119,886,258,753 | 33.0 | 40.0 | |

* Non-cancerous liver tissues were used as the normal tissue for RK182, RK307, RK308, RK309 and RK310. Lymphocytes were sequenced for the others.
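
The depth-of-coverage values in Supplementary Table 2 are consistent with dividing the number of sequenced bases by a genome size of about 3.0 Gb. That divisor is inferred here from the tabulated numbers and is an assumption, as is the function name `depth_of_coverage`; the source does not state how depth was computed.

```python
ASSUMED_GENOME_SIZE_BP = 3.0e9  # assumption inferred from the table, not stated in the source


def depth_of_coverage(sequenced_bases, genome_size_bp=ASSUMED_GENOME_SIZE_BP):
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return sequenced_bases / genome_size_bp


# Values for RK067 and RK279 (cancer) taken from Supplementary Table 2.
rk067_normal = round(depth_of_coverage(86_956_162_900), 1)
rk067_cancer = round(depth_of_coverage(97_006_299_000), 1)
rk279_cancer = round(depth_of_coverage(210_730_083_400), 1)
print(rk067_normal, rk067_cancer, rk279_cancer)
```

Under this assumption the three computed depths reproduce the tabulated 29.0, 32.3 and 70.2.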

Supplementary Table 3: Summary of somatic mutation numbers in 30 LCB genomes

Sample | Point mutation | Coding point mutation | Nonsynonymous point mutation | Point mutation in splice site | Indel | Coding indel | Indel in splice site
RK067 | 14,226 | 77 | 42 | 1 | 295 | 8 | 0
RK069 | 4,651 | 51 | 42 | 1 | 166 | 4 | 0
RK073 | 1,229 | 11 | 7 | 0 | 72 | 0 | 0
RK084 | 8,937 | 68 | 44 | 0 | 250 | 9 | 0
RK108 | 6,910 | 64 | 48 | 2 | 242 | 4 | 0
RK109 | 5,549 | 46 | 35 | 2 | 277 | 7 | 0
RK112 | 3,067 | 30 | 17 | 0 | 161 | 2 | 0
RK137 | 8,313 | 53 | 44 | 3 | 252 | 4 | 0
RK138 | 2,836 | 39 | 30 | 2 | 187 | 2 | 0
RK142 | 525 | 7 | 4 | 0 | 67 | 1 | 0
RK146 | 2,307 | 33 | 27 | 2 | 75 | 2 | 0
RK166 | 6,268 | 33 | 26 | 1 | 187 | 3 | 0
RK182 | 9,010 | 56 | 39 | 1 | 273 | 7 | 1
RK184 | 11,033 | 81 | 64 | 0 | 244 | 5 | 0
RK194 | 3,294 | 36 | 25 | 0 | 121 | 4 | 0
RK204 | 2,257 | 22 | 20 | 0 | 110 | 1 | 0
RK208 | 4,593 | 30 | 22 | 0 | 167 | 2 | 0
RK226 | 326 | 6 | 4 | 0 | 88 | 2 | 0
RK269 | 1,535 | 12 | 5 | 0 | 276 | 8 | 0
RK272 | 3,130 | 29 | 28 | 1 | 142 | 6 | 0
RK279 | 2,083 | 22 | 17 | 0 | 98 | 2 | 0
RK298 | 5,929 | 74 | 55 | 1 | 284 | 9 | 1
RK303 | 260 | 3 | 3 | 0 | 65 | 1 | 0
RK307 | 2,100 | 28 | 21 | 0 | 113 | 4 | 0
RK308 | 178,763 | 2,247 | 1,529 | 50 | 6,743 | 610 | 1
RK309 | 4,785 | 43 | 32 | 3 | 241 | 1 | 0
RK310 | 2,279 | 23 | 19 | 0 | 107 | 3 | 0
RK312 | 3,427 | 27 | 19 | 1 | 115 | 3 | 0
RK316 | 1,729 | 17 | 12 | 1 | 108 | 3 | 0
RK317 | 2,215 | 15 | 9 | 0 | 43 | 1 | 0
Total except for RK308 | 124,803 | 1,036 | 760 | 22 | 4,826 | 108 | 2
Average except for RK308 | 4,304 | 36 | 26 | 1 | 166 | 4 | 0

Supplementary Table 4: HBV integration sites identified in 30 LCBs

Sample | Chr1 | Pos1 | Chr2 | Pos2 | Number of support read-pair | Number of consistent read-pair | Clonal proportion | AIR | Proportion of mutated allele | Affected gene
RK069 | HBV|AP011098.1 | 193 | 3 | 169,023,909 | 22 | 46 | 0.32 (0.22-0.45) | 0.75 | 0.43 (0.29-0.60) | MDS1
RK166 | HBV|AP011098.1 | 981 | 2 | 14,604,475 | 9 | 46 | 0.16 (0.08-0.29) | 0.55 | 0.30 (0.14-0.52) |
RK166 | HBV|AP011098.1 | 1,741 | 4 | 63,879,681 | 7 | 41 | 0.15 (0.06-0.28) | 0.66 | 0.22 (0.09-0.42) |
RK166 | HBV|AP011098.1 | 2,419 | 12 | 30,301,884 | 32 | 47 | 0.41 (0.30-0.52) | 0.65 | 0.63 (0.46-0.81) |
RK166 | HBV|AP011098.1 | 731 | 13 | 103,554,268 | 128 | 28 | 0.82 (0.75-0.88) | 0.94 | 0.88 (0.80-0.94) |
RK166 | HBV|AP011098.1 | 2,391 | 13 | 104,161,305 | 56 | 191 | 0.23 (0.18-0.28) | 0.94 | 0.24 (0.19-0.30) |
RK166 | HBV|AP011098.1 | 2,996 | 13 | 104,214,794 | 52 | 21 | 0.71 (0.59-0.81) | 0.94 | 0.76 (0.64-0.87) |
RK166 | HBV|AP011098.1 | 3,015 | 13 | 104,280,513 | 179 | 56 | 0.76 (0.70-0.81) | 0.93a | 0.81 (0.75-0.87) |
RK166 | HBV|AP011098.1 | 2,307 | 13 | 104,316,887 | 35 | 38 | 0.48 (0.36-0.60) | 0.79 | 0.61 (0.46-0.76) |
RK166 | HBV|AP011098.1 | 816 | 13 | 104,473,938 | 5 | 39 | 0.12 (0.04-0.25) | 0.94 | 0.12 (0.04-0.26) |
RK166 | HBV|AP011098.1 | 2,725 | 13 | 104,474,088 | 13 | 36 | 0.27 (0.15-0.41) | 0.94 | 0.28 (0.16-0.44) |
RK166 | HBV|AP011098.1 | 2,729 | 15 | 48,094,848 | 7 | 30 | 0.19 (0.08-0.35) | 0.53 | 0.36 (0.15-0.66) |
RK166 | HBV|AP011098.1 | 1,818 | 15 | 48,094,912 | 13 | 21 | 0.38 (0.22-0.56) | 0.53 | 0.72 (0.42-1.06) |
RK208 | HBV|AP011098.1 | 428 | 16 | 81,543,548 | 4 | 17 | 0.19 (0.05-0.42) | 0.57 | 0.33 (0.10-0.74) | CMIP
RK208 | HBV|AP011098.1 | 1,810 | X | 25,471,062 | 7 | 95 | 0.07 (0.03-0.14) | 0.88 | 0.08 (0.03-0.15) |
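
As a reading aid for Supplementary Table 4, the point estimates in most rows are approximately reproduced by two simple ratios: the clonal proportion is close to support read-pairs divided by the sum of support and consistent read-pairs, and the proportion of mutated allele is close to the clonal proportion divided by AIR. These relations are inferred from the tabulated numbers only; the source does not define the estimator, a few rows differ in the second decimal, and the interval estimates in parentheses are not reproduced. The function names below are hypothetical.

```python
def clonal_proportion(support_pairs, consistent_pairs):
    """Fraction of read pairs at the site that support the integration (inferred relation)."""
    return support_pairs / (support_pairs + consistent_pairs)


def mutated_allele_proportion(clonal, air):
    """Clonal proportion rescaled by AIR (inferred relation, not stated in the source)."""
    return clonal / air


# RK069 row of Supplementary Table 4: 22 support pairs, 46 consistent pairs, AIR 0.75.
rk069_clonal = clonal_proportion(22, 46)
rk069_pma = mutated_allele_proportion(rk069_clonal, 0.75)
print(round(rk069_clonal, 2), round(rk069_pma, 2))
```

For the RK069 row this gives 0.32 and 0.43, matching the tabulated point estimates.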

Supplementary Table 5: Recurrently mutated genes in the 29 LCBs

Gene | Sample | Chr | Genomic position | Genotype | RNA_id | Exon | AA position | Type | Reference AA | Mutant AA
TERT promoter | RK067_C01 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | RK084_C01 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | RK108_C01 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | RK109_C01 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | RK137_C01 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | RK182_C01 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | RK184_C01 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | RK316_C01 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | RK317_C01 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
XIRP2 | RK067_C01 | 2 | 168110581 | CT | NM_152381 | 9 | 3532 | nonsynonymous | S | L
XIRP2 | RK108_C01 | 2 | 168099862 | GT | NM_152381 | 8 | 654 | nonsynonymous | A | S
XIRP2 | RK137_C01 | 2 | 167992500 | GT | NM_152381 | 2 | 164 | nonsynonymous | S | A
XIRP2 | RK182_C01 | 2 | 168101515 | AG | NM_152381 | 8 | 1205 | nonsynonymous | E | K
XIRP2 | RK309_C01 | 2 | 168106288 | AC | NM_152381 | 8 | 2796 | nonsynonymous | K | Q
TP53 | RK166_C01 | 17 | 7577018 | CT | NM_000546 | 8 | - | splice site | - | -
TP53 | RK182_C01 | 17 | 7577555 | -GCAGGAACTG | NM_000546 | 6 | - | coding indel | - | -
TP53 | RK184_C01 | 17 | 7579491 | -TTCTGGGAGC | NM_000546 | 3 | - | coding indel | - | -
TP53 | RK298_C01 | 17 | 7578285 | -CAGACCTAAGA | NM_000546 | 5 | - | coding indel | - | -
TP53 | RK298_C01 | 17 | 7578285 | -CAGACCTAAGA | NM_000546 | 5 | - | splice site indel | - | -
GPR98 | RK182_C01 | 5 | 89947486 | GT | NM_032119 | 18 | 1119 | nonsynonymous | G | C
GPR98 | RK204_C02 | 5 | 89979509 | CT | NM_032119 | 28 | 1924 | nonsynonymous | I | T
GPR98 | RK298_C01 | 5 | 90079720 | AT | NM_032119 | 67 | 4500 | nonsynonymous | H | L
GPR98 | RK298_C01 | 5 | 90079731 | AG | NM_032119 | 67 | 4504 | nonsynonymous | R | G
PTEN | RK298_C01 | 10 | 89692911 | GT | NM_000314 | 5 | 132 | nonsynonymous | G | V
PTEN | RK312_C01 | 10 | 89720778 | AG | NM_000314 | 8 | 310 | nonsynonymous | D | G
PTEN | RK084_C01 | 10 | 89720733 | +T | NM_000314 | 8 | - | coding indel | - | -
KIF2B | RK307_C01 | 17 | 51900500 | CT | NM_032559 | 1 | 36 | nonsynonymous | R | C
KIF2B | RK312_C01 | 17 | 51901483 | -G | NM_032559 | 1 | - | coding indel | - | -
KIF2B | RK316_C01 | 17 | 51901460 | +T | NM_032559 | 1 | - | coding indel | - | -
PBRM1 | RK067_C01 | 3 | 52620557 | CG | NM_018313 | 20 | 1066 | nonsynonymous | P | A
PBRM1 | RK084_C01 | 3 | 52651369 | AC | NM_018313 | 14 | 576 | nonsynonymous | R | L
PBRM1 | RK269_C01 | 3 | 52610639 | -CT | NM_018313 | 22 | - | coding indel | - | -
ERBB4 | RK067_C01 | 2 | 212566778 | AT | NM_005235 | 12 | 468 | nonsynonymous | Y | F
ERBB4 | RK108_C01 | 2 | 212652850 | GT | NM_005235 | 4 | 152 | nonsynonymous | N | K
SYT1 | RK067_C01 | 12 | 79679738 | CT | NM_005639 | 2 | 113 | nonsynonymous | T | M
SYT1 | RK137_C01 | 12 | 79837988 | AT | NM_005639 | 10 | - | splice site | - | -
WDR11 | RK067_C01 | 10 | 122624642 | CG | NM_018117 | 6 | 266 | nonsynonymous | R | P
WDR11 | RK182_C01 | 10 | 122637942 | AG | NM_018117 | 12 | 545 | nonsynonymous | N | S
PLCB1 | RK084_C01 | 20 | 8639201 | CT | NM_015192 | 9 | 238 | nonsynonymous | P | S
PLCB1 | RK109_C01 | 20 | 8637906 | AG | NM_015192 | 8 | 224 | nonsynonymous | E | K
EPHA2 | RK269_C01 | 1 | 16462199 | -G | NM_004431 | 6 | - | coding indel | - | -
EPHA2 | RK312_C01 | 1 | 16459707 | +A | NM_004431 | 11 | - | coding indel | - | -
DHX8 | RK067_C01 | 17 | 41571205 | CG | NM_004941 | 8 | 388 | nonsynonymous | R | P
DHX8 | RK109_C01 | 17 | 41566895 | AG | NM_004941 | 2 | 76 | nonsynonymous | E | G
ODZ1 | RK084_C01 | 23 | 123526125 | AC | NM_001163278 | 28 | 1822 | nonsynonymous | R | L
ODZ1 | RK084_C01 | 23 | 123526126 | AG | NM_001163278 | 28 | 1822 | nonsynonymous | R | Stop
ODZ1 | RK194_C01 | 23 | 123779139 | CT | NM_001163278 | 10 | 577 | nonsynonymous | H | R
PIK3CA | RK138_C01 | 3 | 178928079 | AG | NM_006218 | 7 | 453 | nonsynonymous | E | K
PIK3CA | RK310_C01 | 3 | 178936082 | AG | NM_006218 | 9 | 542 | nonsynonymous | E | K
GREB1 | RK272_C01 | 2 | 11718509 | CT | NM_014668 | 5 | 242 | nonsynonymous | P | S
GREB1 | RK307_C01 | 2 | 11777861 | CT | NM_014668 | 30 | 1789 | nonsynonymous | A | V
MGAT4C | RK073_C01 | 12 | 86373720 | AT | NM_013244 | 3 | 262 | nonsynonymous | K | Stop
MGAT4C | RK310_C01 | 12 | 86374167 | AC | NM_013244 | 3 | 113 | nonsynonymous | G | Stop
SIGLEC12 | RK208_C01 | 19 | 52001481 | AG | NM_053003 | 5 | 399 | nonsynonymous | L | P
SIGLEC12 | RK138_C01 | 19 | 52004869 | -AC | NM_053003 | 1 | - | coding indel | - | -
TRO | RK208_C01 | 23 | 54949387 | AT | NM_016157 | 2 | 141 | nonsynonymous | K | M
TRO | RK309_C01 | 23 | 54948681 | CT | NM_016157 | 1 | 1 | nonsynonymous | M | T
CDH2 | RK204_C02 | 18 | 25565056 | CT | NM_001792 | 13 | 706 | nonsynonymous | N | S
CDH2 | RK182_C01 | 18 | 25583107 | -T | NM_001792 | 7 | - | coding indel | - | -
KRAS | RK146_C01 | 12 | 25380275 | GT | NM_004985 | 2 | 61 | nonsynonymous | Q | H
KRAS | RK194_C01 | 12 | 25398284 | CT | NM_004985 | 1 | 12 | nonsynonymous | G | D
MYO10 | RK084_C01 | 5 | 16672856 | CT | NM_012334 | 37 | 1751 | nonsynonymous | V | I
MYO10 | RK109_C01 | 5 | 16794908 | CG | NM_012334 | 4 | 105 | nonsynonymous | P | R
BAP1 | RK279_C01 | 3 | 52441268 | AG | NM_004656 | 7 | 168 | nonsynonymous | F | L
BAP1 | RK310_C01 | 3 | 52440866 | -CGCTCCATGATGACCCGCCGGG | NM_004656 | 8 | - | coding indel | - | -
PCLO | RK166_C01 | 7 | 82580198 | AC | NM_033026 | 6 | 3236 | nonsynonymous | E | Stop
PCLO | RK307_C01 | 7 | 82763799 | +T | NM_033026 | 3 | - | coding indel | - | -
NR2C1 | RK067_C01 | 12 | 95416034 | CT | NM_003297 | 13 | 595 | nonsynonymous | N | D
NR2C1 | RK208_C01 | 12 | 95451522 | CT | NM_003297 | 5 | 226 | nonsynonymous | D | G
MAGEL2 | RK067_C01 | 15 | 23890195 | GT | NM_019066 | 1 | 899 | nonsynonymous | R | S
MAGEL2 | RK146_C01 | 15 | 23892197 | AA | NM_019066 | 1 | 231 | nonsynonymous | Q | H
MAGEL2 | RK146_C01 | 15 | 23892426 | AG | NM_019066 | 1 | 155 | nonsynonymous | P | L
TTN | RK184_C01 | 2 | 179604159 | CT | NM_133432 | 45 | 4363 | nonsynonymous | I | V
TTN | RK279_C01 | 2 | 179593252 | -G | NM_133378 | 62 | - | coding indel | - | -
ACCN1 | RK298_C01 | 17 | 31351013 | GT | NM_001094 | 6 | 354 | nonsynonymous | D | E
ACCN1 | RK316_C01 | 17 | 31618531 | AC | NM_183377 | 1 | 201 | nonsynonymous | E | D
FBN3 | RK279_C01 | 19 | 8152995 | CT | NM_032447 | 51 | 2149 | nonsynonymous | E | K
FBN3 | RK312_C01 | 19 | 8190851 | CT | NM_032447 | 21 | 886 | nonsynonymous | V | I
DOPEY2 | RK298_C01 | 21 | 37586822 | CT | NM_005128 | 8 | 366 | nonsynonymous | V | A
DOPEY2 | RK310_C01 | 21 | 37537089 | GT | NM_005128 | 1 | 20 | nonsynonymous | V | L
KCNJ16 | RK184_C01 | 17 | 68128909 | GT | NM_018658 | 1 | 227 | nonsynonymous | S | R
KCNJ16 | RK317_C01 | 17 | 68128653 | AG | NM_018658 | 1 | 142 | nonsynonymous | E | G
ARID2 | RK184_C01 | 12 | 46231458 | GT | NM_152641 | 10 | 433 | nonsynonymous | C | F
ARID2 | RK184_C01 | 12 | 46287315 | CT | NM_152641 | 19 | 1754 | nonsynonymous | R | Stop
ARID2 | RK084_C01 | 12 | 46230421 | -AC | NM_152641 | 7 | - | coding indel | - | -
ARID2 | RK084_C01 | 12 | 46287469 | -A | NM_152641 | 20 | - | coding indel | - | -
ARID1A | RK272_C01 | 1 | 27099046 | -G | NM_006015 | 13 | - | coding indel | - | -
ARID1A | RK303_C01 | 1 | 27099976 | +T | NM_006015 | 15 | - | coding indel | - | -

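Supplementary Table 6: List of 68 LCB samples for the validation study

Sample ID | Age | Gender | Histology | Viral infection | Liver fibrosis | Tumor size (mm) | n factor | Portal vein invasion (vp) | Hepatic vein invasion (vv) | Bile duct invasion (b) | Disease-free survival | censor | Overall survival | censor
HK02 | 69 | M | ICC | (-) | 2 | 38 | 1 | 0 | 2 | 3 | 6 | 1 | 6 | 0
HK03 | 55 | M | ICC | (-) | 0 | 25 | 0 | 0 | 0 | 1 | 9 | 0 | 34 | 0
HK04 | 78 | M | ICC | (-) | 4 | 45 | 0 | 0 | 0 | 3 | 78 | 1 | 78 | 1
HK05 | 62 | F | ICC | HBV | 1 | 93 | 0 | 0 | 3 | 1 | 2 | 0 | 8 | 0
HK06 | 66 | M | ICC | (-) | 1 | 35 | 0 | 1 | 0 | 2 | 94 | 1 | 94 | 1
HK07 | 73 | M | ICC | (-) | 3 | 50 | 0 | 1 | 0 | 4 | 15 | 0 | 20 | 0
HK08 | 50 | M | ICC | HCV | 1 | 35 | 1 | 3 | 0 | 3 | 2 | 0 | 6 | 0
HK09 | 58 | M | ICC | (-) | 1 | 33 | 0 | 3 | 1 | 2 | 4 | 0 | 28 | 0
HK10 | 76 | M | ICC | (-) | 0 | 56 | 0 | 0 | 0 | 4 | 8 | 0 | 18 | 0
HK11 | 61 | M | ICC | (-) | 1 | 65 | 1 | 1 | 2 | 0 | 16 | 0 | 28 | 0
HK12 | 72 | M | ICC | (-) | 1 | 17 | 0 | 0 | 0 | 0 | 54 | 1 | 54 | 1
HK13 | 71 | F | ICC | (-) | 1 | 26 | 1 | 0 | 0 | 4 | 17 | 0 | 19 | 0
HK14 | 55 | M | ICC | (-) | 0 | 100 | 0 | 2 | 2 | 4 | 52 | 1 | 52 | 1
HK15 | 71 | M | ICC | (-) | 1 | 40 | 0 | 1 | 0 | 1 | 29 | 0 | 37 | 0
HK16 | 55 | M | ICC | (-) | 1 | 27 | 1 | 2 | 0 | 3 | 8 | 0 | 15 | 0
HK17 | 71 | M | ICC | (-) | 2 | 61 | 1 | 3 | 2 | 2 | 8 | 0 | 28 | 0
HK18 | 75 | M | ICC | (-) | 1 | 24 | 0 | 3 | 0 | 4 | 49 | 1 | 49 | 1
HK21 | 53 | M | ICC | (-) | 1 | 90 | 1 | 3 | 2 | 4 | 7 | 0 | 20 | 0
HK22 | 68 | F | ICC | (-) | 0 | 36 | 0 | 1 | 0 | 2 | 38 | 1 | 38 | 1
HK24 | 66 | F | ICC | (-) | 3 | 60 | 0 | 3 | 2 | 4 | 24 | 1 | 24 | 1
HK25 | 80 | F | ICC | (-) | 0 | 51 | 0 | 0 | 1 | 0 | 32 | 1 | 32 | 1
OS01 | 50 | F | ICC | (-) | 0 | 100 | 1 | 0 | 0 | 3 | 2 | 0 | 3 | 0
OS02 | 74 | M | cHCC/CC | (-) | 0 | 50 | 0 | 0 | 0 | 0 | 11 | 0 | 47 | 1
OS03 | 55 | M | ICC | (-) | 1 | 40 | 1 | 0 | 0 | 3 | 0 | 0 | 4 | 1
OS04 | 84 | F | ICC | (-) | 0 | 56 | 0 | 0 | 0 | 0 | 16 | 1 | 51 | 1
OS05 | 80 | M | ICC | (-) | 0 | 35 | 0 | 0 | 0 | 1 | 0 | 0 | 12 | 0
OS06 | 65 | F | ICC | (-) | 2 | 20 | 0 | 0 | 0 | 2 | 0 | 0 | 17 | 0
OS07 | 77 | M | ICC | HCV | 3 | 60 | 0 | 1 | 0 | 0 | 76 | 1 | 76 | 1
OS08 | 65 | M | ICC | (-) | 0 | 35 | 1 | 0 | 0 | 1 | 72 | 1 | 72 | 1
OS09 | 71 | M | cHCC/CC | (-) | 2 | 120 | 0 | 1 | 0 | 0 | 5 | 1 | 5 | 1
OS10 | 69 | M | ICC | (-) | 0 | 25 | 0 | 1 | 1 | 1 | 14 | 0 | 19 | 0
OS11 | 54 | F | ICC | HCV | 1 | 18 | 0 | 1 | 0 | 0 | 70 | 1 | 70 | 1
OS12 | 74 | M | ICC | (-) | 0 | 28 | 0 | 0 | 0 | 1 | 57 | 1 | 57 | 1
OS13 | 79 | M | ICC | (-) | 1 | 34 | 0 | 2 | 0 | 0 | 57 | 1 | 57 | 1
OS14 | 74 | F | ICC | (-) | 3 | 25 | 0 | 2 | 0 | 0 | 9 | 0 | 15 | 0
OS15 | 58 | M | ICC | (-) | 1 | 30 | 0 | 1 | 0 | 1 | 24 | 0 | 46 | 1
OS16 | 61 | M | CoCC | HBV | 1 | 42 | 0 | 0 | 0 | 1 | 53 | 1 | 53 | 1
OS17 | 61 | F | ICC | (-) | 0 | 72 | 0 | 1 | 0 | 0 | 48 | 1 | 48 | 1
OS18 | 68 | F | ICC | (-) | 0 | 80 | 0 | 0 | 0 | 0 | 40 | 1 | 40 | 1
OS19 | 78 | M | CoCC | HCV | 2 | 23 | 0 | 0 | 0 | 0 | 4 | 0 | 37 | 1
OS20 | 68 | M | ICC | (-) | 0 | 80 | 0 | 1 | 0 | 0 | 35 | 1 | 35 | 1
OS21 | 76 | M | ICC | (-) | 0 | 50 | 0 | 1 | 1 | 0 | 31 | 1 | 31 | 1
OS22 | 67 | M | cHCC/CC | HCV | 2 | 21 | 0 | 0 | 0 | 0 | 6 | 0 | 30 | 1
OS23 | 73 | M | ICC | (-) | 1 | 23 | 0 | 0 | 0 | 0 | 22 | 1 | 22 | 1
OS24 | 69 | F | ICC | (-) | 0 | 23 | 0 | 1 | 1 | 0 | 16 | 1 | 16 | 1
OS25 | 55 | M | ICC | (-) | 0 | 40 | 0 | 1 | 0 | 0 | 8 | 0 | 13 | 1
OS26 | 62 | M | cHCC/CC | (-) | 3 | 45 | 0 | 1 | 2 | 0 | 1 | 0 | 6 | 1
OS27 | 72 | M | ICC | (-) | 3 | 32 | 1 | 2 | 0 | 0 | 1 | 0 | 7 | 1
OS28 | 74 | M | ICC | (-) | 0 | 55 | 1 | 3 | 1 | 4 | 5 | 1 | 5 | 1
OS29 | 60 | F | ICC | HBV | 1 | 30 | 0 | 1 | 1 | 1 | 4 | 1 | 4 | 1
NCC1 | 68 | M | ICC | HBV | 2 | 90 | 0 | 1 | 1 | 1 | 7 | 0 | 9 | 0
NCC2 | 74 | M | ICC | HBV | 1 | 95 | 0 | 1 | 1 | 1 | 8 | 1 | 8 | 1
NCC3 | 72 | F | cHCC/CC | HCV | 3 | 60 | 0 | 0 | 0 | 0 | 21 | 0 | 22 | 1
OS_NBNC21 | 74 | M | ICC | (-) | 0 | 90 | 0 | 0 | 0 | 0 | 44 | 1 | 44 | 1
OS_NBNC37 | 69 | F | ICC | (-) | 0 | 23 | 0 | 1 | 1 | 0 | 12 | 1 | 12 | 1
RK347 | 82 | M | cHCC/CC | HCV | 3 | 20 | 0 | 0 | 0 | 0 | 6 | 0 | 13 | 1
RK348 | 76 | F | ICC | (-) | 0 | 46 | 0 | 0 | 0 | 1 | 44 | 1 | 44 | 1
RK349 | 62 | M | cHCC/CC | HBV | 3 | 15 | 0 | 0 | 0 | 0 | 7 | 1 | 7 | 1
HK98 | 61 | F | ICC | (-) | 0 | ? | 0 | 0 | 0 | 3 | 60 | 1 | 60 | 1
HK99 | 66 | M | ICC | (-) | 3 | 25 | 0 | 0 | 2 | 3 | 18 | 0 | 54 | 0
HK100 | 64 | M | ICC | (-) | 0 | 6 | 1 | 3 | 1 | 4 | 2 | 0 | 12 | 0
HK101 | 72 | M | ICC | (-) | 0 | 84 | 0 | 3 | 2 | 3 | 1 | 0 | 1 | 0
HK102 | 74 | F | ICC | (-) | 1 | 35 | 0 | 0 | 1 | 4 | 18 | 0 | 29 | 1
HK103 | 61 | F | ICC | (-) | 0 | 50 | 0 | 1 | 3 | 0 | 25 | 1 | 25 | 1
2900 | 70 | M | ICC | HBV | 2 | 40 | 0 | 0 | 1 | 0 | 40 | 1 | 40 | 1
2256 | 56 | M | ICC | HCV | 2 | 35 | 0 | 0 | 0 | 1 | 90 | 1 | 90 | 1
2371 | 65 | F | ICC | HBV | 2 | 14 | 0 | 0 | 0 | 0 | 75 | 1 | 75 | 1
2745 | 61 | M | cHCC/CC | (-) | 4 | 25 | 0 | 0 | 0 | 0 | 2 | 0 | 49 | 1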

Supplementary Table 7: Validation for the mutated genes in independent 68 LCBs

Gene | Sample | Chr | Genomic position | Genotype | RNA_id | Exon | AA position | Type | Reference AA | Mutant AA
TERT promoter | OS02 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | OS09 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | OS19 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | OS22 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
TERT promoter | RK347 | 5 | 1,295,228 | CT | NM_198253 | - | - | promoter | - | -
KRAS | HK98 | 12 | 25,398,285 | CT | NM_004985 | 2 | 12 | nonsynonymous | G | S
KRAS | HK10 | 12 | 25,398,284 | CT | NM_004985 | 2 | 12 | nonsynonymous | G | D
KRAS | HK16 | 12 | 25,398,284 | CA | NM_004985 | 2 | 12 | nonsynonymous | G | V
KRAS | OS01 | 12 | 25,398,284 | CA | NM_004985 | 2 | 12 | nonsynonymous | G | V
KRAS | OS04 | 12 | 25,398,284 | CT | NM_004985 | 2 | 12 | nonsynonymous | G | D
KRAS | OS21 | 12 | 25,398,284 | CT | NM_004985 | 2 | 12 | nonsynonymous | G | D
KRAS | OS27 | 12 | 25,398,284 | CA | NM_004985 | 2 | 12 | nonsynonymous | G | V
PBRM1 | NCC2 | 3 | 52,598,142 | GT | NM_181042 | 23 | 1267 | nonsynonymous | Q | K
PBRM1 | HK15 | 3 | 52,610,695 | AG | NM_181042 | 22 | 1185 | nonsense | R | X
PBRM1 | OS10 | 3 | 52,613,124 | +C | NM_181042 | 21 | 1160 | indel | - | -
PBRM1 | OS09 | 3 | 52,643,803 | AG | NM_181042 | 16 | 698 | nonsynonymous | L | P
PBRM1 | 2900 | 3 | 52,651,357 | CT | NM_181042 | 14 | 580 | nonsynonymous | Y | C
PBRM1 | 2371 | 3 | 52,651,466 | +A | NM_181042 | 14 | 544 | indel | - | -
PBRM1 | NCC2 | 3 | 52,682,380 | AG | NM_181042 | 7 | 265 | nonsynonymous | P | S
PBRM1 | NCC1 | 3 | 52,696,289 | CT | NM_181042 | 4 | 130 | nonsynonymous | D | N
ARID2 | OS12 | 12 | 46,230,605 | AG | NM_152641 | 21 | 285 | nonsynonymous | R | D
ARID2 | HK10 | 12 | 46,233,114 | -TGTTAGTGTGTCTGGTT | NM_152641 | 11 | 445 | coding indel | - | -
ARID2 | HK15 | 12 | 46,242,736 | +T | NM_152641 | 13 | 567 | coding indel | - | -
ARID2 | OS02 | 12 | 46,244,448 | +T | NM_152641 | 15 | 848 | coding indel | - | -
ARID2 | OS22 | 12 | 46,287,324 | -C | NM_152641 | 19 | 1757 | coding indel | - | -
BAP1 | OS29 | 3 | 52,436,421 | -A | NM_004656 | 17 | 691 | indel | - | -
BAP1 | HK08 | 3 | 52,437,662 | CT | NM_004656 | 13 | 500 | nonsynonymous | G | D
BAP1 | OS28 | 3 | 52,441,262 | AT | NM_004656 | 7 | 170 | nonsynonymous | F | I
BAP1 | HK14 | 3 | 52,442,095 | CT | NM_004656 | 4 | - | splice site | - | -
PCLO | OS22 | 7 | 82,451,870 | CT | NM_033026 | 20 | 4911 | nonsynonymous | D | G
PCLO | OS22 | 7 | 82,508,713 | CT | NM_033026 | 10 | 4532 | nonsynonymous | A | T
PCLO | NCC2 | 7 | 82,581,822 | AG | NM_033026 | 5 | 2816 | nonsynonymous | A | V
PCLO | NCC3 | 7 | 82,585,015 | CT | NM_033026 | 5 | 1752 | nonsynonymous | S | G
PCLO | HK101 | 7 | 82,595,521 | -T | NM_033026 | 4 | 1195 | indel | - | -
PCLO | OS22 | 7 | 82,791,715 | AC | NM_033026 | 1 | 65 | nonsynonymous | G | V
IDH1 | HK22 | 2 | 209,113,113 | AG | NM_005896 | 2 | 132 | nonsynonymous | R | G
IDH1 | OS05 | 2 | 209,113,113 | AG | NM_005896 | 2 | 132 | nonsynonymous | R | C
IDH1 | OS24 | 2 | 209,113,113 | AG | NM_005896 | 2 | 132 | nonsynonymous | R | G
IDH1 | OS_NBNC37 | 2 | 209,113,113 | AG | NM_005896 | 2 | 132 | nonsynonymous | R | G
IDH2 | HK14 | 15 | 90,631,839 | AT | NM_002168 | 4 | 172 | nonsynonymous | R | W
IDH2 | NCC2 | 15 | 90,631,839 | AT | NM_002168 | 4 | 172 | nonsynonymous | R | W
ODZ1 | OS03 | 23 | 123,637,447 | AC | NM_001163279 | 19 | 1135 | nonsynonymous | W | C
ODZ1 | OS21 | 23 | 123,680,772 | CT | NM_001163279 | 15 | 867 | nonsynonymous | G | D
ODZ1 | NCC1 | 23 | 123,870,852 | AT | NM_001163279 | 4 | 244 | nonsynonymous | L | Q
EPHA2 | OS22 | 1 | 16,451,787 | AG | NM_004431 | 17 | 952 | nonsynonymous | P | S
EPHA2 | OS28 | 1 | 16,475,422 | +A | NM_004431 | 3 | 92 | indel | - | -
SYT1 | NCC1 | 12 | 79,611,400 | AG | NM_001135805 | 1 | 34 | nonsynonymous | E | G
SYT1 | OS22 | 12 | 79,693,221 | CT | NM_001135805 | 5 | 234 | nonsynonymous | R | C
CDH2 | NCC3 | 18 | 25,568,514 | CT | NM_001792 | 11 | 572 | nonsynonymous | N | S
CDH2 | HK101 | 18 | 25,593,693 | AG | NM_001792 | 3 | 118 | nonsynonymous | A | V
XIRP2 | OS28 | 2 | 167,992,500 | GT | NM_152381 | 9 | 164 | nonsynonymous | S | A
KIF2B | OS03 | 17 | 51,901,459 | +T | NM_032559 | 1 | 355 | indel | - | -

Supplementary Table 8: Association of clinical factors and mutations in LCBs and HCCs

| Gene | Age | Gender | ICC vs other | cHCC/CC vs other | ICC vs cHCC/CC | Viral infection (+) vs (-) | HBV vs HCV | Liver fibrosis 0 vs other | n factor | vp + vs. - | vv + vs. - | b + vs. - | vp- vv- b- vs. other | Tumor size (mm) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TERT promoter | 0.049* | ns | 8.4×10^-6*** | 6.5×10^-5*** | 1.9×10^-5*** | 0.0015 | 0.0052 | ns | ns | ns | ns | 0.00082*** | 0.027 | ns |
| KRAS a | ns | ns | ns | ns | ns | 0.028* | ns | 0.001** | 0.048* | ns | ns | ns | ns | ns |
| XIRP2 | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns |
| ARID2 | ns | ns | 0.031* | 0.01* | 0.012* | ns | ns | ns | ns | ns | ns | ns | ns | ns |
| PBRM1 | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns |
| BAP1 | ns | ns | ns | ns | ns | ns | ns | ns | ns | 0.029* | 0.019* | 0.012* | ns | ns |
| PCLO | ns | ns | ns | 0.046* | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns |
| IDH1 | ns | ns | ns | ns | ns | ns | ns | 0.005** | ns | ns | ns | ns | ns | ns |
| IDH2 | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | 0.041* | ns | ns | 0.0084** |
| IDH1 or IDH2 | ns | ns | ns | ns | ns | ns | ns | 0.003** | ns | ns | 0.021* | ns | ns | ns |
| ODZ1 | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns |
| Method | Wilcoxon | Fisher | Fisher | Fisher | Fisher | Fisher | Fisher | Fisher | Fisher | Fisher | Fisher | Fisher | Fisher | Wilcoxon |

Note

- ICC vs other: ARID2 and TERT promoter mutations are less frequent in the ICC than the others.
- cHCC/CC vs other: TERT promoter, ARID2 and PCLO mutations are more frequent in the cHCC/CC than the others.
- ICC vs cHCC/CC: TERT promoter and ARID2 mutations are more frequent in the cHCC/CC than the ICC.
- Viral infection (+) vs (-): TERT promoter mutations are more frequent in viral infection (+) cases. KRAS mutations are more frequent in viral infection (-) cases.
- HBV vs HCV: TERT promoter mutations are more frequent in the HCV(+) cases.
- Liver fibrosis 0 vs other: KRAS, IDH1 and IDH2 mutations are more frequent in the subjects with liver fibrosis = 0.
- n factor: KRAS mutation is more frequent in the subjects with lymph node metastasis.
- vp + vs. -: BAP1 mutations are more frequent in the subjects with portal vein invasion.
- vv + vs. -: IDH1, IDH2 and BAP1 mutations are more frequent in the subjects with hepatic vein invasion.
- b + vs. -: TERT promoter mutations are less frequent in the subjects with bile duct invasion. BAP1 mutations are more frequent in the subjects with bile duct invasion.
- vp- vv- b- vs. other: TERT promoter mutations are less frequent in the subjects with portal vein invasion, hepatic vein invasion or bile duct invasion.
- Tumor size (mm): The subjects with IDH2 mutation have larger tumor size.

* P < 0.05, ** P < 0.01, *** P < 0.001; ns, not significant.

a; TERT promoter hotspot (chr5:1,295,228C>T) was examined by Sanger sequencing.

b; KRAS mutation at codon 61 was excluded from the analysis, because of its unknown significance.

Supplementary Table 9: Gene Set Enrichment (GSE) analysis for genes with protein-altering mutations and rearrangements

| Category | Term | Count | % | P-value | List Total | Pop Hits | Pop Total | Fold Enrichment | q-value a |
|---|---|---|---|---|---|---|---|---|---|
| SP_PIR_KEYWORDS | phosphoprotein | 909 | 48.1 | 1.92E-32 | 1789 | 7263 | 19235 | 1.35 | 2.88E-29 |
| SP_PIR_KEYWORDS | alternative splicing | 921 | 48.8 | 5.41E-30 | 1789 | 7488 | 19235 | 1.32 | 8.09E-27 |
| UP_SEQ_FEATURE | splice variant | 915 | 48.4 | 1.39E-28 | 1785 | 7458 | 19113 | 1.31 | 2.67E-25 |
| SP_PIR_KEYWORDS | polymorphism | 1249 | 66.1 | 6.56E-20 | 1789 | 11550 | 19235 | 1.16 | 9.82E-17 |
| UP_SEQ_FEATURE | sequence variant | 1281 | 67.8 | 9.28E-18 | 1785 | 11992 | 19113 | 1.14 | 1.78E-14 |
| SP_PIR_KEYWORDS | coiled coil | 275 | 14.6 | 1.97E-11 | 1789 | 2019 | 19235 | 1.46 | 2.94E-08 |
| SP_PIR_KEYWORDS | atp-binding | 192 | 10.2 | 3.02E-10 | 1789 | 1326 | 19235 | 1.56 | 4.52E-07 |
| SP_PIR_KEYWORDS | calcium | 127 | 6.7 | 2.51E-09 | 1789 | 803 | 19235 | 1.70 | 3.75E-06 |
| UP_SEQ_FEATURE | domain:EGF-like 1 | 34 | 1.8 | 8.72E-09 | 1785 | 120 | 19113 | 3.03 | 1.67E-05 |
| UP_SEQ_FEATURE | compositionally biased region:Ser-rich | 78 | 4.1 | 9.90E-09 | 1785 | 425 | 19113 | 1.97 | 1.90E-05 |
| SP_PIR_KEYWORDS | cell adhesion | 77 | 4.1 | 1.35E-08 | 1789 | 422 | 19235 | 1.96 | 2.02E-05 |
| UP_SEQ_FEATURE | domain:EGF-like 2 | 28 | 1.5 | 1.53E-08 | 1785 | 88 | 19113 | 3.41 | 2.93E-05 |
| UP_SEQ_FEATURE | domain:C2 | 26 | 1.4 | 3.44E-08 | 1785 | 80 | 19113 | 3.48 | 6.59E-05 |
| SP_PIR_KEYWORDS | nucleotide-binding | 220 | 11.6 | 1.13E-07 | 1789 | 1686 | 19235 | 1.40 | 1.68E-04 |
| SP_PIR_KEYWORDS | ionic channel | 60 | 3.2 | 1.96E-07 | 1789 | 318 | 19235 | 2.03 | 2.94E-04 |
| UP_SEQ_FEATURE | domain:Fibronectin type-III 6 | 15 | 0.8 | 2.08E-07 | 1785 | 31 | 19113 | 5.18 | 3.99E-04 |
| UP_SEQ_FEATURE | nucleotide phosphate-binding region:ATP | 138 | 7.3 | 2.81E-07 | 1785 | 962 | 19113 | 1.54 | 5.38E-04 |
| INTERPRO | IPR000008:C2 calcium-dependent membrane targeting | 35 | 1.9 | 3.36E-07 | 1650 | 137 | 16659 | 2.58 | 5.80E-04 |
| UP_SEQ_FEATURE | domain:Fibronectin type-III 4 | 21 | 1.1 | 4.47E-07 | 1785 | 62 | 19113 | 3.63 | 8.57E-04 |
| SP_PIR_KEYWORDS | actin-binding | 49 | 2.6 | 6.62E-07 | 1789 | 247 | 19235 | 2.13 | 9.90E-04 |
| SP_PIR_KEYWORDS | cytoskeleton | 98 | 5.2 | 6.88E-07 | 1789 | 636 | 19235 | 1.66 | 0.0010 |
| GOTERM_BP_FAT | GO:0050808~synapse organization | 21 | 1.1 | 6.85E-07 | 1324 | 61 | 13528 | 3.52 | 0.0013 |
| GOTERM_MF_FAT | GO:0001883~purine nucleoside binding | 222 | 11.8 | 8.43E-07 | 1331 | 1601 | 12983 | 1.35 | 0.0014 |
| UP_SEQ_FEATURE | domain:EGF-like 3 | 23 | 1.2 | 7.67E-07 | 1785 | 75 | 19113 | 3.28 | 0.0015 |
| GOTERM_MF_FAT | GO:0001882~nucleoside binding | 223 | 11.8 | 9.29E-07 | 1331 | 1612 | 12983 | 1.35 | 0.0015 |
| GOTERM_MF_FAT | GO:0046872~metal ion binding | 503 | 26.6 | 1.02E-06 | 1331 | 4140 | 12983 | 1.19 | 0.0017 |
| GOTERM_MF_FAT | GO:0060589~nucleoside-triphosphatase regulator activity | 75 | 4.0 | 1.03E-06 | 1331 | 413 | 12983 | 1.77 | 0.0017 |
| SP_PIR_KEYWORDS | egf-like domain | 46 | 2.4 | 1.21E-06 | 1789 | 230 | 19235 | 2.15 | 0.0018 |
| GOTERM_MF_FAT | GO:0032559~adenyl ribonucleotide binding | 209 | 11.1 | 1.15E-06 | 1331 | 1497 | 12983 | 1.36 | 0.0019 |
| INTERPRO | IPR003961:Fibronectin, type III | 42 | 2.2 | 1.24E-06 | 1650 | 190 | 16659 | 2.23 | 0.0021 |
| GOTERM_MF_FAT | GO:0030554~adenyl nucleotide binding | 218 | 11.5 | 1.32E-06 | 1331 | 1577 | 12983 | 1.35 | 0.0021 |
| SP_PIR_KEYWORDS | metal-binding | 347 | 18.4 | 1.80E-06 | 1789 | 2972 | 19235 | 1.26 | 0.0027 |
| GOTERM_BP_FAT | GO:0006468~protein amino acid phosphorylation | 104 | 5.5 | 1.48E-06 | 1324 | 667 | 13528 | 1.59 | 0.0027 |
| GOTERM_MF_FAT | GO:0043167~ion binding | 512 | 27.1 | 1.74E-06 | 1331 | 4241 | 12983 | 1.18 | 0.0028 |
| GOTERM_MF_FAT | GO:0030695~GTPase regulator activity | 73 | 3.9 | 1.77E-06 | 1331 | 404 | 12983 | 1.76 | 0.0029 |
| GOTERM_MF_FAT | GO:0043169~cation binding | 505 | 26.7 | 1.92E-06 | 1331 | 4179 | 12983 | 1.18 | 0.0031 |
| GOTERM_MF_FAT | GO:0005524~ATP binding | 205 | 10.9 | 2.29E-06 | 1331 | 1477 | 12983 | 1.35 | 0.0037 |
| GOTERM_BP_FAT | GO:0043062~extracellular structure organization | 37 | 2.0 | 2.25E-06 | 1324 | 163 | 13528 | 2.32 | 0.0041 |
| GOTERM_MF_FAT | GO:0005509~calcium ion binding | 138 | 7.3 | 2.71E-06 | 1331 | 919 | 12983 | 1.46 | 0.0044 |
| UP_SEQ_FEATURE | domain:Fibronectin type-III 2 | 31 | 1.6 | 2.47E-06 | 1785 | 130 | 19113 | 2.55 | 0.0047 |
| UP_SEQ_FEATURE | domain:Fibronectin type-III 1 | 31 | 1.6 | 2.92E-06 | 1785 | 131 | 19113 | 2.53 | 0.0056 |
| SP_PIR_KEYWORDS | cytoplasm | 381 | 20.2 | 3.79E-06 | 1789 | 3332 | 19235 | 1.23 | 0.0057 |
| KEGG_PATHWAY | hsa04070:Phosphatidylinositol signaling system | 22 | 1.2 | 4.84E-06 | 501 | 74 | 5085 | 3.02 | 0.0059 |
| INTERPRO | IPR008957:Fibronectin, type III-like fold | 40 | 2.1 | 3.53E-06 | 1650 | 184 | 16659 | 2.19 | 0.0061 |
| SP_PIR_KEYWORDS | chromosomal rearrangement | 51 | 2.7 | 4.57E-06 | 1789 | 279 | 19235 | 1.97 | 0.0068 |
| UP_SEQ_FEATURE | domain:Fibronectin type-III 7 | 12 | 0.6 | 3.60E-06 | 1785 | 24 | 19113 | 5.35 | 0.0069 |
| UP_SEQ_FEATURE | domain:Fibronectin type-III 8 | 12 | 0.6 | 3.60E-06 | 1785 | 24 | 19113 | 5.35 | 0.0069 |
| UP_SEQ_FEATURE | compositionally biased region:Poly-Ser | 76 | 4.0 | 4.05E-06 | 1785 | 475 | 19113 | 1.71 | 0.0078 |
| UP_SEQ_FEATURE | domain:EGF-like 6 | 16 | 0.8 | 4.07E-06 | 1785 | 43 | 19113 | 3.98 | 0.0078 |
| INTERPRO | IPR001965:Zinc finger, PHD-type | 25 | 1.3 | 4.59E-06 | 1650 | 90 | 16659 | 2.80 | 0.0079 |
| GOTERM_BP_FAT | GO:0030001~metal ion transport | 77 | 4.1 | 4.68E-06 | 1324 | 465 | 13528 | 1.69 | 0.0086 |
| INTERPRO | IPR013032:EGF-like region, conserved site | 55 | 2.9 | 5.10E-06 | 1650 | 293 | 16659 | 1.90 | 0.0088 |
| UP_SEQ_FEATURE | domain:Fibronectin type-III 3 | 23 | 1.2 | 4.95E-06 | 1785 | 83 | 19113 | 2.97 | 0.0095 |
| GOTERM_CC_FAT | GO:0030054~cell junction | 83 | 4.4 | 7.18E-06 | 1253 | 518 | 12782 | 1.63 | 0.011 |
| INTERPRO | IPR005821:Ion transport | 28 | 1.5 | 6.62E-06 | 1650 | 110 | 16659 | 2.57 | 0.011 |
| GOTERM_MF_FAT | GO:0004672~protein kinase activity | 97 | 5.1 | 7.97E-06 | 1331 | 606 | 12983 | 1.56 | 0.013 |
| GOTERM_MF_FAT | GO:0005085~guanyl-nucleotide exchange factor activity | 35 | 1.9 | 8.68E-06 | 1331 | 152 | 12983 | 2.25 | 0.014 |
| INTERPRO | IPR019786:Zinc finger, PHD-type, conserved site | 25 | 1.3 | 8.50E-06 | 1650 | 93 | 16659 | 2.71 | 0.015 |
| UP_SEQ_FEATURE | domain:Fibronectin type-III 5 | 16 | 0.8 | 7.73E-06 | 1785 | 45 | 19113 | 3.81 | 0.015 |
| INTERPRO | IPR013098:Immunoglobulin I-set | 32 | 1.7 | 1.01E-05 | 1650 | 138 | 16659 | 2.34 | 0.017 |
| SMART | SM00239:C2 | 35 | 1.9 | 1.34E-05 | 1064 | 137 | 9079 | 2.18 | 0.018 |
| GOTERM_BP_FAT | 051056~regulation of small GTPase mediated signal transd | 48 | 2.5 | 1.04E-05 | 1324 | 252 | 13528 | 1.95 | 0.019 |
| UP_SEQ_FEATURE | zinc finger region:PHD-type | 17 | 0.9 | 1.24E-05 | 1785 | 52 | 19113 | 3.50 | 0.024 |
| GOTERM_BP_FAT | GO:0007155~cell adhesion | 104 | 5.5 | 1.29E-05 | 1324 | 700 | 13528 | 1.52 | 0.024 |
| INTERPRO | IPR011993:Pleckstrin homology-type | 55 | 2.9 | 1.39E-05 | 1650 | 303 | 16659 | 1.83 | 0.024 |
| GOTERM_BP_FAT | GO:0022610~biological adhesion | 104 | 5.5 | 1.36E-05 | 1324 | 701 | 13528 | 1.52 | 0.025 |
| GOTERM_MF_FAT | GO:0003779~actin binding | 59 | 3.1 | 1.87E-05 | 1331 | 326 | 12983 | 1.77 | 0.030 |
| SP_PIR_KEYWORDS | cell junction | 64 | 3.4 | 2.06E-05 | 1789 | 399 | 19235 | 1.72 | 0.031 |
| GOTERM_BP_FAT | GO:0016310~phosphorylation | 115 | 6.1 | 1.96E-05 | 1324 | 800 | 13528 | 1.47 | 0.036 |
| SP_PIR_KEYWORDS | ion transport | 85 | 4.5 | 2.42E-05 | 1789 | 578 | 19235 | 1.58 | 0.036 |
| SP_PIR_KEYWORDS | calcium transport | 22 | 1.2 | 2.44E-05 | 1789 | 85 | 19235 | 2.78 | 0.036 |
| INTERPRO | IPR017441:Protein kinase, ATP binding site | 74 | 3.9 | 2.24E-05 | 1650 | 455 | 16659 | 1.64 | 0.039 |
| SP_PIR_KEYWORDS | calcium channel | 18 | 1.0 | 2.70E-05 | 1789 | 61 | 19235 | 3.17 | 0.040 |
| UP_SEQ_FEATURE | binding site:ATP | 81 | 4.3 | 2.48E-05 | 1785 | 542 | 19113 | 1.60 | 0.048 |

a; q-values were obtained by Benjamini and Hochberg's FDR method.
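The Fold Enrichment column is numerically consistent with the ratio (Count / List Total) / (Pop Hits / Pop Total); for the first row, (909 / 1789) / (7263 / 19235) ≈ 1.35. The following is a minimal sketch of that ratio together with a generic Benjamini-Hochberg adjustment. The function names are illustrative and not taken from the original analysis, and the q-values in the table cannot be reproduced from the listed rows alone, because the adjustment depends on every term that was tested.

```python
def fold_enrichment(count, list_total, pop_hits, pop_total):
    """Hit fraction in the gene list divided by the hit fraction in the background."""
    return (count / list_total) / (pop_hits / pop_total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that q-values are monotone in rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        q_values[idx] = running_min
    return q_values


# First row of the table: phosphoprotein.
print(round(fold_enrichment(909, 1789, 7263, 19235), 2))
```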

Supplementary Table 10: Genes and their pathways that were highly ranked in the proportion of their mutated alleles

| GO number | GO term | #gene | #gene without GO term | Average rank of genes with GO | Average rank of genes without GO | Adjusted P-value | P-value | Gene |
|---|---|---|---|---|---|---|---|---|
| GO:0090399 | replicative senescence | 12 | 1072 | 6.9 | 25.7 | 0.003 | 1.90E-05 | TP53,TP53,TP53,ATR,TP53,ATM,TP53,ATR,TP53,TP53,TP53,TP53 |
| GO:0008156 | negative regulation of DNA replication | 11 | 1073 | 7.5 | 25.7 | 0.014 | 7.75E-05 | TP53,TP53,TP53,ATR,TP53,TP53,ATR,TP53,TP53,TP53,TP53 |
| GO:0043525 | positive regulation of neuron apoptotic process | 13 | 1071 | 9.4 | 25.7 | 0.024 | 0.00013663 | MUSK,TP53,TP53,TP53,TP53,ATM,TP53,TP53,TFAP2B,TP53,TP53,TP53,PTPRF |
| GO:0006974 | response to DNA damage stimulus | 18 | 1066 | 11.6 | 25.8 | 0.037 | 0.00020964 | UBR5,TP53,TP53,BAZ1B,RAD50,TP53,ATR,BRE,TP53,ATM,BLM,TP53,ATR,TP53,TP53,TP53,TP53 |
| GO:2001244 | positive regulation of intrinsic apoptotic signaling pathway | 9 | 1075 | 6.9 | 25.7 | 0.038 | 0.00021388 | TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53 |
| GO:0002326 | B cell lineage commitment | 9 | 1075 | 6.9 | 25.7 | 0.038 | 0.00021388 | TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53 |
| GO:0006983 | ER overload response | 9 | 1075 | 6.9 | 25.7 | 0.038 | 0.00021388 | TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53 |
| GO:0002360 | T cell lineage commitment | 9 | 1075 | 6.9 | 25.7 | 0.038 | 0.00021388 | TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53 |
| GO:0032461 | positive regulation of protein oligomerization | 9 | 1075 | 6.9 | 25.7 | 0.038 | 0.00021388 | TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53 |
| GO:0035690 | cellular response to drug | 9 | 1075 | 6.9 | 25.7 | 0.038 | 0.00021388 | TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53 |
| GO:0071850 | mitotic cell cycle arrest | 9 | 1075 | 6.9 | 25.7 | 0.038 | 0.00021388 | TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53 |
| GO:0051276 | chromosome organization | 9 | 1075 | 6.9 | 25.7 | 0.038 | 0.00021388 | TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53 |
| GO:0007406 | negative regulation of neuroblast proliferation | 9 | 1075 | 6.9 | 25.7 | 0.038 | 0.00021388 | TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53 |
| GO:0009651 | response to salt stress | 9 | 1075 | 6.9 | 25.7 | 0.038 | 0.00021388 | TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53,TP53 |

Supplementary Table 11: List of siRNAs used for knockdown experiments

| siRNA name | siRNA | sense sequence (5'-3') | anti-sense sequence (5'-3') |
|---|---|---|---|
| siEGFP | | GCAGCACGACUUCUUCAAGTT | CUUGAAGAAGUCGUGCUGCTT |
| siLuc | | CGUACGCGGAAUACUUCGATT | UCGAAGUAUUCCGCGUACGTT |
| siEPHA2 | siRNA1 | CCAUCAAGAUGCAGCAGUATT | UACUGCUGCAUCUUGAUGGTT |
| siEPHA2 | siRNA2 | GACUCAAGGACCAGGUGAATT | UGAGCUCAAUGAAGAUACGTT |
| siODZ1 | siRNA1 | GACCAUUCCACCUGGUUUATT | UAAACCAGGUGGAAUGGUCTT |
| siODZ1 | siRNA2 | CUGGUUUAUUCUGGCGUUUTT | AAACGCCAGAAUAAACCAGTT |
| siPCLO | siRNA1 | CUCUAGAUUGCAUAGUUAUTT | AUAACUAUGCAAUCUAGAGTT |
| siPCLO | siRNA2 | GAAAGAUUGGUCCACUCUATT | UAGAGUGGACCAAUCUUUCTT |
| siXIRP2 | siRNA1 | CCGUAAUACCUUUGCUCAATT | UUGAGCAAAGGUAUUACGGTT |
| siXIRP2 | siRNA2 | GACAAGAUGUCACCUGAAATT | UUUCAGGUGACAUCUUGUCTT |

Supplementary Methods

Comparison of substitution patterns among ICC, cHCC/CC and HCC

Proportions of each base substitution were compared between ICC, cHCC/CC and HCC

with the Wilcoxon signed-rank test. No significant difference was observed between

ICC and cHCC/CC, nor between cHCC/CC and HCC. However, between

hepatitis-negative ICC and cHCC/CC, substitutions of C:G to T:A, T:A to C:G and T:A

to C:G were significantly different (Supplementary Fig. 14). The same group of

substitutions was not significantly different between hepatitis-positive ICC and

cHCC/CC. These results suggest that cHCC/CC, all of which were hepatitis positive,

and hepatitis-positive ICC are more similar to HCC than hepatitis-negative ICC and that

hepatitis has a strong influence on the substitution pattern (see also Supplementary Fig.

15).

Permutation test for substitution pattern

To evaluate the distance in the substitution patterns between the two groups (such as

hepatitis+ and hepatitis- LCBs), we performed permutation tests based on the PCA (Fig.

2e). To examine the distance between n and m tumors (n, m; number of samples), we

randomly selected 2 sets of tumors (n and m tumors) from the 90 tumors and calculated

the centers of gravity for each set. Then the distance between the centers of gravity was

calculated within the randomly selected two groups. We repeated this process 100,000

times to obtain the null distribution. The distance between two groups was tested under

this null distribution. After Bonferroni correction, we identified significant

differences between all LCBs and HCCs (P-value < 0.00001), between hepatitis positive LCBs

and hepatitis negative LCBs (P-value = 0.0012), and between hepatitis negative LCBs

and HCCs (P-value < 0.00001), but the difference between hepatitis positive LCBs and

HCCs was not significant.
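The permutation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: each tumor is represented by its (PC1, PC2) coordinates, the two random sets are drawn without overlap, and the P-value is taken as the fraction of random draws whose centroid distance is at least the observed distance. All function names are hypothetical.

```python
import math
import random


def centroid(points):
    """Center of gravity of a list of (PC1, PC2) coordinates."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def centroid_distance(group_a, group_b):
    """Euclidean distance between the centers of gravity of two groups."""
    return math.dist(centroid(group_a), centroid(group_b))


def permutation_p_value(all_points, n, m, observed, n_perm=100_000, seed=0):
    """Fraction of random (n, m) draws whose centroid distance is >= observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Assumption: the two random sets are disjoint (sampled without replacement).
        drawn = rng.sample(all_points, n + m)
        if centroid_distance(drawn[:n], drawn[n:]) >= observed:
            hits += 1
    return hits / n_perm
```

With 100,000 permutations, a result of zero hits corresponds to a reported P-value below 0.00001. A Bonferroni correction would then multiply each P-value by the number of group comparisons tested.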

Detection of HBV integration sites

HBV integration sites were identified using read-pair mapping information in three of

the LCBs [12], and subsequently validated by PCR and Sanger sequencing

(Supplementary Table 4). One HBV-positive LCB (RK166) had twelve observed

integration sites, seven of which were in a region also identified as amplified in copy

number analysis on chromosome 13. These clonal proportions suggested the

integrations occurred during an early stage of carcinogenesis. RK069 and RK208 tested

negative for the HBs antigen in their sera but positive for either the anti-HB core

antibody or HBV-DNA in their sera, indicating occult HBV infection [4]. These results

indicate that HBV integration can contribute to LCB development as well as HCC

development [25].

Copy number alterations

Copy number alterations (CNAs) were detected by calculating the ratio of the average

depth of coverage in cancer to that in blood for 20-kbp bins and analyzing the ratio using

the DNAcopy R package [49].
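As an illustration of the binning step only, the sketch below computes the tumor-to-blood ratio of mean depth for consecutive bins. It assumes per-base depth vectors over the same region as input, which is a simplification. Segmentation with DNAcopy is not reproduced here, and no normalization for total sequencing depth is applied because none is described above. The function name is hypothetical.

```python
def binned_depth_ratio(tumor_depth, normal_depth, bin_size=20_000):
    """Ratio of mean tumor depth to mean normal depth for consecutive bins.

    Both inputs are per-base depth lists covering the same positions.
    Bins with zero normal coverage yield None.
    """
    if len(tumor_depth) != len(normal_depth):
        raise ValueError("depth vectors must cover the same positions")
    ratios = []
    for start in range(0, len(tumor_depth), bin_size):
        tumor_bin = tumor_depth[start:start + bin_size]
        normal_bin = normal_depth[start:start + bin_size]
        tumor_mean = sum(tumor_bin) / len(tumor_bin)
        normal_mean = sum(normal_bin) / len(normal_bin)
        ratios.append(tumor_mean / normal_mean if normal_mean > 0 else None)
    return ratios
```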

Estimation of allelic imbalance ratio

The allelic imbalance ratio (AIR) was estimated from the ratio of allele frequency at a

heterozygous site in the tumor samples. First, we identified heterozygous SNP sites in

the lymphocyte sample by the method described previously [50,51]. Then the major allele

frequency at the heterozygous site in the matched tumor sample was estimated from the

number of reads at the position with the depth ≥ 10. The allele frequency was averaged

with 10 adjacent SNP sites. Segmentation and estimation of AIR were carried out by the

DNAcopy software [49].

Since we used the major allele frequency at each heterozygous SNP position,

the AIR would be overestimated. To correct for this, we used a maximum likelihood

approach. Allele frequencies follow a binomial distribution, and the major allele

frequency follows a folded binomial distribution. In each segment, we calculated the

likelihood using a folded binomial distribution. For a segment with n SNPs, the

likelihood for the AIR was calculated as follows:

$$L(D \mid p) = \sum_{i=1}^{n} \log p(r_i, m_i \mid p),$$

where p is the AIR of the segment, m_i is the depth of coverage, and r_i is the number of major allele calls of the i-th SNP in the segment. The probability mass function of a folded binomial distribution is given by

$$p(r_i, m_i \mid p) = \left[1 - \tfrac{1}{2}\,\delta_{r_i,\,m_i - r_i}\right] \binom{m_i}{r_i} \left[p^{r_i}(1-p)^{m_i - r_i} + p^{m_i - r_i}(1-p)^{r_i}\right] \quad \left(r_i \ge \tfrac{1}{2} m_i\right),$$

$$p(r_i, m_i \mid p) = 0 \quad \left(r_i < \tfrac{1}{2} m_i\right),$$

where $\delta_{r_i,\,m_i - r_i}$ = 1 or 0 as r_i = m_i - r_i or r_i ≠ m_i - r_i, respectively [52]. Likelihoods were

calculated for each p (0.01, 0.02, ..., 1) and the p with the highest likelihood was

considered as an estimate of AIR for the segment.
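The grid search over p can be sketched as below. This is a minimal illustration rather than the original code: each SNP is given as a pair (r, m) of major-allele reads and depth, the function names are hypothetical, and the direct evaluation of powers assumes moderate depths (very deep sites would call for log-space arithmetic). Because the folded binomial is symmetric in p and 1 - p, the sketch reports the estimate on the p ≥ 0.5 side; that folding step is an assumption added here.

```python
import math


def folded_binomial_pmf(r, m, p):
    """Probability of r major-allele reads out of m under a folded binomial."""
    if 2 * r < m:
        return 0.0
    delta = 1.0 if r == m - r else 0.0
    mixture = p ** r * (1 - p) ** (m - r) + p ** (m - r) * (1 - p) ** r
    return (1 - 0.5 * delta) * math.comb(m, r) * mixture


def log_likelihood(snps, p):
    """Sum of log probabilities over (r, m) pairs; -inf if any SNP is impossible."""
    total = 0.0
    for r, m in snps:
        prob = folded_binomial_pmf(r, m, p)
        if prob <= 0.0:
            return -math.inf
        total += math.log(prob)
    return total


def estimate_air(snps):
    """Grid search over p = 0.01, 0.02, ..., 1.00 for the maximum likelihood."""
    grid = [k / 100 for k in range(1, 101)]
    best = max(grid, key=lambda p: log_likelihood(snps, p))
    # Assumption: the likelihood is symmetric in p and 1 - p, so report p >= 0.5.
    return round(max(best, 1 - best), 2)
```

For example, a segment of five SNPs that each show 70 major-allele reads at depth 100 gives an estimate of 0.7.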

Estimation of proportion of mutated allele (PMA)

To infer intra-tumor heterogeneity and identify founder mutations, we measured the

clonal proportion of nonsynonymous point mutations and coding indels. These

mutation candidates were amplified by PCR, from which a library was constructed and

sequenced on the HiSeq2000 platform.

Mapping was done by BWA-SW against reference sequences, which were

constructed from the amplified region [53]. Of the 1,169 protein-altering mutations,

1,084 candidates were sequenced with depth ≥ 100. For point mutations, we used read

sequences with mapping quality ≥ 10, base calls with base quality ≥ 30 and targets with

depth ≥ 100 for the analysis. Clonal proportions were estimated from the number of

reads with and without the candidate mutation.

For indels, clonal proportions would be underestimated, because read

sequences containing the indel may not be accurately mapped, and indels at the end of

read sequences would not be identified by BWA-SW. To reduce the mapping bias,

we generated a reference sequence with the indel allele, and mapped read sequences to

both the wild-type and indel reference sequences. Clonal proportions were estimated

by a/(a + b), where a is the number of read sequences mapped without indel to the

reference with indel and b is the number of read sequences mapped without indel to the

reference without indel.

Since clonal proportion is affected by deletions and amplifications in the

target region, we estimated the proportion of mutated allele (PMA) from the clonal

proportion and allelic imbalance ratio.

Our simulation study suggested that AIR ≥ 0.55 indicates an allelic imbalance

event (data not shown); therefore, we adjusted the clonal proportion for target mutations in a

region with AIR ≥ 0.55 as follows:

(PMA) = (Clonal proportion)/(AIR).

If the AIR was lower than 0.55, the PMA was estimated as follows:

(PMA) = (Clonal proportion)/0.5.

The PMA of copy number loss at the target sites was also estimated from the copy

number. For example, if the region had a deletion of 0.6 copies, the PMA was estimated to be

0.6.
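The two adjustment rules above can be written as a short sketch. The function names are illustrative and not from the original analysis; the 0.55 threshold and the divisors follow the text, while the absence of any cap on the result (values above 1 are possible) is an assumption, since no cap is described.

```python
def clonal_proportion(a, b):
    """a / (a + b): reads supporting the mutant allele over all informative reads."""
    return a / (a + b)


def proportion_of_mutated_allele(clonal, air, threshold=0.55):
    """Adjust a clonal proportion by the allelic imbalance ratio (AIR)."""
    # Regions with AIR >= threshold are treated as allelic imbalance;
    # otherwise the two alleles are assumed balanced (divide by 0.5).
    divisor = air if air >= threshold else 0.5
    return clonal / divisor
```

For example, a clonal proportion of 0.4 in a region with AIR = 0.8 gives a PMA of 0.5, whereas a clonal proportion of 0.2 in a balanced region (AIR = 0.5) gives a PMA of 0.4.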

In this method, because haplotype information is difficult to obtain, we

assumed that the target mutations were harbored on the major haplotype if the target

mutation was in a region with allelic imbalance. This assumption is correct if the clonal

proportion is larger than 0.5. However, if one allelic copy is amplified and a founder

mutation is harbored on the other copy, the PMA will be underestimated.

Clonal proportion of rearrangement and HBV integration sites

To estimate the clonal proportion of HBV integration sites, we used the number of

read pairs at the estimated insertion sites:

(Clonal proportion) = a/(a + b),

where a is the number of read pairs that support the rearrangement and b is the number

of consistent read pairs that traverse the breakpoint on the reference genome

(Supplementary Table 4). The clonal proportions of HBV integration were adjusted by

the AIR of the region, and the PMA was estimated. Most HBV integration sites had a

low PMA. However, the PMA of several integration sites in RK166 reached ~80%,

higher than that observed for point mutations and short indels in the same sample. This

result suggests that the HBV integration sites on chr13 were early events in tumor

progression in RK166. The clonal proportions of rearrangements in RK142, which had

chromothripsis on chr1 and chr14, were low, suggesting that the chromothripsis events

occurred in sub-populations of the tumor.

Exome sequencing of liver cancer cell lines

Exome sequencing of the 31 liver cancer cell lines was done using Agilent SureSelect

Human All Exon V4 and HiSeq2000 according to the manufacturer's instructions. SNVs and

indels were detected by VCMM software [51]. Identified nonsynonymous SNVs were

filtered against dbSNP, 1000 Genomes, and in-house exome databases to exclude possible

germline variants [54].

Knockdown experiments and migration/invasion assays in liver cancer cell lines

Three cholangiocarcinoma cell lines (SSP-25, RBE, and OZ) and three HCC cell lines

(HLF, Li-7, and SNU-449) were used for knockdown experiments of target genes. HLF

and RBE were obtained from the Japanese Collection of Research Bioresources Cell

Bank, SNU-449 from ATCC, and SSP-25, RBE, and Li-7 from RIKEN Bioresource

Center. We confirmed no deleterious mutation in the genes of interest by sequencing

analysis. 1,000 cells were seeded on 24-well plates a day before transfection, and

transfected with two pre-designed siRNAs for each gene or negative control siEGFP

(and siLuc), respectively, using Lipofectamine RNAiMAX (Invitrogen, Carlsbad, CA)

according to the manufacturer's instructions. Sequence information for the siRNAs is

provided in Supplementary Table 11. The knockdown efficiency of each gene was assessed

by quantitative RT-PCR using primers: 5’- GGCTACACTGCCATCGAGAAG -3’ and

5’- TGGTCCTTGAGTCCCAGCA-3’ for EPHA2, 5’-

TGCACCACAGTGATTATGGGTTT-3’ and 5’- TCCAAGGCAACCAGAGTGTTC-3’

for ODZ1, 5’-CAAGCCACTCATTGCCTGAA-3’ and 5’-

CTGTTTCCTCACGACGCAGA-3’ for PCLO, and 5’-

ACTTCTTCCCGTAATACCTTTGCTC-3’ and

5’-CTTTGGACCCTTCTTGTTCATTTC-3’ for XIRP2. For the cell proliferation

assay, cells were seeded in 24-well plate format a day before transfection. The cell

numbers in triplicate wells were assessed by water-soluble tetrazolium salt (WST) assay

(Dojindo, Kumamoto, Japan) 96 hours after siRNA transfection. Transwell migration

and invasion assays were performed in 24-well modified Boyden chambers pre-coated

with (invasion) or without (migration) Matrigel (BD Transduction, Franklin Lakes, NJ,

USA). Cells were transfected with siRNAs in 6-well plate format using Lipofectamine

RNAiMAX according to the manufacturer's instructions. A day after siRNA

transfection, appropriate numbers of cells were transferred into the upper chamber.

Following 24 hours of incubation, the migrated or invasive cells on the lower surface of

filters were fixed and stained with the Diff-Quik stain (Sysmex, Kobe, Japan), and

stained cells were counted directly in three different microscopic fields. The

numbers of viable cells in each condition were assessed by WST assay at the same time

point and the numbers of cells on invasion or migration assay were normalized with the

ratio of viability score in siEGFP transfectants. Differences between subgroups were

tested by Student’s t-test and considered significant at the P-value < 0.05 level.

Supplementary References

49. Andersson, R. et al. A segmental maximum a posteriori approach to genome-wide

copy number profiling. Bioinformatics 24, 751-8 (2008).

50. Fujimoto, A. et al. Whole-genome sequencing and comprehensive variant analysis

of a Japanese individual using massively parallel sequencing. Nat Genet 42, 931-6

(2010).

51. Shigemizu, D. et al. A practical method to detect SNVs and indels from whole

genome and exome sequencing data. Scientific Reports 3, 2161 (2013).

52. Porzio, G.C. & Ragozini, G. On the stochastic ordering of folded binomials.

Statistics & Probability Letters 79, 1299–1304 (2009).

53. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler

transform. Bioinformatics 25, 1754-60 (2009).

54. Abecasis, G.R. et al. A map of human genome variation from population-scale

sequencing. Nature 467, 1061-73 (2010).