Transcript
Page 1: Supplementary Information for Whole-genome sequence of a ... · 2BGI-Shenzhen, Shenzhen 518000, China.3Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, CNRS, INRA,

Supplementary Information for

Whole-genome sequence of a flatfish provides insights into ZW sex chromosome evolution and adaptation to a benthic lifestyle

Songlin Chen1,10,11, Guojie Zhang2,10, Changwei Shao1,10, Quanfei Huang2,10, Geng Liu2,10, Pei Zhang2,10, Wentao Song1, Na An2, Domitille Chalopin3, Jean-Nicolas Volff3, Yunhan Hong4, Qiye Li2, Zhenxia Sha1, Heling Zhou2, Mingshu Xie1, Qiulin Yu2, Yang Liu5, Hui Xiang6, Na Wang1, Kui Wu2, Changgeng Yang1, Qian Zhou2, Xiaolin Liao1, Linfeng Yang2, Qiaomu Hu1, Jilin Zhang2, Liang Meng1, Lijun Jin2, Yongsheng Tian1, Jinmin Lian2, Jingfeng Yang1, Guidong Miao1, Shanshan Liu1, Zhuo Liang1, Fang Yan1, Yangzhen Li1, Bin Sun1, Hong Zhang1, Jing Zhang1, Ying Zhu1, Min Du1, Yongwei Zhao1, Manfred Schartl7,11, Qisheng Tang1,11 & Jun Wang2,8,9,11

1Yellow Sea Fisheries Research Institute, CAFS, Key Laboratory for Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao 266071, China.
2BGI-Shenzhen, Shenzhen 518000, China.
3Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, CNRS, INRA, Ecole Normale Supérieure de Lyon, Lyon, France.
4Department of Biological Sciences, National University of Singapore, Science Drive 4, Singapore 117543, Singapore.
5Dalian Ocean University, Heishijiao Street 52, Dalian 116023, China.
6State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, 650223, China.
7Physiologische Chemie I, University of Würzburg, Biozentrum, Am Hubland, and Comprehensive Cancer Center, University Clinic Würzburg, Josef Schneider Straße 6, D-97074 Würzburg, Germany.
8Department of Biology, University of Copenhagen, Universitetsparken 15, København, 2100, Denmark.
9Princess Al Jawhara Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia.
10These authors contributed equally to this work.
11These authors jointly directed this work.

Correspondence should be addressed to J.W. ([email protected]), S.C. ([email protected]), M.S. ([email protected]) or Q.T. ([email protected]).

Nature Genetics: doi:10.1038/ng.2890

Supplementary Information

Supplementary Figures 1-38
Supplementary Figure 1. Distribution of sequencing depth of the assembled female genome by reads from the female and male samples.
Supplementary Figure 2. Distribution of 17-mers in the usable sequencing reads from the female sample.
Supplementary Figure 3. Distribution of 17-mers in the usable sequencing reads from the male sample.
Supplementary Figure 4. Phylogenetic tree of Cynoglossus semilaevis retroelements based on reverse transcriptase alignment.
Supplementary Figure 5. Phylogenetic tree of Cynoglossus semilaevis long terminal repeat (LTR) retroelements based on reverse transcriptase alignment.
Supplementary Figure 6. Phylogenetic tree of Cynoglossus semilaevis long interspersed nuclear elements (LINE) retroelements based on reverse transcriptase alignment.
Supplementary Figure 7. Distribution of divergence rate of each type of TEs in the tongue sole genome.
Supplementary Figure 8. Venn diagram showing supporting evidence for the reference gene set.
Supplementary Figure 9. Comparisons of gene parameters among tongue sole, medaka, Takifugu, Tetraodon, stickleback and zebrafish genomes.
Supplementary Figure 10. Statistics of orthologous families for zebrafish, tongue sole, Tetraodon, Takifugu, stickleback, and medaka (representing Osteichthyes), human (representing mammals), and chicken (representing birds).
Supplementary Figure 11. Venn diagram showing shared orthologous groups for Pleuronectiformes (tongue sole), Tetraodontidae (Takifugu and Tetraodon), Smegmamorpha (medaka and stickleback), and Cypriniformes (zebrafish).
Supplementary Figure 12. Distribution of protein identities of orthologs between human and fish species and chicken in all single-copy families.
Supplementary Figure 13. Dynamic evolution of gene families.
Supplementary Figure 14. qRT-PCR analysis of positively selected genes and differentially expressed genes between pre- and post-metamorphosis fish.
Supplementary Figure 15. Phylogenetic tree using all single-copy orthologs.
Supplementary Figure 16. Estimation of divergence time.
Supplementary Figure 17. Reconstructed vertebrate ancestral chromosomes.
Supplementary Figure 18. Model of teleost genome evolution.
Supplementary Figure 19. Rectangular dot plots show chromosomal locations of Z-orthologous genes.
Supplementary Figure 20. Structure of sex chromosomes.
Supplementary Figure 21. Distribution of Ks for Z-W gene pairs in the non-PAR region.
Supplementary Figure 22. Dosage compensation of the Z chromosome in tongue sole.
Supplementary Figure 23. Up-regulation of Z gene expression in females.

Supplementary Figure 24. Methylation status across the differentially methylated region (DMR) of dmrt1, sf-1, patched1, follistatin, and neurl3 genes.
Supplementary Figure 25. Gonad histological structure at different developmental stages in Cynoglossus semilaevis.
Supplementary Figure 26. Expression of Z chromosome sex-related genes.
Supplementary Figure 27. RT-PCR analysis of sf-1_chr.Z, dmrt1, patched1_chr.Z, and follistatin expression from various tissues from female and male Cynoglossus semilaevis.
Supplementary Figure 28. Expression pattern of sex-related genes (dmrt1, sf-1_chr.Z, patched1_chr.Z, and follistatin) during the sex reversal period treatment with high temperature.
Supplementary Figure 29. Comparison of Z and W-linked sex-related genes.
Supplementary Figure 30. qPCR analysis of the dmrt1 gene in the whole fish.
Supplementary Figure 31. Location of dmrt1 gene in the tongue sole genome: metaphases from male and female showing the hybridization signal of BAC probe containing dmrt1.
Supplementary Figure 32. Gonad in situ hybridization using a sense RNA probe to dmrt1 and no RNA probe as a control.
Supplementary Figure 33. Expression of the Z-linked E3 ubiquitin ligase gene, neurl3.
Supplementary Figure 34. Gonad in situ hybridization using a neurl3 sense RNA probe.
Supplementary Figure 35. Apparent absence of W sperm from pseudo-males using W-linked SSR marker.
Supplementary Figure 36. Gene expression profiling in sexual reversals.
Supplementary Figure 37. RT-PCR analysis of aqp1, gas8, ropn1l, nme5, tekt1, plcz1, tbpl1, spag6, gal3st1, dnajb13, cldn11, gpr64 expression from three individuals of female and pseudomale C. semilaevis.
Supplementary Figure 38. Comparison of the assembled genome with four BAC sequences.

Supplementary Tables 1-13, 15-43 and 45-55
Supplementary Table 1. Statistics for each Illumina library.
Supplementary Table 2. Summary of usable data of the tongue sole genome.
Supplementary Table 3. Summary result of the tongue sole genome assembly by SOAPdenovo.
Supplementary Table 4. Number of markers and total scaffold size for each chromosome.
Supplementary Table 5. Validation of the Z-linked scaffolds.
Supplementary Table 6. Transposable element families that are present in the genome of the tongue sole.
Supplementary Table 7. Percentage of the tongue sole genome masked as each class of transposable elements.
Supplementary Table 8. Copy number of each TE family.
Supplementary Table 9. Summary of the copy number of each TE class.
Supplementary Table 10. Transcriptome sequencing data statistics.
Supplementary Table 11. Statistics of homology-based gene sets using proteins from different species as parent proteins.
Supplementary Table 12. General statistics of each gene set.
Supplementary Table 13. General statistics of non-coding RNA genes.

Supplementary Table 15. Enrichment of GO terms in differentially expressed genes between pre- and post-metamorphosis.
Supplementary Table 16. Enrichment of GO terms in down-regulated genes in post-metamorphosis.
Supplementary Table 17. Enrichment of GO terms in up-regulated genes in post-metamorphosis.
Supplementary Table 18. Metabolism pathways (KEGG) enrichment by DGEs between pre- and post-metamorphosis.
Supplementary Table 19. Positively selected genes involved in the benthic adaptation.
Supplementary Table 20. Differential expression of visual genes in tongue sole.
Supplementary Table 21. Distribution of visual genes among different teleost species.
Supplementary Table 22. Oxford grid showing the numbers of paralogues between all pairs of tongue sole chromosomes.
Supplementary Table 23. Oxford grid showing the numbers of orthologues between tongue sole and Tetraodon chromosomes.
Supplementary Table 24. Oxford grid showing the numbers of orthologues between tongue sole and medaka chromosomes.
Supplementary Table 25. Oxford grid showing the numbers of orthologues between tongue sole and zebrafish chromosomes.
Supplementary Table 26. List of DCSs.
Supplementary Table 27. Comparison of structural features of tongue sole Z and W with autosomes.
Supplementary Table 28. PAR genes and protein function.
Supplementary Table 29. Classification of Z and W genes in non-PAR region.
Supplementary Table 30. Distribution of pseudogenes on different chromosomes.
Supplementary Table 31. Estimation of divergence rate and divergence time between Z and W chromosomes.
Supplementary Table 32. Percentage of genes expressed in testis.
Supplementary Table 33. GO enrichment of chicken Z genes (P value<0.01, Fisher exact test).
Supplementary Table 34. GO enrichment of tongue sole Z genes (P value<0.01, Fisher exact test).
Supplementary Table 35. GO enrichment of tongue sole Z-specific (Z-S) genes (P value<0.01, Fisher exact test).
Supplementary Table 36. GO enrichment of orthologous Z genes between chicken and tongue sole (P value<0.01, Fisher exact test).
Supplementary Table 37. GO enrichment of tongue sole W genes (P value<0.01, Fisher exact test).
Supplementary Table 38. Fisher’s exact test for compensated (comp)/uncompensated (uncomp) Z genes in tongue sole and zebra finch.
Supplementary Table 39. Fisher’s exact test for compensated (comp)/uncompensated (uncomp) Z genes in tongue sole and chicken.
Supplementary Table 40. Sex reversal rate of different families including the pseudomale families, normal families and temperature-induced families.

Supplementary Table 41. Sex ratio of offspring in the pseudomale families and normal families.
Supplementary Table 42. Paternal inheritance of Z chromosome in three WZ pseudomale families determined by microsatellite analysis.
Supplementary Table 43. Characterization and expression of sex-related genes in tongue sole.
Supplementary Table 45. GO enrichment by DEGs up-regulated in normal female ovaries.
Supplementary Table 46. GO enrichment by DEGs up-regulated in pseudomale testes.
Supplementary Table 47. Sex-biased GO.
Supplementary Table 48. Metabolism Pathway (KEGG) enriched by DEGs between female ovaries and pseudomale testes.
Supplementary Table 49. Data production and alignment statistics of smRNA-Seq.
Supplementary Table 50. Differentially expressed miRNAs between female and reversed male.
Supplementary Table 51. Comparison of assembled scaffolds and 4 independently finished BACs of the tongue sole genome.
Supplementary Table 52. Comparison of assembled scaffolds and ESTs.
Supplementary Table 53. Enrichment of GO terms in expanded gene families of tongue sole genome.
Supplementary Table 54. Enrichment of GO terms in contracted gene families of tongue sole genome.
Supplementary Table 55. Oligonucleotide primers used in the study.

Supplementary Note
Supplementary URLs
References

Supplementary Figures 1-38

Supplementary Figure 1. Distribution of sequencing depth of the assembled female genome by reads from the female and male samples. The peak depths are 117 and 95 for the female and male reads, respectively.

Supplementary Figure 2. Distribution of 17-mers in the usable sequencing reads from the female sample. We used 185 × 10^6 sequence reads, corresponding to 18.2 Gb of corrected data from the short insert-size libraries (≤800 bp), and obtained 15.3 × 10^9 17-mers. The peak depth is 28. The genome size (G) is related to the 17-mer number (N) and the peak of the 17-mer frequency (D) by the empirical formula G = N / D. The estimated genome size is 545 Mb. The sub-peak at about 14-fold is likely due to the sex chromosomes, whose sequencing depth is half that of the autosomes.
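
The formula G = N / D can be checked against the figures quoted in this caption. The following is a minimal Python sketch, not the authors' code: the function names are illustrative, and the k-mer count check assumes that every read has the same length (total bases divided by read count), which the caption does not state.

```python
def estimate_genome_size(total_kmers: float, peak_depth: float) -> float:
    """Return the genome size G = N / D in bases."""
    if peak_depth <= 0:
        raise ValueError("peak depth must be positive")
    return total_kmers / peak_depth


def expected_kmer_count(n_reads: float, total_bases: float, k: int = 17) -> float:
    """Approximate N, assuming equal-length reads (an assumption of this sketch).

    A read of length L contains L - k + 1 k-mers.
    """
    mean_read_length = total_bases / n_reads
    return n_reads * (mean_read_length - k + 1)


# Values quoted in the caption for the female sample.
female_size = estimate_genome_size(total_kmers=15.3e9, peak_depth=28)
female_kmers = expected_kmer_count(n_reads=185e6, total_bases=18.2e9)

print(f"estimated genome size: {female_size / 1e6:.0f} Mb")
print(f"expected 17-mers from read counts: {female_kmers / 1e9:.2f} x 10^9")
```

With the rounded inputs the estimate is about 546 Mb, consistent with the reported 545 Mb, and the read counts imply roughly 15.2 × 10^9 17-mers, close to the reported 15.3 × 10^9.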

Supplementary Figure 3. Distribution of 17-mers in the usable sequencing reads from the male sample. We used 205 × 10^6 reads, corresponding to 17.6 Gb of corrected data from the short insert-size libraries (≤800 bp), and obtained 14.3 × 10^9 17-mers. The peak depth is 29. The genome size (G) is related to the 17-mer number (N) and the peak of the 17-mer frequency (D) by the empirical formula G = N / D. The estimated genome size is 495 Mb.
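
How N and D are obtained from reads can be illustrated on toy data. The sketch below is an illustration under stated assumptions rather than the pipeline used for this figure: it counts exact 17-mers with a Python `Counter`, takes the most frequent depth as D, and applies G = N / D. The synthetic 2,000 bp genome, the error-free 50 bp reads, and all function names are hypothetical; real data would also need handling of sequencing errors and reverse complements.

```python
import random
from collections import Counter

K = 17  # k-mer length used in the caption


def count_kmers(reads, k=K):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def peak_depth(counts):
    """Return D, the depth shared by the largest number of distinct k-mers."""
    histogram = Counter(counts.values())  # depth -> number of distinct k-mers
    return max(histogram, key=histogram.get)


def estimate_genome_size(counts):
    """Apply G = N / D, where N is the total number of k-mer occurrences."""
    total_kmers = sum(counts.values())
    return total_kmers / peak_depth(counts)


# Toy input: a random genome tiled with error-free 50 bp reads every 5 bp.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(2000))
reads = [genome[s:s + 50] for s in range(0, len(genome) - 50 + 1, 5)]

counts = count_kmers(reads)
print("peak depth D:", peak_depth(counts))
print("estimated genome size:", round(estimate_genome_size(counts)), "bp")
```

On this toy input the peak depth is 7 and the estimate is roughly 1,900 bp; the shortfall relative to the true 2,000 bp comes from the reduced coverage at the two ends of the sequence.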

Supplementary Figure 4. Phylogenetic tree of Cynoglossus semilaevis retroelements based on reverse transcriptase alignment. Protein sequences were aligned using ClustalW (244 amino acids) and the phylogenetic tree was constructed with the PhyML package using maximum likelihood methods with default bootstrap calculation (shown at the beginning of branches). Tongue sole elements are written in red.

Supplementary Figure 5. Phylogenetic tree of Cynoglossus semilaevis long terminal repeat (LTR) retroelements based on reverse transcriptase alignment. Protein sequences were aligned using ClustalW (180 amino acids) and the phylogenetic tree was constructed with the PhyML package using maximum likelihood methods with default bootstrap calculation (shown at the beginning of branches). Tongue sole elements are written in red.

Supplementary Figure 6. Phylogenetic tree of Cynoglossus semilaevis long interspersed nuclear elements (LINE) retroelements based on reverse transcriptase alignment. Protein sequences were aligned using ClustalW (189 amino acids) and the phylogenetic tree was constructed with the PhyML package using maximum likelihood methods with default bootstrap calculation (shown at the beginning of branches). Tongue sole elements are written in red.

Supplementary Figure 7. Distribution of divergence rate of each type of TE in the tongue sole genome. The divergence rate was calculated between the TEs identified in the genome and the corresponding consensus sequences in the de novo library we used.

Supplementary Figure 8. Venn diagram showing supporting evidence for the reference gene set. A total of 99% (21,309 out of 21,516) of the reference genes were supported by homology-based or RNA-seq genes. Only 207 genes were predicted by the pure de novo method.

Supplementary Figure 9. Comparisons of gene parameters among tongue sole, medaka, Takifugu, Tetraodon, stickleback and zebrafish genomes.

Supplementary Figure 10. Statistics of orthologous families for zebrafish, tongue sole, Tetraodon, Takifugu, stickleback, and medaka (representing Osteichthyes), human (representing mammals), and chicken (representing birds). Single-copy orthologs represent genes that are single-copy in Osteichthyes, human, and chicken. Multiple-copy orthologs represent genes with multiple copies in at least one genome out of Osteichthyes, human, and chicken. Fish multiple-copy orthologs represent genes with multiple copies in at least one Osteichthyes genome that are single-copy or absent in the human and chicken genomes. Complex orthologs represent genes in other families. Homologs represent genes that could not be clustered into any family but could be aligned to other genes with a cutoff E-value of <1e-20.
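
The category definitions above can be read as a decision procedure over per-species copy numbers. The following Python sketch is one possible reading, not the authors' implementation: the species keys, the function name, and in particular the order in which overlapping categories are tested (the fish-specific case before the general multiple-copy case) are assumptions made here for illustration. The "Homologs" category concerns unclustered genes and is outside the scope of the sketch.

```python
FISH = ("zebrafish", "tongue_sole", "tetraodon", "takifugu", "stickleback", "medaka")
TETRAPODS = ("human", "chicken")


def classify_family(copies):
    """Classify one gene family from a mapping of species name to copy number.

    Species missing from the mapping are treated as having zero copies.
    The test order is an assumption of this sketch.
    """
    fish = [copies.get(species, 0) for species in FISH]
    tetrapods = [copies.get(species, 0) for species in TETRAPODS]
    everyone = fish + tetrapods

    if all(n == 1 for n in everyone):
        return "single-copy"
    # Multiple copies in at least one fish, single or absent in human and chicken.
    if any(n > 1 for n in fish) and all(n <= 1 for n in tetrapods):
        return "fish multiple-copy"
    # Multiple copies in at least one genome (here: human or chicken included).
    if any(n > 1 for n in everyone):
        return "multiple-copy"
    # Everything else, e.g. single-copy in some species and absent in others.
    return "complex"


one_each = {species: 1 for species in FISH + TETRAPODS}
print(classify_family(one_each))
print(classify_family({**one_each, "zebrafish": 2}))
print(classify_family({**one_each, "human": 3}))
print(classify_family({**one_each, "medaka": 0}))
```

The four calls exercise one family per category, in the order single-copy, fish multiple-copy, multiple-copy, complex.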

Supplementary Figure 11. Venn diagram showing shared orthologous groups for Pleuronectiformes (tongue sole), Tetraodontidae (Takifugu and Tetraodon), Smegmamorpha (medaka and stickleback), and Cypriniformes (zebrafish).

Supplementary Figure 12. Distribution of protein identities of orthologs between human and fish species and chicken in all single-copy families.

Supplementary Figure 13. Dynamic evolution of gene families. The number of gene families that expanded or contracted in each lineage after speciation is shown on the corresponding branch, with “+” referring to expansion and “-” referring to contraction.

Supplementary Figure 14. qRT-PCR analysis of positively selected genes and differentially expressed genes between pre- and post-metamorphosis fish. Vertical bars show mean ± standard error (SE) (n=3). * P<0.05, ** P<0.01.

Supplementary Figure 15. Phylogenetic tree using all single-copy orthologs. The phylogenetic tree was constructed using 4-fold degenerate sites from 2,426 single-copy orthologs from tongue sole, zebrafish, medaka, stickleback, Takifugu, Tetraodon, human, and chicken. The branch length represents the neutral divergence. Numbers on the branches represent dN/dS. The posterior probabilities (credibility of the topology) for all inner branches are 100%.
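
A 4-fold degenerate site is a third codon position at which any of the four bases encodes the same amino acid. The Python sketch below shows one way such sites could be pulled from a codon-aligned coding sequence alignment. It is illustrative only and not the authors' procedure: it assumes the standard genetic code, in-frame gap-free codons, and the stricter rule that all sequences share the same first two codon bases at a retained site. The function name and the toy alignment are hypothetical.

```python
# First two bases of the eight 4-fold degenerate codon families in the
# standard genetic code: Leu (CT), Val (GT), Ser (TC), Pro (CC),
# Thr (AC), Ala (GC), Arg (CG), Gly (GG).
FOURFOLD_PREFIXES = frozenset({"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"})


def fourfold_degenerate_sites(aligned_cds):
    """Return, per sequence, the third-position bases at 4-fold degenerate codons.

    aligned_cds maps a sequence name to an in-frame, codon-aligned CDS string.
    A codon column is kept only if all sequences share the same 4-fold prefix
    and carry an unambiguous base at the third position.
    """
    names = list(aligned_cds)
    seqs = [aligned_cds[name].upper() for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs) or length % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")

    kept = {name: [] for name in names}
    for start in range(0, length, 3):
        codons = [seq[start:start + 3] for seq in seqs]
        prefixes = {codon[:2] for codon in codons}
        if len(prefixes) != 1 or prefixes.pop() not in FOURFOLD_PREFIXES:
            continue
        if any(codon[2] not in "ACGT" for codon in codons):
            continue
        for name, codon in zip(names, codons):
            kept[name].append(codon[2])
    return {name: "".join(bases) for name, bases in kept.items()}


# Toy alignment: Ala (4-fold), Lys (2-fold, skipped), Leu CTN (4-fold).
toy = {"species_a": "GCTAAACTG", "species_b": "GCCAAGCTA"}
sites = fourfold_degenerate_sites(toy)
print(sites)
```

In the toy alignment only the first and third codon columns are retained, so each sequence contributes two sites.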

Supplementary Figure 16. Estimation of divergence time. The numbers on the nodes are the divergence times from present (million years ago, Mya). Divergence times for human-chicken (267–325 Mya), human-zebrafish (438–455 Mya), and zebrafish-medaka (258–307 Mya), taken from the TimeTree database, were used as calibration times.

Supplementary Figure 17. Reconstructed vertebrate ancestral chromosomes. Ten proto-chromosomes in the vertebrate ancestor, shown at the top, are assigned distinct colors, and their duplication-derived chromosomes in the gnathostome ancestor are distinguished by respective vertical bars. In the genomes of the osteichthyan, teleost, and amniote ancestors, and in the chicken and tongue sole genomes, genomic regions are assigned colors and vertical bars that represent the correspondence of individual regions to the proto-chromosomes in the gnathostome ancestor from which the respective regions originated. White blocks represent the unknown original chromosomes in the chicken genome. Unassigned blocks are shown in the rightmost chromosome (Un) in the osteichthyan and amniote ancestors.

Supplementary Figure 18. Model of teleost genome evolution. The 13 ancestral chromosomes are represented by different colored bars at the top of the figure. Regions originating from the same ancestral chromosome were assigned the corresponding color. The black arrows indicate fusion, fission, and duplication events, while gray arrows represent translocations. Anc, ancestral chromosome; Cse, tongue sole chromosome; Ola, medaka chromosome; Tni, Tetraodon chromosome; Hsa, human chromosome.

Supplementary Figure 19. Rectangular dot plots show chromosomal locations of Z-orthologous genes. a, Tongue sole Z chromosome versus selected medaka chromosomes. The tongue sole Z chromosome is not orthologous to medaka chromosome 1, which is considered a sex chromosome, but is orthologous to large portions of medaka autosome 9 (blue). At right: one-colour projection of dot plots onto a unified schematic of the tongue sole Z chromosome, showing that orthology to medaka chromosome 9 accounts for most of the Z chromosome. b, Tongue sole Z chromosome versus selected human chromosomes. The tongue sole Z chromosome is orthologous to large portions of human chromosomes 9 (blue), 5 (yellow), 12 (green) and 22 (purple) and to several segments of the human X chromosome (red). At right: five-colour projection of dot plots onto a unified schematic of the tongue sole Z chromosome, showing orthology to human chromosomes 5, 9, 12, 22 and X. c, Chicken Z chromosome versus selected human chromosomes. The chicken Z chromosome is orthologous to large portions of human chromosomes 5 (yellow), 9 (blue) and 18 (purple). At right: three-colour projection of dot plots onto a unified schematic of the chicken Z chromosome, showing orthology to human chromosomes 5, 9 and 18.

Supplementary Figure 20. Structure of sex chromosomes. Purple curves represent the

TE content of the Z (top) and W (bottom) chromosomes in 5 kb windows. The two bars,

which are colored according to the classification of genes, represent the Z and W

chromosomes. Regions between two genes with the same classification have the same

color. Z-S, Z specific genes; W-S, W specific genes; Z-W, genes on both Z and W;

W-Z_random, W genes homologous to unplaced Z-linked genes; Z-A, Z genes with

paralogs on autosomes or unplaced scaffolds; W-A, W genes with paralogs on autosomes

or unplaced scaffolds; PAR, genes in pseudoautosomal regions.

Supplementary Figure 21. Distribution of Ks for Z-W gene pairs in the non-PAR

region. We used sliding windows with different sizes (1–50 genes) and the same step

(one gene) to calculate the weighted mean of Ks for all 297 Z-W gene pairs in the

non-PAR region. (These gene pairs include pseudogenes. If a Z gene is homologous to

multiple W genes, we chose the best matching W gene to calculate Ks.) Results are

plotted according to the Z-linked gene order (physical position). We found that most of

the calculated Ks values distribute around 0.15.
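The sliding-window calculation described in this legend can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `sliding_weighted_mean` is hypothetical, and because the legend does not state which weights were used, the per-gene weights (for example, aligned lengths) are an assumption supplied by the caller. The toy values are not data from the paper.

```python
from typing import List, Sequence


def sliding_weighted_mean(ks: Sequence[float], weights: Sequence[float],
                          window: int, step: int = 1) -> List[float]:
    """Weighted mean of Ks in each window of `window` consecutive genes.

    `ks` must already be sorted by Z-linked gene order (physical position).
    `weights` are assumed per-gene weights; the legend does not specify them.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    if len(ks) != len(weights):
        raise ValueError("ks and weights must have the same length")
    means = []
    # Slide the window along the gene order, one `step` at a time.
    for start in range(0, len(ks) - window + 1, step):
        k = ks[start:start + window]
        w = weights[start:start + window]
        means.append(sum(ki * wi for ki, wi in zip(k, w)) / sum(w))
    return means


# Toy values in gene order (illustrative only).
toy_ks = [0.10, 0.20, 0.30]
toy_weights = [1.0, 1.0, 2.0]
toy_means = sliding_weighted_mean(toy_ks, toy_weights, window=2)
```

Calling the function once per window size from 1 to 50 with `step=1` would mirror the range of window sizes named in the legend.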

Supplementary Figure 22. Dosage compensation of the Z chromosome in tongue sole.

a, Density distribution of log2 (M:F) of gene expression in the entire tongue sole body

(without gonad). Black line denotes genes on autosomes, which follow a Gaussian

distribution with a mean value of zero (M:F=1), indicating that genes on autosomes are

expressed at a similar level in males and females. Orange line denotes genes on Z, with a

peak at 0.404 (M:F=1.323), indicating an incomplete dosage compensation in female

whole body compared with male whole body. b, log2 (M:F) of gene expression in whole

body for all chromosomes. Mean value of log2 (M:F) in every autosome (green) was

always around 0, but mean value of log2 (M:F) in the Z chromosome (blue) was

significantly larger than 0.
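The correspondence between the log2 peak and the M:F ratio quoted in panel a can be checked with a short conversion. The helper names below are hypothetical. The "uncompensated" value assumes that expression would scale directly with Z copy number (two copies in ZZ males, one in ZW females), which is an illustrative assumption and not a statement from the legend.

```python
import math


def log2_ratio(male: float, female: float) -> float:
    """log2 of the male-to-female expression ratio."""
    return math.log2(male / female)


def ratio_from_log2(value: float) -> float:
    """Convert a log2(M:F) value back to a plain M:F ratio."""
    return 2.0 ** value


peak_ratio = ratio_from_log2(0.404)    # Z-linked peak reported in panel a
autosome_ratio = ratio_from_log2(0.0)  # autosomal mean of zero means M:F = 1
uncompensated = log2_ratio(2.0, 1.0)   # assumed M:F = 2 with no compensation
```

Under these assumptions the reported Z-linked peak lies between the autosomal value (0) and the value expected without any compensation (1), which is consistent with the legend's description of incomplete dosage compensation.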

Supplementary Figure 23. Up-regulation of Z gene expression in females. a, log2

(M:F) of gene expression distribution across Z. log2 (M:F) was at almost the same level

across Z, indicating that the compensated genes are distributed randomly along Z and are not

enriched in a specific region. b, Z:A distribution of female whole body across the Z

chromosome. c, Z:A distribution of male whole body across the Z chromosome. Blue bar

denotes the Z-specific region with a higher proportion of methylated cytosine, and green

bars denote other regions.

Supplementary Figure 24. Methylation status across the differentially methylated

region (DMR) of dmrt1, sf-1, patched1, follistatin, and neurl3 genes in male parent

(ZZ testis P), first generation pseudo-male (ZW testis F1), first generation female

(ZW ovary F1), second generation pseudo-male (ZW testis F2) and second

generation female (ZW ovary F2). Schematic diagram at the top shows the gene

structure in tongue sole. Exons are depicted as blue boxes, and 3′- and 5′-UTRs are

indicated by white boxes. The black arrowhead shows the direction of the gene from the

transcriptional start site. Methylation levels of mCpGs identified on both DNA strands in

female and male are indicated by the vertical green lines. The gray shadow indicates the

DMR. Open and filled circles represent unmethylated and methylated cytosines,

respectively, identified by Sanger sequencing in two samples.

Supplementary Figure 25. Gonad histological structure at different developmental

stages in Cynoglossus semilaevis. A: 25 days; B: 48 days; C: 70 days; D: 160 days; E: 1

year; F: 2 years; G, gonium; PO, primary oocyte; OC, ovarian cavity; OG, oogonium; OL,

ovarian lamellae; NU, nucleolus; YK, yolk; FO, follicle; SL, seminiferous lobula; SG,

spermatogonia; SC, spermatocyte; SP, spermatid; ST, spermatozoa.

Supplementary Figure 26. Expression of Z chromosome sex-related genes.

Reverse-transcription polymerase chain reaction (RT-PCR) analysis of sex related genes

during developmental stages in female and male tongue sole. Vertical bars represent mean

± standard error (SE) (n = 3).

Supplementary Figure 27. RT-PCR analysis of sf-1_chr.Z, dmrt1, patched1_chr.Z,

and follistatin expression from various tissues from female and male Cynoglossus

semilaevis.

Supplementary Figure 28. Expression pattern of sex-related genes (dmrt1, sf-1_chr.Z,

patched1_chr.Z, and follistatin) during the sex-reversal period under high-temperature

treatment. Dmrt1 is upregulated in ZW temperature-induced female-to-male sex

reversal at the sex determination stage, while the other three genes have no obvious

function in the sex reversal. 25d, pretreated by temperature on day 25; 60d-T, treated by

temperature on day 60; 60d-C, untreated control; NC, water control; M, DL2000 marker.

Supplementary Figure 29. Comparison of Z and W-linked sex-related genes. a,

Comparison of dmrt1 genes. Homologous regions between Z and W sequences are linked

by green lines. Dmrt1 on Z is intact and has five exons. The DM domain is located on the

first and second exons. The W copy on a short scaffold (scaffold1544, ≈18 kb) has no

DM domain and just two exon domains with a Ka/Ks=0.6726. b, Validation of W-linked

dmrt1 by genomic PCR. The discontinuous segments covering almost the entire

scaffold1544 region were sequenced by ABI 3730 sequencer using 20 pairs of primers

that were designed from scaffold1544, revealing the sequence of the incomplete remnant

of dmrt1 on W. c, Comparison of sf-1 genes. The incomplete sf-1 on W-linked

scaffold260 with no expression in different tissues reveals that it is a pseudogene. d,

Comparison of patched1 genes. The structure of the two patched1 genes on the Z and W

chromosomes is very similar, other than the loss of one intron by the W ortholog.

[Plot: y-axis ticks 0 to 6; x-axis categories WW, ZZ, ZW]

Supplementary Figure 30. qPCR analysis of the dmrt1 gene in the whole fish. Male

(ZZ) and female (ZW) fish at the sex determination stage were cultured under normal

temperatures (28°C) and the embryo from the super-female (WW) was produced by

gynogenesis. The Y-coordinates are relative normalized expression levels. All data are

mean ± S.D. (n=3). β-Actin was used for normalization.

Supplementary Figure 31. Location of dmrt1 gene in the tongue sole genome:

metaphases from male and female showing the hybridization signal of BAC probe

containing dmrt1. A, metaphase of the male; B, karyotype of the male; C, metaphase of

the female; D, karyotype of the female. Scale bar: 5μm. Note the order of the

chromosome numbers is not consistent with the genome.

Supplementary Figure 32. Gonad in situ hybridization using a sense RNA probe to

dmrt1 and no RNA probe as a control. a, 56 days. b, 83 days. c, 150 days. OC, ovarian

cavity.

Supplementary Figure 33. Expression of the Z-linked E3 ubiquitin ligase gene,

neurl3. a, RT-PCR analysis of neurl3 expression in various tissues from female and male

tongue sole. G: gonad; Mu: muscle; L: liver; K: kidney; B: brain; I: intestine; P: pituitary;

S: spleen. b, RT-PCR analysis of neurl3 during developmental stages in female and male

tongue sole. Vertical bars represent mean ± standard error (SE) (n = 3). c, RT-PCR

analysis of neurl3 in the whole fish. Male (ZZ) and female (ZW) fish at the sex

determination stage were cultured under normal temperatures (28°C) and the embryo

from the super-female (WW) was produced by gynogenesis. The Y-coordinates are

relative normalized expression level. All data are mean ± S.D. (n=3). β-Actin was used for

normalization.

Supplementary Figure 34. Gonad in situ hybridization using a neurl3 sense RNA

probe. a, Testis of normal male. b, Testis of pseudo-male. c, Ovary of normal female. OC,

ovarian cavity; SL, seminiferous lobula; SZ, spermatozoon.

Supplementary Figure 35. Apparent absence of W sperm from pseudo-males using

W-linked SSR marker. Fertile sperm DNA from pseudo-males: 1, 3, 5, 7, 9, 11, 13, 15,

17, 19, 21, 23; Fin DNA from corresponding pseudo-males: 2, 4, 6, 8, 10, 12, 14, 16, 18,

20, 22, 24; Fertile sperm DNA from normal males: 25, 26, 27; Fin DNA from

corresponding normal males: 28, 29, 30; Fin DNA from normal females: 31, 32.

Supplementary Figure 36. Gene expression profiling in sexual reversals. a, GO

categories with significantly different expression (P< 0.05) in the female (red) and in the

pseudomale (blue) gonad are highlighted. Data points represent pairs of female and

pseudomale log2 mean GO RPKM. A complete list of categories is provided in

Supplementary Tables 45 and 46. b, miRNAs with significantly different expression (P <

0.05) in the ovary (red) and in the testis (blue) are highlighted. Data points represent

pairs of ovary and testis log2 RPM. A complete list of miRNA is provided in

Supplementary Table 50.

Supplementary Figure 37. RT-PCR analysis of aqp1, gas8, ropn1l, nme5, tekt1, plcz1,

tbpl1, spag6, gal3st1, dnajb13, cldn11, gpr64 expression from three individuals of

female and pseudomale C. semilaevis.

Supplementary Figure 38. Comparison of the assembled genome with four BAC

sequences.

Supplementary Tables 1-13, 15-43 and 45-55

Supplementary Table 1. Statistics for each Illumina library. All reads were

generated by Illumina paired-end sequencing. After filtering low quality data, we

obtained usable data.

Sample       Library ID          Insert     Lanes  GC (%)  Avg. read   Raw        Raw        Avg. read    Usable     Usable
                                 size (bp)                 len. (bp)*  reads (M)  bases (G)  len. (bp)**  reads (M)  bases (G)
Female (ZW)  BHSciuRBFDBAAPEI-4  164        1      42.4    100         76.5       7.65       100          72.2       7.22
Female (ZW)  BHSciuRAADBAAPE     165        1      40.7    100         56.2       5.62       100          52.6       5.26
Female (ZW)  BHSciuRAODBAAPE     175        1      39.9    100         61.6       6.16       100          57.5       5.75
Female (ZW)  BHSciuRAODBABPE     172        1      40.1    100         57.6       5.76       100          47.6       4.76
Female (ZW)  BHSciuRADDIAAPE     471        2      39.8    100         119.9      11.99      100          103.2      10.32
Female (ZW)  BHSciuRBFDIAAPEI-5  471        1      42.4    100         68.6       6.86       100          57.1       5.71
Female (ZW)  BHSciuRACDMAAPE     765        2      39.2    100         87.7       8.77       100          68.8       6.88
Female (ZW)  BHSciuRAODWAAPE     2,208      2      42.2    44          125.7      5.53       44           111.9      4.93
Female (ZW)  BHSciuRAODWBBPE     2,424      1      43.3    44          55.9       2.46       44           49.7       2.19
Female (ZW)  BHSciuRAODLAAPE     4,936      2      42.7    44          108.1      4.76       44           97.9       4.31
Female (ZW)  BHSciuRAODTAAPE     8,546      1      43.0    44          47.8       2.10       44           35.6       1.57
Female (ZW)  BHSciuRAADTAAPEI-1  9,104      1      44.1    49          159.1      7.80       49           27.5       1.35
Female (ZW)  BHSciuRAADUAAPE     19,912     1      45.5    44          53.0       2.33       44           19.3       0.85
Female (ZW)  CYNcumDAVDVAAPE     34,419     1      41.4    49          138.8      6.80       49           45.9       2.25
Female (ZW)  BHSciuRAADVAAPE     34,467     1      39.3    49          138.3      6.78       49           10.5       0.51
Female (ZW)  All libraries       /          19     41.6    67          1354.7     91.35      74           857.5      63.86
Male (ZZ)    BHScqzDAPDBBAPE     164        2      41.7    100         116.58     11.66      95           101.6      9.62
Male (ZZ)    BHScqzDAPDBBCPE     155        1      40.8    100         75.53      7.55       81           63.5       5.14
Male (ZZ)    BHScqzDAPDIAAPE     477        2      39.8    100         135.44     13.54      83           104.3      8.66
Male (ZZ)    BHScqzDAPDIBAPE     501        1      39.6    100         68.44      6.84       83           51.2       4.22
Male (ZZ)    BHScqzDAPDMAAPE     752        1      39.4    100         50.92      5.09       93           40.4       3.74
Male (ZZ)    BHScqzDAPDWAAPE     2,041      2      41.8    44          118.91     5.23       39           104.9      4.09
Male (ZZ)    BHScqzDAPDWBAPE     2,249      1      41.8    44          59.57      2.62       44           53.0       2.33
Male (ZZ)    BHScqzDAQDLAAPE     5,029      3      40.4    44          175.92     7.74       41           128.7      5.23
Male (ZZ)    BHScqzDAQDTAAPE     10,045     1      40.9    44          62.63      2.76       44           43.6       1.92
Male (ZZ)    BHScqzDBCDUAAPE     20,648     1      39.7    44          56.08      2.47       44           14.0       0.62
Male (ZZ)    BHScqzDBCDUBBPE     22,336     1      42.2    44          53.57      2.36       44           24.9       1.10
Male (ZZ)    All libraries       /          16     40.8    70          973.58     67.86      64           730.0      46.67

* Average length of raw reads

** Average length of usable reads

Supplementary Table 2. Summary of usable data of the tongue sole genome. All

reads were generated by Illumina paired-end sequencing. For our calculation of sequence

coverage and physical coverage, we assumed a genome size of 545 Mb and 495 Mb for

female and male genome, respectively.

Sample       Paired-end      Paired-end        Libraries  Lanes  Average reads  Sequence      Physical
             libraries (bp)  insert size (bp)                    length (bp)    coverage (X)  coverage (X)
Female (ZW)  200             164~175           4          4      100.0          42            36
Female (ZW)  500             ~471              2          3      100.0          29            69
Female (ZW)  800             ~765              1          2      100.0          13            48
Female (ZW)  2000            2,204~2,424       2          3      44.0           13            337
Female (ZW)  5000            4,934~4,938       1          2      44.0           8             443
Female (ZW)  10000           8,546~9,104       2          2      46.0           5             509
Female (ZW)  20000           ~19,912           1          1      44.0           2             352
Female (ZW)  40000           34,419~34,467     2          2      49.0           5             1,781
Female (ZW)  Total           164~34,467        15         19     74.5           117           3,575
Male (ZZ)    200             155~169           2          3      89.0           30            27
Male (ZZ)    500             475~501           2          3      83.0           26            76
Male (ZZ)    800             ~752              1          1      92.0           8             31
Male (ZZ)    2000            2,033~2,249       2          3      41.0           13            337
Male (ZZ)    5000            5,024~5,037       1          3      41.0           11            654
Male (ZZ)    10000           ~10,045           1          1      44.0           4             442
Male (ZZ)    20000           20,648~22,336     2          2      44.0           3             854
Male (ZZ)    Total           155~22,336        11         16     63.9           95            2,421
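The coverage columns can be reproduced approximately from the per-library values in Supplementary Table 1. The sketch below assumes the conventional definitions, which the legend does not spell out: sequence coverage is usable bases divided by the assumed genome size, and physical coverage is the number of read pairs multiplied by the insert size, divided by the genome size. The function names are hypothetical; the numbers are the four female short-insert (200 bp class) libraries from Supplementary Table 1.

```python
from typing import Iterable, Tuple


def sequence_coverage(usable_bases_gb: Iterable[float], genome_mb: float) -> float:
    """Total usable bases (Gb, converted to Mb) divided by genome size (Mb)."""
    return sum(usable_bases_gb) * 1000.0 / genome_mb


def physical_coverage(libraries: Iterable[Tuple[float, float]], genome_mb: float) -> float:
    """Span of all read pairs divided by genome size.

    Each library is (usable reads in millions, insert size in bp);
    two reads make one pair, so pairs = reads / 2 (an assumption).
    """
    spanned_mb = sum(reads_m / 2.0 * insert_bp for reads_m, insert_bp in libraries)
    return spanned_mb / genome_mb


FEMALE_GENOME_MB = 545.0  # genome size assumed in the legend

# (usable reads in M, insert size in bp, usable bases in G) from Supplementary Table 1
female_200 = [(72.2, 164, 7.22), (52.6, 165, 5.26), (57.5, 175, 5.75), (47.6, 172, 4.76)]

seq_cov = sequence_coverage((bases for _, _, bases in female_200), FEMALE_GENOME_MB)
phys_cov = physical_coverage(((reads, ins) for reads, ins, _ in female_200), FEMALE_GENOME_MB)
```

Rounded to whole numbers, these two values agree with the 200 bp row for the female sample. The totals in the table appear to be sums of the rounded per-class values, so recomputing them from unrounded inputs can differ by about one unit.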

Supplementary Table 3. Summary result of the tongue sole genome assembly by

SOAPdenovo.

Type  Contig size (bp)  Contig num.  Scaffold size (bp)  Scaffold num.

N95 1,318 28,715 38,905 726

N90 4,419 20,032 272,602 526

N80 9,564 13,208 493,436 400

N70 14,668 9,388 627,811 315

N60 20,264 6,762 734,563 244

N50 26,524 4,806 867,956 185

N40 33,954 3,295 993,045 133

N30 43,078 2,108 1,132,664 88

N20 55,534 1,179 1,409,670 50

N10 75,365 467 1,818,425 20

Longest 194,815 1 4,694,140 1

Total 453,103,890 113,432 477,207,161 80,677
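For readers unfamiliar with the N-statistics in this table, the following is a minimal sketch of how an Nx value and its accompanying sequence count are commonly computed. It follows the usual definition (the shortest length in the smallest set of longest sequences that together cover at least x% of the total) and is not taken from SOAPdenovo; the function name `nx_stat` is hypothetical and the lengths are toy values.

```python
from typing import Iterable, Tuple


def nx_stat(lengths: Iterable[int], x: int) -> Tuple[int, int]:
    """Return (Nx length, number of sequences needed to reach x% of the total)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered or not 0 < x <= 100:
        raise ValueError("need at least one length and 0 < x <= 100")
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count
    raise AssertionError("unreachable: the full set always covers 100%")


toy_lengths = [10, 8, 6, 4, 2]
n50 = nx_stat(toy_lengths, 50)
n90 = nx_stat(toy_lengths, 90)
```

Under this definition, the "Num." column corresponds to the second element of the returned pair.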

Supplementary Table 4. Number of markers and total scaffold size for each

chromosome. We anchored Z-linked scaffolds from male assembly and autosomal

scaffolds from female assembly onto Z chromosome and autosomes (chr1~20).

Chr.  # SSR  # RAD-tag  # Contigs  Contig len. (bp)  # Scaffolds  Scaffold len. (bp)  Source  # Genes

1 81 1,184 2,410 32,791,084 53 34,529,112 Female 1,490

2 40 1,288 1,227 19,259,417 29 20,052,734 Female 911

3 29 810 1,189 15,467,848 25 16,253,993 Female 596

4 85 949 1,263 19,377,156 31 20,014,501 Female 871

5 43 777 1,147 18,609,661 29 19,279,693 Female 706

6 30 865 1,270 18,113,957 29 18,841,016 Female 978

7 54 642 993 13,185,383 15 13,814,722 Female 645

8 53 825 2,144 28,615,567 37 30,153,790 Female 1,397

9 50 703 1,314 18,790,677 31 19,618,599 Female 1,029

10 46 454 1,507 20,081,642 33 21,015,569 Female 1,037

11 42 553 1,428 19,676,390 34 20,528,432 Female 1,047

12 40 517 1,349 17,485,432 35 18,398,590 Female 745

13 43 323 1,518 20,959,882 34 21,922,143 Female 946

14 50 639 1,782 27,668,722 47 28,847,931 Female 1,228

15 46 430 1,478 19,132,837 32 20,094,621 Female 779

16 40 484 1,252 17,874,443 29 18,785,820 Female 814

17 38 246 1,333 15,583,495 25 16,472,647 Female 984

18 28 226 1,092 14,404,870 22 15,207,555 Female 783

19 33 54 1,108 17,115,378 24 17,747,288 Female 847

20 34 89 1,036 14,355,002 18 15,234,830 Female 881

Z 37 53 2,044 20,757,346 26 21,915,962 Male 926

W NA NA 2,436 13,020,023 306 16,461,726 Female 317

Total 942 12,111 32,320 422,326,212 944 445,191,274 NA 19,957

Supplementary Table 5. Validation of the Z-linked scaffolds. One to four genes from each

scaffold preliminarily classified as putatively Z-linked were used to confirm the

depth ratio between male and female. Oligonucleotide primers were designed with Primer

Premier 5.

Scaffold ID  Gene ID  M:F  Avg.  Primer sequence (5′-3′)

scaffold100

Cse_R017172 2.134

2.092

1109-F: TCACAGCAGGGCTCACTTCAT

1109-R: ACATTGTGGCTGCGGTTGG

Cse_R016987 2.418 1067-F:TGTTCGTCCCAGCCAAACC

1067-R:CTGCTCCCTCCTTCTGTCCC

Cse_R016598 1.723 1068-F:GGCTCAATGTCAGAACACCAAA

1068-R:GTCCGAGCAGAAGGAGGTAAAT

scaffold569 Cse_R016528 2.135 2.135 1865-F:CCCTGGCTGTCAGCACGATA

1865-R:GGTGGAGGCTTGCAGATGTTA

scaffold55

Cse_R016709 2.130

2.211

2555-F: CAGATTTCACAGTCATCCACCAA

2555-R: AGCATTCCCGCAGTTTCGT

Cse_R017120 2.561 2631-F: ATCATCCAAGGACTGCCTCAAA

2631-R: CGGTCCCTACGTGGGAGTAAA

Cse_R016880 1.943

2626-F:GGTCGCCATCTTCCAGTTCC

2626-R:CTCCACCCTGGTCGTTCCTC

scaffold116

Cse_R016422 2.082

2.119

3668-F: CTTATCAGCCGCTCCAAACAG

3668-R: AGAGGCCATCCTCATCTACCAT

Cse_R016983 2.143 3675-F: TCACAACCCAACCAACGACG

3675-R: ACCCTCAGACCCTCCACGAA

Cse_R016526 2.134 3685-F: TGGGAGGGAAATCACAGGTC

3685-R: TCTGGAGCGCATTTAGGGAC

Scaffold710 Cse_R022132 1.997

2.023

1971-F:CGAGAGGCTGCGAGACAAACT

1971-R:GCGTCTGGGATGGCTCTTTT

Cse_R016996 1.851

1981-F:GTAAACATTCCCACAAACAACA

1981-R:AGTCCTGGAGGTGAAGGCAC

Cse_R017284 2.179

1983-F:TGCGAGTAAGACGGAACCAA

1983-R:GTGTTGCGAGTGAAAGGAGA

Cse_R016667 2.068 1984-F:CTTCAAACAAGCCTTTCCTG

1984-R:TTCGTCCGAGTCTATGTGCC

Scaffold676 Cse_R016625 2.310

2.327

5311-F:GAGGCCCTGCATCAATCTGTAT

5311-R:TCTAGGTGGAGTCTGGTGCGTAT

Cse_R017228 2.344 5333-F:TCCACAGCCTGCTTAGTCTTGC

5333-R:CCACATCCTTTGACTGCTGCTC

scaffold246

Cse_R017132 1.817

2.085

0018-F: GGACTCCTGGTCCAGCAGTAAGT

0018-R: CAGCTATGAAGGCAGATTGTCTTTT

Cse_R017325 2.138 0021-F: ACCACTGAGAAGGAGGGTTCG

0021-R: CGGGTTGAATTTGGCAAGAGT

Cse_R016499 1.824 0050-F: AGTCCTGGGTCAAAGCATTATCT

0050-R: CCTGTAGCCTCCTGAATCTCCT

Cse_R017033 2.562 0109-F: CTTTATTGCCAGACCTCAACATG

0109-R: TTACAAGCCACTGAAAGGATTACC

scaffold317

Cse_R017274 2.117

2.144

0546-F: GGATACTCCTGCTTGACACCAA

0546-R: GTGATGAGTTTACTTTCACCCTCC

Cse_R016718 2.172 0568-F:TGATCGTAGTGTTCCTGCCTCT

0568-R:GTACCTGGCGACAACATAGAGTTT

scaffold553

Cse_R016442 2.077

2.086

0745-F: CCGTCGGAGAAGAACTTGAGC

0745-R: TGGTGGAGGAGATGGTGTCG

Cse_R017143 2.096 0750-F:CCAGACACTCAGGGACAATGAA

0750-R:CCACTGTTGGCTTTGGACGA

Scaffold76 Cse_R016569 2.047 2.047 0723-F:GCGCTGTCTCGGTAAATCTCA

0723-R:GTAGAGTGACATACGGAGGCTGAC

Scaffold813 Cse_R016596 2.234

2.153

9160-F:GTTCAGGATGGATGGAGGCAG

9160-R:AGGAAGCTCTACACCACCGAGA

Cse_R016573 2.072 9165-F:CCTCCTTATGTTTGCCATCTCC

9165-R:CAATGAGGAAGGCTCCAGTGAT

Scaffold351 Cse_R016747 1.976 1.976

9827-F:GGTGGGCTTTCAGATGGGAT

9827-R:TGGTGGCGATGATTCTACTGG

Scaffold96 Cse_R017130 2.114

2.036

2278-F:GTAGGGATCGACCTGGTGTAGT

2278-R:GTGATTGGCGATGCGTAGAT

Cse_R016841 2.281

2294-F:TTCACATCTTCCCTTTCGTCAT

2294-R:CTTTGGTAGCTTTGCAGTCC

Cse_R017352 1.718

2306-F:AATGGCTCGTTGTATGACTTCC

2306-R:TCCGTTCCTCTTCACCAGTATG

Scaffold893 Cse_R016550 2.117

2.010

0424-F:ACGCTCGTGTTCATTAGATGTGG

0424-R:CAGTTTTCTCCTGTCCTCGGTC

Cse_R016660 1.903 0428-F:CCTGGTTCGTTCCTGCTTTG

0428-R:CTATTGGGATGCCCGCTTT

scaffold980 Cse_R016633 2.182 2.204 2687-F: TGGATGCCTGTAAAGTTATGGGTA

2687-R: TGAGTTCGCTTGGTTCTGCTG

Cse_R017235 2.300 2703-F: CCTCTTATGTGATGAAGGTGGATG

2703-R: CAGGGAAGGTGTCTTCTGGATAT

Cse_R017263 1.964 2693-F: GCAGGAGAAGGAGGTGAAGAAAT

2693-R: TGGTAGGAGCCTGTGATGATGTT

Cse_R016555 2.370 2717-F: CTTCATCCAGCAGTTCAGTCAGT

2717-R: TGGTCAGAGCCTTTCATTATCTC

Scaffold831 Cse_R017231 2.140

2.206

3354-F:CTGTCTGTCCGTTTACTCCTGAAT

3354-R:AGGTGCTGTCTTCTAACGCCTAC

Cse_R016515 2.272 3357-F:TAAACCTTTCTCCTTCCTGCTTC

3357-R:TTCAGATGACATCAGGGACTGC

Scaffold631 Cse_R016917 2.038

2.105

4929-F:TTGTCAACCTTTCCCACTTCCA

4929-R:TCAACTGAGGCCGGTCTGC

Cse_R016613 1.996 4932-F:CCCTCGTGCAGACTTATCAAAC

4932-R:GGCTTCCGCAACTTCAGTGTA

Cse_R016592 2.281 4933-F:TTGGTAAAGAACAGCTATGTTACCAG

4933-R:AACTCACCTAGCAGCCTTGACC

Scaffold636 Cse_R016897 2.038 2.038

8200-F:CCAGTGTTTCAGCCTTTAACCT

8200-R:GACAGACCAGCGAGTCATTAGG

Scaffold753

Cse_R016867 1.983

1.956

2382-F:CACATCCAGAGGAAACCGCA

2382-R:CATTTGGCCCAAGTCCACAG

Cse_R017111 1.934

2391-F:AAGTCCAATGAGATTCTCCCTCC

2391-R:ACTCCAAACTGAGCCACAACAC

Scaffold1068 Cse_R016793 2.169

1.924

9008-F:GCCCAGACTCAGTGGAGATGC

9008-R:CAAAGCCAACGAGCCAATAAC

Cse_R016933 1.678 9015-F:CTGTGGCAACCCGATTTCTC

9015-R:CCTCCTGTTCTCCCTGCTCC

scaffold120

Cse_R022129 2.382

2.167

4429-F: TCTCCGCTCCATCACGCTC

4429-R: GAAATGACGACGGCCACGA

Cse_R022130 2.152 4430-F:GTGACCCACCAGGACACCC

4430-R:CAGGTTCTCCCGCAGGATC

Cse_R016874 1.966

4442-F:ACCCTATCACCAAAGCCAAGA

4442-R:GGATTTCACAGCCATCACTCA

Scaffold677 Cse_R017185 1.897 1.897

3140-F:ATCCACGACGGTCTGGGTAG

3140-R:GGCTCAAAGCGTTCAAGGG

Scaffold757 Cse_R017286_E3

1.063

1.038

1296-F:TGAAGCAGGTCAGCAGCAGG

1296-R:GTGGAAGCCAACGAAGGGA

Cse_R016652 1.014 1297-F:ACTGGATTTGGAGGACAGAAGC

1297-R:TTAGCAGATTTGGTCGTGGATT

Scaffold589 Cse_R017095 1.228

1.174

6755-F:GCTGCTGGCGTTCTGCTACA

6755-R:CAGGACTTGCGTGCATTTGTC

Cse_R016437 1.121

6757-F:CTGTATGTGACTCCAGACCTCCAC

6757-R:CAGACCCTGACACTCAGTTCCTC
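The "Avg." column is the mean of the per-gene M:F values for each scaffold, which can be checked with a few lines. The sketch below uses three scaffolds copied from the table. The cutoff of 1.5 is a hypothetical midpoint between the ratios expected for a Z-linked region (about 2, two copies in ZZ males versus one in ZW females) and a region with equal copy number in both sexes (about 1); it is an illustrative assumption, not a threshold stated by the authors.

```python
from statistics import mean
from typing import Dict, List, Tuple

# M:F ratios copied from Supplementary Table 5 (three scaffolds only).
ratios: Dict[str, List[float]] = {
    "scaffold100": [2.134, 2.418, 1.723],
    "scaffold569": [2.135],
    "Scaffold757": [1.063, 1.014],
}

Z_LINKED_CUTOFF = 1.5  # hypothetical cutoff, see the note above


def summarize(ratios_by_scaffold: Dict[str, List[float]],
              cutoff: float = Z_LINKED_CUTOFF) -> Dict[str, Tuple[float, bool]]:
    """Map each scaffold to (mean M:F, mean exceeds the cutoff)."""
    summary = {}
    for scaffold, values in ratios_by_scaffold.items():
        avg = mean(values)
        summary[scaffold] = (avg, avg > cutoff)
    return summary


summary = summarize(ratios)
```

With this rule, most scaffolds in the table have averages close to 2, whereas Scaffold757 and Scaffold589 have averages close to 1.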

Supplementary Table 6. Transposable element families that are present in the

genome of the tongue sole.

LTR families  LINE families          TIR families   Other families
Gypsy/Sushi   R2 (REL endonuclease)  TcMariner/Tc1  Penelope
Gypsy/Rodin   RTE/Rex3-BovB          Hat/Ac
Gypsy/LreO3   Rex/Babar              Hat/Tol2
BEL/Suzu      CR1                    Buster

Supplementary Table 7. Percentage of the tongue sole genome masked as each class

of transposable elements.

Class                 SINE   LINE   LTR    DNA transposons  Unclassified  Total
Percentage in genome  0.22%  1.04%  0.08%  2.45%            2.06%         5.85%

Supplementary Table 8. Copy number of each TE family. Result2: the total number of

copies, with each copy counted only once. Result4: the number of copies that are at least

80% as long as the reference, where the reference size is the length of the corresponding

element in the de novo library. Result5: as Result4, with a 50% threshold. Result6: as

Result4, with a 30% threshold.

Family  Result2 (all)  Result4 (80%)  Yield (80%)  Result5 (50%)  Yield (50%)  Result6 (30%)  Yield (30%)

DNA 1,430 36 2.52 53 3.71 75 5.24

DNA/En-Spm 1,884 581 30.84 917 48.67 1,156 61.36

DNA/Harbinger 302 155 51.32 233 77.15 283 93.71

DNA/Hat 3,062 693 22.63 1,086 35.47 1,472 48.07

DNA/Hat-Ac 976 443 45.39 677 69.36 834 85.45

DNA/Hat-Charlie 39,152 11,032 28.18 21,582 55.12 29,898 76.36

DNA/Hat-Tip100 673 162 24.07 334 49.63 498 74.00

DNA/Hat-Tol2 155 53 34.19 84 54.19 114 73.55

DNA/Helitron 91 52 57.14 74 81.32 80 87.91

DNA/PiggyBac 33 19 57.58 24 72.73 28 84.85

DNA/Sola 849 250 29.45 379 44.64 511 60.19

DNA/TcMar-Fot1 43 15 34.88 29 67.44 38 88.37

DNA/TcMar-Tc1 5,489 1,064 19.38 1,989 36.24 3,061 55.77

DNA/TcMar-Tc2 8,260 2,643 32.00 4,568 55.30 5,904 71.48

DNA/TcMar-Tigger 6,301 1,783 28.30 3,421 54.29 4,665 74.04

LINE/L2 1,237 290 23.44 437 35.33 616 49.80

LINE/Penelope 4,594 636 13.84 1,090 23.73 1,858 40.44

LINE/R2 44 20 45.45 34 77.27 40 90.91

LINE/Rex1 291 14 4.81 27 9.28 47 16.15

LINE/Rex-Babar 2,748 875 31.84 1,502 54.66 1,999 72.74

LINE/RTE 5,040 3,159 62.68 4,119 81.73 4,603 91.33

LINE/RTE-BovB 8,019 780 9.73 1,705 21.26 3,137 39.12

Low_complexity 142,314 141,677 99.55 142,171 99.90 142,253 99.96

LTR 12 11 91.67 11 91.67 11 91.67

LTR/Copia 22 16 72.73 19 86.36 21 95.45

LTR/ERV 174 0 0.00 0 0.00 26 14.94

LTR/ERV1 420 221 52.62 313 74.52 367 87.38

LTR/ERVK 39 1 2.56 1 2.56 1 2.56

LTR/ERVL 19 15 78.95 17 89.47 19 100.00

LTR/Gypsy 977 303 31.01 439 44.93 512 52.41

LTR/Pao 21 17 80.95 20 95.24 20 95.24

LTR/Viper 20 7 35.00 9 45.00 11 55.00

Satellite 263 4 1.52 4 1.52 8 3.04

Simple_repeat 198,234 187,463 94.57 190,866 96.28 192,615 97.17

SINE 7,384 2,833 38.37 4,485 60.74 6,282 85.08

SINE? 2,276 224 9.84 1,238 54.39 1,712 75.22

SINE/Alu 17 5 29.41 9 52.94 15 88.24

SINE/Trna-Lys 160 11 6.88 143 89.38 159 99.38

SINE/V 31 2 6.45 11 35.48 31 100.00

snRNA 100 13 13.00 26 26.00 55 55.00

Unknown 79,164 31,238 39.46 50,920 64.32 65,857 83.19
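The Yield columns are consistent with a simple percentage: the number of copies passing a length threshold divided by the total number of copies (Result2). The legend does not define Yield explicitly, so this reading is an inference from the values; the helper name `yield_percent` is hypothetical, and the two rows are copied from the table.

```python
from typing import Dict, Tuple


def yield_percent(copies_passing: int, copies_total: int) -> float:
    """Percentage of all copies that pass a length threshold, to 2 decimals."""
    if copies_total <= 0:
        raise ValueError("copies_total must be positive")
    return round(100.0 * copies_passing / copies_total, 2)


# (Result2, Result4, Result5, Result6) for two families from the table.
rows: Dict[str, Tuple[int, int, int, int]] = {
    "DNA": (1430, 36, 53, 75),
    "DNA/En-Spm": (1884, 581, 917, 1156),
}

yields = {
    family: tuple(yield_percent(n, counts[0]) for n in counts[1:])
    for family, counts in rows.items()
}
```

For these two rows the computed percentages match the tabulated Yield (80%), Yield (50%) and Yield (30%) values.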

Supplementary Table 9. Summary of the copy number of each TE class. Result2: the total

number of copies, with each copy counted only once. Result4: the number of copies that are

at least 80% as long as the reference, where the reference size is the length of the

corresponding element in the de novo library. Result5: as Result4, with a 50% threshold.

Result6: as Result4, with a 30% threshold.

Class  Result2 (all)  Result4 (80%)  Yield (80%)  Result5 (50%)  Yield (50%)  Result6 (30%)  Yield (30%)

DNA 68,700 18,981 27.63 35,450 51.60 48,617 70.77

LINE 21,973 5,774 26.28 8,914 40.57 12,300 55.98

LTR 1,704 591 34.68 829 48.65 988 57.98

SINE 9,868 3,075 31.16 5,886 59.65 8,199 83.09

Supplementary Table 10. Transcriptome sequencing data statistics.

Sample                     Accession no.  # of library  # of read pairs  Read1 len. (bp)  Read2 len. (bp)  # of basepairs (G)  % mapped on genome
Testis (ZZ testis P)       SRX106096      1             8,847,831        73               75               1.31                90.18
Ovary (ZW ovary F1)        SRX106097      1             8,897,059        73               75               1.32                88.68
Testis (ZW testis F1)      SRX106098      1             6,671,850        90               90               1.20                79.01
Testis (ZW testis F2)      SRX106099      1             9,827,201        90               90               1.77                73.79
Whole body (female pre-)   SRX106100      1             11,983,356       90               90               2.16                77.72
Whole body (female post-)  SRX106101      1             11,989,388       90               90               2.16                75.48
Whole body (male)          SRX106103      1             13,657,075       90               90               2.46                89.98
Total/Average              -              7             71,873,760       -                -                12.38               82.12

Supplementary Table 11. Statistics of homology-based gene sets using proteins from

different species as parent proteins.

                                              Medaka   Takifugu  Tetraodon  Stickleback  Zebrafish  Human
# of parent proteins from Ensembl             19,671   18,507    19,583     20,772       24,046     22,402
# of transcripts after rough alignment        488,444  418,643   403,897    431,484      592,590    543,494
# of transcripts after precise alignment      138,720  103,190   103,719    116,755      163,582    117,280
# of transcripts after transcript clustering  17,228   16,876    17,262     17,675       16,593     13,609
# of transcripts after filtering pseudogenes  15,191   16,221    16,751     17,158       16,237     13,263

Supplementary Table 12. General statistics of each gene set. Gene length included the

exon and intron regions but excluded UTRs.

Gene set        Number  Average transcript  Average CDS  # of exons  Average exon  Average intron
                        length (bp)         length (bp)  per gene    length (bp)   length (bp)
Homology-based  18,284  9,252               1,595        9.4         169           910
RNA-seq         30,253  5,383               1,054        5.6         189           945
De novo         27,327  11,052              1,908        11.8        161           844
Reference       21,516  8,575               1,462        8.7         168           925

Supplementary Table 13. General statistics of non-coding RNA genes.

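| Type | Subtype | Copy # | Average length (bp) | Total length (bp) | % of genome |
| --- | --- | --- | --- | --- | --- |
| miRNA | | 285 | 91 | 25,898 | 0.005 |
| tRNA | | 674 | 77 | 52,204 | 0.109 |
| rRNA | Total | 104 | 107 | 11,175 | 0.002 |
| rRNA | 18S | 39 | 118 | 4,604 | 0.001 |
| rRNA | 28S | 32 | 107 | 3,432 | 0.001 |
| rRNA | 5.8S | 1 | 40 | 40 | 0.000 |
| rRNA | 5S | 32 | 97 | 3,099 | 0.001 |
| snRNA | snRNA | 221 | 128 | 28,381 | 0.006 |
| snRNA | CD-box | 105 | 97 | 10,142 | 0.002 |
| snRNA | HACA-box | 46 | 152 | 7,004 | 0.001 |
| snRNA | splicing | 62 | 161 | 9,953 | 0.002 |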

Supplementary Table 14. Differentially expressed genes between pre- and

post-metamorphosis. (see Excel file ‘Supplementary Table 14.xls’)

Supplementary Table 15. Enrichment of GO terms in differentially expressed genes between pre- and post-metamorphosis.

GO_ID GO_Term GO_Class Gene no. Pvalue AdjustedPv

GO:0030246 carbohydrate binding MF 48 6.25E-33 3.09E-30

GO:0006959 humoral immune response BP 20 7.80E-17 8.68E-15

GO:0004866 endopeptidase inhibitor activity MF 26 2.12E-16 2.30E-14

GO:0042627 chylomicron CC 8 1.41E-12 8.87E-11

GO:0016491 oxidoreductase activity MF 53 3.34E-12 1.84E-10

GO:0004867 serine-type endopeptidase inhibitor activity MF 15 4.65E-11 2.28E-09

GO:0043691 reverse cholesterol transport BP 10 4.89E-11 2.37E-09

GO:0005579 membrane attack complex CC 7 5.03E-11 2.39E-09

GO:0030300 regulation of intestinal cholesterol absorption BP 7 5.03E-11 2.39E-09

GO:0005506 iron ion binding MF 21 1.19E-10 5.31E-09

GO:0046486 glycerolipid metabolic process BP 26 1.64E-10 7.07E-09

GO:0042157 lipoprotein metabolic process BP 18 1.65E-10 7.07E-09

GO:0006721 terpenoid metabolic process BP 13 1.72E-10 7.30E-09

GO:0016064 immunoglobulin mediated immune response BP 14 2.21E-10 9.06E-09

GO:0031526 brush border membrane CC 12 2.47E-10 1.00E-08

GO:0034361 very-low-density lipoprotein particle CC 8 4.22E-10 1.66E-08

GO:0034369 plasma lipoprotein particle remodeling BP 9 5.24E-10 1.99E-08

GO:0001523 retinoid metabolic process BP 12 6.15E-10 2.32E-08

GO:0042445 hormone metabolic process BP 19 1.51E-07 3.05E-06

GO:0007596 blood coagulation BP 34 1.82E-07 3.63E-06

GO:0002449 lymphocyte mediated immunity BP 15 1.91E-07 3.80E-06

GO:0015908 fatty acid transport BP 12 2.01E-07 3.98E-06

GO:0032052 bile acid binding MF 6 2.09E-07 4.11E-06

GO:0030195 negative regulation of blood coagulation BP 8 2.33E-07 4.54E-06

GO:0045923 positive regulation of fatty acid metabolic process BP 9 2.64E-07 5.10E-06

GO:0006644 phospholipid metabolic process BP 22 2.91E-07 5.56E-06

GO:0006766 vitamin metabolic process BP 17 2.94E-07 5.59E-06

GO:0009629 response to gravity BP 4 4.74E-07 6.46E-05

GO:0051234 establishment of localization BP 150 6.30E-07 1.13E-05

GO:0042439 ethanolamine-containing compound metabolic process BP 11 6.42E-07 1.14E-05

GO:0050878 regulation of body fluid levels BP 38 6.60E-07 1.15E-05

GO:0010817 regulation of hormone levels BP 27 7.31E-07 1.25E-05

GO:0050996 positive regulation of lipid catabolic process BP 8 7.58E-07 1.29E-05

GO:0010743 regulation of macrophage derived foam cell differentiation BP 7 7.88E-07 1.34E-05

GO:0000302 response to reactive oxygen species BP 7 8.53E-07 9.81E-05

GO:0009416 response to light stimulus BP 9 1.04E-06 9.81E-05

GO:0010898 positive regulation of triglyceride catabolic process BP 5 2.33E-06 3.50E-05

GO:0014070 response to organic cyclic compound BP 9 2.72E-06 0.000230196

GO:0009612 response to mechanical stimulus BP 7 3.17E-06 0.000255324

GO:0046581 intercellular canaliculus CC 5 4.17E-06 6.04E-05

GO:0051918 negative regulation of fibrinolysis BP 5 4.17E-06 6.04E-05

GO:0009628 response to abiotic stimulus BP 13 4.21E-06 0.000310857

GO:0032870 cellular response to hormone stimulus BP 11 4.38E-06 0.000310857

GO:0042730 fibrinolysis BP 6 5.27E-06 7.45E-05

GO:0019430 removal of superoxide radicals BP 6 5.27E-06 7.45E-05

GO:0030449 regulation of complement activation BP 4 5.88E-06 8.18E-05

GO:0051156 glucose 6-phosphate metabolic process BP 4 1.34E-05 0.000170447

GO:0009743 response to carbohydrate stimulus BP 16 1.35E-05 0.000170447

GO:0006711 estrogen catabolic process BP 3 1.62E-05 0.000193963

GO:0007565 female pregnancy BP 6 2.07E-05 0.001110553

GO:0019433 triglyceride catabolic process BP 6 3.02E-05 0.000335585

GO:0009991 response to extracellular stimulus BP 9 4.24E-05 0.002214245

GO:0031100 organ regeneration BP 4 0.000110264 0.004657322

GO:0060674 placenta blood vessel development BP 3 0.000136057 0.005613122

GO:0006694 steroid biosynthetic process BP 13 0.000186357 0.001615211

GO:0002755 MyD88-dependent toll-like receptor signaling pathway BP 4 0.000215013 0.00794651

GO:0048545 response to steroid hormone stimulus BP 23 0.00022706 0.0019231

GO:0070412 R-SMAD binding MF 3 0.000231981 0.008398676

GO:0034142 toll-like receptor 4 signaling pathway BP 4 0.000288057 0.00964035

GO:0004181 metallocarboxypeptidase activity MF 6 0.000306543 0.002474005

GO:0005577 fibrinogen complex CC 4 0.000333554 0.002682283

GO:0042542 response to hydrogen peroxide BP 4 0.000447005 0.013440461

GO:0009888 tissue development BP 15 0.000462133 0.013663731

GO:0009725 response to hormone stimulus BP 38 0.000466229 0.003637563

GO:0055085 transmembrane transport BP 53 0.000626006 0.004700989

GO:0003051 angiotensin-mediated drinking behavior BP 2 0.000641572 0.004700989

GO:0002019 regulation of renal output by angiotensin BP 2 0.000641572 0.004700989

GO:0060136 embryonic process involved in female pregnancy BP 2 0.000653734 0.018275894

GO:0042573 retinoic acid metabolic process BP 4 0.000746605 0.005373383

GO:0043065 positive regulation of apoptotic process BP 9 0.001033357 0.024701008

GO:0012501 programmed cell death BP 15 0.001063105 0.024701008

GO:0001661 conditioned taste aversion BP 2 0.001124652 0.024701008

GO:0032496 response to lipopolysaccharide BP 5 0.00114176 0.024701008

GO:0007612 learning BP 4 0.001249489 0.026441915

GO:0048146 positive regulation of fibroblast proliferation BP 3 0.001266946 0.026441915

GO:0071383 cellular response to steroid hormone stimulus BP 4 0.001367844 0.0275745

GO:0001938 positive regulation of endothelial cell proliferation BP 3 0.001405245 0.02777252

GO:0061113 pancreas morphogenesis BP 2 0.001892322 0.011692505

GO:0046983 protein dimerization activity MF 10 0.002031371 0.035679729

GO:0051403 stress-activated MAPK cascade BP 5 0.002055217 0.035744651

GO:0008146 sulfotransferase activity MF 3 0.002748172 0.042028079

GO:0048731 system development BP 27 0.002790516 0.042310897

GO:0045087 innate immune response BP 6 0.003042499 0.045356244

GO:0003707 steroid hormone receptor activity MF 3 0.003203879 0.046244792

GO:0048856 anatomical structure development BP 29 0.003258511 0.046244792

GO:0043401 steroid hormone mediated signaling pathway BP 3 0.003448075 0.047830647

GO:0006916 anti-apoptosis BP 5 0.003471181 0.047830647

GO:0006915 apoptotic process BP 14 0.003657897 0.049159912

GO:0019934 cGMP-mediated signaling BP 3 0.005871181 0.02999554

GO:0006107 oxaloacetate metabolic process BP 3 0.005871181 0.02999554

GO:0015671 oxygen transport BP 3 0.005871181 0.02999554

GO:0019841 retinol binding MF 3 0.005871181 0.02999554

GO:0050308 sugar-phosphatase activity MF 3 0.005871181 0.02999554

GO:0009235 cobalamin metabolic process BP 2 0.00609787 0.030353083

GO:0009268 response to pH BP 4 0.006630306 0.03263869

GO:0009651 response to salt stress BP 4 0.008361563 0.03969594

GO:0004745 retinol dehydrogenase activity MF 3 0.008451289 0.039799676

GO:0005859 muscle myosin complex CC 3 0.009953381 0.045386196

GO:0006111 regulation of gluconeogenesis BP 3 0.009953381 0.045386196

Supplementary Table 16. Enrichment of GO terms in down-regulated genes in

post-metamorphosis.

GO_ID GO_Term GO_Class Gene no. Pvalue AdjustedPv

GO:0051412 response to corticosterone stimulus BP 6 4.03E-09 3.77E-06

GO:0048545 response to steroid hormone stimulus BP 13 7.24E-09 3.77E-06

GO:0051384 response to glucocorticoid stimulus BP 9 9.17E-09 3.77E-06

GO:0032570 response to progesterone stimulus BP 6 1.09E-08 3.77E-06

GO:0010035 response to inorganic substance BP 13 1.10E-08 3.77E-06

GO:0071277 cellular response to calcium ion BP 5 3.77E-08 8.37E-06

GO:0051591 response to cAMP BP 7 5.99E-08 1.18E-05

GO:0042493 response to drug BP 12 8.45E-08 1.50E-05

GO:0009314 response to radiation BP 11 2.94E-07 4.74E-05

GO:0009605 response to external stimulus BP 18 3.95E-07 5.84E-05

GO:0009629 response to gravity BP 4 4.74E-07 6.46E-05

GO:0009725 response to hormone stimulus BP 15 7.62E-07 9.65E-05

GO:0000302 response to reactive oxygen species BP 7 8.53E-07 9.81E-05

GO:0009416 response to light stimulus BP 9 1.04E-06 9.81E-05

GO:0046022 positive regulation of transcription from RNA polymerase II promoter during mitosis BP 3 1.11E-06 9.81E-05

GO:0014070 response to organic cyclic compound BP 9 2.72E-06 0.000230196

GO:0009612 response to mechanical stimulus BP 7 3.17E-06 0.000255324

GO:0010033 response to organic substance BP 19 3.60E-06 0.000277463

GO:0009628 response to abiotic stimulus BP 13 4.21E-06 0.000310857

GO:0032870 cellular response to hormone stimulus BP 11 4.38E-06 0.000310857

GO:0051592 response to calcium ion BP 6 4.95E-06 0.000337525

GO:0070887 cellular response to chemical stimulus BP 16 8.23E-06 0.000540913

GO:0070482 response to oxygen levels BP 8 1.14E-05 0.000672302

GO:0010038 response to metal ion BP 8 1.75E-05 0.000969469

GO:0007565 female pregnancy BP 6 2.07E-05 0.001110553

GO:0009991 response to extracellular stimulus BP 9 4.24E-05 0.002214245

GO:0051726 regulation of cell cycle BP 12 4.38E-05 0.002220981

GO:0001666 response to hypoxia BP 7 5.90E-05 0.002692604

GO:0034097 response to cytokine stimulus BP 8 5.92E-05 0.002692604

GO:0042221 response to chemical stimulus BP 24 7.17E-05 0.003181564

GO:0003690 double-stranded DNA binding MF 6 7.84E-05 0.003392193

GO:0031100 organ regeneration BP 4 0.000110264 0.004657322

GO:0060674 placenta blood vessel development BP 3 0.000136057 0.005613122

GO:0002756 MyD88-independent toll-like receptor signaling pathway BP 4 0.00016522 0.006661359

GO:0034130 toll-like receptor 1 signaling pathway BP 4 0.000174431 0.006876443

GO:0034134 toll-like receptor 2 signaling pathway BP 4 0.000204289 0.00771083

GO:0034138 toll-like receptor 3 signaling pathway BP 4 0.000204289 0.00771083

GO:0002755 MyD88-dependent toll-like receptor signaling pathway BP 4 0.000215013 0.00794651

GO:0070412 R-SMAD binding MF 3 0.000231981 0.008398676

GO:0008063 Toll signaling pathway BP 4 0.000237668 0.008432454

GO:0060395 SMAD protein signal transduction BP 3 0.000280021 0.00964035

GO:0034142 toll-like receptor 4 signaling pathway BP 4 0.000288057 0.00964035

GO:0005667 transcription factor complex CC 7 0.000289006 0.00964035

GO:0043565 sequence-specific DNA binding MF 11 0.000293449 0.00964035

GO:0045655 regulation of monocyte differentiation BP 2 0.00030738 0.009914399

GO:0045944 positive regulation of transcription from RNA polymerase II promoter BP 10 0.000332291 0.010526515

GO:0046332 SMAD binding MF 4 0.000411156 0.012796334

GO:0042542 response to hydrogen peroxide BP 4 0.000447005 0.013440461

GO:0009888 tissue development BP 15 0.000462133 0.013663731

GO:0044092 negative regulation of molecular function BP 10 0.000609477 0.01743892

GO:0060136 embryonic process involved in female pregnancy BP 2 0.000653734 0.018275894

GO:0060711 labyrinthine layer development BP 3 0.000659333 0.018275894

GO:0031668 cellular response to extracellular stimulus BP 5 0.000699928 0.019102655

GO:0045088 regulation of innate immune response BP 5 0.000790853 0.021257177

GO:0071310 cellular response to organic substance BP 12 0.00084939 0.022159084

GO:0035767 endothelial cell chemotaxis BP 2 0.000954015 0.023978928

GO:0001077 RNA polymerase II core promoter proximal region sequence-specific DNA binding transcription factor activity involved in positive regulation of transcription MF 3 0.000959698 0.023978928

GO:0044451 nucleoplasm part CC 10 0.00099865 0.024605634

GO:0031099 regeneration BP 5 0.001021702 0.024701008

GO:0043065 positive regulation of apoptotic process BP 9 0.001033357 0.024701008

GO:0010628 positive regulation of gene expression BP 12 0.001060828 0.024701008

GO:0012501 programmed cell death BP 15 0.001063105 0.024701008

GO:0003700 sequence-specific DNA binding transcription factor activity MF 12 0.001111446 0.024701008

GO:2000278 regulation of DNA biosynthetic process BP 2 0.001124652 0.024701008

GO:0001661 conditioned taste aversion BP 2 0.001124652 0.024701008

GO:0032496 response to lipopolysaccharide BP 5 0.00114176 0.024701008

GO:0007612 learning BP 4 0.001249489 0.026441915

GO:0048146 positive regulation of fibroblast proliferation BP 3 0.001266946 0.026441915

GO:2000108 positive regulation of leukocyte apoptosis BP 2 0.001308811 0.026998038

GO:0006357 regulation of transcription from RNA polymerase II promoter BP 12 0.001334601 0.027213589

GO:0071383 cellular response to steroid hormone stimulus BP 4 0.001367844 0.0275745

GO:0001938 positive regulation of endothelial cell proliferation BP 3 0.001405245 0.02777252

GO:0002573 myeloid leukocyte differentiation BP 4 0.001408978 0.02777252

GO:0040029 regulation of gene expression, epigenetic BP 4 0.001537556 0.029329297

GO:0032501 multicellular organismal process BP 37 0.001597897 0.030144325

GO:0060255 regulation of macromolecule metabolic process BP 27 0.001627004 0.030144325

GO:0003008 system process BP 17 0.00163126 0.030144325

GO:0045893 positive regulation of transcription, DNA-dependent BP 11 0.001790929 0.032419476

GO:0009636 response to toxin BP 4 0.001868987 0.033490731

GO:0007049 cell cycle BP 13 0.001974191 0.035022146

GO:0046983 protein dimerization activity MF 10 0.002031371 0.035679729

GO:0051403 stress-activated MAPK cascade BP 5 0.002055217 0.035744651

GO:0005654 nucleoplasm CC 13 0.002103813 0.035886194

GO:0060716 labyrinthine layer blood vessel development BP 2 0.002178608 0.036808097

GO:0050679 positive regulation of epithelial cell proliferation BP 4 0.002304991 0.038001282

GO:0007275 multicellular organismal development BP 30 0.002307141 0.038001282

GO:0004879 ligand-activated sequence-specific DNA binding RNA polymerase II transcription factor activity MF 3 0.002334915 0.038001282

GO:0043154 negative regulation of cysteine-type endopeptidase activity involved in apoptotic process BP 3 0.002334915 0.038001282

GO:0046697 decidualization BP 2 0.002691962 0.0415904

GO:0007568 aging BP 5 0.002696108 0.0415904

GO:0008146 sulfotransferase activity MF 3 0.002748172 0.042028079

GO:0048731 system development BP 27 0.002790516 0.042310897

GO:0045087 innate immune response BP 6 0.003042499 0.045356244

GO:0003707 steroid hormone receptor activity MF 3 0.003203879 0.046244792

GO:0071478 cellular response to radiation BP 3 0.003203879 0.046244792

GO:0005634 nucleus CC 32 0.003209925 0.046244792

GO:0048856 anatomical structure development BP 29 0.003258511 0.046244792

GO:0043401 steroid hormone mediated signaling pathway BP 3 0.003448075 0.047830647

GO:0006916 anti-apoptosis BP 5 0.003471181 0.047830647

GO:0007184 SMAD protein import into nucleus BP 2 0.003557786 0.048179487

GO:0006915 apoptotic process BP 14 0.003657897 0.049159912

GO:0031323 regulation of cellular metabolic process BP 28 0.00375143 0.049664457

Supplementary Table 17. Enrichment of GO terms in up-regulated genes in

post-metamorphosis.

GO_ID GO_Term GO_Class Gene no. Pvalue AdjustedPv

GO:0008202 steroid metabolic process BP 43 4.19E-39 3.73E-36

GO:0010876 lipid localization BP 40 1.66E-38 1.23E-35

GO:0030246 carbohydrate binding MF 48 6.25E-33 3.09E-30

GO:0071702 organic substance transport BP 63 1.04E-29 3.84E-27

GO:0072376 protein activation cascade BP 23 7.00E-29 2.23E-26

GO:0042180 cellular ketone metabolic process BP 75 6.26E-27 1.86E-24

GO:0044255 cellular lipid metabolic process BP 75 1.58E-26 4.39E-24

GO:0019752 carboxylic acid metabolic process BP 73 2.41E-26 5.65E-24

GO:0006956 complement activation BP 18 8.87E-26 1.97E-23

GO:0006066 alcohol metabolic process BP 55 3.67E-21 5.64E-19

GO:0000267 cell fraction CC 87 8.89E-18 1.04E-15

GO:0006959 humoral immune response BP 20 7.80E-17 8.68E-15

GO:0030299 intestinal cholesterol absorption BP 11 9.60E-15 8.91E-13

GO:0005792 microsome CC 29 1.97E-14 1.68E-12

GO:0017127 cholesterol transporter activity MF 11 1.55E-13 1.17E-11

GO:0044243 multicellular organismal catabolic process BP 13 2.05E-13 1.49E-11

GO:0043498 cell surface binding MF 15 2.27E-13 1.58E-11

GO:0005903 brush border CC 17 2.52E-13 1.72E-11

GO:0009308 amine metabolic process BP 49 2.62E-13 1.77E-11

GO:0065008 regulation of biological quality BP 122 7.95E-13 5.21E-11

GO:0016491 oxidoreductase activity MF 53 3.34E-12 1.84E-10

GO:0050778 positive regulation of immune response BP 26 7.03E-12 3.77E-10

GO:0042221 response to chemical stimulus BP 126 3.31E-11 1.64E-09

GO:0005506 iron ion binding MF 21 1.19E-10 5.31E-09

GO:0001523 retinoid metabolic process BP 12 6.15E-10 2.32E-08

GO:0005496 steroid binding MF 15 3.22E-09 1.01E-07

GO:0006706 steroid catabolic process BP 9 5.69E-09 1.66E-07

GO:0031406 carboxylic acid binding MF 20 5.97E-09 1.73E-07

GO:0006805 xenobiotic metabolic process BP 16 8.82E-09 2.49E-07

GO:0006775 fat-soluble vitamin metabolic process BP 13 1.33E-08 3.39E-07

GO:0032374 regulation of cholesterol transport BP 10 1.40E-08 3.54E-07

GO:0005902 microvillus CC 14 1.43E-08 3.59E-07

GO:0008237 metallopeptidase activity MF 18 1.88E-08 4.57E-07

GO:0019835 cytolysis BP 9 2.55E-08 6.01E-07

GO:0019439 aromatic compound catabolic process BP 9 3.56E-08 8.22E-07

GO:0034754 cellular hormone metabolic process BP 15 5.51E-08 1.23E-06

GO:0006982 response to lipid hydroperoxide BP 5 6.02E-08 1.31E-06

GO:0007597 blood coagulation, intrinsic pathway BP 6 1.07E-07 2.21E-06

GO:0031210 phosphatidylcholine binding MF 6 1.07E-07 2.21E-06

GO:0006776 vitamin A metabolic process BP 10 1.43E-07 2.92E-06

GO:0042445 hormone metabolic process BP 19 1.51E-07 3.05E-06

GO:0006766 vitamin metabolic process BP 17 2.94E-07 5.59E-06

GO:0051234 establishment of localization BP 150 6.30E-07 1.13E-05

GO:0042439 ethanolamine-containing compound metabolic process BP 11 6.42E-07 1.14E-05

GO:0042574 retinal metabolic process BP 6 6.49E-07 1.14E-05

GO:0033194 response to hydroperoxide BP 6 6.49E-07 1.14E-05

GO:0050878 regulation of body fluid levels BP 38 6.60E-07 1.15E-05

GO:0048037 cofactor binding MF 24 7.01E-07 1.22E-05

GO:0010817 regulation of hormone levels BP 27 7.31E-07 1.25E-05

GO:0005788 endoplasmic reticulum lumen CC 15 7.91E-07 1.34E-05

GO:0000303 response to superoxide BP 7 1.11E-06 1.84E-05

GO:0006576 cellular biogenic amine metabolic process BP 17 1.23E-06 2.03E-05

GO:0009617 response to bacterium BP 20 1.27E-06 2.09E-05

GO:0009986 cell surface CC 28 1.93E-06 3.02E-05

GO:0030141 secretory granule CC 18 2.14E-06 3.29E-05

GO:0032101 regulation of response to external stimulus BP 22 3.18E-06 4.74E-05

GO:0008206 bile acid metabolic process BP 9 3.34E-06 4.94E-05

GO:0031983 vesicle lumen CC 9 3.98E-06 5.87E-05

GO:0051918 negative regulation of fibrinolysis BP 5 4.17E-06 6.04E-05

GO:0010885 regulation of cholesterol storage BP 5 4.17E-06 6.04E-05

GO:0006950 response to stress BP 104 4.65E-06 6.69E-05

GO:0015294 solute:cation symporter activity MF 16 4.77E-06 6.84E-05

GO:0042730 fibrinolysis BP 6 5.27E-06 7.45E-05

GO:0019430 removal of superoxide radicals BP 6 5.27E-06 7.45E-05

GO:0019842 vitamin binding MF 17 5.82E-06 8.15E-05

GO:0007584 response to nutrient BP 19 6.46E-06 8.97E-05

GO:0001798 positive regulation of type IIa hypersensitivity BP 3 1.62E-05 0.000193963

GO:0006711 estrogen catabolic process BP 3 1.62E-05 0.000193963

GO:0071944 cell periphery CC 145 1.64E-05 0.000195379

GO:0009607 response to biotic stimulus BP 27 1.69E-05 0.000201647

GO:0042572 retinol metabolic process BP 6 3.02E-05 0.000335585

GO:0052548 regulation of endopeptidase activity BP 16 4.85E-05 0.000516727

GO:0002576 platelet degranulation BP 10 5.13E-05 0.00054259

GO:0005501 retinoid binding MF 6 6.02E-05 0.000628552

GO:0042744 hydrogen peroxide catabolic process BP 5 8.88E-05 0.000867574

GO:0010575 positive regulation of vascular endothelial growth factor production BP 5 8.88E-05 0.000867574

GO:0019825 oxygen binding MF 6 0.000132992 0.001219092

GO:0016298 lipase activity MF 13 0.000134788 0.001230493

GO:0006694 steroid biosynthetic process BP 13 0.000186357 0.001615211

GO:0002526 acute inflammatory response BP 9 0.000188245 0.0016275

GO:0032501 multicellular organismal process BP 195 0.000206683 0.001767317

GO:0048545 response to steroid hormone stimulus BP 23 0.00022706 0.0019231

GO:0051181 cofactor transport BP 5 0.000236341 0.001975423

GO:0005507 copper ion binding MF 8 0.000282264 0.002303086

GO:0006548 histidine catabolic process BP 3 0.000305933 0.002473563

GO:0032488 Cdc42 protein signal transduction BP 3 0.000305933 0.002473563

GO:0005577 fibrinogen complex CC 4 0.000333554 0.002682283

GO:0008201 heparin binding MF 11 0.000509559 0.003961755

GO:0016829 lyase activity MF 14 0.000517595 0.004017219

GO:0033627 cell adhesion mediated by integrin BP 7 0.000568752 0.004338684

GO:0060613 fat pad development BP 2 0.000641572 0.004700989

GO:0055004 atrial cardiac myofibril development BP 2 0.000641572 0.004700989

GO:0042573 retinoic acid metabolic process BP 4 0.000746605 0.005373383

GO:0010744 positive regulation of macrophage derived foam cell differentiation BP 3 0.001213816 0.008180862

GO:0001527 microfibril CC 3 0.001701511 0.011131031

GO:0004771 sterol esterase activity MF 2 0.001892322 0.011692505

GO:0044258 intestinal lipid catabolic process BP 2 0.001892322 0.011692505

GO:0060975 cardioblast migration to the midline involved in heart field formation BP 2 0.001892322 0.011692505

GO:0055005 ventricular cardiac myofibril development BP 3 0.002295769 0.013655073

GO:0009235 cobalamin metabolic process BP 2 0.00609787 0.030353083

GO:0031904 endosome lumen CC 2 0.00609787 0.030353083

GO:0006032 chitin catabolic process BP 2 0.00609787 0.030353083

GO:0090197 positive regulation of chemokine secretion BP 2 0.00609787 0.030353083

GO:0061302 smooth muscle cell-matrix adhesion BP 2 0.00609787 0.030353083

GO:0009268 response to pH BP 4 0.006630306 0.03263869

GO:0000302 response to reactive oxygen species BP 9 0.007174399 0.034854904

GO:0009651 response to salt stress BP 4 0.008361563 0.03969594

GO:0006909 phagocytosis BP 7 0.008366888 0.03969594

GO:0031016 pancreas development BP 12 0.008711503 0.04093855

GO:0042542 response to hydrogen peroxide BP 7 0.008864281 0.041612615

GO:0005859 muscle myosin complex CC 3 0.009953381 0.045386196

GO:0042359 vitamin D metabolic process BP 3 0.009953381 0.045386196

Note: The full lists can be obtained from the authors upon request.

Supplementary Table 18. KEGG metabolic pathway enrichment of differentially expressed genes (DGEs) between pre- and post-metamorphosis.

KO_ID Pvalue Gene Num Description

ko04610 1.36E-32 28 Complement and coagulation cascades

ko04974 3.84E-15 16 Protein digestion and absorption

ko01120 7.52E-07 14 Microbial metabolism in diverse environments

ko04975 6.47E-18 14 Fat digestion and absorption

ko01110 1.49E-03 13 Biosynthesis of secondary metabolites

ko04976 2.12E-10 13 Bile secretion

ko03320 5.92E-09 11 PPAR signaling pathway

ko04151 4.07E-02 10 PI3K-Akt signaling pathway

ko04972 5.90E-07 10 Pancreatic secretion

ko05322 1.75E-08 9 Systemic lupus erythematosus

ko01200 2.13E-04 8 Carbon metabolism

ko05150 1.29E-08 8 Staphylococcus aureus infection

ko05168 6.22E-03 8 Herpes simplex infection

ko00380 7.97E-06 7 Tryptophan metabolism

ko00500 1.02E-06 7 Starch and sucrose metabolism

ko00830 4.87E-07 7 Retinol metabolism

ko00983 4.87E-07 7 Drug metabolism - other enzymes

ko04973 7.87E-08 7 Carbohydrate digestion and absorption

ko04977 1.31E-07 7 Vitamin digestion and absorption

ko05020 7.13E-07 7 Prion diseases

ko05146 4.05E-04 7 Amoebiasis

ko00010 1.64E-04 6 Glycolysis / Gluconeogenesis

ko00600 1.64E-04 6 Sphingolipid metabolism

ko02010 1.64E-04 6 ABC transporters

ko04145 1.25E-02 6 Phagosome

ko04380 1.50E-02 6 Osteoclast differentiation

ko04910 1.25E-02 6 Insulin signaling pathway

ko04920 7.60E-04 6 Adipocytokine signaling pathway

ko05164 3.30E-02 6 Influenza A

ko00260 1.60E-03 5 Glycine, serine and threonine metabolism

ko00330 2.70E-03 5 Arginine and proline metabolism

ko00340 5.92E-05 5 Histidine metabolism

ko00980 8.45E-06 5 Metabolism of xenobiotics by cytochrome P450

ko00982 6.52E-07 5 Drug metabolism - cytochrome P450

ko04260 8.38E-03 5 Cardiac muscle contraction

ko04514 3.52E-02 5 Cell adhesion molecules (CAMs)

ko04530 2.35E-02 5 Tight junction

ko04614 8.45E-06 5 Renin-angiotensin system

ko04668 3.52E-02 5 TNF signaling pathway

ko04670 2.82E-02 5 Leukocyte transendothelial migration

ko04950 2.82E-04 5 Maturity onset diabetes of the young

ko05160 3.72E-02 5 Hepatitis C

ko05204 3.00E-05 5 Chemical carcinogenesis

ko00052 1.35E-03 4 Galactose metabolism

ko00140 3.32E-03 4 Steroid hormone biosynthesis

ko00480 2.38E-03 4 Glutathione metabolism

ko00520 1.18E-02 4 Amino sugar and nucleotide sugar metabolism

ko00561 5.16E-03 4 Glycerolipid metabolism

ko00564 3.48E-02 4 Glycerophospholipid metabolism

ko00591 6.84E-06 4 Linoleic acid metabolism

ko00680 5.15E-04 4 Methane metabolism

ko04512 3.72E-02 4 ECM-receptor interaction

ko04640 1.89E-02 4 Hematopoietic cell lineage

ko04917 3.48E-02 4 Prolactin signaling pathway

ko04918 2.61E-02 4 Thyroid hormone synthesis

ko04930 5.89E-03 4 Type II diabetes mellitus

ko04978 1.65E-03 4 Mineral absorption

ko05031 1.89E-02 4 Amphetamine addiction

ko05133 1.73E-02 4 Pertussis

ko05140 2.42E-02 4 Leishmaniasis

ko00030 1.10E-02 3 Pentose phosphate pathway

ko00051 9.38E-03 3 Fructose and mannose metabolism

ko00120 3.34E-03 3 Primary bile acid biosynthesis

ko00270 3.01E-02 3 Cysteine and methionine metabolism

ko00350 2.44E-02 3 Tyrosine metabolism

ko00590 1.48E-02 3 Arachidonic acid metabolism

ko00620 2.44E-02 3 Pyruvate metabolism

ko00860 1.48E-02 3 Porphyrin and chlorophyll metabolism

ko04612 4.73E-02 3 Antigen processing and presentation

ko05130 2.44E-02 3 Pathogenic Escherichia coli infection

ko05219 4.73E-02 3 Bladder cancer

ko05321 4.00E-02 3 Inflammatory bowel disease (IBD)

ko00053 1.74E-02 2 Ascorbate and aldarate metabolism

ko00360 3.25E-02 2 Phenylalanine metabolism

ko00592 1.33E-02 2 alpha-Linolenic acid metabolism

ko00627 1.74E-02 2 Aminobenzoate degradation

ko00760 4.45E-02 2 Nicotinate and nicotinamide metabolism

ko00770 2.20E-02 2 Pantothenate and CoA biosynthesis

ko00920 3.25E-02 2 Sulfur metabolism

ko00626 2.63E-02 1 Naphthalene degradation

Supplementary Table 19. Positively selected genes involved in the benthic adaptation.

We tested for positive selection using branch-site likelihood ratio tests (LRTs) on genes differentially expressed between pre- and post-metamorphosis fish. To reduce false positives, we further removed genes lacking any positively selected site with a Bayes empirical Bayes posterior probability > 0.95. This left 15 positively selected tongue sole genes (FDR < 0.05).

| ID | Gene name | P value | q value | Expression in pre-metamorphosis | Expression in post-metamorphosis | Pvalue |
| --- | --- | --- | --- | --- | --- | --- |
| Cse_R005509 | xdh | 5.14E-04 | 1.17E-02 | 2.12 | 15.64 | 0.00478 |
| Cse_R006646 | cd74 | 6.87E-06 | 7.04E-06 | 1.38 | 57.02 | 0.00282 |
| Cse_R006647 | cdhr2 | 1.81E-03 | 2.26E-02 | 0.29 | 24.56 | 5.82E-07 |
| Cse_R007133 | slc15a2 | 6.21E-04 | 1.22E-02 | 1.72 | 19.23 | 0.000714 |
| Cse_R010110 | cp | 3.38E-05 | 3.63E-05 | 2.89 | 192.85 | 2.00E-06 |
| Cse_R010893 | mep1b | 4.23E-03 | 3.94E-02 | 1.54 | 37.81 | 0.000106 |
| Cse_R011343 | hnf4a | 2.35E-03 | 2.68E-02 | 0.13 | 13.55 | 0.00840 |
| Cse_R011953 | Mgam | 4.27E-03 | 1.12E-02 | 1.05 | 69.54 | 1.08E-07 |
| Cse_R012520 | fbn1 | 4.65E-06 | 4.89E-06 | 2.04 | 11.80 | 0.0227 |
| Cse_R012542 | pepd | 5.99E-05 | 6.21E-05 | 15.23 | 80.23 | 0.0215 |
| Cse_R017122 | gda | 1.02E-05 | 1.18E-05 | 0.70 | 10.13 | 0.0480 |
| Cse_R017410 | itih2 | 7.24E-05 | 7.40E-05 | 7.82 | 634.28 | 1.01E-07 |
| Cse_R019232 | ace2 | 4.06E-05 | 4.31E-05 | 0.64 | 127.63 | 2.39E-08 |
| Cse_R021489 | cpb1 | 1.98E-03 | 2.39E-02 | 22.36 | 1147.71 | 2.15E-06 |
| Cse_R004565 | tmem67 | 1.55E-11 | 1.74E-11 | 14.43 | 1.19 | 0.00372 |
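
The selection procedure described for Supplementary Table 19 (a likelihood ratio test per gene, a Bayes empirical Bayes site filter at 0.95, and an FDR cutoff of 0.05) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the LRT statistic is compared to a chi-square distribution with 1 degree of freedom, the q-values use the Benjamini-Hochberg procedure (the source does not name the FDR method), both filters are applied together after correction, and the input keys `lnl_null`, `lnl_alt`, and `max_beb` are hypothetical names for the two model log-likelihoods and the highest per-site posterior probability.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """P-value of a likelihood ratio test against chi-square with 1 d.f. (assumed)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def select_genes(genes, fdr=0.05, beb_cutoff=0.95):
    """Keep gene IDs passing both the FDR cutoff and the BEB site filter.

    `genes` is a list of dicts with the hypothetical keys
    'id', 'lnl_null', 'lnl_alt' and 'max_beb'.
    """
    pvals = [lrt_pvalue(g["lnl_null"], g["lnl_alt"]) for g in genes]
    qvals = benjamini_hochberg(pvals)
    return [
        g["id"]
        for g, q in zip(genes, qvals)
        if q < fdr and g["max_beb"] > beb_cutoff
    ]
```

A gene is retained only if its corrected LRT is significant and at least one site exceeds the posterior-probability cutoff; changing the order of filtering and correction would change the q-values, so the ordering here should be read as one possible interpretation of the text.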

Supplementary Table 20. Differential expression of visual genes in tongue sole.

Gene ID Gene name Pre-metamorphosis Post-metamorphosis DESeq Pvalue Cuffdiff Pvalue

Cse_R012558 arl6 6.59 2.70

Cse_R021804 arr3a 5.29 0.00

Cse_R009217 arrb2a 20.36 25.25

Cse_R013257 cngb1a 0.79 0.22

Cse_R009643 crx 8.13 6.53

Cse_R005505 cryaa 7.66 9.67

Cse_R009471 cryba1 22.03 56.49

Cse_R006443 cryba1-2 6.63 5.97

Cse_R007509 cryba2-2 53.91 83.75

Cse_R005516 cryba4 36.03 61.36

Cse_R005643 crybb1 48.15 111.86

Cse_R022133 crygm1 45.77 217.69

Cse_R022134 crygm2b 79.00 143.78

Cse_R022135 crygm2d1 0.33 0.00

Cse_R022136 crygm3 105.71 195.36

Cse_R022137 crygm4 141.11 365.55

Cse_R022138 crygm6 93.50 123.73

Cse_R022139 crygm7 29.50 71.71

Cse_R022140 crygmx 17.05 25.63

Cse_R011703 crygn1 63.38 105.15

Cse_R007152 crygs1 1.10 0.00

Cse_R020157 gnb1b 55.26 59.07

Cse_R014457 gprc5c 1.92 6.54

Cse_R006986 grk1a 0.63 1.48

Cse_R009520 grk1b 4.62 1.01

Cse_R010769 grk7a 2.94 0.94

Cse_R020372 guca1a 0.93 1.61

Cse_R012153 gucy2f 0.36 0.59

Cse_R003506 lws-1 10.02 72.00 1.65E-01 6.87E-03

Cse_R004508 nr2f5 7.42 6.39

Cse_R020265 nrl 0.00 1.75

Cse_R007130 nyx 4.17 0.21

Cse_R018786 opn3 1.06 2.02

Cse_R004111 opn4a 1.04 0.67

Cse_R012671 pax6b 1.26 1.46

Cse_R006160 pde6a 0.52 1.27

Cse_R009845 pde6d 8.00 7.27

Cse_R007917 pde6h 89.67 32.60

Cse_R021007 prph2b 3.58 1.65

Cse_R001520 prph2l 0.80 0.24

Cse_R022101 rbp4l 58.95 46.96

Cse_R021726 rcv1 17.41 4.09

Cse_R001743 rdh12 5.52 8.48

Cse_R003214 rdh5 0.89 9.58

Cse_R004398 rgra 12.68 9.88

Cse_R002190 rh1 32.40 202.00 1.72E-01 1.62E-02

Cse_R002696 rh1-2 1.32 0.25

Cse_R003604 rh2-3 116.42 0.66 1.46E-06 1.33E-01

Cse_R003115 rh2-4 5.67 1.36

Cse_R012426 rlbp1a 3.47 2.07

Cse_R012949 rlbp1b 9.33 11.19

Cse_R020449 rom1a 0.00 0.89

Cse_R010366 rpe65a 4.21 4.45

Cse_R010235 rpe65b 3.09 4.33

Cse_R001395 rrh 0.00 12.51 2.19E-02 5.11E-01

Cse_R012478 slc17a6b 3.97 3.34

Cse_R012973 slc24a1 0.09 1.25

Cse_R003307 sws2 0.55 1.50

Cse_R021296 tmt opsin 5.27 1.48

Cse_R020410 unc119b 4.82 10.12

Cse_R021776 val-opsin 0.30 0.22

Cse_R004132 vsx1 2.65 1.97

Cse_R001732 vsx2 0.77 0.72

Supplementary Table 21. Distribution of visual genes among different teleost species.

Gene name  Position in zebrafish  Position in tongue sole  Position in medaka  Position in stickleback  Position in takifugu  Position in Tetraodon
arl6  chr1:34630127:34637106  chr5:7137306:7138772  chr3:15631376:15633349  chrII:8144964:8146274  chrUn:5950961:5952185  chr5:7965139:7966331
arr3a  chr10:22210250:22214264  chr15:7865396:7868083  chr14:7947483:7949787  chrIV:5743631:5746265  chrUn:257902508:257905103  chr7:10308243:10310311
arrb2a  chr10:23288618:23304120  chr19:16572349:16599076  chr14:24197757:24208045  chrUn:29402686:29417341  chrUn:303685493:303693065  chr7:10573105:10580512
cngb1a  chr18:46076831:46110055  chr6:2061655:2072145  chr20:9640800:9662866  chrII:22694544:22706165  chrUn:306614647:306621499  chr5:13085214:13090204
crx  chr5:35041102:35043309  chr19:5803489:5805460  chr13:17569000:17570552  chrVII:12798906:12800819  chrUn:172424089:172425882  chr7:1171440:1173422
cryaa  chr1:22268452:22271596  chr14:4858664:4860180  NA  chrI:27017447:27018351  chrUn:217123687:217125074  chr3:7534674:7535866
cryaba  chr15:16791823:16796052  NA  chr14:13383276:13383930  NA  NA  NA
cryabb  chr5:58618320:58620950  chr19:17025715:17025826  NA  chrVII:21681057:21681180  chrUn:26471838:26472150  chr7:4487432:4487745
cryba1  chr15:28615131:28620923  chr19:14128450:14129422  chr13:16481334:16483413  chrI:7614263:7615844  chrUn:35523910:35525802  chr7:8298276:8299263
cryba1-2  chr1:42024656:42033958  chr15:9864342:9865758  chr10:10377305:10378365  chrIV:9405971:9407183  chrUn:223925480:223926459  chr1:7105519:7106487
cryba2  chr6:15673295:15675736  NA  chr14:8524577:8525181  NA  NA  NA
cryba2-2  chr9:11131802:11137423  chr16:17922641:17923681  NA  chrXVI:14978970:14980571  chrUn:155219702:155221085  chr2:17695776:17697141

cryba4  chr14:48367386:48372475  chr14:17892038:17894070  NA  chrVII:19097782:19098722  chrUn:74765633:74766665  chr4:3449657:3450716
crybb1  chr10:43061615:43066563  chr14:17897036:17899608  chr12:1196622:1199401  chrIV:9409132:9410024  chrUn:74761122:74763299  chr4:3452912:3454842
crybb2  chr8:44340649:44346685  NA  chr9:21418050:21419130  chrXIII:14810946:14812077  chrUn:29139679:29140651  chr12:11778378:11779338
crybb3  chr5:4046262:4061457  NA  chr5:17098228:17099427  chrXIII:14806389:14807775  chrUn:188989243:188990331  chr11:1309064:1310005
crygm1  chr9:21365809:21366459  chr16:2421646:2422326  NA  chrXVI:17433612:17434243  chrUn:397245533:397246165  chr2:9027954:9028590
crygm2a  chr9:21429273:21430015  NA  chr9:864770:865364  chrXVI:17415572:17416177  NA  chr2:9042505:9045699
crygm2b  chr9:21422138:21422798  chr16:2436089:2436699  NA  chrUn:6802526:6803141  chrUn:83016035:83016625  chr2:9040268:9040856
crygm2c  chr9:21375831:21376491  NA  NA  NA  NA  NA
crygm2d1  chr9:21692397:21693001  chrW:2540941:2541560  chr6:372787:373433  chrXVI:17420627:17424128  chrUn:83001430:83004747  chr2:9034905:9037898
crygm2d2  chr9:21671332:21671999  NA  NA  NA  chrUn:83008738:83009366  NA
crygm3  chr9:21452416:21559110  chr16:2424421:2425582  NA  NA  NA  NA
crygm4  chr6:33688:34417  chr16:2433575:2434176  NA  chrXVI:17405304:17409304  chrUn:83019584:83020208  chr2:9029746:9030366
crygm6  chr9:21354619:21355403  chr16:2427241:2428202  NA  NA  NA  NA
crygm7  chr9:21346935:21347528  chrZ:12490586:12491202  NA  chrXVI:17438402:17439131  chrUn:217290278:217290877  chr3:7688566:7689169

crygmx  chr12:29716014:29718879  chr8:1700678:1701431  chr19:8685105:8685706  chrV:7963350:7963977  chrUn:78951321:78951980  chr2_random:234222:234896
crygn1  chr24:39452487:39456180  chr3:11087047:11088304  chr20:2703841:2706145  chrIII:11471667:11473075  chrUn:137833855:137834986  chr15_random:2004199:2005300
crygs1  chr22:12263776:12266898  chr16:2450946:2458805  NA  NA  NA  NA
crygs2  chr9:27530684:27530959  NA  chr19:8687483:8687705  chrXVI:17677440:17677712  chrUn:217298046:217298289  chr2:9025887:9026127
crygs3  chr9:21705154:21714358  chr8:1702178:1703179  chr10:10380534:10380642  chrXVI:17444639:17445744  chrUn:82997227:82997878  chr2:9049313:9049942
crygs4  chr9:21718929:21719857  NA  NA  NA  NA  NA
exorhodopsin  chr5:3289911:32901252  C18048053:0:132  NA  NA  chrUn:100317234:100563867  chr9:6675526:6677954
gnb1b  chr6:56918432:56934276  chr10:6957465:6964184  chr5:6191224:6194609  chrXII:5927076:5936208  chrUn:214305327:214313356  chr11:5409500:5411564
gprc5c  chr12:35420963:35428566  chr8:14767304:14769095  chr19:20498779:20502965  chrUn:25840490:25842902  chrUn:200648387:200650714  chr2:2277187:2278936
grk1a  chr1:118715:129410  chr16:3035456:3039078  chr3:14912611:14918790  chrII:7743346:7748451  chrUn:5580656:5583387  chr5:8313086:8315595
grk1b  chr5:54855883:54888962  chr19:17064421:17067690  NA  chrVII:18425527:18436420  chrUn:25266325:25268939  chr7:5698768:5701355
grk7a  chr2:15562539:15568697  chr20:4393705:4397167  chr17:5644952:5649212  chrIII:9026311:9031453  chrUn:227707038:227710945  chr15:3157486:3161465
grk7b  chr18:40733846:40746948  NA  NA  NA  NA  NA
guca1a  chr4:22136311:22142451  chr8:19759018:19761342  chr23:17668993:17671643  NA  chrUn:63168264:63169704  chr19:1664665:1665572

gucy2f  chr15:29896697:29912292  chr4:2154308:2160233  chr13:28006805:28018225  chrI:3129415:3136785  chrUn:201555218:201563682  chrUn_random:19059923:19069042
lws-1  chr11:25243074:25244840  chr11:13019711:13021894  chr5:27015550:27017094  chrXVII:10627215:10629057  chrUn:151405983:151407509  chr11:10122886:10124420
lws-2  chr11:25246618:25248121  NA  NA  NA  NA  NA
melanopsin  chr2:30821271:30822770  NA  NA  NA  NA  NA
nr2f5  Zv8_scaffold1983:39478:70574  chr13:17583519:17596933  chr16:26934074:26952893  chrXIII:10982581:10988486  chrUn:230315105:230325107  chrUn_random:29073690:29082306
nrl  chr20:679387:687840  chr2:16689639:16692983  chr4:4467562:4470352  chrXIX:15625005:15625472  chrUn:45260212:45262794  chr1:12102639:12105214
nyx  chr9:33757710:33761079  chr16:903013:905666  chr21:13997418:14002239  chrXVI:16092902:16094781  chrUn:153909047:153911803  chr2:8833914:8835905
opn3  chr13:43477486:43498878  chr12:11340305:11345446  chr15:2447345:2461811  chrVI:2329590:2335342  chrUn:116719299:116724962  chrUn_random:53788012:53814378
opn4a  chr13:23043402:23086206  chr12:1859621:1866421  chr15:11524970:11537721  chrVI:15314511:15318456  chrUn:193506712:193509497  chr17:12081248:12083233
opn4b  chr12:21759249:21827465  chr8:2508389:2515823  NA  NA  NA  NA
opn4d  Zv8_NA655:10839:11037  NA  NA  NA  NA  NA
opn4xa  Zv8_scaffold3005:11589:36184  chr14:21630852:21635640  chr12:12936170:12942529  chrXIV:6298337:6304189  chrUn:156719206:156721931  chr4:2148619:2151749
pax6b  chr7:15139477:15157263  chr5:13259747:13274639  chr3:22095052:22104761  chrII:12855506:12865591  chrUn:292688922:292698404  chr5:4249477:4258815
pde6a  chr14:26497193:26525473  chr15:11714942:11738426  chr12:14585041:14592731  chrIV:4187015:4201995  chrUn:159149146:159153473  chr1_random:742637:751021

pde6d  chr6:31612913:31624502  chr2:8289672:8295840  chr4:5917997:5921647  chrUn:12645200:12653807  chrUn:136520885:136526092  chrUn_random:9700019:9702642
pde6h  chr6:297076:297612  chr17:12893019:12893489  chr1:14878202:14878498  chrIX:585765:586081  chrUn:133985019:133985345  chr1:9679461:9679789
prph2a  chr12:34128364:34131833  NA  NA  chrV:2546241:2549994  chrUn:300569865:300573269  chr2:302665:305548
prph2b  chr13:3179895:3184398  chr8:6100658:6103500  chr15:12169467:12185745  chrVI:16880111:16881793  chrUn:309697503:309699893  chr17:7941048:7943376
prph2l  chr20:27026320:27031560  chr1:4746754:4748889  chr22:12108231:12109881  chrXV:5358746:5360925  chrUn:160681709:160683450  chr10:5763182:5764682
rbp4l  chr21:21626947:21628781  chr14:2861227:2862229  chr12:14599160:14600382  chrXIV:7331671:7332831  chrUn:268901711:268902547  chr4:1357247:1358072
rcv1  chr16:12621814:12630231  chr13:116279:131800  chr16:16772746:16775866  chrXX:12101113:12103944  chrUn:249252638:249255091  chr8:8417368:8419870
rdh12  chr13:32552398:32562215  chr1:15632409:15635405  chr22:1285332:1290845  chrXV:12254941:12259032  chrUn:324578526:324581199  chr10:11977602:11980953
rdh5  chr22:10202816:10207270  chr11:18375517:18378022  chr5:11670763:11672453  chrXVII:4130715:4132561  chrUn:108967023:108968429  chrUn_random:104732706:104733737
rgra  chr13:28942048:28951369  chr12:6247212:6250653  chr15:21191229:21195707  chrVI:9578464:9582232  chrUn:159082004:159085928  chr17:5859132:5862467
rgrb  chr12:46181534:46186396  chr8:9263190:9264701  chr19:10354791:10360151  chrV:1041957:1044100  chrUn:126204382:126207456  chr2_random:1358516:1360332
rgs9a  chr3:33364845:33544686  chr8:6135569:6192105  chr8:25558944:25566994  chrV:2578466:2597207  chrUn:300538227:300548989  chr2:325346:346546
rh1  chr8:55021585:55022646  chr10:19960637:19961692  chr7:17099105:17100166  NA  chrUn:236948934:236951379  chr9:6477145:6478203
rh1-2  chr11:18457297:18458361  chr10:19704632:19710318  chr7:17427109:17431438  chrXII:809524:810585  chrUn:329632812:329633870  NA

rh2-1  chr6:44758736:44760163  NA  NA  chrUn:4146636:4148163  chrUn:116235050:116241405  chr11:5969516:5971195
rh2-2  chr6:44763173:44765699  NA  NA  chrUn:4159229:4160711  chrUn:121175430:121176839  NA
rh2-3  chr6:44768320:44769925  chr11:1838439:1839836  NA  NA  NA  NA
rh2-4  chr6:44777361:44779083  chr11:1832872:1834306  NA  NA  NA  NA
rlbp1a  chr7:13783337:13797933  chr5:11935601:11939095  NA  chrII:18541351:18543730  chrUn:3546250:3548552  chr5:10250288:10252112
rlbp1b  chr25:19784470:19794521  chr6:5183885:5186084  chr6:3016355:3019320  chrXIX:8198698:8200593  chrUn:177166614:177167920  chr13:12922108:12923430
rom1a  chr5:69548675:69563648  chr19:3496846:3500860  chr14:20464466:20469166  chrVII:15435309:15438336  chrUn:268607997:268610395  chr1:8145042:8147561
rom1b  chr14:47702028:47724669  NA  NA  NA  NA  NA
rpe65a  chr18:32417354:32431572  chr20:6990336:6993526  chr17:27878325:27881385  chrIII:16242210:16245102  chrUn:180255831:180258488  chr15_random:567334:570132
rpe65b  chr8:17010806:17034228  chr2:12003333:12009540  NA  NA  NA  NA
rrh  chr13:12684316:12696453  chr1:26469191:26474035  chr22:24528456:24691117  chrUn:28395602:28402886  chrUn:313296421:313310193  chrUn_random:84538671:84549916
slc17a6b  chr7:33140640:33160060  chr5:5687819:5698346  chr6:10458327:10466126  chrII:8899523:8909585  chrUn:6579519:6587249  chr13:11543805:11549215
slc24a1  chr18:19367141:19397385  chr6:12425077:12429331  chr6:11551320:11558370  chrXIX:6447101:6452379  chrUn:8377228:8383521  chr13:7978598:7981851
sws1  chr4:13183278:13185192  NA  NA  chrUn:22274430:22275993  NA  NA
sws2  chr11:25238223:25240517  chr11:12958047:12960079  chr5:27005056:27006280  chrXVII:10617511:10618964  chrUn:151401100:151403145  chr11:10118271:10120350

tmt opsin  chr24:25845993:25890266  scaffold774:17374:32411  chr20:22126027:22154963  chrIII:10225105:10244263  chrUn:277070438:277089218  chr15:2765956:2777637
tmtopsa  chr2:3163071:3247880  NA  NA  chrXXI:11213300:11225660  NA  NA
unc119b  chr21:40981379:41036043  chr4:10969487:10982802  chr14:8502824:8519733  chrVII:19085243:19092983  chrUn:142281722:142287312  chr7:8290900:8295504
valopb  chr12:24446255:24463083  NA  NA  NA  NA  NA
val-opsin  Zv8_scaffold3004:58940:90833  chr12:11916080:11925922  chr15:767496:797320  chrVI:1692943:1706349  chrUn:121171792:12117265  chrUn_random:94902109:94915058
vsx1  chr17:15568595:15570735  chr12:3663321:3664736  chr15:11908728:11910809  chrVI:16670946:16673485  chrUn:333035705:333037533  chr17:7840128:7841868
vsx2  chr17:29320140:29326566  chr1:6241492:6246999  chr24:17331779:17336544  chrXV:2100271:2104365  chrUn:67613215:67616632  chr10:4896479:4900116

Note: Pseudogenes are marked in red.

Supplementary Table 22. Oxford grid showing the numbers of paralogues between all pairs of tongue sole chromosomes. Red cells mark chromosome pairs that share more than 20 paralogues and are therefore considered paralogous; chromosome pairs in all other cells are considered non-paralogous.

Supplementary Table 23. Oxford grid showing the numbers of orthologues between

tongue sole and Tetraodon chromosomes. Chromosome numbers are shown in the order

used in Supplementary Table 22, so that paralogous chromosomes are placed in proximity.

Cells with more than 100 orthologues are highlighted in red and those with more than 20

are in yellow, except for cells labeled “Un”, which represent sequences not placed on

chromosomes.

Supplementary Table 24. Oxford grid showing the numbers of orthologues between

tongue sole and medaka chromosomes. Chromosome numbers are shown in the order

used in Supplementary Table 22, so that paralogous chromosomes are placed in proximity.

Cells with more than 100 orthologues are highlighted in red and those with more than 20

are in yellow, except for cells labeled “Un”, which represent sequences not placed on

chromosomes.

Supplementary Table 25. Oxford grid showing the numbers of orthologues between

tongue sole and zebrafish chromosomes. Chromosome numbers are shown in the order

used in Supplementary Table 22, so that paralogous chromosomes are placed in proximity.

Cells with more than 100 orthologues are highlighted in red and those with more than 20

are in yellow, except for cells labeled “Un”, which represent sequences not placed on

chromosomes.

Supplementary Table 26. List of DCSs. Each row shows the values associated with one

DCS. The columns display, from left to right, the human chromosome number,

chromosome numbers of two tongue sole duplicate chromosomes (Cse-a and Cse-b),

numbers of orthologous genes on the two tongue sole duplicate chromosomes in the DCS.

Hsa  Cse-a  Cse-b  Cse-a orthologues  Cse-b orthologues

1 2 20 101 59

1 10 11 93 74

1 13 18 45 66

2 1 7 33 36

2 4 20 10 2

2 8 12 7 30

2 9 12 8 19

2 14 16 43 125

2 14 Z 30 101

3 2 20 12 9

3 3 20 2 2

3 4 20 22 2

3 10 11 29 127

3 13 18 22 13

4 1 9 19 46

4 9 15 43 42

4 14 Z 7 25

5 13 18 10 11

5 14 Z 26 73

5 15 19 89 19

6 1 7 22 35

6 9 12 4 33

6 10 11 8 18

6 13 18 6 15

7 3 20 2 4

7 4 19 9 11

7 6 8 8 28

7 8 17 13 20

7 13 18 34 38

8 3 20 32 24

8 13 18 43 20

8 14 Z 31 19

9 1 9 3 7

9 9 15 19 4

9 14 Z 24 74

10 3 20 38 12

10 6 8 2 5

10 8 12 52 75

11 4 19 51 21

11 5 6 52 80

12 6 8 15 74

12 10 11 56 33

12 14 Z 13 54

13 4 19 5 16

13 14 16 14 30

14 1 7 200 26

15 1 7 46 4

15 5 6 92 39

16 5 6 44 71

16 8 17 42 57

16 9 17 15 23

17 4 19 13 13

17 8 17 50 47

17 9 17 40 69

18 2 20 5 8

18 3 20 7 17

19 2 20 56 10

19 5 6 12 3

19 9 17 39 17

20 9 12 10 15

20 10 11 68 32

21 4 19 6 8

21 14 16 5 15

22 6 8 3 17

22 9 17 14 27

22 14 Z 5 22

X 14 16 8 9

X 15 19 52 18

Supplementary Table 27. Comparison of structural features of tongue sole Z and W

with autosomes.

Chr.  Genes per megabase  Total TEs  DNA TEs  LINE  LTR  SINE  Unclassified TEs  Average gene size (bp)

chr1 43 5.11% 2.41% 0.69% 0.01% 0.32% 1.68% 8,930

chr2 45 4.39% 2.15% 0.45% 0.01% 0.23% 1.55% 9,101

chr3 37 4.97% 2.15% 0.83% 0.06% 0.17% 1.76% 10,650

chr4 44 4.82% 2.17% 0.51% 0.02% 0.20% 1.92% 9,351

chr5 37 3.35% 1.55% 0.33% 0.01% 0.21% 1.25% 10,264

chr6 52 4.10% 1.73% 0.31% 0.01% 0.18% 1.87% 8,039

chr7 47 4.31% 1.90% 0.62% 0.04% 0.12% 1.63% 8,556

chr8 49 5.05% 2.42% 0.68% 0.05% 0.22% 1.68% 9,428

chr9 52 4.65% 2.16% 0.55% 0.02% 0.24% 1.68% 7,833

chr10 49 4.72% 2.34% 0.57% 0.03% 0.24% 1.54% 8,500

chr11 51 3.42% 1.42% 0.40% 0.01% 0.22% 1.37% 8,612

chr12 40 3.46% 1.70% 0.29% 0.01% 0.14% 1.32% 10,820

chr13 43 4.30% 2.15% 0.52% 0.03% 0.20% 1.40% 9,080

chr14 43 4.49% 1.93% 0.64% 0.02% 0.28% 1.62% 9,550

chr15 39 5.08% 2.04% 0.58% 0.08% 0.20% 2.18% 8,594

chr16 43 4.99% 2.42% 0.53% 0.02% 0.19% 1.83% 9,839

chr17 60 4.56% 2.23% 0.50% 0.01% 0.17% 1.65% 7,061

chr18 51 3.55% 1.62% 0.27% 0.01% 0.13% 1.52% 7,630

chr19 48 3.05% 1.32% 0.33% 0.03% 0.18% 1.19% 8,782

chr20 58 2.32% 0.91% 0.21% 0.01% 0.08% 1.11% 7,733

Autosomes 46 4.33% 1.99% 0.51% 0.02% 0.21% 1.60% 8,876

chrZ 42 13.13% 4.74% 3.95% 0.23% 0.43% 3.78% 9,857

chrW 19 29.94% 8.74% 9.39% 1.09% 0.46% 10.26% 12,156

Supplementary Table 28. PAR genes and protein function. The PAR includes two scaffolds, scaffold589 (398,660 bp) and scaffold757 (243,113 bp), which are anchored to the distal end of Z and have the same coverage depth in both male and female samples. We identified 22 protein-coding genes and 1 pseudogene on the PAR and inferred their functions from the best hit of a BLAST search against SwissProt (E-value<1e-5). We also list the human orthologous locus where one exists.

Gene ID  Scaffold  Functional  Gene name  Human chr.  Protein
CSZ00000142.4  scaffold589  Yes  pbx3  9  Pre-B-cell leukemia transcription factor 3
CSZ00000940.4  scaffold589  Yes  unknown
CSZ00000660.4  scaffold589  Yes  fam125b  9  Multivesicular body subunit 12B
CSZ00000791.4  scaffold589  Yes  lmx1b  9  LIM homeobox transcription factor 1-beta
CSZ00000041.4  scaffold589  Yes  zbtb34  9  Zinc finger and BTB domain-containing protein 34
CSZ00000311.4  scaffold589  Yes  angptl2  9  Angiopoietin-related protein 2
CSZ00000543.4  scaffold589  Yes  stat2  12  Signal transducer and activator of transcription 2
CSZ00000762.4  scaffold589  Yes  hmcn2  9  Hemicentin-2
CSZ00000433.4  scaffold589  Yes  ncs1  9  Neuronal calcium sensor 1
CSZ00000899.4  scaffold589  Yes  adamts13  9  A disintegrin and metalloproteinase with thrombospondin motifs 13
CSZ00000859.4  scaffold757  No  pbx3  9  Pre-B-cell leukemia transcription factor 3
CSZ00000288.4  scaffold757  Yes  unknown
CSZ00000490.4  scaffold757  Yes  gapvd1  9  GTPase-activating protein and VPS9 domain-containing protein 1
CSZ00000020.4  scaffold757  Yes  c9orf172  9  Uncharacterized protein
CSZ00000272.4  scaffold757  Yes  syn1  X  Synapsin-1
CSZ00000758.4  scaffold757  Yes  vgll4  3  Transcription cofactor vestigial-like protein 4
CSZ00000040.4  scaffold757  Yes  slc20a1a  2  Sodium-dependent phosphate transporter 1-A
CSZ00000664.4  scaffold757  Yes  dtx1  11  Protein deltex-1
CSZ00000508.4  scaffold757  Yes  rasal1  12  RasGAP-activating-like protein 1
CSZ00000897.4  scaffold757  Yes  rasal1  12  RasGAP-activating-like protein 1
CSZ00000423.4  scaffold757  Yes  dgcr6  22  Protein DGCR6
CSZ00000596.4  scaffold757  Yes  slc7a4  22  Cationic amino acid transporter 4
CSZ00000458.4  scaffold757  Yes  rnf34  12  E3 ubiquitin-protein ligase RNF34

Supplementary Table 29. Classification of Z and W genes in non-PAR region. Z-W,

genes both on Z and W; Z-A, Z genes which have paralogues on autosomes or unplaced

scaffolds; Z-S, Z specific genes; W-Z_random, W genes homologous to unplaced

Z-linked genes; W-A, W genes which have paralogues on autosomes or unplaced

scaffolds; W-S, W specific genes.

Type  Z (non-PAR) functional genes  Z (non-PAR) pseudogenes  Z (non-PAR) total  W (non-PAR) functional genes  W (non-PAR) pseudogenes  W (non-PAR) total

Z-W 286 11 297 272 67 339

Z-A 248 10 258 NA* NA NA

Z-S 370 12 382 NA NA NA

W-Z_random NA NA NA 17 7 24

W-A NA NA NA 26 4 30

W-S NA NA NA 2 0 2

Total 904 33 937 317 78 395

*NA, not available.

Supplementary Table 30. Distribution of pseudogenes on different chromosomes.

Type  # Functional genes  # Pseudogenes  % Pseudogenes

Z 926 34 3.54

W (non-PAR)* 317 78 19.74

Autosomal 18,714 475 2.48

Unplaced 1,559 18 1.14

Total 21,516 605 2.73

*Genes in non-PAR region.
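
The percentage column of Supplementary Table 30 can be cross-checked from the two count columns. The sketch below assumes that "% Pseudogenes" means pseudogenes divided by the sum of functional genes and pseudogenes; the source does not state this definition, and the function names are illustrative. Under that assumption the Z, autosomal, unplaced and total rows reproduce the printed values, while the W row evaluates to about 19.75, which is within 0.01 of the printed 19.74.

```python
# Counts copied from Supplementary Table 30: (functional genes, pseudogenes).
COUNTS = {
    "Z": (926, 34),
    "W (non-PAR)": (317, 78),
    "Autosomal": (18714, 475),
    "Unplaced": (1559, 18),
}


def pseudogene_percent(functional, pseudogenes):
    """Assumed definition: pseudogenes as a share of functional + pseudogenes."""
    return 100.0 * pseudogenes / (functional + pseudogenes)


def totals(counts):
    """Sum the functional and pseudogene counts over all categories."""
    functional = sum(f for f, _ in counts.values())
    pseudogenes = sum(p for _, p in counts.values())
    return functional, pseudogenes


if __name__ == "__main__":
    for name, (functional, pseudogenes) in COUNTS.items():
        print(f"{name}: {pseudogene_percent(functional, pseudogenes):.2f}%")
    total_f, total_p = totals(COUNTS)
    print(f"Total: {total_f} functional, {total_p} pseudogenes, "
          f"{pseudogene_percent(total_f, total_p):.2f}%")
```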

Supplementary Table 31. Estimation of divergence rate and divergence time between Z and W chromosomes. We assume that the Z and W chromosomes both evolve at the lineage-specific rate. MY, million years.

Type  Ks  Time/Rate  Min  Mean  Max
Lineage  0.47  Rate (/site/year)  2.76E-09  2.39E-09  2.14E-09
Lineage  0.47  Time (MY)  170  197  220
Z-W  0.15  Rate (/site/year)  5.53E-09  4.77E-09  4.27E-09
Z-W  0.15  Time (MY)  27  31  35
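
The values in Supplementary Table 31 are numerically consistent with a simple calculation, sketched below. The formulas are inferred from the table and are not stated in the source: the lineage rate is taken as Ks divided by the lineage divergence time, and the Z-W divergence time as the Z-W Ks divided by twice that rate, because Z and W are assumed to evolve at the same lineage-specific rate. The function names are illustrative.

```python
def substitution_rate(ks, time_my):
    """Per-site, per-year rate implied by a Ks value and a time in million years."""
    return ks / (time_my * 1e6)


def pairwise_divergence_time_my(ks_pair, lineage_rate):
    """Divergence time (MY) of two copies that each evolve at lineage_rate.

    The pairwise rate is twice the per-lineage rate, so T = Ks / (2 * rate).
    """
    return ks_pair / (2.0 * lineage_rate) / 1e6


if __name__ == "__main__":
    ks_lineage, ks_zw = 0.47, 0.15  # Ks values from the table
    for label, time_my in (("Min", 170), ("Mean", 197), ("Max", 220)):
        rate = substitution_rate(ks_lineage, time_my)
        t_zw = pairwise_divergence_time_my(ks_zw, rate)
        print(f"{label}: rate={rate:.2e}, Z-W rate={2 * rate:.2e}, "
              f"Z-W time={t_zw:.0f} MY")
```

With the table's lineage times of 170, 197 and 220 MY, this gives Z-W divergence times of about 27, 31 and 35 MY, matching the last row.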

Supplementary Table 32. Percentage of genes expressed in testis. “Autosomes

(ortholog)” means autosomal genes that have reciprocal best orthologs on chicken

autosomes, while “chrZ (ortholog)” means chrZ genes that have reciprocal best orthologs

on chicken Z.

Categories  % of genes expressed in testis (RPKM>=1)  % of genes expressed in testis (RPKM>=10)

Autosomes 87.52 59.47

chrZ 86.73 (0.9646*) 59.46 (0.8866*)

Autosomes(ortholog) 93.73 71.30

chrZ(ortholog) 92.26(0.8964**) 74.23(0.7592**)

*P value of Chi-square test between “chrZ” and “Autosomes”.

**P value of Chi-square test between “chrZ (ortholog)” and “Autosomes (ortholog)”.

Supplementary Table 33. GO enrichment of chicken Z genes (P value<0.01, Fisher

exact test). We annotated the motifs and domains of chicken Z genes by InterProScan

against publicly available databases including Pfam, PRINTS, PROSITE, ProDom,

SMART and PANTHER, and then retrieved Gene Ontology (GO) annotation from the

results of InterProScan.

Terms  Definition  P value  Z genes in GO  All Z genes  All genes in GO  All genes  Enrich rate  Ontology
GO:0006950  response to stress  0.0005  19  489  180  10843  2.3  BP
GO:0015057  thrombin receptor activity  0.0016  3  489  6  10843  11.1  MF
GO:0004465  lipoprotein lipase activity  0.0020  2  489  2  10843  22.2  MF
GO:0000149  SNARE binding  0.0020  2  489  2  10843  22.2  MF
GO:0015204  urea transmembrane transporter activity  0.0020  2  489  2  10843  22.2  MF
GO:0008892  guanine deaminase activity  0.0020  2  489  2  10843  22.2  MF
GO:0019905  syntaxin binding  0.0020  2  489  2  10843  22.2  MF
GO:0008531  riboflavin kinase activity  0.0020  2  489  2  10843  22.2  MF
GO:0018342  protein prenylation  0.0020  2  489  2  10843  22.2  BP
GO:0006771  riboflavin metabolic process  0.0020  2  489  2  10843  22.2  BP
GO:0042887  amide transporter activity  0.0020  2  489  2  10843  22.2  MF
GO:0042886  amide transport  0.0020  2  489  2  10843  22.2  BP
GO:0042727  riboflavin and derivative biosynthetic process  0.0020  2  489  2  10843  22.2  BP
GO:0042726  riboflavin and derivative metabolic process  0.0020  2  489  2  10843  22.2  BP
GO:0009231  riboflavin biosynthetic process  0.0020  2  489  2  10843  22.2  BP
GO:0015840  urea transport  0.0020  2  489  2  10843  22.2  BP
GO:0008152  metabolic process  0.0023  160  489  2924  10843  1.2  BP
GO:0033554  cellular response to stress  0.0028  10  489  79  10843  2.8  BP
GO:0000003  reproduction  0.0028  4  489  14  10843  6.3  BP
GO:0022414  reproductive process  0.0028  4  489  14  10843  6.3  BP
GO:0003684  damaged DNA binding  0.0028  4  489  14  10843  6.3  MF
GO:0051716  cellular response to stimulus  0.0037  10  489  82  10843  2.7  BP
GO:0008318  protein prenyltransferase activity  0.0059  2  489  3  10843  14.8  MF
GO:0055102  lipase inhibitor activity  0.0059  2  489  3  10843  14.8  MF
GO:0004859  phospholipase inhibitor activity  0.0059  2  489  3  10843  14.8  MF
GO:0006281  DNA repair  0.0082  9  489  78  10843  2.6  BP
GO:0034984  cellular response to DNA damage stimulus  0.0082  9  489  78  10843  2.6  BP
GO:0006974  response to DNA damage stimulus  0.0089  9  489  79  10843  2.5  BP
GO:0003824  catalytic activity  0.0097  176  489  3368  10843  1.2  MF
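
The "Enrich rate" and "P value" columns of the GO enrichment tables can be illustrated with a short sketch. It assumes that the enrichment rate is the fraction of Z genes annotated with a term divided by the fraction of all genes annotated with it, and that the Fisher exact test is one-sided (the upper tail of the hypergeometric distribution). Both are inferences from the tabulated numbers; the source only names the test, and the function names here are illustrative.

```python
from math import comb


def enrichment_rate(k, n, K, N):
    """(k / n) / (K / N): k of n Z genes carry the term, K of N genes overall."""
    return (k / n) / (K / N)


def fisher_one_sided(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term.

    This is the upper hypergeometric tail, i.e. a one-sided Fisher exact test
    for over-representation (assumed here; the source does not specify sides).
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # GO:0004465 in Supplementary Table 33: 2 of 489 Z genes, 2 of 10843 genes.
    print(round(enrichment_rate(2, 489, 2, 10843), 1))
    print(round(fisher_one_sided(2, 489, 2, 10843), 4))
```

For the GO:0004465 row this gives an enrichment rate of about 22.2 and a P value of about 0.0020, in line with the table. Other rows were not re-derived here.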

Supplementary Table 34. GO enrichment of tongue sole Z genes (P value<0.01,

Fisher exact test).

GO Terms  Definition  P value  Z genes in GO  All Z genes  All genes in GO  All genes  Enrich rate  Ontology
GO:0003887  DNA-directed DNA polymerase activity  0.0015  5  682  20  15403  5.6  MF
GO:0034061  DNA polymerase activity  0.0019  5  682  21  15403  5.4  MF
GO:0004375  glycine dehydrogenase (decarboxylating) activity  0.0020  2  682  2  15403  22.6  MF
GO:0005185  neurohypophyseal hormone activity  0.0020  2  682  2  15403  22.6  MF
GO:0016642  oxidoreductase activity, acting on the CH-NH2 group of donors, disulfide as acceptor  0.0020  2  682  2  15403  22.6  MF
GO:0016638  oxidoreductase activity, acting on the CH-NH2 group of donors  0.0035  4  682  15  15403  6.0  MF
GO:0050896  response to stimulus  0.0040  25  682  318  15403  1.8  BP
GO:0005730  nucleolus  0.0057  2  682  3  15403  15.1  CC
GO:0006281  DNA repair  0.0059  10  682  89  15403  2.5  BP
GO:0034984  cellular response to DNA damage stimulus  0.0059  10  682  89  15403  2.5  BP
GO:0004091  carboxylesterase activity  0.0060  5  682  27  15403  4.2  MF
GO:0006974  response to DNA damage stimulus  0.0064  10  682  90  15403  2.5  BP
GO:0006950  response to stress  0.0077  17  682  200  15403  1.9  BP
GO:0004623  phospholipase A2 activity  0.0082  3  682  10  15403  6.8  MF
GO:0003006  reproductive developmental process  0.0082  3  682  10  15403  6.8  BP
GO:0007548  sex differentiation  0.0082  3  682  10  15403  6.8  BP
GO:0033554  cellular response to stress  0.0093  10  682  95  15403  2.4  BP

Supplementary Table 35. GO enrichment of tongue sole Z-specific (Z-S) genes (P value<0.01, Fisher exact test).

GO term | Definition | P value | Z-S genes in GO | All Z-S genes | All genes in GO | All genes | Enrich rate | Ontology
GO:0006281 | DNA repair | 0.00003 | 9 | 275 | 89 | 15403 | 5.7 | BP
GO:0034984 | cellular response to DNA damage stimulus | 0.00003 | 9 | 275 | 89 | 15403 | 5.7 | BP
GO:0006974 | response to DNA damage stimulus | 0.00003 | 9 | 275 | 90 | 15403 | 5.6 | BP
GO:0033554 | cellular response to stress | 0.00005 | 9 | 275 | 95 | 15403 | 5.3 | BP
GO:0050896 | response to stimulus | 0.00006 | 17 | 275 | 318 | 15403 | 3.0 | BP
GO:0051716 | cellular response to stimulus | 0.00006 | 9 | 275 | 98 | 15403 | 5.1 | BP
GO:0006259 | DNA metabolic process | 0.00011 | 12 | 275 | 183 | 15403 | 3.7 | BP
GO:0004375 | glycine dehydrogenase (decarboxylating) activity | 0.00032 | 2 | 275 | 2 | 15403 | 56.0 | MF
GO:0005185 | neurohypophyseal hormone activity | 0.00032 | 2 | 275 | 2 | 15403 | 56.0 | MF
GO:0016642 | oxidoreductase activity, acting on the CH-NH2 group of donors, disulfide as acceptor | 0.00032 | 2 | 275 | 2 | 15403 | 56.0 | MF
GO:0007548 | sex differentiation | 0.00062 | 3 | 275 | 10 | 15403 | 16.8 | BP
GO:0003006 | reproductive developmental process | 0.00062 | 3 | 275 | 10 | 15403 | 16.8 | BP
GO:0005730 | nucleolus | 0.00094 | 2 | 275 | 3 | 15403 | 37.3 | MF
GO:0006950 | response to stress | 0.00095 | 11 | 275 | 200 | 15403 | 3.1 | BP
GO:0005184 | neuropeptide hormone activity | 0.00186 | 2 | 275 | 4 | 15403 | 28.0 | MF
GO:0008009 | chemokine activity | 0.00377 | 3 | 275 | 18 | 15403 | 9.3 | MF
GO:0042379 | chemokine receptor binding | 0.00377 | 3 | 275 | 18 | 15403 | 9.3 | MF
GO:0000003 | reproduction | 0.00512 | 3 | 275 | 20 | 15403 | 8.4 | BP
GO:0022414 | reproductive process | 0.00512 | 3 | 275 | 20 | 15403 | 8.4 | BP
GO:0001664 | G-protein-coupled receptor binding | 0.00512 | 3 | 275 | 20 | 15403 | 8.4 | MF
GO:0003887 | DNA-directed DNA polymerase activity | 0.00512 | 3 | 275 | 20 | 15403 | 8.4 | MF
GO:0034061 | DNA polymerase activity | 0.00590 | 3 | 275 | 21 | 15403 | 8.0 | MF
GO:0006955 | immune response | 0.00607 | 6 | 275 | 92 | 15403 | 3.7 | BP
GO:0004659 | prenyltransferase activity | 0.00629 | 2 | 275 | 7 | 15403 | 16.0 | MF
GO:0002376 | immune system process | 0.00674 | 6 | 275 | 94 | 15403 | 3.6 | BP

Nature Genetics: doi:10.1038/ng.2890
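The "Enrich rate" column of Supplementary Table 35 is consistent with the ratio (Z-S genes in GO / all Z-S genes) / (all genes in GO / all genes). The sketch below recomputes that ratio and, as an assumption, an upper-tail hypergeometric P value (a one-sided Fisher exact test). The source does not say whether the test was one- or two-sided, and the function names are illustrative only.

```python
from math import comb


def enrich_rate(k, n, K, N):
    """Fold enrichment: (k / n) / (K / N).

    k: genes of interest annotated with the GO term
    n: all genes of interest
    K: all genes annotated with the GO term
    N: all annotated genes
    """
    return (k / n) / (K / N)


def upper_tail_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n); assumed one-sided test."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Two rows of Supplementary Table 35 (counts copied from the table).
print(round(enrich_rate(9, 275, 89, 15403), 1))   # DNA repair
print(round(enrich_rate(2, 275, 2, 15403), 1))    # glycine dehydrogenase activity
print(round(upper_tail_p(2, 275, 2, 15403), 5))   # glycine dehydrogenase activity
```

The integer binomial coefficients keep the tail sum exact until the final division, which avoids the underflow a floating-point product would risk at N = 15403.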

Supplementary Table 36. GO enrichment of orthologous Z genes between chicken and tongue sole (P value<0.01, Fisher exact test). We aligned tongue sole genes to chicken genes using BLASTP (E-value<1e-5) and identified reciprocal best orthologues between tongue sole Z and chicken Z. Then we performed GO enrichment of these orthologous genes on tongue sole Z.

GO term | Definition | P value | Orthologous genes in GO | All orthologous genes | All genes in GO | All genes | Enrich rate | Ontology
GO:0007548 | sex differentiation | 0.0001 | 3 | 127 | 10 | 15403 | 36.4 | BP
GO:0003006 | reproductive developmental process | 0.0001 | 3 | 127 | 10 | 15403 | 36.4 | BP
GO:0000003 | reproduction | 0.0006 | 3 | 127 | 20 | 15403 | 18.2 | BP
GO:0022414 | reproductive process | 0.0006 | 3 | 127 | 20 | 15403 | 18.2 | BP
GO:0006281 | DNA repair | 0.0008 | 5 | 127 | 89 | 15403 | 6.8 | BP
GO:0034984 | cellular response to DNA damage stimulus | 0.0008 | 5 | 127 | 89 | 15403 | 6.8 | BP
GO:0006974 | response to DNA damage stimulus | 0.0009 | 5 | 127 | 90 | 15403 | 6.7 | BP
GO:0033554 | cellular response to stress | 0.0011 | 5 | 127 | 95 | 15403 | 6.4 | BP
GO:0051716 | cellular response to stimulus | 0.0013 | 5 | 127 | 98 | 15403 | 6.2 | BP
GO:0006950 | response to stress | 0.0014 | 7 | 127 | 200 | 15403 | 4.2 | BP
GO:0005739 | mitochondrion | 0.0036 | 5 | 127 | 124 | 15403 | 4.9 | CC
GO:0008152 | metabolic process | 0.0042 | 47 | 127 | 4017 | 15403 | 1.4 | BP
GO:0008892 | guanine deaminase activity | 0.0083 | 1 | 127 | 1 | 15403 | 121.3 | MF
GO:0008410 | CoA-transferase activity | 0.0083 | 1 | 127 | 1 | 15403 | 121.3 | MF
GO:0043566 | structure-specific DNA binding | 0.0085 | 2 | 127 | 17 | 15403 | 14.3 | MF
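The caption of Supplementary Table 36 describes identifying reciprocal best orthologues from BLASTP alignments filtered at E-value<1e-5. A minimal sketch of that idea follows. It assumes hits are already parsed into (query, subject, E-value, bit score) tuples and that "best" means highest bit score; the gene names and values are invented for illustration and the authors' actual scripts are not given in the source.

```python
def best_hits(rows, max_evalue=1e-5):
    """Map each query to its best subject among hits with E-value < max_evalue.

    "Best" is taken to be the highest bit score (an assumption).
    """
    best = {}
    for query, subject, evalue, bitscore in rows:
        if evalue >= max_evalue:
            continue  # fails the E-value cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse, max_evalue=1e-5):
    """Return (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward, max_evalue)
    rev = best_hits(reverse, max_evalue)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical hits: tongue sole Z genes against chicken Z genes, and back.
sole_vs_chicken = [
    ("sole_g1", "chk_g1", 1e-50, 200.0),
    ("sole_g1", "chk_g2", 1e-20, 90.0),
    ("sole_g2", "chk_g2", 1e-30, 120.0),
    ("sole_g3", "chk_g2", 1e-3, 40.0),   # removed by the E-value cutoff
]
chicken_vs_sole = [
    ("chk_g1", "sole_g1", 1e-48, 195.0),
    ("chk_g2", "sole_g1", 1e-25, 100.0),
]

pairs = reciprocal_best_hits(sole_vs_chicken, chicken_vs_sole)
print(pairs)
```

In this toy input only sole_g1 and chk_g1 pick each other, so sole_g2 is dropped even though it has a significant hit.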

Supplementary Table 37. GO enrichment of tongue sole W genes (P value<0.01, Fisher exact test).

GO term | Definition | P value | W genes in GO | All W genes | All genes in GO | All genes | Enrich rate | Ontology
GO:0015103 | inorganic anion transmembrane transporter activity | 0.0012 | 4 | 234 | 31 | 15403 | 8.5 | MF
GO:0016192 | vesicle-mediated transport | 0.0024 | 7 | 234 | 120 | 15403 | 3.8 | BP
GO:0015301 | anion:anion antiporter activity | 0.0043 | 3 | 234 | 22 | 15403 | 9.0 | MF
GO:0005452 | inorganic anion exchanger activity | 0.0043 | 3 | 234 | 22 | 15403 | 9.0 | MF
GO:0015108 | chloride transmembrane transporter activity | 0.0043 | 3 | 234 | 22 | 15403 | 9.0 | MF
GO:0015106 | bicarbonate transmembrane transporter activity | 0.0043 | 3 | 234 | 22 | 15403 | 9.0 | MF
GO:0015380 | anion exchanger activity | 0.0043 | 3 | 234 | 22 | 15403 | 9.0 | MF
GO:0009266 | response to temperature stimulus | 0.0095 | 2 | 234 | 10 | 15403 | 13.2 | BP
GO:0009408 | response to heat | 0.0095 | 2 | 234 | 10 | 15403 | 13.2 | BP

Supplementary Table 38. Fisher’s exact test for compensated (comp)/uncompensated (uncomp) Z genes in tongue sole and zebra finch.

Zebra finch sample | Zebra finch genes | Tongue sole Comp | Tongue sole Uncomp | Fisher’s exact test (P value)
d1 | Comp | 5 | 9 | 0.1007
d1 | Uncomp | 17 | 9 |
d25 | Comp | 9 | 9 | 0.7504
d25 | Uncomp | 13 | 9 |
d45 | Comp | 6 | 10 | 0.1064
d45 | Uncomp | 16 | 8 |
adult | Comp | 7 | 9 | 0.3345
adult | Uncomp | 15 | 9 |

Supplementary Table 39. Fisher’s exact test for compensated (comp)/uncompensated (uncomp) Z genes in tongue sole and chicken.

Chicken tissue | Chicken genes | Tongue sole Comp | Tongue sole Uncomp | Fisher’s exact test (P value)
heart | Comp | 19 | 28 | 0.3206
heart | Uncomp | 27 | 26 |
brain | Comp | 19 | 20 | 0.6858
brain | Uncomp | 27 | 34 |
liver | Comp | 24 | 30 | 0.841
liver | Uncomp | 22 | 24 |
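Supplementary Tables 38 and 39 each report one Fisher's exact test per 2x2 table of compensated versus uncompensated genes. As a sketch of how such a P value can be obtained, the function below implements the common two-sided form, which sums the probabilities of all tables no more likely than the observed one. Whether the authors used exactly this variant is an assumption, and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Zebra finch (d1) block of Supplementary Table 38: rows are zebra finch
# Comp/Uncomp, columns are tongue sole Comp/Uncomp.
p_d1 = fisher_exact_two_sided(5, 9, 17, 9)
print(round(p_d1, 4))
```

For the d1 counts this variant gives a value that rounds to the 0.1007 reported in the table.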

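Supplementary Table 40. Sex reversal rate of different families including the pseudomale families, normal families and temperature-induced families.

Group | #Family | #Sample | Genetic sex (♀/♂) | Ratio of genotype (♀) | Physical sex (♀/♂) | Ratio of phenotype (♀) | Ratio of sex reversal
Pseudo-male family | 3 | 87 | 35/52 | 0.4 | 1/86 | 0.0116 | 0.9714
Pseudo-male family | 6 | 87 | 39/48 | 0.448 | 5/82 | 0.057 | 0.872
Pseudo-male family | 56 | 54 | 24/30 | 0.44 | 0/54 | 0 | 1
Pseudo-male family | 60 | 67 | 48/19 | 0.71 | 1/66 | 0.0149 | 0.979
Pseudo-male family | 78 | 96 | 58/38 | 0.604 | 5/91 | 0.052 | 0.913
Pseudo-male family | Total | 391 | 204/187 |  | 12/379 |  | 0.9412
Normal family | 5 | 58 | 27/31 | 0.465 | 24/34 | 0.413 | 0.111
Normal family | 38 | 44 | 18/26 | 0.409 | 13/31 | 0.295 | 0.277
Normal family | 39 | 102 | 45/57 | 0.441 | 45/57 | 0.441 | 0
Normal family | 57 | 75 | 36/39 | 0.48 | 31/44 | 0.413 | 0.139
Normal family | 65 | 168 | 61/107 | 0.363 | 50/118 | 0.297 | 0.1803
Normal family | Total | 447 | 187/260 |  | 163/284 |  | 0.14146
Temperature-induced family | 1 | 90 | 39/51 | 0.433 | 14/76 | 0.156 | 0.641
Temperature-induced family | 2 | 90 | 34/56 | 0.377 | 7/83 | 0.07 | 0.794
Temperature-induced family | 3 | 87 | 48/39 | 0.555 | 6/81 | 0.068 | 0.875
Temperature-induced family | 4 | 52 | 32/26 | 0.615 | 7/45 | 0.134 | 0.865
Temperature-induced family | 5 | 70 | 35/35 | 0.5 | 15/55 | 0.21 | 0.57
Temperature-induced family | Total | 389 | 188/201 |  | 49/340 |  | 0.734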
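The per-family values in the last column of Supplementary Table 40 are consistent with (genetic females − phenotypic females) / genetic females, that is, the fraction of genetic females that developed as phenotypic males. This relation is inferred from the numbers rather than stated in the source, so the sketch below is an assumption-labelled check, with an illustrative function name.

```python
def sex_reversal_ratio(genetic_females, phenotypic_females):
    """Fraction of genetic females that are not phenotypic females.

    Inferred formula (not stated in the source):
    (genetic females - phenotypic females) / genetic females.
    """
    if genetic_females <= 0:
        raise ValueError("need at least one genetic female")
    return (genetic_females - phenotypic_females) / genetic_females


# (group, family, genetic females, phenotypic females) copied from the table.
families = [
    ("Pseudo-male", 3, 35, 1),
    ("Pseudo-male", 6, 39, 5),
    ("Normal", 5, 27, 24),
]
for group, family, genetic, phenotypic in families:
    print(group, family, round(sex_reversal_ratio(genetic, phenotypic), 4))
```

The three printed values round to the 0.9714, 0.872 and 0.111 listed for those families.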

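Supplementary Table 41. Sex ratio of offspring in the pseudomale families and normal families.

Group | #family | #genetic females | #total | Ratio of genetic female
Normal family | 5# | 27 | 58 | 46.55%
Normal family | 28# | 75 | 184 | 40.76%
Normal family | 30# | 29 | 55 | 52.73%
Normal family | 38# | 49 | 102 | 48.04%
Normal family | 39# | 68 | 163 | 41.72%
Normal family | 40# | 49 | 88 | 55.68%
Normal family | 44# | 67 | 161 | 41.61%
Normal family | 57# | 80 | 159 | 50.31%
Normal family | 61# | 100 | 156 | 64.10%
Normal family | 69# | 30 | 58 | 51.72%
Normal family | total | 574 | 1184 | 48.48%
Pseudomale family (2010) | 2# | 72 | 156 | 46.16%
Pseudomale family (2010) | 3# | 35 | 87 | 40.23%
Pseudomale family (2010) | 4# | 216 | 384 | 56.25%
Pseudomale family (2010) | 6# | 39 | 87 | 44.83%
Pseudomale family (2010) | 7# | 77 | 152 | 50.66%
Pseudomale family (2010) | 9# | 36 | 57 | 63.20%
Pseudomale family (2010) | 13# | 48 | 96 | 50.00%
Pseudomale family (2010) | 56# | 68 | 138 | 49.28%
Pseudomale family (2010) | 60# | 168 | 236 | 71.19%
Pseudomale family (2011) | 7# | 32 | 64 | 50%
Pseudomale family (2011) | 18# | 27 | 58 | 46.60%
Pseudomale family (2011) | 33# | 35 | 79 | 44.30%
Pseudomale family (2011) | 49# | 32 | 60 | 53.30%
Pseudomale family (2011) | 50# | 30 | 60 | 50%
Pseudomale family (2011) | 56# | 50 | 98 | 51%
Pseudomale family (2010 and 2011) | total | 965 | 1812 | 53.26%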

Supplementary Table 42. Paternal inheritance of Z chromosome in three WZ pseudomale families determined by microsatellite analysis. We selected Z-specific microsatellite markers to genotype the parents (pseudomale and normal female) of the pseudomale families. For families in which the two parents had different genotypes, we then genotyped the F1 individuals. Microsatellite analysis of the F1 individuals showed that in about 84%-90% of ZW individuals the Z chromosome was inherited from the pseudomale.

Family 4# | ZZ: cyse548 | ZZ: cyse282 | ZZ: shared | ZW: cyse548 | ZW: cyse282 | ZW: shared
homozygous | 0 | 0 | 0 | 33 | 33 | 31
heterozygous | 50 | 50 | 50 | 3 | 2 | 1
No Result | 0 | 0 | 0 | 0 | 1 | 0
total | 50 | 50 | 50 | 36 | 36 | 32
Ratio of Z inheritance of pseudomale: 90%

Family 6# | ZZ: cyse188 | ZZ: cyse203 | ZZ: shared | ZW: cyse188 | ZW: cyse203 | ZW: shared
homozygous | 11 | 8 | 7 | 64 | 63 | 63
heterozygous | 3 | 6 | 1 | 8 | 9 | 8
No Result | 0 | 0 | 0 | 0 | 0 | 0
total | 14 | 14 | 8 | 72 | 72 | 71
Ratio of Z inheritance of pseudomale: 86%

Family 20# | ZZ: cyse054 | ZZ: cyse167 | ZZ: shared | ZW: cyse054 | ZW: cyse167 | ZW: shared
homozygous | 0 | 0 | 0 | 30 | 28 | 28
heterozygous | 49 | 47 | 47 | 7 | 7 | 7
No Result | 0 | 2 | 0 | 0 | 2 | 0
total | 49 | 49 | 47 | 37 | 37 | 32
Ratio of Z inheritance of pseudomale: 84%

Supplementary Table 43. Characterization and expression of sex-related genes in tongue sole. ZW_f1, whole body female (pre-); ZW_f2, whole body female (post-); ZW_f3, ZW ovary F1; ZW_m1, ZW testis F1; ZW_m2, ZW testis F2; ZZ_m1, whole body male; ZZ_m2, ZZ testis P.

Gene name Gene ID Chr. Start End ZW_f1 ZW_f2 ZW_f3 ZW_m1 ZW_m2 ZZ_m1 ZZ_m2

amh Cse_R020243 chr2 17672760 17675590 2.56 0.89 0.11 14.44 11.23 1.08 150.3

amhr2 Cse_R019270 chr11 12363596 12365593 0 0 1.39 7.06 12.3 0 31.82

aqp1 Cse_R010791 chr20 8760186 8761902 41.56 26.83 0 2.26 201.28 49 26.62

aqp1o Cse_R010765 chr20 8766748 8768041 1.55 0 828.88 39.33 0 0.97 9.44

arx Cse_R018649 chr3 5193761 5199755 8.5 7.78 0.24 1.51 0 9.76 2

atrx Cse_R006810 chr15 10672523 10695167 35.75 35.68 34.86 44.31 62.18 42.37 44.6

cbx2 Cse_R008247 chr17 567183 571221 18.48 8.91 55.35 49.87 24 16.56 24.05

cd220 Cse_R018728 chr20 9612452 9641156 22.4 27.18 1.64 4.33 3.51 16.62 22.2

cd220 Cse_R009842 chr2 825861 860099 13.36 8.62 1.59 3.49 4.58 5.97 6.18

ctnnb1 Cse_R005015 chr13 6042544 6047398 153.56 128.19 82.98 107.41 97.91 178.64 68.2

cxcr4a Cse_R019874 chr20 5928026 5930142 4.33 1.08 0.17 5.87 8.9 3.33 1.94

cyp19a1a Cse_R012562 chr5 4551572 4554029 0 0 1.03 2.31 6.26 0.1 8.11

cyp19a1b Cse_R021368 chr6 16298965 16303214 1.86 2.62 0 0 0.61 2.33 0.78

dax1 Cse_R021479 chr16 4687139 4688694 4.41 2.55 3.5 3.69 3.34 5.09 16.96

daz1 Cse_R008770 chr18 8407158 8408584 0 0.5 20.9 188.1 312.51 0 212.57

dhh Cse_R002534 chr10 9735734 9739753 1.96 4.42 0 1.6 2.9 1.11 20.74

dkk1 Cse_R004204 chr12 11745215 11747451 2.92 2.68 0.19 1.88 19.97 4.05 0.27

dkk2 Cse_R020696 chr9 4865191 4877866 2.06 4.95 0 0 0.54 3.3 2.43

dmrt1 Cse_R022120 chrZ 8547598 8568446 0 0 0 39.56 10.84 0 71.17

emx2 Cse_R021546 chr12 424796 428876 11.45 14.73 0.7 1.19 2.15 15.14 8.08

fgf20 Cse_R006206 chr15 14686469 14687686 6.01 2.26 4.43 14.35 27.05 1.51 12.1

fgf20 Cse_R015779 chr9 12042006 12044193 0.44 0.33 0.25 0.32 0.29 0.67 0.56

figalpha Cse_R022133 chrZ 1768621 1769856 0 0 0 0.7 0.32 0 0.23

figalpha Cse_R016079 chrW 14879739 14885716 0 0 34.23 2.97 11.26 0 0

follistatin Cse_R017224 chrZ 9741019 9742894 7.66 8.37 0.8 0.76 1.6 8.2 4.54

foxl2 Cse_R021526 chr4 14132731 14133654 1.32 5.19 3.02 1.67 1.08 8.73 2.77

gata4 Cse_R013761 chr7 3907315 3913489 1.7 14.67 0.33 6.79 4.64 0.43 7.98

gsdf Cse_R005415 chr14 21583496 21584102 1.54 1.93 18 318.7 327.1 0 597.83

lgr8 Cse_R009697 chr19 2894064 2924937 0.99 1.81 0.33 1.13 1.21 0.64 4.41

lhx9 Cse_R009838 chr2 3444511 3454533 23.03 18.08 1.37 18.06 13.72 42.27 21.81

lim1 Cse_R011784 chr4 2061610 2068877 11.82 4.73 0.45 0.38 1.55 11.31 1.22

patched1 Cse_R016021 chrW 14123173 14205242 3.8 4.33 0.47 0.64 0.67 0 0

patched1 Cse_R018277 Z_scaffold1319 13876 113068 2.01 4.05 0.17 0.84 0.99 3.89 3.66

pax2a Cse_R004397 chr12 459096 485494 4.46 1.67 0 0.44 0.27 3.25 2.47

pax2b Cse_R014998 chr8 20343835 20371684 3.44 1.86 0 0 0.13 3.25 0.24

pdgfrb Cse_R006353 chr15 2952085 2971257 8.65 10.32 3.55 4.36 7.52 6.61 20.47

pgd2 Cse_R011327 chr3 2242183 2243193 445.36 880.88 4.77 8.85 10.9 531.75 320.5

pod1 Cse_R018196 scaffold1282 223127 224338 4.06 6.96 57.81 369.28 196.46 0.58 346.6

rspo1 Cse_F003435 chr13 2856464 2866399 2.53 7.22 0.41 1.55 2.36 2.61 4.13

rspo2 Cse_R004637 chr13 13688188 13739758 5.47 6.56 0.48 0 0 3.06 0.35

rspo3 Cse_R005061 chr13 11648629 11653403 2 2.4 0.92 0.29 0.26 3.4 0.5

rspo4 Cse_R003828 chr11 4270345 4275630 1.81 1.36 1.73 0.44 0 5.74 2.03

sdf1a Cse_R018932 chr9 15011785 15015943 3.27 17.19 0 0 23.6 3.27 59.12

sf-1 Cse_R016234 chrW 3519616 3570157 0.91 0.68 0 0.5 1.94 0 0

sf-1 Cse_R017371 chrZ 8587385 8596522 0.64 0.48 0 3.07 5 1.27 20.47

sf-1 Cse_R005325 chr14 12590621 12593382 0 0 0 0 0.6 0 0

sox8 Cse_R007785 chr17 13747785 13749575 22.79 16.79 2.12 1.73 0.57 18.37 5.2

sox9a Cse_R008386 chr17 13941510 13944875 15.48 28.5 0 3.54 6.68 22.62 13.11

sox9b Cse_R014685 chr8 12734601 12736657 5.99 12.86 0 0.25 0.34 5.06 2.02

srd5a1 Cse_R008713 chr18 11947508 11950539 17.53 8.87 6.78 3.6 1 16.41 9.45

srd5a2 Cse_R015835 chr9 2972058 2973152 7.52 5.95 0.68 10.64 16.37 6.74 30.14

vasa Cse_R005517 chr14 8303652 8305695 0 0.35 68.01 99.6 83.28 0 135.5

wnt4a Cse_R019600 chr11 10692050 10697716 0.29 0.66 0 0.21 0.19 1.32 0.74

wnt4b Cse_F007376 chr13 2091153 2093238 2.61 1.06 0 0.17 0 1.59 0

wt1a Cse_R013026 chr6 2198125 2215107 1.6 3.61 3.83 22.86 19.43 0.13 35.45

wt1b Cse_R012523 chr5 7208403 7211392 0 0.22 1.17 5.71 2.87 0 7.1

zp3a Cse_R010502 chr20 11812163 11816744 0 0 2010.07 130.69 0 0 0.68

zp3b Cse_R018460 chr17 6833492 6835390 0 0.29 4474.61 302.65 1.16 0.29 2.06

Supplementary Table 44. Differentially expressed genes between normal female ovaries and pseudomale testes (see Excel file ‘Supplementary Table 44.xls’).

Supplementary Table 45. GO enrichment by DEGs up-regulated in normal female ovaries.

GO_ID GO_Term GO_Class P-value Gene number

GO:2000194 regulation of female gonad development BP 3.88E-07 6

GO:0002081 outer acrosomal membrane CC 3.88E-07 5

GO:2000368 positive regulation of acrosomal vesicle exocytosis BP 3.88E-07 5

GO:0001809 positive regulation of type IV hypersensitivity BP 3.88E-07 5

GO:2000388 positive regulation of antral ovarian follicle growth BP 3.88E-07 5

GO:0010513 positive regulation of phosphatidylinositol biosynthetic process BP 3.88E-07 5

GO:2000386 positive regulation of ovarian follicle development BP 3.88E-07 5

GO:0035803 egg coat formation BP 1.15E-06 5

GO:0032190 acrosin binding MF 2.51E-06 5

GO:2000360 negative regulation of binding of sperm to zona pellucida BP 2.51E-06 5

GO:0071421 manganese ion transmembrane transport BP 4.75E-06 5

GO:0048599 oocyte development BP 7.52E-06 8

GO:0032753 positive regulation of interleukin-4 production BP 7.52E-06 5

GO:0005384 manganese ion transmembrane transporter activity MF 7.52E-06 5

GO:0090280 positive regulation of calcium ion import BP 1.22E-05 5

GO:2000344 positive regulation of acrosome reaction BP 1.76E-05 5

GO:0002455 humoral immune response mediated by circulating immunoglobulin BP 6.49E-05 6

GO:0032236 positive regulation of calcium ion transport via store-operated calcium channel activity BP 1.21E-04 5

GO:0002922 positive regulation of humoral immune response BP 1.21E-04 5

GO:0032729 positive regulation of interferon-gamma production BP 2.05E-04 5

GO:0070528 protein kinase C signaling cascade BP 3.17E-04 5

GO:0001825 blastocyst formation BP 3.30E-04 6

GO:0005771 multivesicular body CC 3.81E-04 5

GO:0016064 immunoglobulin mediated immune response BP 4.14E-04 7

GO:0032609 interferon-gamma production BP 4.20E-04 6

GO:0045921 positive regulation of exocytosis BP 5.34E-04 6

GO:0002687 positive regulation of leukocyte migration BP 9.81E-04 6

GO:2000242 negative regulation of reproductive process BP 1.23E-03 6

GO:0002250 adaptive immune response BP 1.40E-03 9

GO:0046545 development of primary female sexual characteristics BP 1.40E-03 9

GO:2001257 regulation of cation channel activity BP 1.67E-03 6

GO:0005529 sugar binding MF 1.86E-03 10

GO:0007338 single fertilization BP 2.33E-03 6

GO:0022602 ovulation cycle process BP 2.74E-03 8

GO:0046889 positive regulation of lipid biosynthetic process BP 2.74E-03 6

GO:0046943 carboxylic acid transmembrane transporter activity MF 3.54E-03 9

GO:0061039 ovum-producing ovary development BP 5.07E-03 7

GO:0048924 posterior lateral line neuromast mantle cell differentiation BP 5.07E-03 2

GO:0015291 secondary active transmembrane transporter activity MF 5.55E-03 12

GO:0001817 regulation of cytokine production BP 7.93E-03 12

GO:0008037 cell recognition BP 8.00E-03 7

GO:0002819 regulation of adaptive immune response BP 8.48E-03 6

GO:0015293 symporter activity MF 9.63E-03 10

GO:0017157 regulation of exocytosis BP 9.63E-03 7

GO:0010389 regulation of G2/M transition of mitotic cell cycle BP 9.96E-03 4

GO:0034381 plasma lipoprotein particle clearance BP 9.96E-03 4

GO:0002443 leukocyte mediated immunity BP 1.10E-02 8

GO:0001541 ovarian follicle development BP 1.10E-02 6

GO:0034764 positive regulation of transmembrane transport BP 1.10E-02 6

GO:0046890 regulation of lipid biosynthetic process BP 1.13E-02 7

GO:0002526 acute inflammatory response BP 1.15E-02 6

GO:0002703 regulation of leukocyte mediated immunity BP 1.15E-02 6

GO:0046486 glycerolipid metabolic process BP 1.17E-02 12

GO:0051928 positive regulation of calcium ion transport BP 1.20E-02 6

GO:0005615 extracellular space CC 1.35E-02 19

GO:0042102 positive regulation of T cell proliferation BP 1.47E-02 5

GO:0002699 positive regulation of immune effector process BP 1.47E-02 6

GO:0045017 glycerolipid biosynthetic process BP 1.47E-02 8

GO:0045137 development of primary sexual characteristics BP 1.59E-02 10

GO:0022804 active transmembrane transporter activity MF 1.65E-02 14

GO:0002252 immune effector process BP 1.72E-02 11

GO:0002697 regulation of immune effector process BP 1.85E-02 8

GO:0051897 positive regulation of protein kinase B signaling cascade BP 1.99E-02 5

GO:0005215 transporter activity MF 2.24E-02 33

GO:0022857 transmembrane transporter activity MF 2.32E-02 28

GO:0015101 organic cation transmembrane transporter activity MF 2.35E-02 3

GO:0022891 substrate-specific transmembrane transporter activity MF 2.68E-02 26

GO:0030246 carbohydrate binding MF 2.82E-02 13

GO:0042461 photoreceptor cell development BP 2.82E-02 5

GO:0050727 regulation of inflammatory response BP 2.90E-02 7

GO:0010876 lipid localization BP 3.01E-02 10

GO:0015171 amino acid transmembrane transporter activity MF 4.08E-02 6

GO:0050900 leukocyte migration BP 4.14E-02 9

GO:0004185 serine-type carboxypeptidase activity MF 4.25E-02 2

GO:0022892 substrate-specific transporter activity MF 4.28E-02 28

GO:0003006 developmental process involved in reproduction BP 4.28E-02 13

GO:0006869 lipid transport BP 4.88E-02 9

GO:0006641 triglyceride metabolic process BP 4.88E-02 6

GO:0043491 protein kinase B signaling cascade BP 4.88E-02 6

Supplementary Table 46. GO enrichment by DEGs up-regulated in pseudomale testes.

GO_ID GO_Term GO_Class P-value Gene Number

GO:0005874 microtubule CC 8.80E-07 27

GO:0005929 cilium CC 1.33E-06 20

GO:0006928 cellular component movement BP 7.88E-03 58

GO:0044463 cell projection part CC 7.88E-03 36

GO:0004111 creatine kinase activity MF 7.88E-03 4

GO:0005827 polar microtubule CC 7.88E-03 4

GO:0030215 semaphorin receptor binding MF 7.88E-03 5

GO:0009434 microtubule-based flagellum CC 7.88E-03 6

GO:0005930 axoneme CC 8.93E-03 9

GO:0007286 spermatid development BP 1.19E-02 9

GO:0016775 phosphotransferase activity, nitrogenous group as acceptor MF 1.20E-02 6

GO:0044430 cytoskeletal part CC 1.20E-02 51

GO:0035639 purine ribonucleoside triphosphate binding MF 1.20E-02 80

GO:0008054 cyclin catabolic process BP 1.20E-02 4

GO:0005856 cytoskeleton CC 1.20E-02 66

GO:0032555 purine ribonucleotide binding MF 1.22E-02 81

GO:0007283 spermatogenesis BP 1.22E-02 18

GO:0019861 flagellum CC 1.32E-02 7

GO:0001831 trophectodermal cellular morphogenesis BP 1.74E-02 4

GO:0031463 Cul3-RING ubiquitin ligase complex CC 1.74E-02 4

GO:0072014 proximal tubule development BP 1.74E-02 4

GO:0019953 sexual reproduction BP 1.75E-02 25

GO:0005509 calcium ion binding MF 1.85E-02 37

GO:0030332 cyclin binding MF 1.89E-02 5

GO:0042995 cell projection CC 1.89E-02 55

GO:0030240 skeletal muscle thin filament assembly BP 2.13E-02 4

GO:0008603 cAMP-dependent protein kinase regulator activity MF 2.21E-02 5

GO:0005524 ATP binding MF 2.38E-02 66

GO:0032559 adenyl ribonucleotide binding MF 2.38E-02 67

GO:0006600 creatine metabolic process BP 2.38E-02 4

GO:0000090 mitotic anaphase BP 2.38E-02 4

GO:0072019 proximal convoluted tubule development BP 2.86E-02 3

GO:0015293 symporter activity MF 2.95E-02 14

GO:0001829 trophectodermal cell differentiation BP 2.95E-02 5

GO:0035024 negative regulation of Rho protein signal transduction BP 3.47E-02 4

GO:0048870 cell motility BP 3.47E-02 44

GO:0004467 long-chain fatty acid-CoA ligase activity MF 3.47E-02 4

GO:0001822 kidney development BP 3.47E-02 16

GO:0001539 ciliary or flagellar motility BP 3.47E-02 5

GO:0044441 cilium part CC 3.47E-02 9

GO:0004053 arginase activity MF 3.47E-02 2

GO:0010963 regulation of L-arginine import BP 3.47E-02 2

GO:0035379 carbon dioxide transmembrane transporter activity MF 3.47E-02 2

GO:0072230 metanephric proximal straight tubule development BP 3.47E-02 2

GO:0072220 metanephric descending thin limb development BP 3.47E-02 2

GO:0020003 symbiont-containing vacuole CC 3.47E-02 2

GO:0085018 maintenance of symbiont-containing vacuole via substance secreted by host BP 3.47E-02 2

GO:0072232 metanephric proximal convoluted tubule segment 2 development BP 3.47E-02 2

GO:0017111 nucleoside-triphosphatase activity MF 4.10E-02 40

GO:0035085 cilium axoneme CC 4.17E-02 6

GO:0032501 multicellular organismal process BP 4.17E-02 186

GO:0015291 secondary active transmembrane transporter activity MF 4.17E-02 16

GO:0015630 microtubule cytoskeleton CC 4.69E-02 35

GO:0016459 myosin complex CC 4.69E-02 8

GO:0043292 contractile fiber CC 4.99E-02 13

Supplementary Table 47. Sex-biased GO.

GO_ID GO_Term GO_Class ovary_expression testis_expression Pvalue

GO:0007340 acrosome reaction BP 62.09 2.51 7.08E-09

GO:0009988 cell-cell recognition BP 41.96 2.23 1.22E-07

GO:0030332 cyclin binding MF 3.65 35.29 4.32E-05

GO:0070528 protein kinase C signaling cascade BP 20.63 2.19 5.21E-05

GO:0005771 multivesicular body CC 36.55 4.71 2.19E-04

GO:0008603 cAMP-dependent protein kinase regulator activity MF 2.34 16.96 3.53E-04

GO:0000313 organellar ribosome CC 104.18 17.36 1.23E-03

GO:0005761 mitochondrial ribosome CC 104.18 17.36 1.23E-03

GO:0002920 regulation of humoral immune response BP 15.81 2.86 2.03E-03

GO:0001539 ciliary or flagellar motility BP 2.38 12.15 3.30E-03

GO:0035085 cilium axoneme CC 2.41 12.08 3.61E-03

GO:0006270 DNA-dependent DNA replication initiation BP 35.76 7.32 4.24E-03

GO:0009994 oocyte differentiation BP 29.48 6.15 4.72E-03

GO:0048599 oocyte development BP 29.48 6.15 4.72E-03

GO:0005930 axoneme CC 2.51 11.47 6.19E-03

GO:0032633 interleukin-4 production BP 18.49 4.10 6.59E-03

GO:0044447 axoneme part CC 2.35 10.26 7.78E-03

GO:0070206 protein trimerization BP 2.45 10.49 8.74E-03

GO:0021591 ventricular system development BP 3.00 12.74 9.13E-03

GO:0043049 otic placode formation BP 1.43 5.99 9.99E-03

GO:0009434 microtubule-based flagellum CC 3.95 16.43 1.02E-02

GO:0044241 lipid digestion BP 1.42 5.91 1.03E-02

GO:0006364 rRNA processing BP 91.74 22.57 1.14E-02

GO:0045921 positive regulation of exocytosis BP 16.15 4.01 1.20E-02

GO:0006271 DNA strand elongation involved in DNA replication BP 44.79 11.15 1.22E-02

GO:0042254 ribosome biogenesis BP 96.38 24.11 1.25E-02

GO:0060986 endocrine hormone secretion BP 2.01 7.95 1.30E-02

GO:0030916 otic vesicle formation BP 1.38 5.47 1.31E-02

GO:0022616 DNA strand elongation BP 40.80 10.40 1.37E-02

GO:0016460 myosin II complex CC 3.52 13.54 1.51E-02

GO:0016775 phosphotransferase activity, nitrogenous group as acceptor MF 2.69 10.33 1.51E-02

GO:0044452 nucleolar part CC 64.56 16.87 1.55E-02

GO:0015669 gas transport BP 3.22 12.31 1.56E-02

GO:0016072 rRNA metabolic process BP 81.15 22.09 1.90E-02

GO:0032350 regulation of hormone metabolic process BP 2.71 9.62 2.21E-02

GO:0000387 spliceosomal snRNP assembly BP 111.25 31.60 2.32E-02

GO:0002076 osteoblast development BP 2.10 7.38 2.36E-02

GO:0009125 nucleoside monophosphate catabolic process BP 2.83 9.86 2.42E-02

GO:0009651 response to salt stress BP 1.87 6.41 2.63E-02

GO:0002455 humoral immune response mediated by circulating immunoglobulin BP 8.70 2.57 2.81E-02

GO:0005201 extracellular matrix structural constituent MF 1.83 6.16 2.86E-02

GO:0040023 establishment of nucleus localization BP 4.85 16.30 2.87E-02

GO:0030553 cGMP binding MF 1.36 4.55 2.95E-02

GO:0048477 oogenesis BP 21.71 6.67 3.32E-02

GO:0007512 adult heart development BP 1.73 5.52 3.65E-02

GO:2001259 positive regulation of cation channel activity BP 18.04 5.67 3.69E-02

GO:0030532 small nuclear ribonucleoprotein complex CC 112.76 35.57 3.75E-02

GO:0032649 regulation of interferon-gamma production BP 10.28 3.26 3.82E-02

GO:0032609 interferon-gamma production BP 9.72 3.10 3.92E-02

GO:0015172 acidic amino acid transmembrane transporter activity MF 1.78 5.57 3.94E-02

GO:0005313 L-glutamate transmembrane transporter activity MF 1.78 5.57 3.94E-02

GO:0017158 regulation of calcium ion-dependent exocytosis BP 11.79 3.78 4.05E-02

GO:0030551 cyclic nucleotide binding MF 1.77 5.50 4.08E-02

GO:0008173 RNA methyltransferase activity MF 27.25 8.78 4.11E-02

GO:0030315 T-tubule CC 1.55 4.77 4.31E-02

GO:0050886 endocrine process BP 2.10 6.45 4.31E-02

GO:0002437 inflammatory response to antigenic stimulus BP 12.63 4.12 4.32E-02

GO:0044058 regulation of digestive system process BP 1.94 5.95 4.37E-02

GO:0001502 cartilage condensation BP 1.65 5.03 4.44E-02

GO:0008033 tRNA processing BP 28.55 9.37 4.44E-02

GO:0051537 2 iron, 2 sulfur cluster binding MF 25.53 8.38 4.45E-02

GO:0071599 otic vesicle development BP 1.88 5.73 4.47E-02

GO:0071600 otic vesicle morphogenesis BP 1.87 5.67 4.51E-02

GO:0032201 telomere maintenance via semi-conservative replication BP 37.27 12.30 4.55E-02

GO:0032011 ARF protein signal transduction BP 2.15 6.49 4.65E-02

GO:0007159 leukocyte cell-cell adhesion BP 2.29 6.90 4.66E-02

GO:0060174 limb bud formation BP 1.32 3.96 4.71E-02

GO:0019861 flagellum CC 4.78 14.35 4.75E-02

GO:0006261 DNA-dependent DNA replication BP 31.62 10.55 4.78E-02

GO:0005086 ARF guanyl-nucleotide exchange factor activity MF 2.31 6.87 4.93E-02

GO:0034470 ncRNA processing BP 43.42 14.67 4.95E-02

GO:0000146 microfilament motor activity MF 2.33 6.93 4.98E-02

Supplementary Table 48. Metabolism pathways (KEGG) enriched by DEGs between female ovaries and pseudomale testes.

KO_ID Pvalue Gene Num Description

ko01100 1.00E-02 43 Metabolic pathways

ko01110 3.79E-02 13 Biosynthesis of secondary metabolites

ko04020 5.08E-03 11 Calcium signaling pathway

ko04520 3.45E-04 9 Adherens junction

ko04114 4.77E-03 8 Oocyte meiosis

ko04910 6.16E-03 8 Insulin signaling pathway

ko04972 5.25E-04 8 Pancreatic secretion

ko05412 2.22E-03 8 Arrhythmogenic right ventricular cardiomyopathy (ARVC)

ko04540 4.03E-03 7 Gap junction

ko04723 1.30E-02 7 Retrograde endocannabinoid signaling

ko04724 2.61E-02 7 Glutamatergic synapse

ko04915 4.94E-03 7 Estrogen signaling pathway

ko04916 8.64E-03 7 Melanogenesis

ko05410 1.20E-02 7 Hypertrophic cardiomyopathy (HCM)

ko05414 2.15E-02 7 Dilated cardiomyopathy

ko03320 4.14E-03 6 PPAR signaling pathway

ko04713 4.57E-02 6 Circadian entrainment

ko04918 5.26E-03 6 Thyroid hormone synthesis

ko04970 4.67E-03 6 Salivary secretion

ko04920 2.03E-02 5 Adipocytokine signaling pathway

ko04971 1.36E-02 5 Gastric acid secretion

ko04974 2.03E-02 5 Protein digestion and absorption

ko00010 3.13E-02 4 Glycolysis / Gluconeogenesis

ko00561 1.77E-02 4 Glycerolipid metabolism

ko00601 1.97E-03 4 Glycosphingolipid biosynthesis - lacto and neolacto series

ko04070 4.18E-02 4 Phosphatidylinositol signaling system

ko04742 3.24E-03 4 Taste transduction

ko04913 4.18E-02 4 Ovarian steroidogenesis

ko04961 8.57E-03 4 Endocrine and other factor-regulated calcium reabsorption

ko04962 3.13E-02 4 Vasopressin-regulated water reabsorption

ko00051 2.46E-02 3 Fructose and mannose metabolism

ko00052 3.31E-02 3 Galactose metabolism

ko04964 5.27E-03 3 Proximal tubule bicarbonate reclamation

ko04973 2.08E-02 3 Carbohydrate digestion and absorption

ko00512 4.30E-02 2 Mucin type O-Glycan biosynthesis

ko00981 3.78E-02 1 Insect hormone biosynthesis

Supplementary Table 49. Data production and alignment statistics of smRNA-Seq.

Samples | raw reads | used reads | aligned | aligned rate (%) | used for miRNA prediction | miRNA reads number
ZW ovary F1 | 12,598,072 | 11,338,592 | 8,169,725 | 72.05 | 1,777,571 | 630,479
ZW ovary F2 | 17,656,295 | 16,414,960 | 10,686,452 | 65.10 | 3,023,837 | 872,401
ZW testis F1 | 12,377,773 | 11,487,874 | 7,871,839 | 68.52 | 3,125,824 | 1,873,452
ZW testis F2 | 18,804,139 | 17,476,137 | 12,072,738 | 69.08 | 3,894,509 | 1,380,666
ZZ testis P | 19,931,625 | 19,182,781 | 12,868,705 | 67.08 | 7,826,868 | 3,964,030
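The "aligned rate (%)" column of Supplementary Table 49 matches aligned reads divided by used reads (not raw reads). The short check below recomputes it from the counts in the table; the dictionary layout and function name are illustrative choices, not part of the source.

```python
def aligned_rate(aligned, used):
    """Percentage of used reads that aligned."""
    return 100.0 * aligned / used


# sample -> (used reads, aligned reads), copied from Supplementary Table 49.
smrna_samples = {
    "ZW ovary F1": (11_338_592, 8_169_725),
    "ZW ovary F2": (16_414_960, 10_686_452),
    "ZW testis F1": (11_487_874, 7_871_839),
    "ZW testis F2": (17_476_137, 12_072_738),
    "ZZ testis P": (19_182_781, 12_868_705),
}

rates = {name: round(aligned_rate(aligned, used), 2)
         for name, (used, aligned) in smrna_samples.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}")
```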

Supplementary Table 50. Differentially expressed miRNAs between female and reversed male.

ID name ovary_exp testis_exp P-value

m0058 - 0.00 55.77 2.11E-02

m0059 - 0.00 37.66 2.94E-02

m0064 - 364.80 0.00 1.40E-02

m0108 miR-724 0.00 445.44 4.29E-03

m0120 miR-200a 0.00 224.53 1.02E-02

m0182 - 306.12 0.00 1.67E-02

m0212 miR-724 0.00 446.16 4.28E-03

m0248 miR-30c 0.00 170.93 1.37E-02

m0355 miR-7 0.00 816.27 1.44E-03

m0426 - 114.20 0.00 2.19E-02

m0435 miR-455 517.07 0.00 9.60E-03

m0753 - 0.00 47.08 2.35E-02

m0756 miR-125a 0.00 1176.24 6.47E-04

m0766 miR-30c 172.88 0.00 1.94E-02

m0837 miR-132 17.45 344.04 6.06E-02

m0838 miR-212 0.00 813.38 1.45E-03

m0875 miR-124 0.00 34.04 3.38E-02

m1020 miR-181b 0.00 175.28 1.33E-02

m1024 miR-101 0.00 5041.77 1.11E-05

m1123 - 0.00 20.28 8.86E-02

m1124 miR-153 0.00 19.56 9.55E-02

m1140 let-7 505.96 3879.29 5.35E-02

m1148 - 0.00 45.63 2.41E-02

m1151 miR-132 7.93 100.68 9.99E-02

m1260 - 0.00 24.63 5.99E-02

m1306 let-7 0.00 357389.12 7.93E-07

m1344 let-7i 0.00 4500.00 1.58E-05

m1367 - 0.00 68.81 1.98E-02

m1462 let-7i 0.00 4500.00 1.58E-05

m1503 let-7 211074.44 0.00 1.29E-07

m1533 - 39.65 1531.87 1.40E-02

m1567 - 0.00 203.53 1.13E-02

m1568 - 0.00 157.89 1.49E-02

m1595 - 870.77 0.00 5.05E-03

m1599 - 0.00 52.87 2.17E-02

m1614 - 870.77 0.00 5.05E-03

m1628 - 0.00 46.35 2.38E-02

m1633 - 0.00 63.01 2.02E-02

m1638 miR-22 0.00 7.97 2.43E-02

m1672 miR-219 20.92 0.00 7.42E-07

m1635 miR-455 517.07 0.00 9.60E-03

m1652 miR-130 0.00 1379.77 4.50E-04

m1667 - 0.00 159.34 1.47E-02

M1612 miR-27 486.93 60.12 0

m1704 miR-455 517.07 0.00 9.60E-03

m1713 miR-7 55.51 580.15 8.91E-02

m1752 - 880.28 0.00 4.97E-03

Supplementary Table 51. Comparison of assembled scaffolds and four independently
finished BACs of the tongue sole genome. The scaffolds were aligned with the BACs
using BLASTN (E-value <1e-20). The alignment blocks were then chained along the
BACs by SOLAR and confirmed manually.

BAC ID  BAC Len. (bp)  Coverage  # of scaffolds  Scaffold Len. (bp)
zscgax  127,603        0.97      1               1,146,347
zscgbx  145,278        0.97      1               1,989,243
zscgdx  165,379        0.96      1               1,989,243
zscgex  107,567        0.97      1               618,700

Supplementary Table 52. Comparison of assembled scaffolds and ESTs. We aligned
ESTs to scaffolds using BLAT with default parameters and chose the best hit for each
one.

Len. (bp)  Total   Total match     >=50% coverage  >=90% coverage
                   #       %       #       %       #       %
All        14,687  14,685  99.99   14,610  99.48   14,365  97.81
>200       14,629  14,622  99.95   14,547  99.44   14,307  97.8
>500       11,130  11,112  99.84   11,068  99.44   10,866  97.63
>1000      1       1       100     1       100     1       100

Supplementary Table 53. Enrichment of GO terms in expanded gene families of the
tongue sole genome. Enrichment was assessed relative to all the reference genes.
P-values were calculated by Fisher's exact test and adjusted by FDR.

GO Term     Description                                      GO space  Gene number  P value   P value (adjusted)
GO:0007017  microtubule-based process                        BP        11           1.41E-12  5.85E-10
GO:0016051  carbohydrate biosynthetic process                BP        7            1.65E-08  3.91E-06
GO:0007156  homophilic cell adhesion                         BP        11           2.09E-08  4.34E-06
GO:0008378  galactosyltransferase activity                   MF        6            1.35E-06  2.04E-04
GO:0006468  protein phosphorylation                          BP        24           1.69E-06  2.34E-04
GO:0005272  sodium channel activity                          MF        4            3.36E-06  3.98E-04
GO:0008146  sulfotransferase activity                        MF        7            3.44E-05  0.003
GO:0043687  post-translational protein modification          BP        6            5.02E-05  0.005
GO:0008076  voltage-gated potassium channel complex          CC        7            1.03E-04  0.009
GO:0005230  extracellular ligand-gated ion channel activity  MF        6            1.98E-04  0.016
GO:0006486  protein glycosylation                            BP        6            2.98E-04  0.022

Note: BP, Biological Process; MF, Molecular Function; CC, Cellular Component.

Supplementary Table 54. Enrichment of GO terms in contracted gene families of the
tongue sole genome. Enrichment was assessed relative to all the reference genes.
P-values were calculated by Fisher's exact test and adjusted by FDR.

GO Term     Description                                           GO space  Gene number  P value   P value (adjusted)
GO:0007186  G-protein coupled receptor protein signaling pathway  BP        37           0.00E+00  0
GO:0004984  olfactory receptor activity                           MF        21           1.18E-34  9.794E-32
GO:0016021  integral to membrane                                  CC        37           5.16E-11  2.8552E-08
GO:0000786  nucleosome                                            CC        7            7.28E-09  3.0212E-06
GO:0008270  zinc ion binding                                      MF        35           2.74E-08  7.58067E-06
GO:0006334  nucleosome assembly                                   BP        7            2.73E-08  9.0636E-06
GO:0019882  antigen processing and presentation                   BP        3            3.54E-05  0.008
GO:0020037  heme binding                                          MF        6            1.82E-04  0.038
GO:0004497  monooxygenase activity                                MF        5            2.38E-04  0.039
GO:0005525  GTP binding                                           MF        12           2.34E-04  0.043

Supplementary Table 55. Oligonucleotide primers used in the study.

Gene name Tm (℃) Primer Sequence 5’-3’

pepd 59.5 F: CAAAATCGGTTGTGGTGCTG

R: CTCCCATCCATGTGGCGTA

fbn1 61 F: TTCCTGGCTCCATCATTTCGT

R: GGCTCCTCTCACACTCGTCATC

mgam 61 F: GTCCCCATCAGCGACACGTT

R: GCAAAGACCAGGGGTGCTATC

ace2 60 F: TGGTGTGTCATCCCACTGCG

R: GGTAGGACAGGTTGCGATAAGC

itih2 61 F: GGAGATCCGACTGTGGGTGAG

R: GCAGCATCGTGGTTGGCATA

gda 61 F: TTCCTGCGGAGTCTCGCTTTA

R: CGGTGGTGGTTCCATTTCTCA

mep1b 60 F: ATAACAGCACCAACCCTAACGG

R: GACCCCTCAAATACAACACGAAA

hnf4a 60 F: AGAGCAGAAATCAGCCACTATCGT

R: ACACAGCCGTTACCTAAAAGCAG

cpb1 61 F: CGGCTACGACTACACCCACAAG

R: TTCTGCGGATGAAGTCGGC

cdhr2 60 F: TCTCGTTAGGGCTGAGGACTTG

R: GTCTATTACGAACACATCCACGGT

slc15a2 61 F: CACCCATCCTCGGAGCTCTTA

R: TCAAACTGGTCTCCTCCAAACG

cp 60 F: AGCCTCTATATGAGCTTCGGGA

R: CATTATGGTGTGGTAGGACCGTT

tmem67 60 F: ACCAGCAGTACAGTGGACGGTT

R: TGTCCTTGCTGCGTTCTCG

xdh 60 F: TGCTGCAAGAACGGAGGTAAC

R: ACTGAACGCTCCCCACGAA

cd74 61 F: GCCTCACTTCAATGACACCTTCC

R: CCTTCTGGCACTTGGTCTTCAC

rh1 61 F: GGGGTCGTCAGGAATCCGTAT

R: TCGTGGTGAATCCTCCGAAGA

lws1 61 F: CCTGCCACTTTCAATAATCCTCC

R: GCAGGCAAAGAAGGTGTAAGGTC

rh2 61 F: GCACCGGAACACCCATCAA

R: CCACCAGAGACCACAGAGCAAC

dmrt1(RT-PCR) 60 F: AAGCAGCGCCCTGACTACAC

R: GCCCTTCAACGGAGACACG

follistatin 60 F: GCGGTGCCAGGTGCTCTA

R: GGGTCCACAGTCAACATTATCG

patched1 60 F: GCTTCAGAGCGTGGCGAC

R: ACCACTGGCTACACGGATGAC

sf-1 60 F: GCTGCCAGTATTGCCGCT

R: GTGATGGTTGGTTGCCCTCT

neurl3 59 F: CTGGTGTTTAGCAGCCGTCCT

R: CCAGAACTCCAGCACTGACCC

dmrt1(BS-PCR,1# exon) 48 F: GGTTAAATATTGTTATAGTAGTAGTAG

R: ACRATTACCTACACCACCA

dmrt1(BS-PCR,1# intron) 50 F: GTTATTGTGATTGGAGGGA

R: ATTATAATAAATTACTCTACAACAT

dmrt1(DM domain) 60 DM-F: AAGCAGCGCCCTGACTACAC

DM-R: GCCCTTCAACGGAGACACG

gas8 45 F: GCTCAGGACCACAACA

R: TTTCCAGGTGCTTCAT

nme5 49 F: TCTTTCCCAGGTTGATTAT

R: TTTGCTCTGAGGCTTTT

ropn1l 52 F: TGCCCAACATCCTCAAA

R: GCAGCGGTTCTCCATTA

tekt1 48 F: TCCAAGAAACGGACAAA

R: TCCAGAGCCTTCAACAC

plcz1 46 F: ATCTACCAAGCCCAAAT

R: TCCTCACCCTCATCTGT

tbpl1 55 F: CACAGCCACAATCTCATCG

R: GAACGGCACGGAACAAA

spag6 51 F: TACAGACCTGGAAACCC

R: CGTCGTGCGAGAAGAT

gal3st1 47 F: TCCTCATAGCGGAACA

R: TCAGACACGGACGAAC

dnajb13 47 F: TCCTCGGTTTGTTAGAG

R: ATGTCGTTGATGGGTAT

cldn11 54 F: ACCACTGCGTCTCCCTG

R: ACCACCGTGCGTTTGTT

gpr64 53 F: CTGCTGGCTGCGTAATG

R: GCGAACAGAGCGAAGC

aqp1 54 F: GAATAGCAGCAGCCCTCA

R: TGTCATCAGCAGCATCCC

β-actin 61 F: CCTTGGTATGGAGTCCTGTGGC

R: TCCTTCTGCATCCTGTCGGC

Supplementary Note

DNA library construction and sequencing

One adult female and one adult male tongue sole were selected for whole-genome
shotgun sequencing and were temporarily maintained at 22 °C in the facilities of the Key
Laboratory of Sustainable Development of Marine Fisheries in Qingdao. Blood cells
were taken from the selected female and male using sterile syringes pre-loaded with
anticoagulant solution (0.5 M EDTA, pH 8.0). High-quality genomic DNA suitable for
construction of the large-insert libraries (2-40 Kb) was extracted from the
blood cells using Puregene Tissue Core Kit A (Qiagen, Maryland, USA). Fifteen
paired-end libraries for the female and 11 paired-end libraries for the male (170 bp to 40 Kb)
were then constructed using the Illumina standard operating procedure. Paired-end
sequencing was performed on an Illumina HiSeq 2000 for each library.

Genome assembly

Raw data of 91.35 Gb and 67.86 Gb were obtained for the female and male individuals,
respectively. Before genome assembly, we filtered out artificial and low-quality reads to
obtain a usable read set containing 857.5 M and 730.0 M reads, representing 63.86 Gb
and 46.67 Gb of data, for the female and male individuals, respectively. The genome
coverage was approximately 212-fold. In addition, we corrected sequencing errors for the
17-mers with a frequency lower than four using a method described in a previous study1.
We then assembled the reads into contigs and scaffolds to build the male and female
genomes using SOAPdenovo2.

Identification of Z and W-linked scaffolds

With the same sequencing coverage, the depth of Z-linked scaffolds in the
non-pseudoautosomal region (non-PAR) in the female is expected to be half of that of the
male Z chromosome, the female autosomes and the male autosomes
(Zf = 1/2 Zm = 1/2 Af = 1/2 Am). Accordingly, we identified 26 and 126 Z-linked scaffolds in
the male and female assemblies, respectively. For the W-linked scaffolds, which are present
in the female assembly only, we expected that they should not be covered by male reads and
that their sequencing depth from female reads should be about half of the autosomal
average. Using this method, we identified 306 W-linked scaffolds in the non-PAR,
representing 16.4 Mb with a scaffold N50 of 128 Kb. Considering the interference of W
reads and the higher quality of the Z assembly in the male genome relative to the female
genome (scaffold N50 of 1,305 Kb versus 357 Kb), we chose the scaffolds from the
male assembly as the Z-linked scaffolds in the final version. For the other scaffolds,
representing autosomes, W and any undetected Z-linked scaffolds, we used
the female version. Ultimately, the final genome had contig and scaffold N50 sizes of 26
Kb and 867 Kb, respectively.
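The depth expectations above can be illustrated with a minimal Python sketch. It assumes that a mean read depth per scaffold is already available for each sex and that both sexes were sequenced to the same autosomal depth; the function name, the tolerance value and the category labels are hypothetical and are not taken from the original pipeline.

```python
def classify_scaffold(female_depth, male_depth, autosome_depth, tol=0.2):
    """Classify one scaffold from its mean read depth in each sex.

    autosome_depth is the average autosomal depth, assumed to be the same
    in both sexes. tol is a hypothetical tolerance on the depth ratios.
    """
    f = female_depth / autosome_depth  # female depth relative to autosomes
    m = male_depth / autosome_depth    # male depth relative to autosomes

    def near(value, target):
        return abs(value - target) <= tol

    if near(f, 0.5) and near(m, 1.0):
        return "Z-linked"          # Zf = 1/2 Zm = 1/2 Af = 1/2 Am
    if near(f, 0.5) and near(m, 0.0):
        return "W-linked"          # half depth in female, no male reads
    if near(f, 1.0) and near(m, 1.0):
        return "autosomal or PAR"  # equal depth in both sexes
    return "unclassified"
```

For example, a scaffold at half the autosomal depth in the female and full depth in the male would be reported as Z-linked under these assumptions.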

Genetic map construction, ordering and chromosomal assignment of

scaffolds

An F1 cross panel of 92 offspring from a wild male and a cultured female was used
for simple sequence repeat (SSR) genetic mapping. Another mapping population
consisting of 216 individuals was used to construct a high-resolution SNP genetic map
using RAD-Seq. Linkage analysis was performed using JoinMap 4.03 with a logarithm of odds
(LOD) score of 3 for grouping. Sex-specific genetic linkage maps were constructed
independently for each parent using informative markers. In a few cases, markers
were discarded during the mapping stage because their presence caused inconsistencies in
the map. Finally, two genetic linkage maps were constructed for the tongue sole,
comprising 942 SSR markers and 12,142 SNP markers, respectively. We used BLASTN
(E-value <1E-5, identity ≥95%, and aligning rate >50%) to map SSR markers to scaffolds.
Of the 26 Z-linked scaffolds that were identified by depth comparison, 24 resided on the
Z chromosome. In addition, two other small scaffolds (243 Kb and 399 Kb) with a 1:1
sequencing depth ratio between male and female were located distal to the Z chromosome.
Furthermore, these two scaffolds were orthologous with medaka chromosome 9. We thus
inferred that they are sequences of the PAR of the tongue sole sex chromosomes. For the
W chromosome, we ordered the 306 W-linked scaffolds in a pseudo-W chromosome based on
their gene synteny with the Z chromosome. We linked scaffolds onto chromosomes with a
string of 100 ‘N’s representing the gap between two adjacent scaffolds. In total, 944
scaffolds comprising 445 Mb (93.3% of the total scaffold length) were anchored to 22
chromosomes, representing the 20 autosomes, Z and W.
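The scaffold-linking step can be sketched in a few lines of Python. This is only an illustration: it assumes the scaffold sequences have already been ordered and oriented by the genetic map, and the function name is hypothetical.

```python
GAP = "N" * 100  # gap length between adjacent scaffolds, as stated in the text


def build_pseudochromosome(ordered_scaffolds):
    """Join ordered, oriented scaffold sequences with a 100-N gap between neighbours."""
    return GAP.join(ordered_scaffolds)
```

With three scaffolds, the result contains two gaps, so its length is the summed scaffold length plus 200.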

Validation of Z chromosome

To further verify the Z chromosome, all continuous exon sequences of genes in the
Z-linked scaffolds that did not match other genomic regions were selected to design
primers that would not amplify similar sequences elsewhere in the genome.
Primer Premier 5 was then used to design one to four primers from compatible exons in
the specific sequences of 24 scaffolds, and quantitative PCR (qPCR) was used to confirm
the depth ratio between the male and the female. Briefly, three DNA samples of each sex
were mixed at a ratio of 1:1:1, and then qPCR was conducted on an Applied
Biosystems 7500 Real-Time PCR System following the standard protocol (SYBR®
Premix Ex Taq™ (Takara)). β-actin was used for normalization and the 2^(-ΔΔCT) method
was selected for relative quantification. The results show that the
ratio of ZZ to ZW for 51 genes on 22 scaffolds was approximately 2, suggesting that
these scaffolds are located on the Z chromosome. In addition, the ratio for four genes from
the two PAR scaffolds was almost equal to 1.
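As a worked illustration of the 2^(-ΔΔCT) calculation, the following Python sketch computes a male-to-female ratio for one target gene normalized to β-actin. The function and argument names are hypothetical and the CT values in the example are invented for illustration; they are not measurements from this study.

```python
def relative_quantity(ct_target_male, ct_ref_male, ct_target_female, ct_ref_female):
    """Return the male/female ratio of a target gene by the 2^(-ddCT) method.

    The reference gene (here assumed to be beta-actin) normalizes each sample;
    the female sample is used as the calibrator.
    """
    delta_male = ct_target_male - ct_ref_male        # dCT in the ZZ male pool
    delta_female = ct_target_female - ct_ref_female  # dCT in the ZW female pool
    return 2 ** -(delta_male - delta_female)
```

A Z-linked gene that crosses the threshold one cycle earlier in the male pool, for instance `relative_quantity(24.0, 20.0, 25.0, 20.0)`, gives a ratio of 2, whereas equal ΔCT values in both sexes give a ratio of 1, as expected for the PAR.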

Genome evaluation

To assess assembly quality, we analyzed gene region coverage and assembly accuracy.
First, we compared scaffolds with four BACs independently sequenced using Sanger
sequencing technology to assess the large-scale accuracy of the assembly. A fast search
was performed to identify counterparts in scaffolds for each BAC using MUMmer4. We
then compared each sequence pair (BAC and scaffold) in detail using BLASTN5 (E-value
<1e-20), and used SOLAR (Sorting Out Local Alignment Result) to chain local alignments
into global results. Up to 96.9% of the BAC sequences were well covered by
one-to-one alignments to scaffolds, with few misassembly errors. In addition, the 14,687
ESTs of tongue sole from three samples (ovary, testis and immune tissues) were aligned
to the tongue sole assembly using BLAT6, with a cutoff of identity ≥95% and aligning
rate ≥50%7. The results show that 99.48% (14,610/14,687) of the ESTs could be detected
in the scaffolds at this cutoff. Using a more
stringent cutoff (identity ≥95% and aligning rate ≥90%), 14,365 (97.81%) of them were
still detected. While this likely overestimates coverage because it avoids repeated
sequences, it remains an important indicator of the representation of the transcribed
sequences in the assembly. These data indicate that our assembly has good coverage and
completeness for gene regions.

We used a k-mer-based method to estimate the genome size, as described in a previous
study2. From the depth-frequency distribution curve of 17-mers, we calculated
the genome size as 545 Mb for the female and 495 Mb for the male, according to the
formula G = kmer_num/kmer_depth. Given the presence of sequencing errors, we expect
that the 17-mer depth is an underestimate, and consequently the true tongue sole genome
size should be slightly smaller than 545 Mb for the female and 495 Mb for the male. These
estimates indicate that the assembled contigs and scaffolds covered about
83% and 88% of the whole genome, respectively.
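The formula G = kmer_num/kmer_depth can be illustrated with a short Python sketch. It assumes a 17-mer depth histogram is available as a mapping from depth to the number of distinct k-mers at that depth, and it takes kmer_depth as the histogram peak above a hypothetical minimum depth (to skip the low-depth error peak). The function name, the threshold and the toy histogram are assumptions for illustration only.

```python
def estimate_genome_size(kmer_histogram, min_depth=5):
    """Estimate genome size as (total k-mer count) / (peak k-mer depth).

    kmer_histogram maps a depth to the number of distinct k-mers seen at
    that depth. min_depth is a hypothetical cutoff used only when locating
    the main peak, so that error k-mers at very low depth are not chosen.
    """
    kmer_num = sum(depth * count for depth, count in kmer_histogram.items())
    candidates = {d: c for d, c in kmer_histogram.items() if d >= min_depth}
    kmer_depth = max(candidates, key=candidates.get)  # depth at the main peak
    return kmer_num / kmer_depth
```

Because error-containing k-mers still contribute to kmer_num in this sketch, the estimate is slightly inflated, which matches the direction of the bias discussed above.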

We measured the GC content in 500 bp non-overlapping windows along the
genome and filtered out windows containing over 50% Ns. For each window, we divided
the number of bases that were either C or G by the total number of bases, not counting
any ambiguous bases. The average G+C content of tongue sole is consistently around
40.8%, which is approximately 5 percentage points lower than Takifugu (45.5%) and
Tetraodon (46.4%), 4 percentage points higher than zebrafish (36.8%), and almost equal to
medaka (40.5%) and human (40.9%). The GC content of the tongue sole gene-coding regions
is 52.7%, which is about 12 percentage points higher than the GC content of its whole
genome.
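The window-based GC calculation described above can be sketched as follows. This is a minimal illustration, not the original script: the function name is hypothetical, and dropping a trailing partial window is an assumption.

```python
def gc_windows(seq, window=500, max_n_frac=0.5):
    """Return (start, GC fraction) for each non-overlapping window.

    Windows with more than max_n_frac Ns are skipped, and ambiguous bases
    are excluded from the denominator. A trailing partial window is
    dropped (an assumption of this sketch).
    """
    results = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        if chunk.count("N") / window > max_n_frac:
            continue  # too many gap bases in this window
        gc = chunk.count("G") + chunk.count("C")
        called = gc + chunk.count("A") + chunk.count("T")
        if called == 0:
            continue  # nothing but ambiguous bases
        results.append((start, gc / called))
    return results
```

Averaging the returned fractions over all retained windows would give a genome-wide value comparable to the 40.8% reported here.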

Repeat annotation

Construction of a de novo transposable element (TE) library

We used two software packages, PILER-DF8 and RepeatScout9, to construct a de novo
TE library for the tongue sole genome. We ran each program independently with default
parameters, filtered out sequences that were too short (<100 bp) or contained more than
5% gap ‘N’s, and then combined the results to obtain a consensus library. The library
contains 1182 elements that were classified using homologies with TEs from Repbase10.
Of these, 701 sequences were considered as “Unknown”, meaning they cannot be classified.
Of the 480 annotated sequences, 325 are shorter than 500 bp and are not necessarily short
interspersed elements (SINEs). This means that only a few elements encode proteins, and
even fewer elements in the tongue sole genome may still be active.
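The length and gap filter applied to candidate repeat sequences can be sketched as a single predicate. The function name is hypothetical, and treating exactly 5% Ns as acceptable is an assumption based on the "> 5%" wording.

```python
def keep_te_candidate(seq, min_len=100, max_n_frac=0.05):
    """Keep a candidate repeat if it is at least min_len bp and has at most 5% Ns."""
    if len(seq) < min_len:
        return False  # too short
    return seq.upper().count("N") / len(seq) <= max_n_frac
```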

Detection of long terminal repeats (LTRs) and terminal inverted repeats (TIRs)

structures

a. LTR_Finder

LTR_Finder software11 was used to detect LTRs; it is specific for LTR
retrotransposons and also Dictyostelium intermediate repeat sequence (DIRS) elements.

No LTRs were found in the TE database. This result is consistent with the protein

prediction and phylogeny (see below): few LTR retrotransposons were found and most of

them are small. These results suggest that LTR retrotransposons are not active in the

genome of the tongue sole.

b. e-inverted

e-inverted software, which is specific for DNA transposons, was used to detect TIRs.

This software did not find any specific TIRs in the tongue sole genomes.

Phylogenies of TEs

a. Methods

Protein prediction:

Using known TE proteins, we used BLAST software on the de novo TE library to predict

proteins on a large scale. We examined all potential predicted proteins for reverse

transcriptases of different families (LINE, DIRS and LTRs) and transposases.

Number of predicted sequences:

-LINE reverse transcriptase: 33 sequences

-LTR reverse transcriptase: 8 sequences

-Penelope reverse transcriptase: 7 sequences

-Transposase: 111 sequences.

The protein sequences were then aligned and phylogenetic trees were constructed.

Phylogeny:

Phylogenetic trees were constructed using the SEAVIEW-PhyML software12.

b. All reverse transcriptase phylogeny

There are three different orders of retrotransposons (LINE, LTR and Penelope);
therefore, we decided to align them and build a complete reverse transcriptase tree. We

could easily differentiate the three different orders and the four families of LINE. The

main branches were well supported with high bootstrap values. From this tree, we also

observed that there are many more LINE elements compared with LTRs and Penelope.

Two Penelope elements, one Rex/Babar and one Gypsy, were not placed together with

their respective families.

c. DNA Transposons

We identified 90 Tc1 elements, one Tc5, eight hAT, two PiggyBac, six Buster, one

Harbinger and six Pogo-like elements in the tongue sole genomes.

TEs make up around 5.85% of the tongue sole genome and are mainly represented by Tc1
transposons and LINE RTE and Babar retrotransposons. There are few LTR
retrotransposons in the genome of the tongue sole. The main LTR elements present are
Sushi elements (from the Gypsy family) and are probably no longer active. Both APE
and REL (only one family) endonuclease elements were found; the most represented
elements belong to the RTE and Babar families. Finally, DNA transposons are mainly
represented by Tc1 elements, with 90 predicted elements.

Transcriptome sequencing

Tongue sole larvae came from one batch of fertilized eggs and were maintained at 23 °C
until 23 days post hatching (dph) at Shandong Huanghai Aquaculture Co., Ltd. Six
larvae were sampled individually at 18 dph (pre-metamorphosis stage) and at 22 dph
(post-metamorphosis stage), respectively. Sampled larvae were sacrificed by anesthetic
overdose and their gonads were isolated for genetic sex identification, and then placed in
liquid nitrogen. For the sex-reversed samples, the offspring of a normal parent male (ZZ
testis P) and female were incubated at 28 °C during the critical stage (30-80 dph) to
produce the first generation of sex-reversed fish (ZW testis F1) and females (ZW ovary F1)
in 2008. Then, in 2010, one of the pseudomales was crossed with a normal
female to produce the next generation, consisting of spontaneously sex-reversed fish (ZW
testis F2) and normal females. In total, ten fish each of the parent males, first-generation
pseudomales and females, and second-generation pseudomales and normal females
were collected from Laizhou Mingbo Co., Ltd. in 2011. All these fish were
sacrificed and their gonads were isolated and preserved at -80 °C for RNA and DNA
isolation.

Total RNA was isolated and purified from all samples using a traditional phenol method13.
RNA concentration was measured using Nanodrop technology. For the metamorphosis
samples, RNAs of three larvae were pooled in equal quantities for library construction.
First, oligo-dT-coupled beads were used to enrich poly-A+ RNA molecules.
Random hexamers and Superscript II reverse transcriptase (Invitrogen) were used for
first-strand cDNA synthesis and E. coli DNA PolI (Invitrogen) was used for second-strand
cDNA synthesis. A Qiaquick PCR purification kit (Qiagen, Germantown, MD) was used
to purify the double-stranded cDNA. The cDNA was then sheared with a nebulizer
(Invitrogen) to 100–500 bp fragments. The fragments were ligated to the Illumina PE adapter
oligo mix after end repair and addition of a 3' dA overhang. Then, cDNA fragments of two
sizes (150 ± 20 bp and 200 ± 20 bp) were collected by gel purification. After 15 cycles of
PCR amplification, the libraries were subjected to paired-end sequencing (90 bp or 75 bp
at each end) using an Illumina HiSeq2000. Finally, 12.38 Gb of transcriptome sequences
were generated from the seven tissues to aid precise gene annotation.

Gene prediction and annotation

Homology-based gene prediction

Protein sequences of Oryzias latipes, Takifugu rubripes, Tetraodon nigroviridis,
Gasterosteus aculeatus, Danio rerio and Homo sapiens were obtained from the Ensembl
database (release 57). Short (<50 aa) sequences were filtered out before further processing.
We used the following pipeline to project them as parent proteins onto the tongue sole
genome: (a) Rough alignment. We aligned the parent protein sequences to the tongue sole
genome by TBLASTN at E-value <1e-5, and grouped all the high-scoring segment pairs
(HSPs) into gene-like structures using SOLAR. Alignments with less than 70% aligning
sequence similarity to their parent proteins were filtered out. (b) Precise alignment. We
first isolated the target gene region in the genome by extending the alignment regions by
500 bp at both ends, including the intron regions, and then aligned the parent proteins to
these DNA fragments using GeneWise14 to predict the precise transcript structure. To
filter out low-quality results, we only retained transcripts of ≥150 bp. (c) Transcript
clustering. We clustered all the predicted transcripts by genomic overlap with a cutoff of
more than 100 bp. For each gene locus, the longest transcript was chosen.
(d) Filtering out pseudogenes. Two types of frame errors, frame shifts and
internal stop codons, identify pseudogenes. We filtered out single-exon genes containing
more than one frame error and multiple-exon genes containing more than two frame errors.
Finally, we predicted 18,284 genes and 587 pseudogenes in the
homology-based gene set.
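The pseudogene rule in step (d) can be written as a small predicate. This is a sketch under the assumption that frame shifts and internal stop codons have already been counted per gene model (for example from the GeneWise output); the function and argument names are hypothetical.

```python
def is_pseudogene(num_exons, frame_shifts, internal_stops):
    """Apply the frame-error rule: more than one error for single-exon genes,
    more than two errors for multiple-exon genes."""
    frame_errors = frame_shifts + internal_stops
    allowed = 1 if num_exons == 1 else 2
    return frame_errors > allowed
```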

RNA-seq models

We mapped a mixture of Illumina paired-end reads from seven tongue sole libraries using
TopHat15, and then used Cufflinks16 to construct transcripts. After that, in-house
software was used to predict potential open reading frames (ORFs) (≥150 bp) for 42,912
spliced transcripts. All predicted ORFs were aligned to Uniprot17 Protein Existence (PE)
classification level 1 and 2 proteins, using BLASTP with a cutoff of E-value <1e-5.
Transcripts without any significant hits were excluded from further analysis. If a
transcript contained more than one ORF, the one with the highest alignment score was chosen.
In the end, 30,253 transcripts were considered as RNA-seq models.

De novo prediction

First, we masked all TE-related regions in the tongue sole genome. We then performed de
novo prediction using two software programs, Genscan18 and Augustus19, both with gene
model parameters trained from Homo sapiens, and filtered out partial (missing start or
stop codon) and short (coding region <150 bp) genes. To filter out TE-derived genes, we
aligned predicted protein sequences to a TE protein database in RepeatMasker using
BLASTP with an E-value ≤1e-5. Genes showing more than 50% alignment were filtered
out. We then clustered genes according to their genomic overlap (≥100 bp) and chose the
gene with the longest coding region for each cluster. Ultimately, we obtained 27,327 genes in
the de novo gene set.
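The overlap-based clustering used here (and in the other gene sets) can be sketched as a single pass over genes sorted by position. This is an illustrative simplification: strand is ignored, coordinates are assumed to be 1-based and inclusive, and the record fields and function name are hypothetical.

```python
from collections import namedtuple

Gene = namedtuple("Gene", "name scaffold start end cds_len")


def pick_longest_per_locus(genes, min_overlap=100):
    """Cluster genes that overlap by at least min_overlap bp on the same
    scaffold and keep the gene with the longest coding region per cluster."""
    chosen = []
    current, current_scaffold, current_end = [], None, 0
    for g in sorted(genes, key=lambda g: (g.scaffold, g.start)):
        # Overlap between this gene and the span of the current cluster.
        overlap = min(current_end, g.end) - g.start + 1
        if current and g.scaffold == current_scaffold and overlap >= min_overlap:
            current.append(g)
            current_end = max(current_end, g.end)
        else:
            if current:
                chosen.append(max(current, key=lambda x: x.cds_len))
            current, current_scaffold, current_end = [g], g.scaffold, g.end
    if current:
        chosen.append(max(current, key=lambda x: x.cds_len))
    return chosen
```

Two gene models that share 151 bp would fall into one locus and only the one with the longer coding region would be kept, while a neighbour overlapping by 51 bp would start a new locus.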

Integrating gene sets

To form a comprehensive reference gene set, we integrated gene models defined by the
different methods with the following steps: a) Genes from all the input sets were clustered,
with a cutoff of genomic overlap greater than 100 bp for each gene locus. b) We chose
one representative gene model for each cluster according to the following priority:
homology-based model > RNA-seq model > de novo model. c) To complete the gene
structure in the homology-based gene set, we identified all supporting evidence in other
gene sets, and then used GLEAN to try to complete the structure of homology-based
genes present in the reference gene set. d) We used a more stringent cutoff for de novo
genes than for homology-based genes and RNA-seq models. If de novo genes were chosen
for the reference gene set, we only retained those with an aligning rate of more than 30%
when searched against the Uniprot database and that contained at least 3 exons. The final
reference gene set contained 21,516 genes. Ninety-nine percent of the predicted genes were
supported by homologs in other organisms or in the transcriptome. In particular, the
highly conserved structure of homologous genes in other well-annotated teleost fish
genomes confirmed the accuracy of the annotation.

Functional annotation

For the tongue sole reference genes, we annotated the motifs and domains by
InterProScan20 against publicly available databases, including Pfam, PRINTS, PROSITE,
ProDom, SMART and PANTHER, and then retrieved Gene Ontology (GO) annotation
from the results of InterProScan. From the reference gene set, 17,890 and 14,935 genes
could be annotated by IPR and GO annotation, respectively. We also annotated 20,265
genes by searching the Swiss-Prot database using BLASTP at E-value ≤1e-5.

Constructing gene families and reconstructing phylogeny

Constructing gene families

With human and chicken as out-groups, we constructed gene families for sequenced fish
genomes including medaka, Takifugu, Tetraodon, stickleback, zebrafish and tongue sole
using Treefam’s methodology21. For the tongue sole, the reference gene set was used. The
protein-coding genes of other species were obtained from Ensembl (release 57). After
filtering out short genes (coding sequence <150 bp), we chose the transcript with the longest
coding sequence to represent each gene. We constructed gene families using the same
pipeline as a previous study1. In summary, we first performed an all-against-all comparison
of all proteins using BLASTP with a cutoff of E-value <1e-7 and aligning rate ≥1/3 for both
genes, and then clustered the genes into gene families using Hcluster_sg, taking into
account the proteins of the out-group species (human and chicken). Treebest was then
used to infer all the orthology and paralogy gene relations.

Reconstructing phylogeny

Using Treefam, 2,426 single-copy gene families were defined. These single-copy gene
families were used to reconstruct the phylogeny. Four-fold degenerate sites (4d) were
extracted from them and concatenated into one supergene for each species. Modeltest22
was used to select the best substitution model (GTR+gamma+I) and MrBayes23 was used
to reconstruct the phylogenetic tree. The chain length was set to 50,000,000 (1
sample/1000 generations) and the first 1,000 samples were discarded as burn-in.
Branch-specific dN and dS were estimated with codeml in PAML with the branch model24.
The transition/transversion rate ratio was estimated as a free parameter, while other
parameters were set to the default settings.

Gene family expansion and contraction

We used Café25, a maximum-likelihood method, to analyze gene family expansion and
contraction in all lineages with a birth/death model. The model had a global parameter λ
(lambda), representing the gene birth and death (μ = -λ) rate of each branch in the tree.
This method estimated the family sizes in the common ancestor, and then defined
expansion and contraction by comparing the family size between the current species and
the ancestor. To gain insight into the evolution of the gained and lost genes, we also
performed functional enrichment analysis (p<0.05 by Fisher’s exact test) based on GO
annotation of genes in gene families with significant (p<0.01) gains and losses in the
tongue sole lineage.
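The enrichment test can be illustrated with a standard-library Python sketch. It makes two assumptions that the text does not specify: the Fisher's exact test is taken as one-sided (over-representation), and the FDR adjustment is taken to be the Benjamini-Hochberg procedure. Function names and example values are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    N: all reference genes, K: reference genes with the GO term,
    n: genes in expanded (or contracted) families, k: of those, genes with the term.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each GO term would be tested with `fisher_enrichment_p`, and the resulting list of p-values passed to `benjamini_hochberg` to obtain adjusted values of the kind reported in Supplementary Tables 53 and 54.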

Estimation of divergence time

We applied the mcmctree program in the PAML package24 to estimate species divergence
times with 4d sites extracted from all single-copy gene families. The correlated molecular
clock and JC69 model were chosen, and Root Age was set to be below 2.0. The process
was run for 20,000 samples, the first 1,000 of which were discarded as burn-in. Other
parameters were set as defaults. We also used divergence times from human-chicken
(267-325 Mya), human-zebrafish (438-455 Mya) and zebrafish-medaka (258-307 Mya)26 as
the calibration times.

ncRNA prediction

We predicted tRNA genes by tRNAscan-SE with eukaryote mode and the default

threshold. After filtering out RNA genes covered by TE-related regions (≥80%), 674

tRNA genes, with an average length of 77 bp, were predicted. Based on sequence

conservation, we identified 104 rRNA gene fragments, with an average length of 107 bp,

by aligning the rRNA template sequences from the human genome using BLASTN with

cutoffs of E-value <1e-5, identity ≥85% and match length ≥50 bp. 285 miRNA and 434

snRNA were also predicted by the INFERNAL27 software against the Rfam database (release

9.1, 1,372 families)28 with Rfam's family-specific "gathering" cutoff.
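The rRNA hit cutoffs above (E-value <1e-5, identity ≥85%, match length ≥50 bp) can be illustrated with a small filter. This is a sketch only: it assumes the alignments are available in the standard 12-column tabular BLAST layout (query, subject, % identity, alignment length, ..., E-value, bit score), which the text does not specify, and the hit lines and sequence names are made up for the example.

```python
def passes_rrna_cutoffs(line, max_evalue=1e-5, min_identity=85.0, min_length=50):
    """Apply the three cutoffs from the text to one tabular BLAST line.

    Assumed column layout (12-column tabular output):
    0 query, 1 subject, 2 % identity, 3 alignment length, ..., 10 E-value.
    """
    fields = line.rstrip("\n").split("\t")
    identity = float(fields[2])
    length = int(fields[3])
    evalue = float(fields[10])
    return evalue < max_evalue and identity >= min_identity and length >= min_length


def filter_hits(lines):
    """Keep non-empty, non-comment lines that pass all cutoffs."""
    return [l for l in lines
            if l.strip() and not l.startswith("#") and passes_rrna_cutoffs(l)]


# Hypothetical hits: only the first one satisfies all three cutoffs.
example_hits = [
    "rRNA_18S\tscaffold1\t92.50\t120\t9\t0\t1\t120\t500\t619\t1e-30\t150",
    "rRNA_18S\tscaffold2\t84.00\t200\t32\t0\t1\t200\t10\t209\t1e-20\t100",
    "rRNA_5S\tscaffold3\t95.00\t40\t2\t0\t1\t40\t5\t44\t1e-8\t60",
    "rRNA_5S\tscaffold4\t90.00\t80\t8\t0\t1\t80\t5\t84\t1e-3\t30",
]
kept_hits = filter_hits(example_hits)
print(len(kept_hits))
```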

Benthic adaptation by transcriptome analysis

To investigate the gene expression profile associated with the benthic adaptation of tongue

sole, we identified the differentially expressed genes (DEGs) between pre-metamorphosis

and post-metamorphosis fish. First, the TopHat v1.2.0 package15 was used to map

transcriptome reads to the genome, with the following parameters: -a/--min-anchor 8,

-m/--splice-mismatches 0, -i/--min-intron-length 50, -I/--max-intron-length 500000,

--segment-mismatches 2 and --segment-length 25. High quality splice junctions were also

predicted by TopHat. High sequence depth regions joining known gene coding regions

directly or by high quality junction reads were considered as UTRs. Gene expression was

measured by reads per kilobase of gene per million mapped reads (RPKM)29, and adjusted

by a scaling normalization method30. Only genes with an RPKM>1 in at least one

sequenced sample were considered. Differentially expressed genes were detected using

DESeq31 and Cuffdiff32. We ran DESeq and Cuffdiff with the parameter “method = blind”

because we lacked biological replicates. The P-values were then adjusted by the false

discovery rate (FDR)33. Only genes with an adjusted P-value < 0.05 in either method and a

fold change > 4 were considered as true DEGs.
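The expression measure and the DEG criteria above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used here: the scaling normalization step is omitted, the adjusted P-values are taken as given inputs (for example one each from DESeq and Cuffdiff), and the handling of zero expression is our own choice because the text does not describe it.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_deg(rpkm_a, rpkm_b, adjusted_p_values,
           min_rpkm=1.0, alpha=0.05, min_fold=4.0):
    """Apply the three criteria from the text to one gene.

    rpkm_a, rpkm_b     -- expression in the two samples being compared
    adjusted_p_values  -- FDR-adjusted P-values, one per testing method
    """
    # Expressed (RPKM > 1) in at least one sample.
    if max(rpkm_a, rpkm_b) <= min_rpkm:
        return False
    # Significant in at least one method.
    if min(adjusted_p_values) >= alpha:
        return False
    low, high = sorted((rpkm_a, rpkm_b))
    if low == 0:
        # Assumption: treat zero versus expressed as exceeding any fold cutoff.
        return True
    return high / low > min_fold


print(rpkm(500, 2000, 10_000_000))       # 25.0
print(is_deg(25.0, 2.0, [0.2, 0.01]))    # True
```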

Annotation of genes to GO categories was performed according to the orthologous relationship

between the C. semilaevis gene set and the D. rerio gene set, which has a comprehensive GO

annotation. Fisher's exact test and the Chi-squared test34 were used to identify whether a

list of genes (foreground genes) was enriched in a specific GO category compared with

background genes, by comparing the number of background genes annotated to this

specific GO, the number of foreground genes annotated to this specific GO, the total

number of background genes and the total number of foreground genes. The P-value was

adjusted for multiple testing using the Benjamini-Hochberg FDR33. The

KEGG Automatic Annotation Server35 annotated the genes to KEGG pathways, with

zebrafish and human as references. Fisher's exact test and the Chi-squared test then

identified enriched pathways36.
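The enrichment tests described above take four counts per category (foreground and background genes annotated to the category, and the foreground and background totals). The sketch below shows one way to compute such a test and the Benjamini-Hochberg adjustment. It rests on assumptions the text does not state: the test is one-sided (over-representation only) and the foreground genes are treated as a subset of the background. The function names are illustrative.

```python
from math import comb


def enrichment_p_value(fg_in_go, fg_total, bg_in_go, bg_total):
    """One-sided Fisher's exact test via the hypergeometric upper tail.

    Probability of seeing at least fg_in_go annotated genes when fg_total
    genes are drawn from bg_total genes, bg_in_go of which are annotated.
    """
    max_k = min(fg_total, bg_in_go)
    denom = comb(bg_total, fg_total)
    numer = sum(comb(bg_in_go, k) * comb(bg_total - bg_in_go, fg_total - k)
                for k in range(fg_in_go, max_k + 1))
    return numer / denom


def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: both annotated genes fall in a 2-gene foreground of 4 genes.
p = enrichment_p_value(fg_in_go=2, fg_total=2, bg_in_go=2, bg_total=4)
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```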

The improved branch-site model A37 was used to infer the sites under positive selection

(dN/dS>1 in the tongue sole lineage and dN/dS ≤1 in other lineages). We tested for

signatures of positive selection in 1,577 conserved tongue sole genes from all single-copy

families (2,426), using a cutoff of coverage >70% to all genes in the same family. Using a

rooted tree with human and chicken as outgroups, we detected genes with at least one positively

selected site with a Bayesian empirical Bayes posterior probability >0.95, and then retained

the positively selected genes with an FDR q-value <0.05. Finally, 219 genes under positive

selection in the tongue sole lineage were identified. We then compared them with the DEGs

between pre- and post-metamorphosis fish; 15 genes involved in benthic adaptation were

found to be under positive selection.

Furthermore, the 15 positively selected genes (cp, mep1b, hnf4a, ace2, tmem67,

fbn1, cdhr2, pepd, itih2, mgam, cpb1, xdh, cd74, slc15a2 and gda) were verified by qRT-PCR.

Briefly, total RNA of each individual at the pre- and post-metamorphosis stages of normal

fish was isolated and reverse transcribed as described previously13. Three individuals at

each stage were collected for isolation of total RNA. Primers for qRT-PCR analysis were

designed using the Primer Premier 5 program. The final PCR reactions

contained 0.4 mM of each primer, 10 µl SYBR Green (Invitrogen) and, as template, 80 ng

of cDNA reverse transcribed from a standardized amount of total RNA. qRT-PCR was

performed on an ABI PRISM 7500 Real-Time PCR System using Hotstart Taq polymerase

(Qiagen) in a final volume of 20 μl, and the β-actin gene was used as the internal reference. All

reactions were subjected to 95℃ for 35 s, followed by 40 cycles of 95℃ for 5 s and 60℃ for 34

s. Melting curve analysis was applied to all reactions to ensure homogeneity of the

reaction product. The results were analyzed using 7500 System SDS Software.

In addition, we aligned zebrafish visual gene proteins to other teleost

genomes using BLAT, with cutoffs of 90% identity and 70% gene coverage. The corresponding regions

(extended by 500 bp at both ends) in other teleost genomes were extracted for GeneWise

gene prediction. Combining the best prediction results with the synteny analysis, we

finally identified the corresponding protein genes in other teleost genomes and detected

the expression level of visual genes in tongue sole based on the transcriptome analysis.

Also, the significantly differentially expressed genes, including rh1, rh2 and lws1, were

verified by RT-PCR as described above.

Reconstruction of ancestral vertebrate chromosomes

We performed an all-against-all comparison between tongue sole protein sequences and

human protein sequences using BLASTP (E-value < 1e-10). Each tongue sole gene was

assigned a best-matched human gene, if any. Then, for each human gene corresponding to

more than one best-matched gene in tongue sole, we took the best and second-best

matches as the first and second orthologs, and defined this pair as paralogs associated

with a human gene in tongue sole. Finally, we identified 2,733 paralogs in the tongue sole

genome, 2,365 of which were anchored on chromosomes. Moreover, we paired

paralogous chromosomes according to the number of paralogs between two chromosomes.

We obtained Tetraodon, medaka and zebrafish gene sequences from Ensembl (release 57)

and identified reciprocal best-match orthologous genes between tongue sole and each

other fish using BLASTP with an E-value cutoff of 1e-10. In total, 14,231, 14,310 and 13,084 orthologous

genes were identified for Tetraodon, medaka and zebrafish, respectively. We identified

duplicated regions by synteny analysis, using a method from a previous study38.
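The reciprocal best-match step above can be sketched in a few lines. This is a minimal illustration, assuming that hits from each search direction are available as (query, subject, E-value, bit score) tuples and that the "best" hit is the one with the highest bit score; the text does not say how ties or ranking were handled, and the gene identifiers below are invented.

```python
def best_hits(hits, max_evalue=1e-10):
    """Map each query to its top-scoring subject among hits below the cutoff."""
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-10):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b, max_evalue)
    best_ba = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical hits between species A (cs*) and species B (hs*).
a_vs_b = [
    ("cs1", "hs1", 1e-50, 200.0),
    ("cs1", "hs2", 1e-20, 90.0),
    ("cs2", "hs2", 1e-30, 120.0),
    ("cs3", "hs3", 1e-5, 40.0),   # fails the E-value cutoff
]
b_vs_a = [
    ("hs1", "cs1", 1e-50, 200.0),
    ("hs2", "cs1", 1e-20, 90.0),
    ("hs3", "cs3", 1e-5, 40.0),
]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```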

Conserved synteny is defined as orthologous genes on a pair of chromosomes from

distinct species. There have been few interchromosomal rearrangements in teleosts;

therefore, the ancestral chromosome in the human lineage, which did not have any whole

genome duplication (WGD) event after separating from the teleost lineage, was broken

into smaller blocks, which were likely to have conserved synteny on a pair of duplicated

tongue sole chromosomes. These human blocks are called doubly conserved synteny

(DCS)38. Following the above principle, we identified 61 DCS blocks. This result is

consistent with analyses of the Tetraodon and medaka genomes38,39 except for some small

DCSs that did not affect the result and thus were filtered out in our analysis. With the

human genome as the out-group, we deduced the ancestral teleost karyotype by

considering results from paralogy and orthology relations and DCSs, as done in a

previous study39. The method is summarized as follows:

a. Chromosome clustering. To reduce complexity, we divided the tongue sole

chromosomes into a few sub-groups. According to the number of shared DCSs, the 21 tongue

sole chromosomes were clustered into nine groups. Some chromosomes were involved in

multiple groups because of their complex evolution. In each group, chromosomes are

likely to contain regions duplicated from one or more ancestral chromosomes.

b. Rearrangement detection. To infer inter-chromosomal rearrangements, such as

fusions and fissions, among chromosomes in the same group, we checked whether there

were substantial paralogs between pairs of tongue sole chromosomes that originated

from one ancestral chromosome. If two tongue sole chromosomes have many DCSs but

just a few paralogs in common, we inferred that these two chromosomes were derived

from a fission event of an ancestral chromosome.

c. Inferring when rearrangements occurred. After the previous step, we had detected

potential rearrangements from the ancestor to the current tongue sole genome. To infer

when these events occurred, we further considered the orthologous relationships

between tongue sole and other fish genomes, including medaka, Tetraodon, and

stickleback, using zebrafish as the out-group. Using this method, the ancestral teleost

karyotype (gnathostome ancestor) was determined to have 13 chromosomes, represented

as Anc1~Anc13. The vertebrate ancestor, consisting of 10 proto-chromosomes,

was reconstructed and the evolutionary hierarchy from the vertebrate ancestor to

the genomes of human, chicken, and medaka was assigned, as reported by Nakatani et al.40.

Based on the evolutionary relationship between tongue sole and medaka fish, we then

deduced the chromosome evolution from the vertebrate ancestor to tongue sole.

Finally, we found that the chicken and tongue sole Z chromosomes have a common origin,

being derived from proto-chromosome A in the vertebrate ancestor, and A0 in the

gnathostome ancestor.

Genomic organization and evolution of the sex chromosomes

Structural features of sex chromosomes

The gene density on CseZ (42 genes/Mb) is slightly lower than the average value of

autosomes (46 genes/Mb), and higher than on seven of the autosomes. The CseW has a

lower gene density (19 genes/Mb) than any of the autosomes, about half of the autosomal

average. Conversely, the density of interspersed repeats of both CseZ and CseW is much

higher (by ~2.3- and ~6.9-fold, respectively) than the average for the autosomes. On CseZ,

the most abundant type of interspersed repeats is DNA transposons (36.1% of all

interspersed repeats); while on CseW, LINE elements (31.4% of all interspersed repeats)

are the most abundant. In addition, the PAR includes two scaffolds, scaffold589 (398,660

bp) and scaffold757 (243,113 bp), which are anchored distally on Z and have the same

coverage depth in both male and female samples. We identified 22 protein-coding genes

and one pseudogene in the PAR, and inferred their function by BLAST searching against

SwissProt (E-value<1e-5), retaining the best hit for further analysis.

Homologous genes in the non-PAR regions of Z and W

To identify homologous gene pairs in the non-PAR of Z and W, we compared all W and Z

genes (395 and 937) from the non-PAR, including functional genes and pseudogenes,

using BLASTP with a cutoff of identity >50% and an alignment rate >50%. We then

chose the best hit for each W gene. We found that 339 W genes are homologous to 297 Z

genes, because some Z genes have more than one homologous W gene. As we described

before, there are two unplaced scaffolds, which are identified as Z-linked scaffolds by

their M:F depth ratios. We observed that 24 W genes are homologous to genes on these

two scaffolds. We also aligned Z and W genes to genes on autosomes and unplaced

scaffolds (except the two Z-linked scaffolds) using the same method, and found that 258

Z genes and 30 W genes have homologous genes. The other Z (382) and W (2) genes

without any homologous genes were defined as Z-specific and W-specific genes,

respectively. Neither of the two W specific gene sequences, nor their expression patterns,

gave any indication that they might function as ovary-determining genes. In addition,

pseudogenes were defined as having more than one frame error for single-exon genes and

more than two frame errors for multiple-exon genes; frame errors included frame shifts

and internal stop codons.
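The pseudogene rule stated above is simple enough to write down directly. The sketch below is only a restatement of that rule in code; the function name and the way frame errors are counted as separate inputs are our own choices.

```python
def is_pseudogene(exon_count, frameshifts, internal_stops):
    """Apply the frame-error rule from the text.

    Single-exon genes: more than one frame error.
    Multiple-exon genes: more than two frame errors.
    Frame errors are frame shifts plus internal stop codons.
    """
    if exon_count < 1:
        raise ValueError("a gene needs at least one exon")
    frame_errors = frameshifts + internal_stops
    threshold = 1 if exon_count == 1 else 2
    return frame_errors > threshold


print(is_pseudogene(exon_count=1, frameshifts=1, internal_stops=1))  # True
print(is_pseudogene(exon_count=3, frameshifts=1, internal_stops=1))  # False
```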

Estimation of divergence time between Z and W

We obtained a lineage-specific Ks of 0.47 and mean divergence time of 197 (170~220)

MY (Million Years) for the separation of the tongue sole lineage from medaka using

orthologs of the whole genome. We could then calculate a mean lineage specific rate of

~2.39E-9/site/year for the tongue sole lineage. If we assumed that the Z and W

chromosome evolutionary rates were equal to the lineage specific rate, the combined rate

of Z-W divergence would be ~4.77E-9/site/year. Using this rate, we estimated a mean

divergence time of ~31 MY between Z and W. In addition, we calculated the Ks for

autosomal genes and PAR genes separately. We first mapped reads from the ZW fish

to the WGS assembly using BWA41, and performed SNP calling using pileup in the

SAMtools package42. Then, the SnpEff package43 was used to classify each SNP site as

synonymous or non-synonymous. We also used codeml to calculate the number of

synonymous sites in every gene. Finally, the Ks was calculated as the number of

synonymous SNP sites divided by the number of synonymous sites in the coding region.
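The rate arithmetic above can be reproduced in a few lines. The lineage-specific Ks (0.47) and the tongue sole-medaka split (197 MY) are taken from the text; the Z-W Ks used in the last step is not given in this section, so the value 0.148 is back-calculated purely for illustration, and the function names are our own.

```python
KS_LINEAGE = 0.47           # lineage-specific Ks (from the text)
SPLIT_TIME_YEARS = 197e6    # tongue sole-medaka divergence (from the text)

# Substitutions per synonymous site per year along the tongue sole lineage.
lineage_rate = KS_LINEAGE / SPLIT_TIME_YEARS

# Z and W are assumed to evolve at the lineage rate each, so their
# pairwise divergence accumulates at twice that rate.
combined_rate = 2 * lineage_rate


def divergence_time_years(ks_between_copies, rate=combined_rate):
    """Time since two copies diverged, given their pairwise Ks."""
    return ks_between_copies / rate


def ks_from_snps(synonymous_snps, synonymous_sites):
    """Ks as synonymous SNP sites per synonymous site in coding regions."""
    return synonymous_snps / synonymous_sites


print(f"{lineage_rate:.2e}")    # 2.39e-09
print(f"{combined_rate:.2e}")   # 4.77e-09
# Illustrative only: a Z-W Ks of about 0.148 corresponds to roughly 31 MY.
print(round(divergence_time_years(0.148) / 1e6, 1))
```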

Dosage compensation

We used RNA-seq data from the whole body (without gonad) of male and female fish to

test for dosage compensation of the Z chromosome. The male:female (M:F) gene

expression ratio was used to measure the dosage compensation level for every gene in the

female or sex-reversed male relative to the normal male, calculated as the RPKM ratio of

each gene between two samples. Only genes (763) with an RPKM>1 in both the normal

male and female were considered. The Z to autosomes expression ratio (Z:A) for every

gene in the Z chromosome was calculated by dividing the RPKM of the gene by the

median RPKM of all autosomal genes. After filtering out genes with RPKM <1, we

calculated the M:F expression ratio for all Z-linked genes. The result shows that the

tongue sole exhibits incomplete dosage compensation via an upregulation of gene

expression levels in the female. We further defined the compensated and uncompensated

genes on the Z chromosome, using the same cutoff (M:F<1.3 and M:F>1.3, respectively) as described

for zebrafinch and chicken44. Of 763 expressed Z genes (RPKM >1), 370 and 393 genes

were classified as compensated and uncompensated genes, respectively. To test the

conservation of the compensated genes between tongue sole and birds (chicken and

zebrafinch), we downloaded the expression data of zebrafinch from GEO (accession no.

GSE20035)45, and calculated the average M:F expression ratio for each pair of samples,

including four pairs of brain samples from d1 (day 1 after hatching), d25, d45 and adult

male and female individuals, respectively. We also downloaded the raw expression data of

chicken from GEO (accession nos. GSE6843, GSE6844 and GSE6856)44, which are from

heart, brain and liver samples from male and female individuals, respectively.

Normalization of the raw expression data was performed in R using the Affymetrix MAS 5.0

algorithm in Bioconductor packages46, and the expression values were log2-transformed

to produce a more normal distribution. We then calculated the average M:F expression

ratio for each pair of samples. Using the reciprocal best-match method with BLASTP at an

E-value <1e-5, we identified 10,111 and 9,979 reciprocal best orthologs to chicken and

zebrafinch, respectively. Of these orthologs, between tongue sole and zebrafinch, 158 are

located on the Z chromosome, and only 40 were expressed in both tongue sole and

zebrafinch. Correspondingly, 100 of 169 orthologs, which reside on the Z chromosome,

were expressed in both tongue sole and chicken. We found that the compensated Z genes

in tongue sole are not the same as those compensated in birds. Interestingly, the male to

female expression ratio was about 1.2-1.4 (less than 1.5) for all tested species with ZW

systems (1.32 for tongue sole; 1.36 for crow; 1.40 for chicken; 1.23 for zebrafinch; and

1.41 for silkworm)44,47,48.
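The two ratios used in this section, and the 1.3 cutoff for calling a Z gene compensated, can be sketched as below. This is an illustration with made-up expression values. Two details are assumptions because the text does not cover them: a gene with M:F exactly equal to 1.3 is counted as uncompensated, and genes are skipped unless RPKM >1 in both sexes.

```python
from statistics import median


def classify_z_genes(z_expr, cutoff=1.3, min_rpkm=1.0):
    """Label each expressed Z gene by its male:female RPKM ratio.

    z_expr maps gene name -> (male_rpkm, female_rpkm).
    """
    labels = {}
    for gene, (male, female) in z_expr.items():
        if male <= min_rpkm or female <= min_rpkm:
            continue  # not expressed in both sexes
        ratio = male / female
        labels[gene] = "compensated" if ratio < cutoff else "uncompensated"
    return labels


def z_to_a_ratios(z_rpkm, autosomal_rpkm):
    """Divide each Z gene's RPKM by the median RPKM of autosomal genes."""
    autosomal_median = median(autosomal_rpkm)
    return {gene: value / autosomal_median for gene, value in z_rpkm.items()}


# Hypothetical expression values.
z_expr = {"g1": (10.0, 9.0), "g2": (20.0, 10.0), "g3": (0.5, 5.0)}
print(classify_z_genes(z_expr))
print(z_to_a_ratios({"g1": 10.0}, [2.0, 5.0, 8.0]))
```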

Production and genetic analyses of sex-reversals

Treatment of tongue sole with high temperature (28°C) during the critical developmental

stage directly affects the sex ratio of progeny49. Briefly, about 3,000 larvae were collected

at 25 dph and then evenly allocated into three tanks (3 m³) at 23°C. The seawater was

heated gradually and maintained at 28°C until 100 dph. The genetic sex identification

of larvae was performed by PCR analysis using the sex-linked SSR marker CseF-SSR1,

which produced one band of 206 bp for ZZ and two bands of 206 bp and 218 bp for ZW50.

The phenotypic sex was identified by routine histology, according to whether the gonads contained

oocytes (ovary) or spermatocytes (testis)51. Under high-temperature treatment, about

70% of ZW individuals developed as males, while under normal conditions (22°C), the

spontaneous sex reversal rate is about 14%. The sex-reversed fish produced by high

temperature can be crossed with normal females. Unexpectedly, there was an extremely

male-skewed sex ratio (>94%) in offspring of pseudo-male families raised under normal

conditions (22°C). This was caused by an extremely high sex reversal rate of genetic

females to phenotypic males (~94%, compared to the sex reversal rate of 14% under

normal conditions). A similar phenomenon was also detected in the offspring of

pseudo-male families produced by crossing spontaneously sex-reversed males with normal females.

Theoretically, the genotypes of progeny of pseudo-males should be Z*Z (1): Z*W (1):

ZW* (1): W*W (1) (Z* and W* are derived from the pseudo-male, and Z and W are

derived from the normal female), and thus the male to female ratio should be 1:3. We first

analyzed the genotype of fertile sperm of WZ pseudo-males using sex chromosome

specific microsatellite markers. Surprisingly, no W sperm was detected. More importantly,

the sex ratio was determined to be 53.26% genetic females and 46.74% genetic males in 1,812

progeny of pseudo-male families, which is almost a 1:1 ratio. Thus the genotypes of the

progeny of pseudo-males were Z*Z (1): Z*W (1). To verify the paternal inheritance of the

Z chromosome in the progeny, we selected Z specific microsatellite markers and then

determined the genotypes of the parents in the pseudo-male families. For the pseudo-male

families with different microsatellite genotypes for the maternal and paternal

Z-chromosomes, the offspring were analyzed. This revealed that all WZ fish had the

paternal microsatellite marker, while the ZZ fish were heterozygous for both parental

markers.
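The expected genotype ratios discussed above follow from enumerating all sperm-egg combinations. The short sketch below does this for the theoretical case (pseudo-male producing both Z and W sperm) and for the observed case (no W sperm). Parental origin (the asterisk in Z* and W*) is ignored here for simplicity, and the function name is our own.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(sperm, eggs):
    """Count offspring genotypes from all equally likely gamete pairings."""
    counts = Counter()
    for s, e in product(sperm, eggs):
        # Write the genotype with Z first so that WZ and ZW are pooled.
        counts["".join(sorted(s + e, reverse=True))] += 1
    return dict(counts)


# Theoretical expectation: a ZW pseudo-male makes Z and W sperm.
expected = offspring_genotypes(sperm=["Z", "W"], eggs=["Z", "W"])
print(expected)  # ZZ : ZW : WW = 1 : 2 : 1, i.e. genetic male:female = 1:3

# Observed situation: no W sperm, so only Z sperm fertilize the eggs.
observed_model = offspring_genotypes(sperm=["Z"], eggs=["Z", "W"])
print(observed_model)  # ZZ : ZW = 1 : 1
```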

Sex-related genes and expression of Z specific gene

Characterization of the sex-related genes involved in gonadal development is an initial

step towards understanding sex determination of the tongue sole. Thus, we

comprehensively searched for known sex-related genes, based on studies from other

vertebrates. Fifty-eight sex-related genes were retrieved from the tongue sole genome

sequence using BLAST against known sex-related gene sequences. As the sex reversal

experiments clearly indicated that sex determination in tongue sole is mediated by a Z

encoded dominant factor, we determined the location of the 58 sex-related genes in the

tongue sole genome and found that five genes including follistatin, patched1, figalpha,

sf-1 and dmrt1 are located on the sex chromosomes.

For whole genome methylation analysis, two biological replicates were used, and each

replicate was a pool with five gonads from the same group of fish (ZZ testis P, ZW ovary

F1, ZW testis F1, ZW testis F2 and ZW ovary F2)52. Briefly, up to 25 μg of genomic DNA

was isolated from five pooled gonads of the same replicate, and 5 μg of DNA was mixed

with 25 ng of cI857 Sam7 Lambda DNA and used for BS-Seq library construction with a

modified NH4HSO3-based protocol53. Libraries were sequenced on an Illumina HiSeq

2000. Short reads were aligned onto the tongue sole genome with SOAP2 (ref. 54). Cs in BS-Seq

reads that matched Cs on the reference genome were counted as potential mCs. The

conversion rate for each library was ~99.5%. The methylation level of an individual

cytosine was determined by the number of reads containing a C at the site of interest

divided by the total number of reads containing the site. We then detected the methylation

profile of the Z-linked sex-related genes among those samples.
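The per-cytosine methylation level defined above is a simple read-count ratio. The sketch below restates it in code; the region-level average is an added convenience under our own assumption (unweighted mean over covered sites) and is not described in the text.

```python
def methylation_level(c_reads, total_reads):
    """Fraction of reads showing a C at a cytosine site.

    Returns None when the site has no coverage.
    """
    if total_reads == 0:
        return None
    return c_reads / total_reads


def region_methylation(sites):
    """Unweighted mean methylation over covered sites (an assumption).

    sites is a list of (c_reads, total_reads) pairs for one region.
    """
    levels = [methylation_level(c, n) for c, n in sites]
    covered = [level for level in levels if level is not None]
    if not covered:
        return None
    return sum(covered) / len(covered)


print(methylation_level(18, 20))                       # 0.9
print(region_methylation([(18, 20), (2, 20), (0, 0)]))
```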

To identify the expression pattern of sex-related genes during sex determination stages,

we first analyzed the gonadal development of female and male tongue soles. The

gonads of tongue sole at different developmental stages, including 25 dph, 48 dph, 70 dph,

160 dph, 1 year and 2 years, were dissected, rapidly fixed in Bouin's solution and

processed by routine histological procedures55, including hematoxylin-eosin staining. All

individuals were checked for genetic sex identity. The period of the gonadal

differentiation was then identified. At 25 dph, the primordial gonad, stretching from the

ventral end of the kidney to the posterior end of the abdominal cavity, was detected in

females and males, without any morphological differentiation towards testis or ovary.

After this period, the primordial gonads showed differentiation with primordial germ cells

(PGCs) uplifting inwards, and clusters of ovarian cavities appearing as a result of the

high rate of mitotic multiplication at 48 dph, which indicated ovarian differentiation of

the tongue sole. The male gonad at the corresponding stage did not yet form a testis cavity,

but the PGCs also showed faster mitotic multiplication and filled the primordial gonad. At

70 dph, the developing ovary had a relatively large ovarian cavity and the number of

primary oocytes had increased, with the appearance of a few oogonia. Although in the

male gonad the formation of spermatogonial clusters of cysts and seminal lobules, which

are cytological features of testicular differentiation, had become apparent, testis

differentiation in the tongue sole, as in other teleosts, is evidently delayed compared with

ovarian differentiation. In addition, immature spermatids were detected in the

spermatogenic cysts at higher magnification at 70 dph. At 160 dph, ovarian development

had further proceeded, visible by the presence of primary oocytes containing prominent

nucleoli. In testes, matured spermatozoa were observed to flow into the lumen of

seminiferous lobules because of the rupture of spermatogenic cysts. Histological analysis

revealed that the matured gonads were full of oocytes or sperm in different phases,

respectively, in both 1- and 2-year-old fish.

Total RNA of gonads at different developmental stages of normal and sex-reversed fish

was isolated and reverse transcribed as described previously13. Primers for qRT-PCR

analysis were designed using the Primer Premier 5 program for Z-linked genes including

follistatin, patched1, sf-1 and dmrt1. The final PCR reactions contained 0.4 mM of each

primer, 10 µl SYBR Green (Invitrogen) and 80 ng of cDNA reverse transcribed from a

standardized amount of total RNA as the template. qRT-PCR was performed on an ABI

PRISM 7500 Real-Time PCR System using Hotstart Taq polymerase (Qiagen), in a final

volume of 20 μl and the β-actin gene was used as an internal reference. The PCR

conditions comprised: 95℃ for 35 s, followed by 40 cycles at 95℃ for 5 s and 60℃ for

34 s. Melting curve analysis was applied to all reactions to ensure homogeneity of the

reaction product. The results were analyzed using 7500 System SDS Software. In

comparison to dmrt1 (refs. 56,57), the other three Z-linked genes, sf-1, patched1 and follistatin,

have considerably lower expression levels at the sex determining stage in male fish. The

adult tissue expression pattern and expression at the critical sex determining period during

temperature-induced sex reversal did not indicate a male determining function of those

genes. Combining the expression data with the methylation pattern of these three genes in

normal and sex reversed individuals excluded all three as male sex determining genes.

Four of the genes had homologs on both the Z and W chromosomes; the exception was

follistatin, which is found only on the Z chromosome. However, we found that the Z

homolog of figalpha and the W homologs of sf-1 and dmrt1 have to be

considered as pseudogenes, because they appear to have lost all or some of the

conserved domains that exert the main function of the respective gene products. figalpha

plays a key role in ovarian development, while expression of sf-1 and dmrt1 is critical for

male development. In addition, there is a paralog of sf-1 on chr.14, resulting from

fish-specific WGD. The patched1 genes on the Z and W chromosomes are very similar

except for one intron that is missing from the W copy.

Analysis of a possible primary sex-determining role of CseDmrt1

Z specific localization of dmrt1

From the genetic map, scaffold 317, which contains the intact dmrt1 gene, was anchored

on the linkage group of Z. For qPCR analysis, a specific pair of primers designed from the

DM domain of dmrt1 was used on different samples: three males (ZZ), three females

(ZW) and two super-females (WW) that were induced by gynogenesis50, as described

above. The M:F (male versus female) ratio was about two (as expected) and the expression

level in WW super-female embryos was almost zero, indicating that dmrt1 is indeed a

Z-linked gene. To further identify the physical position of the dmrt1 gene, fluorescence in

situ hybridization (FISH) was carried out essentially according to the methods described

previously58, with slight modifications. Briefly, metaphase chromosome spreads

were obtained from the head kidney of females and males, respectively, and then stored at

-20℃. Chromosome preparations were passed through an ethanol series and air-dried

before denaturation (at 70℃ for 1 min and 10 seconds in 70% formamide, 2×SSC). The

dmrt1-BAC clone was cultured in LB medium containing 30 µg/ml chloramphenicol at

37℃ for 16 h. The BAC DNA was then extracted and labeled by nick translation using a

Nick Translation System (DIG-Nick Translation Mix (Roche)). The BAC-FISH probe

contained 1 µg of labeled dmrt1-BAC DNA, 50 µg of sonicated salmon sperm DNA and

10 µg of Cot-1 DNA. After hybridization in a moist chamber at 37℃ for 24 hours,

chromosome slides were subjected to a series of washing steps (2×SSC for 5 min; 50%

formamide for 5 min; 1×SSC for 5 min). Signal detection and amplification were

performed using sheep-anti-digoxigenin and FITC-Donkey-anti-sheep. FISH staining was

performed with propidium iodide (PI). Images were captured with an NIS-Elements

fluorescence microscope (Nikon) and then analyzed using the LUCIA system and Adobe

Photoshop software.

Gonad in situ hybridization for dmrt1 expression

We confirmed the distribution of the dmrt1 transcripts by gonad in situ hybridization from

different developmental stages. Gonad complexes were dissected and fixed in 4%

paraformaldehyde in 0.1 M phosphate buffer (PB) (pH 7.4) at 48℃ overnight. After

fixation, gonads were embedded in paraffin. Cross-sections were cut at 6–8 micrometers

(µm). Probes of digoxigenin (DIG)-labeled sense and antisense strands were generated by

in vitro transcription from linearized tongue sole dmrt1 cDNA plasmid, using an RNA

labeling kit (Roche Applied Science, Germany). Gonad in situ hybridization was

performed as described previously59. Sections were deparaffinized, hydrated, treated with

proteinase K (10 mg/ml) and then hybridized using sense or antisense DIG-labeled RNA

probes at 70℃ overnight. Hybridization signals were then detected using alkaline

phosphatase conjugated anti-DIG antibody (Roche Applied Science, Germany) and a mix

of BCIP and NBT as the chromogens.

Determination of methylation levels in different regions of dmrt1 by Bisulfite (BS)-PCR

We found an obvious differentially methylated region (DMR) between the female and the

male, ranging from 210 bases upstream of the transcription start site (TSS) to almost the

3' end of the first intron of dmrt1, based on the whole genome methylation analysis using

five different samples: testis (ZZ testis P), ovary (ZW ovary F1), testis (ZW testis F1),

testis (ZW testis F2) and ovary (ZW ovary F2). BS-PCR was performed on the first exon

and intron of dmrt1 to verify the authenticity of the DMR. In brief, a 40 μl PCR was

carried out in 1× PCR buffer, 5 mM MgCl2, 1 mM dNTP mix, 1 unit of Taq polymerase,

50 pmol each of the forward primer and reverse primer and 50 ng of bisulfite-treated

genomic DNA. BS-PCR primers were designed using the sense strand of the

bisulfite-converted DNA. PCR cycling conditions were 94℃ for 1 min; 40 cycles of

94℃ for 30 s, 50℃ for 30 s and 72℃ for 30 s; followed by 72℃ for 5 min; and storage at

4℃. The PCR products were electrophoresed on 1% agarose gels, the bands were excised

and gel extracted using a Zymoclean Gel DNA Recovery Kit. The purified PCR products

were cloned using the pMD18-T Simple Vector cloning kit following the manufacturer’s

protocol. For each sample, a minimum of 15 clones was sequenced. All clones were

sequenced on an ABI 3730xl DNA analyzer using SP6 or T7 primers. BS-PCR together

with sequencing of several clones provided allele-specific methylation profiles.

Methylation studies showed that the region upstream from the TSS to the end of intron 1

of dmrt1 is specifically demethylated in male gonads, but not in female gonads.

In addition, we found an E3 ubiquitin ligase gene, neurl3, that is also located only

on the Z chromosome and is absent from W. RT-PCR and methylation profiling were also

carried out according to the method described previously.
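The clone-based readout described above can be illustrated with a minimal sketch. It assumes each sequenced clone has already been aligned to the unconverted reference without gaps, and that a retained C at a CpG site means methylated while a T means unmethylated. The function names and the toy sequences are hypothetical and are not taken from the original analysis.

```python
from typing import List


def cpg_positions(reference: str) -> List[int]:
    """Return 0-based positions of the C in every CpG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_calls(reference: str, clone: str) -> List[bool]:
    """Call every CpG in one clone: True = methylated (C kept), False = unmethylated (T)."""
    if len(reference) != len(clone):
        raise ValueError("clone must be aligned to the reference without gaps")
    clone = clone.upper()
    calls = []
    for pos in cpg_positions(reference):
        base = clone[pos]
        if base == "C":
            calls.append(True)
        elif base == "T":
            calls.append(False)
        else:
            raise ValueError(f"unexpected base {base!r} at CpG position {pos}")
    return calls


def methylation_levels(reference: str, clones: List[str]) -> List[float]:
    """Fraction of clones methylated at each CpG site, in reference order."""
    if not clones:
        raise ValueError("at least one clone is required")
    per_clone = [methylation_calls(reference, c) for c in clones]
    return [sum(site) / len(per_clone) for site in zip(*per_clone)]


# Toy example: two CpG sites, three clones (hypothetical sequences).
reference = "ACGTTCGA"
clones = ["ACGTTTGA", "ATGTTTGA", "ACGTTCGA"]
levels = methylation_levels(reference, clones)
```

With at least 15 clones per sample, as stated above, each per-site level is resolved in steps of roughly 1/15 or finer.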

Sex reversal analysis by transcriptome and miRNA sequencing

The female-to-male sex reversal phenomenon in tongue sole fish offers a good

opportunity to investigate gene expression profiles during this process. We first

compared the gene expression between the ovary of a normal female and the testis of a

pseudomale. We found 836 differentially expressed genes (with at least 4-fold difference)

with 262 up-regulated in female and 574 up-regulated in pseudomale. For every GO

category containing more than 20 genes, we calculated the geometric mean expression

level in both the ovary and testis, in which an RPKM <1 was considered as 1. Gene

ontology analyses suggested that these genes are related to sexual reproduction. The shift

from genetic females to phenotypic males is accompanied by the enhanced expression of

genes involved in spermatogenesis, flagellar assembly and sperm motility, together with

depressed expression of genes with function in oogenesis and ovary development, such as

aqp1, gas8, ropn1l, nme5, tekt1, plcz1, tbpl1, spag6, gal3st1, dnajb13, cldn11, gpr64.
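The 4-fold filter and the per-category geometric mean described above can be sketched as follows. The text states that an RPKM below 1 was treated as 1 for the geometric mean; applying the same floor to the fold-change calculation is an assumption made here for illustration, as are the function names and the example values.

```python
import math
from typing import List

RPKM_FLOOR = 1.0   # RPKM values below 1 are treated as 1
MIN_FOLD = 4.0     # minimum fold difference for a gene to count as differentially expressed


def floored(rpkm: float) -> float:
    """Apply the RPKM floor."""
    return max(rpkm, RPKM_FLOOR)


def fold_change(ovary_rpkm: float, testis_rpkm: float) -> float:
    """Pseudomale testis over female ovary, after flooring (assumed convention)."""
    return floored(testis_rpkm) / floored(ovary_rpkm)


def classify(ovary_rpkm: float, testis_rpkm: float) -> str:
    """Label a gene by the direction of an at-least-4-fold difference."""
    fc = fold_change(ovary_rpkm, testis_rpkm)
    if fc >= MIN_FOLD:
        return "up in pseudomale"
    if fc <= 1.0 / MIN_FOLD:
        return "up in female"
    return "unchanged"


def geometric_mean(rpkms: List[float]) -> float:
    """Geometric mean expression of the genes in one GO category, with flooring."""
    if not rpkms:
        raise ValueError("category has no genes")
    logs = [math.log(floored(v)) for v in rpkms]
    return math.exp(sum(logs) / len(logs))


# Hypothetical values: a gene at RPKM 2 in ovary and 8 in testis is exactly 4-fold up.
label = classify(2.0, 8.0)
category_mean = geometric_mean([0.1, 4.0, 16.0])
```

Flooring low values keeps genes with near-zero expression in one tissue from producing arbitrarily large ratios.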

These genes were verified by semi-quantitative RT-PCR. Total RNA of gonads from three

female and three pseudomale individuals was isolated and reverse transcribed as described

previously. Primers for these genes were designed using Primer Premier 5.0. PCR was

performed in a 25 μl volume consisting of 0.5 μl each of forward and reverse primers (10 μM), 12.5 μl

2× Taq MasterMix (CWBIO), 1 μl cDNA and 10.5 μl ddH2O. The PCR conditions were as

follows: 95 °C for 30 s, followed by 27 cycles of 95 °C for 30 s, 52 °C for 30 s and 72 °C for 30 s.

β-actin was used to calibrate the cDNA template for corresponding samples. The final

amplification products were resolved on a 1.0% agarose gel with a DL2000 DNA marker.

In addition, the KEGG Automatic Annotation Server35 was used to annotate the genes to KEGG

pathways, with zebrafish and human as references. Fisher's exact test and the chi-squared

test36 were then used to identify enriched pathways.
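As an illustration of the enrichment step, the one-sided Fisher's exact test for a single pathway is the upper tail of a hypergeometric distribution. The sketch below uses only the Python standard library. The parameter names and the example counts are hypothetical, and the background set and multiple-testing correction used in the original analysis are not specified here.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test P value for over-representation.

    N: annotated genes in the background
    K: background genes assigned to the pathway
    n: differentially expressed genes (the test list)
    k: test-list genes assigned to the pathway
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    if not (0 <= k <= min(n, K) and n <= N and K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible table configurations contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical example: all 5 genes in the test list fall in a 5-gene pathway
# drawn from a 10-gene background.
p = fisher_enrichment_p(k=5, n=5, K=5, N=10)  # 1/252
```

In practice the P values from all tested pathways would then be adjusted for multiple testing.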

For small RNA sequencing, RNAs of 18-30 nt were purified from the gonads, including the ZZ testis P,

ZW ovary F1, ZW testis F1, ZW testis F2 and ZW ovary F2. Illumina 5’ and 3’ RNA

adapters were sequentially ligated to the RNA fragments and the ligated products were

size-selected on denaturing polyacrylamide gels. The adapter-linked RNA was reverse

transcribed with small RNA RT primers and amplified using 15 cycles of PCR with small

RNA PCR primer 1 and 2 (Illumina). The libraries were sequenced with the Illumina

Genome Analyzer. The adapters were removed using our custom scripts, and low-quality

reads were discarded. Reads were aligned to the genome using bwa-0.6.2 (ref. 41). We discarded

reads aligned to repeats, exons, tRNAs, rRNAs and snRNAs, and the remaining reads were used for

microRNA detection using our custom scripts, which use ViennaRNA-2.0.7 (ref. 60) to compute

RNA structure. Expression levels of miRNAs were calculated as RPM (reads per million)

and DESeq was used to identify differentially expressed microRNAs31. As a result, 44

microRNAs showed differential expression between female and pseudomale gonads.
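The RPM normalization mentioned above can be sketched as follows. Which read total serves as the library size (all mapped reads or only miRNA reads) is not stated, so it is passed in explicitly here. The pseudocount in the fold-change helper, the function names and the counts are assumptions for illustration.

```python
from typing import Dict


def reads_per_million(counts: Dict[str, int], library_size: int) -> Dict[str, float]:
    """Scale raw miRNA read counts to reads per million of the given library size."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    scale = 1_000_000 / library_size
    return {name: count * scale for name, count in counts.items()}


def fold_change(rpm_female: float, rpm_pseudomale: float, pseudocount: float = 1.0) -> float:
    """Pseudomale over female ratio; the pseudocount (an assumption) avoids division by zero."""
    return (rpm_pseudomale + pseudocount) / (rpm_female + pseudocount)


# Hypothetical counts for one library of 2,000,000 reads.
rpm = reads_per_million({"miR-124": 50, "miR-22": 150}, library_size=2_000_000)
ratio = fold_change(rpm_female=1.0, rpm_pseudomale=15.0)
```

DESeq itself takes raw counts and applies its own size-factor normalization, so RPM values of this kind serve for reporting expression levels rather than as input to the test.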

Intriguingly, several microRNAs having important regulatory roles in sex determination

are upregulated in pseudomale testes. For instance, the expression of miR-124, miR-132,

miR-212 and miR-22 was significantly induced, by up to 8-fold, in testes. In mouse,

miR-124 can directly target sox9, a gene that has a critical role in testis development,

regulating expression of amh in Sertoli cells to inhibit the development of the female

reproductive system61. MiR-132 and miR-212 are the products of the same pri-miRNA in

mouse and human, highly conserved among vertebrates (from teleosts to mammals), and

share a common consensus seed sequence62. In mouse, the two microRNAs are regulated

by GnRH, a hormone central to the regulation of reproductive function, and cooperatively

(up/down)-regulate the expression of LH/hCG, which trigger ovulation in females and

stimulate Leydig cell production of testosterone in males63. In tongue sole, miR-132 and

miR-212 were found to be highly expressed in pseudomale testes. The estrogen

receptor-alpha (ER-alpha), a gene essential for sexual development and reproductive

function, is targeted by miR-22, miR-219 and miR-27 in the human MCF-7 cell line64. In

tongue sole, miR-22 and miR-219 were Z-linked microRNAs and miR-27 was a W-linked

microRNA. Further, in the human MCF-7 cell line, miR-27 has been reported to be

co-expressed with beta-catenin, which is an essential gene for female development and

fertility65. Taken together, the expression profiling revealed an overall trend of inhibition of

ovary development and stimulation of testis development during sex reversal from genetic

females to males.

Supplementary URLs

SOAP, http://soap.genomics.org.cn/; Ensembl, http://www.ensembl.org/index.html; KEGG,

http://www.genome.jp/kegg/; Repbase, http://www.girinst.org/repbase/index.html; SOLAR,

http://treesoft.svn.sourceforge.net/viewvc/treesoft/; RepeatMasker, http://repeatmasker.org/; GLEAN,

http://sourceforge.net/projects/glean-gene/; Time tree, http://www.timetree.org.

References

1. Li, R.Q. et al. The sequence and de novo assembly of the giant panda genome. Nature 463,

311-317 (2010).

2. Li, R.Q. et al. De novo assembly of human genomes with massively parallel short read

sequencing. Genome Res. 20, 265-272 (2010).

3. Van Ooijen, J.W. JoinMap® 4, Software for the calculation of genetic linkage maps in experimental

populations. (2006).

4. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5,

R12 (2004).

5. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database

search programs. Nucleic Acids Res. 25, 3389-3402 (1997).

6. Kent, W.J. BLAT - The BLAST-like alignment tool. Genome Res. 12, 656-664 (2002).

7. Sha, Z.X. et al. Generation and analysis of 10 000 ESTs from the half-smooth tongue sole

Cynoglossus semilaevis and identification of microsatellite and SNP markers. J. Fish Biol. 76,

1190-1204 (2010).

8. Edgar, R.C. & Myers, E.W. PILER: identification and classification of genomic repeats.

Bioinformatics 21 Suppl 1, i152-158 (2005).

9. Price, A.L., Jones, N.C. & Pevzner, P.A. De novo identification of repeat families in large

genomes. Bioinformatics 21 Suppl 1, i351-358 (2005).

10. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet.

Genome Res. 110, 462-467 (2005).

11. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR

retrotransposons. Nucleic Acids Res. 35, W265-268 (2007).

12. Gouy, M., Guindon, S. & Gascuel, O. SeaView version 4: A multiplatform graphical user

interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 27, 221-224

(2010).

13. Chen, S.L., Hong, Y.H., Scherer, S.J. & Schartl, M. Lack of ultraviolet-light inducibility of the

medakafish (Oryzias latipes) tumor suppressor gene p53. Gene 264, 197-203 (2001).

14. Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res. 14, 988-995

(2004).

15. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq.

Bioinformatics 25, 1105-1111 (2009).

16. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated

transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511-515

(2010).

17. UniProt Consortium. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 38,

D142-148 (2010).

18. Salamov, A.A. & Solovyev, V.V. Ab initio gene finding in Drosophila genomic DNA. Genome

Res. 10, 516-522 (2000).

19. Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron

submodel. Bioinformatics 19 Suppl 2, ii215-225 (2003).

20. Zdobnov, E.M. & Apweiler, R. InterProScan--an integration platform for the

signature-recognition methods in InterPro. Bioinformatics 17, 847-848 (2001).

21. Galtier, N., Gouy, M. & Gautier, C. SEAVIEW and PHYLO_WIN: two graphic tools for

sequence alignment and molecular phylogeny. Comput. Appl. Biosci. 12, 543-548 (1996).

22. Posada, D. & Crandall, K.A. MODELTEST: testing the model of DNA substitution.

Bioinformatics 14, 817-818 (1998).

23. Huelsenbeck, J.P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees.

Bioinformatics 17, 754-755 (2001).

24. Yang, Z.H. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24,

1586-1591 (2007).

25. De Bie, T., Cristianini, N., Demuth, J.P. & Hahn, M.W. CAFE: a computational tool for the

study of gene family evolution. Bioinformatics 22, 1269-1271 (2006).

26. Hedges, S.B., Dudley, J. & Kumar, S. TimeTree: a public knowledge-base of divergence times

among organisms. Bioinformatics 22, 2971-2972 (2006).

27. Nawrocki, E.P., Kolbe, D.L. & Eddy, S.R. Infernal 1.0: inference of RNA alignments.

Bioinformatics 25, 1335-1337 (2009).

28. Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic

Acids Res. 33, D121-124 (2005).

29. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying

mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621-628 (2008).

30. Robinson, M.D. & Oshlack, A. A scaling normalization method for differential expression

analysis of RNA-seq data. Genome Biol. 11, R25 (2010).

31. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome

Biol. 11, R106 (2010).

32. Roberts, A., Trapnell, C., Donaghey, J., Rinn, J.L. & Pachter, L. Improving RNA-Seq

expression estimates by correcting for fragment bias. Genome Biol. 12, R22 (2011).

33. Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful

Approach to Multiple Testing. J. R. Stat. Soc. 57, 289-300 (1995).

34. Beissbarth, T. & Speed, T.P. GOstat: find statistically overrepresented Gene Ontologies within

a group of genes. Bioinformatics 20, 1464-1465 (2004).

35. Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A.C. & Kanehisa, M. KAAS: an automatic

genome annotation and pathway reconstruction server. Nucleic Acids Res. 35, W182-W185

(2007).

36. Huang, D.W., Sherman, B.T. & Lempicki, R.A. Bioinformatics enrichment tools: paths

toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1-13

(2009).

37. Zhang, J.Z., Nielsen, R. & Yang, Z.H. Evaluation of an improved branch-site likelihood

method for detecting positive selection at the molecular level. Mol. Biol. Evol. 22, 2472-2479

(2005).

38. Jaillon, O. et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the

early vertebrate proto-karyotype. Nature 431, 946-957 (2004).

39. Kasahara, M. et al. The medaka draft genome and insights into vertebrate genome evolution.

Nature 447, 714-719 (2007).

40. Nakatani, Y., Takeda, H., Kohara, Y. & Morishita, S. Reconstruction of the vertebrate

ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res.

17, 1254-1265 (2007).

41. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform.

Bioinformatics 25, 1754-1760 (2009).

42. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25,

2078-2079 (2009).

43. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide

polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.

Fly 6, 1-13 (2012).

44. Itoh, Y. et al. Sex bias and dosage compensation in the zebra finch versus chicken genomes:

General and specialized patterns among birds. Genome Res. 20, 512-518 (2010).

45. Tomaszycki, M.L. et al. Sexual differentiation of the zebra finch song system: potential roles

for sex chromosome genes. BMC Neurosci. 10, 24 (2009).

46. Gentleman, R.C. et al. Bioconductor: open software development for computational biology

and bioinformatics. Genome Biol. 5, R80 (2004).

47. Wolf, J.B. & Bryk, J. General lack of global dosage compensation in ZZ/ZW systems?

Broadening the perspective with RNA-seq. BMC Genomics 12, 91 (2011).

48. Zha, X.F. et al. Dosage analysis of Z chromosome genes using microarray in silkworm,

Bombyx mori. Insect Biochem. Mol. Biol. 39, 315-321 (2009).

49. Deng, S.P., Chen, S.L., Tian, Y.S., Liu, B.W. & Zhuang, Z.M. Gonadal differentiation and

effects of temperature on sex determination in half-smooth tongue sole, Cynoglossus

semilaevis. J. Fish. Sci. China, 5, 714-719 (2007).

50. Chen, S.L. et al. Induction of Mitogynogenetic Diploids and Identification of WW

Super-female Using Sex-Specific SSR Markers in Half-Smooth Tongue Sole (Cynoglossus

semilaevis). Mar. Biotechnol. 14, 120-128 (2012).

51. Chen, S.L. et al. Selection of the families with high growth rate and high female proportion in

half-smooth tongue sole (Cynoglossus semilaevis). J. Fish. China 37, 481-488 (2013).

52. Shao, C.W. et al. Epigenetic Modification and Inheritance in Sexual Reversal of Fish. Genome

Res. doi:10.1101/gr.162172.113 (2014).

53. Hayatsu, H., Tsuji, K. & Negishi, K. Does urea promote the bisulfite-mediated deamination of

cytosine in DNA? Investigation aiming at speeding-up the procedure for DNA methylation

analysis. Nucleic Acids Symp. Ser. (Oxf) 50, 69-70 (2006).

54. Li, R.Q. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25,

1966-1967 (2009).

55. Liang, Z., Chen, S.L., Zhang, J., Song, W.T. & Liu, S.S. Gonadal development process

observation of half-smooth tongue sole in rearing population. J. Southern Agr. 12, 2074-2078

(2012).

56. Deng, S.P. & Chen, S.L. Molecular cloning, characterization and RT-PCR expression analysis

of Dmrt1α from half-smooth tongue-sole, Cynoglossus semilaevis. J. Fish. Sci. China, 4,

577-584 (2008).

57. Sun, Y.Y. et al. Cloning and expression analysis of DMRT1 gene in Cynoglossus semilaevis. J.

Wuhan Univ. (Nat.Sci.Ed.), 221-226 (2008).

58. Szczerbal, I., Klukowska-Roetzler, J., Dolf, G., Schelling, C. & Switonski, M. FISH mapping

of 10 canine BAC clones harbouring genes and microsatellites in the arctic fox and the

Chinese raccoon dog genomes. J. Anim. Breed. Genet. 123, 337-342 (2006).

59. Kobayashi, T., Kajiura-Kobayashi, H. & Nagahama, Y. Differential expression of vasa

homologue gene in the germ cells during oogenesis and spermatogenesis in a teleost fish,

tilapia, Oreochromis niloticus. Mech. Develop. 99, 139-142 (2000).

60. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithm. Mol. Biol. 6, 26 (2011).

61. Cheng, L.C., Pastrana, E., Tavazoie, M. & Doetsch, F. miR-124 regulates adult neurogenesis in

the subventricular zone stem cell niche. Nat. Neurosci. 12, 399-408 (2009).

62. Wanet, A., Tacheny, A., Arnould, T. & Renard, P. miR-212/132 expression and functions:

within and beyond the neuronal compartment. Nucleic Acids Res. 40, 4742-4753 (2012).

63. Fiedler, S.D., Carletti, M.Z., Hong, X.M. & Christenson, L.K. Hormonal regulation of

microRNA expression in periovulatory mouse mural granulosa cells. Biol. Reprod. 79,

1030-1037 (2008).

64. Pandey, D.P. & Picard, D. miR-22 inhibits estrogen signaling by directly targeting the estrogen

receptor alpha mRNA. Mol. Cell. Biol. 29, 3783-3790 (2009).

65. Li, X. et al. MicroRNA-27a indirectly regulates estrogen receptor alpha expression and

hormone responsiveness in MCF-7 breast cancer cells. Endocrinology 151, 2462-2473 (2010).
