Mol. Biol. Evol. 35(9):2205–2229 doi:10.1093/molbev/msy122 Advance Access publication June 14, 2018
© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Identification and Characterization of Novel Conserved Domains in Metazoan Zic Proteins

Takahide Tohmonda,1 Akiko Kamiya,1 Akira Ishiguro,1 Takashi Iwaki,2 Takahiko J. Fujimi,1 Minoru Hatayama,3 and Jun Aruga*,1,3

1 Laboratory for Behavioral and Developmental Disorders, RIKEN Brain Science Institute, Wako-Shi, Saitama, Japan
2 Meguro Parasitological Museum, Meguro-Ku, Tokyo, Japan
3 Department of Medical Pharmacology, Nagasaki University Institute of Biomedical Sciences, Nagasaki, Japan

*Corresponding author: E-mail: [email protected].

Associate editor: John R. True

Abstract

Zic family genes encode C2H2-type zinc finger proteins that act as critical toolkit proteins in the establishment of the metazoan body plan. In this study, we searched for evolutionarily conserved domains (CDs) among 121 Zic protein sequences from 22 animal phyla and 40 classes, and addressed their evolutionary significance. The collected sequences included those from poriferans and orthonectids. We discovered seven new CDs, CD0–CD6 (in order from the N- to C-terminus), using the most conserved Zic protein sequences from Deuterostomia (Hemichordata and Cephalochordata), Lophotrochozoa (Cephalopoda and Brachiopoda), and Ecdysozoa (Chelicerata and Priapulida). Subsequently, we analyzed the evolutionary history of Zic CDs including the known CDs (ZOC, ZFD, ZFNC, and ZFCC). All Zic CDs are predicted to have existed in a bilaterian ancestor. During evolution, they have degenerated in a taxa-selective manner with significant correlations among CDs. The N-terminal CD (CD0) was largely lost, but was observed in Brachiopoda, Priapulida, Hemichordata, Echinodermata, and Cephalochordata. The C-terminal CD (CD6) was highly conserved in taxa possessing conserved-type Zic, but was truncated in vertebrate Zic gene paralogues (Zic1/2/3), generating a vertebrate-specific C-terminus critical for transcriptional regulation. ZOC was preferentially conserved in insects and in an anthozoan paralogue, and it bound to the homeodomain transcription factor Msx in a phylogenetically conserved manner. Accordingly, the extent of divergence of Msx and Zic CDs from their respective bilaterian ancestors is strongly correlated. These results suggest that coordinated divergence among the toolkit CDs and among toolkit proteins is involved in the divergence of metazoan body plans.

Key words: Zic, transcription factors, evolution, development, metazoans.

Introduction

Body plans of metazoans have diverged and converged during evolution, providing a basis for adaptation strategies to changing environments. In the current metazoan phylogeny, nonbilaterian animals include poriferans, ctenophores, cnidarians, and placozoans, and most bilaterian animals are divided into three major taxa, Lophotrochozoa, Ecdysozoa, and Deuterostomia (Ruppert et al. 2004; Telford et al. 2015; Brusca et al. 2016). Structural alterations of the genome largely account for body plan evolution. Recent evolutionary developmental studies have clarified key proteins that play roles in establishing animal body plans. The genes for such proteins are often called toolkit genes, which include those encoding transcription factors and components of intra- or intercellular signaling (Meyerowitz 1999; True and Carroll 2002).

As part of the toolkit genes, Zic family genes encode C2H2-type zinc finger proteins, and they are harbored in bilaterian, cnidarian, and placozoan genomes, but have not been detected in the genomes of poriferans and ctenophores. They are essential for a variety of developmental processes. In vertebrates and ascidians, they play roles both in the ectodermal and mesodermal lineages: for example, neural differentiation, neural plate border specification, and node/notochord/somite development (references in Aruga 2004; Houtmeyers et al. 2013). In ecdysozoans, they participate in embryonic segmentation, visceral mesoderm differentiation of arthropods, and neural cell specification and epidermal differentiation of nematodes (Alper and Kenyon 2002; Bertrand and Hobert 2009). In the lophotrochozoan planaria, they are required for head regeneration, including eye and CNS regeneration (Vasquez-Doorman and Petersen 2014; Vogg et al. 2014). Furthermore, expression profiles suggest that cnidarian Zic genes are involved in the development of the ectoderm, gastrodermis (a bifunctional endomesoderm) (Layden et al. 2010), and nematocytes (Lindgens et al. 2004).

Accumulating evidence regarding the involvement of Zic genes in a wide range of metazoan developmental processes has raised the question of how Zic genes have been involved in the change of animal body plans throughout the course of evolution. Several studies have addressed this question. Experiments using animal models have revealed their roles in the establishment of binocular vision in vertebrates (Herrera et al. 2003) and dorsoventral patterning of somites in teleost fish (Moriyama et al. 2012). Conversely, molecular phylogenetic analyses have provided starting-point hypotheses about the evolutionary history of these genes.

Firstly, concerning their origins, Zic genes are derived from a common ancestor of the Gli-Glis-Zic superfamily proteins that share similar five-C2H2-type zinc finger domains (ZFDs) (reviewed in Aruga and Hatayama 2018). In Gli-Glis-Zic proteins, the two N-terminal C2H2 zinc finger motifs conform to the tandem-CWCH2 motif that characterizes structurally unified zinc fingers (Hatayama and Aruga 2010), and their ZFDs show monophyly in the molecular phylogenetic analysis of eukaryotic zinc finger proteins (Hatayama and Aruga 2010; Layden et al. 2010). The common ancestor of Gli-Glis-Zic existed in a metazoan ancestor (Aruga et al. 2006; Hatayama and Aruga 2010; Layden et al. 2010). Secondly, a prototypal Zic gene existed in a bilaterian ancestor (urbilaterian). This is based on the presence of an absolutely conserved intron (A-intron) in bilaterian Zic genes, and the distribution of all previously known conserved domains (CDs), that is, Zic-Opa-conserved (ZOC), ZFD, and zinc finger N terminally conserved (ZFNC), in each of the three major bilaterian taxa (supplementary fig. S1, Supplementary Material online) (Aruga et al. 2006; Layden et al. 2010). Thirdly, the CDs in urbilaterian prototypal Zic genes were selectively degenerated in several animal taxa including Tunicata (phylum Chordata in Deuterostomia), Platyhelminthes (Lophotrochozoa), Dicyemida (Lophotrochozoa), and Nematoda (Ecdysozoa) (Aruga et al. 2006). Lastly, vertebrate paralogues in Amphibia, Reptilia, Aves, and Mammalia were generated through the following sequence of events: tandem gene duplication, C-terminal truncation of one of the two genes, quadruplication of the whole genome, and loss of three genes (supplementary fig. S1, Supplementary Material online) (Aruga et al. 2006).
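As a rough illustration of what a C2H2-type zinc finger motif looks like at the sequence level, the following Python sketch scans a protein sequence with a regular expression. The pattern used, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, is a generic C2H2 consensus written in the style of a PROSITE signature and is an assumption made here for illustration. It is not the tandem-CWCH2 definition of Hatayama and Aruga (2010), and the function name and toy sequence are hypothetical.

```python
import re

# Assumed generic C2H2 consensus (not taken from the article):
#   C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
# The lookahead lets the scan report overlapping candidates as well.
C2H2_PATTERN = re.compile(r"(?=(C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H))")


def find_c2h2_fingers(protein_seq):
    """Return (start, matched_text) for each candidate finger (0-based start)."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in C2H2_PATTERN.finditer(seq)]


# Hypothetical toy sequence: two artificial fingers joined by a short linker.
finger_1 = "C" + "A" * 2 + "C" + "A" * 3 + "F" + "A" * 8 + "H" + "A" * 3 + "H"
finger_2 = "C" + "G" * 3 + "C" + "G" * 3 + "Y" + "G" * 8 + "H" + "G" * 4 + "H"
toy_sequence = "MK" + finger_1 + "TGEKP" + finger_2 + "LL"

for start, text in find_c2h2_fingers(toy_sequence):
    print(start, text)
```

A regular expression of this kind only flags candidates. Real domain annotation would normally rely on profile-based methods and manual inspection of the alignment.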

However, these hypotheses may not have been fully verified, presumably because of the limited number of animal species for which Zic sequences were available. Previous comparative genomic studies did not compare features beyond the three known CDs and exon–intron boundaries, and minor phyla have been excluded from the analyses. Moreover, the functions of the proteins or their domains have not been addressed in any of the comparative analyses. Zic proteins, specifically chordate and fly Zic proteins, possess transcription regulatory activities (Mizugishi et al. 2001; Yagi et al. 2004; Sen et al. 2010) that are yet to be comparatively investigated. In addition, we recently found Msx protein-binding activities of Zic proteins, which we report in this article.

Msx proteins contain a homeodomain (HD) and control ontogeny in many animal species. Msx genes are toolkit genes that are widely distributed in Metazoa (Takahashi et al. 2008). Phylogenetic analysis showed that Msx proteins lost their CDs selectively in some animal taxa (Takahashi et al. 2008). Zic and Msx control cell fate specification in the lateral CNS in both vertebrates and nematodes (Li et al. 2017; Aruga and Hatayama 2018). These findings raise the question of how the evolutionary processes of Zic and Msx are correlated.

In this study, we discovered seven additional CDs that existed in the urbilaterian Zic protein by optimizing the taxa for sampling. We then examined the distribution of all known CDs in Zic sequences from an extended list of metazoan animals including those from 10 new phyla and 19 classes, and analyzed the extent and correlation of conservation among these CDs. Finally, we addressed the protein functions in a comparative manner. These analyses revealed Msx as a novel binding partner for ZOC in Zic proteins. Msx conservation showed a strong correlation with that of Zic CDs, indicating molecular coevolution between Zic and Msx. We also present new hypotheses concerning the origin of Zic family proteins, generation of cnidarian paralogues, and presence of the A-intron in a placozoan Zic. These findings reveal not only the evolutionary history of Zic proteins but also the coordinated nature of the protein domain evolution, which may underlie the generation of metazoan body plan diversity.
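To make the idea of a correlation between Msx conservation and Zic CD conservation concrete, the sketch below computes a Spearman rank correlation between two per-taxon conservation scores. This passage does not state which statistic the study used, so the choice of Spearman's coefficient is an assumption, and the scores are hypothetical toy values rather than data from the article.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical per-taxon conservation scores (toy values, not from the study).
zic_cd_conservation = [0.9, 0.7, 0.4, 0.2]
msx_conservation = [0.8, 0.75, 0.5, 0.1]
rho = spearman(zic_cd_conservation, msx_conservation)
```

A rank-based coefficient is used here because it only assumes a monotonic relationship between the two scores. Any other measure of association could be substituted without changing the structure of the example.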

Results and Discussion

Collection of Novel Zic Protein Sequences and Origin of Zic Gene

To update the molecular phylogenetic analysis of Zic genes, we conducted a homology search against the current genome databases and collected Zic amino acid (AA) sequences from animal taxa that were not included in previous studies. The search identified Zic genes in new phyla including Porifera, Brachiopoda, Bryozoa, Priapulida, Onychophora, Tardigrada, and Orthonectida. In addition, we newly cloned Zic cDNA fragments from Spinochordodes tellinii (a parasitic worm of mantises, belonging to phylum Nematomorpha), Brachionus plicatilis (a planktonic species belonging to phylum Rotifera), and Echinorhynchus gadi (a parasite of teleost fish, belonging to phylum Acanthocephala). In Chordata, we added Zic AA sequences from two new classes: Callorhinchus milii (shark, belonging to class Chondrichthyes) and Petromyzon marinus (sea lamprey, belonging to class Cephalaspidomorphi). We also updated the annotations of Zic sequences in several species including a placozoan, Trichoplax adhaerens, and a centipede, Strigamia maritima. In total, we collected 121 Zic AA sequences from 22 animal phyla and 40 classes (table 1). The extent of animal taxa coverage was larger than that in the latest molecular phylogenetic study on Zic (12 phyla, 21 classes) (Layden et al. 2010).

Poriferan Zic sequences were identified in class Calcarea(three genes in Sycon ciliatum and two genes in Leucosoleniacomplicata) and in class Homoscleromorpha (one gene inOscarella carmela) (table 1; supplementary figs. S2 and S3,Supplementary Material online). This finding contrastedwith the absence of Zic sequences in any sponges belongingto class Demospongiae (whole genome sequence ofAmphimedon queenslandica; Layden et al. 2010; Srivastavaet al. 2010) and the transcriptome of Ephydatia muelleri,Haliclona amboinensis, Haliclona tubifera, Stylissa carteri,and Xestospongia testudinaria (tblastn search againstCompagen database; Hemmrich and Bosch 2008). We didnot detect any Zic sequences in ctenophores (whole genomesequence of Mnemiopsis leidyi; Ryan et al. 2013 andPleurobrachia pileus; Moroz et al. 2014, the transcriptomesof Beroe abyssicola, Coeloplana astericola, Euplokamis

Tohmonda et al. . doi:10.1093/molbev/msy122 MBE

2206

Dow

nloaded from https://academ

ic.oup.com/m

be/article/35/9/2205/5037825 by guest on 06 August 2022

Tab

le1.

List

of

Seq

uen

ces

and

the

Der

ived

An

imal

Spec

ies

inT

his

Stu

dy.

Supe

r-Ph

ylum

Ph

ylum

Su

bphy

ulum

or

Cla

ss

Spec

ies

Spec

ies

Abr

evia

tion

Zic

Para

logu

e

ID

Intr

ons

in

ZFD

Zic

Gen

e N

ame

or L

ocus

Tag

(se

quen

ce I

D)

Gli

, Glis

, and

Msx

Gen

e N

ame

([lo

cus

tag]

,

sequ

ence

ID

)

18S

rRN

A S

eque

nce

ID

Bilateria

Deuterostomia

Cho

rdat

a

Ver

tebr

ata

Hom

o sa

pien

s H

sa

1, 2

, 3, 4

,

5

AB

,

AB

,

AB

, A,

A

ZIC

1 (N

P_00

3403

), Z

IC2

(NP_

0090

60),

ZIC

3

(NP_

0034

04),

ZIC

4 (N

P_0

0116

1850

), Z

IC5

(NP_

1491

23)

MSX

1 (N

P_00

2439

), M

SX2

(NP_

0024

40)

M

1009

8

Cho

ndri

chth

yes

Cal

lorh

inch

us

mil

ii

Cm

i

1, 2

, 3, 4

,

5

AB

,

AB

,

AB

, A,

A

Zic

1 (X

P_00

7889

156)

, Zic

2 (X

P_00

7904

497)

, Zic

3

(XP_

0078

9096

0), Z

ic4

(XP_

0078

8903

4), Z

ic5

(XP_

0079

0449

6)

Msx

1(X

P_00

7895

507)

, Msx

2 (X

P_00

7904

825)

JW

8671

15

Cep

hala

spid

omor

phi

Pet

rom

yzon

mar

inus

ASF)111 33

OD

A(Bxs

M,)1 2167W

BA(

AxsM

)91167W

BA(

AciZ

.d .na

mPR

RE

E

Tun

icat

a

Cio

na in

test

inal

is

Cin

a,

b

AC

Ci-

mac

ho1

(Zic

-r.a

) (N

P_00

1027

958)

, Ci-

Zic

L

(Zic

-r.b

) (N

P_00

1071

853)

710310B

A)406390100_P

N(bxs

m

Hal

ocyn

thia

rore

tzi

Hro

r a,

b

n.d.

Mac

ho-1

(Z

ic-r

.a)

(BA

B19

958)

, Hrz

icN

(Z

ic-r

.b)

(BA

C23

063)

610310B

A

Mol

gula

tect

ifor

mis

Mte

a,

b

n.d.

Mt-

mac

ho1

(Zic

-r.a

) (B

AE

5434

9), M

t-zi

cL (

Zic

-r.b

)

(BA

E54

350)

)anirt ic.M(

024 21L

Cep

halo

chor

data

Bra

nchi

osto

ma

flor

idae

Bfl

a

AE

RR

AR

B)1 0201

AA

C (x s

M)421 49

EA

B(c i

Z ihpm

A.d.n

Bra

nchi

osto

ma

belc

heri

63321 010RS

YA

)2356 46 910_PX(

4 30 7849 01C

OL

Ae b

B

Bra

nchi

osto

ma

lanc

eola

tum

718 824Y

A)9 302 2

HL

A (ci

Z.d. n

a lB

Ech

inod

erm

ata

Ast

eroi

dea

Aca

ntha

ster

pla

nci

Apl

a

A

LO

C11

0977

927

(XP_

0220

8815

5)

Msx

(D

lx2b

-lik

e, X

P_02

2098

347)

A

B08

4554

(co

nti

nu

ed)

Identification and Characterization of Novel Conserved Domains in Metazoan Zic Proteins . doi:10.1093/molbev/msy122 MBE

2207

Dow

nloaded from https://academ

ic.oup.com/m

be/article/35/9/2205/5037825 by guest on 06 August 2022

Pat

iria

min

iata

Pm

i tr

N777060

QD

)83538R

DA(

ciZ

.d.n

Pat

iria

pec

tini

fera

Pp

e a

155480B

A) 76519F

AB(

AxsM

)23149E

AB(

ciZ-p

AA

Ech

inoi

dea

Stro

ngyl

ocen

trot

us

purp

urat

us

Spu

aRS81S

US)877999_P

N(xs

M)999576110_P

X(ci

Z-pSA

Hem

icho

rdat

a E

nter

opne

usta

Sacc

oglo

ssus

kow

alev

skii

Sko

an.

d.

zic1

(N

P_00

1158

430)

Msx

(A

BD

9728

0), G

li (X

P_00

6823

312)

, Glis

2

(XP_

0027

3878

4), G

lis3

(X

P_00

6825

750)

SASR

GE

Sch

izoc

ardi

um

cali

forn

icum

Sca

a275386F

K)16858

OR

A(ciz

.d.n

Protostomia

Lophotrochozoa

Ann

elid

a

Poly

chae

ta

Cap

itel

la te

leta

C

te

aA

Ct-

zic

(AD

N43

078)

, thi

s st

udy

for

full

OR

F

(BR

0014

76),

AM

QN

0100

5783

.155

81–1

6514

.214

17–2

2060

Msx

(C

APT

ED

RA

FT_1

7662

5, E

LT

8977

5)

JF50

9728

Cli

tella

ta

Tub

ifex

tubi

fex

3FA

)17519FA

B(Bxs

M,)07 519 FA

B(Axs

M)0314 9

EA

B(ci

Z- tT

Au t

T97

152

Hel

obde

lla

robu

sta

Hro

b

A

this

stu

dy (

BR

0014

75),

AM

QM

0100

3360

(353

22–3

3850

.327

48–3

2615

.307

53–2

9184

)

Msx

(H

EL

RO

DR

AFT

_174

933,

XP_

0090

2061

4)

AM

QM

0100

8875

.130

.128

7

Bra

chio

poda

Lin

gula

ta

Lin

gula

ana

tina

L

an

a13618

X) 97 519 F

AB(

xsM

)4 709143 10_PX(

638971601C

OL

A

Rhy

ncho

nella

ta

Ter

ebra

tali

a

tran

sver

sa

Ttr

a a

3181.1.51169 1JF)81646

UQ

A(ci

Z.d.n

Bry

ozoa

G

ymno

laem

ata

Bug

ula

neri

tina

7971.1.947994FA

)5411–66(8_38851_gitnoc_deloop

NB

.d.nen

B

Mol

lusc

a

Biv

alvi

a

Cor

bicu

la

flum

inea

Cfl

a

755021FA

)96519FA

B(xs

M)43149

EA

B(ci

Z-jC

A

Cra

ssos

trea

gig

as

Cgi

1,

2, 3

HA

,

A, A

LO

C10

5339

352

(XP_

0114

4317

4), L

OC

1053

3935

4

(XP_

0114

4317

6), L

OC

1053

3935

3 (X

P_01

1443

175)

Msx

1 (L

OC

1053

3759

0, X

P_01

1440

684)

, Msx

2

(LO

C11

1134

762,

XP_

0223

3981

2)

AB

0649

42

Miz

uhop

ecte

n

yess

oens

is

Mye

1,

2, 3

A, A

,

A

LO

C11

0458

045

(XP_

0213

6526

5), L

OC

1104

5804

3

(XP_

0213

6526

2), L

OC

1104

5804

6 (X

P_02

1365

266)

Msx

(L

OC

1104

5176

6, X

P_02

1355

611)

Spis

ula

soli

diss

ima

Sso

a07211

L)32149

EA

B(ci

Z-os SA

Cep

halo

poda

H

eter

olol

igo

Hbl

a

)5314 9E

AB(

c iZ -b

L.d .n

blee

keri

(co

nti

nu

ed)

Tohmonda et al. . doi:10.1093/molbev/msy122 MBE

2208

Dow

nloaded from https://academ

ic.oup.com/m

be/article/35/9/2205/5037825 by guest on 06 August 2022

Oct

opus

bim

acul

oide

s

Obi

a

A

LO

C10

6877

584

+ L

OC

1068

8318

9 (X

P_01

4782

012

+

XP_

0147

8959

5)

Msx

1 (O

CB

IM_2

2016

198m

g, K

OG

0102

3), M

sx2

(LO

C10

6871

140,

XP_

0147

7297

8), G

li

(XP_

0147

6779

1), G

lis2

(X

P_01

4769

206)

, Glis

3

(XP_

0147

7237

8)

Gas

trop

oda

Apl

ysia

cal

ifor

nica

79506130CS

AA

)2.744311500_PX(

563468101C

OL

.d.nac

A

Bio

mph

alar

ia

glab

rata

Bgl

A

LO

C10

6066

485

(XP_

0130

8098

3)

Msx

(th

is s

tudy

, BR

0014

78),

(AP

KA

1042

272.

1795

.140

1)

U65

223

Lot

tia

giga

ntea

L

gi

aA

L

OT

GID

RA

FT_1

9323

8 (X

P_00

9060

630)

Msx

(th

is s

tudy

, BR

0014

77),

AM

QO

0100

6498

.103

608–

1040

16.1

0430

5–10

4723

Plat

yhel

min

thes

Rha

bdit

opho

ra

Dug

esia

japo

nica

D

ja

A, B

n.

d.

Dj-

Zic

A (

BA

E94

141)

, Dj-

Zic

B (

BA

E94

142)

m

sh1

(CA

L25

148)

, msh

2 (C

AL

2514

9)

AF0

1315

3

Schm

idte

a

med

iter

rane

a

Sme

A, B

A

, A

Zic

A (

AH

W52

381)

, Zic

B (

AA

WT

0102

8541

.1)

msh

1 (C

AL

2514

6), m

sh2

(CA

L25

147)

, Msx

(BA

G11

600)

DM

U31

084

Ces

toda

Ech

inoc

occu

s

gran

ulos

us

510 72U

)09 251SD

C(005 867000 _

GrgE

Arg

E

Ech

inoc

occu

s

mul

tilo

cula

ris

4361 37B

A)9 11 04 S

DC(

005867 00 0_ Jum

EA

um

E

Hym

enol

epis

mic

rost

oma

Hm

ic

A

Hm

N_0

0003

9700

+ H

mN

_000

7400

00 (

CD

S302

21 +

CD

S273

63)

525782JA

Tre

mat

oda

Clo

norc

his

sine

nsis

077 413FJ)07855

AA

G(802901_ F

LC

Ais

C

Schi

stos

oma

man

soni

22600010G

BA

C)073156810_P

X(xs

M)22149

EA

B(ci

Z-amS

Ana

mS

Rot

ifer

a M

onog

onon

ta

Bra

chio

nus

plic

atili

s

Bpl

n.d.

th

is s

tudy

11 994U

)249823C

L(

Supe

r-Ph

ylum

Ph

ylum

Su

bphy

ulum

or

Cla

ss

Spec

ies

Spec

ies

Abr

evia

tion

Zic

Para

logu

e

ID

Intr

ons

in

ZFD

Zic

Gen

e N

ame

or L

ocus

Tag

(se

quen

ce I

D)

Gli

, Glis

, and

Msx

Gen

e N

ame

([lo

cus

tag]

,

sequ

ence

ID

)

18S

rRN

A S

eque

nce

ID

(co

nti

nu

ed)

Identification and Characterization of Novel Conserved Domains in Metazoan Zic Proteins . doi:10.1093/molbev/msy122 MBE

2209

Dow

nloaded from https://academ

ic.oup.com/m

be/article/35/9/2205/5037825 by guest on 06 August 2022

Bde

lloi

dea

Adi

neta

vag

a A

va

1, 2

trN

GA

+1 ,

GA

+1

this

stu

dy A

va-Z

ic1

(BR

0014

81),

Ava

-Zic

2

(BR

0014

82)

4062.897.357630020IW

AC

)584100R

B(xs

M-avA

Aca

ntho

ceph

ala

Aca

ntho

ceph

ala

Ech

inor

hync

hus

gadi

53388U

) 349823C

L (yduts

siht.d.n

agE

Dic

yem

ida

Rho

mbo

zoa

Dic

yem

a

acut

icep

halu

m

720662B

A)29615F

AB(

BciZ,)19615F

AB(

AciZ

A,A

B,A

caD

Ort

hone

ctid

a O

rtho

nect

ida

Into

shia

line

i Il

i 20010

AC

WL

)22486FA

O,4830_65Q3

A(xs

M)58086F

AO(

15140_65Q3

AAF

351.

1070

.227

7

Ecdysozoa

Pria

puli

da

Pria

puli

da

Pri

apul

us

caud

atus

Pca

aA

L

OC

1068

1342

9 (X

P_01

4673

047)

M

sx (

LO

C10

6815

387,

XP_

0146

7536

2)

X87

984

Art

hrop

oda

Che

licer

ata

Lim

ulus

poly

phem

us

Lpo

1a , 2

A

, A

LO

C10

6463

037

(XP_

0222

4615

4), L

OC

1064

6737

9

(XP_

0222

5126

3)

Gli

(X

P_02

2243

540)

, Glis

3A (

XP_

0222

5365

1),

Gli

s3B

(X

P_02

2256

099)

, Glis

3C (

XP_

0222

5018

2),

Gli

s2 (

XP_

0222

4379

4)

L81

949

Pan

dinu

s

impe

rato

r

Pim

a

138012Y

A)57519F

AB(

BxsM,)4 7519F

AB(

AxsM

)83149E

AB(

c iZ-iP

A

Par

aste

atod

a

tepi

dari

orum

Pte

a)237013100 _P

N(apo- t

A.d.n

Cru

stac

ea

Art

emia

fran

cisc

ana

Afr

a

160832JA

)67519FA

B(xs

M)04149

EA

B(ci

Z-fA

BA

Dap

hnia

mag

na

Dm

a 1,

2

n.d.

Dap

ma6

txE

Vm

_004

009t

1 (J

AM

3942

5),

Dap

ma6

txE

Vm

0040

09t2

(JA

I850

10)

95182010 PID

G

Hya

lell

a az

teca

H

az

a)8400 108 10 _P

X(52576 6801

CO

LA

Myr

iapo

da

Stri

gam

ia

mar

itim

a

Smar

1,

2

A, A

this

stu

dy S

mar

-Zic

1 (B

R00

1483

), S

mar

-Zic

2

(BR

0014

84)

2212.1.562371FA

)6 84100R

B(xs

M-ramS

Inse

cta

Tri

boli

um

cast

aneu

m

22 3914PK

) 5944301 00_PN (

hsm

)09964Y

QA(

apO

.d.ns ac

T

Dro

soph

ila

9553 31_R

N)92 37 4

CA

A(HS

M) 8224 25 _P

N(ap

OB

Ae

mD

(co

nti

nu

ed)

Tohmonda et al. . doi:10.1093/molbev/msy122 MBE

2210

Dow

nloaded from https://academ

ic.oup.com/m

be/article/35/9/2205/5037825 by guest on 06 August 2022

mel

anog

aste

r

Bem

isia

taba

ci48181010

WZ

CG

)828598810_PX(

007920901C

OL

BA

atB

Fop

ius

aris

anus

)99838G

AJ(ap

O.d.n

raF

Bom

byx

mor

i81462 010

KD

AA

)2.217429400_PX(

8593 7101C

OL

.d.no

mB

Ony

chop

hora

U

deon

ycho

phor

a

Eup

erip

atoi

des

kana

ngre

nsis

Eka

a tr

N)iit ra kcuel.

E(0 1994

U)831 25F

DC(

de ria p-ddo.d.n

Tar

digr

ada

Eut

ardi

grad

a

Hyp

sibi

us

duja

rdin

i

RZ

BG

)96942V

QO,87 110_8 98

VB(

xsM

)36721V

QO(

2992 1_89 8V

BA

u dH

0101

2413

Ram

azzo

ttiu

s

vari

eorn

atus

0 6Q

H) 709 29

UA

G,1 -5294 0_Y v

R(xs

M) 62330

VA

G(1-86731_

Y vR

Aav

R49

50.1

.791

Nem

atod

a

Sece

rnen

tea

Cae

norh

abdi

tis

eleg

ans

711862Y

A) 8469 05_P

N(51-bav

)774420100_PN(

2-f erB

CA

Ele

C

Nec

ator

amer

ican

us

Nam

EA

CB

N

EC

AM

E_0

0647

(X

P_01

3297

463)

N

EC

AM

E_1

1097

(X

P_01

3299

611

) A

J920

348

Tox

ocar

a ca

nis

2834 9U

)06 168N

HK(

51- bav)1 4157

NH

K(21040 _na c

TB

AEI

na cT

Asc

aris

suu

m

Asu

1,

2

n.d.

Z

IC3

(AD

Y45

473)

, ASU

_027

41 (

ER

G86

104)

va

b-15

(E

R317 1002 0I

UE

A)43258

G

Eno

plea

Tri

chur

is tr

ichi

ura

090996B

A) 701 55

WD

C(2-

X SM

) 19 05 5W

DC(

1 02 63 30 000_E

RT

TB

Airt

T

Tri

chin

ella

pseu

dosp

iral

is

Tps

B

852158Y

A)31901

ZR

K(bxs

m)388 76

YR

K(91731_

A4T

Tri

chin

ella

zim

babw

ensi

s

Tzi

B

462158Y

A)31901

ZR

K(bxs

m)8 46 60

ZR

K(8903_11

T

Nem

atom

orph

a N

emat

omor

pha

Spin

ocho

rdod

es

tell

inii

Ste

trN

3771 24FA

)4 498 23C

L(yduts

sih tA

Xen

acoe

lom

orph

a A

coel

a

Sym

sagi

ttif

era

rosc

offe

nsis

Sro

84113210R F

GA

Hof

sten

ia m

iam

ia

Hm

ia

01613110ASF

G.d .n

Supe

r-Ph

ylum

Ph

ylum

Su

bphy

ulum

or

Cla

ss

Spec

ies

Spec

ies

Abr

evia

tion

Zic

Para

logu

e

ID

Intr

ons

in

ZFD

Zic

Gen

e N

ame

or L

ocus

Tag

(se

quen

ce I

D)

Gli

, Glis

, and

Msx

Gen

e N

ame

([lo

cus

tag]

,

sequ

ence

ID

)

18S

rRN

A S

eque

nce

ID

(co

nti

nu

ed)

Identification and Characterization of Novel Conserved Domains in Metazoan Zic Proteins . doi:10.1093/molbev/msy122 MBE

2211

Dow

nloaded from https://academ

ic.oup.com/m

be/article/35/9/2205/5037825 by guest on 06 August 2022

Con

volu

tril

oba

long

ifis

sura

Clo

tr

C440255

NF)77034

ND

A(ciz-olc

A

Nonbilateria

Plac

ozoa

T

rico

plac

ia

Tri

chop

lax

adha

eren

s

Tad

Zic

(A

BG

P010

0003

4.1,

134

377–

1338

91,

1330

79–1

3268

5), t

his

stud

y fo

r ex

tend

ed C

DS

(BR

0014

79)

7871.1.82801L

Cni

dari

a

Ant

hozo

a

Exa

ipta

sia

pall

ida

Epa

1, 2

, 3, 4

,

5, 6

none

,

none

,

none

,

none

,

none

,

none

LO

C11

0249

839

(XP_

0209

1209

0),

AC

249_

AIP

GE

NE

2485

6 (K

XJ1

9923

),

LO

C11

0249

850

(XP_

0209

1210

1),

AC

249_

AIP

GE

NE

630

(KX

J150

38),

AC

249_

AIP

GE

NE

2893

0 (K

XJ2

0887

),

AC

249_

AIP

GE

NE

2487

3 (K

XJ1

9931

)

Table 1. Continued

| Super-Phylum | Phylum | Subphylum or Class | Species | Species Abbreviation | Zic Paralogue ID | Introns in ZFD | Zic Gene Name or Locus Tag (sequence ID) | Gli, Glis, and Msx Gene Name ([locus tag], sequence ID) | 18S rRNA Sequence ID |
|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | MSX1 (KXJ23890), MSX2 (KXJ16186) | KP761281 |
| | | | Nematostella vectensis | Nve | A, B, C, D, E | none, none, none, none, none | Nv-ZicA (BAE94125), Nv-ZicB (BAE94126), Nv-ZicC (BAE94127), Nv-ZicD (BAE94128), Nv-ZicE (BAE94129) | Msx (BAG11598), Gli (HADP01190765), NvGlis (HADP01062898), NvNkl (HADP01217754) | AF254382 |
| | | | Acropora digitifera | Adi | 1, 2, 3 | none, none, none | LOC107357868 (XP_015779991), LOC107357870 (XP_015779993), LOC107357861 (XP_015779985) | Msx (LOC107349102, XP_015770686) | BACK02018406 |
| | | | Orbicella faveolata | Ofa | 1, 2, 3 | none, none, none | LOC110042868 (XP_020603910), LOC110042871 (XP_020603913), LOC110042869 (XP_020603911) | Msx (LOC110061282, XP_020623784) | |
| | | Hydrozoa | Hydra vulgaris | Hvu | 1, 2, 3, 4 | D, none, A″, none | LOC100210883 (XP_002153782), Hyzic (AAR10817), Zic3 (AFK74876), Zic2 (AFK74875) | MSX1 (CDG71149) | EF059944 |
| | | | Scolionema suvaense | Ssu | | D | Ssu-Zic (BAE94143) | Msx (BAF91572) | AB231884 |
| | Ctenophora | Tentaculata | Mnemiopsis leidyi | Mle | | | | Gli (GFAT01052047), Glis (ADN43076) | |
| | Porifera | Demospongiae | Ephydatia fluviatilis | Efl | | | | Msx (prox3, AAA20151) | AY578146 |
| | | | Ephydatia muelleri | Emu | | | | Gli (m86717), Gli-like (m286379), Glis1/3 (m4332) | |
| | | | Amphimedon queenslandica | Aqu | | | | Msx (LOC100631603, NP_001266196), Gli-2/3a (Aqu1.217717), Gli2/3b (Aqu1.219964), Glis1/3 (Aqu1.213405) | ACUQ01015651 |
| | | | Haliclona amboinensis | Ham | | | | Gli-like (c46688g1i1m.21050) | |
| | | | Xestospongia testudinaria | Xte | | | | Gli-like (xtscaffold6197) | |
| | | Calcarea | Leucosolenia complicata | Lco | 1, 2 | n.d. | lcpid114558, lcpid58487 | Msx (lcpid68253) | AF100945.1.1817 |
| | | | Sycon ciliatum | Sci | 1(trN), 2, 3 | none, none, none | scpid88150, scpid44450, scpid95097 | Msx (scpid65818), Gli (scpid37965), Gli-like (scaffold1077), unclassified (scpid34448) | AJ627187.1.1792 |
| | | Homoscleromorpha | Oscarella carmela | Oca | | none | g.310726 | Msx (g5310) | EF654526.1.1686 |

a Used to define new CDs. A-I, positions of introns in ZFD (supplementary fig. S7, Supplementary Material online); A′, A-intron conforming to GC-AG rule; A″, A-intron remnant; A+1, A-intron +1 site; trC, C-terminus truncated; trN, N-terminus truncated; n.d., not determined.


dunlapae, Pukia falcata, and Vallicula multiformis; Neurobase, https://neurobase.rc.ufl.edu/pleurobrachia/browse) or in choanoflagellates (whole genome sequence of Monosiga brevicollis, Salpingoeca rosetta, and Capsaspora owczarzaki; NCBI and Ensembl databases). In addition, no Zic sequences were identified in fungal species or other living organisms (Aruga et al. 2006; Hatayama and Aruga 2010).

Based on the above facts, we summarized the Zic distribution in the metazoan phylogenetic tree and updated the hypothesis concerning the origin of Zic genes (fig. 1). Because poriferans are monophyletic with Calcarea + Homoscleromorpha forming the sister group (Nosenko et al. 2013; Riesgo et al. 2014), Zic absence in demosponges may indicate the loss of Zic in the demosponge clade. The phylogenetic positions of poriferans and ctenophores are still controversial (Telford et al. 2016). However, recent studies support the model “poriferans as sister to all other animals” (Feuda et al. 2017; Simion et al. 2017). Based on this model, the absence of Zic in the ctenophores is considered to reflect the loss of Zic in the ctenophore clade. In this case, the direct ancestor of Zic is predicted to have been gained in the metazoan ancestor (fig. 1). Alternatively, if we adopt the “ctenophore as sister to all other animals” model, Zic is predicted to have been gained in the common ancestor of metazoans excluding ctenophores, after the divergence of ctenophores.

Distribution of Gli and Glis in the phylogenetic tree is also limited to metazoan clades (Hatayama and Aruga 2010). Gli and Glis have been identified in a demosponge species (Amphimedon queenslandica, Layden et al. 2010). We found Gli and Glis1/3 orthologues in another demosponge species (Ephydatia muelleri) (supplementary figs. S2 and S3, Supplementary Material online). It is also known that A. queenslandica contains another Gli/Glis sister/ancestral sequence (Amqgli2/3b, Layden et al. 2010). In our molecular phylogenetic analysis, Amqgli2/3b was grouped with sequences from other Demospongiae species (E. muelleri, Haliclona amboinensis, and Xestospongia testudinaria) and a Calcarea species (S. ciliatum), suggesting the presence of a Gli/Glis gene unique to poriferans (supplementary figs. S2 and S3, Supplementary Material online). These results, collectively, indicated that the Gli-Glis-Zic superfamily common ancestral gene may have already differentiated into Gli, Glis, and Zic in the last common ancestor of the metazoans.

Distribution of a Strongly Conserved Intron (A-Intron) in Zic Genes

We added notes on the Tad-Zic gene in the placozoan Trichoplax adhaerens (Srivastava et al. 2008) as follows. The current DDBJ/EMBL/GenBank database contains an mRNA sequence (XM_002108437.1) that starts from the middle of ZF1, and a whole genome sequence contig (ABGP01000034.1) that includes two presumptive exons with Zic ZF1-3 and ZF4-5 as the splicing donor and acceptor, respectively (supplementary fig. S4, Supplementary Material online). However, the first methionine is located in the middle of ZF1 of the ZF1-3-containing exon, and the splicing acceptor sequence does not coincide between the mRNA and the contig sequence. Therefore, we searched for an upstream exon and hypothesized a new Tad-Zic protein. The new Tad-Zic gene contained the A-intron that conforms to the GC-AG rule consensus sequence that is frequently involved in alternative splicing in humans (Thanaraj and Clark 2001).

As the A-intron was only found in bilaterian species in a previous study (Aruga et al. 2006), we investigated the distribution of introns in the ZFD where genomic sequences were available (table 1). As a result, all bilaterian Zic genes except those of a bdelloid rotifer, Adineta vaga (Ava), were found to possess A-introns (table 1). In the case of the Ava-Zic sequences, introns were inserted one base 3′ of the A-intron position (i.e., the A-intron is in HTG*[phase-1]EKP, whereas the Ava_1 and Ava_2 introns are in HTG*[phase-2]EKP, where * denotes the codon containing the intron; hereafter, the Ava introns are referred to as A+1 introns). Interestingly, although Ava_1 and Ava_2 each possessed two additional introns in the ZFD, the positions are slightly different between paralogues (table 1). It is known that bdelloid rotifers including Ava are tetraploid species without meiosis (ameiotic), and possess unusual genomic features (Flot et al. 2013). We speculate that the positions of introns are easily changed in this species and that the A+1 introns may be variants of the A-intron.
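The phase notation used above for the A-intron and the Ava introns can be illustrated with a short Python sketch. It relies only on the standard definition of intron phase (the number of bases of the interrupted codon that lie 5′ of the intron); the function names and the 0-based codon indexing of the HTGEKP motif are illustrative assumptions, not part of the original analysis.

```python
def coding_bases_upstream(codon_index: int, phase: int) -> int:
    """Coding bases lying 5' of an intron located in the given codon.

    codon_index is 0-based within the region considered (hypothetical
    convention); phase is the number of bases of that codon that precede
    the intron (0, 1, or 2).
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1, or 2")
    return 3 * codon_index + phase


def intron_phase(bases_upstream: int) -> int:
    """Intron phase = number of coding bases 5' of the intron, modulo 3."""
    if bases_upstream < 0:
        raise ValueError("bases_upstream must be non-negative")
    return bases_upstream % 3


# In the motif H-T-G-E-K-P, the glycine codon is codon index 2.
a_intron = coding_bases_upstream(codon_index=2, phase=1)  # phase-1 intron in G
a_plus_1 = a_intron + 1                                   # shifted one base 3'

print(intron_phase(a_intron), intron_phase(a_plus_1))  # prints: 1 2
```

Under these assumptions, moving an intron one base 3′ within the same codon changes a phase-1 intron into a phase-2 intron, which matches the relationship described between the A-intron and the A+1 introns.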

A-introns are also found in highly simplified bilaterians. We previously showed that a dicyemid, a lophotrochozoan parasitic worm without any specialized gut, nervous system, or muscle, possessed Zic genes with A-introns (Aruga et al. 2007). In a recent study, whole genome sequencing analysis of an orthonectid, Intoshia linei, indicated that orthonectids are highly simplified lophotrochozoans with a muscular and nervous system (Mikhailov et al. 2016). The orthonectid Ili-Zic was found to have two introns in the ZFD, one of which was an A-intron. Orthonectids are rare parasites of marine invertebrates having simple body plans without digestive, circulatory, and excretory systems, and have been described as “mesozoan” animals showing an uncertain affinity with placozoans and dicyemids (Brusca et al. 2016). Another intriguing animal group is the Acoela, which lacks an anus, nephridia, and a circulatory system, and is proposed to be a bilaterian and a sister to the Nephrozoa (= Protostomia + Deuterostomia) based on the 11 newly reported xenoacoelomorph transcriptomes (Cannon et al. 2016). Using transcript shotgun assembly in the acoel Symsagittifera roscoffensis, we identified two Sro-Zic transcripts. One was an A-intron-spliced-out type (mature form) and the other was an A-intron-retained type; a deposited Convolutriloba longifissura transcript (ADN43077) was also an A-intron-retained type (table 1), indicating that xenoacoelomorph Zic genes possess the A-intron. As a consequence, the A-intron is likely to have been present in the “bilaterian ancestor” defined as the common ancestor of Xenoacoelomorpha and Nephrozoa.

FIG. 1. A hypothetical model to explain the Zic distribution in metazoans. In taxa framed with broken lines, the absence of Zic gene was confirmed in multiple species. Numbers in parentheses indicate the paralogue numbers.
[Tree labels: Fungi; Choanoflagellatea; metazoan ancestor; Porifera: Demospongiae, Calcarea (2,3), Homoscleromorpha (1); Ctenophora; Placozoa (1); Cnidaria (3-6); Bilateria: Lophotrochozoa (1-3), Ecdysozoa (1,2), Deuterostomia (1,2), Vertebrata (5,6). Legend: +, Zic gain; -, Zic loss.]

We also examined the introns in cnidarian Zic sequences. There were no introns in the ZFDs from 17 anthozoan sequences. In Hydrozoa, one of the four Hydra vulgaris Zic paralogues (Hvu_3) possessed an A-intron variant (A′-intron), Hvu_1 and Scolionema suvaense (Ssu) carried a D-intron (Aruga et al. 2006), and the others (Hvu_2 and Hvu_4) did not contain any introns. In the A′-intron, a new splicing donor sequence in the A-intron was used for splicing, resulting in the insertion of four AA at the position of the A-intron (supplementary fig. S4, Supplementary Material online). It was considered that the A′-intron was formed in the hydrozoan clade after the generation of four paralogues.

Collectively, the A-intron or its variants existed in bilaterians, a placozoan, and hydrozoans. Although the placozoan Trichoplax was classically included in “mesozoan” species, recent molecular phylogenetic analyses indicate that placozoans are closely related to (Cnidaria + Bilateria) (Dohrmann and Wörheide 2013). Based on recent molecular phylogenetic analyses (Srivastava et al. 2008; Mallatt et al. 2010; Pick et al. 2010; Telford et al. 2015), we prepared four models to hypothesize the gain and loss of the A-intron (fig. 2). Among the four models, model (D) is discordant with the result of the tree topology test (Srivastava et al. 2008). Considering the absolute conservation of the A-intron in bilaterians, we speculate that there has been a strong negative selection pressure against A-intron loss for unknown reasons. If focusing on the parsimony of A-intron gain and loss, we would favor model (C), in which the A-intron is gained twice: in the Bilateria + Placozoa common ancestor and in the hydrozoan clade after paralogue generation. Naturally, the conclusion awaits the resolution of metazoan phylogeny.

Identification of Novel CDs in Bilaterian Zic Proteins

In a previous study, we showed that the extent of conservation varies among the metazoan Zic AA sequences. They are strongly conserved in Vertebrata (Chordata), Cephalochordata (Chordata), Echinodermata, Mollusca, and Arthropoda in comparison to Platyhelminthes, Cnidaria, Nematoda, and Tunicata (Chordata) (Aruga et al. 2006). Zic proteins belonging to the former and latter groups are called conserved-type-Zic and diverged-type-Zic, respectively. To newly define the evolutionarily conserved domains (CDs), we selected 21 conserved-type-Zic sequences (table 1). The set of sequences was optimized to comparably represent the three major bilaterian taxa (Deuterostomia, Lophotrochozoa, and Ecdysozoa), where each taxon contained five to seven sequences from three animal phyla. After multiple sequence alignment, we inferred the ancestral sequence, defined by a maximum likelihood-based prediction in MEGA7 (Kumar et al. 2016). The analysis revealed highly conserved sequence elements throughout the proteins. We defined new evolutionarily CDs according to the following criteria: (1) the sequence element is conserved across the three taxa, (2) the length of the sequence element is >8 AA, and (3) the sequence element is not divided by intervening AA residues in most species. A zinc finger C-terminal flanking region (ZFCC, eight AA) was exceptionally included because of its inclusion in a previous molecular phylogenetic analysis (Aruga et al. 2006). As a result, we obtained new CDs that are well conserved among the selected Zic proteins (fig. 3 and supplementary fig. S6, Supplementary Material online).
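The three criteria above can be sketched as a simple scan over an alignment. The following Python sketch is only an illustration under stated assumptions and is not the procedure used in the study: a column is treated as "conserved in a taxon" when at least a chosen fraction of that taxon's sequences match the inferred ancestral residue (the threshold `min_fraction` is hypothetical), and criterion (3) is approximated by letting any gap in the ancestral sequence break a candidate element. The function names and the toy alignment are likewise hypothetical.

```python
def conserved_columns(ancestor, taxa, min_fraction=0.5):
    """Flag alignment columns conserved in every taxon.

    ancestor: aligned ancestral sequence ('-' marks a gap).
    taxa: dict mapping taxon name -> list of aligned sequences of equal length.
    min_fraction: assumed threshold of sequences per taxon matching the ancestor.
    """
    flags = []
    for i, anc in enumerate(ancestor):
        ok = anc != "-" and all(
            sum(seq[i] == anc for seq in seqs) / len(seqs) >= min_fraction
            for seqs in taxa.values()
        )
        flags.append(ok)
    return flags


def candidate_cds(ancestor, taxa, min_len=9, min_fraction=0.5):
    """Return (start, end) half-open column ranges of uninterrupted conserved runs.

    min_len=9 encodes criterion (2), a length greater than 8 residues.
    """
    flags = conserved_columns(ancestor, taxa, min_fraction)
    runs, start = [], None
    for i, ok in enumerate(flags + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Toy alignment (hypothetical sequences, one per taxon for brevity).
ancestor = "AAAAAAAAAACCC"
taxa = {
    "Deuterostomia": ["AAAAAAAAAAGGG"],
    "Lophotrochozoa": ["AAAAAAAAAATTT"],
    "Ecdysozoa": ["AAAAAAAAAACCC"],
}
print(candidate_cds(ancestor, taxa))  # prints: [(0, 10)]
```

In this toy case, only the first ten columns are conserved in all three taxa, so a single candidate element of length 10 is reported; the last three columns fail criterion (1).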

We next examined the distribution of CD0-6 and ZFCC together with the known domains (ZOC, ZFNC, and ZFD) across all eumetazoan Zic proteins (figs. 4 and 5; supplementary fig. S5, Supplementary Material online). CD0 was located at the N terminus of Zic AA sequences from limited taxa including Priapulida, Chelicerata (Arthropoda), Brachiopoda, Cephalopoda (Mollusca), Hemichordata/Echinodermata, and Cephalochordata. In Vertebrata, a partially conserved sequence could be identified in Zic4/5 paralogues. In the remaining taxa, Zic proteins existed as N-terminally truncated proteins containing CD1 as the N-terminal CD. CD0 is predicted to have been present in the bilaterian ancestor but has been extensively lost during evolution. Although its function is unknown, we noticed the distribution of CD0 in so-called “living fossil” species such as Limulus polyphemus (Smith and Berkson 2005) and Lingula anatina (Emig 2008). In this regard, it would be tempting to call CD0 a “living fossil domain.”

FIG. 2. Hypothetical models to explain the A-intron gain and loss during evolution. (A) and (B) depend on the tree according to Srivastava et al. (2008). (C) Tree supported by Pick et al. (2010). (D) Trichoplax sister to Hydrozoa is less likely (P < 0.01) in tree topology tests in Srivastava et al. (2008).
[Panels A-D show trees of Placozoa, Cnidaria (Anthozoa, Hydrozoa), and Bilateria. Legend: A, A-intron gain; A-, A-intron loss.]

On the other hand, the C-terminal conserved domain, CD6, was strongly conserved in most conserved-type-Zic proteins, and was weakly conserved in nematode, cnidarian, and placozoan Zic proteins. Interestingly, an “EWYV” sequence motif that was strongly conserved at the C-termini of the vertebrate Zic1/2/3 paralogues could be found in the N-terminal region of CD6, and its N-terminal flanking region showed similarity to CD6 in the multiple alignment. The result indicates that the vertebrate Zic1/2/3-type C-termini are truncated variants of CD6. Truncation of CD6 at the same position has not been observed in any Zic proteins besides the vertebrate Zic1/2/3 and can be regarded as a unique innovation in early vertebrates. Because the functional importance of the C-termini of Zic1/2/3 has been described, we hypothesized that this structural change may have a role in the establishment of the vertebrate nervous system. The C-termini of Zic1 and Zic2 have been shown to have transcriptional regulatory activities (Kuo et al. 1998; Mizugishi et al. 2004; Twigg et al. 2015), and Zic1 and Zic2 play major roles in vertebrate CNS development; their loss of function results in dysgenesis of the central nervous system, grossly characterized by hypoplastic changes in the dorsal neural tube along the entire rostrocaudal axis including the forebrain, cerebellum, and spinal cord (reviewed by Aruga 2004). In the case of human ZIC1, loss of the C-terminal CD is associated with calvaria deformity and learning disability (Twigg et al. 2015).

The remaining new CDs (CD1–CD5) are located N-terminally to ZFD (fig. 5). ZOC is located between CD1 and CD2. They were variably degenerated across the taxa. CD3 and ZOC are conserved as highly as CD6, and are found in most conserved-type-Zic genes. However, ZOC is more clearly conserved in insects (Arthropoda), in accordance with its derivation (comparison between mouse and fly homologues, Zic-Opa Conserved). We also observed a clear conservation in a subset (Nve_A and Epa_4) of sea anemone (order Actiniaria, class Anthozoa, phylum Cnidaria) Zic sequences.

ZFNC, ZFD, and ZFCC are adjacently placed, forming the largest compound CD. ZFNC was strongly conserved, as also shown in previous studies (Aruga et al. 2006; Layden et al. 2010), whereas the extent of conservation was much less in ZFCC (fig. 4).

FIG. 3. Multiple sequence alignment of Zic CDs. URB, predicted urbilaterian sequence; dot, identical to above URB residue. Full alignments are shown in supplementary fig. S6, Supplementary Material online.

Absolute conservation was observed in 51 AA residues in the ZFD (supplementary fig. S7, Supplementary Material online) including cysteine, histidine, and tryptophan residues of the C2H2/tCWCH2 motif. The tCWCH2 motif is proposed to be a hallmark for the structurally unified zinc finger unit (Hatayama and Aruga 2010). ZF1 and ZF2 in all Zic proteins conformed to tCWCH2. However, the C2H2 motif was not completely conserved. For instance, ZF5 is missing in Pmi (Echinodermata) (PmZic; Yankura et al. 2010) and in Hvu_3 and Hvu_4 (Cnidaria) (Zic3 and Zic2, respectively; Hemmrich et al. 2012), in addition to the C-terminally truncated partial sequence record (Clo). These results indicate that Zic isoforms lacking ZF5 can occur in the course of metazoan evolution. This study also revealed the presence of intervening sequences, a particularly long one in the Hrob ZF1 C-C region (61 AA), as well as in the Ste ZF1-ZF2 linker region (42 AA) and the novel Tcan ZF2 C-H region (9 AA). This finding is in agreement with the increased frequency of intervening sequences in tCWCH2 ZFs (Hatayama and Aruga 2010).

To further clarify the evolutionary traits of Zic CDs, we examined the correlation of the evolutionary conservation extent among known CDs. The matrix of r values of Spearman’s rank coefficient (fig. 6) indicates moderate to strong (r = 0.35–0.82) positive correlations among the Zic CDs. The strongest correlations (r = 0.79–0.82) were observed in the ZFNC-ZFD, CD3–CD5, CD1–CD5, CD1–CD3, and ZFD-CD6 pairs. These results indicate that degeneration of the Zic CDs has occurred coordinately during evolution, suggesting the presence of intramolecular functional or structural associations in ancient Zic proteins.
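A pairwise Spearman matrix of the kind summarized above can be computed as the Pearson correlation of rank-transformed scores. The Python sketch below is a minimal illustration of that textbook calculation, with tied values given their average rank; the function names and the per-protein conservation scores are hypothetical and do not reproduce the data in fig. 6.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def correlation_matrix(scores):
    """scores: dict mapping CD name -> conservation scores per protein."""
    names = list(scores)
    return {(a, b): spearman(scores[a], scores[b]) for a in names for b in names}


# Hypothetical conservation percentages for four proteins.
scores = {"CD1": [90, 40, 75, 10], "CD3": [85, 30, 80, 5], "ZFD": [99, 97, 98, 60]}
matrix = correlation_matrix(scores)
print(round(matrix[("CD1", "CD3")], 2))
```

Because the statistic depends only on rank order, it is insensitive to the different score ranges of short and long CDs, which is one reason a rank-based coefficient suits this kind of comparison.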

In a previous study, we observed coevolution of Zic ZF1 and ZF2, which are structurally fused to form a single globular unit (Hatayama et al. 2008; Hatayama and Aruga 2010). The ZF1/2 unit frequently possessed insertions and the extent of evolutionary conservation was lower in ZF1/2 than in ZF3/4/5 (Hatayama and Aruga 2010) (supplementary fig. S7, Supplementary Material online). It is known that protein repeats in an open structure and in independently folding units are more volatile, and that volatile CDs are often shaped by concerted evolution, likely by recombination (Schuler and Bornberg-Bauer 2016). However, ZF3, ZF4, and ZF5 are placed under strong evolutionary constraints even though they are predicted to form independent globular structures (Hatayama et al. 2008). This may be explained by the fact that ZF3, ZF4, and ZF5 are essential for DNA and cofactor binding (Hatayama and Aruga 2018).

FIG. 4. Preservation of conserved domains (CDs) in metazoan Zic proteins. Red color gradient in the boxes indicates the percentage of conservation, where maximal matching to the urbilaterian CD sequence was defined as 100% and the minimal BLAST score as 1%, as shown in the inset scale. Blank box indicates the absence of a sequence element with minimally detectable homology (BLAST score > 25) to the urbilaterian CD sequence. In the ZOC core column, gray and blank boxes indicate the presence and absence of the ZOC core sequence consensus defined as (R/S/N)(D/E)(F/L)(V/L/I)(L/F)(R/K)(R/N/S). Arrowheads, parasitic animals; hyphen (–), not applicable; rank, ranking order of overall CD conservation extent defined as a summation of the percentages of each conserved domain among the 116 full-length sequences. Mouse Zic proteins (Mmu_1/2/3/4/5) show the same profiles as Hsa_1/2/3/4/5, respectively.
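The ZOC core consensus given in the legend, (R/S/N)(D/E)(F/L)(V/L/I)(L/F)(R/K)(R/N/S), translates directly into a regular expression. The Python sketch below shows one way such a presence/absence call could be made; the function name is hypothetical, and the test strings are the two typical ZOC forms RDFL[L/F]RR and the KDKMMKS sequence quoted in the text for Cel-Zic.

```python
import re

# One character class per consensus position.
ZOC_CORE = re.compile(r"[RSN][DE][FL][VLI][LF][RK][RNS]")


def has_zoc_core(protein_sequence: str) -> bool:
    """Return True if the ZOC core consensus occurs anywhere in the sequence."""
    return ZOC_CORE.search(protein_sequence.upper()) is not None


for seq in ("RDFLLRR", "RDFLFRR", "KDKMMKS"):
    print(seq, has_zoc_core(seq))
```

The two typical ZOC forms satisfy the consensus, whereas KDKMMKS does not, which is consistent with the statement that Cel-Zic lacks an apparent ZOC in the CD homology search.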


Diversification of Zic Genes in Each Taxon

Molecular phylogenetic tree analysis using the full set of metazoan Zic ZFNC-ZFD-ZFCC region AA sequences revealed novel evolutionary processes of Zic genes in each taxon. First, anthozoan Zic paralogues (Nve_D, Epa_1, Epa_2, Adi_1, Ofa_1), (Nve_C, Epa_3, Adi_2, and Ofa_2), (Nve_A, Epa_4), and (Nve_E, Epa_6) were grouped with moderate to strong statistical support (fig. 7A and supplementary fig. S8, Supplementary Material online). On the other hand, hydrozoan Zic sequences were not grouped with their anthozoan counterparts, but Hvu_1 and Ssu were grouped within the taxon. Together with the taxonomic information that the four anthozoan species belong to either Order Actiniaria (anemones) (Nve and Epa) or Order Scleractinia (corals) (Adi and Ofa) (Shinzato et al. 2011), we hypothesize the following process of cnidarian Zic paralogue generation (fig. 7B): the anthozoan ancestor had three Zic paralogues, two additional Zic genes were generated by duplication in the Actiniaria ancestor, and an additional duplication occurred in the Epa ancestor after its divergence from Nve. ZOC was predicted to have existed in the cnidarian ancestor but was retained in only one type of paralogue in the Actiniaria.

FIG. 5. Zic CDs during evolutionary processes. Gray gradient, the extent of conservation (higher-darker); 0–6, CD0–CD6; 6′, carboxy terminally truncated CD6 in vertebrates; Z, ZOC; N, ZFNC; ZF, ZFD; C, ZFCC.

FIG. 6. Correlation of the conservation extent among Zic CDs. Values indicate Spearman’s correlation coefficient obtained by a rank-order correlation test. Gray-back indicates strong correlation (r > 0.7).

Concerning other paralogues, we found conservation of each paralogue between mammals (Hsa) and cartilaginous fish (Cmi) (supplementary fig. S8, Supplementary Material online), indicating that vertebrate paralogues were generated in a vertebrate ancestor before the Chondrichthyes diverged. Rotifera and Acanthocephala are reported to have a close relationship in molecular phylogeny (Garey et al. 1996; Sielaff et al. 2016). However, we did not see a clear affinity between the Rotifera (Bpl and Ava) and Acanthocephala (Ega) Zic sequences (supplementary fig. S8, Supplementary Material online).

Sequence-Dependent Transcriptional Activation Is an Evolutionarily Conserved Function of Metazoan Zic Proteins

We next addressed the function of Zic proteins from a phylogenetic perspective. Chordate (vertebrate and tunicate) and fly Zic proteins have been shown to have transcriptional regulatory activities (Mizugishi et al. 2001; Yagi et al. 2004; Sawada et al. 2005; Sen et al. 2010). To examine the possibility of transcriptional regulatory activity, we constructed N-terminal FLAG epitope-tagged expression vectors for Nve_A/B/C/D/E, Hbl, Ttu, Pim, Afr, Dme, Cel, Ppe, Cin_a/b, Bfl, and mouse (Mmu_1/2/3/4/5) proteins. Mouse Zic proteins were chosen because they are well characterized and highly similar to human ZIC proteins (Aruga 2004; Houtmeyers et al. 2013). These were transfected into cultured mammalian cells along with a reporter driven by the Tgif1 promoter, which contains high-affinity Zic binding sites (Ishiguro et al. 2017). The results indicated that most Zic proteins showed binding sequence-dependent transcriptional regulatory activities (fig. 8A and B). Among the vertebrate paralogues, Zic2 showed the strongest activation. When compared with mouse Zic levels, several invertebrate Zic proteins (Bfl, Ppe, Pim, Cel, Nve_A, Nve_C, and Nve_E) showed more than half the transcriptional activation of mouse Zic2. Most Zic proteins except Afr showed significant activation, and those except Nve_B, Afr, and Cin_b showed significant sequence dependence. These results indicate that the transcriptional regulatory activities are maintained despite the apparent absence of CDs other than ZFD (e.g., Ssu and Cin_a).

FIG. 7. Evolutionary processes in cnidarian Zic. (A) A part of the molecular phylogenetic tree by Maximum Likelihood analysis using Zic ZFNC-ZFD-ZFCC AA sequences. Branches with <0.5 bootstrap values were condensed. The internal values indicate the bootstrap values in Maximum Likelihood analysis (above branches) and posterior probability in percentages in Bayesian inference analysis (below branches). The complete tree is presented in supplementary fig. S8, Supplementary Material online. (B) Predicted evolutionary processes of cnidarian Zic genes. +1 to +3, acquired gene numbers by gene duplication.

FIG. 8. Transcriptional regulatory activity of Zic proteins from various animals. (A) N-terminally FLAG-tagged Zic protein expression plasmids or the empty vector were cotransfected with Tgif promoter-driven luciferase (Tgif-luc) or Tgif promoter lacking three high-affinity Zic binding sites (ZBS) (TgifΔZBS-luc) in NIH3T3 cells. Bar graphs show averages of triplicate experiments where each value was normalized by that of the internal control (y-axis: relative luciferase activity, Mmu-2 = 100%). Error bar, SD; *P < 0.05; **P < 0.01 in t-test between Tgif-luc and TgifΔZBS-luc. (B) Immunoblot with anti-FLAG tag antibody. The cell extracts in (A) were subjected to immunoblotting. (C) Protein binding abilities of Zic proteins to the transcription regulatory molecular complexes. FLAG-tagged Zic expression plasmids were transfected into 293T cells. The cell lysates were immunoprecipitated (IP) with anti-FLAG epitope antibody and immunoblotted (IB) with antibodies against Zic binding proteins (Ishiguro et al. 2007).

The above results led us to examine the binding abilities of Zic proteins with proteins that are proposed to be associated with transcriptional regulation by Zic proteins (Ishiguro et al. 2007). For this purpose, we expressed Nve_A, Ttu_B, Pim, Dme, Cel, Ppe, and mouse Zic2 (Mmu_2) in HEK293T cells and immunoprecipitated them using an anti-FLAG antibody; the coprecipitated proteins were then analyzed by immunoblotting (fig. 8C). All the Zic-binding proteins (DNA-PKcs, Ku70, Ku80, poly ADP-ribose polymerase, RNA helicase A) were found to be coimmunoprecipitated with Zic proteins, at least in part. It is known that these proteins associate with mouse Zic2 through either the CD2-ZFNC region (DNA-PKcs) or ZF3 in ZFD (Ku70, Ku80, poly ADP-ribose polymerase, and RNA helicase A) (Ishiguro et al. 2007). It was considered that interaction with these proteins underlies the highly conserved transcriptional regulatory activities among Zic proteins.

ZOC Is Essential for Zic–Msx Interaction

Recent studies have shown that Zic and a homeodomain (HD) transcription factor, Msx, are involved in an evolutionarily conserved gene regulatory cascade to specify the lateral border region of the bilaterian central nervous system (CNS) (Simoes-Costa and Bronner 2015; Li et al. 2017). We also found that mouse Zic1 and Msx are colocalized in the cell nuclei of the dorsal spinal cord and the progress zone beneath the apical ectodermal ridge of developing limbs (Aruga et al. 2002) (fig. 9A). These facts led us to examine whether Zic protein could physically interact with Msx protein. We transfected HA-tagged mouse Zic2 and FLAG-tagged Msx2 expression vectors into mammalian cells and performed a coimmunoprecipitation assay. The result showed that Zic2 physically interacted with Msx2 (fig. 9B). We next mapped the Msx-binding site in Zic2 protein using a series of N-terminally or C-terminally deleted mutants and found that the Zic2 AA 100–140 region including ZOC was necessary for Msx2 binding (fig. 9C). We also mapped the Zic2-binding domain in Msx2 using Msx2 deletion mutants, and found that the HD is essential for interaction with Zic2 (fig. 9D). The purified HD-GST fusion protein bound HA-Zic2 protein in a GST-pulldown assay (fig. 9E), suggesting a direct interaction between Zic2 and Msx2.

In a previous study, ZOC was shown to be bound by a transcriptional repressor, I-mfa (Inhibitor of MyoD family, also called Mdfi) protein (Mizugishi et al. 2004). Therefore, it is likely that ZOC serves as a regulatory hub to control Zic protein function.

Msx-Binding Abilities Are Widely Conserved in Metazoan Zic Proteins

Msx proteins are also widely conserved in metazoa (Takahashi et al. 2008) and play critical roles in neural and skeletal development (Alappat et al. 2003; Ramos and Robert 2005). We next asked how widely the Msx-binding abilities are retained in metazoan Zic proteins. We cloned Msx cDNA from mouse and sea anemone and constructed FLAG-tagged Mmu-Msx1 and Nve-Msx expression vectors. These vectors were cotransfected with MYC-tagged Zic proteins from Mmu, Nve, Ttu, Pim, Dme, Cel, and Ppe, and the Msx proteins were precipitated using anti-FLAG antibodies. The result indicated that all the tested Zic proteins were coprecipitated with both Mmu- and Nve-Msx proteins (fig. 10A and B). However, Dme-Zic (Opa) and Cel-Zic (ref-2) coprecipitated less efficiently than the other Zic proteins in both cases.

Although Cel-Zic does not contain an apparent ZOC in the CD homology search, it contained a (KDKMMKS) sequence instead of typical ZOC (RDFL[L/F]RR) (Layden et al. 2010). The similarity in the position of the charged residues and hydrophobic residues might be sufficient for binding. Therefore, the results suggest that a detailed structure-function analysis based on these experiments is required to consider the evolutionarily conserved protein–protein interaction. In addition, we cannot exclude the involvement of other CDs in the binding of Zic to Msx.

Msx Conservation Extent Was Strongly Correlated with That of Zic CDs

Having obtained results suggesting evolutionary conservation of Zic–Msx interaction, we examined how the evolutionary processes of the two genes are correlated. In a previous study, the evolutionary process of metazoan Msx genes was described from the viewpoint of conservation and diversification from a bilaterian ancestor (Takahashi et al. 2008). We then compared the conservation extent of Msx and Zic CDs. As an Msx CD, HD with 8 AA of N-flanking and 17 AA of C-flanking sequence was chosen because this domain is sufficiently large for conservation extent analysis and because HD can mediate physical interaction with Zic proteins. We calculated the HD homology score between the predicted bilaterian ancestral Msx AA sequence and Msx sequences from 38 species in which Zic sequences are known (table 1). Coefficient values were determined for each Zic CD and Msx HD (fig. 11A). As a reference, we used the evolutionary distances of 18S ribosomal RNA (18S) sequences of the corresponding species. The result indicated that both Msx HD and 18S showed moderate to strong correlations. However, the coefficient values for Msx-Zic CDs were higher than those of 18S-Zic CDs in the total CDs (P < 0.01 in a paired t-test), and the values for CD1 and CD3 were particularly well correlated with those of Msx HD, compared with those of 18S. The value for ZFD was high, but was comparable to that of 18S. The scatter plot of the Zic ZFNC-ZFD-ZFCC and Msx HD or 18S score is indicated in figure 11B and C to show the detailed correlation profile. The graphs indicate that both Zic and Msx are strongly conserved in animals belonging to the Echinodermata/Hemichordata, Mollusca, Cephalochordata, and Vertebrata groups, but are poorly conserved in Nematoda, Platyhelminthes, and Urochordata. There was disparity concerning the cnidarian sequences, where Msx sequences showed high conservation in anthozoa and low conservation

Identification and Characterization of Novel Conserved Domains in Metazoan Zic Proteins . doi:10.1093/molbev/msy122 MBE

2221

Dow

nloaded from https://academ

ic.oup.com/m

be/article/35/9/2205/5037825 by guest on 06 August 2022

FIG. 9. ZOC domain is necessary for binding to Msx protein. (A) (a, b) Zic2 (a) and Msx2 (b) mRNA distribution in embryonic day 9.5 mouseembryos. Blue signals indicate the presence of mRNA. AER, apical ectodermal ridge of limb bud epithelium; LB, limb buds; PZ, progress zone of limbbud mesenchymal cells; SC, prospective spinal cord. (c–h) Double immunostaining indicating Zic2 (red) and Msx2 (green) protein distribution intissue sections. Overlapping distribution is shown in yellow. (c and d) Limb bud. (d) An enlarged view of the tip of the limb bud. (e, f, g) Cross sectionthrough the dorsal trunk including the spinal cord. These three images are derived from a single section. In (g), the signals in (e) and (f) have beenmerged. (h) An enlarged view of the dorsal part of the spinal cord and the surrounding mesenchymal cells. (B) Physical interaction between Zic2and Msx2. HA-tagged Zic2 and (FLAG-tagged Msx2 or empty) expression plasmids were cotransfected into HEK293T cells. Cell lysates of thetransfected cells were immunoprecipitated (IP) with anti-FLAG antibody and immunoblotted (IB) with anti-HA antibody. (C) ZOC is necessary forMsx2 binding. HA-tagged deletion mutants of Zic2 (top) and FLAG-Msx2 expression plasmids were cotransfected and immunoprecipitated withanti-FLAG antibody and immunoblotted with anti HA-antibody. Asterisk indicates the nonspecific bands generated by immunoglobulins afterimmunoprecipitation. (D) Msx2 homeodomain is necessary for Zic2 binding. FLAG-tagged deletion mutants of Msx2 (top) and HA-Zic2 expres-sion plasmids were cotransfected and immunoprecipitated with anti-FLAG antibody and immunoblotted with anti HA-antibody. (E) GST-pull-down (PD) assay. FLAG-Zic2 transfected cell lysate was incubated with GST or GST-Msx2-HD fusion protein. Coprecipitated FLAG-Zic2 wasdetected with the anti-FLAG antibody. Cell lysates (10%) were electrophoresed in Input (10%) lane.

Tohmonda et al. . doi:10.1093/molbev/msy122 MBE

2222

Dow

nloaded from https://academ

ic.oup.com/m

be/article/35/9/2205/5037825 by guest on 06 August 2022

in hydrozoa, consistent with a previous report (Takahashiet al. 2008).
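The paired comparison of coefficient values (Msx HD versus 18S, matched by Zic CD) can be sketched as follows. This is a minimal illustration of a paired t statistic using only the standard library; the function name is an assumption, and the input values in the usage line are placeholders rather than coefficients from this study.

```python
import math


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y.

    Each index pairs two measurements on the same unit, for example the
    Msx-HD and 18S correlation coefficients obtained for one Zic CD.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the paired differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t_stat = mean_d / math.sqrt(var_d / n)
    return t_stat, n - 1


# Placeholder numbers only, chosen to make the arithmetic easy to follow.
t_stat, df = paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(round(t_stat, 4), df)
```

The P value is then read from the t distribution with the returned degrees of freedom; a statistics package would normally be used for that last step.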

Significance of Zic-Msx Interaction during Evolution

The above results suggest that the evolutionary processes of Zic and Msx family genes are closely related (summarized in fig. 12). Both Zic and Msx proteins are distributed widely in bilaterians (Aruga et al. 2006; Takahashi et al. 2008; Layden et al. 2010). The role of Msx genes in neuroectodermal patterning has been suggested by their expression in the lateral longitudinal columns in the fruit fly and in vertebrates (Isshiki et al. 1997; Arendt and Nubler-Jung 1999), and this feature is now extended to the nematode nervous system (Li et al. 2017). Expression in the lateral longitudinal columns has also been described for the vertebrate Zic family (Nagai et al. 1997; Fujimi et al. 2006) and in nematodes (Li et al. 2017). Their common roles in cell fate specification have been indicated for both Protostomia and Deuterostomia (Simoes-Costa and Bronner 2015; Li et al. 2017). This study showed Zic/Msx coexpression in the limb buds (fig. 9A), where both genes have a role in limb patterning (Satokata and Maas 1994; Nagai et al. 2000; Satokata et al. 2000; Quinn et al. 2012). In addition, a genetic interaction could be predicted in the cranial bone, as both Zic and Msx are associated with craniosynostosis (ZIC1 and MSX2, developmental abnormality of calvaria bones) (Wilkie et al. 2000; Twigg et al. 2015) and jaw development (Inoue et al. 2004; Cerny et al. 2010). Although the functional link is limited to the nematodes at present, the conserved binding between Zic and Msx may predict additional links in unexplored species.

The cooperation between Zic and Msx raises the possibility that toolkit genes that are often coopted are placed under similar evolutionary constraints to preserve their functional domains. Such a result is logically expected, but it may not have been sufficiently demonstrated. In a previous study, we showed that the paired domain and homeodomain of Pax6 showed taxon-dependent differential degeneration similar to that of Zic in comparison with housekeeping genes (Aruga et al. 2007). Further molecular phylogenetic analyses that consider the developmental context or protein function would reveal novel aspects of evolution.

However, we should note that Zic and Msx are not distributed identically in metazoans. Zic genes are absent in demosponges and are present in animals belonging to phylum Placozoa and phylum Platyhelminthes class Cestoda. In contrast, Msx genes are retained in demosponges and are not detected in Placozoa and Cestoda. Parahox genes were shown to be lost in Cestoda species, presumably due to adaptations to parasitism (Tsai et al. 2013). This result contrasts with the preservation of Msx genes in parasitic nematodes (Nam, Tcan, Asu, Tzi, Ttri, and Tps) and in a free-living, highly simplified orthonectid (Ili). These results likely reflect differential evolutionary constraints on Zic and Msx genes.

Significance of CDs in Zic Protein Evolution

This study provided several novel ideas and facts about Zic protein evolution (fig. 12). The identification of poriferan Zic genes suggests the presence of an ancestral Zic gene in the metazoan ancestor. Novel CDs were present in the Zic proteins of the bilaterian ancestor and were lost selectively during bilaterian evolution. Transcriptional activation and Msx binding are phylogenetically conserved functions of Zic proteins. Some Zic CDs and the Msx HD share similar degeneration profiles during evolution. Beyond the evolutionary history of Zic proteins, some results also bear on protein domain evolution in general.

First, the discovery of novel CDs became feasible through the selection of slowly evolving genes that preserve ancestral traits. This idea was based on the awareness that the extent of CD degeneration in some toolkit proteins such as Zic, Pax, and Msx varies strongly among animal taxa (phyla or classes) (Aruga et al. 2006, 2007; Takahashi et al. 2008). Because sufficient numbers of conserved sequences were identified in the Lophotrochozoa, Ecdysozoa, and Deuterostomia, the reported CDs may properly represent those in the bilaterian ancestor. By adding these seven CDs (CD0-6) to the two known ones (ZOC and the ZFNC+ZFD+ZFCC cluster), we can predict that at least nine CDs existed in the bilaterian ancestor.

FIG. 10. Binding abilities to Msx are phylogenetically conserved among metazoan Zic proteins. FLAG-tagged mouse Msx1 (A) or sea anemone Msx (B) expression plasmids were cotransfected with MYC-tagged Zic protein expression plasmids in 293T cells. The cell lysates were immunoprecipitated (IP) with anti-FLAG epitope antibody and immunoblotted (IB) with anti-MYC epitope antibody. Panel labels: (A) FLAG-Mmu-Msx1 and (B) FLAG-Nve-Msx, each shown as Input and IP: FLAG blots (IB: FLAG and MYC); MYC-tagged lanes: Mmu-2, empty, Nve-A, Ttu, Pim, Dme, Cel, Ppe, Mmu-2.

Protein domains can be described as compact, spatially distinct units, which can be defined from both functional and structural viewpoints (Miklos and Campbell 1992). The “CDs” in this study are based on sequence similarity and are considered from evolutionary perspectives. Among the nine Zic CDs, the only CD that can be found in other proteins upon searching against protein CD databases (NCBI-CDD, Pfam, Prosite, SMART) is the C2H2 ZF in ZFD, suggesting that CDs other than ZFD are distributed only in the Zic family. Furthermore, there are no traces of CD duplication or shuffling in the collected metazoan Zic sequences. Together with the strong correlation among the CDs (fig. 6), this suggests that Zic CDs are coordinated to exert Zic protein function. In other words, the protein structure of the urbilaterian Zic was self-contained, and a radical change in protein structure may not have been allowed in the course of bilaterian evolution.

Even though the domains defined by structure (sequence) and by function are not identical, there might be a correlation between protein function and the extent of CD degeneration. Based on this idea, we performed a functional assay of transcriptional activation using both conserved- and diverged-type Zic expression vectors. However, the analysis demonstrated that the sequence-dependent transcription regulatory functions of the Zic proteins are not clearly correlated with CD maintenance and that interactions between Zic and transcription regulatory proteins are mostly conserved (fig. 8). These results suggest that the basic regulatory activity of Zic proteins is not simply predicted by the presence or absence of CDs.

Finally, while whole genome sequencing of key species in evolution readily improves our understanding of the phylogenetic relationships among animals and provides us with an indispensable framework, the key-gene, many-species approach taken in this study would elucidate different aspects of evolution, such as function-oriented protein domain evolution. A combination of these two approaches would be beneficial for a better understanding of the evolutionary process.

FIG. 11. Correlation of evolutionary conservation. (A) Correlation between Zic CDs and Msx HD or 18S rRNA. About 39 animal species were used for the analysis. Strong correlation (r > 0.7) is shaded. (B and C) Scatter plots indicating the relationship between the conservation extents of (B) Msx HD and Zic NCZFCC (ZFNC-ZFD-ZFCC) and (C) 18S rRNA and Zic NCZFCC (left) and 18S rRNA and NCZFCC (right). Values are BLAST scores (Zic and Msx) or the additive inverse of the evolutionary distance score (18S rRNA). Labels in (B) are the abbreviations in table 1. Axis labels recovered from the panels: Msx-HD versus Zic-NCZFCC; -18S versus Zic-NCZFCC; -18S versus Msx-HD. Point groups: Cnidaria, Deuterostomia, Ecdysozoa, Lophotrochozoa, Placozoa, Porifera.


Materials and Methods

Animals

Brachionus plicatilis were purchased from Nikkai Center Co. (Tokyo, Japan). Echinorhynchus gadi were collected from an Alaska pollock captured off Hokkaido and purchased from a local fish dealer. Spinochordodes tellinii were collected from a wild mantis (Acromantis sp.) captured at Fukaya City in Saitama Prefecture, Japan. Species validation was done using 18S ribosomal RNA sequences.

PCR Cloning of Zic cDNA

RNA was isolated using TRIzol reagent (Invitrogen, CA) as per the manufacturer’s recommendation. cDNA was synthesized using a 3′-Full RACE Core Set (Takara Bio, Shiga, Japan). Zic homologs were initially identified by nested PCR on cDNA or genomic templates using degenerate primers corresponding to the ZF2 to ZF3 region (Aruga et al. 2006). cDNAs corresponding to the ZF at their 3′ and 5′ ends were cloned using a 3′-Full RACE Core Set and a 5′-Full RACE Core Set (Takara Bio), respectively. The entire open reading frame region of the cDNAs was again cloned using primers located outside the target regions. Amino acid sequences were acquired after nucleotide sequencing of multiple PCR fragments.

Database Search

A keyword- or homology-based search was conducted against the NCBI database (https://www.ncbi.nlm.nih.gov/ 2018.3.21), ENSEMBL database (http://www.ensembl.org/index.html 2018.3.21), Compagen (http://www.compagen.org/index.html 2018.3.16), and SILVA rRNA database (https://www.arb-silva.de/ 2018.3.11). For the database query, we used the sequence information from previous studies (True and Carroll 2002; Aruga et al. 2006, 2007; Takahashi et al. 2008; Hatayama and Aruga 2010; Layden et al. 2010). A TBlastN search was done for whole genome shotgun contigs (wgs) or transcript shotgun assembly (TSA) of target organisms with the following key sequences, and the validity of the target was checked by reciprocal BLAST. Ttu-Zic and Cfl-Msx were used to identify the hypothetical Hrob-Zic and Gastropoda Msx (Bgl and Lgi), respectively. Hypothetical Cte-Zic and Tad-Zic sequences were obtained with a BLAST search against ADN43078 and XP_002108473, respectively. The criteria for their identification as members of the Zic and Msx families were described previously (True and Carroll 2002; Aruga et al. 2006; Takahashi et al. 2008). The newly defined sequences were deposited at DDBJ/NCBI/EMBL databases with accession numbers shown in table 1.

We omitted sequences that were thought to be immaturely curated; for instance, the Schistosoma mansoni (short form) and Hymenolepis microstoma sequences were edited to obtain the entire ORF by manually adjoining the predicted Exon1 and Exon2 sequences.

The presence or absence of introns in the ZFD, and their positions where present, were examined using the ENSEMBL database or by aligning genomic and mRNA sequences.

Molecular Phylogenetic Analysis

The AA and nucleotide sequences were aligned using MUSCLE (Edgar 2004), MSAPROBS (Liu et al. 2010), and MAFFT (Katoh et al. 2017). Some of the aligned sequences were corrected by visual inspection.

To define the Zic CDs except CD0 in the bilaterian ancestor, we first selected and aligned Zic sequences from Deuterostomia (Bfl, Sko, Sca, Apl, Ppe, and Spu), Lophotrochozoa (Cte, Lan, Ttra, Cfl, Sso, Hbl, Obi, and Lgi), and Ecdysozoa (Pca, Lpo_1, Pim, Pte, Afr, Haz, and Eka) with a cnidarian sequence (Nve_A) as an outgroup. The ancestral sequences were calculated using the Maximum Likelihood method under a JTT matrix-based model (Jones et al. 1992) and the following defined tree: ((Bfl,(((Apl, Ppe), Spu),(Sca, Sko))),((Cte,(Lan, Ttr),(Lga,(Hbl, Obi),(Sso,Cfl))),(Pca,(Eka,((Pim, Lpo_1, Pte),(Afr, Haz))))), Nve_A).
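Before a user-defined tree is passed to an ancestral-sequence program, it is worth checking that its parentheses balance and that its tip labels match the aligned sequences. The sketch below does this for the tree string as printed above (spaces removed, terminating semicolon added); the helper names are hypothetical, and the simple label extraction assumes a topology-only Newick string without branch lengths or internal node labels.

```python
import re

# Topology-only Newick string copied from the text above.
TREE = ("((Bfl,(((Apl,Ppe),Spu),(Sca,Sko))),"
        "((Cte,(Lan,Ttr),(Lga,(Hbl,Obi),(Sso,Cfl))),"
        "(Pca,(Eka,((Pim,Lpo_1,Pte),(Afr,Haz))))),Nve_A);")

# Sequence labels listed in the text for the same analysis.
LISTED = ["Bfl", "Sko", "Sca", "Apl", "Ppe", "Spu",
          "Cte", "Lan", "Ttra", "Cfl", "Sso", "Hbl", "Obi", "Lgi",
          "Pca", "Lpo_1", "Pim", "Pte", "Afr", "Haz", "Eka", "Nve_A"]


def is_balanced(newick: str) -> bool:
    """True if every ')' closes an earlier '(' and none is left open."""
    depth = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def leaf_names(newick: str) -> list:
    """Extract tip labels (valid only without branch lengths or node labels)."""
    return re.findall(r"[A-Za-z][A-Za-z0-9_]*", newick)


leaves = leaf_names(TREE)
print(is_balanced(TREE), len(leaves))
print("listed but not in tree:", sorted(set(LISTED) - set(leaves)))
print("in tree but not listed:", sorted(set(leaves) - set(LISTED)))
```

Run on the labels as printed, the check reports Ttra and Lgi in the list against Ttr and Lga in the tree, which look like spelling variants of the same taxa; such mismatches would need to be reconciled before the tree and the alignment are used together.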

FIG. 12. Coevolution of Zic and Msx. Bottom, the Zic family and Msx family existed in the metazoan ancestor (fig. 1). Middle, the bilaterian ancestor contained Zic and Msx CDs that are differentially conserved in existing animals (fig. 5). EH1 (Engrailed homology 1 motif) binds the transcriptional corepressor Groucho. The phylogenetically conserved Zic CD (CD1-ZF)-Msx HD interaction (fig. 10) suggests the presence of this interaction in the bilaterian ancestor (gray arrows). Top, curved lines connecting Zic CDs indicate the correlations of the conservation extent among Zic CDs (fig. 6). Lines connecting Msx and Zic CDs indicate the correlation between Zic CDs and Msx HD (fig. 11). Thick lines, r > 0.8; thin lines, r > 0.7. Evolution of the Msx family is based on Takahashi et al. (2008). Diagram labels: Zic domains 1, Z, 2, 3, 4, 5, N, ZF, C, 6 (Gli-Glis-Zic ZF family); Msx domains EH1 and HD (NK homeobox family); metazoan ancestor; bilaterian ancestor.


All positions with <70% site coverage were eliminated (423 positions in the final data set). The CDs were defined as shown in supplementary figure S6, Supplementary Material online. Selection of CDs was done under the above criteria by excluding clusters with lower maximal probability. To define CD0, we carried out the same analysis using the Bfl, Apl, Ppe, Sca, Sko, Lan, Pca, and Lpo_1 sequences and the following tree: (Bfl, (((Apl, Ppe), Spu),(Sca, Sko)),(Lan,(Lpo_1, Pca))), and defined it as shown in supplementary figure S6, Supplementary Material online.
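The site coverage criterion can be illustrated with a short sketch: an alignment column is kept only if at least 70% of the sequences have a residue (not a gap or missing-data symbol) at that position. This is a minimal stand-alone illustration; the function name, the gap characters, and the toy alignment are assumptions, and the analysis itself was run in the phylogenetics software cited in this section.

```python
def filter_by_site_coverage(alignment, min_coverage=0.70, gap_chars="-?"):
    """Drop alignment columns whose coverage is below min_coverage.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns (filtered alignment, list of retained column indices).
    """
    seqs = list(alignment.values())
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    n_seqs = len(seqs)
    n_cols = lengths.pop()
    keep = []
    for col in range(n_cols):
        # Coverage = fraction of sequences with a residue at this column.
        covered = sum(1 for s in seqs if s[col] not in gap_chars)
        if covered / n_seqs >= min_coverage:
            keep.append(col)
    filtered = {name: "".join(seq[i] for i in keep)
                for name, seq in alignment.items()}
    return filtered, keep


# Toy alignment (hypothetical sequences, not data from this study).
toy = {"a": "MK-L", "b": "MKAL", "c": "M--L", "d": "MK-L"}
filtered, kept = filter_by_site_coverage(toy)
print(kept, filtered)
```

In the toy case the third column has a residue in only one of four sequences (25% coverage) and is removed, whereas the second column (75% coverage) is retained.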

A homology search against the local Zic sequence database was performed using the BLAST program (Altschul et al. 1990) implemented in the NCBI Genome Workbench (https://www.ncbi.nlm.nih.gov/tools/gbench/ 2018.3.18) with the following parameters: word size, 3; E-value, 10; and threshold, 11. The conservation of CDs was measured by the BLAST score based on a BLOSUM62 matrix (Henikoff and Henikoff 1992). We defined a CD as present if the target Zic AA sequence contained a homologous sequence element with a score >24. If multiple sequence alignments were given for a sequence in the homology search, the optimal alignment with the lowest E-value was retained and the others were omitted. The E-values of the omitted alignments were marginal, ranging from 1.4 to 7.6. To calculate the conservation extent as a percentage among the CD-containing sequences, the minimal and maximal scores were defined as rank values 1 and 100, respectively, and the remaining sequence scores were placed proportionally within this range. Sequences that were not detected by the CD homology search were defined as having a rank value of 0. The correlation analysis for conservation extent was performed using the percentages defined above. Because the rank values were discontinuous between 0 and 1, we calculated the coefficient r in Spearman’s rank-order correlation using the rank values to consider both presence–absence and conservation extent information.
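The scoring scheme described above can be sketched in a few functions: keep the lowest E-value alignment per sequence, call a CD present when its score exceeds 24, rescale the scores of CD-containing sequences linearly onto 1 to 100, assign 0 to sequences without the CD, and correlate two such profiles with Spearman's r. This is a minimal illustration under stated assumptions: the function names and data layout are hypothetical, a score at or below the threshold is treated the same as no hit, and ties in the Spearman calculation are given average ranks.

```python
import math


def best_alignment(hits):
    """Keep the (score, evalue) hit with the lowest E-value, or None."""
    return min(hits, key=lambda h: h[1]) if hits else None


def rank_values(scores, threshold=24.0):
    """Map best BLAST scores to rank values: 0 (absent) or 1..100 (present)."""
    present = {k: v for k, v in scores.items()
               if v is not None and v > threshold}
    if not present:
        return {k: 0.0 for k in scores}
    lo, hi = min(present.values()), max(present.values())
    out = {}
    for name in scores:
        if name not in present:
            out[name] = 0.0
        elif hi == lo:
            out[name] = 100.0
        else:
            out[name] = 1.0 + 99.0 * (present[name] - lo) / (hi - lo)
    return out


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's r computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical scores for one CD in four sequences (None = no hit).
print(rank_values({"seqA": 30.0, "seqB": 60.0, "seqC": None, "seqD": 45.0}))
```

Because absent CDs all receive the value 0, they form one tied group at the bottom of the ranking, which is how presence–absence and conservation extent enter the same coefficient.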

Phylogenetic tree analysis was performed with MEGA7 (Kumar et al. 2016) and MrBayes 3.2 (Ronquist et al. 2012). The Maximum Likelihood-based tree was based on distance calculation using the JTT matrix (Jones et al. 1992) after removing all positions with <70% site coverage. In the Maximum Likelihood tree, tree reliability was estimated by a bootstrap test with 500 repetitions. In the Bayesian inference analysis, we used an empirical model (WAG distances, Whelan and Goldman 2001) with gamma, alpha shape parameter, and AA frequencies estimated from the data. We ran 1,000,000 generations with one cold and three incrementally heated Markov chains, random starting trees for each chain, and trees sampled every 100 generations. We constructed a 50% majority-rule consensus tree from the last 2500 trees that were saved (burnin = 2500). The tree was edited using TreeGraph 2 (Stover and Muller 2010).

Reconstruction of the Msx sequence in the bilaterian ancestor was performed as described above for the ancestral Zic sequence construction. The resultant sequence was identical to that constructed in a previous study (Takahashi et al. 2008) except that the sequence was extended by including 17 AA of the N-terminal flanking region of HD. The conservation extent was defined as the BLOSUM62-based BLAST score between the ancestral sequence and the target sequence. The evolutionary distances of 18S RNA were calculated as the number of base substitutions per site between the ancestral 18S RNA sequence and the target species-derived ones. Analyses were performed using the Tamura–Nei model (Tamura and Nei 1993). Measurements were taken after removal of any alignment gap-containing sites, assuming different evolutionary rates among sites (gamma distribution, α = 0.4). The correlation between the conservation extents of (18S or Msx) and Zic was analyzed using the coefficient r in Spearman’s rank-order correlation. The analysis was done for species in which Zic, Msx, and 18S RNA sequences were all available after removing the paralogues that showed lower conservation values than any remaining paralogue. Therefore, the representative sequences were the most strongly conserved in each group.

Plasmids and Mutagenesis

To construct expression vectors for Zic proteins (Nve-A/B/C/D/E, Ssu, Hbl, Pim, Afr, Dme, Cel, Ppe, Cin-a/b, Bfl, Mmu-1/2/3/4/5, Xla-1/2/3/4/5) and Msx proteins (Nve, Mmu-1), entire protein coding regions were first amplified by PCR and cloned into the pGEM-T Easy vector (Promega). The templates for the Nve sequences were Nve BAC clones (Nve-A, CH314-49A19; Nve-B/C/D/E, CH314-55K22) or Nve cDNA (Nve-Msx). After verification by sequencing, the correct ORFs were excised from the plasmid using NotI or EcoRI and inserted into modified pcDNA3.1 vectors (Invitrogen), in which a Myc or FLAG tag was introduced N-terminally, followed by the initiation codon for methionine.

The HA-tagged Zic2 deletion series was described previously (Mizugishi et al. 2004). The pcDNA3-Flag-Msx2 vector was a gift from Dr. Ken Watanabe (Masuda et al. 2001). A series of truncated Msx2 constructs (N+HD, 1–612 bp; N, 1–423 bp) was amplified by PCR and subcloned into the BamHI-SalI site of pCMV-tag2A (N-terminal Flag expression vector; Stratagene). To construct the GST fusion plasmid, the sequence of the Msx2 homeodomain (424–612 bp) was amplified by PCR and subcloned into the pGEX4T3 vector (Promega). Mutations were introduced into mouse and frog Zic2 using the Takara in vitro mutagenesis kit (Takara).

Luciferase Reporter Assay

NIH3T3 cells in 24-well dishes were transfected with pTgif-luc or pTgifDZBS (200 ng) (Ishiguro et al. 2017), pcDNA3.1-FLAG or pcDNA3.1-FLAG-Zic (200 ng), and pEF-Renilla luciferase (4 ng) using TransIT-LT1 (Mirus Bio). Luciferase activity was measured using a Dual Luciferase Assay System (Promega) and a Minilumat LB 9506 luminometer (Berthold).

Immunoprecipitation and GST-Pull Down Assay

For the Zic-Msx binding assay (figs. 9 and 10), 293T cells were cotransfected with appropriate expression vectors using Effectene (Qiagen). At 24 h after transfection, the cells were lysed in an immunoprecipitation buffer (25 mM Hepes, pH 7.2, 0.5% NP-40, 150 mM NaCl, 50 mM NaF, 2 mM Na3VO4, 1 mM PMSF, 20 μg/ml aprotinin) at 4°C. Immunoprecipitation was performed using an anti-FLAG M2 monoclonal antibody (Sigma). The bound material was detected by immunoblotting with an anti-HA polyclonal antibody H-6908 (Sigma). For the Zic2-DNAPK/RHA complexes binding assay, 293T cells were transfected with the appropriate expression vectors using Lipofectamine 2000 (Invitrogen). Transfected cells were washed and harvested in PBS(-) containing 1 mM PMSF, and total cell extracts were prepared with lysis wash buffer 150 (20 mM HEPES-KOH [pH 7.8], 10% glycerol, 150 mM NaCl, 0.5 mM DTT, 0.1 mM EDTA, 0.5% Nonidet P-40, and 1 mM PMSF). The extracts were incubated with anti-HA or anti-FLAG affinity beads at 4°C for 6 h. Immunoblotting was carried out as described previously (Ishiguro et al. 2007).

GST fusion proteins were expressed in Escherichia coli and affinity-purified with Glutathione Sepharose 4B (Pharmacia). For the GST pull-down assay, GST fusion proteins were incubated for 2 h at 4°C with protein extracts from 293T cells or purified Zic2 protein in the immunoprecipitation buffer. After washing five times, the bound proteins were separated by SDS–PAGE, and then immunoblotted with an anti-FLAG antibody.

Immunostaining and In Situ Hybridization

Immunostaining for mouse embryo sections was carried out as described previously (Aruga et al. 2002). Whole mount staining of the mouse embryo was carried out as described previously (Nagai et al. 1997).

Data Availability

The sequences newly defined in this study were deposited at the DDBJ/GenBank/EMBL database under the following accession numbers (Bpl-Zic, LC328942; Ega-Zic, LC328943; Ste-Zic, LC328944; Hrob-Zic, BR001475; Cte-Zic, BR001476; Ttri-Zic, BR001480; Tad-Zic, BR001479; Lgi-Msx, BR001477; Bgl-Msx, BR001478; Ava-Zic1, BR001481; Ava-Zic2, BR001482; Smar-Zic1, BR001483; Smar-Zic2, BR001484; Ava-Msx, BR001485; and Smar-Msx, BR001486).

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

We thank Akiko Shimazaki and Yayoi Nozaki for technical assistance in molecular biology experiments, Yoshifumi Matsumoto for comments on the article, and anonymous reviewers for valuable advice including that about poriferan Zic sequences. This work was supported by RIKEN BSI funds and MEXT grants (grant numbers 16390086, 15K15019).

References

Alappat S, Zhang ZY, Chen YP. 2003. Msx homeobox gene family and craniofacial development. Cell Res. 13(6):429–442.

Alper S, Kenyon C. 2002. The zinc finger protein REF-2 functions with the Hox genes to inhibit cell fusion in the ventral epidermis of C. elegans. Development 129(14):3335–3348.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215(3):403–410.

Arendt D, Nubler-Jung K. 1999. Comparison of early nerve cord development in insects and vertebrates. Development 126(11):2309–2325.

Aruga J. 2004. The role of Zic genes in neural development. Mol Cell Neurosci. 26(2):205–221.

Aruga J, Hatayama M. 2018. Comparative genomics of the Zic family genes. Adv Exp Med Biol. 1046:3–26.

Aruga J, Kamiya A, Takahashi H, Fujimi TJ, Shimizu Y, Ohkawa K, Yazawa S, Umesono Y, Noguchi H, Shimizu T, et al. 2006. A wide-range phylogenetic analysis of Zic proteins: implications for correlations between protein structure conservation and body plan complexity. Genomics 87(6):783–792.

Aruga J, Odaka YS, Kamiya A, Furuya H. 2007. Dicyema Pax6 and Zic: tool-kit genes in a highly simplified bilaterian. BMC Evol Biol. 7:201.

Aruga J, Tohmonda T, Homma S, Mikoshiba K. 2002. Zic1 promotes the expansion of dorsal neural progenitors in spinal cord by inhibiting neuronal differentiation. Dev Biol. 244(2):329–341.

Bertrand V, Hobert O. 2009. Linking asymmetric cell division to the terminal differentiation program of postmitotic neurons in C. elegans. Dev Cell 16(4):563–575.

Brusca RC, Moore W, Shuster SM. 2016. Invertebrates. Sunderland MA: Sinauer Associates.

Cannon JT, Vellutini BC, Smith J, 3rd, Ronquist F, Jondelius U, Hejnol A. 2016. Xenacoelomorpha is the sister group to Nephrozoa. Nature 530(7588):89–93.

Cerny R, Cattell M, Sauka-Spengler T, Bronner-Fraser M, Yu F, Medeiros DM. 2010. Evidence for the prepattern/cooption model of vertebrate jaw evolution. Proc Natl Acad Sci U S A. 107(40):17262–17267.

Dohrmann M, Worheide G. 2013. Novel scenarios of early animal evolution–is it time to rewrite textbooks? Integr Comp Biol. 53(3):503–511.

Edgar RC. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113.

Emig CC. 2008. On the history of the names Lingula, anatina, and on the confusion of the forms assigned them among the Brachiopoda. Carnets de Géologie/Notebooks on Geology CG2008_A08. Carnets, France.

Feuda R, Dohrmann M, Pett W, Philippe H, Rota-Stabelli O, Lartillot N, Worheide G, Pisani D. 2017. Improved modeling of compositional heterogeneity supports sponges as sister to all other animals. Curr Biol. 27(24):3864–3870 e3864.

Flot JF, Hespeels B, Li X, Noel B, Arkhipova I, Danchin EG, Hejnol A, Henrissat B, Koszul R, Aury JM, et al. 2013. Genomic evidence for ameiotic evolution in the bdelloid rotifer Adineta vaga. Nature 500(7463):453–457.

Fujimi TJ, Mikoshiba K, Aruga J. 2006. Xenopus Zic4: conservation and diversification of expression profiles and protein function among the Xenopus Zic family. Dev Dyn. 235(12):3379–3386.

Garey JR, Near TJ, Nonnemacher MR, Nadler SA. 1996. Molecular evidence for Acanthocephala as a subtaxon of Rotifera. J Mol Evol. 43(3):287–292.

Hatayama M, Aruga J. 2010. Characterization of the tandem CWCH2 sequence motif: a hallmark of inter-zinc finger interactions. BMC Evol Biol. 10:53.

Hatayama M, Aruga J. 2018. Role of Zic family proteins in transcriptional regulation and chromatin remodeling. Adv Exp Med Biol. 1046:353–380.

Hatayama M, Tomizawa T, Sakai-Kato K, Bouvagnet P, Kose S, Imamoto N, Yokoyama S, Utsunomiya-Tate N, Mikoshiba K, Kigawa T, et al. 2008. Functional and structural basis of the nuclear localization signal in the ZIC3 zinc finger domain. Hum Mol Genet. 17(22):3459–3473.

Hemmrich G, Bosch TC. 2008. Compagen, a comparative genomics platform for early branching metazoan animals, reveals early origins of genes regulating stem-cell differentiation. Bioessays 30(10):1010–1018.

Hemmrich G, Khalturin K, Boehm AM, Puchert M, Anton-Erxleben F, Wittlieb J, Klostermeier UC, Rosenstiel P, Oberg HH, Domazet-Loso T, et al. 2012. Molecular signatures of the three stem cell lineages in hydra and the emergence of stem cell function at the base of multicellularity. Mol Biol Evol. 29(11):3267–3280.

Henikoff S, Henikoff JG. 1992. Amino acid substitution matrices fromprotein blocks. Proc Natl Acad Sci U S A. 89(22):10915–10919.

Herrera E, Brown L, Aruga J, Rachel RA, Dolen G, Mikoshiba K, Brown S,Mason CA. 2003. Zic2 patterns binocular vision by specifying theuncrossed retinal projection. Cell 114(5):545–557.

Houtmeyers R, Souopgui J, Tejpar S, Arkell R. 2013. The ZIC gene familyencodes multi-functional proteins essential for patterning and mor-phogenesis. Cell Mol Life Sci. 70(20):3791–3811.

Inoue T, Hatayama M, Tohmonda T, Itohara S, Aruga J, Mikoshiba K.2004. Mouse Zic5 deficiency results in neural tube defects and hy-poplasia of cephalic neural crest derivatives. Dev Biol.270(1):146–162.

Ishiguro A, Hatayama M, Otsuka MI, Aruga J. 2017. Link between thecausative genes of holoprosencephaly, Zic2 directly regulates Tgif1expression. Sci Rep. 8(1):2140.

Ishiguro A, Ideta M, Mikoshiba K, Chen DJ, Aruga J. 2007. ZIC2-depen-dent transcriptional regulation is mediated by DNA-dependent pro-tein kinase, poly(ADP-ribose) polymerase, and RNA helicase A. J BiolChem. 282(13):9983–9995.

Isshiki T, Takeichi M, Nose A. 1997. The role of the msh homeobox geneduring Drosophila neurogenesis: implication for the dorsoventralspecification of the neuroectoderm. Development124(16):3099–3109.

Jones DT, Taylor WR, Thornton JM. 1992. The rapid generation of mu-tation data matrices from protein sequences. Comput Appl Biosci.8(3):275–282.

Katoh K, Rozewicki J, Yamada KD. 2017. MAFFT online service: multiplesequence alignment, interactive sequence choice and visualization.Brief Bioinform. doi: 10.1093/bib/bbx108.

Kumar S, Stecher G, Tamura K. 2016. MEGA7: molecular evolutionarygenetics analysis version 7.0 for bigger datasets. Mol Biol Evol.33(7):1870–1874.

Kuo JS, Patel M, Gamse J, Merzdorf C, Liu X, Apekin V, Sive H. 1998. Opl: azinc finger protein that regulates neural determination and pattern-ing in Xenopus. Development 125(15):2867–2882.

Layden MJ, Meyer NP, Pang K, Seaver EC, Martindale MQ. 2010.Expression and phylogenetic analysis of the zic gene family in theevolution and development of metazoans. Evodevo 1(1):12.

Li Y, Zhao D, Horie T, Chen G, Bao H, Chen S, Liu W, Horie R, Liang T,Dong B, et al. 2017. Conserved gene regulatory module specifieslateral neural borders across bilaterians. Proc Natl Acad Sci U S A.114(31):E6352–E6360.

Lindgens D, Holstein TW, Technau U. 2004. Hyzic, the Hydra homolog ofthe zic/odd-paired gene, is involved in the early specification of thesensory nematocytes. Development 131(1):191–201.

Liu Y, Schmidt B, Maskell DL. 2010. MSAProbs: multiple sequence align-ment based on pair hidden Markov models and partition functionposterior probabilities. Bioinformatics 26(16):1958–1964.

Mallatt J, Craig CW, Yoder MJ. 2010. Nearly complete rRNA genes as-sembled from across the metazoan animals: effects of more taxa, astructure-based alignment, and paired-sites evolutionary models onphylogeny reconstruction. Mol Phylogenet Evol. 55(1):1–17.

Masuda Y, Sasaki A, Shibuya H, Ueno N, Ikeda K, Watanabe K. 2001. Dlxin-1, a novel protein that binds Dlx5 and regulates its transcriptional function. J Biol Chem. 276(7):5331–5338.

Meyerowitz EM. 1999. Plants, animals and the logic of development. Trends Cell Biol. 9(12):M65–M68.

Mikhailov KV, Slyusarev GS, Nikitin MA, Logacheva MD, Penin AA, Aleoshin VV, Panchin YV. 2016. The genome of Intoshia linei affirms orthonectids as highly simplified spiralians. Curr Biol. 26(13):1768–1774.

Miklos GL, Campbell HD. 1992. The evolution of protein domains and the organizational complexities of metazoans. Curr Opin Genet Dev. 2(6):902–906.

Mizugishi K, Aruga J, Nakata K, Mikoshiba K. 2001. Molecular properties of Zic proteins as transcriptional regulators and their relationship to GLI proteins. J Biol Chem. 276(3):2180–2188.

Mizugishi K, Hatayama M, Tohmonda T, Ogawa M, Inoue T, Mikoshiba K, Aruga J. 2004. Myogenic repressor I-mfa interferes with the function of Zic family proteins. Biochem Biophys Res Commun. 320(1):233–240.

Moriyama Y, Kawanishi T, Nakamura R, Tsukahara T, Sumiyama K, Suster ML, Kawakami K, Toyoda A, Fujiyama A, Yasuoka Y, et al. 2012. The medaka zic1/zic4 mutant provides molecular insights into teleost caudal fin evolution. Curr Biol. 22(7):601–607.

Moroz LL, Kocot KM, Citarella MR, Dosung S, Norekian TP, Povolotskaya IS, Grigorenko AP, Dailey C, Berezikov E, Buckley KM, et al. 2014. The ctenophore genome and the evolutionary origins of neural systems. Nature 510(7503):109–114.

Nagai T, Aruga J, Minowa O, Sugimoto T, Ohno Y, Noda T, Mikoshiba K. 2000. Zic2 regulates the kinetics of neurulation. Proc Natl Acad Sci U S A. 97(4):1618–1623.

Nagai T, Aruga J, Takada S, Gunther T, Sporle R, Schughart K, Mikoshiba K. 1997. The expression of the mouse Zic1, Zic2, and Zic3 gene suggests an essential role for Zic genes in body pattern formation. Dev Biol. 182(2):299–313.

Nosenko T, Schreiber F, Adamska M, Adamski M, Eitel M, Hammel J, Maldonado M, Muller WE, Nickel M, Schierwater B, et al. 2013. Deep metazoan phylogeny: when different genes tell different stories. Mol Phylogenet Evol. 67(1):223–233.

Pick KS, Philippe H, Schreiber F, Erpenbeck D, Jackson DJ, Wrede P, Wiens M, Alie A, Morgenstern B, Manuel M, et al. 2010. Improved phylogenomic taxon sampling noticeably affects nonbilaterian relationships. Mol Biol Evol. 27(9):1983–1987.

Quinn ME, Haaning A, Ware SM. 2012. Preaxial polydactyly caused by Gli3 haploinsufficiency is rescued by Zic3 loss of function in mice. Hum Mol Genet. 21(8):1888–1896.

Ramos C, Robert B. 2005. msh/Msx gene family in neural development. Trends Genet. 21(11):624–632.

Riesgo A, Farrar N, Windsor PJ, Giribet G, Leys SP. 2014. The analysis of eight transcriptomes from all poriferan classes reveals surprising genetic complexity in sponges. Mol Biol Evol. 31(5):1102–1120.

Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Hohna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61(3):539–542.

Ruppert EE, Fox RS, Barnes RD. 2004. Invertebrate zoology. Belmont (CA): Thomson-Brooks/Cole.

Ryan JF, Pang K, Schnitzler CE, Nguyen A-D, Moreland RT, Simmons DK, Koch BJ, Francis WR, Havlak P, Smith SA, et al. 2013. The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science 342(6164):1242592.

Satokata I, Ma L, Ohshima H, Bei M, Woo I, Nishizawa K, Maeda T, Takano Y, Uchiyama M, Heaney S, et al. 2000. Msx2 deficiency in mice causes pleiotropic defects in bone growth and ectodermal organ formation. Nat Genet. 24(4):391–395.

Satokata I, Maas R. 1994. Msx1 deficient mice exhibit cleft palate and abnormalities of craniofacial and tooth development. Nat Genet. 6(4):348–356.

Sawada K, Fukushima Y, Nishida H. 2005. Macho-1 functions as transcriptional activator for muscle formation in embryos of the ascidian Halocynthia roretzi. Gene Expr Patterns 5(3):429–437.

Schuler A, Bornberg-Bauer E. 2016. Evolution of protein domain repeats in metazoa. Mol Biol Evol. 33(12):3170–3182.

Sen A, Stultz BG, Lee H, Hursh DA. 2010. Odd paired transcriptional activation of decapentaplegic in the Drosophila eye/antennal disc is cell autonomous but indirect. Dev Biol. 343(1–2):167–177.

Shinzato C, Shoguchi E, Kawashima T, Hamada M, Hisata K, Tanaka M, Fujie M, Fujiwara M, Koyanagi R, Ikuta T, et al. 2011. Using the Acropora digitifera genome to understand coral responses to environmental change. Nature 476(7360):320–323.

Sielaff M, Schmidt H, Struck TH, Rosenkranz D, Mark Welch DB, Hankeln T, Herlyn H. 2016. Phylogeny of syndermata (syn. Rotifera): mitochondrial gene order verifies epizoic Seisonidea as sister to endoparasitic Acanthocephala within monophyletic Hemirotifera. Mol Phylogenet Evol. 96:79–92.

Simion P, Philippe H, Baurain D, Jager M, Richter DJ, Di Franco A, Roure B, Satoh N, Queinnec E, Ereskovsky A, et al. 2017. A large and consistent phylogenomic dataset supports sponges as the sister group to all other animals. Curr Biol. 27(7):958–967.

Simoes-Costa M, Bronner ME. 2015. Establishing neural crest identity: a gene regulatory recipe. Development 142(2):242–257.

Smith SA, Berkson J. 2005. Laboratory culture and maintenance of the horseshoe crab (Limulus polyphemus). Lab Anim (NY) 34(7):27–34.

Srivastava M, Begovic E, Chapman J, Putnam NH, Hellsten U, Kawashima T, Kuo A, Mitros T, Salamov A, Carpenter ML, et al. 2008. The Trichoplax genome and the nature of placozoans. Nature 454(7207):955–960.

Srivastava M, Simakov O, Chapman J, Fahey B, Gauthier ME, Mitros T, Richards GS, Conaco C, Dacre M, Hellsten U, et al. 2010. The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466(7307):720–726.

Stover BC, Muller KF. 2010. TreeGraph 2: combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics 11(1):7.

Takahashi H, Kamiya A, Ishiguro A, Suzuki AC, Saitou N, Toyoda A, Aruga J. 2008. Conservation and diversification of Msx protein in metazoan evolution. Mol Biol Evol. 25(1):69–82.

Tamura K, Nei M. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol. 10(3):512–526.

Telford MJ, Budd GE, Philippe H. 2015. Phylogenomic insights into animal evolution. Curr Biol. 25(19):R876–R887.

Telford MJ, Moroz LL, Halanych KM. 2016. Evolution: a sisterly dispute. Nature 529(7586):286–287.

Thanaraj TA, Clark F. 2001. Human GC-AG alternative intron isoforms with weak donor sites show enhanced consensus at acceptor exon positions. Nucleic Acids Res. 29(12):2581–2593.

True JR, Carroll SB. 2002. Gene co-option in physiological and morphological evolution. Annu Rev Cell Dev Biol. 18:53–80.

Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, et al. 2013. The genomes of four tapeworm species reveal adaptations to parasitism. Nature 496(7443):57–63.

Twigg SR, Forecki J, Goos JA, Richardson IC, Hoogeboom AJ, van den Ouweland AM, Swagemakers SM, Lequin MH, Van Antwerp D, McGowan SJ, et al. 2015. Gain-of-function mutations in ZIC1 are associated with coronal craniosynostosis and learning disability. Am J Hum Genet. 97(3):378–388.

Vasquez-Doorman C, Petersen CP. 2014. zic-1 expression in planarian neoblasts after injury controls anterior pole regeneration. PLoS Genet. 10(7):e1004452.

Vogg MC, Owlarn S, Perez Rico YA, Xie J, Suzuki Y, Gentile L, Wu W, Bartscherer K. 2014. Stem cell-dependent formation of a functional anterior regeneration pole in planarians requires Zic and Forkhead transcription factors. Dev Biol. 390(2):136–148.

Whelan S, Goldman N. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol. 18(5):691–699.

Wilkie AO, Tang Z, Elanko N, Walsh S, Twigg SR, Hurst JA, Wall SA, Chrzanowska KH, Maxson RE Jr. 2000. Functional haploinsufficiency of the human homeobox gene MSX2 causes defects in skull ossification. Nat Genet. 24(4):387–390.

Yagi K, Satou Y, Satoh N. 2004. A zinc finger transcription factor, ZicL, is a direct activator of Brachyury in the notochord specification of Ciona intestinalis. Development 131(6):1279–1288.

Yankura KA, Martik ML, Jennings CK, Hinman VF. 2010. Uncoupling of complex regulatory patterning during evolution of larval development in echinoderms. BMC Biol. 8:143.
