Deep convolutional neural networks for multi-modality isointense infant brain image segmentation

Wenlu Zhang a, Rongjian Li a, Houtao Deng b, Li Wang c, Weili Lin d, Shuiwang Ji a,*, Dinggang Shen c,e,*

a Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
b Instacart, San Francisco, CA 94107, USA
c IDEA Lab, Department of Radiology and BRIC, University of North Carolina at Chapel Hill, NC 27599, USA
d MRI Lab, Department of Radiology and BRIC, University of North Carolina at Chapel Hill, NC 27599, USA
e Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea

NeuroImage 108 (2015) 214–224. http://dx.doi.org/10.1016/j.neuroimage.2014.12.061. © 2014 Elsevier Inc. All rights reserved.

* Corresponding authors. E-mail addresses: [email protected] (S. Ji), [email protected] (D. Shen).

Article history: Accepted 23 December 2014; available online 3 January 2015.

Keywords: Image segmentation; Multi-modality data; Infant brain image; Convolutional neural networks; Deep learning

Abstract

The segmentation of infant brain tissue images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) plays an important role in studying early brain development in health and disease. In the isointense stage (approximately 6–8 months of age), WM and GM exhibit similar levels of intensity in both T1 and T2 MR images, making tissue segmentation very challenging. Only a small number of existing methods have been designed for tissue segmentation in this isointense stage; moreover, they used only a single T1 or T2 image, or the combination of T1 and T2 images. In this paper, we propose to use deep convolutional neural networks (CNNs) for segmenting isointense-stage brain tissues using multi-modality MR images. CNNs are a type of deep model in which trainable filters and local neighborhood pooling operations are applied alternatingly on the raw input images, resulting in a hierarchy of increasingly complex features. Specifically, we used multi-modality information from T1, T2, and fractional anisotropy (FA) images as inputs and generated the segmentation maps as outputs. The multiple intermediate layers applied convolution, pooling, normalization, and other operations to capture the highly nonlinear mappings between inputs and outputs. We compared the performance of our approach with that of commonly used segmentation methods on a set of manually segmented isointense-stage brain images. Results showed that our proposed model significantly outperformed prior methods on infant brain tissue segmentation. In addition, our results indicated that the integration of multi-modality images led to a significant performance improvement.

Introduction

During the first year of postnatal human brain development, the brain tissues grow quickly, and the cognitive and motor functions undergo a wide range of development (Zilles et al., 1988; Paus et al., 2001; Fan et al., 2011). The segmentation of infant brain tissues into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is of great importance for studying early brain development in health and disease (Li et al., 2013a, 2013b; Nie et al., 2012; Yap et al., 2011; Gilmore et al., 2012). It is widely accepted that the segmentation of infant brains is more difficult than that of adult brains, mainly because of the lower tissue contrast in early-stage brains (Weisenfeld and Warfield, 2009). There are three distinct WM/GM contrast patterns in chronological order: infantile (birth), isointense, and adult-like (10 months and onward) (Paus et al., 2001).
In this work, we focused on the isointense stage, which corresponds to an infant age of approximately 6–8 months. In this stage, WM and GM exhibit almost the same level of intensity in both T1 and T2 MR images, which makes the tissue segmentation problem very challenging (Shi et al., 2010b).

Currently, most prior methods for infant brain MR image segmentation have focused on the infantile or adult-like stages (Cardoso et al., 2013; Gui et al., 2012; Shi et al., 2010a; Song et al., 2007; Wang et al., 2011, 2013; Weisenfeld and Warfield, 2009; Xue et al., 2007). They assumed that each tissue class can be modeled by a single Gaussian distribution or a mixture of Gaussian distributions (Xue et al., 2007; Wang et al., 2011; Shi et al., 2010a; Cardoso et al., 2013). This assumption may not be valid for the isointense stage, since the distributions of WM and GM largely overlap due to early maturation and myelination. In addition, many previous methods segmented the tissues using a single T1 or T2 image or the combination of T1 and T2 images (Kim et al., 2013; Leroy et al., 2011; Nishida et al., 2006; Weisenfeld et al., 2006a, 2006b). It has been shown that the fractional anisotropy (FA) images from diffusion tensor imaging provide rich information about major fiber bundles (Liu et al., 2007), especially in the middle of the first year (around 6–8 months of age). The studies in Wang et al. (2014, 2012) demonstrated that the complementary information from multiple image modalities was beneficial for dealing with the insufficient tissue contrast.

To overcome the above-mentioned difficulties, we considered deep convolutional neural networks (CNNs) in this work. CNNs (LeCun et al., 1998a; Krizhevsky et al., 2012) are a type of multi-layer, fully trainable model that can capture highly nonlinear mappings between inputs and outputs. These models were originally motivated by computer vision problems and are thus intrinsically suitable for image-related applications. In this work, we proposed to employ CNNs for segmenting infant tissue images in the isointense stage. One appealing property of CNNs is that they can naturally integrate and combine multi-modality brain images in determining the segmentation. Our CNNs took complementary, multi-modality information from T1, T2, and FA images as inputs and generated the segmentation maps as outputs. The multiple intermediate layers applied convolution, pooling, normalization, and other operations to transform the input to the output. The networks contain millions of trainable parameters that were adjusted on a set of manually segmented data. Specifically, the networks took patches centered at a pixel as inputs and produced the tissue class of the center pixel as the output. This enabled the segmentation result for a pixel to be determined by all pixels in its neighborhood. In addition, due to the convolution operations applied at intermediate layers, nearby pixels contribute more to the segmentation result than those that are far away. We compared the performance of our approach with that of commonly used segmentation methods. Results showed that our proposed model significantly outperformed prior methods on infant brain tissue segmentation. In addition, our results indicated that the integration of multi-modality images led to a significant performance improvement. Furthermore, we showed that our CNN-based approach outperformed other methods by an increasingly large margin as the patch size increased. This is consistent with the fact that CNNs weight pixels differently based on their distance to the center pixel.

Material and methods

Data acquisition and image preprocessing

The experiments were performed with the approval of the Institutional Review Board (IRB), and written informed consent was obtained from the parents of all infants. We acquired T1, T2, and diffusion-weighted MR images of 10 healthy infants using a Siemens 3T head-only MR scanner. The infants were asleep, unsedated, and fitted with ear protection, with their heads secured in a vacuum-fixation device during the scan. T1 images with 144 sagittal slices were acquired with TR/TE = 1900/4.38 ms, a flip angle of 7°, and a resolution of 1 × 1 × 1 mm³. T2 images with 64 axial slices were acquired with TR/TE = 7380/119 ms, a flip angle of 150°, and a resolution of 1.25 × 1.25 × 1.95 mm³. Diffusion-weighted images (DWI) with 60 axial slices were acquired with TR/TE = 7680/82 ms, a resolution of 2 × 2 × 2 mm³, and 42 non-collinear diffusion gradients with a diffusion weight of 1000 s/mm².

T2 images and fractional anisotropy (FA) images, the latter derived from distortion-corrected DWI, were first rigidly aligned with the T1 image and then up-sampled onto an isotropic grid with a resolution of 1 × 1 × 1 mm³. A rescan was performed whenever the data were accompanied by moderate or severe motion artifacts (Blumenthal et al., 2002). We then applied intensity inhomogeneity correction (Sled et al., 1998) to both the T1 and the aligned T2 images (but not to the FA images, for which it is not needed). After that, we applied skull stripping (Shi et al., 2012) and removal of the cerebellum and brain stem on the T1 image using in-house tools. In this way, we obtained a brain mask without the skull, cerebellum, and brain stem. With this brain mask, we finally removed the skull, cerebellum, and brain stem from the aligned T2 and FA images as well.

To generate the manual segmentations, an initial segmentation was obtained with the publicly available infant brain segmentation software iBEAT (Dai et al., 2013). Manual editing was then carefully performed by an experienced rater, who corrected possible segmentation errors according to the T1, T2, and FA images; ITK-SNAP (Yushkevich et al., 2006) (www.itksnap.org) was used for this interactive manual editing. Each infant brain image generally contains about 100 axial slices; we randomly selected slices from the middle region (40th–60th slices) for manual segmentation, and this work used only these manually segmented slices. Since we were not able to obtain the FA images of 2 subjects, we used only the remaining 8 subjects in this work. Note that pixels are treated as samples in segmentation tasks. For each subject, we generated more than 10,000 patches centered at individual pixels from the T1, T2, and FA images. These patches served as the training and testing samples in our study.

Deep CNN for multi-modality brain image segmentation

Deep learning models are a class of models that learn a hierarchy of features by building high-level features from low-level ones. Convolutional neural networks (CNNs) (LeCun et al., 1998a; Krizhevsky et al., 2012) are a type of deep model in which trainable filters and local neighborhood pooling operations are applied alternatingly on the raw input images, resulting in a hierarchy of increasingly complex features. One property of CNNs is their capability to capture highly nonlinear mappings between inputs and outputs (LeCun et al., 1998a). When trained with appropriate regularization, CNNs can achieve superior performance on visual object recognition and image classification tasks (LeCun et al., 1998a; Krizhevsky et al., 2012). CNNs have also been used in several other applications: in Jain et al. (2007), Jain and Seung (2009), Turaga et al. (2010), and Helmstaedter et al. (2013), CNNs were applied to restore and segment volumetric electron microscopy images, and Ciresan et al. (2013, 2012) applied deep CNNs to detect mitosis in breast histology images using patch-based pixel classifiers.

In this work, we proposed to use CNNs for segmenting the infant brain tissues by combining multi-modality T1, T2, and FA images. Although CNNs have been used for similar tasks in prior studies, none of them focused on integrating and combining multi-modality image data. Our CNN contains multiple input feature maps corresponding to the different data modalities, thus providing a natural formalism for combining multi-modality data. Since different modalities may contain complementary information, our experimental results showed that combining multi-modality data with a CNN led to improved segmentation performance. Fig. 1 shows a CNN architecture we developed for segmenting infant brain images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF).

Deep CNN architectures

In this study, we designed four CNN architectures to segment infant brain tissues based on multi-modality MR images. In the following, we provide details on one of the CNN architectures, with an input patch size of 13 × 13, to explain the techniques used in this work. The detailed architecture is shown in Fig. 1. This CNN architecture contained three input feature maps corresponding to T1, T2, and FA image patches of size 13 × 13. It then applied three convolutional layers and one fully-connected layer. The network also applied local response normalization and softmax layers.

The first convolutional layer contained 64 feature maps. Each feature map was connected to all three input feature maps through filters of size 5 × 5. We used a stride of one pixel, which generated feature maps of size 9 × 9 in this layer. The second convolutional layer took the output of the first convolutional layer as input and contained 256 feature maps. Each feature map was connected to all feature maps in the previous layer through filters of size 5 × 5, again with a stride of one pixel. The third convolutional layer contained 768 feature maps of size 1 × 1, connected to all feature maps in the previous layer through 5 × 5 filters, also with a stride of one pixel. The rectified linear unit (ReLU) function (Nair and Hinton, 2010) was applied after the convolution operation in all of the convolutional layers. It has been shown (Krizhevsky et al., 2012) that the use of ReLU can expedite the training of CNNs.
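To make the architecture concrete, the sketch below expresses the 13 × 13 network described above in PyTorch. The layer sizes (64/256/768 feature maps, 5 × 5 filters, local response normalization, dropout, and a 3-way output) come from the text; the framework, the class name, and the normalization hyperparameters are our own assumptions, since the original implementation is not shown in the paper.

```python
import torch
import torch.nn as nn

class InfantSegCNN(nn.Module):
    """Sketch of the 13 x 13 patch architecture described in the text:
    three 5 x 5 convolutions (64, 256, 768 maps) with ReLU, local response
    normalization, dropout, and a 3-way tissue classifier."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5),     # T1/T2/FA inputs: 13x13 -> 9x9
            nn.ReLU(),
            nn.Conv2d(64, 256, kernel_size=5),   # 9x9 -> 5x5
            nn.ReLU(),
            nn.Conv2d(256, 768, kernel_size=5),  # 5x5 -> 1x1
            nn.ReLU(),
            nn.LocalResponseNorm(size=5),        # competition across feature maps
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),                   # dropout before the fully-connected layer
            nn.Linear(768, 3),                   # WM / GM / CSF logits
        )

    def forward(self, x):                        # x: (N, 3, 13, 13)
        h = self.features(x).flatten(1)          # (N, 768)
        return self.classifier(h)                # softmax is folded into the loss
```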

Fig. 1. Detailed architecture of the convolutional neural network taking patches of size 13 × 13 as inputs.

In addition to the convolutional layers, a few other layer types were used in the CNN. Specifically, a local response normalization scheme was applied after the third convolutional layer to enforce competition between features at the same spatial location across different feature maps. The fully-connected layer following the normalization layer had 3 outputs, corresponding to the three tissue classes. A 3-way softmax layer was applied to the output of the fully-connected layer to generate a distribution over the 3 class labels. Our network minimized the cross-entropy loss between the predicted and ground-truth labels. In addition, we used dropout (Hinton et al., 2012) to learn more robust features and reduce overfitting. This technique sets the output of each neuron to zero with probability 0.5; the dropout was applied before the fully-connected layer in the CNN architecture of Fig. 1. In total, the number of trainable parameters for this architecture is 5,332,995.

We also considered three other CNN architectures, with input patch sizes of 9 × 9, 17 × 17, and 22 × 22. These CNN architectures consisted of different numbers of convolutional layers and feature maps. Both local response normalization and softmax layers were applied in all of these architectures. For the architecture with input patch size of 22 × 22, we also used a max-pooling layer after the first convolutional layer, with a pooling size of 2 × 2 and a stride of 2 × 2. The complete details of these architectures are given in Table 1. The numbers of trainable parameters for these three architectures are 6,577,155, 5,947,523, and 5,332,995, respectively.

Table 1
Details of the CNN architectures with different input patch sizes used in this work. Conv., Norm., and Full conn. denote convolutional, normalization, and fully-connected layers, respectively.

Patch size 9 × 9:
  Layer 1: Conv., 256 feature maps, 5 × 5 filters, stride 1 × 1, input 9 × 9
  Layer 2: Conv., 1024 feature maps, 5 × 5 filters, stride 1 × 1, input 5 × 5
  Layer 3: Norm., 1024 feature maps, input 1 × 1
  Layer 4: Full conn., 3 outputs, input 1 × 1
  Layer 5: softmax, 3 outputs, input 1 × 1

Patch size 13 × 13:
  Layer 1: Conv., 64 feature maps, 5 × 5 filters, stride 1 × 1, input 13 × 13
  Layer 2: Conv., 256 feature maps, 5 × 5 filters, stride 1 × 1, input 9 × 9
  Layer 3: Conv., 768 feature maps, 5 × 5 filters, stride 1 × 1, input 5 × 5
  Layer 4: Norm., 768 feature maps, input 1 × 1
  Layer 5: Full conn., 3 outputs, input 1 × 1
  Layer 6: softmax, 3 outputs, input 1 × 1

Patch size 17 × 17:
  Layer 1: Conv., 64 feature maps, 5 × 5 filters, stride 1 × 1, input 17 × 17
  Layer 2: Conv., 128 feature maps, 5 × 5 filters, stride 1 × 1, input 13 × 13
  Layer 3: Conv., 256 feature maps, 5 × 5 filters, stride 1 × 1, input 9 × 9
  Layer 4: Conv., 768 feature maps, 5 × 5 filters, stride 1 × 1, input 5 × 5
  Layer 5: Norm., 768 feature maps, input 1 × 1
  Layer 6: Full conn., 3 outputs, input 1 × 1
  Layer 7: softmax, 3 outputs, input 1 × 1

Patch size 22 × 22:
  Layer 1: Conv., 64 feature maps, 5 × 5 filters, stride 1 × 1, input 22 × 22
  Layer 2: Pooling, 64 feature maps, pooling size 2 × 2, pooling stride 2 × 2, input 18 × 18
  Layer 3: Conv., 256 feature maps, 5 × 5 filters, stride 1 × 1, input 9 × 9
  Layer 4: Conv., 768 feature maps, 5 × 5 filters, stride 1 × 1, input 5 × 5
  Layer 5: Norm., 768 feature maps, input 1 × 1
  Layer 6: Full conn., 3 outputs, input 1 × 1
  Layer 7: softmax, 3 outputs, input 1 × 1

Model training and calibration

We trained the networks using data consisting of patches extracted from the MR images and the corresponding manually segmented ground-truth images. In this work, we did not consider the segmentation of background, since the background is clear from the T1 images. Instead, we focused on segmenting the three tissue types (GM, WM, and CSF) within the foreground. For each foreground pixel, we extracted three patches centered at this pixel from the T1, T2, and FA images, respectively. The three patches were used as the input feature maps of the CNN, and the corresponding output was a binary vector of length 3 indicating the tissue class to which the pixel belonged. This procedure generated more than 10,000 instances, each consisting of three patches, from each subject. We used a leave-one-subject-out cross-validation procedure to evaluate the segmentation performance. Specifically, we used seven of the eight subjects to train the network and the remaining subject to evaluate the performance, and we report the average performance across folds. All the patches from each training subject were stored in a separate batch file, leading to seven batch files in total. We used the patches in these seven batches consecutively as the input of the CNN for training. Note that the patches in each batch file were presented to the training algorithm in random order, as is common practice.
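As an illustration of this patch-generation step, here is a minimal NumPy sketch; the function name, boundary handling, and array layout are our own assumptions rather than the authors' code.

```python
import numpy as np

def extract_patches(t1, t2, fa, mask, size=13):
    """Collect (3, size, size) multi-modality patches centered at every
    foreground pixel of one axial slice, plus the center coordinates."""
    half = size // 2
    patches, coords = [], []
    for i, j in zip(*np.nonzero(mask)):
        if (i < half or j < half or
                i + half >= mask.shape[0] or j + half >= mask.shape[1]):
            continue  # skip centers whose window would leave the slice
        win = (slice(i - half, i + half + 1), slice(j - half, j + half + 1))
        patches.append(np.stack([t1[win], t2[win], fa[win]]))
        coords.append((i, j))
    return np.asarray(patches), coords
```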

The weights in the networks were initialized randomly from a Gaussian distribution N(0, 1 × 10⁻⁴) (LeCun et al., 1998b). During training, the weights were updated by the stochastic gradient descent algorithm with a momentum of 0.9 and a weight decay of 4 × 10⁻⁴. The biases in the convolutional layers and the fully-connected layer were initialized to 1. The number of epochs was tuned on a validation set consisting of patches from one randomly selected subject in the training set. The learning rate was set to 4 × 10⁻⁴ initially.

Following Krizhevsky et al. (2012), we first used the validation set to obtain a coarse approximation of the optimal number of epochs by minimizing the validation error. This epoch number was then used to train a model on the combined training and validation sets consisting of seven subjects. The learning rate was subsequently reduced by a factor of 10 twice in succession, and the model was trained for about 10 further epochs each time. Following this procedure, the network with a patch size of 13 × 13 was trained for about 370 epochs. The training took less than one day on a Tesla K20c GPU with 2496 cores, and the networks with other patch sizes were trained in a similar way. One advantage of using CNNs for image segmentation is that, at test time, the entire image can be used as input to the network to produce the segmentation map, so patch-level prediction is not needed (Giusti et al., 2013; Ning et al., 2005). This leads to very efficient segmentation at test time; for example, our CNN models took about 50–100 s to segment an image of size 256 × 256.
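These settings map directly onto a standard SGD configuration. The sketch below assumes the hypothetical InfantSegCNN module from the earlier example and a train_loader yielding patch batches; the milestone epochs are an approximation of the schedule described above, not values from the paper.

```python
import torch
import torch.nn as nn

model = InfantSegCNN()                       # hypothetical module sketched earlier
criterion = nn.CrossEntropyLoss()            # cross-entropy between prediction and truth
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=4e-4,                                 # initial learning rate from the text
    momentum=0.9,
    weight_decay=4e-4,
)
# Two successive 10x learning-rate reductions of roughly 10 epochs each,
# after a coarse phase tuned on the validation subject (~370 epochs total
# for the 13 x 13 network); the milestones here are approximate.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[350, 360], gamma=0.1
)

for epoch in range(370):
    for patches, labels in train_loader:     # patches: (N, 3, 13, 13), labels: (N,)
        optimizer.zero_grad()
        loss = criterion(model(patches), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```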

Results and discussion

Experimental setup

In the experiments, we focused on evaluating our CNN architectures for segmenting the three types of infant brain tissues, formulating the prediction of brain tissue classes as a three-class classification task. For comparison, we also implemented two other commonly used classification methods, namely the support vector machine (SVM) and the random forest (RF) (Breiman, 2001). The linear SVM was used in our experiments, as other kernels yielded lower performance empirically.

Fig. 2. Box plots of the segmentation performance achieved by CNNs over 8 subjects for different patch sizes. Each plot in the first column uses the Dice ratio to measure the performance for one of the three tissue types (CSF, GM, WM); four different architectures were trained using input patch sizes of 9 × 9, 13 × 13, 17 × 17, and 22 × 22, respectively. The performance is evaluated using leave-one-subject-out cross validation, and 8 test results are collected for each patch size in each plot. The central mark represents the median, and the edges of the box denote the 25th and 75th percentiles; the whiskers extend to the minimum and maximum values not considered outliers, and outliers are plotted individually. The plots in the right column show the corresponding results measured by MHD.

The performance of SVM was obtained by tuning the regularization parameter using cross validation. An RF is a tree-based ensemble model in which a set of randomized trees is built, and the final decision is made by majority voting over all trees. This method has been used in image-related applications (Amit and Geman, 1997), including medical image segmentation (Criminisi and Shotton, 2013; Criminisi et al., 2012). In this work, we used RFs containing 100 trees; each tree was grown fully and left unpruned, and the number of features randomly selected at each node to compete for the best split was set to the square root of the total number of features. We used the randomForest R package (Liaw and Wiener, 2002) in the experiments. We reshaped the raw training patches into vectors whose elements served as the input features for SVM and RF. We also compared our methods with two common image segmentation methods, namely the coupled level set (CLS) (Wang et al., 2011) and majority voting (MV) methods. Note that the method based on local dictionaries of patches proposed in Wang et al. (2014) requires the images of different subjects to be registered, since a local dictionary is constructed from patches extracted at corresponding locations of the training images; we therefore did not compare our methods with the one in Wang et al. (2014).

Fig. 3. Visualization of the 64 filters in the first convolutional layer for the model with an input patch size of 13 × 13.
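The experiments used the randomForest R package; for illustration, an equivalent configuration in Python with scikit-learn (an assumption on our part, matching the 100 fully grown trees and square-root feature sampling described above) would be:

```python
from sklearn.ensemble import RandomForestClassifier

# 100 fully grown, unpruned trees; sqrt(#features) candidates per split.
rf = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    max_depth=None,       # grow each tree fully
    random_state=0,
)
# X: patches reshaped into vectors of length 3 * 13 * 13; y: tissue labels.
# rf.fit(X_train, y_train); predictions = rf.predict(X_test)
```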

To evaluate the segmentation performance, we used the Dice ratio (DR) to quantitatively measure the segmentation accuracy. Specifically, let A and B denote the binary segmentation labels generated manually and computationally, respectively, for one tissue class of a certain subject. The Dice ratio is defined as

$$\mathrm{DR}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|},$$

where |A| denotes the number of positive elements in the binary segmentation A, and |A ∩ B| is the number of positive elements shared by A and B. The Dice ratio lies in [0, 1], and a larger value indicates higher segmentation accuracy.

We also used another measure known as the modified Hausdorff distance (MHD). Supposing that C and D are two sets of positive pixels identified manually and computationally, respectively, for one tissue class of a certain subject, the MHD is defined as

$$\mathrm{MHD}(C, D) = \max\{d(C, D),\, d(D, C)\},$$

where $d(C, D) = \max_{c \in C} d(c, D)$, and the distance between a point c and a set of points D is defined as $d(c, D) = \min_{d \in D} \lVert c - d \rVert$. A smaller value indicates a higher proximity of the two point sets, thus implying higher segmentation accuracy.
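Both measures are easy to compute from binary label maps. The following NumPy/SciPy sketch implements the two definitions above; it is our own illustration, not the evaluation code used in the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dice_ratio(a, b):
    """DR(A, B) = 2|A ∩ B| / (|A| + |B|) for boolean masks a and b."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def modified_hausdorff(c_mask, d_mask):
    """MHD(C, D) = max(d(C, D), d(D, C)), with d(C, D) = max_c min_d ||c - d||,
    following the definition in the text."""
    c = np.argwhere(c_mask)          # coordinates of positive pixels in C
    d = np.argwhere(d_mask)          # coordinates of positive pixels in D
    dists = cdist(c, d)              # pairwise Euclidean distances
    return max(dists.min(axis=1).max(),   # d(C, D)
               dists.min(axis=0).max())   # d(D, C)
```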

Comparison of different CNN architectures

The nonlinear relationship between the inputs and outputs of a CNN is represented by its multi-layer architecture using convolution, pooling, and normalization. We first studied the impact of different CNN architectures on segmentation accuracy. We devised four different architectures, whose detailed configurations are described in Table 1. The classification performance of these architectures is reported in Fig. 2 using box plots. It can be observed from the results that the predictive performance is generally higher for the architectures with input patch sizes of 13 × 13 and 17 × 17. This result is consistent with the fact that networks with more convolutional layers and feature maps tend to have a deeper hierarchical structure and more trainable parameters, and are thus capable of capturing the complex relationship between input and output. We can also observe that the architecture with input patch size of 22 × 22 did not generate substantially higher predictive performance, suggesting that the pooling operation might not be suitable for our data. In the following, we therefore focus on evaluating the performance of the CNN with input patch size of 13 × 13. To examine the patterns captured by the CNN models, we visualized the 64 filters in the first convolutional layer of this model in Fig. 3. Similar to the observation in Zeiler and Fergus (2014), these filters capture primitive image features such as edges and corners.

Table 2
Comparison of segmentation performance over different image modalities achieved by CNN with input patch size of 13 × 13 on each subject, in terms of Dice ratio. The best performance of the different tissue segmentation tasks is highlighted.

              Sub. 1   Sub. 2   Sub. 3   Sub. 4   Sub. 7   Sub. 8   Sub. 9   Sub. 10
CSF   T1      0.7797   0.7824   0.7928   0.8072   0.7931   0.8076   0.7610   0.8176
      T2      0.6906   0.7238   0.7459   0.7614   0.7792   0.7737   0.7328   0.7934
      FA      0.6021   0.5838   0.6068   0.5378   0.6076   0.6001   0.6374   0.5408
      All     0.8323   0.8314   0.8304   0.8373   0.8482   0.8492   0.8211   0.8339
GM    T1      0.8123   0.8039   0.8001   0.7529   0.7693   0.7499   0.7273   0.8146
      T2      0.6094   0.5884   0.6026   0.4973   0.5897   0.6142   0.6027   0.6266
      FA      0.7256   0.7459   0.7282   0.6065   0.7224   0.7126   0.6991   0.7827
      All     0.8531   0.8572   0.8848   0.8184   0.8119   0.8652   0.8628   0.8607
WM    T1      0.8241   0.7476   0.8269   0.7751   0.8006   0.8223   0.7527   0.7996
      T2      0.6942   0.7021   0.7181   0.6318   0.6917   0.7001   0.6892   0.6979
      FA      0.8082   0.6816   0.6627   0.7238   0.7824   0.7774   0.8131   0.8163
      All     0.8798   0.8116   0.8824   0.8489   0.8689   0.8677   0.8742   0.8760

Effectiveness of integrating multi-modality data

To demonstrate the effectiveness of integrating multi-modality data, we considered the performance achieved by each single image modality. Specifically, the T1, T2, and FA images of each subject were separately used as the input of the architecture with a patch size of 13 × 13 in Table 1. The segmentation performance achieved using the different modalities is presented in Tables 2 and 3. It can be observed that the combination of different image modalities invariably yielded higher performance than any single image modality. We can also see that the T1 images produced the highest performance among the three modalities, suggesting that T1 images are the most informative in discriminating the three tissue types. Another interesting observation is that the FA images are very informative in distinguishing GM and WM, but achieved low performance on CSF. This might be because anisotropic diffusion is hardly detectable with FA for liquids such as the cerebrospinal fluid. In contrast, T2 images are more powerful for capturing CSF than GM and WM.

Table 3
Comparison of segmentation performance over different image modalities achieved by CNN with input patch size of 13 × 13 on each subject, in terms of modified Hausdorff distance (MHD). The best performance of the different tissue segmentation tasks is highlighted.

              Sub. 1   Sub. 2   Sub. 3   Sub. 4   Sub. 7   Sub. 8   Sub. 9   Sub. 10
CSF   T1      0.7245   0.6724   0.6428   0.6072   0.5537   0.5027   0.6021   0.4478
      T2      0.9048   0.8228   0.7932   0.6978   0.6004   0.5938   0.6989   0.5457
      FA      1.2446   1.3895   1.3348   1.4277   1.3271   1.4297   0.9312   1.3389
      All     0.6320   0.3293   0.3659   0.4395   0.4268   0.4482   0.4970   0.3442
GM    T1      0.5069   0.4237   0.4892   0.6528   0.6187   0.6691   0.6843   0.3971
      T2      1.2372   1.3728   1.2871   1.8421   1.5980   1.2963   1.3325   1.2241
      FA      0.6839   0.6781   0.6538   0.9479   0.6843   0.6945   0.7461   0.4322
      All     0.2067   0.2490   0.2010   0.2964   0.4398   0.2367   0.1839   0.1719
WM    T1      0.4796   0.6526   0.4232   0.6455   0.4726   0.4271   0.5023   0.4047
      T2      0.9171   0.7381   0.7974   1.0043   0.9423   0.7169   0.8274   0.8934
      FA      0.4162   0.8924   1.0258   0.7523   0.6228   0.5428   0.4238   0.5016
      All     0.2258   0.4362   0.2401   0.3275   0.2504   0.3050   0.3029   0.2271

Table 4
Segmentation performance in terms of Dice ratio achieved by the convolutional neural network (CNN), random forest (RF), support vector machine (SVM), coupled level sets (CLS), and majority voting (MV). The highest performance in each case is highlighted, and the statistical significance of the results is given in Table 6.

              Sub. 1   Sub. 2   Sub. 3   Sub. 4   Sub. 7   Sub. 8   Sub. 9   Sub. 10
CSF   CNN     0.8323   0.8314   0.8304   0.8373   0.8482   0.8492   0.8211   0.8339
      RF      0.8192   0.8135   0.8323   0.8090   0.8306   0.8457   0.7904   0.7955
      SVM     0.7409   0.7677   0.7733   0.7429   0.7006   0.7837   0.7243   0.7333
      CLS     0.8064   0.8152   0.7320   0.8614   0.8397   0.8238   0.8087   0.8280
      MV      0.7072   0.6926   0.6826   0.6348   0.6313   0.6136   0.6904   0.6920
GM    CNN     0.8531   0.8572   0.8848   0.8184   0.8119   0.8652   0.8628   0.8607
      RF      0.8288   0.8482   0.8772   0.8078   0.7976   0.8498   0.8461   0.8353
      SVM     0.7933   0.7991   0.8294   0.7527   0.7416   0.7996   0.8017   0.8038
      CLS     0.8298   0.8389   0.8498   0.8343   0.8130   0.8719   0.8612   0.8421
      MV      0.8490   0.8442   0.8525   0.8027   0.7831   0.7970   0.8372   0.8299
WM    CNN     0.8798   0.8116   0.8824   0.8489   0.8689   0.8677   0.8742   0.8760
      RF      0.8612   0.7816   0.8687   0.8373   0.8479   0.8575   0.8393   0.8353
      SVM     0.8172   0.7404   0.7623   0.8030   0.7997   0.7919   0.7059   0.7585
      CLS     0.8383   0.8054   0.7998   0.8238   0.8437   0.8213   0.8297   0.8107
      MV      0.8631   0.8002   0.8504   0.8171   0.8389   0.8373   0.8412   0.8445

Table 5
Segmentation performance in terms of modified Hausdorff distance (MHD) achieved by the convolutional neural network (CNN), random forest (RF), support vector machine (SVM), coupled level sets (CLS), and majority voting (MV). The best performance in each case is highlighted, and the statistical significance of the results is given in Table 6.

              Sub. 1   Sub. 2   Sub. 3   Sub. 4   Sub. 7   Sub. 8   Sub. 9   Sub. 10
CSF   CNN     0.6320   0.3293   0.3659   0.4395   0.4268   0.4482   0.4970   0.3442
      RF      1.0419   0.5914   0.6802   0.9042   0.3610   0.4935   0.9151   0.6949
      SVM     1.1426   0.8867   0.7571   0.9014   1.0020   0.4743   1.1789   0.8866
      CLS     0.6420   0.3487   0.8151   0.4875   0.4987   0.4939   0.4717   0.4986
      MV      1.5287   1.3788   1.3566   1.5178   1.4157   2.1068   1.1156   1.2889
GM    CNN     0.2067   0.2490   0.2010   0.2964   0.4398   0.2367   0.1839   0.1719
      RF      0.2771   0.2739   0.1524   0.3033   0.3429   0.2315   0.2517   0.2708
      SVM     0.5247   0.2916   0.3566   0.4015   0.6308   0.3809   0.4466   0.4362
      CLS     0.3615   0.2950   0.2683   0.3577   0.3872   0.2536   0.2530   0.3655
      MV      0.2834   0.2743   0.2483   0.3395   0.4316   0.3569   0.2687   0.3324
WM    CNN     0.2258   0.4362   0.2401   0.3275   0.2504   0.3050   0.3029   0.2271
      RF      0.3022   0.7981   0.2648   0.3201   0.5020   0.3321   0.3268   0.3909
      SVM     0.3218   0.8290   0.5276   0.5751   0.4784   0.4445   0.9407   0.6029
      CLS     0.6320   0.4923   0.7207   0.5425   0.6947   0.4485   0.5627   0.7216
      MV      0.3063   0.5314   0.2824   0.2907   0.2922   0.3323   0.3271   0.3751

These results demonstrate that each modality is more informative for distinguishing certain tissue types, and that the combination of all modalities leads to improved segmentation performance.

Comparison with other methods

To provide a comprehensive and quantitative evaluation of the proposed method, we report the segmentation performance on all 8 subjects using leave-one-subject-out cross validation. The performance of CNN, RF, SVM, CLS, and MV is reported in Tables 4 and 5 using the Dice ratio and MHD, respectively. We can observe from these two tables that CNN outperformed the other methods for segmenting all three types of brain tissues in most cases.

Table 6
Statistical test results comparing CNN with RF, SVM, CLS, and MV, respectively. We calculated the p-values by performing one-sided Wilcoxon signed rank tests using the performance reported in Tables 4 and 5. We performed the left-sided test for the Dice ratio and the right-sided test for the MHD.

                           CSF        GM         WM
Dice ratio   CNN vs. RF    3.30E-03   1.55E-04   4.02E-04
             CNN vs. SVM   2.55E-05   2.51E-09   1.87E-04
             CNN vs. CLS   6.59E-02   8.88E-02   8.37E-04
             CNN vs. MV    6.22E-06   2.50E-03   1.71E-05
MHD          CNN vs. RF    2.30E-03   2.72E-01   2.16E-02
             CNN vs. SVM   1.39E-04   3.67E-04   7.99E-04
             CNN vs. CLS   5.75E-02   1.85E-02   5.57E-04
             CNN vs. MV    1.09E-05   4.30E-03   1.52E-02

Fig. 4. Comparison of the segmentation results with the manually generated segmentation on Subject 1. The first row shows the original multi-modality data (T1, T2, and FA), and the second row shows the manual segmentations (CSF, GM, and WM). The third and fourth rows show the segmentation results by CNN and RF, respectively.

Specifically, CNN achieved average Dice ratios of 83.55% ± 0.94% (CSF), 85.18% ± 2.45% (GM), and 86.37% ± 2.34% (WM) over the 8 subjects, yielding an overall value of 85.03% ± 2.27%. In contrast, RF, SVM, CLS, and MV achieved overall Dice ratios of 83.15% ± 2.52%, 76.95% ± 3.55%, 82.62% ± 2.76%, and 77.64% ± 8.28%, respectively. CNN also outperformed the other methods in terms of MHD. Specifically, CNN achieved MHDs of 0.4354 ± 0.0979 (CSF), 0.2482 ± 0.0871 (GM), and 0.2894 ± 0.0710 (WM), yielding an overall value of 0.3243 ± 0.1161. In contrast, RF, SVM, CLS, and MV achieved overall MHDs of 0.4593 ± 0.2506, 0.6424 ± 0.2665, 0.4839 ± 0.1597, and 0.7076 ± 0.5721, respectively.

To assess the statistical significance of the performance differences, we performed one-sided Wilcoxon signed rank tests on both the Dice ratio and MHD values produced for the 8 subjects; the p-values are reported in Table 6. For the Dice ratio, we chose the left-sided test with the alternative hypothesis that the average performance of CNN is higher than that of RF, SVM, CLS, or MV; the right-sided test was used for MHD. We can see that the proposed CNN method significantly outperformed SVM, RF, CLS, and MV in most cases. These results demonstrate that CNN is effective in segmenting infant brain tissues compared with the other methods.

In addition to quantitatively demonstrating the advantage of the proposed CNN method, we visually examined the segmentation results of the different tissues for two subjects in Figs. 4 and 5. The original T1, T2, and FA images are shown in the first row, and the following three rows present the segmentation results of the human experts, CNN, and RF, respectively. It can be seen that the segmentation patterns of CNN are quite similar to the ground truth generated by the human experts, whereas RF produced more defects and fuzzier boundaries for the different tissues. These results further show that the proposed CNN method is more effective than the other methods.

Fig. 5. Comparison of the segmentation results with the manually generated segmentation on Subject 2. The first row shows the original multi-modality data (T1, T2, and FA), and the second row shows the manual segmentations (CSF, GM, and WM). The third and fourth rows show the segmentation results by CNN and RF, respectively.

Fig. 6. Label difference maps of the results generated by CNN and RF on Subject 1. The first row shows the original images and the manual segmentation (T1, T2, FA, and manual segmentation). The second and third rows show the results by CNN and RF (CSF, GM, WM, segmentation result). In each label difference map, the dark blue color indicates false positives and the dark green color indicates false negatives.

Fig. 7. Label difference maps of the results generated by CNN and RF on Subject 2. The first row shows the original images and the manual segmentation (T1, T2, FA, and manual segmentation). The second and third rows show the results by CNN and RF (CSF, GM, WM, segmentation result). In each label difference map, the dark blue color indicates false positives and the dark green color indicates false negatives.


To further compare the results of the different methods, we also present label difference maps that compare the ground-truth segmentation with the predicted segmentation. In Figs. 6 and 7, the original T1, T2, and FA images and the ground-truth segmentations for two subjects are shown in the first rows. The false positives and false negatives of CNN and RF are given in the second and third rows, respectively; the segmentation results are also shown in these two figures. We can see that CNN outperformed RF both in the number of falsely labeled pixels and in the quality of tissue boundary detection. For example, RF generated more false positives around the surface of the brain, and also more false negatives around the hippocampus for white matter on Subject 2. We can also observe that most of the misclassified pixels are located in areas with large tissue contrast, such as cortices consisting of gyri and sulci. This might be explained by the fact that our segmentation methods are patch-based, and patches centered at boundary pixels contain pixels of multiple tissue types.

To compare the performance of CNNs and RF as the patch size varies, we report the performance differences between CNNs and RF averaged over the 8 subjects for different input patch sizes in Fig. 8. We can observe that the performance gains of CNNs over RF are generally amplified as the input patch size increases. This difference is even more significant for CSF and WM, which have more restricted spatial distributions than GM.

This is because RF treats each pixel independently, and therefore does not leverage the spatial relationships between pixels. In comparison, CNNs weight pixels differently based on their spatial distance to the center pixel, enabling spatial information to be retained. The impact of this essential difference between CNNs and RF is expected to be more significant with a larger patch size, since more spatial information is ignored by RF. This difference probably also explains why CNNs could segment the boundary pixels with higher accuracy, as shown in Figs. 4 and 5.

Fig. 8. Comparison of performance differences between CNNs and RF averaged over 8 subjects for patch sizes of 9 × 9, 13 × 13, 17 × 17, and 22 × 22, respectively. The performance differences were obtained by subtracting the performance of RF from that of CNNs. The left and right figures show the results in terms of Dice ratio and MHD, respectively.

Conclusion and future work

In this study, we aimed at segmenting infant brain tissue images in the isointense stage. This was achieved by employing CNNs with multiple intermediate layers to integrate and combine multi-modality brain images. The CNNs used the complementary, multi-modality information from T1, T2, and FA images as input feature maps and generated the segmentation labels as outputs. We compared the performance of our approach with that of commonly used segmentation methods. Results showed that our proposed model significantly outperformed prior methods on infant brain tissue segmentation. Overall, our experiments demonstrated that CNNs can produce more accurate and quantitative results on infant tissue image segmentation.

In this work, the tissue segmentation problem was formulated as a patch classification task, in which the relationships among patches were ignored. Some prior work has incorporated geometric constraints into segmentation models (Wang et al., 2014); we will improve our CNN models to include similar constraints in the future. In the current experiments, we employed CNNs with a few hidden layers. Recent studies have shown that CNNs with many hidden layers yield very promising performance on visual recognition tasks when appropriate regularization is applied (Krizhevsky et al., 2012); we will explore CNNs with many hidden layers as more data become available. In the current study, we used all the patches extracted from each subject for training the convolutional neural network. The number of patches from each tissue type is not balanced, and this imbalance might affect the prediction performance. We might use sampling and ensemble learning to combat this imbalance problem, although this would further increase the training time. The current work used 2D CNNs for image segmentation, because only selected slices have been manually segmented in the current data set. In principle, CNNs can be used to segment 3D images when labeled data are available; in this case, it is more natural to apply 3D CNNs (Ji et al., 2013), as such models have been developed for processing 3D video data. The computational costs for training and testing 3D CNNs would be higher than those for 2D CNNs, as 3D convolutions are involved. We will explore these higher-order models in the future.
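As a pointer for this future direction, moving from 2D to 3D patches mainly amounts to replacing 2D operations with their 3D counterparts; a minimal sketch (our own, not from the paper) of a 3D analogue of the first convolutional layer:

```python
import torch
import torch.nn as nn

# 3D analogue of the first layer of the 2D architecture: three input
# modalities, 64 feature maps, and 5 x 5 x 5 filters over a volumetric patch.
conv3d = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=5)
x = torch.randn(1, 3, 13, 13, 13)   # a hypothetical 13 x 13 x 13 multi-modality patch
print(conv3d(x).shape)              # -> torch.Size([1, 64, 9, 9, 9])
```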

Acknowledgments

This work was supported by the National Science Foundation grants DBI-1147134 and DBI-1350258, and the National Institutes of Health grants EB006733, EB008374, EB009634, AG041721, MH100217, and AG042599.


References

Amit, Y., Geman, D., 1997. Shape quantization and recognition with randomized trees. Neural Comput. 9 (7), 1545–1588.
Blumenthal, J.D., Zijdenbos, A., Molloy, E., Giedd, J.N., 2002. Motion artifact in magnetic resonance imaging: implications for automated analysis. Neuroimage 16 (1), 89–92.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
Cardoso, M.J., Melbourne, A., Kendall, G.S., Modat, M., Robertson, N.J., Marlow, N., Ourselin, S., 2013. AdaPT: an adaptive preterm segmentation algorithm for neonatal brain MRI. Neuroimage 65, 97–108.
Ciresan, D., Giusti, A., Gambardella, L.M., Schmidhuber, J., 2012. Deep neural networks segment neuronal membranes in electron microscopy images. In: Pereira, F., Burges, C., Bottou, L., Weinberger, K. (Eds.), Advances in Neural Information Processing Systems 25. Curran Associates, Inc., pp. 2843–2851.
Ciresan, D.C., Giusti, A., Gambardella, L.M., Schmidhuber, J., 2013. Mitosis detection in breast cancer histology images with deep neural networks. Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention, vol. 2, pp. 411–418.
Criminisi, A., Shotton, J., 2013. Decision Forests for Computer Vision and Medical Image Analysis. Springer.
Criminisi, A., Shotton, J., Konukoglu, E., 2012. Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found. Trends Comput. Graph. Vis. 7 (2–3), 81–227.
Dai, Y., Shi, F., Wang, L., Wu, G., Shen, D., 2013. iBEAT: a toolbox for infant brain magnetic resonance image processing. Neuroinformatics 11 (2), 211–225.
Fan, Y., Shi, F., Smith, J.K., Lin, W., Gilmore, J.H., Shen, D., 2011. Brain anatomical networks in early human brain development. Neuroimage 54 (3), 1862–1871.
Gilmore, J.H., Shi, F., Woolson, S.L., Knickmeyer, R.C., Short, S.J., Lin, W., Zhu, H., Hamer, R.M., Styner, M., Shen, D., 2012. Longitudinal development of cortical and subcortical gray matter from birth to 2 years. Cereb. Cortex 22 (11), 2478–2485.
Giusti, A., Ciresan, D.C., Masci, J., Gambardella, L.M., Schmidhuber, J., 2013. Fast image scanning with deep max-pooling convolutional neural networks. 2013 IEEE International Conference on Image Processing, pp. 4034–4038.
Gui, L., Lisowski, R., Faundez, T., Hüppi, P.S., Lazeyras, F., Kocher, M., 2012. Morphology-driven automatic segmentation of MR images of the neonatal brain. Med. Image Anal. 16 (8), 1565–1579.
Helmstaedter, M., Briggman, K.L., Turaga, S.C., Jain, V., Seung, H.S., Denk, W., 2013. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 500 (7461), 168–174.
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R., 2012. Improving Neural Networks by Preventing Co-adaptation of Feature Detectors (arXiv preprint arXiv:1207.0580).
Jain, V., Seung, S., 2009. Natural image denoising with convolutional networks. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (Eds.), Advances in Neural Information Processing Systems 21, pp. 769–776.
Jain, V., Murray, J.F., Roth, F., Turaga, S., Zhigulin, V., Briggman, K.L., Helmstaedter, M.N., Denk, W., Seung, H.S., 2007. Supervised learning of image restoration with convolutional networks. Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on. IEEE, pp. 1–8.
Ji, S., Xu, W., Yang, M., Yu, K., 2013. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35 (1), 221–231.
Kim, S.H., Fonov, V.S., Dietrich, C., Vachet, C., Hazlett, H.C., Smith, R.G., Graves, M.M., Piven, J., Gilmore, J.H., Dager, S.R., et al., 2013. Adaptive prior probability and spatial temporal intensity change estimation for segmentation of the one-year-old human brain. J. Neurosci. Methods 212 (1), 43–55.
Krizhevsky, A., Sutskever, I., Hinton, G., 2012. ImageNet classification with deep convolutional neural networks. In: Bartlett, P., Pereira, F., Burges, C., Bottou, L., Weinberger, K. (Eds.), Advances in Neural Information Processing Systems 25, pp. 1106–1114.
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998a. Gradient-based learning applied to document recognition. Proc. IEEE 86 (11), 2278–2324.
LeCun, Y., Bottou, L., Orr, G.B., Müller, K.-R., 1998b. Efficient backprop. Neural Networks: Tricks of the Trade. Springer, Berlin Heidelberg, pp. 9–50.
Leroy, F., Mangin, J.-F., Rousseau, F., Glasel, H., Hertz-Pannier, L., Dubois, J., Dehaene-Lambertz, G., 2011. Atlas-free surface reconstruction of the cortical grey–white interface in infants. PLoS One 6 (11).
Li, G., Nie, J., Wang, L., Shi, F., Lin, W., Gilmore, J.H., Shen, D., 2013a. Mapping region-specific longitudinal cortical surface expansion from birth to 2 years of age. Cereb. Cortex 23 (11), 2724–2733.
Li, G., Wang, L., Shi, F., Lin, W., Shen, D., 2013b. Multi-atlas based simultaneous labeling of longitudinal dynamic cortical surfaces in infants. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, pp. 58–65.
Liaw, A., Wiener, M., 2002. Classification and regression by randomForest. R News 2 (3), 18–22.
Liu, T., Li, H., Wong, K., Tarokh, A., Guo, L., Wong, S.T., 2007. Brain tissue segmentation based on DTI data. Neuroimage 38 (1), 114–123.
Nair, V., Hinton, G.E., 2010. Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814.
Nie, J., Li, G., Wang, L., Gilmore, J.H., Lin, W., Shen, D., 2012. A computational growth model for measuring dynamic cortical development in the first year of life. Cereb. Cortex 22, 2272–2284.
Ning, F., Delhomme, D., LeCun, Y., Piano, F., Bottou, L., Barbano, P.E., 2005. Toward automatic phenotyping of developing embryos from videos. IEEE Trans. Image Process. 14 (9), 1360–1371.
Nishida, M., Makris, N., Kennedy, D.N., Vangel, M., Fischl, B., Krishnamoorthy, K.S., Caviness, V.S., Grant, P.E., 2006. Detailed semiautomated MRI based morphometry of the neonatal brain: preliminary results. Neuroimage 32 (3), 1041–1049.
Paus, T., Collins, D., Evans, A., Leonard, G., Pike, B., Zijdenbos, A., 2001. Maturation of white matter in the human brain: a review of magnetic resonance studies. Brain Res. Bull. 54 (3), 255–266.
Shi, F., Fan, Y., Tang, S., Gilmore, J.H., Lin, W., Shen, D., 2010a. Neonatal brain image segmentation in longitudinal MRI studies. Neuroimage 49 (1), 391–400.
Shi, F., Yap, P.-T., Gilmore, J.H., Lin, W., Shen, D., 2010b. Spatial–temporal constraint for segmentation of serial infant brain MR images. Medical Imaging and Augmented Reality. Springer, pp. 42–50.
Shi, F., Wang, L., Dai, Y., Gilmore, J.H., Lin, W., Shen, D., 2012. LABEL: pediatric brain extraction using learning-based meta-algorithm. Neuroimage 62 (3), 1975–1986.
Sled, J.G., Zijdenbos, A.P., Evans, A.C., 1998. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging 17 (1), 87–97.
Song, Z., Awate, S.P., Licht, D.J., Gee, J.C., 2007. Clinical neonatal brain MRI segmentation using adaptive nonparametric data models and intensity-based Markov priors. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2007. Springer, pp. 883–890.
Turaga, S.C., Murray, J.F., Jain, V., Roth, F., Helmstaedter, M., Briggman, K., Denk, W., Seung, H.S., 2010. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput. 22 (2), 511–538.
Wang, L., Shi, F., Lin, W., Gilmore, J.H., Shen, D., 2011. Automatic segmentation of neonatal images using convex optimization and coupled level sets. Neuroimage 58 (3), 805–817.
Wang, L., Shi, F., Yap, P.-T., Gilmore, J.H., Lin, W., Shen, D., 2012. 4D multi-modality tissue segmentation of serial infant images. PLoS One 7 (9).
Wang, L., Shi, F., Yap, P.-T., Lin, W., Gilmore, J.H., Shen, D., 2013. Longitudinally guided level sets for consistent tissue segmentation of neonates. Hum. Brain Mapp. 34 (4), 956–972.
Wang, L., Shi, F., Gao, Y., Li, G., Gilmore, J.H., Lin, W., Shen, D., 2014. Integration of sparse multi-modality representation and anatomical constraint for isointense infant brain MR image segmentation. Neuroimage 89, 152–164.
Weisenfeld, N.I., Warfield, S.K., 2009. Automatic segmentation of newborn brain MRI. Neuroimage 47 (2), 564–572.
Weisenfeld, N.I., Mewes, A., Warfield, S.K., 2006a. Segmentation of newborn brain MRI. Biomedical Imaging: Nano to Macro, 2006. 3rd IEEE International Symposium on. IEEE, pp. 766–769.
Weisenfeld, N.I., Mewes, A.U., Warfield, S.K., 2006b. Highly accurate segmentation of brain tissue and subcortical gray matter from newborn MRI. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2006. Springer, pp. 199–206.
Xue, H., Srinivasan, L., Jiang, S., Rutherford, M., Edwards, A.D., Rueckert, D., Hajnal, J.V., 2007. Automatic segmentation and reconstruction of the cortex from neonatal MRI. Neuroimage 38 (3), 461–477.
Yap, P.T., Fan, Y., Chen, Y., Gilmore, J.H., Lin, W., Shen, D., 2011. Development trends of white matter connectivity in the first years of life. PLoS One 6 (9), e24678.
Yushkevich, P.A., Piven, J., Hazlett, H.C., Smith, R.G., Ho, S., Gee, J.C., Gerig, G., 2006. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 31 (3), 1116–1128.
Zeiler, M.D., Fergus, R., 2014. Visualizing and understanding convolutional networks. Proceedings of the European Conference on Computer Vision. Springer, pp. 818–833.
Zilles, K., Armstrong, E., Schleicher, A., Kretschmann, H.-J., 1988. The human pattern of gyrification in the cerebral cortex. Anat. Embryol. 179 (2), 173–179.