Research Article

TGV Upsampling: A Making-Up Operation for Semantic Segmentation

Xu Yin, Yan Li, and Byeong-Seok Shin

Department of Computer Engineering, Inha University, Incheon 082, Republic of Korea

Correspondence should be addressed to Byeong-Seok Shin; [email protected]

Received 8 April 2019; Accepted 8 July 2019; Published 1 August 2019

Academic Editor: Cornelio Yáñez-Márquez

Copyright © 2019 Xu Yin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

With the widespread use of deep learning methods, semantic segmentation has achieved great improvements in recent years. However, many researchers have pointed out that the repeated convolution and pooling operations in the feature-extraction process cause considerable information loss. To address this problem, various operations and network architectures have been proposed to make up for the lost information. We observed a trend in many studies to design networks symmetrically, with the two halves representing the "encoding" and "decoding" stages. Through "upsampling" operations in the decoding stage, feature maps are reconstructed in a way that partially compensates for the losses in previous layers. In this paper, we focus on upsampling operations, analyze them in detail, and compare the methods currently used in several well-known neural networks. We also draw on knowledge from image restoration to design a new upsampling layer (or operation) named the TGV upsampling algorithm. We replaced the upsampling layers of previous networks with our new method and found that the resulting models better preserve the detailed textures and edges of feature maps, achieving, on average, 1.4–2.3% higher accuracy than the original models.
1. Introduction

Compared to traditional classification tasks, semantic segmentation is much more difficult: at the level of the neural network, a classifier must be applied to each pixel of an image. This helps machines gain a better understanding of complex images beyond simple object recognition. Deep learning methods, especially convolutional neural networks, have generated spectacular results for visual recognition problems, proving that convolutional operations can successfully extract global information and features while maintaining spatial invariance. Looking back on existing studies, we observed that most of them follow a symmetric architecture, which first decreases the resolution (the encoding stage) and then gradually increases it layer by layer (the decoding stage). Whatever the detailed design, if a network contains a high-to-low process, it inevitably loses information about the feature maps, because this process aims to generate a more reliable low-resolution representation from a high-resolution one. Meanwhile, as layers accumulate, the texture (or boundary information) of the final dense predictions is strongly affected.

One way to handle this challenge is to reduce the loss caused by these operations in the first place. Chen et al. [1] used atrous convolution: by inserting "blanks" between pixels, they enlarged the receptive field and generated a higher-resolution feature map. Although this method is effective, it remains hard to apply, considering the limitations of hardware and memory.

Another solution focuses on making up for the losses afterward. In recent research, hierarchical network designs [2, 3] have become a trend. Researchers found that losses can be compensated for in later stages: high-level feature maps can be better recovered when mixed with information from low-level intermediate results, which further improves dense predictions.
Hindawi Computational Intelligence and Neuroscience, Volume 2019, Article ID 8527819, 12 pages. https://doi.org/10.1155/2019/8527819

We were inspired by this making-up idea and by the trend of grouping layers into encoding and decoding stages (see Figure 1). In the encoding stage, layers downsample the spatial resolution layer by layer; afterward, in the decoding stage, feature maps (intermediate results produced by layers)






are "upsampled," and resolutions are increased correspondingly until the entire image is reconstructed to its original size. Researchers believe that, in this down-upsampling process in which images are segmented and reconstructed, networks can detect the most important features without destroying the shapes or textures of objects. The aim of reconstruction is to make up for the losses produced in the encoding stage, and the choice of the upsampling method is the key point.

In this paper, we follow symmetric designs and focus on upsampling methods in the decoding stage. Our contributions are as follows:

(1) We give a detailed introduction to the upsampling methods that are now commonly used, along with the necessary concepts and mathematical definitions.

(2) We propose a new upsampling method based on the total generalized variation (TGV) model [4–6] and apply it to different networks.

2. Related Works

Semantic segmentation, an important branch of deep learning [7], has become an active topic in neural network research. In this section, we introduce several advanced networks in the field of semantic segmentation and the upsampling methods they use in the decoding stage.

2.1. Symmetric-Like Networks. Symmetric networks [8–11] have been proven to be highly effective in particular fields. As for the reason for choosing a symmetric type, researchers thought that, regardless of the convolutional layer, there would be a loss of information when extracting features. With further training, even a small loss may result in defects in boundary information or texture edges. To make up for the loss, they grouped all layers of a model into two stages according to the size of the feature maps. Long et al. [9] first used a fully convolutional network (FCN). They showed that fully connected layers can be replaced by convolutional layers and that further connections can be made between stages via "skip connections." Although the FCN is only approximately symmetric, a skip connection, which builds a "bridge" between layers in the encoding and decoding stages, provides another source for a feature map. Based on this idea, Ronneberger et al. [8] designed a U-shaped network (U-Net). They considered that the expansive path of features has a relationship with the contracting path and accordingly enlarged the number of feature channels. However, when designing a new network, parameter size is a significant concern: symmetric networks need many more parameters than other kinds of networks, which poses a great challenge to CPU and GPU usage. At almost the same time, Badrinarayanan et al. [10] applied a new upsampling method named "unpooling," which records the indices used during max-pooling. With these indices, original location information can be easily recovered in the decoding stage, greatly reducing the number of parameters needed during training.

2.2. Current Upsampling Methods. From the visualization of convolutional networks [12], we know that convolutional computations can effectively extract and generalize features, and their outputs can be seen as sets of features. The goal of an upsampling operation is to increase the resolution of low-resolution map data, producing a high-resolution map that makes up for the information lost in previous layers due to convolution operations.

The difficulty of the entire process lies in how to generate sampling data from low-resolution maps and their corresponding colour channels.

2.2.1. Bilinear Interpolation. In early deep learning research [7, 13], this kind of method was often seen. Compared to the methods introduced later, the greatest advantages of bilinear interpolation are its high speed and simple operation. At the layer level, there is no need to learn or adjust weights when using bilinear interpolation, as a single parameter (the desired size of the final image) is enough for the entire operation.

Figure 1: General symmetric network architecture (skip connections link encoding-stage layers at scales ×1, ×0.5, ×0.25, and ×0.125 to the corresponding decoding-stage layers).


For example, with four known point values x11 = (x1, y1), x12 = (x1, y2), x21 = (x2, y1), and x22 = (x2, y2), the value of the function τ can be found at the point (x, y).

X-direction:

Q(x, y1) = τ(x, y1) ≈ ((x2 − x)/(x2 − x1)) τ(x11) + ((x − x1)/(x2 − x1)) τ(x21),
Q(x, y2) = τ(x, y2) ≈ ((x2 − x)/(x2 − x1)) τ(x12) + ((x − x1)/(x2 − x1)) τ(x22).    (1)

Y-direction (yielded):

Q(x, y) = τ(x, y) ≈ ((y2 − y)/(y2 − y1)) Q(x, y1) + ((y − y1)/(y2 − y1)) Q(x, y2)
        = (τ(x11)/((x2 − x1)(y2 − y1))) (x2 − x)(y2 − y)
        + (τ(x21)/((x2 − x1)(y2 − y1))) (x − x1)(y2 − y)
        + (τ(x12)/((x2 − x1)(y2 − y1))) (x2 − x)(y − y1)
        + (τ(x22)/((x2 − x1)(y2 − y1))) (x − x1)(y − y1).    (2)

With the above processes, it is possible to find an approximation of any point (x, y) within the function interval (x ∈ (x1, x2), y ∈ (y1, y2)).
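As a concrete illustration, the two-stage interpolation of equations (1) and (2) can be sketched in a few lines of Python. All function and variable names below are ours, and the mapping from output to input coordinates is one common convention rather than anything specified in the paper; the point is that the only "parameter" is the desired output size, with no learned weights.

```python
def bilinear(x, y, x1, y1, x2, y2, t11, t21, t12, t22):
    """Interpolate tau(x, y) from the four corner values
    t11 = tau(x1, y1), t21 = tau(x2, y1), t12 = tau(x1, y2), t22 = tau(x2, y2),
    following equations (1) and (2)."""
    # X-direction: interpolate along x at heights y1 and y2 (equation (1))
    q_y1 = (x2 - x) / (x2 - x1) * t11 + (x - x1) / (x2 - x1) * t21
    q_y2 = (x2 - x) / (x2 - x1) * t12 + (x - x1) / (x2 - x1) * t22
    # Y-direction: interpolate along y between the two partial results (equation (2))
    return (y2 - y) / (y2 - y1) * q_y1 + (y - y1) / (y2 - y1) * q_y2

def upsample_bilinear(fmap, out_h, out_w):
    """Upsample a 2-D feature map (nested lists) to (out_h, out_w).
    Only the target size is supplied; nothing is learned."""
    in_h, in_w = len(fmap), len(fmap[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Map output coordinates back onto the input grid
            y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x1, y1 = min(int(x), in_w - 2), min(int(y), in_h - 2)
            x2, y2 = x1 + 1, y1 + 1
            out[i][j] = bilinear(x, y, x1, y1, x2, y2,
                                 fmap[y1][x1], fmap[y1][x2],
                                 fmap[y2][x1], fmap[y2][x2])
    return out
```

Upsampling the 2 × 2 map [[0, 1], [2, 3]] to 3 × 3 with this sketch keeps the four corners and fills the centre with the average 1.5, as equations (1) and (2) predict.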

2.2.2. Transposed Convolution. Strictly speaking, transposed convolution is a kind of convolution operation rather than a sampling method, but many researchers currently make it their first choice for making up losses in the decoding stage.

From the perspective of convolution arithmetic, each convolution operation can be represented as y = C ∗ x, where C and x stand for the weight matrix and input maps, respectively.

During backpropagation, assuming that the derivative with respect to a layer's output, ∂Loss/∂y (Loss denotes the loss of the whole network during training), is known, the corresponding derivative with respect to the input can be written as

∂Loss/∂xj = Σi (∂Loss/∂yi) (∂yi/∂xj) = Σi (∂Loss/∂yi) Cij = (∂Loss/∂y) ∗ C∗,j = Cᵀj,∗ ∗ (∂Loss/∂y).    (3)

Equation (3) shows the relationship between ∂Loss/∂x and ∂Loss/∂y: transposed convolution actually multiplies by Cᵀ in the forward pass and by (Cᵀ)ᵀ = C in the backward pass.

When transposed convolution is used as the upsampling method, the situation is totally different from bilinear interpolation. The study of Dumoulin and Visin [14] introduced four different cases where a transposed operation is applied.

For instance (Figure 2), the transposed convolution over a 2 × 2 input for a 3 × 3 output (stride 1) equals a reverse operation of the convolutional layer (y = C ∗ x with a 4 × 4 input and kernel size 2). The original 2 × 2 map is padded with a 2 × 2 border of zeros first, and then Cᵀ is multiplied.
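The matrix view behind equation (3) can be made concrete with a tiny sketch (all names and the 3 × 3/2 × 2 sizes below are ours, chosen for brevity). It builds the matrix C of a valid convolution so that y = C·x, then multiplies by Cᵀ, which maps a small map back to the larger size, exactly the computation a transposed-convolution layer performs in its forward pass:

```python
def conv_matrix(kernel, in_h, in_w):
    """Build the matrix C such that C @ flatten(x) equals a valid 2-D
    convolution (stride 1, no padding) of x with `kernel`."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    C = [[0.0] * (in_h * in_w) for _ in range(out_h * out_w)]
    for oi in range(out_h):
        for oj in range(out_w):
            for ki in range(k_h):
                for kj in range(k_w):
                    C[oi * out_w + oj][(oi + ki) * in_w + (oj + kj)] = kernel[ki][kj]
    return C

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Forward convolution: 3x3 input, 2x2 kernel -> 2x2 output (y = C * x)
kernel = [[1.0, 0.0], [0.0, 1.0]]
C = conv_matrix(kernel, 3, 3)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]   # flattened 3x3 map
y = matvec(C, x)                                     # 4 values = 2x2 output

# Transposed convolution: multiplying by C^T maps the 2x2 map back to 3x3.
# This is the same product as dLoss/dx = C^T * dLoss/dy in equation (3).
up = matvec(transpose(C), y)
```

Here `y` comes out as [6, 8, 12, 14], and `up` is a 9-element (3 × 3) map; the interior entry receiving two kernel contributions accumulates both, which is why transposed convolution can both spread and overlap values.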

2.2.3. Unpooling. Various combinations of layers have improved the efficiency of networks, but they brought a severe problem at the same time: parameters. Even a single convolutional layer requires a large number of weight parameters, which poses a significant challenge for CPUs or GPUs. As noted above, transposed convolution amounts to multiplying by the transposed matrix of the corresponding convolutional layer, meaning that enough memory must be spared to save these matrices during training.

First utilized in deconvolutional networks [15], unpooling is much simpler and easier to use.

Figure 3 shows the basic process of an unpooling operation. All the numbers shown correspond to values in a feature map. For a given input, the unpooling method records the indices of the largest values while performing max-pooling (from input → a). In the decoding stage, having received outputs b from previous layers along with the pooling indices (represented by the black grids), it is possible to upsample the pixel values back to their original places (from b → output). In effect, unpooling keeps the location information ("pooling indices") of the largest pixel values in each feature map during max-pooling operations. This solves the problems of "directions" and "padding" at the same time.

Compared to transposed convolution, the advantage of unpooling lies in the number of parameters: only the pooling indices need to be recorded.
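The record-and-scatter behaviour described above can be sketched in pure Python (function names are ours; a 2 × 2 window with stride 2 is assumed, matching Figure 3). Pooling returns both the maxima and their flat indices; unpooling writes a later map's values back to those positions and fills everything else with zeros:

```python
def max_pool_with_indices(fmap):
    """2x2 max-pooling (stride 2) that also records the flat index of each
    maximum, as unpooling-based encoders do."""
    h, w = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for i in range(0, h, 2):
        prow, irow = [], []
        for j in range(0, w, 2):
            window = [(fmap[i + di][j + dj], (i + di) * w + (j + dj))
                      for di in range(2) for dj in range(2)]
            value, idx = max(window)
            prow.append(value)
            irow.append(idx)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices

def unpool(fmap, indices, out_h, out_w):
    """Scatter each incoming value to its recorded position; zeros elsewhere."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for prow, irow in zip(fmap, indices):
        for value, idx in zip(prow, irow):
            out[idx // out_w][idx % out_w] = value
    return out

# The example of Figure 3: pooling the 4x4 input keeps 6, 8, 14, 16 and their
# positions; unpooling a later 2x2 map b scatters it back to those positions.
fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled, indices = max_pool_with_indices(fmap)
restored = unpool([[5, 7], [9, 11]], indices, 4, 4)
```

Note that the only state carried from the encoder to the decoder is `indices`, which is why unpooling needs so few parameters compared to transposed convolution.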

3. TGV Upsampling Algorithm

Inspired by the total variation model [16, 17], we introduce another upsampling method named "TGV upsampling." We see upsampling as a way to combine feature map restoration with knowledge of image restoration (with the aim of making up for the loss of information). The background and details of our method are described as follows.

3.1. Problem Transformation. Although it has been several years since the first convolutional neural network (CNN) was used [7], the internal architecture can still be simply summarized. Most networks can be thought of as a combination of convolutional layers and max-pooling layers, and many research studies [14, 18–21] are available for understanding what a CNN is. Previous studies pointed out that "max-pooling can be replaced by a convolutional layer with increased stride and without loss in accuracy at the same time" [22]. Thus, we can regard network components built from max-pooling and convolutional layers as a set of convolutional computations, regardless of what the internal architecture actually is.


As discussed in previous sections, in the field of semantic segmentation, researchers tend to divide the whole process into encoding and decoding stages, where the upsampling step always happens in the decoding stage. Many researchers think of convolution operations as following a "path": from the perspective of feature maps, the encoding and decoding stages correspond to the contracting and expanding paths, respectively, and these two paths are more or less symmetrical. Under such circumstances, we treat upsampling as a restoration task on the feature maps produced by the corresponding layers in the encoding stage.

For example, Figure 1 provides a simple view of a general symmetric-like network. The whole network is grouped into encoding and decoding stages, and the layers in the two stages are represented by Ei and Cj, respectively (where i and j stand for the position of the layer). When i = j, Einput = Coutput, and we consider the outputs of Cj as the result of restoring the feature map Einput.

As shown in Figure 4, after a set of convolutional computations, a 4 × 4 feature map in the encoding stage becomes a 2 × 2 pixel matrix (this matrix is also the output of a certain layer). With a projection operation, this map can be recovered to the original size of the map in the encoding

Figure 3: Unpooling operation (a 4 × 4 input is max-pooled to a 2 × 2 map a while the indices of the maxima are recorded; in the decoding stage, a 2 × 2 map b is unpooled by writing its values back to the recorded positions, with zeros elsewhere).

Figure 2: Transposed convolution (an original map is zero-padded and then multiplied to produce the upsampled map).


stage (this temporary result is denoted by Cinput). The padded part (the heavy blue grids) appears as blurred (noisy) areas; the rest of the work is a restoration job.

When comparing the left matrix with the right one, we formulate the basic loss function as follows:

loss(u, Cinput) = (1/2) ‖u(Cinput) − Cinput‖²₂,    (4)

where u stands for our upsampled result and ‖·‖₂ denotes the 2-norm.
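As a minimal illustration (the function name and nested-list representation are ours), the fidelity term of equation (4) is just half of a pixelwise squared difference summed over the map:

```python
def fidelity_loss(u, c):
    """Equation (4): half the squared 2-norm of (u - c), summed over all
    pixels of the feature map (both maps given as nested lists)."""
    return 0.5 * sum((u_ij - c_ij) ** 2
                     for u_row, c_row in zip(u, c)
                     for u_ij, c_ij in zip(u_row, c_row))
```

For instance, two maps differing only in one pixel by 4 give a loss of 0.5 · 16 = 8.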

3.2. TGV-Based Upsampling. TGV [23] is mainly used to solve problems such as image denoising and restoration, but in this paper we adapt it into a trainable upsampling method for feature maps. We increase the resolution of the maps to a target size and then perform TGV on each map.

The first step in our method is projection, considering the size difference between low- and high-resolution maps. We first apply bilinear interpolation to get a preliminary result that has the same size as the maps in the encoding stage. These processed maps are treated as noisy areas (actually referring to the entire image) to be handled by TGV restoration.

Since Bredies et al. [23] put forward the TGV model, it has been widely used in the field of image restoration. The TGV model can reconstruct an image from blurred data or noisy indirect measurements. We formulate the whole upsampling problem as a convex optimization model [12, 24–26]. The mathematical formulation of our model is

min_u { λ loss(u, x) + TGV^k_α(u) },    (5)

where loss(u, x) represents image fidelity, the parameter λ weights the fidelity term in the global optimization, and TGV^k_α(u) stands for the regularization term.

From the visualization of convolutional networks [8], we know that convolutional computations can effectively extract and generalize features, whose outputs can be seen as a set of features. To better understand the semantics of an image, smoothness of textures and border edges is particularly important for feature maps.

For a k-order image, traditional restoration methods tend to fix the bounded variation seminorm, whereas the TGV model introduces a k-order function and incorporates information from different channels, which can effectively maintain texture discontinuities.

Given a k-order image u, we represent the TGV model as follows:

TGV^k_α(u) = sup { ∫_Ω u div^k v dx ∣ v ∈ C^k_c(Ω, Sym^k(ℝ^d)), ‖div^l v‖_∞ ≤ α_l, l = 0, ..., k − 1 }.    (6)

Here, Sym^k(ℝ^d) represents the space of symmetric tensors of order k. By balancing derivatives from the first order to the kth order, the model greatly alleviates the problems caused by images containing different grey levels, which otherwise lead to the loss of boundary information as well as edges and corners.

In this work, however, aiming at feature maps, a 2-order TGV turns out to be sufficient. We formulate the 2-order model as

TGV²_α(u) = min_ω { a₁ ∫_Ω |∇u − ω| + a₂ ∫_Ω |ε(ω)| }.    (7)

For a given 2-order u, the minimum of the TGV model is taken over all complex vector fields ω in the bounded domain; ε(ω) = (1/2)(∇ω + ∇ωᵀ) stands for the symmetrized derivative. The parameters a₁ and a₂ balance the first and second derivatives.
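A discrete version of the two integrals in equation (7) can be evaluated with forward finite differences. The sketch below is our own discretization (unit grid spacing, zero derivatives at the far boundary), not the paper's implementation; it only illustrates how ∇u, ε(ω), and the two weighted terms fit together:

```python
def grad(u):
    """Forward-difference partial derivatives of a 2-D array (zero at the
    far boundary); returns (d/dx, d/dy) as two arrays."""
    h, w = len(u), len(u[0])
    gx = [[(u[i][j + 1] - u[i][j]) if j + 1 < w else 0.0 for j in range(w)]
          for i in range(h)]
    gy = [[(u[i + 1][j] - u[i][j]) if i + 1 < h else 0.0 for j in range(w)]
          for i in range(h)]
    return gx, gy

def tgv2_energy(u, wx, wy, a1, a2):
    """Discrete TGV^2 energy of equation (7):
    a1 * sum |grad(u) - w| + a2 * sum |eps(w)|, where the vector field
    w = (wx, wy) and eps(w) = (grad(w) + grad(w)^T) / 2."""
    h, w_ = len(u), len(u[0])
    gx, gy = grad(u)
    # First-order term: magnitude of (grad u - w) at every pixel
    term1 = sum(((gx[i][j] - wx[i][j]) ** 2 + (gy[i][j] - wy[i][j]) ** 2) ** 0.5
                for i in range(h) for j in range(w_))
    # Second-order term: Frobenius norm of the symmetrized derivative of w
    wxx, wxy = grad(wx)   # d(wx)/dx, d(wx)/dy
    wyx, wyy = grad(wy)   # d(wy)/dx, d(wy)/dy
    term2 = sum((wxx[i][j] ** 2 + wyy[i][j] ** 2
                 + 2.0 * (0.5 * (wxy[i][j] + wyx[i][j])) ** 2) ** 0.5
                for i in range(h) for j in range(w_))
    return a1 * term1 + a2 * term2
```

For a linear ramp u, choosing ω = ∇u drives the first-order term to zero, which is exactly why TGV² does not penalize smooth gradients the way plain total variation does.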

The final TGV-based upsampling model is defined as a combination of the loss function (4) and the TGV term (7):

min_{u,ω} { (λ/2) ‖u(Cinput) − Cinput‖²₂ + a₁ ∫_Ω |∇u − ω| + a₂ ∫_Ω |ε(ω)| }.    (8)

Meanwhile, unlike the fixed weights used in [23], we adapt TGV into a trainable method, taking the weights a₁ and a₂ into backpropagation to search for a suitable balance point.

3.3. Optimization Methods. Considering that the proposed upsampling model (formula (8)) is convex but not smooth, we use the primal-dual scheme [24, 25, 27] to solve it. We reformulate the upsampling model as a convex-concave saddle-point problem by introducing two dual variables, m and n. The transformed model of formula (8) is given by

min_{u,ω} max_{m,n} { (λ/2) Σ_{(i,j)∈Ω} (u(Cij) − Eij)² + a₁ ⟨∇u − ω, m⟩ + a₂ ⟨∇ω, n⟩ }.    (9)

Figure 4: Pretreatment of feature maps (a 4 × 4 map Einput is reduced by convolutional operations to a 2 × 2 map, which projection then scatters back into a 4 × 4 map Cinput whose remaining entries are zeros).


The feasible sets M and N are defined as follows:

M = { m : Ωu → ℝ² ∣ ‖mij‖ ≤ 1, (i, j) ∈ Ωu },
N = { n : Ωu → ℝ² ∣ ‖nij‖ ≤ 1, (i, j) ∈ Ωu }.    (10)

The final result is computed pixel by pixel via iterative optimization. With u⁰ = Einput and ω⁰ = m⁰ = n⁰ = 0, step sizes σm, σn and τu, τω are chosen. For each iteration i ≥ 0, the variables are updated as follows:

m^{i+1} = Proj_M { m^i + σm a₁ (∇ũ^i − ω̃^i) },
n^{i+1} = Proj_N { n^i + σn a₂ ∇ω̃^i },
u^{i+1} = (1 + (λ/2) τu)⁻¹ [ u^i + τu ( a₁ m^{i+1} + (λ/2) Cinput ) ],
ω^{i+1} = ω^i + τω ( a₁ n^{i+1} + a₂ m^{i+1} ),
ũ^{i+1} = u^{i+1} + θ ( u^{i+1} − u^i ),
ω̃^{i+1} = ω^{i+1} + θ ( ω^{i+1} − ω^i ),    (11)

where Proj_M(m) and Proj_N(n) denote the Euclidean projectors onto the sets M and N, respectively. They can be calculated by the pointwise operations

Proj_M(m) = m / max(1, |m|/a₁),
Proj_N(n) = n / max(1, |n|/a₂).    (12)

In equation (11), θ stands for the relaxation parameter; it can be updated iteration by iteration using preconditioning [5]. Hence, the TGV-based model can achieve globally optimal upsampling solutions when dealing with 2-order images.
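The iteration above can be sketched on a 1-D signal. The code below is our own simplified realization, written in the standard Chambolle-Pock form of scheme (11)-(12): the weights a₁ and a₂ enter through the projection radii of the dual variables (p for m, q for n), the fidelity prox implements the u-update, and the step sizes, θ = 1, and iteration count are our choices rather than the paper's:

```python
def tgv_upsample_1d(f, a1=0.1, a2=0.2, lam=100.0, iters=500):
    """1-D primal-dual sketch for model (8): min over (u, w) of
    (lam/2)*|u - f|^2 + a1*sum|Du - w| + a2*sum|Dw|."""
    n_pts = len(f)
    u, w = list(f), [0.0] * n_pts          # primal variables u and omega
    p, q = [0.0] * n_pts, [0.0] * n_pts    # dual variables m and n
    ub, wb = list(u), list(w)              # over-relaxed copies (theta = 1)
    sigma, tau = 0.3, 0.3                  # valid: sigma*tau*||K||^2 < 1

    def D(v):    # forward difference, zero past the right boundary
        return [v[i + 1] - v[i] if i + 1 < n_pts else 0.0 for i in range(n_pts)]

    def div(v):  # negative adjoint of D
        return [v[0] if i == 0 else
                (v[i] - v[i - 1] if i + 1 < n_pts else -v[i - 1])
                for i in range(n_pts)]

    def clip(v, r):  # 1-D version of the projections in equation (12)
        return max(-r, min(r, v))

    for _ in range(iters):
        # Dual ascent with projection: the m- and n-updates of (11)
        du, dw = D(ub), D(wb)
        p = [clip(p[i] + sigma * (du[i] - wb[i]), a1) for i in range(n_pts)]
        q = [clip(q[i] + sigma * dw[i], a2) for i in range(n_pts)]
        # Primal descent: proximal u-update (fidelity term) and omega-update
        dp, dq = div(p), div(q)
        u_new = [(u[i] + tau * dp[i] + tau * lam * f[i]) / (1.0 + tau * lam)
                 for i in range(n_pts)]
        w_new = [w[i] + tau * (p[i] + dq[i]) for i in range(n_pts)]
        # Over-relaxation with theta = 1
        ub = [2.0 * u_new[i] - u[i] for i in range(n_pts)]
        wb = [2.0 * w_new[i] - w[i] for i in range(n_pts)]
        u, w = u_new, w_new
    return u
```

On a step signal, the duals stay inside their constraint balls and, with a strong fidelity weight λ, the restored u essentially reproduces the data while the auxiliary field ω absorbs the smooth part of the gradient.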

4. Evaluation

In this section, we make a quantitative evaluation of the proposed TGV-based upsampling model and compare it with current upsampling methods from different perspectives. We used the PASCAL VOC2012 dataset (class segmentation) to investigate its performance in detail. In the following experiments, we manually set the TGV-based upsampling model's parameters a₁ and a₂ (initially a₁ = 0.05 and a₂ = 0.1; both are then trained over the iterations).

We retrained the original networks (FCN, U-Net, and SegNet) and their versions modified with the TGV upsampling algorithm on the PASCAL training set (1464 images).

For the reconstruction comparison on feature maps (Figures 5–7), we evaluated the above four methods on the PASCAL validation set (1449 images). We randomly selected feature maps from a certain layer in FCN, U-Net, and SegNet and compared the upsampled results of the different methods.

For the overall comparison of segmentation results (Figure 8), we used the PASCAL test set (1456 images). For the quantitative results, we adopted overall accuracy (OA), mean accuracy (MAcc), frequency-weighted intersection over union (FWIoU), and mean intersection over union (MIoU) as metrics.

4.1. Experiments

4.1.1. Reconstruction of Intermediate Feature Maps. In this stage, we replaced the original upsampling layer in all three networks (FCN, U-Net, and SegNet) with our TGV upsampling method (see Figure 9, with FCN as an example). To compare results fairly, we illustrate the performance of TGV upsampling when dealing with the same inputs (feature maps produced by previous layers), likewise replacing the bilinear operations in U-Net and the unpooling in SegNet.

Considering that a given upsampling method may appear multiple times in a model, we selected only one comparison per network, regardless of the size of the inputs and of how many upsampling operations appeared. The feature maps in Figures 5–7 are the most representative activation outputs of each layer in the training process.

In Figures 5–7 (illustrating the comparisons of the feature maps), all input maps come from some layer in the encoding stage. We found that TGV upsampling performs better at preserving the textures and edges of the feature maps. In Figure 5, with the same input maps from the FCN architecture, we observed that, compared with transposed-convolution reconstruction, TGV upsampling reproduced almost all classes (represented in different colors) that appeared in the input maps. Although it is a trainable operation, the transposed operation processes inputs following a "padding and extracting" pipeline, so there is a certain chance that sampled maps contain classes not actually present in the input maps, or that a present class is filtered out. It can be clearly seen in the second and third input submaps (Figure 5) that the transposed operation did not completely reproduce all objects appearing in the input map. Even ignoring these cases, it is easy to observe that TGV upsampling reconstructs texture better than transposed operations.

As illustrated in Figure 6, bilinear interpolation recovered too much unnecessary information about known classes from the x and y directions of the object. By contrast, TGV upsampling reconstructs based on the high-order divergence of the known pixel values, so the source of the TGV model's information is more reasonable and richer. The reconstruction results of TGV upsampling were accordingly better than those of bilinear interpolation.

Lastly, Figure 7 compares unpooling and TGV upsampling. Since unpooling only records the locations of maximum activations during the pooling operations of the encoding stage, the unpooling reconstruction is clearly incomplete: there are many blank spaces in places where texture should appear. In general, in terms of visualization results, the unpooling operation is the worst of the three.


4.1.2. Training in TGV Upsampling. In this experiment, we compared overall segmentation results with those of the original networks.

Figure 8 illustrates examples of PASCAL VOC outputs from FCN (FCN-8s), U-Net, SegNet, and the modified TGV upsampling-based networks (denoted by FCN-TGV, U-Net-

Figure 6: Comparison between a bilinear operation and TGV upsampling for the same feature maps (in U-Net): (a) input maps, (b) TGV upsample, (c) bilinear, (d) ground truth.

Figure 5: Comparison between a transposed operation and TGV upsampling for the same feature maps (in FCN): (a) input maps, (b) TGV upsample, (c) transposed, (d) ground truth.

Figure 7: Comparison between an unpooling operation and TGV upsampling for the same feature maps (in SegNet): (a) input maps, (b) TGV upsample, (c) unpooling, (d) ground truth.


TGV, and SegNet-TGV, respectively). Compared to the coarse feature maps of the original networks, class (or object) recognition (referring to the interval area between objects) in our results is more precise (see SegNet and SegNet-TGV), and the cases of spilling over the ground-truth areas are greatly eased (see FCN and FCN-TGV). At

Figure 8: Overall segmentation example results on PASCAL VOC (GT stands for ground truth; F, S, and U represent results from FCN, SegNet, and U-Net, while FT, ST, and UT denote the three networks combined with TGV upsampling).


the same time, they preserve the smoothness of the boundary well. It can be concluded that the TGV upsampling method helps models make better predictions on semantics.

The quantitative results of our proposed algorithm and of the competitors on the VOC test set are presented in Table 1. We observed that, in terms of overall accuracy, mean accuracy, frequency-weighted accuracy, and MIoU, all metrics improved by about 1.4–2.3%, in line with the visualization results in Figure 10.

Figures 10–12 show the effect of applying the new upsampling method (the X and Y axes represent iterations and losses, respectively). We found that the TGV upsampling algorithm had a significant effect on reducing losses compared to the original methods. With a fixed batch size, our new method converged faster than the original models.

4.1.3. Loss Function in the Training Process. The loss curves of FCN, FCN-TGV, U-Net, U-Net-TGV, SegNet, and SegNet-TGV on the PASCAL VOC validation set are given in Figures 10–12.

4.2. Analysis. Summarizing the four upsampling operations introduced in the above sections, we can group them into two types based on their different information sources.

4.2.1. Back-Up (Including Unpooling and Transposed Convolution Operations). Many researchers think that, when reconstructing feature maps, intermediate results produced by previous operations should be consulted (especially those of layers in the encoding stage), as if images were processed along a "path": knowing the conditions of the road already travelled (the corresponding layers in the encoding stage) makes it possible to better "predict" the road ahead (the feature map reconstructions). Thus, when applying these methods, operation information from previous layers must be saved. For example, the unpooling operation is effectively an index store; with the saved parameters, feature maps can be well reconstructed.

However, in both transposed and unpooling operations, it is unavoidable that the processed results retain a certain part of the intermediate results produced by previous layers. In some cases, this is useful for keeping important information that may be overlooked in the training process, but it can also be a disturbance by bringing along unnecessary information.

4.2.2. From-Itself (Including Bilinear Interpolation and Our TGV Upsampling Operations). In contrast to the "back-up" category, the information sources that bilinear interpolation and TGV upsampling use to make up for losses come from the methods' own processes rather than from the results of previous operations. As described in Related Works, the bilinear operation interpolates from the x and y directions based on known sample values. The TGV

Table 1: Results on the PASCAL VOC test set.

Method       OA     MAcc   FWIoU  MIoU
FCN-8s       91.13  78.68  84.59  64.59
FCN-TGV      92.33  80.12  87.49  66.02
SegNet       86.86  74.41  80.32  58.59
SegNet-TGV   87.46  76.08  83.45  60.48
U-Net        87.13  70.68  79.59  61.59
U-Net-TGV    89.14  72.12  81.49  63.89

Figure 9: Replacing the transposed convolutional layer in the FCN with the TGV upsampling method.


method has a more complex information source: it starts from the k-order divergence of known pixel values (for k-order images). It can better preserve the boundary information of textures, and the step effect can be effectively avoided, raising the smoothness of the entire image at the same time.

Figure 10: Loss function of SegNet (a) and SegNet-TGV (b) on the PASCAL VOC validation set.

Figure 11: Loss function of FCN (a) and FCN-TGV (b) on the PASCAL VOC validation set.

Figure 12: Loss function of U-Net (a) and U-Net-TGV (b) on the PASCAL VOC validation set.


From the results of these experiments, the numerical results of the loss function and the smoothness of the sampled feature maps have proven that the TGV upsampling method is a great improvement over current methods.

5. Conclusions

In this work, we introduced a TGV upsampling method based on image restoration research. We transformed the process of feature map reconstruction into a loss-optimization problem. Based on the divergence of each order, we used TGV regularization, which reconstructs piecewise smooth functions. For numerical optimization, we used a primal-dual algorithm, which can effectively perform parallel computing and result in high frame rates. We applied our new method to three networks and evaluated the results on the PASCAL VOC datasets. These tests proved that the TGV upsampling method can greatly make up for the lost smoothness and boundary information of maps. Compared to the original methods used in the networks (FCN, U-Net, and SegNet), this new method made average improvements of 1.4-2.3% in terms of MIoU accuracy when used in the decoding stage. When observing the training process in the experiments, we clearly found that the TGV upsampling method greatly made up for information losses that resulted from other operations (mainly referring to various convolutional layers).

The proposed algorithm is not limited to single-layer map upsampling. In the future, it will be extended to understanding entire scenes and to effectively incorporating temporal coherence in a consistent way. In addition, with the proposed "restoration-like" method, we will further concentrate on how to better understand latent semantics under unsupervised settings.

Data Availability

The code used to support the findings of this study has not been made available because the proposed method will later be applied in medical projects that involve personal privacy.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Research Program to Solve Social Issues of the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (no. NRF-2017M3C8A8091768).

References

[1] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, 2018.

[2] X. Li, Z. Liu, P. Luo, C. C. Loy, and X. Tang, "Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 2017.

[3] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 2017.

[4] F. Knoll, K. Bredies, T. Pock, and R. Stollberger, "Second order total generalized variation (TGV) for MRI," Magnetic Resonance in Medicine, vol. 65, no. 2, pp. 480-491, 2011.

[5] A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120-145, 2011.

[6] M. D. Zeiler, G. W. Taylor, and R. Fergus, "Adaptive deconvolutional networks for mid and high level feature learning," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), vol. 1, no. 2, 2011.

[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, December 2012.

[8] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241, Springer, Munich, Germany, October 2015.

[9] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, June 2015.

[10] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: a deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481-2495, 2017.

[11] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, December 2015.

[12] D. R. Martin, C. C. Fowlkes, and J. Malik, "Learning to detect natural image boundaries using local brightness, color, and texture cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 530-549, 2004.

[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, https://arxiv.org/abs/1409.1556.

[14] V. Dumoulin and F. Visin, "A guide to convolution arithmetic for deep learning," 2016, https://arxiv.org/abs/1603.07285.

[15] M. D. Zeiler et al., Deconvolutional Networks, pp. 2528-2535, 2010.

[16] C. R. Vogel and M. E. Oman, "Fast total variation-based image reconstruction," in Proceedings of the 1995 ASME Design Engineering Conferences, vol. 3, no. part C, Boston, MA, USA, September 1995.

[17] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259-268, 1992.

[18] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proceedings of the European Conference on Computer Vision, Springer, Zurich, Switzerland, September 2014.

[19] V. Papyan, Y. Romano, and M. Elad, "Convolutional neural networks analyzed via convolutional sparse coding," 2016, https://arxiv.org/abs/1607.08194.

[20] L.-B. Qiao, B.-F. Zhang, J.-S. Su, and X.-C. Lu, "A systematic review of structured sparse learning," Frontiers of Information Technology & Electronic Engineering, vol. 18, no. 4, pp. 445-463, 2017.

[21] P. Wang, P. Chen, Y. Yuan et al., "Understanding convolution for semantic segmentation," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, Lake Tahoe, NV, USA, March 2018.

[22] J. T. Springenberg et al., "Striving for simplicity: the all convolutional net," 2014, https://arxiv.org/abs/1412.6806.

[23] K. Bredies, K. Kunisch, and T. Pock, "Total generalized variation," SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 492-526, 2010.

[24] T. Pock and A. Chambolle, "Diagonal preconditioning for first order primal-dual algorithms in convex optimization," in Proceedings of the 2011 International Conference on Computer Vision, IEEE, Barcelona, Spain, November 2011.

[25] E. Esser, X. Zhang, and T. F. Chan, "A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science," SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 1015-1046, 2010.

[26] D. Ferstl, C. Reinbacher, R. Ranftl, M. Ruether, and H. Bischof, "Image guided depth upsampling using anisotropic total generalized variation," in Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, December 2013.

[27] W. He, H. Zhang, L. Zhang, and H. Shen, "Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 1, pp. 178-188, 2016.


are "upsampled," and resolutions are increased correspondingly until the entire image is reconstructed to the original size. Researchers believe that in this down-upsampling process, in which images are segmented and reconstructed, networks can detect the most important features without destroying the shapes or textures of objects. The aim of reconstruction is to make up for the losses produced in the encoding stage, and the choice of the upsampling method is the key point.

In this paper, we follow symmetric designs and focus on the research of upsampling methods in the decoding stage. Our contributions are as follows:

(1) We made a detailed introduction of the upsampling methods that are now commonly used, along with the necessary concepts and mathematical definitions.

(2) We proposed a new upsampling method based on the total generalized variation (TGV) model [4-6] and applied it to different networks.

2. Related Works

Semantic segmentation, an important branch of deep learning [7], has also become an active topic in neural network research. In this section, we introduce several advanced networks in the field of semantic segmentation and the upsampling methods they use in the decoding stage.

2.1. Symmetric-Like Networks. Symmetric networks [8-11] have been proven to be highly effective in particular fields. As for the reason for choosing a symmetrical type, researchers thought that for any convolutional layer, there would be a loss of information when extracting features. With further training, even a small loss may result in defects in boundary information or texture edges. To make up for the loss, they grouped all layers in a model into two stages according to the size of the feature maps. Long et al. [9] first used a fully convolutional network (FCN). They advised that fully connected layers can be replaced by convolutional layers, and that further improved connections can be made between stages via "skip connections." Although

the FCN has incomplete symmetry, a skip connection, which builds a "bridge" between layers in the encoding stage and decoding stage, provides another source for a feature map. Based on this idea, Ronneberger et al. [8] designed a U-shaped network (U-Net). They considered that the expansive path of features has a relationship with the contracting path and thus enlarged the number of feature channels. However, when we want to design a new network, parameter size is a significant problem to which much importance should be attached, considering that symmetric networks need many more parameters than other kinds of networks, which would be a great challenge to CPU and GPU usage. At almost the same time, Badrinarayanan et al. [10] applied a new upsampling method named "unpooling," which records the indices when max-pooling. With these indices, original location information can be easily found in the decoding stage. In this case, the parameters needed in training processes are greatly reduced.

2.2. Current Upsampling Methods. With the visualization of convolutional networks [12], we know that convolutional computations can effectively extract and generalize features, the outputs of which can be seen as a set of features. The goal of upsampling operations is to increase the resolution of low-resolution map data, upsampling the original data to a high-resolution map, aiming to make up for the information lost in previous layers because of convolution operations.

The difficulty of the entire process lies in how to generate sampling data from the low-resolution maps and corresponding colour channels.

2.2.1. Bilinear Interpolation. In early deep learning research [7, 13], this kind of method could often be seen. Compared to the methods introduced later on, the greatest advantages of the bilinear method are high speed and simple operations. From the level of layers, there is no need to learn and adjust weights when using bilinear interpolation, as one parameter (referring to the desired size of the final image) is enough for the entire operation.

Figure 1: General symmetric network architecture (an encoding stage followed by a decoding stage, with skip connections between corresponding layers).


For example, with four known point values $x_{11} = (x_1, y_1)$, $x_{12} = (x_1, y_2)$, $x_{21} = (x_2, y_1)$, and $x_{22} = (x_2, y_2)$, the value of the function $\tau$ can be found at the point $(x, y)$.

X-direction:

$$Q(x, y_1) = \tau(x, y_1) \approx \frac{x_2 - x}{x_2 - x_1}\,\tau\left(x_{11}\right) + \frac{x - x_1}{x_2 - x_1}\,\tau\left(x_{21}\right),$$
$$Q(x, y_2) = \tau(x, y_2) \approx \frac{x_2 - x}{x_2 - x_1}\,\tau\left(x_{12}\right) + \frac{x - x_1}{x_2 - x_1}\,\tau\left(x_{22}\right). \tag{1}$$

Y-direction (yielded):

$$\begin{aligned}
Q(x, y) = \tau(x, y) &\approx \frac{y_2 - y}{y_2 - y_1}\,Q(x, y_1) + \frac{y - y_1}{y_2 - y_1}\,Q(x, y_2) \\
&= \frac{\tau\left(x_{11}\right)}{\left(x_2 - x_1\right)\left(y_2 - y_1\right)}\left(x_2 - x\right)\left(y_2 - y\right)
+ \frac{\tau\left(x_{21}\right)}{\left(x_2 - x_1\right)\left(y_2 - y_1\right)}\left(x - x_1\right)\left(y_2 - y\right) \\
&\quad + \frac{\tau\left(x_{12}\right)}{\left(x_2 - x_1\right)\left(y_2 - y_1\right)}\left(x_2 - x\right)\left(y - y_1\right)
+ \frac{\tau\left(x_{22}\right)}{\left(x_2 - x_1\right)\left(y_2 - y_1\right)}\left(x - x_1\right)\left(y - y_1\right). \tag{2}
\end{aligned}$$

With the above processes, it is possible to find an approximation value at any point $(x, y)$ within the function interval ($x \in (x_1, x_2)$, $y \in (y_1, y_2)$).
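Equations (1) and (2) can be rendered directly in code. The following is an illustrative Python sketch (the function and argument names are ours, not from the paper):

```python
import numpy as np

def bilinear_point(tau11, tau21, tau12, tau22, x1, x2, y1, y2, x, y):
    """Evaluate equations (1)-(2): interpolate tau at (x, y) from the four
    corner values tau(x11), tau(x21), tau(x12), tau(x22)."""
    # X-direction (equation (1)): interpolate along x at heights y1 and y2.
    q_y1 = (x2 - x) / (x2 - x1) * tau11 + (x - x1) / (x2 - x1) * tau21
    q_y2 = (x2 - x) / (x2 - x1) * tau12 + (x - x1) / (x2 - x1) * tau22
    # Y-direction (equation (2)): interpolate the two partial results along y.
    return (y2 - y) / (y2 - y1) * q_y1 + (y - y1) / (y2 - y1) * q_y2

# The midpoint of a unit cell is the average of the four corner values.
print(bilinear_point(0.0, 1.0, 1.0, 2.0, 0, 1, 0, 1, 0.5, 0.5))  # 1.0
```

Applying this formula to every output pixel, with the four nearest low-resolution samples as corners, yields the bilinear upsampling used in the networks above.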

2.2.2. Transposed Convolution. Strictly speaking, transposed convolution is a kind of convolution operation rather than a sampling method, but many researchers currently make it their first choice for making up for losses in the decoding stage.

From the perspective of convolution arithmetic, each convolution operation can be represented as $y = C * x$, where $C$ and $x$ stand for the weight matrix and the input maps, respectively.

In the period of backpropagation, assuming that the derivative of the layers $\partial \mathrm{Loss} / \partial y$ (Loss means the loss of the whole network when training) is known, the corresponding derivatives with respect to the inputs can be written as

$$\frac{\partial\,\mathrm{Loss}}{\partial x_j} = \sum_i \frac{\partial\,\mathrm{Loss}}{\partial y_i}\,\frac{\partial y_i}{\partial x_j} = \sum_i \frac{\partial\,\mathrm{Loss}}{\partial y_i}\,C_{i,j}, \qquad \frac{\partial\,\mathrm{Loss}}{\partial x} = C^{T}\,\frac{\partial\,\mathrm{Loss}}{\partial y}. \tag{3}$$

Equation (3) shows the relationship between $\partial \mathrm{Loss}/\partial x$ and $\partial \mathrm{Loss}/\partial y$. It shows that transposed convolution actually multiplies by $C^{T}$ when doing a forward pass and by $(C^{T})^{T} = C$ when doing a backward pass.

When transposed convolution is used as the upsampling method, the situation is totally different from bilinear interpolation. The study of Dumoulin and Visin [14] introduced four different cases where a transposed operation is applied.

For instance (Figure 2), the transposed convolution over a 2x2 input for a 3x3 output (stride 1) equals a reverse operation of the convolutional layer ($y = C * x$, with a 4x4 padded input and kernel size 2): the original 2x2 map is first padded with a border of zeros to 4x4, and then $C^{T}$ is multiplied.
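The matrix view of transposed convolution can be made concrete with a small sketch (a hypothetical NumPy illustration, not the paper's implementation): unroll a valid convolution into its matrix $C$, then multiply by $C^{T}$ to map a low-resolution result back to the input size.

```python
import numpy as np

def conv_matrix(kernel, in_h, in_w):
    """Unroll a valid 2-D convolution y = C @ x into its matrix C,
    where x is the flattened in_h*in_w input map."""
    kh, kw = kernel.shape
    out_h, out_w = in_h - kh + 1, in_w - kw + 1
    C = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            for a in range(kh):
                for b in range(kw):
                    # Output pixel (i, j) reads input pixel (i+a, j+b).
                    C[i * out_w + j, (i + a) * in_w + (j + b)] = kernel[a, b]
    return C

kernel = np.arange(9.0).reshape(3, 3)
C = conv_matrix(kernel, 4, 4)        # forward pass: 4x4 map -> 2x2 map
y = C @ np.ones(16)                  # low-resolution output
x_up = C.T @ y                       # transposed convolution: back to 4x4 size
print(C.shape, x_up.reshape(4, 4).shape)  # (4, 16) (4, 4)
```

This also makes the memory cost discussed below visible: the full matrix $C$ (or its transpose) must be representable during training.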

2.2.3. Unpooling. Various combinations of layers have improved the efficiency of networks, but they bring a severe problem at the same time: parameters. Even a single convolutional layer requires a high number of weight parameters, which poses a significant challenge for CPUs or GPUs. As noted in Section 2, transposed convolution equals multiplying by the transposed matrix of the corresponding convolutional layer, meaning that we must spare enough memory to save these matrices when training.

First utilized in deconvolutional networks [15], unpooling is much simpler and easier to use.

Figure 3 shows the basic process of an unpooling operation. All numbers correspond to values in a feature map. Assume that for some map input, the unpooling method records the indices of the largest values before performing max-pooling (from input to a). In the decoding stage, having received outputs b from previous layers along with the pooling indices (represented by the black grids), it is possible to upsample the pixel values back to their original places (from b to output). In fact, unpooling keeps the directional information ("pooling indices") of the largest pixel values in each feature map when processing max-pooling operations. This action solves the problems of "directions" and "padding" at the same time.

Compared to transposed convolution, the advantage of unpooling lies in the number of parameters, as it only records the pooling indices.
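The index-recording mechanism of Figure 3 can be sketched as follows (a hypothetical NumPy rendering with our own function names; frameworks provide this pairing directly, e.g. as max-pool-with-indices and max-unpool layers):

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max-pooling that also records argmax locations (input -> a in Figure 3)."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    indices = np.zeros((h // 2, w // 2), dtype=int)  # flat index into x
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            block = x[i:i + 2, j:j + 2]
            a, b = np.unravel_index(block.argmax(), (2, 2))
            pooled[i // 2, j // 2] = block[a, b]
            indices[i // 2, j // 2] = (i + a) * w + (j + b)
    return pooled, indices

def unpool(pooled, indices, shape):
    """Place each value back at its recorded position; all other cells stay zero
    (b -> output in Figure 3)."""
    out = np.zeros(shape).ravel()
    out[indices.ravel()] = pooled.ravel()
    return out.reshape(shape)

x = np.arange(1.0, 17.0).reshape(4, 4)
pooled, idx = max_pool_with_indices(x)   # pooled = [[6, 8], [14, 16]]
restored = unpool(pooled, idx, x.shape)  # maxima back in place, zeros elsewhere
```

Only the small integer index map needs to be stored between the encoding and decoding stages, which is the parameter advantage noted above.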

3. TGV Upsampling Algorithm

Inspired by the total variation model [16, 17], we introduce another upsampling method named "TGV upsampling." We see upsampling methods as a way to combine feature map restoration with knowledge of image restoration (with the aim of making up for the loss of information). The background and details of our method are described as follows.

3.1. Problem Transformation. Although it has been several years since the first convolutional neural network (CNN) was used [7], the internal architecture can still be simply summarized. Most networks now can be thought of as a combination of convolutional layers and max-pooling layers, and there are many research studies [14, 18-21] available for understanding what a CNN is. Many previous studies pointed out that "max-pooling can be replaced by a convolutional layer with increased stride and without loss in accuracy at the same time" [22]. Certainly, we can think of network components with max-pooling and convolutional layers as a set of convolutional computations, regardless of what the internal architecture actually is.


As we discussed in previous sections, in the field of semantic segmentation, researchers have tended to divide the whole process into encoding and decoding stages, where the upsampling step always happens in the decoding stage. Many researchers thought that a "path" exists in convolution operations. From the perspective of feature maps, the stages of encoding and decoding refer to the contracting and expanding paths, respectively; these two paths are more or less symmetrical. Under such circumstances, we treat upsampling as a restoration task for the feature maps produced by corresponding layers in the encoding stage.

For example, Figure 1 provides a simple view of a general symmetric-like network. The whole network is grouped into encoding and decoding stages, and layers in the two stages are represented by $E_i$ and $C_j$, respectively (where $i$ and $j$ stand for the position of the layer). When $i = j$, it means $E_{\mathrm{input}} = C_{\mathrm{output}}$, and we consider the outputs of $C_j$ as the result of feature map restoration of $E_{\mathrm{input}}$.

As shown in Figure 4, after a set of convolutional computations, a 4x4 feature map in the encoding stage becomes a 2x2 pixel matrix (this matrix is also the output of a certain layer). With a projection operation, this map can be recovered to the original size of the map in the encoding

Figure 3: Unpooling operation (max-pooling records the indices of the maxima; unpooling places the received values back at those positions and fills the remaining cells with zeros).

Figure 2: Transposed convolution (the original map is padded before being upsampled to the larger map).


stage (this temporary result is denoted by $C_{\mathrm{input}}$). The padding part (referring to the heavy blue grids) appears as blurred (noisy) areas; the rest of the work is a restoration job.

When comparing the left matrix with the right one, we formulate the basic loss function as follows:

$$\mathrm{loss}\left(u, C_{\mathrm{input}}\right) = \frac{1}{2}\left\| u\left(C_{\mathrm{input}}\right) - C_{\mathrm{input}} \right\|_2^2, \tag{4}$$

where $u$ stands for our upsampled result and $\left\| \cdot \right\|_2$ denotes the 2-norm.
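The pretreatment of Figure 4 and the fidelity term (4) can be sketched together (an illustrative NumPy snippet; the corner-scatter projection and the name `restoration_loss` are our assumptions for the toy 2x2-to-4x4 case):

```python
import numpy as np

low = np.array([[17.0, 18.0], [19.0, 20.0]])  # 2x2 encoder output (Figure 4)
c_input = np.zeros((4, 4))
c_input[::3, ::3] = low                       # known values scattered to the corners

def restoration_loss(u, c):
    """Equation (4): squared 2-norm fidelity between the upsampled map u
    and the projected map C_input."""
    return 0.5 * np.sum((u - c) ** 2)

print(restoration_loss(c_input, c_input))     # 0.0 for a perfect restoration
```

The zero cells are the "noisy" padding regions that the TGV restoration described next must fill in.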

3.2. TGV-Based Upsampling. TGV [23] is mainly used to solve problems like image denoising and restoration, but in this paper, we adapted it into a trainable upsampling method for feature maps. We increase the resolution of the maps to a target size and then perform TGV on each map.

The first step in our method is projection, considering the size difference between low- and high-resolution maps. We first applied bilinear interpolation to get a preliminary result, which has the same size as the maps in the encoding stage. These processed maps are then treated as noisy areas (actually referring to the entire image) in the TGV restoration.

Since Bredies et al. [23] put forward the TGV model, it has been widely used in the field of image restoration. The TGV model greatly reconstructs an image from blurred data or noisy indirect measurements. We formulated the whole upsampling problem as a convex optimization model [12, 24-26]. The mathematical formulation of our model is outlined as

$$\min_u \left\{ \lambda\,\mathrm{loss}(u, x) + \mathrm{TGV}_\alpha^{k}(u) \right\}, \tag{5}$$

where $\mathrm{loss}(u, x)$ represents image fidelity, the parameter $\lambda$ is used to weight previous operations to compute a global optimization, and $\mathrm{TGV}_\alpha^{k}(u)$ stands for the regularization term.

With the visualization of convolutional networks [8], we

know that convolutional computations can effectively extract and generalize features, the outputs of which can be seen as a set of features. To better understand the semantics of an image, the smoothness of textures and border edges is particularly important to feature maps.

For a k-order image, traditional restoration methods tend to fix the bounded variation seminorm, whereas the TGV model introduces a k-order function and incorporates information from different channels, which can effectively maintain texture discontinuities.

Given a k-order image $u$, we represent the TGV model as follows:

$$\mathrm{TGV}_\alpha^{k}(u) = \sup\left\{ \int_\Omega u\,\mathrm{div}^{k} v\,dx \;\Big|\; v \in C_c^{k}\left(\Omega, \mathrm{Sym}^{k}\left(\mathbb{R}^{d}\right)\right),\; \left\| \mathrm{div}^{l} v \right\|_\infty \le \alpha_l,\; l = 0, \ldots, k-1 \right\}. \tag{6}$$

Here, $\mathrm{Sym}^{k}(\mathbb{R}^{d})$ represents the space of symmetric tensors of order $k$. By balancing derivatives from the first order to the kth order, TGV greatly alleviates the problem of images containing different grey levels, which leads to boundary information loss as well as edge and corner loss.

In this work, however, aiming at feature maps, it turns out that a 2-order TGV is sufficient. We formulate the 2-order model as

$$\mathrm{TGV}_\alpha^{2}(u) = \min_\omega \left\{ \alpha_1 \int_\Omega \left| \nabla u - \omega \right| + \alpha_2 \int_\Omega \left| \varepsilon(\omega) \right| \right\}. \tag{7}$$

For a given 2-order $u$, the minimum of the TGV model is taken over all vector fields $\omega$ in the bounded domain. $\varepsilon(\omega) = \frac{1}{2}\left( \nabla\omega + \nabla\omega^{T} \right)$ stands for the symmetrized derivative. Parameters $\alpha_1$ and $\alpha_2$ are applied to balance the first and second derivatives.

The final TGV-based upsampling model is defined as a combination of the loss function (4) and the TGV term (7), as in equation (8):

$$\min_{u, \omega} \left\{ \frac{\lambda}{2}\left\| u\left(C_{\mathrm{input}}\right) - C_{\mathrm{input}} \right\|_2^2 + \alpha_1 \int_\Omega \left| \nabla u - \omega \right| + \alpha_2 \int_\Omega \left| \varepsilon(\omega) \right| \right\}. \tag{8}$$

Meanwhile, unlike the fixed weights used in [23], we adapt TGV into a trainable method, taking the weights $\alpha_1$ and $\alpha_2$ into backpropagation to search for a suitable balance point.
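The objective in equation (8) can be evaluated numerically with forward finite differences. The sketch below is our own illustrative discretization (operator names `grad`, `sym_grad`, and `tgv2_energy` are assumptions, not the paper's code), storing the symmetric tensor $\varepsilon(\omega)$ as its three distinct components:

```python
import numpy as np

def grad(u):
    """Forward-difference gradient of a 2-D map (zero at the far boundary)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gx, gy])

def sym_grad(w):
    """Symmetrized derivative eps(w) = (grad(w) + grad(w)^T) / 2 of a vector
    field, stored as its three distinct components (xx, yy, xy)."""
    g0, g1 = grad(w[0]), grad(w[1])
    return np.stack([g0[0], g1[1], 0.5 * (g0[1] + g1[0])])

def tgv2_energy(u, w, c, lam=1.0, a1=0.05, a2=0.1):
    """Evaluate the objective of equation (8): fidelity plus the two TGV terms."""
    fidelity = 0.5 * lam * np.sum((u - c) ** 2)
    first_order = a1 * np.sum(np.linalg.norm(grad(u) - w, axis=0))
    second_order = a2 * np.sum(np.linalg.norm(sym_grad(w), axis=0))
    return fidelity + first_order + second_order

# A constant map with a zero auxiliary field has zero energy.
u = np.full((4, 4), 3.0)
w = np.zeros((2, 4, 4))
print(tgv2_energy(u, w, u))  # 0.0
```

The default weights mirror the initial values used in the experiments (a1 = 0.05, a2 = 0.1); in the trainable variant they would be updated by backpropagation.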

3.3. Optimization Methods. Considering that the upsampling model we proposed (formula (8)) is convex but not smooth, we used the primal-dual scheme [24, 25, 27] to solve this problem. We reformulated our upsampling model as a convex-concave saddle-point problem by introducing two dual variables $m$ and $n$. The transformed model of formula (8) is given by

$$\min_u \max_{m, n} \left\{ \frac{\lambda}{2} \sum_{i,j \in \Omega} \left( u\left(C_{ij}\right) - E_{ij} \right)^2 + \alpha_1 \left\langle \nabla u - \omega, m \right\rangle + \alpha_2 \left\langle \varepsilon(\omega), n \right\rangle \right\}. \tag{9}$$

Figure 4: Pretreatment of feature maps (convolutional operations reduce the 4x4 map $E_{\mathrm{input}}$ to a 2x2 matrix, which the projection operation scatters back into a zero-filled 4x4 map $C_{\mathrm{input}}$).


The feasible sets of $m$ and $n$ are defined as follows:

$$M = \left\{ m: \Omega_u \to \mathbb{R}^2 \;\big|\; \left\| m_{ij} \right\| \le 1,\; i, j \in \Omega_u \right\},$$
$$N = \left\{ n: \Omega_u \to \mathbb{R}^2 \;\big|\; \left\| n_{ij} \right\| \le 1,\; i, j \in \Omega_u \right\}. \tag{10}$$

The final result is computed pixel by pixel via iterative optimization. With $u^0 = E_{\mathrm{input}}$ and $\omega^0 = m^0 = n^0 = 0$, step sizes $\sigma_m, \sigma_n$ and $\tau_u, \tau_\omega$ are chosen. For iteration $i \ge 0$, the updated variables are as follows:

$$\begin{cases}
m^{i+1} = \mathrm{Proj}_m\left( m^i + \sigma_m \alpha_1 \left( \nabla \tilde{u}^i - \tilde{\omega}^i \right) \right), \\[2pt]
n^{i+1} = \mathrm{Proj}_n\left( n^i + \sigma_n \alpha_2 \nabla \tilde{\omega}^i \right), \\[2pt]
u^{i+1} = \left( 1 + \dfrac{\lambda}{2}\tau_u \right)^{-1} \left[ u^i + \tau_u \left( \alpha_1 m^{i+1} + \dfrac{\lambda}{2} C_{\mathrm{input}} \right) \right], \\[2pt]
\omega^{i+1} = \omega^i + \tau_\omega \left( \alpha_1 n^{i+1} + \alpha_2 m^{i+1} \right), \\[2pt]
\tilde{u}^{i+1} = u^{i+1} + \theta \left( u^{i+1} - u^i \right), \\[2pt]
\tilde{\omega}^{i+1} = \omega^{i+1} + \theta \left( \omega^{i+1} - \omega^i \right),
\end{cases} \tag{11}$$

where $\mathrm{Proj}_m(m)$ and $\mathrm{Proj}_n(n)$ denote the Euclidean projectors onto the sets $M$ and $N$, respectively. They can be calculated by pointwise operations:

$$\mathrm{Proj}_m(m) = \frac{m}{\max\left(1, \left|m\right|/\alpha_1\right)}, \qquad \mathrm{Proj}_n(n) = \frac{n}{\max\left(1, \left|n\right|/\alpha_2\right)}. \tag{12}$$

In equation (11), $\theta$ stands for the relaxation parameter. It can be updated iteration by iteration using preconditioning [5]. Hence, the TGV-based model can achieve globally optimal upsampling solutions when dealing with 2-order images.
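A runnable sketch of the iteration is given below. Where equation (11) leaves operator details implicit, this version fills them in following the standard first-order primal-dual scheme of [5] (divergence operators, the symmetric-tensor adjoint, and the step sizes are our assumptions, and all function names are hypothetical):

```python
import numpy as np

def grad(u):
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gx, gy])

def div(p):
    """Discrete divergence, the (negative) adjoint of grad."""
    d = np.zeros_like(p[0])
    d += p[0]; d[1:, :] -= p[0][:-1, :]
    d += p[1]; d[:, 1:] -= p[1][:, :-1]
    return d

def sym_grad(w):
    g0, g1 = grad(w[0]), grad(w[1])
    return np.stack([g0[0], g1[1], 0.5 * (g0[1] + g1[0])])

def sym_div(n):
    """Divergence of a symmetric tensor field stored as (xx, yy, xy)."""
    return np.stack([div(np.stack([n[0], n[2]])),
                     div(np.stack([n[2], n[1]]))])

def proj(p, alpha):
    """Equation (12): pointwise projection onto the dual feasible set."""
    scale = np.maximum(1.0, np.linalg.norm(p, axis=0) / alpha)
    return p / scale

def tgv_upsample(c_input, lam=1.0, a1=0.05, a2=0.1,
                 iters=100, tau=0.25, sigma=0.25, theta=1.0):
    u = c_input.copy()
    w = np.zeros((2,) + c_input.shape)
    m = np.zeros_like(w)
    n = np.zeros((3,) + c_input.shape)
    ub, wb = u.copy(), w.copy()
    for _ in range(iters):
        # Dual ascent on m and n (first two updates of equation (11)).
        m = proj(m + sigma * a1 * (grad(ub) - wb), a1)
        n = proj(n + sigma * a2 * sym_grad(wb), a2)
        # Primal step on u: proximal update for the quadratic fidelity term.
        u_new = (u + tau * (a1 * div(m) + lam * c_input)) / (1.0 + tau * lam)
        w_new = w + tau * (a1 * m + a2 * sym_div(n))
        # Over-relaxation with parameter theta.
        ub = u_new + theta * (u_new - u)
        wb = w_new + theta * (w_new - w)
        u, w = u_new, w_new
    return u
```

As a sanity check, a constant map is a fixed point of the scheme: the dual variables stay at zero and the proximal step returns the input unchanged.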

4. Evaluation

In this section, we make a quantitative evaluation of the proposed TGV-based upsampling model. We compared our model with current upsampling methods from different perspectives. We used the PASCAL VOC2012 dataset (class segmentation) to investigate detailed performance. In the following experiments, we manually set the TGV-based upsampling model's parameters $a_1$ and $a_2$ (we initially set $a_1 = 0.05$ and $a_2 = 0.1$; both of them can be trained with iterations).

We retrained the original networks (FCN, U-Net, and SegNet) and their versions modified with the TGV upsampling algorithm on the PASCAL training set (with 1464 images).

For a reconstruction comparison on feature maps (Figures 5-7), we evaluated the above four methods on the PASCAL validation set (with 1449 images). We randomly selected feature maps from a certain layer in FCN, U-Net, and SegNet and compared the upsampled results of the different methods.

For overall comparisons of segmentation results (Figure 8), we used the PASCAL test set (with 1456 images). For quantitative results (with 1456 images), we adopted overall accuracy (OA), mean accuracy (MAcc), frequency-weighted intersection over union (FWIoU), and mean intersection over union (MIoU) as metrics.
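The four metrics are standard functions of the class confusion matrix. The following is an illustrative computation (the function name and toy matrix are ours, not from the paper):

```python
import numpy as np

def segmentation_metrics(conf):
    """OA, MAcc, FWIoU, and MIoU from a class confusion matrix
    (rows: ground truth, columns: prediction)."""
    conf = conf.astype(float)
    tp = np.diag(conf)               # correctly labelled pixels per class
    gt = conf.sum(axis=1)            # pixels per ground-truth class
    pred = conf.sum(axis=0)          # pixels per predicted class
    iou = tp / (gt + pred - tp)      # per-class intersection over union
    oa = tp.sum() / conf.sum()
    macc = np.mean(tp / gt)
    fwiou = np.sum((gt / conf.sum()) * iou)
    miou = np.mean(iou)
    return oa, macc, fwiou, miou

conf = np.array([[50, 0], [10, 40]])  # toy 2-class confusion matrix
print(segmentation_metrics(conf))
```

With this toy matrix, OA is 0.9 and MIoU averages the per-class IoU values 50/60 and 40/50; the FWIoU weights them by how often each class occurs in the ground truth.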

4.1. Experiments

4.1.1. Reconstruction of Intermediate Feature Maps. In this stage, we replaced the original upsampling layers in all three networks (FCN, U-Net, and SegNet) with our TGV upsampling method (see Figure 9, with FCN as an example). To better compare the results, we illustrated the performance of TGV upsampling when dealing with the same inputs (feature maps produced by previous layers), likewise replacing the bilinear operations in U-Net and the unpooling in SegNet.

Considering that certain upsampling methods may appear multiple times in a model, we only selected one comparison, regardless of the size of the inputs and how many upsampling operations appeared in the networks. The feature maps in Figures 5-7 are the most representative of the activation outputs in each layer during the training process.

In Figures 5-7 (illustrating the comparisons of the feature maps), all input maps are from some layer in the encoding stage. We found that TGV upsampling performed better at preserving the textures and edges of the feature map. In Figure 5, with the same input maps from the FCN architecture, we observed that, compared with transposed convolution reconstruction, TGV upsampling reproduced almost all classes (represented in different colors) that appeared in the input maps. While being a trainable operation, the transposed operation processes inputs following a "padding and extracting" pipeline. There absolutely exists a certain chance that sampled maps miss some classes that are in fact contained in the input maps, or that a class is filtered out. It can be clearly seen that in the second and third input submaps (Figure 5), the transposed operation did not completely reproduce all objects appearing in the input map. Even if these cases are ignored, it is easy to observe that TGV upsampling is better at reconstructing texture than transposed operations.

As illustrated in Figure 6, bilinear interpolation recovered too much unnecessary information about known classes from both the x and y directions of the object, compared to the principle of TGV upsampling reconstruction, which is based on the high-order divergence of the known pixel values. The source of the TGV model's information is more reasonable and more complex. The reconstruction results of TGV upsampling were also better than those of bilinear interpolation.

Lastly, Figure 7 compares unpooling and TGV upsampling. Since unpooling only records the locations of the maximum activations when performing pooling operations in the encoding stage, it is clear that the unpooling reconstruction is not complete, as there are many blank spaces in places where texture should appear. In general, the efficiency of the unpooling operation is the worst of the three in terms of visualization results.

6 Computational Intelligence and Neuroscience

4.1.2. Training in TGV Upsampling. In this experiment, we compared overall segmentation results with those of the original networks.

Figure 8 illustrates examples of PASCAL VOC outputs from FCN (FCN-8s), U-Net, SegNet, and the modified TGV upsampling-based networks (denoted by FCN-TGV, U-Net-TGV, and SegNet-TGV, respectively). Compared to the coarse feature maps of the original networks, class (or object) recognition (referring to the interval areas between objects) in our results is more precise (see SegNet and SegNet-TGV), and cases where predictions exceed the ground-truth areas are greatly eased (see FCN and FCN-TGV). At

Figure 5: Comparison between a transposed operation and TGV upsampling for the same feature maps (in FCN). (a) Input maps. (b) TGV upsample. (c) Transposed. (d) Ground truth.

Figure 6: Comparison between a bilinear operation and TGV upsampling for the same feature maps (in U-Net). (a) Input maps. (b) TGV upsample. (c) Bilinear. (d) Ground truth.

Figure 7: Comparison between an unpooling operation and TGV upsampling for the same feature maps (in SegNet). (a) Input maps. (b) TGV upsample. (c) Unpooling. (d) Ground truth.


Figure 8: Overall segmentation example results on PASCAL VOC (GT stands for ground truth; F, S, and U represent results from FCN, SegNet, and U-Net, while FT, ST, and UT denote the three networks combined with TGV upsampling).


the same time, they well preserved the smoothness of the boundary. Lastly, it can be concluded that the TGV upsampling method helps models make better predictions on semantics.

The quantitative results on the VOC test set of our proposed algorithm and the competitors are presented in Table 1. We observed that, in terms of overall accuracy, mean accuracy, frequency-weighted accuracy, and MIoU, all metrics improved by about 1.4–2.3%, corresponding to the visualization results in Figure 10.

Figures 10–12 show the effect after applying the new upsampling method (the X and Y axes represent iterations and losses, respectively). We found that the TGV upsampling algorithm had a significant effect on reducing losses compared to the original methods. With a fixed batch size, our new method converged faster than the original models.

4.1.3. Loss Function in the Training Process. The loss functions of FCN, FCN-TGV, U-Net, U-Net-TGV, SegNet, and SegNet-TGV on the PASCAL VOC validation set are given in Figures 10–12.

4.2. Analysis. After reviewing all four upsampling operations introduced in the above sections, we can group them into two types based on their different information sources.

4.2.1. Back-Up (Including Unpooling and Transposed Convolution Operations). Many researchers think that, when reconstructing feature maps, intermediate results produced by previous operations should be referred to (especially layers in the encoding stage), as if images are processed along

a "path". Knowing the road conditions behind (referring to corresponding layers in the encoding stage) makes it possible to better "predict" the road conditions ahead (referring to feature map reconstructions). Thus, when applying these methods, operation information from previous layers must be saved. For example, the unpooling operation can actually be seen as an index store: with the saved parameters, feature maps can be well reconstructed.

However, in both transposed and unpooling operations, it is unavoidable that the processed results retain a certain part of the intermediate results produced by previous layers. In some cases, this is useful for keeping important information that might be overlooked in the training process, but it can also be a disturbance, bringing along unnecessary information.

4.2.2. From-Itself (Including Bilinear Interpolation and Our TGV Upsampling Operations). In contrast to the "back-up" category, the information that bilinear interpolation and TGV upsampling use to make up for losses comes from the methods' own processes rather than from the results of previous operations. As described in Related Works, the bilinear operation interpolates in the x and y directions based on known sample values. The TGV

Table 1: Results on the PASCAL VOC test set (%).

Method       OA     MAcc   FWIoU  MIoU
FCN-8s       91.13  78.68  84.59  64.59
FCN-TGV      92.33  80.12  87.49  66.02
SegNet       86.86  74.41  80.32  58.59
SegNet-TGV   87.46  76.08  83.45  60.48
U-Net        87.13  70.68  79.59  61.59
U-Net-TGV    89.14  72.12  81.49  63.89
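All four metrics in Table 1 can be derived from a per-pixel confusion matrix. As a minimal illustrative sketch (the function name and the toy matrix below are ours for exposition, not the evaluation code used in the experiments):

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute OA, MAcc, FWIoU, and MIoU from a KxK confusion matrix,
    where conf[i, j] counts pixels of true class i predicted as class j."""
    conf = conf.astype(np.float64)
    total = conf.sum()
    tp = np.diag(conf)                      # correctly classified pixels per class
    per_class_acc = tp / conf.sum(axis=1)   # per-class recall
    iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)
    freq = conf.sum(axis=1) / total         # class pixel frequencies
    return {
        "OA": tp.sum() / total,             # overall accuracy
        "MAcc": per_class_acc.mean(),       # mean accuracy
        "FWIoU": (freq * iou).sum(),        # frequency-weighted IoU
        "MIoU": iou.mean(),                 # mean IoU
    }

# toy two-class example
conf = np.array([[50, 10],
                 [ 5, 35]])
m = segmentation_metrics(conf)
print(m["OA"])  # 0.85
```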

Figure 9: Replacing the transposed convolutional layers in the FCN with the TGV upsampling method.


method has a more complex information source: it starts from the k-order divergence of the known pixel values (for k-order images). It can better preserve the boundary

information of textures, and the staircasing effect can be effectively avoided, raising the smoothness of the entire image at the same time.

Figure 10: Loss functions of SegNet (a) and SegNet-TGV (b) on the PASCAL VOC validation set.

Figure 11: Loss functions of FCN (a) and FCN-TGV (b) on the PASCAL VOC validation set.

Figure 12: Loss functions of U-Net (a) and U-Net-TGV (b) on the PASCAL VOC validation set.


From these experiments, the numerical results of the loss function and the smoothness of the sampled feature maps have shown that the TGV upsampling method is a great improvement over current methods.

5. Conclusions

In this work, we introduced a TGV upsampling method based on image restoration research. We transformed the process of feature map reconstruction into a loss-optimization problem. Based on the divergence of each order, we used TGV regularization, which reconstructs piecewise functions. For numerical optimization, we used a primal-dual algorithm, which can effectively perform parallel computing and result in high frame rates. We applied our new method to three networks and evaluated the results on the PASCAL VOC datasets. These tests proved that the TGV upsampling method can greatly make up for the lost smoothness and boundary information of maps. Compared to the original methods used in the networks (FCN, U-Net, and SegNet), this new method made average improvements of 1.4–2.3% in terms of MIoU accuracy when used in the decoding stage. When observing the training process in the experiments, we clearly found that the TGV upsampling method greatly made up for the information losses that resulted from other operations (mainly referring to various convolutional layers).

The proposed algorithm is not limited to single-layer map upsampling. In the future, it will be extended to understanding entire scenes and to effectively incorporating temporal coherence in a consistent way. Moreover, with the proposed "restoration-like" method, we will further concentrate on how to better understand latent semantics under unsupervised settings.

Data Availability

The code used to support the findings of this study has not been made available because the proposed method will later be applied in medical projects that involve personal privacy.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Research Program to Solve Social Issues of the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (no. NRF2017M3C8A8091768).

References

[1] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2018.

[2] X. Li, Z. Liu, P. Luo, C. C. Loy, and X. Tang, "Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 2017.

[3] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 2017.

[4] F. Knoll, K. Bredies, T. Pock, and R. Stollberger, "Second order total generalized variation (TGV) for MRI," Magnetic Resonance in Medicine, vol. 65, no. 2, pp. 480–491, 2011.

[5] A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, 2011.

[6] M. D. Zeiler, G. W. Taylor, and R. Fergus, "Adaptive deconvolutional networks for mid and high level feature learning," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), vol. 1, no. 2, 2011.

[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, December 2012.

[8] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, Munich, Germany, October 2015.

[9] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, June 2015.

[10] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: a deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.

[11] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, December 2015.

[12] D. R. Martin, C. C. Fowlkes, and J. Malik, "Learning to detect natural image boundaries using local brightness, color, and texture cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 530–549, 2004.

[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, https://arxiv.org/abs/1409.1556.

[14] V. Dumoulin and F. Visin, "A guide to convolution arithmetic for deep learning," 2016, https://arxiv.org/abs/1603.07285.

[15] M. D. Zeiler et al., Deconvolutional Networks, pp. 2528–2535, 2010.

[16] C. R. Vogel and M. E. Oman, "Fast total variation-based image reconstruction," in Proceedings of the 1995 ASME Design Engineering Conferences, vol. 3, part C, Boston, MA, USA, September 1995.

[17] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1–4, pp. 259–268, 1992.

[18] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proceedings of the European Conference on Computer Vision, Springer, Zurich, Switzerland, September 2014.

[19] V. Papyan, Y. Romano, and M. Elad, "Convolutional neural networks analyzed via convolutional sparse coding," 2016, https://arxiv.org/abs/1607.08194.

[20] L.-B. Qiao, B.-F. Zhang, J.-S. Su, and X.-C. Lu, "A systematic review of structured sparse learning," Frontiers of Information Technology & Electronic Engineering, vol. 18, no. 4, pp. 445–463, 2017.

[21] P. Wang, P. Chen, Y. Yuan, et al., "Understanding convolution for semantic segmentation," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, Lake Tahoe, NV, USA, March 2018.

[22] J. T. Springenberg et al., "Striving for simplicity: the all convolutional net," 2014, https://arxiv.org/abs/1412.6806.

[23] K. Bredies, K. Kunisch, and T. Pock, "Total generalized variation," SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 492–526, 2010.

[24] T. Pock and A. Chambolle, "Diagonal preconditioning for first order primal-dual algorithms in convex optimization," in Proceedings of the 2011 International Conference on Computer Vision, IEEE, Barcelona, Spain, November 2011.

[25] E. Esser, X. Zhang, and T. F. Chan, "A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science," SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 1015–1046, 2010.

[26] D. Ferstl, C. Reinbacher, R. Ranftl, M. Ruether, and H. Bischof, "Image guided depth upsampling using anisotropic total generalized variation," in Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, December 2013.

[27] W. He, H. Zhang, L. Zhang, and H. Shen, "Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 1, pp. 178–188, 2016.




For example, with four known point values x11 = (x1, y1), x12 = (x1, y2), x21 = (x2, y1), and x22 = (x2, y2), the value of the function τ can be found at the point (x, y).

X-direction:

$$Q(x, y_1) = \tau(x, y_1) \approx \frac{x_2 - x}{x_2 - x_1}\,\tau(x_{11}) + \frac{x - x_1}{x_2 - x_1}\,\tau(x_{21}),$$
$$Q(x, y_2) = \tau(x, y_2) \approx \frac{x_2 - x}{x_2 - x_1}\,\tau(x_{12}) + \frac{x - x_1}{x_2 - x_1}\,\tau(x_{22}). \tag{1}$$

Y-direction (yielded):

$$Q(x, y) = \tau(x, y) \approx \frac{y_2 - y}{y_2 - y_1}\,Q(x, y_1) + \frac{y - y_1}{y_2 - y_1}\,Q(x, y_2)$$
$$= \frac{\tau(x_{11})}{(x_2 - x_1)(y_2 - y_1)}(x_2 - x)(y_2 - y) + \frac{\tau(x_{21})}{(x_2 - x_1)(y_2 - y_1)}(x - x_1)(y_2 - y)$$
$$+ \frac{\tau(x_{12})}{(x_2 - x_1)(y_2 - y_1)}(x_2 - x)(y - y_1) + \frac{\tau(x_{22})}{(x_2 - x_1)(y_2 - y_1)}(x - x_1)(y - y_1). \tag{2}$$

With the above processes, it is possible to find an approximation value of any point (x, y) within the function interval (x ∈ (x1, x2), y ∈ (y1, y2)).
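Equations (1) and (2) translate directly into code. The following sketch (the function name and the test values are ours for illustration) evaluates the bilinear interpolant at an arbitrary (x, y):

```python
def bilinear(tau11, tau21, tau12, tau22, x1, x2, y1, y2, x, y):
    """Bilinear interpolation of tau at (x, y) from the four corner values
    tau(x1,y1), tau(x2,y1), tau(x1,y2), tau(x2,y2) -- equations (1)-(2)."""
    # X-direction: interpolate along x at the two rows y1 and y2 (equation (1))
    q_y1 = (x2 - x) / (x2 - x1) * tau11 + (x - x1) / (x2 - x1) * tau21
    q_y2 = (x2 - x) / (x2 - x1) * tau12 + (x - x1) / (x2 - x1) * tau22
    # Y-direction: interpolate the two partial results along y (equation (2))
    return (y2 - y) / (y2 - y1) * q_y1 + (y - y1) / (y2 - y1) * q_y2

# midpoint of the unit square: the average of the four corner values
print(bilinear(0.0, 1.0, 2.0, 3.0, 0, 1, 0, 1, 0.5, 0.5))  # 1.5
```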

2.2.2. Transposed Convolution. Strictly speaking, transposed convolution is a kind of convolution operation rather than a sampling method, but many researchers currently make it their first choice for making up for losses in the decoding stage.

From the perspective of convolution arithmetic, each convolution operation can be represented as y = C ∗ x, where C and x stand for the weight matrix and the input maps, respectively.

In the backpropagation period, assuming that the derivative of a layer ∂Loss/∂y (Loss means the loss of the whole network during training) is known, the corresponding derivatives can be written as

$$\frac{\partial \mathrm{Loss}}{\partial x_j} = \sum_i \frac{\partial \mathrm{Loss}}{\partial y_i}\,\frac{\partial y_i}{\partial x_j} = \sum_i \frac{\partial \mathrm{Loss}}{\partial y_i}\,C_{i,j}, \qquad \frac{\partial \mathrm{Loss}}{\partial x} = C^{T} \ast \frac{\partial \mathrm{Loss}}{\partial y}. \tag{3}$$

Equation (3) shows the relationship between ∂Loss/∂x and ∂Loss/∂y. It shows that transposed convolution actually multiplies by Cᵀ when doing a forward pass and by (Cᵀ)ᵀ = C when doing a backward pass.

When transposed convolution is used as the upsampling method, the situation is totally different from bilinear interpolation. The study of Dumoulin and Visin [14]

introduced four different cases where a transposed operation can be applied.

For instance (Figure 2), the transposed convolution over a 2×2 input producing a 3×3 output (stride 1) equals a reverse operation of the convolutional layer (y = C ∗ x with a 4×4 input and kernel size 2). The original 2×2 map is first padded with a 2×2 border of zeros, and then Cᵀ is multiplied.
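The matrix view of convolution and its transpose can be made concrete. The sketch below (the helper name and sizes are our own choices, not from the paper) builds the matrix C for a stride-1 valid convolution mapping 3×3 inputs to 2×2 outputs, then applies Cᵀ to upsample a 2×2 map back to 3×3:

```python
import numpy as np

def conv_matrix(kernel, in_h, in_w):
    """Build the dense matrix C such that y = C @ x.flatten() equals a
    valid (no-padding, stride-1) 2-D correlation of x with `kernel`."""
    kh, kw = kernel.shape
    out_h, out_w = in_h - kh + 1, in_w - kw + 1
    C = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            for a in range(kh):
                for b in range(kw):
                    # output pixel (i, j) reads input pixel (i+a, j+b)
                    C[i * out_w + j, (i + a) * in_w + (j + b)] = kernel[a, b]
    return C

kernel = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
C = conv_matrix(kernel, 3, 3)      # maps 3x3 inputs to 2x2 outputs, shape (4, 9)

y = np.arange(4.0)                 # a 2x2 feature map, flattened
x_up = (C.T @ y).reshape(3, 3)     # transposed convolution: 2x2 -> 3x3
print(x_up.shape)                  # (3, 3)
```

Multiplying by Cᵀ distributes each low-resolution value across the kernel footprint, which is exactly the "padding and extracting" behavior discussed in Section 4.2.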

2.2.3. Unpooling. Various combinations of layers have improved the efficiency of networks, but they brought a severe problem at the same time: parameters. Even a single convolutional layer requires a high number of weight parameters, which poses a significant challenge for CPUs or GPUs. As noted in Section 2, transposed convolution equals multiplying by the transposed matrix of the corresponding convolutional layer, meaning that we must spare enough memory to save these matrices when training.

First utilized in deconvolutional networks [15], unpooling is much simpler and easier to use.

Figure 3 shows the basic process of an unpooling operation. All numbers correspond to values in a feature map. Assume that, for some map input, the unpooling method records the indices of the largest values before performing max-pooling (from input → a). In the decoding stage, having received outputs b from previous layers together with the pooling indices (represented by the black grids), it is possible to upsample the pixel values inside to their original places (from b → output). In fact, unpooling keeps the positional information ("pooling indices") of the largest pixel values in each feature map when processing max-pooling operations. This action solves the problems of "directions" and "padding" at the same time.

Compared to transposed convolution, the advantage of unpooling lies in the number of parameters: only the pooling indices need to be recorded.
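The record-and-restore behavior of Figure 3 can be sketched as follows (the function names are ours; real frameworks provide equivalents, such as max-pooling that returns indices):

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max-pooling that also records the flat index ('pooling index')
    of each maximum, as required for later unpooling."""
    h, w = x.shape
    out = np.zeros((h // 2, w // 2))
    idx = np.zeros((h // 2, w // 2), dtype=int)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            window = x[i:i+2, j:j+2]
            a, b = np.unravel_index(np.argmax(window), (2, 2))
            out[i // 2, j // 2] = window[a, b]
            idx[i // 2, j // 2] = (i + a) * w + (j + b)
    return out, idx

def unpool(y, idx, shape):
    """Place each pooled value back at its recorded index; every other
    position stays zero (the 'blank spaces' discussed in Section 4.1.1)."""
    x = np.zeros(shape)
    x.flat[idx.ravel()] = y.ravel()
    return x

x = np.array([[ 1.,  2.,  3.,  4.],
              [ 5.,  6.,  7.,  8.],
              [ 9., 10., 11., 12.],
              [13., 14., 15., 16.]])
y, idx = max_pool_with_indices(x)   # y = [[6, 8], [14, 16]]
print(unpool(y, idx, x.shape))
```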

3. TGV Upsampling Algorithm

Inspired by the total variation model [16, 17], we introduce another upsampling method, named "TGV upsampling." We see upsampling methods as a way to combine feature map restoration with knowledge from image restoration (with the aim of making up for the loss of information). The background and details of our method are described as follows.

3.1. Problem Transformation. Although it has been several years since the first convolutional neural network (CNN) was used [7], the internal architecture can still be simply summarized. Most networks can be thought of as a combination of convolutional layers and max-pooling layers, and many research studies [14, 18–21] are available for understanding what a CNN is. Many previous studies pointed out that "max-pooling can be replaced by a convolutional layer with increased stride and without loss in accuracy at the same time" [22]. Certainly, we can think of network components with max-pooling and convolutional layers as a set of convolutional computations, regardless of what the internal architecture actually is.


As we discussed in previous sections, in the field of semantic segmentation, researchers have tended to divide the whole process into encoding and decoding stages, where the upsampling step always happens in the decoding stage. Many researchers thought that a "path" exists in convolution operations. From the perspective of feature maps, the encoding and decoding stages refer to the contracting and expanding paths, respectively; these two paths are more or less symmetrical. Under such circumstances, we treat upsampling as a restoration task on the feature maps produced by corresponding layers in the encoding stage.

For example, Figure 1 provides a simple view of a general symmetric-like network. The whole network is grouped into encoding and decoding stages, and layers in the two stages are represented by Ei and Cj, respectively (where i and j stand for the position of the layer). When i = j, meaning Einput = Coutput, we consider the outputs of Cj as the result of feature map restoration applied to Einput.

As shown in Figure 4, after a set of convolutional computations, a 4×4 feature map in the encoding stage becomes a 2×2 pixel matrix (this matrix is also the output of a certain layer). With a projection operation, this map can be recovered to the original size of the map in the encoding

Figure 3: Unpooling operation.

Figure 2: Transposed convolution.


stage (this temporary result is denoted by Cinput). The padded part (referring to the heavy blue grids) appears as blurred (noisy) areas; the rest of the work is a restoration job.

When comparing the left matrix with the right one, we formulate the basic loss function as follows:

$$\mathrm{loss}\left(u, C_{\mathrm{input}}\right) = \frac{1}{2}\left\| u\left(C_{\mathrm{input}}\right) - C_{\mathrm{input}} \right\|_2^2, \tag{4}$$

where u stands for our upsampled result and ‖·‖₂ denotes the 2-norm.

3.2. TGV-Based Upsampling. TGV [23] is mainly used to solve problems like image denoising and restoration, but in this paper we adapt it into a trainable upsampling method for feature maps. We increase the resolution of maps to a target size and then perform TGV on each map.

The first step in our method is projection, considering the size difference between low- and high-resolution maps. We first applied bilinear interpolation to get a preliminary result that has the same size as the maps in the encoding stage. These processed maps are treated as noisy areas (actually referring to the entire image), which are then handled by TGV restoration.

Since Bredies et al. [23] put forward the TGV model, it has been widely used in the field of image restoration. The TGV model greatly helps reconstruct an image from blurred data or noisy indirect measurements. We formulated the whole upsampling problem as a convex optimization model [12, 24–26]. The mathematical formulation of our model is outlined as

$$\min_u \left\{ \lambda\,\mathrm{loss}(u, x) + \mathrm{TGV}^k_\alpha(u) \right\}, \tag{5}$$

where loss(u, x) represents image fidelity, the parameter λ is used to weight previous operations to compute a global optimization, and TGVᵏ_α(u) stands for the regularization term.

From the visualization of convolutional networks [8], we know that convolutional computations can effectively extract and generalize features, the outputs of which can be seen as a set of features. To better understand the semantics of an image, the smoothness of textures and border edges is particularly important for feature maps.

For a k-order image, traditional restoration methods tend to fix the bounded variation seminorm, whereas the TGV model introduces a k-order function and incorporates information from different channels, which can effectively maintain texture discontinuities.

Given a k-order image u, we represent the TGV model as follows:

$$\mathrm{TGV}^k_\alpha(u) = \sup\left\{ \int_\Omega u\,\mathrm{div}^k v \,dx \;\Big|\; v \in C^k_c\!\left(\Omega, \mathrm{Sym}^k(\mathbb{R}^d)\right),\; \left\|\mathrm{div}^l v\right\|_\infty \le \alpha_l,\; l = 0, \dots, k-1 \right\}. \tag{6}$$

Let Symᵏ(ℝᵈ) represent the space of symmetric tensors of order k. By balancing derivatives from the first order to the kth order, the model greatly alleviates the problem of images containing different grey levels, which otherwise leads to boundary information loss as well as edge and corner loss.

However, in this work, aiming at feature maps, it turns out that a 2-order TGV is sufficient. We formulate the 2-order model as

$$\mathrm{TGV}^2_\alpha(u) = \min_{\omega}\left\{ a_1 \int_\Omega |\nabla u - \omega| + a_2 \int_\Omega |\varepsilon(\omega)| \right\}. \tag{7}$$

For a given 2-order u, the minimum of the TGV model is taken over all vector fields ω on the bounded domain; ε(ω) = ½(∇ω + ∇ωᵀ) stands for the symmetrized derivative. The parameters a₁ and a₂ are applied to balance the first and second derivatives.
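For intuition, the discrete form of equation (7) can be evaluated directly for a given u and ω. The sketch below uses forward differences with our own boundary handling; it illustrates the energy only and is not the paper's implementation:

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with zero (Neumann-style) boundary rows."""
    gx = np.zeros_like(u); gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy = np.zeros_like(u); gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gx, gy])

def tgv2_energy(u, w, a1, a2):
    """Discrete TGV^2 energy of equation (7):
       a1 * sum |grad(u) - w| + a2 * sum |eps(w)|,
    with eps(w) the symmetrized derivative of the vector field w."""
    g = grad(u) - w
    dw1 = grad(w[0]); dw2 = grad(w[1])
    # symmetrized derivative: diagonal entries and the shared off-diagonal
    e11, e22 = dw1[0], dw2[1]
    e12 = 0.5 * (dw1[1] + dw2[0])
    term1 = np.sqrt(g[0]**2 + g[1]**2).sum()
    term2 = np.sqrt(e11**2 + e22**2 + 2 * e12**2).sum()
    return a1 * term1 + a2 * term2

u = np.outer(np.arange(4.0), np.ones(4))   # a linear ramp image
w = grad(u)                                 # choosing w = grad(u) zeroes term 1
print(tgv2_energy(u, w, a1=0.05, a2=0.1))
```

Note how, on the ramp, setting ω = ∇u leaves only the second-order term, which is why TGV does not penalize linear (piecewise-affine) regions the way plain total variation does.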

The final TGV-based upsampling model, defined as a combination of the loss function (4) and the TGV term (7), is given by equation (8):

$$\min_{u,\omega}\left\{ \frac{\lambda}{2}\left\| u\left(C_{\mathrm{input}}\right) - C_{\mathrm{input}} \right\|_2^2 + a_1 \int_\Omega |\nabla u - \omega| + a_2 \int_\Omega |\varepsilon(\omega)| \right\}. \tag{8}$$

Meanwhile, unlike the fixed weights used in [23], we adapt TGV into a trainable method, taking the weights a₁ and a₂ into backpropagation to search for a suitable balance point.

3.3. Optimization Methods. Considering that the upsampling model we proposed (formula (8)) is convex but not smooth, we used the primal-dual scheme [24, 25, 27] to solve this problem. We reformulated our upsampling model as a convex-concave saddle-point problem by introducing two dual variables, m and n. The transformed model of formula (8) is given by

$$\min_{u,\omega}\,\max_{m,n}\left\{ \frac{\lambda}{2} \sum_{i,j \in \Omega} \left( u\left(C_{ij}\right) - E_{ij} \right)^2 + a_1 \left\langle \nabla u - \omega, m \right\rangle + a_2 \left\langle \nabla \omega, n \right\rangle \right\}. \tag{9}$$

Figure 4: Pretreatment of feature maps.


The feasible sets of m and n are defined as follows:

$$M = \left\{ m: \Omega_u \to \mathbb{R}^2 \;\Big|\; \left| m_{ij} \right| \le 1,\; i, j \in \Omega_u \right\},$$
$$N = \left\{ n: \Omega_u \to \mathbb{R}^2 \;\Big|\; \left| n_{ij} \right| \le 1,\; i, j \in \Omega_u \right\}. \tag{10}$$

The final result is computed pixel by pixel via iterative optimization. With u⁰ = Einput and ω⁰ = m⁰ = n⁰ = 0, the step sizes σ_m, σ_n, τ_u, and τ_ω are chosen. For each iteration i ≥ 0, the variables are updated as follows:

$$m^{i+1} = \mathrm{Proj}_M\!\left( m^i + \sigma_m a_1 \left( \nabla \tilde{u}^i - \tilde{\omega}^i \right) \right),$$
$$n^{i+1} = \mathrm{Proj}_N\!\left( n^i + \sigma_n a_2 \nabla \tilde{\omega}^i \right),$$
$$u^{i+1} = \left( 1 + \frac{\lambda}{2}\tau_u \right)^{-1}\left[ u^i + \tau_u \left( a_1 m^{i+1} + \frac{\lambda}{2} C_{\mathrm{input}} \right) \right],$$
$$\omega^{i+1} = \omega^i + \tau_\omega \left( a_1 n^{i+1} + a_2 m^{i+1} \right),$$
$$\tilde{u}^{i+1} = u^{i+1} + \theta\left( u^{i+1} - u^i \right),$$
$$\tilde{\omega}^{i+1} = \omega^{i+1} + \theta\left( \omega^{i+1} - \omega^i \right), \tag{11}$$

where Proj_M(m) and Proj_N(n) denote the Euclidean projectors onto the sets M and N, respectively. They can be calculated by pointwise operations:

$$\mathrm{Proj}_M(m) = \frac{m}{\max\left(1, |m|/a_1\right)}, \qquad \mathrm{Proj}_N(n) = \frac{n}{\max\left(1, |n|/a_1\right)}. \tag{12}$$

In equation (11), θ stands for the relaxation parameter. It can be updated iteration by iteration using preconditioning [5]. Hence, the TGV-based model can achieve globally optimal upsampling solutions when dealing with 2-order images.
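To make the scheme of equations (11) and (12) concrete, here is a primal-dual TGV² restoration sketch for a single-channel map. The discrete divergence, the unit-radius projections, the step sizes, and all parameter values are our own simplifications for exposition; this is not the authors' trainable implementation:

```python
import numpy as np

def dx(u):
    d = np.zeros_like(u); d[:-1, :] = u[1:, :] - u[:-1, :]; return d

def dy(u):
    d = np.zeros_like(u); d[:, :-1] = u[:, 1:] - u[:, :-1]; return d

def div(p1, p2):
    """Discrete divergence (negative adjoint of the forward gradient)."""
    d = np.zeros_like(p1)
    d[0, :] = p1[0, :]; d[1:, :] = p1[1:, :] - p1[:-1, :]
    d2 = np.zeros_like(p2)
    d2[:, 0] = p2[:, 0]; d2[:, 1:] = p2[:, 1:] - p2[:, :-1]
    return d + d2

def proj_unit(*comps):
    """Pointwise projection onto the unit ball, in the spirit of equation (12)."""
    mag = np.sqrt(sum(c**2 for c in comps))
    scale = np.maximum(1.0, mag)
    return [c / scale for c in comps]

def tgv_denoise(f, lam=8.0, a1=1.0, a2=2.0, tau=0.05, sigma=0.05,
                theta=1.0, iters=200):
    """Primal-dual TGV^2 restoration sketch, equation (11) style.
    u: restored map, w: auxiliary vector field, m/n: dual variables."""
    u = f.copy(); ub = u.copy()
    w1 = np.zeros_like(f); w2 = np.zeros_like(f)
    wb1 = w1.copy(); wb2 = w2.copy()
    m1 = np.zeros_like(f); m2 = np.zeros_like(f)
    n11 = np.zeros_like(f); n22 = np.zeros_like(f); n12 = np.zeros_like(f)
    for _ in range(iters):
        # dual ascent with projection onto the feasible sets (equation (12))
        m1, m2 = proj_unit(m1 + sigma * a1 * (dx(ub) - wb1),
                           m2 + sigma * a1 * (dy(ub) - wb2))
        e12 = 0.5 * (dy(wb1) + dx(wb2))   # symmetrized derivative of w
        n11, n22, n12 = proj_unit(n11 + sigma * a2 * dx(wb1),
                                  n22 + sigma * a2 * dy(wb2),
                                  n12 + sigma * a2 * e12)
        # primal descent; the u-step is the proximal map of the fidelity term
        u_old, w1_old, w2_old = u, w1, w2
        u = (u + tau * (a1 * div(m1, m2) + lam * f)) / (1.0 + tau * lam)
        w1 = w1 + tau * (a1 * m1 + a2 * div(n11, n12))
        w2 = w2 + tau * (a1 * m2 + a2 * div(n12, n22))
        # overrelaxation with parameter theta
        ub = u + theta * (u - u_old)
        wb1 = w1 + theta * (w1 - w1_old); wb2 = w2 + theta * (w2 - w2_old)
    return u

rng = np.random.default_rng(0)
f = np.outer(np.linspace(0, 1, 16), np.ones(16)) + 0.1 * rng.standard_normal((16, 16))
u = tgv_denoise(f)
print(u.shape)  # (16, 16)
```

The small step sizes keep the product τσ well below the inverse squared operator norm, which is the usual stability condition for first-order primal-dual schemes [5].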

4. Evaluation

In this section, we make a quantitative evaluation of the proposed TGV-based upsampling model. We compared our model with current upsampling methods from different perspectives. We used the PASCAL VOC2012 dataset (class segmentation) to investigate the detailed performance. In the following experiments, we manually set the TGV-based upsampling model's parameters a₁ and a₂ (we initially set a₁ = 0.05 and a₂ = 0.1; both of them can be trained over iterations).

We retrained the original networks (FCN, U-Net, and SegNet) and their modified versions with the TGV upsampling algorithm on the PASCAL training set (with 1464 images).

For a reconstruction comparison on feature maps (Figures 5–7), we evaluated the above four methods on the PASCAL validation set (with 1449 images). We randomly selected feature maps from a certain layer in FCN, U-Net, and SegNet and compared the upsampled results of the different methods.

For overall comparisons of segmentation results (Figure 8), we used the PASCAL test set (with 1456 images). For quantitative results (with 1456 images), we adopted overall accuracy (OA), mean accuracy (MAcc), frequency-weighted intersection over union (FWIoU), and mean intersection over union (MIoU) as metrics.

4.1. Experiments

411 Reconstruction of Intermediate Feature Maps In thisstage we replaced the original upsampling layer in all threenetworks (FCN U-Net and SegNet) with our TGVupsampling method (see Figure 9 with FCN as an example)To better compare results we illustrated the performance ofTGV upsampling when dealing with the same inputs (fea-ture maps produced by previous layers) and so on replacingbilinear operations in U-Net and unpooling in SegNet

Considering that certain upsampling methods maymultiply when used in models we only selected one com-parison regardless of the size of the inputs and how manyupsampling operations appeared in the networks Featuremaps in Figures 5ndash7 are the most representative of activationoutputs in each layer in the training process

In Figures 5ndash7 (illustrating the comparisons of thefeature maps) all input maps are from some layer in theencoding stage We found that TGV upsampling performedbetter at saving the textures and edges of the feature map InFigure 5 with the same input maps from the FCN archi-tecture we observed that compared with transposed con-volution reconstruction TGV upsampling reproducedalmost all classes (represented in different colors) thatappeared in input maps While being a trainable operationthe transposed operation processes inputs following aldquopadding and extractingrdquo pipeline ere absolutely exists acertain chance that sampled maps contain some classes thatin fact are contained in input maps or a class may be filteredIt can be clearly seen that in the second and third inputsubmaps (Figure 5) the transposed operation did notcompletely reproduce all objects appearing in the input mapEven if these cases are ignored it is easy to observe that TGVupsampling is better in reconstructing texture than trans-posed operations

As illustrated in Figure 6 bilinear interpolation recoveredtoo much unnecessary information about known classes fromboth x and y directions of the object compared to theprinciple of TGV upsampling reconstruction based on thehigh-order divergence of the known pixel values e sourceof the TGV modelrsquos information was more reasonable andmore complex e reconstruction results of TGV upsam-pling were also better than those of bilinear interpolation

Lastly Figure 7 compares unpooling and TGV upsam-pling Since unpooling only records the locations of maxi-mum activations when performing pooling operations in theencoding stage it is clear that the unpooling reconstruction isnot complete as there are many blank spaces in places wheretexture should appear In general the efficiency of theunpooling operation is the worst of the three in terms ofvisualization results

6 Computational Intelligence and Neuroscience

412 Training in TGV Upsampling In this experiment wecompared overall segmentation results with those of theoriginal networks

Figure 8 illustrates examples of PASCAL VOC outputs from FCN (FCN-8s), U-Net, SegNet, and the modified TGV-upsampling-based networks (denoted by FCN-TGV, U-Net-TGV, and SegNet-TGV, respectively). Compared to the coarse feature maps of the original networks, class (or object) recognition (referring to the interval areas between objects) in our results is more precise (see SegNet vs. SegNet-TGV), and predictions spilling beyond the ground-truth areas are greatly reduced (see FCN vs. FCN-TGV). At the same time, the modified networks preserved the smoothness of the boundaries well. We conclude that the TGV upsampling method helps models make better predictions on semantics.

Figure 5: Comparison between a transposed operation and TGV upsampling for the same feature maps (in FCN). (a) Input maps. (b) TGV upsample. (c) Transposed. (d) Ground truth.

Figure 6: Comparison between a bilinear operation and TGV upsampling for the same feature maps (in U-Net). (a) Input maps. (b) TGV upsample. (c) Bilinear. (d) Ground truth.

Figure 7: Comparison between an unpooling operation and TGV upsampling for the same feature maps (in SegNet). (a) Input maps. (b) TGV upsample. (c) Unpooling. (d) Ground truth.

Figure 8: Overall segmentation example results on PASCAL VOC. GT stands for ground truth; F, S, and U represent results from FCN, SegNet, and U-Net, while FT, ST, and UT denote the three networks combined with TGV upsampling.

The quantitative results on the VOC test set for our proposed algorithm and its competitors are presented in Table 1. We observed that overall accuracy, mean accuracy, frequency-weighted IoU, and MIoU all improved by about 1.4–2.3%, consistent with the visualization results in Figure 10.
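All four of these metrics can be derived from a single per-class confusion matrix. As a reference sketch (our own helper, not the authors' evaluation code; it assumes every class appears at least once in the ground truth):

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute OA, MAcc, FWIoU, and MIoU from a (C x C) confusion matrix
    whose entry [i, j] counts pixels of true class i predicted as class j."""
    total = conf.sum()
    tp = np.diag(conf)                            # correctly classified pixels per class
    gt_pixels = conf.sum(axis=1)                  # ground-truth pixels per class
    pred_pixels = conf.sum(axis=0)                # predicted pixels per class
    union = gt_pixels + pred_pixels - tp
    oa = tp.sum() / total                         # overall pixel accuracy
    macc = np.mean(tp / gt_pixels)                # mean per-class accuracy
    iou = tp / union
    miou = np.mean(iou)                           # mean intersection over union
    fwiou = np.sum(gt_pixels / total * iou)       # frequency-weighted IoU
    return oa, macc, fwiou, miou
```

PASCAL VOC evaluations typically also exclude "void"-labeled boundary pixels before accumulating the confusion matrix.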

Figures 10–12 show the effect of applying the new upsampling method (the x and y axes represent iterations and losses, respectively). We found that the TGV upsampling algorithm had a significant effect on reducing losses compared to the original methods; with a fixed batch size, our new method converged faster than the original models.

4.1.3. Loss Function in the Training Process. The loss functions of FCN, FCN-TGV, U-Net, U-Net-TGV, SegNet, and SegNet-TGV on the PASCAL VOC validation set are shown in Figures 10–12.

4.2. Analysis. Having reviewed all four upsampling operations introduced in the sections above, we can group them into two types based on their different information sources.

4.2.1. Back-Up (Including Unpooling and Transposed Convolution Operations). Many researchers think that when reconstructing feature maps, the intermediate results produced by previous operations should be consulted (especially the layers in the encoding stage), as if images were processed along a "path": knowing the road conditions behind (the corresponding layers in the encoding stage) makes it possible to better "predict" the road conditions ahead (the feature-map reconstructions). Thus, when applying these methods, information from previous layers must be saved. For example, the unpooling operation is essentially an index store: with the saved indices, feature maps can be reconstructed well.

However, in both the transposed and unpooling operations, it is unavoidable that the processed results retain a certain part of the intermediate results produced by previous layers. In some cases this is useful for keeping important information that might otherwise be overlooked in the training process, but it can also be a disturbance, bringing along unnecessary information.
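The index bookkeeping behind unpooling can be sketched in a few lines of numpy (a toy sketch of the mechanism, not any framework's implementation; all names are ours):

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling over a 2-D map that also records the flat index of
    each maximum, mimicking the index storage that unpooling relies on."""
    h, w = x.shape
    out = np.zeros((h // k, w // k), dtype=x.dtype)
    idx = np.zeros((h // k, w // k), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            block = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            a = int(np.argmax(block))
            di, dj = divmod(a, k)
            out[i, j] = block[di, dj]
            idx[i, j] = (i * k + di) * w + (j * k + dj)  # flat position of the max
    return out, idx

def max_unpool(pooled, idx, shape):
    """Write each pooled value back to its recorded location; zeros elsewhere."""
    out = np.zeros(shape, dtype=pooled.dtype)
    out.flat[idx.ravel()] = pooled.ravel()
    return out
```

The zero-filled positions in the unpooled map are exactly the blank regions visible in the unpooling reconstructions of Figure 7.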

4.2.2. From-Itself (Including Bilinear Interpolation and Our TGV Upsampling Operation). In contrast to the "back-up" category, the information that bilinear interpolation and TGV upsampling use to make up for losses comes from the methods' own processes rather than from the results of previous operations. As described in Related Works, the bilinear operation interpolates along the x and y directions based on known sample values. The TGV method has a more complex information source: it starts from the k-order divergences of the known pixel values (for k-order images). It can better preserve the boundary information of textures, and the staircase effect can be effectively avoided, raising the smoothness of the entire image at the same time.

Table 1: Results on the PASCAL VOC test set.

Method       OA     MAcc   FWIoU  MIoU
FCN-8s       91.13  78.68  84.59  64.59
FCN-TGV      92.33  80.12  87.49  66.02
SegNet       86.86  74.41  80.32  58.59
SegNet-TGV   87.46  76.08  83.45  60.48
U-Net        87.13  70.68  79.59  61.59
U-Net-TGV    89.14  72.12  81.49  63.89

Figure 9: Replacing the transposed convolutional layers in the FCN with the TGV upsampling method.

Figure 10: Loss function of SegNet (a) and SegNet-TGV (b) on the PASCAL VOC validation set.

Figure 11: Loss function of FCN (a) and FCN-TGV (b) on the PASCAL VOC validation set.

Figure 12: Loss function of U-Net (a) and U-Net-TGV (b) on the PASCAL VOC validation set.


From the results of these experiments, the numerical behavior of the loss function and the smoothness of the sampled feature maps show that the TGV upsampling method is a clear improvement over current methods.

5. Conclusions

In this work, we introduced a TGV upsampling method based on image restoration research. We transformed the process of feature map reconstruction into a loss-optimization problem. Based on the divergences of each order, we used TGV regularization, which reconstructs piecewise polynomial functions. For numerical optimization, we used a primal-dual algorithm, which can effectively exploit parallel computing and achieve high frame rates. We applied the new method to three networks and evaluated the results on PASCAL VOC datasets. These tests showed that TGV upsampling can largely make up for the lost smoothness and boundary information of feature maps. Compared to the original methods used in the networks (FCN, U-Net, and SegNet), the new method brought average improvements of 1.4–2.3% in terms of MIoU accuracy when used in the decoding stage. Observing the training process in the experiments, we clearly found that TGV upsampling substantially compensated for the information losses caused by other operations (mainly the various convolutional layers).

The proposed algorithm is not limited to single-layer map upsampling. In the future, it will be extended to understanding entire scenes and to effectively incorporating temporal coherence in a consistent way. In addition, with the proposed "restoration-like" method, we will further concentrate on how to better understand latent semantics under unsupervised settings.

Data Availability

The code used to support the findings of this study has not been made available because the proposed method will later be applied in medical projects that involve personal privacy.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Research Program to Solve Social Issues of the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (no. NRF2017M3C8A8091768).

References

[1] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2018.

[2] X. Li, Z. Liu, P. Luo, C. C. Loy, and X. Tang, "Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 2017.

[3] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 2017.

[4] F. Knoll, K. Bredies, T. Pock, and R. Stollberger, "Second order total generalized variation (TGV) for MRI," Magnetic Resonance in Medicine, vol. 65, no. 2, pp. 480–491, 2011.

[5] A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, 2011.

[6] M. D. Zeiler, G. W. Taylor, and R. Fergus, "Adaptive deconvolutional networks for mid and high level feature learning," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), vol. 1, no. 2, 2011.

[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, December 2012.

[8] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, Munich, Germany, October 2015.

[9] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, June 2015.

[10] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: a deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.

[11] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, December 2015.

[12] D. R. Martin, C. C. Fowlkes, and J. Malik, "Learning to detect natural image boundaries using local brightness, color, and texture cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 530–549, 2004.

[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, https://arxiv.org/abs/1409.1556.

[14] V. Dumoulin and F. Visin, "A guide to convolution arithmetic for deep learning," 2016, https://arxiv.org/abs/1603.07285.

[15] M. D. Zeiler et al., "Deconvolutional networks," pp. 2528–2535, 2010.

[16] C. R. Vogel and M. E. Oman, "Fast total variation-based image reconstruction," in Proceedings of the 1995 ASME Design Engineering Conferences, vol. 3, part C, Boston, MA, USA, September 1995.

[17] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1–4, pp. 259–268, 1992.

[18] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proceedings of the European Conference on Computer Vision, Springer, Zurich, Switzerland, September 2014.

[19] V. Papyan, Y. Romano, and M. Elad, "Convolutional neural networks analyzed via convolutional sparse coding," 2016, https://arxiv.org/abs/1607.08194.

[20] L.-B. Qiao, B.-F. Zhang, J.-S. Su, and X.-C. Lu, "A systematic review of structured sparse learning," Frontiers of Information Technology & Electronic Engineering, vol. 18, no. 4, pp. 445–463, 2017.

[21] P. Wang, P. Chen, Y. Yuan et al., "Understanding convolution for semantic segmentation," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, Lake Tahoe, NV, USA, March 2018.

[22] J. T. Springenberg et al., "Striving for simplicity: the all convolutional net," 2014, https://arxiv.org/abs/1412.6806.

[23] K. Bredies, K. Kunisch, and T. Pock, "Total generalized variation," SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 492–526, 2010.

[24] T. Pock and A. Chambolle, "Diagonal preconditioning for first order primal-dual algorithms in convex optimization," in Proceedings of the 2011 International Conference on Computer Vision, IEEE, Barcelona, Spain, November 2011.

[25] E. Esser, X. Zhang, and T. F. Chan, "A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science," SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 1015–1046, 2010.

[26] D. Ferstl, C. Reinbacher, R. Ranftl, M. Ruether, and H. Bischof, "Image guided depth upsampling using anisotropic total generalized variation," in Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, December 2013.

[27] W. He, H. Zhang, L. Zhang, and H. Shen, "Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 1, pp. 178–188, 2016.


As we discussed in previous sections, in the field of semantic segmentation, researchers tend to divide the whole process into encoding and decoding stages, where the upsampling step always happens in the decoding stage. Many researchers have thought of convolution operations as following a "path": from the perspective of feature maps, the encoding and decoding stages correspond to the contracting and expanding paths, respectively, and these two paths are more or less symmetrical. Under such circumstances, we treat upsampling as a task of restoring the feature maps produced by the corresponding layers in the encoding stage.

For example, Figure 1 provides a simple view of a general symmetric-like network. The whole network is grouped into encoding and decoding stages, and layers in the two stages are represented by E_i and C_j, respectively (where i and j index the layers). When i = j, E_input corresponds to C_output, and we consider the outputs of C_j as the result of feature-map restoration of E_input.

As shown in Figure 4, after a set of convolutional computations, a 4×4 feature map in the encoding stage becomes a 2×2 pixel matrix (this matrix is also the output of a certain layer). With a projection operation, this map can be recovered to the original size of the map in the encoding stage (this temporary result is denoted by C_input). The padding part (the heavy blue grids) appears as blurred (noisy) areas; the rest of the work is a restoration job.

Figure 2: Transposed convolution (from the original map to the upsampled map).

Figure 3: Unpooling operation (pooled values are written back to the positions given by the stored pooling indices; all other positions are zero).

When comparing the left matrix with the right one, we formulate the basic loss function as follows:

$$\operatorname{loss}\left(u, C_{\text{input}}\right) = \frac{1}{2}\left\|u\left(C_{\text{input}}\right) - C_{\text{input}}\right\|_2^2, \tag{4}$$

where u stands for our upsampled result and the subscript 2 denotes the 2-norm.
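Equation (4) is a plain squared-error data term; a minimal numpy sketch (the function name and array values are illustrative):

```python
import numpy as np

def fidelity_loss(u, c_input):
    """Equation (4): half the squared 2-norm of the residual between the
    restored map u(C_input) and the projected map C_input."""
    return 0.5 * np.sum((u - c_input) ** 2)
```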

3.2. TGV-Based Upsampling. TGV [23] is mainly used to solve problems such as image denoising and restoration, but in this paper we adapt it into a trainable upsampling method for feature maps: we increase the resolution of the maps to the target size and then perform TGV restoration on each map.

The first step of our method is projection, considering the size difference between the low- and high-resolution maps. We first apply bilinear interpolation to get a preliminary result with the same size as the maps in the encoding stage. These processed maps are treated as noisy areas (actually referring to the entire image) to be handled by the TGV restoration.
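This projection step is plain bilinear resizing; a minimal numpy sketch (align-corners sampling is assumed here, which is one of several common conventions, and the function name is ours):

```python
import numpy as np

def bilinear_upsample(x, out_h, out_w):
    """Bilinear resize of a single 2-D map to (out_h, out_w)."""
    in_h, in_w = x.shape
    # sample positions in input coordinates (align-corners convention)
    ys = np.linspace(0.0, in_h - 1.0, out_h)
    xs = np.linspace(0.0, in_w - 1.0, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]            # vertical interpolation weights
    wx = (xs - x0)[None, :]            # horizontal interpolation weights
    top = x[np.ix_(y0, x0)] * (1.0 - wx) + x[np.ix_(y0, x1)] * wx
    bot = x[np.ix_(y1, x0)] * (1.0 - wx) + x[np.ix_(y1, x1)] * wx
    return top * (1.0 - wy) + bot * wy
```

In practice, a deep learning framework's built-in resize would be used; the point is only that the preliminary high-resolution map is produced from the low-resolution map itself.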

Since Bredies et al. [23] put forward the TGV model, it has been widely used in the field of image restoration; the TGV model can reconstruct an image from blurred data or noisy indirect measurements. We formulate the whole upsampling problem as a convex optimization model [12, 24–26]:

$$\min_u \left\{ \lambda \operatorname{loss}(u, x) + \mathrm{TGV}_\alpha^k(u) \right\}, \tag{5}$$

where loss(u, x) represents the image fidelity term, the parameter λ weights the fidelity against the regularization in the global optimization, and TGV_α^k(u) stands for the regularization term.

From visualizations of convolutional networks [8], we know that convolutional computations can effectively extract and generalize features, and their outputs can be seen as sets of features. To better understand the semantics of an image, the smoothness of textures and border edges is particularly important for feature maps.

For a k-order image, traditional restoration methods tend to fix the bounded-variation seminorm, whereas the TGV model introduces a k-order functional and incorporates information from different channels, which can effectively maintain texture discontinuities.

Given a k-order image u, we represent the TGV model as follows:

$$\mathrm{TGV}_\alpha^k(u) = \sup\left\{ \int_\Omega u \operatorname{div}^k v \, \mathrm{d}x \;\middle|\; v \in C_c^k\!\left(\Omega, \operatorname{Sym}^k\!\left(\mathbb{R}^d\right)\right),\ \left\|\operatorname{div}^l v\right\|_\infty \le \alpha_l,\ l = 0, \ldots, k-1 \right\}, \tag{6}$$

where Sym^k(R^d) represents the space of symmetric tensors of order k. By balancing derivatives from the first order to the k-th order, TGV greatly alleviates the staircasing problems that arise when an image contains different grey levels, which would otherwise lead to loss of boundary information as well as edge and corner degradation.

In this work, however, aiming at feature maps, it turns out that second-order TGV is sufficient. We formulate the 2-order model as

$$\mathrm{TGV}_\alpha^2(u) = \min_\omega \left\{ \alpha_1 \int_\Omega \left|\nabla u - \omega\right| + \alpha_2 \int_\Omega \left|\varepsilon(\omega)\right| \right\}. \tag{7}$$

For a given 2-order image u, the minimum of the TGV model is taken over all vector fields ω on the bounded domain, and ε(ω) = (1/2)(∇ω + ∇ω^T) stands for the symmetrized derivative. The parameters α_1 and α_2 balance the first and second derivatives.

The final TGV-based upsampling model is defined as the combination of the loss function (4) and the TGV term (7):

$$\min_{u,\,\omega} \left\{ \frac{\lambda}{2}\left\|u\left(C_{\text{input}}\right) - C_{\text{input}}\right\|_2^2 + \alpha_1 \int_\Omega \left|\nabla u - \omega\right| + \alpha_2 \int_\Omega \left|\varepsilon(\omega)\right| \right\}. \tag{8}$$

Meanwhile, unlike the fixed weights used in [23], we adapt TGV into a trainable method, taking the weights α_1 and α_2 into backpropagation to search for a suitable balance point.

3.3. Optimization Methods. Considering that the proposed upsampling model (equation (8)) is convex but not smooth, we used the primal-dual scheme [24, 25, 27] to solve it. We reformulated the upsampling model as a convex-concave saddle-point problem by introducing two dual variables, m and n. The transformed model of equation (8) is given by

$$\min_{u,\,\omega}\,\max_{m,\,n}\; \frac{\lambda}{2} \sum_{i,j \in \Omega} \left(u\left(C_{ij}\right) - E_{ij}\right)^2 + \alpha_1 \left\langle \nabla u - \omega,\, m \right\rangle + \alpha_2 \left\langle \varepsilon(\omega),\, n \right\rangle. \tag{9}$$

Figure 4: Pretreatment of feature maps. A feature map E_input passes through convolutional operations, and the resulting low-resolution map is projected back to the original size, giving C_input with blank (to-be-restored) positions.


The feasible sets of m and n are defined as follows:

$$M = \left\{ m : \Omega_u \to \mathbb{R}^2 \;\middle|\; \left\|m_{ij}\right\| \le \alpha_1,\ i, j \in \Omega_u \right\}, \qquad N = \left\{ n : \Omega_u \to \mathbb{R}^2 \;\middle|\; \left\|n_{ij}\right\| \le \alpha_2,\ i, j \in \Omega_u \right\}. \tag{10}$$

The final result is computed pixel by pixel via iterative optimization. With u^0 = E_input and ω^0 = m^0 = n^0 = 0, step sizes σ_m, σ_n, τ_u, and τ_ω are chosen. For iteration i ≥ 0, the variables are updated as follows:

$$\begin{aligned}
m^{i+1} &= \operatorname{Proj}_M\!\left( m^i + \sigma_m \alpha_1 \left( \nabla \bar{u}^i - \bar{\omega}^i \right) \right),\\
n^{i+1} &= \operatorname{Proj}_N\!\left( n^i + \sigma_n \alpha_2\, \varepsilon\!\left(\bar{\omega}^i\right) \right),\\
u^{i+1} &= \left( 1 + \frac{\lambda}{2}\tau_u \right)^{-1} \left[ u^i + \tau_u \left( \alpha_1 \operatorname{div} m^{i+1} + \frac{\lambda}{2} C_{\text{input}} \right) \right],\\
\omega^{i+1} &= \omega^i + \tau_\omega \left( \alpha_1 m^{i+1} + \alpha_2 \operatorname{div} n^{i+1} \right),\\
\bar{u}^{i+1} &= u^{i+1} + \theta\left( u^{i+1} - u^i \right),\\
\bar{\omega}^{i+1} &= \omega^{i+1} + \theta\left( \omega^{i+1} - \omega^i \right),
\end{aligned} \tag{11}$$

where Proj_M(m) and Proj_N(n) denote the Euclidean projectors onto the sets M and N, respectively. They can be calculated by pointwise operations:

$$\operatorname{Proj}_M(m) = \frac{m}{\max\left(1, \left|m\right| / \alpha_1\right)}, \qquad \operatorname{Proj}_N(n) = \frac{n}{\max\left(1, \left|n\right| / \alpha_2\right)}. \tag{12}$$

In equation (11), θ stands for the relaxation parameter; it can be updated iteration by iteration using preconditioning [5]. Hence, the TGV-based model can achieve globally optimal upsampling solutions when dealing with 2-order images.
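The iteration above can be sketched numerically. This is a sketch under common primal-dual conventions (forward-difference gradients with Neumann boundaries, the weights α_1 and α_2 folded into the dual-ball radii, a single step size per variable group, and θ = 1), not the authors' exact implementation; all function names are ours:

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary (zero at the far edge)."""
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[:, 0] = px[:, 0]
    dx[:, 1:] = px[:, 1:] - px[:, :-1]
    dy[0, :] = py[0, :]
    dy[1:, :] = py[1:, :] - py[:-1, :]
    return dx + dy

def sym_grad(w1, w2):
    """Symmetrized gradient eps(w); returns (e11, e22, e12)."""
    w1x, w1y = grad(w1)
    w2x, w2y = grad(w2)
    return w1x, w2y, 0.5 * (w1y + w2x)

def sym_div(n11, n22, n12):
    """Negative adjoint of sym_grad on symmetric tensor fields."""
    return div(n11, n12), div(n12, n22)

def tgv2_energy(u, w1, w2, f, lam, a1, a2):
    """Discrete version of the objective (8)."""
    gx, gy = grad(u)
    e11, e22, e12 = sym_grad(w1, w2)
    return (0.5 * lam * np.sum((u - f) ** 2)
            + a1 * np.sum(np.sqrt((gx - w1) ** 2 + (gy - w2) ** 2))
            + a2 * np.sum(np.sqrt(e11 ** 2 + e22 ** 2 + 2 * e12 ** 2)))

def tgv2_restore(f, lam=8.0, a1=0.1, a2=0.2, iters=300, tau=0.25, sigma=0.25):
    """Primal-dual TGV^2 restoration of a single 2-D map f."""
    u, w1, w2 = f.copy(), np.zeros_like(f), np.zeros_like(f)
    ub, w1b, w2b = u.copy(), w1.copy(), w2.copy()
    m1, m2 = np.zeros_like(f), np.zeros_like(f)
    n11, n22, n12 = np.zeros_like(f), np.zeros_like(f), np.zeros_like(f)
    for _ in range(iters):
        # dual ascent, then projection onto balls of radius a1 / a2
        gx, gy = grad(ub)
        m1 += sigma * (gx - w1b)
        m2 += sigma * (gy - w2b)
        s = np.maximum(1.0, np.sqrt(m1 ** 2 + m2 ** 2) / a1)
        m1, m2 = m1 / s, m2 / s
        e11, e22, e12 = sym_grad(w1b, w2b)
        n11, n22, n12 = n11 + sigma * e11, n22 + sigma * e22, n12 + sigma * e12
        s = np.maximum(1.0, np.sqrt(n11 ** 2 + n22 ** 2 + 2 * n12 ** 2) / a2)
        n11, n22, n12 = n11 / s, n22 / s, n12 / s
        # primal descent (closed-form proximal step for the quadratic fidelity)
        u_old, w1_old, w2_old = u, w1, w2
        u = (u + tau * (div(m1, m2) + lam * f)) / (1.0 + tau * lam)
        d1, d2 = sym_div(n11, n22, n12)
        w1 = w1 + tau * (m1 + d1)
        w2 = w2 + tau * (m2 + d2)
        # over-relaxation with theta = 1
        ub, w1b, w2b = 2 * u - u_old, 2 * w1 - w1_old, 2 * w2 - w2_old
    return u, w1, w2
```

In the paper's pipeline, this restoration would be applied to each bilinearly projected feature map, with the weights additionally learned by backpropagation.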

4. Evaluation

In this section, we quantitatively evaluate the proposed TGV-based upsampling model, comparing it with current upsampling methods from different perspectives. We used the PASCAL VOC2012 dataset (class segmentation) to investigate the detailed performance. In the following experiments, we manually initialized the TGV-based upsampling model's parameters α_1 and α_2 (we initially set α_1 = 0.05 and α_2 = 0.1; both are then trained over iterations).

We retrained the original networks (FCN, U-Net, and SegNet) and their modified versions with the TGV upsampling algorithm on the PASCAL training set (1464 images).

For the reconstruction comparison on feature maps (Figures 5–7), we evaluated the four upsampling methods on the PASCAL validation set (1449 images). We randomly selected feature maps from a certain layer in FCN, U-Net, and SegNet and compared the upsampled results of the different methods.

For the overall comparisons of segmentation results (Figure 8) and for the quantitative results, we used the PASCAL test set (1456 images). We adopted overall accuracy (OA), mean accuracy (MAcc), frequency-weighted intersection over union (FWIoU), and mean intersection over union (MIoU) as metrics.

4.1. Experiments

4.1.1. Reconstruction of Intermediate Feature Maps. In this stage, we replaced the original upsampling layers in all three networks (FCN, U-Net, and SegNet) with our TGV upsampling method (see Figure 9, with FCN as an example). To better compare results, we illustrate the performance of TGV upsampling on the same inputs (feature maps produced by previous layers), likewise replacing the bilinear operations in U-Net and the unpooling in SegNet.

Considering that a given upsampling method may appear several times in a model, we selected only one comparison per network, regardless of the size of the inputs and how many upsampling operations appear in the network. The feature maps in Figures 5–7 are the most representative activation outputs of each layer in the training process.


References

[1] L-C Chen G Papandreou I Kokkinos K Murphy andA L Yuille ldquoDeeplab semantic image segmentation withdeep convolutional nets atrous convolution and fully con-nected crfsrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 40 no 4 pp 834ndash848 2018

[2] X Li Z Liu P Luo C C Loy and X Tang ldquoNot all pixels areequal difficulty-aware semantic segmentation via deep layercascaderdquo in Proceedings of the IEEE Conference on Computer

Vision and Pattern Recognition Salt Lake City UT USA July2017

[3] H Zhao J Shi X Qi X Wang and J Jia ldquoPyramid sceneparsing networkrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition Salt Lake City UTUSA July 2017

[4] F Knoll K Bredies T Pock and R Stollberger ldquoSecondorder total generalized variation (TGV) for MRIrdquo MagneticResonance in Medicine vol 65 no 2 pp 480ndash491 2011

[5] A Chambolle and T Pock ldquoA first-order primal-dual algo-rithm for convex problems with applications to imagingrdquoJournal of Mathematical Imaging and Vision vol 40 no 1pp 120ndash145 2011

[6] M D Zeiler G W Taylor and R Fergus ldquoAdaptivedeconvolutional networks for mid and high level featurelearningrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV) vol 1 no 2 2011

[7] A Krizhevsky I Sutskever and G E Hinton ldquoImagenetclassification with deep convolutional neural networksrdquo inProceedings of the Advances in Neural Information ProcessingSystems Lake Tahoe NV USA December 2012

[8] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the International Conference on Medical ImageComputing and Computer-Assisted Intervention pp 234ndash241Springer Munich Germany October 2015

[9] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theIEEE conference on computer vision and pattern recognitionBoston MA USA June 2015

[10] V Badrinarayanan A Kendall and R Cipolla ldquoSegnet adeep convolutional encoder-decoder architecture for im-age segmentationrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 39 no 12 pp 2481ndash24952017

[11] H Noh S Hong and B Han ldquoLearning deconvolutionnetwork for semantic segmentationrdquo in Proceedings of theIEEE International Conference on Computer Vision SantiagoChile December 2015

[12] D R Martin C C Fowlkes and J Malik ldquoLearning to detectnatural image boundaries using local brightness color andtexture cuesrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 26 no 5 pp 530ndash549 2004

[13] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[14] V Dumoulin and F Visin ldquoA guide to convolution arithmeticfor deep learningrdquo 2016 httpsarxivorgabs160307285

[15] M D Zeiler et al Deconvolutional Networks pp 2528ndash25352010

[16] C R Vogel and M E Oman ldquoFast total variation-basedimage reconstructionrdquo in Proceedings of the 1995 ASMEDesign Engineering Conferences vol 3 No part C BostonMA USA Boston MA USA September 1995

[17] L I Rudin S Osher and E Fatemi ldquoNonlinear total variationbased noise removal algorithmsrdquo Physica D NonlinearPhenomena vol 60 no 1ndash4 pp 259ndash268 1992

[18] M D Zeiler and F Rob ldquoVisualizing and understanding con-volutional networksrdquo in Proceedings of the European Conferenceon Computer Vision Springer Zurich Switzerland September2014

[19] V Papyan Y Romano and M Elad ldquoConvolutional neuralnetworks analyzed via convolutional sparse codingrdquo 2016httpsarxivorgabs160708194

Computational Intelligence and Neuroscience 11

[20] L-B Qiao B-F Zhang J-S Su and X-C Lu ldquoA systematicreview of structured sparse learningrdquo Frontiers of In-formation Technology amp Electronic Engineering vol 18 no 4pp 445ndash463 2017

[21] P Wang P Chen Y Yuan et al ldquoUnderstanding convolutionfor semantic segmentationrdquo in Proceedings of the 2018 IEEEWinter Conference on Applications of Computer Vision(WACV) IEEE Lake Tahoe NV USA March 2018

[22] J T Springenberg et al ldquoStriving for simplicity the allconvolutional netrdquo 2014 httpsarxivorgabs14126806

[23] K Bredies K Kunisch and T Pock ldquoTotal generalizedvariationrdquo SIAM Journal on Imaging Sciences vol 3 no 3pp 492ndash526 2010

[24] T Pock and A Chambolle ldquoDiagonal preconditioning forfirst order primal-dual algorithms in convex optimizationrdquo inProceedings of the 2011 International Conference on ComputerVision IEEE Barcelona Spain November 2011

[25] E Esser X Zhang and T F Chan ldquoA general framework for aclass of first order primal-dual algorithms for convex opti-mization in imaging sciencerdquo SIAM Journal on ImagingSciences vol 3 no 4 pp 1015ndash1046 2010

[26] D Ferstl C Reinbacher R Ranftl M Ruether andH Bischof ldquoImage guided depth upsampling using aniso-tropic total generalized variationrdquo in Proceedings of the IEEEInternational Conference on Computer Vision SydneyAustralia December 2013

[27] W He H Zhang L Zhang and H Shen ldquoTotal-variation-regularized low-rank matrix factorization for hyperspectralimage restorationrdquo IEEE Transactions on Geoscience andRemote Sensing vol 54 no 1 pp 178ndash188 2016

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom


stage (this temporary result is denoted by $C_{\mathrm{input}}$). The padding part (referring to the heavy blue grids) appears as blurred (noisy) areas; the rest of the work is a restoration job.

When comparing the left matrix with the right one, we formulate the basic loss function as follows:

$$\operatorname{loss}\left(u, C_{\mathrm{input}}\right) = \frac{1}{2}\left\| u\left(C_{\mathrm{input}}\right) - C_{\mathrm{input}} \right\|_2^2, \tag{4}$$

where $u$ stands for our upsampled result and $\|\cdot\|_2$ denotes the 2-norm.
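As a concrete illustration, the data term in equation (4) amounts to a sum of squared pixel differences. The following sketch uses our own naming (`fidelity_loss`) and a plain list-of-lists layout for maps; it is not the authors' code:

```python
def fidelity_loss(u, c_input):
    """Squared-l2 data term 0.5 * ||u - C_input||^2 over a 2D map,
    cf. equation (4); maps are plain lists of lists of floats."""
    return 0.5 * sum(
        (u[i][j] - c_input[i][j]) ** 2
        for i in range(len(u))
        for j in range(len(u[0]))
    )
```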

3.2. TGV-Based Upsampling. TGV [23] is mainly used to solve problems like image denoising and restoration, but in this paper we adapt it into a trainable upsampling method for feature maps: we increase the resolution of the maps to a target size and then perform TGV on each map.

The first step in our method is projection, considering the size difference between low- and high-resolution maps. We first apply bilinear interpolation to obtain a preliminary result with the same size as the maps in the encoding stage. These processed maps are then treated as noisy areas (in fact, the entire image) to be handled by TGV restoration.
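The preliminary projection step can be sketched as a plain bilinear interpolation with align-corners sampling. This is our own minimal pure-Python sketch (assuming input maps of at least 2×2), not the paper's implementation:

```python
def bilinear_upsample(src, out_h, out_w):
    """Bilinearly interpolate a 2D list `src` to (out_h, out_w).

    Align-corners sampling: the corner values of the output coincide
    with the corners of the input. Assumes src is at least 2x2.
    """
    in_h, in_w = len(src), len(src[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        # map output row i back to a (fractional) input row
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = min(int(y), in_h - 2)
        dy = y - y0
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = min(int(x), in_w - 2)
            dx = x - x0
            # weighted average of the four neighbouring samples
            out[i][j] = (src[y0][x0] * (1 - dy) * (1 - dx)
                         + src[y0][x0 + 1] * (1 - dy) * dx
                         + src[y0 + 1][x0] * dy * (1 - dx)
                         + src[y0 + 1][x0 + 1] * dy * dx)
    return out
```

For example, upsampling the 2×2 map [[0, 2], [4, 6]] to 3×3 keeps the corners fixed and places the average of all four samples at the centre.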

Since Bredies et al. [23] put forward the TGV model, it has been widely used in the field of image restoration. The TGV model reconstructs an image well from blurred data or noisy indirect measurements. We formulated the whole upsampling problem as a convex optimization model [12, 24–26]:

$$\min_u \left\{ \lambda\,\operatorname{loss}(u, x) + \mathrm{TGV}_\alpha^k(u) \right\}, \tag{5}$$

where $\operatorname{loss}(u, x)$ represents image fidelity, the parameter $\lambda$ weights the data term against the regularization for the global optimization, and $\mathrm{TGV}_\alpha^k(u)$ stands for the regularization term.

From the visualization of convolutional networks [8], we know that convolutional computations can effectively extract and generalize features, the outputs of which can be seen as a set of features. To better understand the semantics of an image, the smoothness of textures and border edges is particularly important for feature maps.

For a k-order image, traditional restoration methods tend to fix the bounded variation seminorm, whereas the TGV model introduces a k-order function and incorporates information from different channels, which can effectively maintain texture discontinuities.

Given a k-order image $u$, we represent the TGV model as follows:

$$\mathrm{TGV}_\alpha^k(u) = \sup\left\{ \int_\Omega u \,\operatorname{div}^k v \, dx \;\Big|\; v \in C_c^k\!\left(\Omega, \operatorname{Sym}^k(\mathbb{R}^d)\right),\ \left\|\operatorname{div}^l v\right\|_\infty \le \alpha_l,\ l = 0, \dots, k-1 \right\}. \tag{6}$$

Here, $\operatorname{Sym}^k(\mathbb{R}^d)$ represents the space of symmetric tensors of order k. By balancing derivatives from the first order to the kth order, TGV greatly alleviates the problem of images containing different grey levels, which otherwise leads to the loss of boundary information as well as edges and corners.

But in this work, aiming at feature maps, it turns out that a 2-order TGV is sufficient. We formulate the 2-order model as

$$\mathrm{TGV}_\alpha^2(u) = \min_\omega \left\{ a_1 \int_\Omega |\nabla u - \omega| + a_2 \int_\Omega |\varepsilon(\omega)| \right\}. \tag{7}$$

For a given 2-order $u$, the minimum of the TGV model is taken over all vector fields $\omega$ in the bounded domain; $\varepsilon(\omega) = \frac{1}{2}\left(\nabla\omega + \nabla\omega^{T}\right)$ stands for the symmetrized derivative. The parameters $a_1$ and $a_2$ are applied to balance the first and second derivatives.
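To make equation (7) concrete, the following is a small pure-Python sketch of the discrete 2-order TGV energy using forward differences. The function names, the list-of-lists layout, and the particular discretization are our own illustrative choices; the paper only gives the continuous formulation:

```python
import math

def grad(u):
    """Forward-difference gradient of a 2D grid: returns (du/dy, du/dx),
    with zero padding at the boundary."""
    h, w = len(u), len(u[0])
    gy = [[(u[i + 1][j] - u[i][j]) if i + 1 < h else 0.0
           for j in range(w)] for i in range(h)]
    gx = [[(u[i][j + 1] - u[i][j]) if j + 1 < w else 0.0
           for j in range(w)] for i in range(h)]
    return gy, gx

def tgv2_energy(u, wy, wx, a1, a2):
    """Discrete 2-order TGV energy of equation (7):
    a1 * sum |grad(u) - w|  +  a2 * sum |sym_grad(w)|."""
    h, w_ = len(u), len(u[0])
    gy, gx = grad(u)
    # first-order term: pointwise Euclidean norm of (grad u - w)
    e1 = sum(math.hypot(gy[i][j] - wy[i][j], gx[i][j] - wx[i][j])
             for i in range(h) for j in range(w_))
    # symmetrized derivative eps(w) = 0.5 * (grad w + grad w^T),
    # measured pointwise with the Frobenius norm
    wyy, _ = grad(wy)
    wxy, wxx = grad(wx)
    _, wyx = grad(wy)
    e2 = sum(math.sqrt(wyy[i][j] ** 2 + wxx[i][j] ** 2
                       + 2 * (0.5 * (wyx[i][j] + wxy[i][j])) ** 2)
             for i in range(h) for j in range(w_))
    return a1 * e1 + a2 * e2
```

For a linear ramp, choosing $\omega = \nabla u$ zeroes the first-order term, so only the second-order term contributes; this is exactly the piecewise-smooth behaviour TGV is designed to reward.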

The final TGV-based upsampling model is defined as a combination of the loss function (4) and the TGV term (7):

$$\min_{u,\,\omega} \left\{ \frac{\lambda}{2} \left\| u\left(C_{\mathrm{input}}\right) - C_{\mathrm{input}} \right\|_2^2 + a_1 \int_\Omega |\nabla u - \omega| + a_2 \int_\Omega |\varepsilon(\omega)| \right\}. \tag{8}$$

Meanwhile, unlike the fixed weights used in [23], we adapt TGV into a trainable method, taking the weights $a_1$ and $a_2$ into backpropagation to search for a suitable balance point.

3.3. Optimization Methods. Considering that the upsampling model we proposed (formula (8)) is convex but not smooth, we used the primal-dual scheme [24, 25, 27] to solve this problem. We reformulated our upsampling model as a convex-concave saddle-point problem by introducing two dual variables $m$ and $n$. The transformed model of formula (8) is given by

$$\min_{u,\,\omega} \max_{m,\,n} \left\{ \frac{\lambda}{2} \sum_{i,j \in \Omega} \left( u\left(C_{ij}\right) - E_{ij} \right)^2 + a_1 \left\langle \nabla u - \omega,\, m \right\rangle + a_2 \left\langle \nabla \omega,\, n \right\rangle \right\}. \tag{9}$$

Figure 4: Pretreatment of feature maps (a pretreated map passes through convolutional operations and a projection, producing the sparse map $E_{\mathrm{input}}$ from $C_{\mathrm{input}}$).

Computational Intelligence and Neuroscience 5

The feasible sets of $m$ and $n$ are defined as follows:

$$\begin{aligned} M &= \left\{ m : \Omega_u \to \mathbb{R}^2 \;\Big|\; \left\| m_{ij} \right\| \le 1,\ i,j \in \Omega_u \right\}, \\ N &= \left\{ n : \Omega_u \to \mathbb{R}^2 \;\Big|\; \left\| n_{ij} \right\| \le 1,\ i,j \in \Omega_u \right\}. \end{aligned} \tag{10}$$

The final result is computed pixel by pixel via iterative optimization. With $u^0 = E_{\mathrm{input}}$, $\omega^0 = m^0 = n^0 = 0$, and step sizes $\sigma_m, \sigma_n$ and $\tau_u, \tau_\omega$ chosen, the variables are updated for each iteration $i \ge 0$ as follows:

$$\begin{cases} m^{i+1} = \operatorname{Proj}_m\left\{ m^i + \sigma_m a_1 \left( \nabla \bar{u}^i - \bar{\omega}^i \right) \right\}, \\ n^{i+1} = \operatorname{Proj}_n\left\{ n^i + \sigma_n a_2 \nabla \bar{\omega}^i \right\}, \\ u^{i+1} = \left( 1 + \tfrac{\lambda}{2} \tau_u \right)^{-1} \left[ u^i + \tau_u \left( a_1 m^{i+1} + \tfrac{\lambda}{2} C_{\mathrm{input}} \right) \right], \\ \omega^{i+1} = \omega^i + \tau_\omega \left( a_1 n^{i+1} + a_2 m^{i+1} \right), \\ \bar{u}^{i+1} = u^{i+1} + \theta \left( u^{i+1} - u^i \right), \\ \bar{\omega}^{i+1} = \omega^{i+1} + \theta \left( \omega^{i+1} - \omega^i \right), \end{cases} \tag{11}$$

where $\operatorname{Proj}_m(m)$ and $\operatorname{Proj}_n(n)$ denote the Euclidean projectors onto the sets $M$ and $N$, respectively. They can be calculated by pointwise operations:

$$\operatorname{Proj}_m(m) = \frac{m}{\max\left(1, |m| / a_1\right)}, \qquad \operatorname{Proj}_n(n) = \frac{n}{\max\left(1, |n| / a_2\right)}. \tag{12}$$

In equation (11), $\theta$ stands for the relaxation parameter. It can be updated iteration by iteration using preconditioning [5]. Hence, the TGV-based model can achieve globally optimal upsampling solutions when dealing with 2-order images.
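The pointwise projections in equation (12) are straightforward to implement. A minimal sketch follows, under our own naming, with a dual field stored as a 2D grid of (y, x) component pairs and `alpha` standing in for the bound ($a_1$ for $m$, $a_2$ for $n$):

```python
import math

def proj_dual(p, alpha):
    """Pointwise Euclidean projection of a dual field, as in equation (12):
    each vector p_ij is replaced by p_ij / max(1, |p_ij| / alpha)."""
    out = []
    for row in p:
        new_row = []
        for py, px in row:
            scale = max(1.0, math.hypot(py, px) / alpha)
            new_row.append((py / scale, px / scale))
        out.append(new_row)
    return out
```

Inside the iteration (11), this projection would be applied to the dual ascent steps on $m$ and $n$; vectors already inside the bound are left unchanged, while longer vectors are rescaled onto its boundary.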

4. Evaluation

In this section, we make a quantitative evaluation of the proposed TGV-based upsampling model and compare it with current upsampling methods from different perspectives, using the PASCAL VOC2012 dataset (class segmentation) to investigate the detailed performance. In the following experiments, we manually set the TGV-based upsampling model's parameters $a_1$ and $a_2$ (we initially set $a_1 = 0.05$ and $a_2 = 0.1$; both can be trained over iterations).

We retrained the original networks (FCN, U-Net, and SegNet) and their modified versions with the TGV upsampling algorithm on the PASCAL training set (1464 images).

For a reconstruction comparison on feature maps (Figures 5–7), we evaluated the above four methods on the PASCAL validation set (1449 images). We randomly selected feature maps from a certain layer in FCN, U-Net, and SegNet and compared the upsampled results of the different methods.

For overall comparisons of segmentation results (Figure 8), we used the PASCAL test set (1456 images). For quantitative results, we adopted overall accuracy (OA), mean accuracy (MAcc), frequency-weighted intersection over union (FWIoU), and mean intersection over union (MIoU) as metrics.
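For reference, these four metrics can all be computed from a pixel-level confusion matrix. The sketch below uses the standard definitions under our own naming; it is not the authors' evaluation code:

```python
def segmentation_metrics(conf):
    """OA, MAcc, FWIoU, and MIoU from a square confusion matrix
    conf[gt_class][pred_class] of pixel counts."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    diag = [conf[c][c] for c in range(n)]
    gt = [sum(conf[c]) for c in range(n)]                        # pixels per ground-truth class
    pred = [sum(conf[r][c] for r in range(n)) for c in range(n)]  # pixels per predicted class
    present = [c for c in range(n) if gt[c] > 0]
    oa = sum(diag) / total
    macc = sum(diag[c] / gt[c] for c in present) / len(present)
    iou = {c: diag[c] / (gt[c] + pred[c] - diag[c]) for c in present}
    miou = sum(iou.values()) / len(iou)
    fwiou = sum(gt[c] / total * iou[c] for c in present)
    return oa, macc, fwiou, miou
```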

4.1. Experiments

4.1.1. Reconstruction of Intermediate Feature Maps. In this stage, we replaced the original upsampling layer in all three networks (FCN, U-Net, and SegNet) with our TGV upsampling method (see Figure 9, with FCN as an example). To better compare results, we illustrate the performance of TGV upsampling when dealing with the same inputs (feature maps produced by previous layers), likewise replacing the bilinear operations in U-Net and the unpooling in SegNet.

Considering that a given upsampling method may appear multiple times in a model, we selected only one comparison per network, regardless of the size of the inputs and how many upsampling operations appeared. The feature maps in Figures 5–7 are the most representative of the activation outputs in each layer during the training process.

In Figures 5–7 (illustrating the comparisons of the feature maps), all input maps come from some layer in the encoding stage. We found that TGV upsampling performed better at preserving the textures and edges of the feature maps. In Figure 5, with the same input maps from the FCN architecture, we observed that, compared with transposed-convolution reconstruction, TGV upsampling reproduced almost all classes (represented in different colors) that appeared in the input maps. While being a trainable operation, the transposed operation processes inputs following a "padding and extracting" pipeline, so there is a certain chance that upsampled maps miss classes that are in fact contained in the input maps, or that a class is filtered out. It can be clearly seen in the second and third input submaps (Figure 5) that the transposed operation did not completely reproduce all objects appearing in the input map. Even if these cases are ignored, it is easy to observe that TGV upsampling reconstructs texture better than transposed operations.

As illustrated in Figure 6, bilinear interpolation recovered too much unnecessary information about known classes from both the x and y directions of the object, compared to the principle of TGV upsampling reconstruction, which is based on the high-order divergence of the known pixel values. The source of the TGV model's information is more reasonable, if more complex, and the reconstruction results of TGV upsampling were also better than those of bilinear interpolation.

Lastly, Figure 7 compares unpooling and TGV upsampling. Since unpooling only records the locations of maximum activations when performing pooling operations in the encoding stage, the unpooling reconstruction is clearly incomplete: there are many blank spaces in places where texture should appear. In general, the unpooling operation is the worst of the three in terms of visualization results.


4.1.2. Training in TGV Upsampling. In this experiment, we compared overall segmentation results with those of the original networks.

Figure 8 illustrates examples of PASCAL VOC outputs from FCN (FCN-8s), U-Net, SegNet, and the modified TGV upsampling-based networks (denoted by FCN-TGV, U-Net-TGV, and SegNet-TGV, respectively). Compared to the coarse feature maps of the original networks, class (or object) recognition (referring to the interval areas between objects) in our results is more precise (see SegNet and SegNet-TGV), and the cases of exceeding ground-truth areas are greatly eased (see FCN and FCN-TGV). At the same time, the modified networks preserved the smoothness of the boundary well. Lastly, it can be concluded that the TGV upsampling method helps models make better predictions on semantics.

Figure 5: Comparison between a transposed operation and TGV upsampling for the same feature maps (in FCN). (a) Input maps. (b) TGV upsampling. (c) Transposed. (d) Ground truth.

Figure 6: Comparison between a bilinear operation and TGV upsampling for the same feature maps (in U-Net). (a) Input maps. (b) TGV upsampling. (c) Bilinear. (d) Ground truth.

Figure 7: Comparison between an unpooling operation and TGV upsampling for the same feature maps (in SegNet). (a) Input maps. (b) TGV upsampling. (c) Unpooling. (d) Ground truth.

Figure 8: Overall segmentation example results on PASCAL VOC (GT stands for ground truth; F, S, and U represent results from FCN, SegNet, and U-Net, while FT, ST, and UT denote the three networks combined with TGV upsampling).

The quantitative results on the VOC test set for our proposed algorithm and the competitors are presented in Table 1. We observed that, in terms of overall accuracy, mean accuracy, frequency-weighted IoU, and MIoU, all metrics were improved by about 1.4–2.3%, corresponding to the visualization results in Figure 10.

Figures 10–12 show the effect after applying the new upsampling method (the X and Y axes represent iterations and losses, respectively). We found that the TGV upsampling algorithm had a significant effect on reducing losses compared to the original methods. With a fixed batch size, our new method converged faster than the original models.

4.1.3. Loss Function in the Training Process. The loss functions of FCN, FCN-TGV, U-Net, U-Net-TGV, SegNet, and SegNet-TGV on the PASCAL VOC validation set are given in Figures 10–12.

4.2. Analysis. Having considered all four upsampling operations introduced in the above sections, we can group them into two types based on their different information sources.

4.2.1. Back-Up (Including Unpooling and Transposed Convolution Operations). Many researchers think that, when reconstructing feature maps, intermediate results produced by previous operations should be referred to (especially layers in the encoding stage), as if images were processed along a "path": knowing the conditions of the road already travelled (the corresponding layers in the encoding stage) makes it possible to better "predict" the road ahead (the feature map reconstruction). Thus, when applying these methods, operation information from previous layers must be saved. For example, the unpooling operation is effectively an index store; with the saved parameters, feature maps can be reconstructed well.

However, in both transposed and unpooling operations, it is unavoidable that the processed results retain a certain part of the intermediate results produced by previous layers. In some cases, this is useful for keeping important information that might otherwise be overlooked in the training process, but it can also be a disturbance, bringing along unnecessary information.

4.2.2. From-Itself (Including Bilinear Interpolation and Our TGV Upsampling Operations). In contrast to the "back-up" category, the information sources that bilinear interpolation and TGV upsampling use to make up for losses come from the methods' own processes rather than from the results of previous operations. As described in Related Works, the bilinear operation interpolates along the x and y directions based on known sample values. The TGV method has a more complex information source: it starts from the k-order divergence of known pixel values (for k-order images). It can better preserve the boundary information of textures, and the step effect can be effectively avoided, raising the smoothness of the entire image at the same time.

Table 1: Results on the PASCAL VOC test set (%).

Method        OA     MAcc   FWIoU  MIoU
FCN-8s        91.13  78.68  84.59  64.59
FCN-TGV       92.33  80.12  87.49  66.02
SegNet        86.86  74.41  80.32  58.59
SegNet-TGV    87.46  76.08  83.45  60.48
U-Net         87.13  70.68  79.59  61.59
U-Net-TGV     89.14  72.12  81.49  63.89

Figure 9: Replacing the transposed convolutional layer in the FCN with the TGV upsampling method.

Figure 10: Loss function of SegNet (a) and SegNet-TGV (b) on the PASCAL VOC validation set.

Figure 11: Loss function of FCN (a) and FCN-TGV (b) on the PASCAL VOC validation set.

Figure 12: Loss function of U-Net (a) and U-Net-TGV (b) on the PASCAL VOC validation set.


From these experiments, the numerical results of the loss function and the smoothness of the upsampled feature maps show that the TGV upsampling method is a clear improvement over current methods.

5. Conclusions

In this work, we introduced a TGV upsampling method based on image restoration research. We transformed the process of feature map reconstruction into a loss-optimization problem. Based on the divergence of each order, we used TGV regularization, which reconstructs piecewise functions. For numerical optimization, we used a primal-dual algorithm, which can effectively perform parallel computing and achieve high frame rates. We applied our new method to three networks and evaluated the results on PASCAL VOC datasets. These tests showed that the TGV upsampling method can greatly make up for the lost smoothness and boundary information of maps. Compared to the original methods used in the networks (FCN, U-Net, and SegNet), the new method made average improvements of 1.4–2.3% in terms of MIoU accuracy when used in the decoding stage. Observing the training process in the experiments, we clearly found that the TGV upsampling method greatly made up for information losses caused by other operations (mainly various convolutional layers).

The proposed algorithm is not limited to single-layer map upsampling. In the future, it will be extended to understanding entire scenes, effectively incorporating temporal coherence in a consistent way. In addition, with the proposed "restoration-like" method, we will further concentrate on how to better understand latent semantics under unsupervised settings.

Data Availability

The code used to support the findings of this study has not been made available because the proposed method will later be applied in medical projects that involve personal privacy.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Research Program to Solve Social Issues of the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (no. NRF2017M3C8A8091768).

References

[1] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2018.

[2] X. Li, Z. Liu, P. Luo, C. C. Loy, and X. Tang, "Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, July 2017.

[3] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, July 2017.

[4] F. Knoll, K. Bredies, T. Pock, and R. Stollberger, "Second order total generalized variation (TGV) for MRI," Magnetic Resonance in Medicine, vol. 65, no. 2, pp. 480–491, 2011.

[5] A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, 2011.

[6] M. D. Zeiler, G. W. Taylor, and R. Fergus, "Adaptive deconvolutional networks for mid and high level feature learning," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), vol. 1, no. 2, 2011.

[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, December 2012.

[8] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, Munich, Germany, October 2015.

[9] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, June 2015.

[10] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: a deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.

[11] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, December 2015.

[12] D. R. Martin, C. C. Fowlkes, and J. Malik, "Learning to detect natural image boundaries using local brightness, color, and texture cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 530–549, 2004.

[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, https://arxiv.org/abs/1409.1556.

[14] V. Dumoulin and F. Visin, "A guide to convolution arithmetic for deep learning," 2016, https://arxiv.org/abs/1603.07285.

[15] M. D. Zeiler et al., "Deconvolutional networks," pp. 2528–2535, 2010.

[16] C. R. Vogel and M. E. Oman, "Fast total variation-based image reconstruction," in Proceedings of the 1995 ASME Design Engineering Conferences, vol. 3, part C, Boston, MA, USA, September 1995.

[17] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1–4, pp. 259–268, 1992.

[18] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proceedings of the European Conference on Computer Vision, Springer, Zurich, Switzerland, September 2014.

[19] V. Papyan, Y. Romano, and M. Elad, "Convolutional neural networks analyzed via convolutional sparse coding," 2016, https://arxiv.org/abs/1607.08194.

[20] L.-B. Qiao, B.-F. Zhang, J.-S. Su, and X.-C. Lu, "A systematic review of structured sparse learning," Frontiers of Information Technology & Electronic Engineering, vol. 18, no. 4, pp. 445–463, 2017.

[21] P. Wang, P. Chen, Y. Yuan et al., "Understanding convolution for semantic segmentation," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, Lake Tahoe, NV, USA, March 2018.

[22] J. T. Springenberg et al., "Striving for simplicity: the all convolutional net," 2014, https://arxiv.org/abs/1412.6806.

[23] K. Bredies, K. Kunisch, and T. Pock, "Total generalized variation," SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 492–526, 2010.

[24] T. Pock and A. Chambolle, "Diagonal preconditioning for first order primal-dual algorithms in convex optimization," in Proceedings of the 2011 International Conference on Computer Vision, IEEE, Barcelona, Spain, November 2011.

[25] E. Esser, X. Zhang, and T. F. Chan, "A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science," SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 1015–1046, 2010.

[26] D. Ferstl, C. Reinbacher, R. Ranftl, M. Ruether, and H. Bischof, "Image guided depth upsampling using anisotropic total generalized variation," in Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, December 2013.

[27] W. He, H. Zhang, L. Zhang, and H. Shen, "Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 1, pp. 178–188, 2016.

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 6: TGVUpsampling:AMaking-UpOperationfor …downloads.hindawi.com/journals/cin/2019/8527819.pdf1.Introduction Compared to traditional classification tasks, semantic seg- ... We made a

The feasible sets of the dual variables $m$ and $n$ are defined as follows:

\[
M = \left\{ m : \Omega_u \rightarrow \mathbb{R}^2 \,\middle|\, \left| m_{i,j} \right| \le 1,\ (i,j) \in \Omega_u \right\},
\qquad
N = \left\{ n : \Omega_u \rightarrow \mathbb{R}^2 \,\middle|\, \left| n_{i,j} \right| \le 1,\ (i,j) \in \Omega_u \right\}.
\tag{10}
\]

The final result is computed pixel by pixel via iterative optimization. With $u^0$ initialized to the input map, $\bar{u}^0 = u^0$, and $\omega^0 = m^0 = n^0 = 0$, step sizes $\sigma_m$, $\sigma_n$ and $\tau_u$, $\tau_\omega$ are chosen. For iteration $i \ge 0$, the variables are updated as follows:

\[
\begin{aligned}
m^{i+1} &= \mathrm{Proj}_m\!\left\{ m^i + \sigma_m a_1 \left( \nabla \bar{u}^i - \bar{\omega}^i \right) \right\},\\
n^{i+1} &= \mathrm{Proj}_n\!\left\{ n^i + \sigma_n a_2 \nabla \bar{\omega}^i \right\},\\
u^{i+1} &= \left( 1 + \frac{\lambda}{2}\tau_u \right)^{-1} \left[ u^i + \tau_u \left( a_1 \operatorname{div} m^{i+1} + \frac{\lambda}{2}\, u_{\mathrm{input}} \right) \right],\\
\omega^{i+1} &= \omega^i + \tau_\omega \left( a_1 m^{i+1} + a_2 \operatorname{div} n^{i+1} \right),\\
\bar{u}^{i+1} &= u^{i+1} + \theta \left( u^{i+1} - u^i \right),\\
\bar{\omega}^{i+1} &= \omega^{i+1} + \theta \left( \omega^{i+1} - \omega^i \right).
\end{aligned}
\tag{11}
\]

where $\mathrm{Proj}_m(m)$ and $\mathrm{Proj}_n(n)$ denote the Euclidean projectors onto the sets $M$ and $N$, respectively. They can be calculated by pointwise operations:

\[
\mathrm{Proj}_m(m) = \frac{m}{\max\left( 1, |m|/a_1 \right)},
\qquad
\mathrm{Proj}_n(n) = \frac{n}{\max\left( 1, |n|/a_2 \right)}.
\tag{12}
\]

In equation (11), θ stands for the relaxation parameter. It can be updated iteration by iteration using preconditioning [5]. Hence, the TGV-based model can achieve globally optimal upsampling solutions when dealing with second-order images.
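The iteration in equation (11) can be sketched in NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: it uses a single fixed step size in place of σm, σn, τu, τω, a fixed θ rather than preconditioned updates, forward/backward differences with Neumann boundaries for ∇ and div, and it refines the map at a fixed resolution (in the upsampling setting the input would first be enlarged, e.g., by interpolation).

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary (last row/column gradient = 0)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return np.stack([gx, gy])

def div(p):
    """Backward-difference divergence, the negative adjoint of grad."""
    px, py = p
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

def proj(p, a):
    """Pointwise projection onto {|p| <= a}, cf. equation (12)."""
    axes = tuple(range(p.ndim - 2))
    norm = np.maximum(1.0, np.sqrt((p ** 2).sum(axis=axes)) / a)
    return p / norm

def tgv_refine(u_input, a1=0.05, a2=0.1, lam=10.0, iters=200):
    """Primal-dual TGV iteration in the spirit of equation (11), fixed resolution."""
    u = u_input.copy()
    w = np.zeros((2,) + u.shape)        # auxiliary vector field omega
    m = np.zeros((2,) + u.shape)        # dual variable for grad(u) - omega
    n = np.zeros((2, 2) + u.shape)      # dual variable for grad(omega)
    u_bar, w_bar = u.copy(), w.copy()
    sigma, tau, theta = 0.25, 0.25, 1.0  # hypothetical fixed step sizes
    for _ in range(iters):
        m = proj(m + sigma * a1 * (grad(u_bar) - w_bar), a1)
        n = proj(n + sigma * a2 * np.stack([grad(w_bar[0]), grad(w_bar[1])]), a2)
        u_new = (u + tau * (a1 * div(m) + 0.5 * lam * u_input)) / (1.0 + 0.5 * tau * lam)
        w_new = w + tau * (a1 * m + a2 * np.stack([div(n[0]), div(n[1])]))
        u_bar = u_new + theta * (u_new - u)
        w_bar = w_new + theta * (w_new - w)
        u, w = u_new, w_new
    return u
```

Note that a constant map is a fixed point of the sketch: its gradient is zero, so the dual variables never activate and the data term keeps the solution at the input.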

4. Evaluation

In this section, we make a quantitative evaluation of the proposed TGV-based upsampling model. We compared our model with current upsampling methods from different perspectives, using the PASCAL VOC2012 dataset (class segmentation) to investigate performance in detail. In the following experiments, we manually set the TGV-based upsampling model's parameters a1 and a2 (we initially set a1 = 0.05 and a2 = 0.1; both of them can be trained with iterations).

We retrained the original networks (FCN, U-Net, and SegNet) and their modified versions with the TGV upsampling algorithm on the PASCAL training set (with 1464 images).

For a reconstruction comparison on feature maps (Figures 5–7), we evaluated the above four methods on the PASCAL validation set (with 1449 images). We randomly selected feature maps from a certain layer in FCN, U-Net, and SegNet and compared the upsampled results of the different methods.

For overall comparisons of segmentation results (Figure 8), we used the PASCAL test set (with 1456 images). For quantitative results, we adopted overall accuracy (OA), mean accuracy (MAcc), frequency-weighted intersection over union (FWIoU), and mean intersection over union (MIoU) as metrics.
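All four metrics can be read off a per-class confusion matrix. The following helper is our own sketch of the standard definitions, not code from the paper:

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute OA, MAcc, FWIoU, and MIoU from a confusion matrix.

    conf[i, j] = number of pixels of true class i predicted as class j.
    """
    total = conf.sum()
    per_class_pix = conf.sum(axis=1)                   # pixels of each true class
    tp = np.diag(conf)                                 # correctly labeled pixels
    oa = tp.sum() / total                              # overall accuracy
    macc = np.mean(tp / np.maximum(per_class_pix, 1))  # mean per-class accuracy
    union = per_class_pix + conf.sum(axis=0) - tp      # |pred ∪ true| per class
    iou = tp / np.maximum(union, 1)
    miou = iou.mean()                                  # mean IoU
    freq = per_class_pix / total
    fwiou = (freq * iou).sum()                         # frequency-weighted IoU
    return oa, macc, fwiou, miou
```

For a perfect prediction the confusion matrix is diagonal and all four metrics equal 1.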

4.1. Experiments

4.1.1. Reconstruction of Intermediate Feature Maps. In this stage, we replaced the original upsampling layer in all three networks (FCN, U-Net, and SegNet) with our TGV upsampling method (see Figure 9, with FCN as an example). To better compare results, we illustrate the performance of TGV upsampling when dealing with the same inputs (feature maps produced by previous layers), likewise replacing bilinear operations in U-Net and unpooling in SegNet.

Considering that a given upsampling method may appear multiple times in a model, we selected only one comparison, regardless of the size of the inputs and how many upsampling operations appeared in the networks. The feature maps in Figures 5–7 are the most representative activation outputs of each layer in the training process.

In Figures 5–7 (illustrating the comparisons of the feature maps), all input maps are from some layer in the encoding stage. We found that TGV upsampling performed better at preserving the textures and edges of the feature maps. In Figure 5, with the same input maps from the FCN architecture, we observed that, compared with transposed-convolution reconstruction, TGV upsampling reproduced almost all classes (represented in different colors) that appeared in the input maps. While it is a trainable operation, the transposed operation processes inputs following a "padding and extracting" pipeline. There is therefore a certain chance that sampled maps contain classes that are in fact not contained in the input maps, or that a class may be filtered out. It can be clearly seen that in the second and third input submaps (Figure 5), the transposed operation did not completely reproduce all objects appearing in the input map. Even if these cases are ignored, it is easy to observe that TGV upsampling reconstructs texture better than transposed operations.
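The "padding and extracting" pipeline can be made concrete: a stride-s transposed convolution is equivalent to inserting s − 1 zeros between neighboring activations and then running an ordinary full correlation with the learned kernel. A NumPy sketch of this view (the kernel values here are placeholders; in a real layer they are learned):

```python
import numpy as np

def transposed_conv2d(x, k, stride=2):
    """Zero-stuff x ("padding"), then apply a full 2D correlation with kernel k."""
    h, w = x.shape
    kh, kw = k.shape
    # insert (stride - 1) zeros between neighboring activations
    up = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1))
    up[::stride, ::stride] = x
    # full correlation ("extracting"): output size (h - 1) * stride + kh
    pad = np.pad(up, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    out = np.zeros((up.shape[0] + kh - 1, up.shape[1] + kw - 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (pad[i:i + kh, j:j + kw] * k).sum()
    return out
```

Because the output depends only on the kernel and the zero-stuffed grid, the result may invent or suppress content relative to the input map, which is the failure mode discussed above.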

As illustrated in Figure 6, bilinear interpolation recovered too much unnecessary information about known classes from both the x and y directions of the object, compared to the principle of TGV upsampling, whose reconstruction is based on the high-order divergence of the known pixel values. The information source of the TGV model is more reasonable and more complex, and the reconstruction results of TGV upsampling were also better than those of bilinear interpolation.
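For reference, bilinear interpolation fills each new pixel from its four known neighbors along the x and y directions. A minimal sketch follows; the half-pixel grid mapping used here is one common convention and not necessarily the one used in the original networks:

```python
import numpy as np

def bilinear_upsample(x, factor=2):
    """Interpolate a 2D map along the x and y directions from known samples."""
    h, w = x.shape
    # map each output coordinate to a (fractional) input coordinate
    ys = np.clip((np.arange(h * factor) + 0.5) / factor - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * factor) + 0.5) / factor - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    # blend the four known neighbors with distance weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

Every output value is a convex combination of known samples, which explains why bilinear results are smooth but can spread class evidence into regions where it does not belong.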

Lastly, Figure 7 compares unpooling and TGV upsampling. Since unpooling only records the locations of maximum activations when performing pooling operations in the encoding stage, the unpooling reconstruction is clearly incomplete: there are many blank spaces in places where texture should appear. In general, the efficiency of the unpooling operation is the worst of the three in terms of visualization results.
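Max-unpooling can be sketched directly: the encoder pools while storing argmax indices, and the decoder writes each activation back at its stored location, leaving zeros (the "blank spaces" above) everywhere else. A small NumPy sketch of this SegNet-style pair, using our own helper functions:

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling that records argmax locations, as in SegNet's encoder."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k), dtype=int)  # flat index into the input
    for i in range(h // k):
        for j in range(w // k):
            win = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = np.unravel_index(win.argmax(), win.shape)
            pooled[i, j] = win[r, c]
            idx[i, j] = (i * k + r) * w + (j * k + c)
    return pooled, idx

def max_unpool(pooled, idx, out_shape):
    """Place each activation back at its recorded location; the rest stays zero."""
    out = np.zeros(out_shape)
    out.flat[idx.ravel()] = pooled.ravel()
    return out
```

Only one value per pooling window survives the round trip, which is exactly why unpooled maps look sparse.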


4.1.2. Training in TGV Upsampling. In this experiment, we compared overall segmentation results with those of the original networks.

Figure 8 illustrates example PASCAL VOC outputs from FCN (FCN-8s), U-Net, SegNet, and the modified TGV-upsampling-based networks (denoted FCN-TGV, U-Net-TGV, and SegNet-TGV, respectively).

Figure 5: Comparison between a transposed operation and TGV upsampling for the same feature maps (in FCN). (a) Input maps. (b) TGV upsample. (c) Transposed. (d) Ground truth.

Figure 6: Comparison between a bilinear operation and TGV upsampling for the same feature maps (in U-Net). (a) Input maps. (b) TGV upsample. (c) Bilinear. (d) Ground truth.

Figure 7: Comparison between an unpooling operation and TGV upsampling for the same feature maps (in SegNet). (a) Input maps. (b) TGV upsample. (c) Unpooling. (d) Ground truth.

Compared to the coarse feature maps of the original networks, class (or object) boundaries (referring to the interval areas between objects) in our results are more precise (see SegNet and SegNet-TGV), and cases where segmented regions exceed the ground truth are greatly eased (see FCN and FCN-TGV). At

the same time, the modified networks well preserve the smoothness of boundaries. It can be concluded that the TGV upsampling method helps models make better predictions on semantics.

Figure 8: Overall segmentation example results on PASCAL VOC (GT stands for ground truth; F, S, and U represent results from FCN, SegNet, and U-Net, while FT, ST, and UT denote the three networks combined with TGV upsampling).

The quantitative results on the VOC test set for our proposed algorithm and the competitors are presented in Table 1. We observed that all metrics, whether overall accuracy, mean accuracy, frequency-weighted IoU, or MIoU, were improved by about 1.4–2.3%, corresponding to the visualization results in Figure 10.

Figures 10–12 show the effect of applying the new upsampling method (the X and Y axes represent iterations and losses, respectively). We found that the TGV upsampling algorithm had a significant effect on reducing losses compared to the original methods. With a fixed batch size, our new method converged faster than the original models.

4.1.3. Loss Function in the Training Process. The loss functions of FCN, FCN-TGV, U-Net, U-Net-TGV, SegNet, and SegNet-TGV on the PASCAL VOC validation set are given in Figures 10–12.

4.2. Analysis. Having covered all four upsampling operations introduced in the above sections, we can group them into two types based on their different information sources.

4.2.1. Back-Up (Including Unpooling and Transposed Convolution Operations). Many researchers think that, when reconstructing feature maps, intermediate results produced by previous operations should be referred to (especially layers in the encoding stage), as if images are processed along a "path": knowing the conditions of the road already traveled (the corresponding layers in the encoding stage) makes it possible to better predict the road conditions ahead (the feature-map reconstructions). Thus, when applying these methods, information from the operations of previous layers should be saved. For example, the unpooling operation can be seen as index storage: with the saved parameters, feature maps can be well reconstructed.

However, in both transposed and unpooling operations, it is unavoidable that the processed results retain a certain part of the intermediate results produced by previous layers. In some cases, this is useful for keeping important information that might be overlooked in the training process, but it can also be a disturbance, bringing along unnecessary information.

4.2.2. From-Itself (Including Bilinear Interpolation and Our TGV Upsampling Operations). In contrast to the "back-up" category, the information sources that bilinear interpolation and TGV upsampling use to make up for losses come from the methods' own processes rather than from the results of previous operations. As described in Related Works, the bilinear operation interpolates along the x and y directions based on known sample values. The TGV

Table 1: Results on the PASCAL VOC test set (%).

Method       OA     MAcc   FWIoU  MIoU
FCN-8s       91.13  78.68  84.59  64.59
FCN-TGV      92.33  80.12  87.49  66.02
SegNet       86.86  74.41  80.32  58.59
SegNet-TGV   87.46  76.08  83.45  60.48
U-Net        87.13  70.68  79.59  61.59
U-Net-TGV    89.14  72.12  81.49  63.89

Figure 9: Replacing the transposed convolutional layer in the FCN with the TGV upsampling method.


method has a more complex information source: it starts from the k-order divergence of the known pixel values (for k-order images). It can better preserve the boundary information of textures, and the staircasing effect is effectively avoided, raising the smoothness of the entire image at the same time.

Figure 10: Loss functions of SegNet (a) and SegNet-TGV (b) on the PASCAL VOC validation set.

Figure 11: Loss functions of FCN (a) and FCN-TGV (b) on the PASCAL VOC validation set.

Figure 12: Loss functions of U-Net (a) and U-Net-TGV (b) on the PASCAL VOC validation set.


From the results of these experiments, the numerical behavior of the loss functions and the smoothness of the upsampled feature maps show that the TGV upsampling method is a clear improvement over current methods.

5. Conclusions

In this work, we introduced a TGV upsampling method based on image-restoration research. We transformed the process of feature-map reconstruction into a loss-optimization problem. Based on the divergence of each order, we used TGV regularization, which reconstructs piecewise polynomial functions. For numerical optimization, we used a primal-dual algorithm, which can effectively perform parallel computing and achieve high frame rates. We applied our new method to three networks and evaluated the results on the PASCAL VOC datasets. These tests showed that the TGV upsampling method can largely make up for the lost smoothness and boundary information of feature maps. Compared to the original methods used in the networks (FCN, U-Net, and SegNet), the new method achieved average improvements of 1.4–2.3% in terms of MIoU when used in the decoding stage. Observing the training process in the experiments, we clearly found that TGV upsampling compensated for information losses caused by other operations (mainly the various convolutional layers).

The proposed algorithm is not limited to single-layer map upsampling. In the future, it will be extended to understanding entire scenes, effectively incorporating temporal coherence in a consistent way. In addition, with the proposed "restoration-like" method, we will further concentrate on how to better understand latent semantics under unsupervised settings.

Data Availability

The code used to support the findings of this study has not been made available because the proposed method will later be applied in medical projects that involve personal privacy.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Research Program to Solve Social Issues of the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (no. NRF2017M3C8A8091768).

References

[1] L-C Chen G Papandreou I Kokkinos K Murphy andA L Yuille ldquoDeeplab semantic image segmentation withdeep convolutional nets atrous convolution and fully con-nected crfsrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 40 no 4 pp 834ndash848 2018

[2] X Li Z Liu P Luo C C Loy and X Tang ldquoNot all pixels areequal difficulty-aware semantic segmentation via deep layercascaderdquo in Proceedings of the IEEE Conference on Computer

Vision and Pattern Recognition Salt Lake City UT USA July2017

[3] H Zhao J Shi X Qi X Wang and J Jia ldquoPyramid sceneparsing networkrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition Salt Lake City UTUSA July 2017

[4] F Knoll K Bredies T Pock and R Stollberger ldquoSecondorder total generalized variation (TGV) for MRIrdquo MagneticResonance in Medicine vol 65 no 2 pp 480ndash491 2011

[5] A Chambolle and T Pock ldquoA first-order primal-dual algo-rithm for convex problems with applications to imagingrdquoJournal of Mathematical Imaging and Vision vol 40 no 1pp 120ndash145 2011

[6] M D Zeiler G W Taylor and R Fergus ldquoAdaptivedeconvolutional networks for mid and high level featurelearningrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV) vol 1 no 2 2011

[7] A Krizhevsky I Sutskever and G E Hinton ldquoImagenetclassification with deep convolutional neural networksrdquo inProceedings of the Advances in Neural Information ProcessingSystems Lake Tahoe NV USA December 2012

[8] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the International Conference on Medical ImageComputing and Computer-Assisted Intervention pp 234ndash241Springer Munich Germany October 2015

[9] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theIEEE conference on computer vision and pattern recognitionBoston MA USA June 2015

[10] V Badrinarayanan A Kendall and R Cipolla ldquoSegnet adeep convolutional encoder-decoder architecture for im-age segmentationrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 39 no 12 pp 2481ndash24952017

[11] H Noh S Hong and B Han ldquoLearning deconvolutionnetwork for semantic segmentationrdquo in Proceedings of theIEEE International Conference on Computer Vision SantiagoChile December 2015

[12] D R Martin C C Fowlkes and J Malik ldquoLearning to detectnatural image boundaries using local brightness color andtexture cuesrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 26 no 5 pp 530ndash549 2004

[13] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[14] V Dumoulin and F Visin ldquoA guide to convolution arithmeticfor deep learningrdquo 2016 httpsarxivorgabs160307285

[15] M D Zeiler et al Deconvolutional Networks pp 2528ndash25352010

[16] C R Vogel and M E Oman ldquoFast total variation-basedimage reconstructionrdquo in Proceedings of the 1995 ASMEDesign Engineering Conferences vol 3 No part C BostonMA USA Boston MA USA September 1995

[17] L I Rudin S Osher and E Fatemi ldquoNonlinear total variationbased noise removal algorithmsrdquo Physica D NonlinearPhenomena vol 60 no 1ndash4 pp 259ndash268 1992

[18] M D Zeiler and F Rob ldquoVisualizing and understanding con-volutional networksrdquo in Proceedings of the European Conferenceon Computer Vision Springer Zurich Switzerland September2014

[19] V Papyan Y Romano and M Elad ldquoConvolutional neuralnetworks analyzed via convolutional sparse codingrdquo 2016httpsarxivorgabs160708194

Computational Intelligence and Neuroscience 11

[20] L-B Qiao B-F Zhang J-S Su and X-C Lu ldquoA systematicreview of structured sparse learningrdquo Frontiers of In-formation Technology amp Electronic Engineering vol 18 no 4pp 445ndash463 2017

[21] P Wang P Chen Y Yuan et al ldquoUnderstanding convolutionfor semantic segmentationrdquo in Proceedings of the 2018 IEEEWinter Conference on Applications of Computer Vision(WACV) IEEE Lake Tahoe NV USA March 2018

[22] J T Springenberg et al ldquoStriving for simplicity the allconvolutional netrdquo 2014 httpsarxivorgabs14126806

[23] K Bredies K Kunisch and T Pock ldquoTotal generalizedvariationrdquo SIAM Journal on Imaging Sciences vol 3 no 3pp 492ndash526 2010

[24] T Pock and A Chambolle ldquoDiagonal preconditioning forfirst order primal-dual algorithms in convex optimizationrdquo inProceedings of the 2011 International Conference on ComputerVision IEEE Barcelona Spain November 2011

[25] E Esser X Zhang and T F Chan ldquoA general framework for aclass of first order primal-dual algorithms for convex opti-mization in imaging sciencerdquo SIAM Journal on ImagingSciences vol 3 no 4 pp 1015ndash1046 2010

[26] D Ferstl C Reinbacher R Ranftl M Ruether andH Bischof ldquoImage guided depth upsampling using aniso-tropic total generalized variationrdquo in Proceedings of the IEEEInternational Conference on Computer Vision SydneyAustralia December 2013

[27] W He H Zhang L Zhang and H Shen ldquoTotal-variation-regularized low-rank matrix factorization for hyperspectralimage restorationrdquo IEEE Transactions on Geoscience andRemote Sensing vol 54 no 1 pp 178ndash188 2016

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 7: TGVUpsampling:AMaking-UpOperationfor …downloads.hindawi.com/journals/cin/2019/8527819.pdf1.Introduction Compared to traditional classification tasks, semantic seg- ... We made a

412 Training in TGV Upsampling In this experiment wecompared overall segmentation results with those of theoriginal networks

Figure 8 illustrates examples of PASCAL VOC outputsfrom FCN (FCN-8s) U-Net SegNet and the modified TGVupsampling-based networks (denoted by FCN-TGV U-Net-

(a) (b) (c) (d)

Figure 6 Comparison between a bilinear operation and TGV upsampling for the same feature maps (in U-Net) (a) Input maps (b) TGVupsample (c) Bilinear (d) Ground truth

(a) (b) (c) (d)

Figure 5 Comparison between a transposed operation and TGV upsampling for the same feature maps (in FCN) (a) Input maps (b) TGVupsample (c) Transposed (d) Ground truth

(a) (b) (c) (d)

Figure 7 Comparison between an unpooling operation and TGV upsampling for the same feature maps (in SegNet) (a) Input maps(b) TGV upsample (c) Unpooling (d) Ground truth

Computational Intelligence and Neuroscience 7

TGV and SegNet-TGV respectively) Compared to thecoarse-feature maps of the original networks class (or ob-ject) recognitions (referring to the interval area between

objects) in our results are more precious (see SegNet andSegNet-TGV) and the situations of exceeding areas inground truth are greatly eased (see FCN and FCN-TGV) At

Input

FT

GT

S

ST

U

UT

F

Figure 8 Overall segmentation example results on PASCAL VOC (GT stands for ground truth F S and U represent results from FCNSegNet and U-Net while FT ST and UT mean three networks combined with TGV upsampling)

8 Computational Intelligence and Neuroscience

the same time they well preserved the smoothness of theboundary Lastly it can be concluded that the TGV upsamplingmethod helps models make a better predication on semantics

e quantitative results of the VOC test set of ourproposed algorithm and the competitors are presented inTable 1We observed that in terms of overall accuracy meanaccuracy frequency accuracy or MIoU all metrics wereimproved by about 14ndash23 corresponding to the visual-ization results in Figure 10

Figures 10ndash12 show the effect after applying the newupsampling method (the X and Y axes represent iterationsand losses respectively) We found that the TGVupsampling algorithm had a significant effect on reducinglosses compared to the original methods With a fixed batchsize our new method converged faster than the originalmodels

413 Loss Function in the Training Process e lossfunction of FCN FCN-TGV U-Net U-Net-TGV SegNetand SegNet-TGV on the PASCAL VOC VALIDATAE set isgiven in Figures 10ndash12

42 Analysis After concluding all four upsampling op-erations introduced in the above sections we can groupthem into two types based on their different informationsources

421 Back-Up (Including Unpooling and TransposedConvolution Operations) Many researchers think that whenreconstructing feature maps intermediate results producedin previous operations should be referred to (especiallylayers in the encoding stage) as if images are processed along

a ldquopathrdquo Having known conditions of front distances (re-ferring to corresponding layers in the encoding stage) makesit possible to better ldquopredicaterdquo the following road conditions(referring to feature map reconstructions) us whenapplying these methods operation information of previouslayers should be saved For example the unpooling oper-ation is actually seen as an index storage With saved pa-rameters feature maps can be well reconstructed

However in both transposed and unpooling operationsit is unavoidable that the processed results would retain acertain part of the intermediate results produced by previouslayers In some cases this is useful for keeping importantinformation that may be overlooked in the training processbut it can also be a disturbance by bringing along un-necessary information

422 From-Itself (Including Bilinear Interpolation andOur TGV Upsampling Operations) In contrast to theldquoback-uprdquo category the information sources bilinearinterpolation and TGV upsampling used to make up forlosses are coming from the methodsrsquo processes ratherthan results of previous operations As described inRelated Works bilinear operation interpolates from xand y directions based on known sample values e TGV

Table 1 Results of the PASCAL VOC set

Method OA MAcc FWIoU MIoUFCN-8s 9113 7868 8459 6459FCN-TGV 9233 8012 8749 6602SegNet 8686 7441 8032 5859SegNet-TGV 8746 7608 8345 6048U-Net 8713 7068 7959 6159U-Net-TGV 8914 7212 8149 6389

Input

Conv

Conv

hellip

Conv 1

Conv

Conv

hellip

Conv 2

Conv

Conv

hellip

Conv 3

Conv

Conv

hellip

Conv 4

Conv

Conv

hellip

Conv 5

Conv

Conv

hellip

Conv

sco

re_

Conv 6

Conv

tra

nsp TGV

Slice Slice

+

Conv

tra

nsp

Slice

Output

TGV

Conv

sco

re_

Slice Slice

+

Conv

tra

nsp

TGV

times(12)

times(12)

times(14)

times(14)

times(14)

times(18)

times(18)

times(18)times(116) times(116)times(132)

Figure 9 Replacing the transposed convolutional layer in the FCN with the TGV upsampling method

Computational Intelligence and Neuroscience 9

method has a more complex information source It startsfrom k-order divergence of known pixel values (for k-order images) It can better preserve boundary

information of textures and the step effect can be ef-fectively avoided raising the smoothness of the entireimage at the same time

2000e + 5

1600e + 5

1200e + 5

8000e + 4

4000e + 4

000

0000 3000k 9000k6000k

(a)

1600e + 5

1200e + 5

8000e + 4

4000e + 4

000

0000 3000k 9000k6000k

(b)

Figure 10 Loss function of SegNet (a) and SegNet-TGV (b) on the PASCAL VOC VALIDATAE set

1200e + 5

8000e + 4

4000e + 4

000

0000 3000k 9000k6000k

(a)

0000 3000k 9000k6000k

1000e + 5

8000e + 4

6000e + 4

4000e + 4

2000e + 4

000

(b)

Figure 11 Loss function of FCN (a) and FCN-TGV (b) on the PASCAL VOC VALIDATAE set

1200e + 5

1000e + 5

8000e + 4

6000e + 4

0000 3000k 9000k6000k

(a)

1200e + 5

8000e + 4

4000e + 4

0000 3000k 9000k6000k

000

(b)

Figure 12 Loss function of U-Net (a) and U-Net-TGV (b) on the PASCAL VOC VALIDATAE set

10 Computational Intelligence and Neuroscience

From the results of these experiments the numericalresults of the loss function and the smoothness of thesampled feature maps have proven that the TGV-upsam-pling method is a great improvement over current methods

5 Conclusions

In this work we introduced a TGV-upsampling methodbased on image restoration research We transformed theprocess of feature map reconstruction into a loss-optimiza-tion problem Based on the divergence of each order we usedTGV regularization which reconstructs piecewise functionsFor numerical optimization we used a primal-dual algorithmwhich can effectively perform parallel computing and result inhigh frame rates We applied our new method to threenetworks and evaluated the results via PASCAL VOC data-setsese tests proved that the TGV upsampling method cangreatly make up for lost smoothness and boundary in-formation ofmaps Compared to the original methods used innetworks (FCN U-Net and SegNet) this new method madeaverage improvements of 14ndash23 in terms of MIoU accu-racy when used in the decoding stage When observing thetraining process in the experiments we clearly found that theTGV upsampling method greatly made up for informationlosses that resulted from other operations (mainly referring tovarious convolutional layers)

e proposed algorithm is not limited to single-layermap upsampling In the future it will be extended to un-derstanding entire scenes for effectively incorporatingtemporal coherence in a consistent way On the contrarywith the proposed ldquorestoration-likerdquo method we will furtherconcentrate on how to better understand latent semanticsunder unsupervised settings

Data Availability

e codes used to support the findings of this study have notbeen made available because the proposed method would laterbe applied in medical projects that involve personal privacy

Conflicts of Interest

e authors declare that they have no conflicts of interest


Computational Intelligence and Neuroscience 11


TGV and SegNet-TGV, respectively). Compared to the coarse feature maps of the original networks, class (or object) recognition (referring to the interval area between objects) in our results is more precise (see SegNet and SegNet-TGV), and cases where segmented regions exceed the ground-truth areas are greatly reduced (see FCN and FCN-TGV). At the same time, the models preserved the smoothness of boundaries well. Lastly, it can be concluded that the TGV upsampling method helps models make better predictions on semantics.

Figure 8: Overall segmentation example results on PASCAL VOC (GT stands for ground truth; F, S, and U represent results from FCN, SegNet, and U-Net, while FT, ST, and UT denote the three networks combined with TGV upsampling).

The quantitative results on the VOC test set for our proposed algorithm and the competitors are presented in Table 1. We observed that, in terms of overall accuracy, mean accuracy, frequency-weighted IoU, and MIoU, all metrics improved by about 1.4–2.3%, corresponding to the visualization results in Figure 10.

4.1.3. Loss Function in the Training Process. The loss functions of FCN, FCN-TGV, U-Net, U-Net-TGV, SegNet, and SegNet-TGV on the PASCAL VOC validation set are given in Figures 10–12 (the X and Y axes represent iterations and losses, respectively). We found that the TGV upsampling algorithm had a significant effect on reducing losses compared to the original methods; with a fixed batch size, our new method converged faster than the original models.

4.2. Analysis. Having summarized all four upsampling operations introduced in the sections above, we can group them into two types based on their different information sources.

4.2.1. Back-Up (Including Unpooling and Transposed Convolution Operations). Many researchers hold that, when reconstructing feature maps, the intermediate results produced by previous operations should be consulted (especially the layers in the encoding stage), as if images were processed along a "path": knowing the conditions of the road already traveled (the corresponding layers in the encoding stage) makes it possible to better "predict" the road ahead (the feature-map reconstruction). Thus, when applying these methods, information from previous layers must be saved. For example, the unpooling operation is essentially an index store: with the saved pooling indices, feature maps can be reconstructed faithfully.
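This index-storage idea can be made concrete with a small sketch. The following is not the paper's code but a minimal pure-Python illustration of "back-up" style upsampling: 2×2 max pooling records the argmax location ("switch") of each window, and unpooling writes each pooled value back to its saved location, filling the rest with zeros (the function names are ours).

```python
def max_pool_2x2(x):
    """2x2 max pooling on a 2-D list; returns the pooled map and argmax indices."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h, 2):
        prow, irow = [], []
        for j in range(0, w, 2):
            # Collect (value, location) pairs for the window; max picks the largest value.
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in range(2) for dj in range(2)]
            val, idx = max(window)
            prow.append(val)
            irow.append(idx)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices

def unpool_2x2(pooled, indices, h, w):
    """Unpooling: write each pooled value back to its saved index; zeros elsewhere."""
    out = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(pooled):
        for j, val in enumerate(row):
            r, c = indices[i][j]
            out[r][c] = val
    return out
```

For example, pooling `[[1,2,0,0],[3,4,0,5],[0,0,6,0],[0,7,0,8]]` yields `[[4,5],[7,8]]`, and unpooling restores each maximum to its original position while the remaining twelve cells stay zero.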

However, in both the transposed convolution and unpooling operations, it is unavoidable that the processed results retain a certain part of the intermediate results produced by previous layers. In some cases, this is useful for keeping important information that might otherwise be overlooked during training, but it can also be a disturbance, carrying along unnecessary information.

4.2.2. From-Itself (Including Bilinear Interpolation and Our TGV Upsampling Operations). In contrast to the "back-up" category, the information that bilinear interpolation and TGV upsampling use to make up for losses comes from the methods' own computations rather than from the results of previous operations. As described in Related Works, bilinear interpolation works along the x and y directions based on known sample values. The TGV

Table 1: Results on the PASCAL VOC test set (%).

Method       OA      MAcc    FWIoU   MIoU
FCN-8s       91.13   78.68   84.59   64.59
FCN-TGV      92.33   80.12   87.49   66.02
SegNet       86.86   74.41   80.32   58.59
SegNet-TGV   87.46   76.08   83.45   60.48
U-Net        87.13   70.68   79.59   61.59
U-Net-TGV    89.14   72.12   81.49   63.89
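The four columns of Table 1 can be computed from a per-pixel confusion matrix. The sketch below uses the standard definitions (overall accuracy, mean per-class accuracy, frequency-weighted IoU, and mean IoU); it is our own illustration of those definitions, not the authors' evaluation code.

```python
import numpy as np

def seg_metrics(conf):
    """Compute (OA, MAcc, FWIoU, MIoU) from a confusion matrix where
    conf[i, j] counts pixels of ground-truth class i predicted as class j.
    Assumes every class appears at least once in the ground truth."""
    conf = conf.astype(float)
    total = conf.sum()
    diag = np.diag(conf)           # correctly classified pixels per class
    gt = conf.sum(axis=1)          # ground-truth pixels per class
    pred = conf.sum(axis=0)        # predicted pixels per class
    oa = diag.sum() / total                      # overall (pixel) accuracy
    macc = np.mean(diag / gt)                    # mean per-class accuracy
    iou = diag / (gt + pred - diag)              # per-class intersection over union
    miou = iou.mean()                            # mean IoU
    fwiou = ((gt / total) * iou).sum()           # frequency-weighted IoU
    return oa, macc, fwiou, miou
```

For instance, the two-class confusion matrix `[[3, 1], [0, 4]]` gives OA = 0.875 and MIoU = 0.775.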

Figure 9: Replacing the transposed convolutional layers in the FCN with the TGV upsampling method.


method has a more complex information source: it starts from the k-order divergence of the known pixel values (for k-order images). It better preserves the boundary information of textures, and the staircasing effect is effectively avoided, raising the smoothness of the entire image at the same time.
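The simpler member of the "from-itself" category can be shown in code. Below is a minimal pure-Python sketch of bilinear upsampling (align-corners style; our own illustration, not the paper's implementation): every output pixel is computed only from the four known samples surrounding it, whereas TGV upsampling instead obtains the unknown pixels by solving a regularized optimization problem.

```python
def bilinear_upsample(x, out_h, out_w):
    """Bilinear upsampling of a 2-D list x (at least 2 rows and 2 columns)
    to size (out_h, out_w), both >= 2. Align-corners mapping: the output
    corners coincide with the input corners."""
    in_h, in_w = len(x), len(x[0])
    sy = (in_h - 1) / (out_h - 1)
    sx = (in_w - 1) / (out_w - 1)
    out = []
    for i in range(out_h):
        fy = i * sy
        y0 = min(int(fy), in_h - 2)      # top known row; clamp keeps y0 + 1 valid
        wy = fy - y0
        row = []
        for j in range(out_w):
            fx = j * sx
            x0 = min(int(fx), in_w - 2)  # left known column
            wx = fx - x0
            # Interpolate along x on the two known rows, then along y.
            top = x[y0][x0] * (1 - wx) + x[y0][x0 + 1] * wx
            bot = x[y0 + 1][x0] * (1 - wx) + x[y0 + 1][x0 + 1] * wx
            row.append(top * (1 - wy) + bot * wy)
        out.append(row)
    return out
```

Upsampling `[[0, 2], [4, 6]]` to 3×3 gives `[[0, 1, 2], [2, 3, 4], [4, 5, 6]]`: each new value is a weighted average of its known neighbors, so no information from earlier layers is needed.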

Figure 10: Loss functions of SegNet (a) and SegNet-TGV (b) on the PASCAL VOC validation set.

Figure 11: Loss functions of FCN (a) and FCN-TGV (b) on the PASCAL VOC validation set.

Figure 12: Loss functions of U-Net (a) and U-Net-TGV (b) on the PASCAL VOC validation set.


From these experiments, the numerical behavior of the loss functions and the smoothness of the upsampled feature maps have shown that the TGV upsampling method is a substantial improvement over current methods.

5. Conclusions

In this work, we introduced a TGV upsampling method based on image restoration research. We transformed the process of feature-map reconstruction into a loss-optimization problem: based on the divergences of each order, we used TGV regularization, which reconstructs piecewise polynomial functions. For the numerical optimization, we used a primal-dual algorithm, which can effectively exploit parallel computing and achieve high frame rates. We applied the new method to three networks and evaluated the results on the PASCAL VOC datasets. These tests showed that the TGV upsampling method can largely make up for the lost smoothness and boundary information of feature maps. Compared to the original methods used in the networks (FCN, U-Net, and SegNet), the new method achieved average improvements of 1.4–2.3% in terms of MIoU accuracy when used in the decoding stage. Observing the training process in the experiments, we clearly found that TGV upsampling greatly compensated for the information losses caused by other operations (mainly the various convolutional layers).
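To make the "reconstruction as loss optimization" idea concrete, here is a deliberately simplified sketch: first-order, 1-D total-variation denoising solved with a Chambolle-Pock primal-dual iteration. It is our own illustration under strong assumptions (first-order TV instead of the paper's second-order TGV, one dimension instead of 2-D feature maps, fixed step sizes), not the authors' implementation.

```python
import numpy as np

def tv_denoise_1d(f, lam=0.5, n_iter=500, tau=0.25, sigma=0.25):
    """Chambolle-Pock iteration for min_u lam*||Du||_1 + 0.5*||u - f||^2,
    where D is the forward difference with a zero last entry (Neumann boundary).
    Step sizes satisfy tau * sigma * ||D||^2 <= 1, so the scheme converges."""
    u = f.astype(float).copy()
    u_bar = u.copy()
    p = np.zeros_like(u)                        # dual variable; p[-1] stays 0
    for _ in range(n_iter):
        d = np.zeros_like(u_bar)
        d[:-1] = u_bar[1:] - u_bar[:-1]         # Du
        p = np.clip(p + sigma * d, -lam, lam)   # dual ascent + projection
        dtp = np.empty_like(p)                  # D^T p (exact because p[-1] == 0)
        dtp[0] = -p[0]
        dtp[1:] = p[:-1] - p[1:]
        u_new = (u - tau * dtp + tau * f) / (1.0 + tau)  # prox of 0.5*||u - f||^2
        u_bar = 2.0 * u_new - u                 # over-relaxation step
        u = u_new
    return u
```

On a noisy step signal, the iteration drives the objective well below its value at u = f and returns a smoother, near-piecewise-constant estimate; a TGV-2 version of the same scheme adds a second dual field so that piecewise linear structures, not only piecewise constant ones, survive the smoothing.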

The proposed algorithm is not limited to single-layer feature-map upsampling. In the future, it will be extended to understanding entire scenes, effectively incorporating temporal coherence in a consistent way. Moreover, with the proposed "restoration-like" method, we will further concentrate on how to better understand latent semantics in unsupervised settings.

Data Availability

The code used to support the findings of this study has not been made available because the proposed method will later be applied in medical projects that involve personal privacy.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Research Program to Solve Social Issues of the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (no. NRF-2017M3C8A8091768).

References

[1] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2018.

[2] X. Li, Z. Liu, P. Luo, C. C. Loy, and X. Tang, "Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, July 2017.

[3] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, July 2017.

[4] F. Knoll, K. Bredies, T. Pock, and R. Stollberger, "Second order total generalized variation (TGV) for MRI," Magnetic Resonance in Medicine, vol. 65, no. 2, pp. 480–491, 2011.

[5] A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, 2011.

[6] M. D. Zeiler, G. W. Taylor, and R. Fergus, "Adaptive deconvolutional networks for mid and high level feature learning," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), vol. 1, no. 2, 2011.

[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, December 2012.

[8] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, Munich, Germany, October 2015.

[9] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, June 2015.

[10] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: a deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.

[11] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, December 2015.

[12] D. R. Martin, C. C. Fowlkes, and J. Malik, "Learning to detect natural image boundaries using local brightness, color, and texture cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 530–549, 2004.

[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, https://arxiv.org/abs/1409.1556.

[14] V. Dumoulin and F. Visin, "A guide to convolution arithmetic for deep learning," 2016, https://arxiv.org/abs/1603.07285.

[15] M. D. Zeiler et al., Deconvolutional Networks, pp. 2528–2535, 2010.

[16] C. R. Vogel and M. E. Oman, "Fast total variation-based image reconstruction," in Proceedings of the 1995 ASME Design Engineering Conferences, vol. 3, no. part C, Boston, MA, USA, September 1995.

[17] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1–4, pp. 259–268, 1992.

[18] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proceedings of the European Conference on Computer Vision, Springer, Zurich, Switzerland, September 2014.

[19] V. Papyan, Y. Romano, and M. Elad, "Convolutional neural networks analyzed via convolutional sparse coding," 2016, https://arxiv.org/abs/1607.08194.

[20] L.-B. Qiao, B.-F. Zhang, J.-S. Su, and X.-C. Lu, "A systematic review of structured sparse learning," Frontiers of Information Technology & Electronic Engineering, vol. 18, no. 4, pp. 445–463, 2017.

[21] P. Wang, P. Chen, Y. Yuan et al., "Understanding convolution for semantic segmentation," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, Lake Tahoe, NV, USA, March 2018.

[22] J. T. Springenberg et al., "Striving for simplicity: the all convolutional net," 2014, https://arxiv.org/abs/1412.6806.

[23] K. Bredies, K. Kunisch, and T. Pock, "Total generalized variation," SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 492–526, 2010.

[24] T. Pock and A. Chambolle, "Diagonal preconditioning for first order primal-dual algorithms in convex optimization," in Proceedings of the 2011 International Conference on Computer Vision, IEEE, Barcelona, Spain, November 2011.

[25] E. Esser, X. Zhang, and T. F. Chan, "A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science," SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 1015–1046, 2010.

[26] D. Ferstl, C. Reinbacher, R. Ranftl, M. Ruether, and H. Bischof, "Image guided depth upsampling using anisotropic total generalized variation," in Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, December 2013.

[27] W. He, H. Zhang, L. Zhang, and H. Shen, "Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 1, pp. 178–188, 2016.

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 9: TGVUpsampling:AMaking-UpOperationfor …downloads.hindawi.com/journals/cin/2019/8527819.pdf1.Introduction Compared to traditional classification tasks, semantic seg- ... We made a

the same time they well preserved the smoothness of theboundary Lastly it can be concluded that the TGV upsamplingmethod helps models make a better predication on semantics

e quantitative results of the VOC test set of ourproposed algorithm and the competitors are presented inTable 1We observed that in terms of overall accuracy meanaccuracy frequency accuracy or MIoU all metrics wereimproved by about 14ndash23 corresponding to the visual-ization results in Figure 10

Figures 10ndash12 show the effect after applying the newupsampling method (the X and Y axes represent iterationsand losses respectively) We found that the TGVupsampling algorithm had a significant effect on reducinglosses compared to the original methods With a fixed batchsize our new method converged faster than the originalmodels

413 Loss Function in the Training Process e lossfunction of FCN FCN-TGV U-Net U-Net-TGV SegNetand SegNet-TGV on the PASCAL VOC VALIDATAE set isgiven in Figures 10ndash12

42 Analysis After concluding all four upsampling op-erations introduced in the above sections we can groupthem into two types based on their different informationsources

421 Back-Up (Including Unpooling and TransposedConvolution Operations) Many researchers think that whenreconstructing feature maps intermediate results producedin previous operations should be referred to (especiallylayers in the encoding stage) as if images are processed along

a ldquopathrdquo Having known conditions of front distances (re-ferring to corresponding layers in the encoding stage) makesit possible to better ldquopredicaterdquo the following road conditions(referring to feature map reconstructions) us whenapplying these methods operation information of previouslayers should be saved For example the unpooling oper-ation is actually seen as an index storage With saved pa-rameters feature maps can be well reconstructed

However in both transposed and unpooling operationsit is unavoidable that the processed results would retain acertain part of the intermediate results produced by previouslayers In some cases this is useful for keeping importantinformation that may be overlooked in the training processbut it can also be a disturbance by bringing along un-necessary information

422 From-Itself (Including Bilinear Interpolation andOur TGV Upsampling Operations) In contrast to theldquoback-uprdquo category the information sources bilinearinterpolation and TGV upsampling used to make up forlosses are coming from the methodsrsquo processes ratherthan results of previous operations As described inRelated Works bilinear operation interpolates from xand y directions based on known sample values e TGV

Table 1 Results of the PASCAL VOC set

Method OA MAcc FWIoU MIoUFCN-8s 9113 7868 8459 6459FCN-TGV 9233 8012 8749 6602SegNet 8686 7441 8032 5859SegNet-TGV 8746 7608 8345 6048U-Net 8713 7068 7959 6159U-Net-TGV 8914 7212 8149 6389

Input

Conv

Conv

hellip

Conv 1

Conv

Conv

hellip

Conv 2

Conv

Conv

hellip

Conv 3

Conv

Conv

hellip

Conv 4

Conv

Conv

hellip

Conv 5

Conv

Conv

hellip

Conv

sco

re_

Conv 6

Conv

tra

nsp TGV

Slice Slice

+

Conv

tra

nsp

Slice

Output

TGV

Conv

sco

re_

Slice Slice

+

Conv

tra

nsp

TGV

times(12)

times(12)

times(14)

times(14)

times(14)

times(18)

times(18)

times(18)times(116) times(116)times(132)

Figure 9 Replacing the transposed convolutional layer in the FCN with the TGV upsampling method

Computational Intelligence and Neuroscience 9

method has a more complex information source It startsfrom k-order divergence of known pixel values (for k-order images) It can better preserve boundary

information of textures and the step effect can be ef-fectively avoided raising the smoothness of the entireimage at the same time

2000e + 5

1600e + 5

1200e + 5

8000e + 4

4000e + 4

000

0000 3000k 9000k6000k

(a)

1600e + 5

1200e + 5

8000e + 4

4000e + 4

000

0000 3000k 9000k6000k

(b)

Figure 10 Loss function of SegNet (a) and SegNet-TGV (b) on the PASCAL VOC VALIDATAE set

1200e + 5

8000e + 4

4000e + 4

000

0000 3000k 9000k6000k

(a)

0000 3000k 9000k6000k

1000e + 5

8000e + 4

6000e + 4

4000e + 4

2000e + 4

000

(b)

Figure 11 Loss function of FCN (a) and FCN-TGV (b) on the PASCAL VOC VALIDATAE set

1200e + 5

1000e + 5

8000e + 4

6000e + 4

0000 3000k 9000k6000k

(a)

1200e + 5

8000e + 4

4000e + 4

0000 3000k 9000k6000k

000

(b)

Figure 12 Loss function of U-Net (a) and U-Net-TGV (b) on the PASCAL VOC VALIDATAE set

10 Computational Intelligence and Neuroscience

From the results of these experiments the numericalresults of the loss function and the smoothness of thesampled feature maps have proven that the TGV-upsam-pling method is a great improvement over current methods

5 Conclusions

In this work we introduced a TGV-upsampling methodbased on image restoration research We transformed theprocess of feature map reconstruction into a loss-optimiza-tion problem Based on the divergence of each order we usedTGV regularization which reconstructs piecewise functionsFor numerical optimization we used a primal-dual algorithmwhich can effectively perform parallel computing and result inhigh frame rates We applied our new method to threenetworks and evaluated the results via PASCAL VOC data-setsese tests proved that the TGV upsampling method cangreatly make up for lost smoothness and boundary in-formation ofmaps Compared to the original methods used innetworks (FCN U-Net and SegNet) this new method madeaverage improvements of 14ndash23 in terms of MIoU accu-racy when used in the decoding stage When observing thetraining process in the experiments we clearly found that theTGV upsampling method greatly made up for informationlosses that resulted from other operations (mainly referring tovarious convolutional layers)

e proposed algorithm is not limited to single-layermap upsampling In the future it will be extended to un-derstanding entire scenes for effectively incorporatingtemporal coherence in a consistent way On the contrarywith the proposed ldquorestoration-likerdquo method we will furtherconcentrate on how to better understand latent semanticsunder unsupervised settings

Data Availability

e codes used to support the findings of this study have notbeen made available because the proposed method would laterbe applied in medical projects that involve personal privacy

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

is research was supported by the Research Program toSolve Social Issues of the National Research Foundation ofKorea (NRF) funded by theMinistry of Science and ICT (noNRF2017M3C8A8091768)

References

[1] L-C Chen G Papandreou I Kokkinos K Murphy andA L Yuille ldquoDeeplab semantic image segmentation withdeep convolutional nets atrous convolution and fully con-nected crfsrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 40 no 4 pp 834ndash848 2018

[2] X Li Z Liu P Luo C C Loy and X Tang ldquoNot all pixels areequal difficulty-aware semantic segmentation via deep layercascaderdquo in Proceedings of the IEEE Conference on Computer

Vision and Pattern Recognition Salt Lake City UT USA July2017

[3] H Zhao J Shi X Qi X Wang and J Jia ldquoPyramid sceneparsing networkrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition Salt Lake City UTUSA July 2017

[4] F Knoll K Bredies T Pock and R Stollberger ldquoSecondorder total generalized variation (TGV) for MRIrdquo MagneticResonance in Medicine vol 65 no 2 pp 480ndash491 2011

[5] A Chambolle and T Pock ldquoA first-order primal-dual algo-rithm for convex problems with applications to imagingrdquoJournal of Mathematical Imaging and Vision vol 40 no 1pp 120ndash145 2011

[6] M D Zeiler G W Taylor and R Fergus ldquoAdaptivedeconvolutional networks for mid and high level featurelearningrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV) vol 1 no 2 2011

[7] A Krizhevsky I Sutskever and G E Hinton ldquoImagenetclassification with deep convolutional neural networksrdquo inProceedings of the Advances in Neural Information ProcessingSystems Lake Tahoe NV USA December 2012

[8] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the International Conference on Medical ImageComputing and Computer-Assisted Intervention pp 234ndash241Springer Munich Germany October 2015

[9] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theIEEE conference on computer vision and pattern recognitionBoston MA USA June 2015

[10] V Badrinarayanan A Kendall and R Cipolla ldquoSegnet adeep convolutional encoder-decoder architecture for im-age segmentationrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 39 no 12 pp 2481ndash24952017

[11] H Noh S Hong and B Han ldquoLearning deconvolutionnetwork for semantic segmentationrdquo in Proceedings of theIEEE International Conference on Computer Vision SantiagoChile December 2015

[12] D R Martin C C Fowlkes and J Malik ldquoLearning to detectnatural image boundaries using local brightness color andtexture cuesrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 26 no 5 pp 530ndash549 2004

[13] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[14] V Dumoulin and F Visin ldquoA guide to convolution arithmeticfor deep learningrdquo 2016 httpsarxivorgabs160307285

[15] M D Zeiler et al Deconvolutional Networks pp 2528ndash25352010

[16] C R Vogel and M E Oman ldquoFast total variation-basedimage reconstructionrdquo in Proceedings of the 1995 ASMEDesign Engineering Conferences vol 3 No part C BostonMA USA Boston MA USA September 1995

[17] L I Rudin S Osher and E Fatemi ldquoNonlinear total variationbased noise removal algorithmsrdquo Physica D NonlinearPhenomena vol 60 no 1ndash4 pp 259ndash268 1992

[18] M D Zeiler and F Rob ldquoVisualizing and understanding con-volutional networksrdquo in Proceedings of the European Conferenceon Computer Vision Springer Zurich Switzerland September2014

[19] V Papyan Y Romano and M Elad ldquoConvolutional neuralnetworks analyzed via convolutional sparse codingrdquo 2016httpsarxivorgabs160708194

Computational Intelligence and Neuroscience 11

[20] L-B Qiao B-F Zhang J-S Su and X-C Lu ldquoA systematicreview of structured sparse learningrdquo Frontiers of In-formation Technology amp Electronic Engineering vol 18 no 4pp 445ndash463 2017

[21] P Wang P Chen Y Yuan et al ldquoUnderstanding convolutionfor semantic segmentationrdquo in Proceedings of the 2018 IEEEWinter Conference on Applications of Computer Vision(WACV) IEEE Lake Tahoe NV USA March 2018

[22] J T Springenberg et al ldquoStriving for simplicity the allconvolutional netrdquo 2014 httpsarxivorgabs14126806

[23] K Bredies K Kunisch and T Pock ldquoTotal generalizedvariationrdquo SIAM Journal on Imaging Sciences vol 3 no 3pp 492ndash526 2010

[24] T Pock and A Chambolle ldquoDiagonal preconditioning forfirst order primal-dual algorithms in convex optimizationrdquo inProceedings of the 2011 International Conference on ComputerVision IEEE Barcelona Spain November 2011

[25] E Esser X Zhang and T F Chan ldquoA general framework for aclass of first order primal-dual algorithms for convex opti-mization in imaging sciencerdquo SIAM Journal on ImagingSciences vol 3 no 4 pp 1015ndash1046 2010

[26] D Ferstl C Reinbacher R Ranftl M Ruether andH Bischof ldquoImage guided depth upsampling using aniso-tropic total generalized variationrdquo in Proceedings of the IEEEInternational Conference on Computer Vision SydneyAustralia December 2013

[27] W He H Zhang L Zhang and H Shen ldquoTotal-variation-regularized low-rank matrix factorization for hyperspectralimage restorationrdquo IEEE Transactions on Geoscience andRemote Sensing vol 54 no 1 pp 178ndash188 2016


method has a more complex information source. It starts from the k-order divergence of the known pixel values (for k-order images). It can better preserve the boundary information of textures, and the step (staircasing) effect can be effectively avoided, raising the smoothness of the entire image at the same time.
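A second-order TGV regularizer couples the image gradient to an auxiliary vector field whose own (symmetrized) derivatives are penalized; this is what preserves edges while avoiding the step effect. As an illustration only (this is a minimal sketch of the discrete TGV-2 energy, not the exact discretization used in our layer; the function `tgv2_energy` and its Neumann boundary handling are our own assumptions), the energy can be written as:

```python
import numpy as np

def grad(u):
    # Forward differences with replicated (Neumann) boundary: last difference is 0.
    gx = np.diff(u, axis=1, append=u[:, -1:])
    gy = np.diff(u, axis=0, append=u[-1:, :])
    return gx, gy

def tgv2_energy(u, w1, w2, alpha1=1.0, alpha0=2.0):
    """Discrete second-order TGV: alpha1*|grad(u) - w| + alpha0*|E(w)|,
    where E(w) is the symmetrized gradient of the field w = (w1, w2)."""
    gx, gy = grad(u)
    first_order = np.sqrt((gx - w1) ** 2 + (gy - w2) ** 2).sum()
    # Symmetrized gradient of w: diagonal terms w1x, w2y; off-diagonal 0.5*(w1y + w2x).
    w1x, w1y = grad(w1)
    w2x, w2y = grad(w2)
    off = 0.5 * (w1y + w2x)
    second_order = np.sqrt(w1x ** 2 + w2y ** 2 + 2.0 * off ** 2).sum()
    return alpha1 * first_order + alpha0 * second_order
```

Setting w = 0 recovers alpha1 times the total variation of u, while letting w follow grad(u) leaves linear ramps unpenalized — precisely the mechanism that suppresses staircasing.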

Figure 10: Loss function of SegNet (a) and SegNet-TGV (b) on the PASCAL VOC validation set.

Figure 11: Loss function of FCN (a) and FCN-TGV (b) on the PASCAL VOC validation set.

Figure 12: Loss function of U-Net (a) and U-Net-TGV (b) on the PASCAL VOC validation set.


From these experiments, both the numerical behavior of the loss function and the smoothness of the upsampled feature maps show that the TGV-upsampling method is a clear improvement over current methods.

5. Conclusions

In this work, we introduced a TGV-upsampling method based on image restoration research. We transformed the process of feature map reconstruction into a loss-optimization problem. Based on the divergence of each order, we used TGV regularization, which reconstructs piecewise smooth functions. For numerical optimization, we used a primal-dual algorithm, which can effectively perform parallel computing and achieve high frame rates. We applied our new method to three networks and evaluated the results on the PASCAL VOC datasets. These tests proved that the TGV upsampling method can greatly make up for the lost smoothness and boundary information of feature maps. Compared to the original methods used in the networks (FCN, U-Net, and SegNet), the new method achieved average improvements of 1.4–2.3% in terms of mIoU accuracy when used in the decoding stage. When observing the training process in the experiments, we clearly found that the TGV upsampling method greatly compensated for the information losses caused by other operations (mainly the various convolutional layers).
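The primal-dual scheme referred to above belongs to the Chambolle–Pock family [5]. As a hedged, minimal sketch of how such an iteration optimizes a variational reconstruction loss — shown here for the simpler 1-D ROF/TV denoising problem rather than our full TGV solver, with `rof_primal_dual` and its step sizes being illustrative assumptions:

```python
import numpy as np

def grad1d(u):
    # Forward differences; output has len(u) - 1 entries.
    return u[1:] - u[:-1]

def div1d(p):
    # Negative adjoint of grad1d (discrete divergence with zero boundary flux).
    d = np.empty(len(p) + 1)
    d[0] = p[0]
    d[1:-1] = p[1:] - p[:-1]
    d[-1] = -p[-1]
    return d

def rof_primal_dual(f, lam=0.3, n_iter=300):
    """Chambolle-Pock iterations for min_u 0.5*||u - f||^2 + lam*TV(u)."""
    tau = sigma = 0.5  # tau * sigma * ||grad||^2 <= 1 (||grad||^2 <= 4 in 1D)
    u = f.copy()
    u_bar = f.copy()
    p = np.zeros(len(f) - 1)
    for _ in range(n_iter):
        # Dual ascent step, then projection onto the constraint set |p| <= lam.
        p = np.clip(p + sigma * grad1d(u_bar), -lam, lam)
        # Primal descent step with the closed-form prox of 0.5*||u - f||^2.
        u_new = (u + tau * div1d(p) + tau * f) / (1.0 + tau)
        u_bar = 2.0 * u_new - u
        u = u_new
    return u
```

Because every update is element-wise (a clip and an affine combination), the scheme parallelizes naturally on GPUs, which is the property exploited in the decoding stage.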

The proposed algorithm is not limited to single-layer map upsampling. In the future, it will be extended to understanding entire scenes and to effectively incorporating temporal coherence in a consistent way. Furthermore, building on the proposed "restoration-like" method, we will concentrate on how to better understand latent semantics in unsupervised settings.

Data Availability

The code used to support the findings of this study has not been made available because the proposed method will later be applied in medical projects that involve personal privacy.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Research Program to Solve Social Issues of the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (no. NRF2017M3C8A8091768).

References

[1] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2018.

[2] X. Li, Z. Liu, P. Luo, C. C. Loy, and X. Tang, "Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 2017.

[3] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 2017.

[4] F. Knoll, K. Bredies, T. Pock, and R. Stollberger, "Second order total generalized variation (TGV) for MRI," Magnetic Resonance in Medicine, vol. 65, no. 2, pp. 480–491, 2011.

[5] A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, 2011.

[6] M. D. Zeiler, G. W. Taylor, and R. Fergus, "Adaptive deconvolutional networks for mid and high level feature learning," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), vol. 1, no. 2, 2011.

[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, December 2012.

[8] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, Munich, Germany, October 2015.

[9] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, June 2015.

[10] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: a deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.

[11] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, December 2015.

[12] D. R. Martin, C. C. Fowlkes, and J. Malik, "Learning to detect natural image boundaries using local brightness, color, and texture cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 530–549, 2004.

[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, https://arxiv.org/abs/1409.1556.

[14] V. Dumoulin and F. Visin, "A guide to convolution arithmetic for deep learning," 2016, https://arxiv.org/abs/1603.07285.

[15] M. D. Zeiler et al., "Deconvolutional networks," pp. 2528–2535, 2010.

[16] C. R. Vogel and M. E. Oman, "Fast total variation-based image reconstruction," in Proceedings of the 1995 ASME Design Engineering Conferences, vol. 3, part C, Boston, MA, USA, September 1995.

[17] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1–4, pp. 259–268, 1992.

[18] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proceedings of the European Conference on Computer Vision, Springer, Zurich, Switzerland, September 2014.

[19] V. Papyan, Y. Romano, and M. Elad, "Convolutional neural networks analyzed via convolutional sparse coding," 2016, https://arxiv.org/abs/1607.08194.

[20] L.-B. Qiao, B.-F. Zhang, J.-S. Su, and X.-C. Lu, "A systematic review of structured sparse learning," Frontiers of Information Technology & Electronic Engineering, vol. 18, no. 4, pp. 445–463, 2017.

[21] P. Wang, P. Chen, Y. Yuan et al., "Understanding convolution for semantic segmentation," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, Lake Tahoe, NV, USA, March 2018.

[22] J. T. Springenberg et al., "Striving for simplicity: the all convolutional net," 2014, https://arxiv.org/abs/1412.6806.

[23] K. Bredies, K. Kunisch, and T. Pock, "Total generalized variation," SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 492–526, 2010.

[24] T. Pock and A. Chambolle, "Diagonal preconditioning for first order primal-dual algorithms in convex optimization," in Proceedings of the 2011 International Conference on Computer Vision, IEEE, Barcelona, Spain, November 2011.

[25] E. Esser, X. Zhang, and T. F. Chan, "A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science," SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 1015–1046, 2010.

[26] D. Ferstl, C. Reinbacher, R. Ranftl, M. Ruether, and H. Bischof, "Image guided depth upsampling using anisotropic total generalized variation," in Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, December 2013.

[27] W. He, H. Zhang, L. Zhang, and H. Shen, "Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 1, pp. 178–188, 2016.
