Deep Learning-Based Inversion Methods for Solving Inverse Scattering Problems with Phaseless Data

Kuiwen Xu, Liang Wu, Xiuzhu Ye and Xudong Chen

Abstract—Without phase information in the measured field data, phaseless-data inverse scattering problems (PD-ISPs) suffer from more severe nonlinearity and ill-posedness than full-data ISPs (FD-ISPs). In this paper, we propose a learning-based inversion approach in the framework of the U-net convolutional neural network (CNN) to quantitatively image unknown scatterers located in a homogeneous background from amplitude-only measurements of the total field (also denoted phaseless data). Three training schemes with different inputs to the U-net CNN are proposed and compared: the direct inversion scheme (DIS), which uses the phaseless total field data; the scheme based on dominant induced currents retrieved by the Levenberg-Marquardt (LM) method (PD-DICs); and the scheme based on contrast source inversion with phaseless data (PD-CSI). We also describe the setup of the training data and compare the performance of the three schemes using both numerical and experimental tests. It is found that the proposed PD-CSI and PD-DICs schemes perform better than DIS in terms of accuracy, generalization ability and robustness. PD-CSI has the strongest capability to tackle PD-ISPs and outperforms both PD-DICs and DIS.

Index Terms—inverse scattering problems, convolutional neural network, phaseless data.

I. INTRODUCTION

ELECTROMAGNETIC inverse scattering problems (ISPs) are widely used in microwave remote sensing [1], nondestructive testing [2], biomedical imaging [3], geophysics [4] and subsurface imaging [5], and they usually involve quantitative imaging of unknown scatterers. In general, the aim of electromagnetic ISPs is to reconstruct the physical and geometric characteristics of unknown objects, including their positions, sizes, number and constitutive parameters, from measurements of the scattered field.

In the last decades, the solution of ISPs has matured considerably thanks to the continuous efforts of researchers. However, ill-posedness and nonlinearity remain two challenging difficulties in solving electromagnetic ISPs. Many different inversion methods have been developed to retrieve the unknown objects more efficiently and reliably. To improve the efficiency of the inversion, the multiple scattering effect in the domain of interest (DoI) can be ignored, which leads to linear methods based on the Born approximation (BA) and the Rytov approximation [6], [7]. Consequently, this kind of method is only capable of imaging weak scatterers. To tackle strong scatterers, the multiple scattering effect must be included in the model, and nonlinear methods have been developed, such as the distorted Born iterative method (DBIM) [8], the contrast source-type inversion method (CSI) [9], [10], multiplicative regularized CSI (MR-CSI) [11], Gauss-Newton-type methods [12], [13], the subspace-based optimization method (SOM) [14]–[16], and some global optimization methods [17], [18]. In addition, a regularization scheme is needed to stabilize the optimization of ISPs because of the ill-posedness. The idea is to incorporate as much prior information as possible into the cost function. Among such schemes, compressive-sensing (CS) based methods are also used as effective regularization tools to improve inversion efficiency [19], [20].

Manuscript received... The work described was supported by the NSFC under Grant No. 61971174 and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LY19F010012.

K. Xu and L. Wu are with the Engineering Research Center of Smart Microsensors and Microsystems, Ministry of Education, Hangzhou Dianzi University, Hangzhou 310018, China (e-mail: [email protected]).

X. Ye is with Beijing Institute of Technology, Beijing 10081, China (e-mail: [email protected]).

X. Chen is with the Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117583, Singapore (e-mail: [email protected]).

The reconstruction methods above all use full-wave data (i.e., both the amplitude and phase data), and the corresponding problems are denoted as full-data (FD) ISPs [21]. However, it is difficult to accurately measure the phase of the scattered field at high frequencies. Moreover, acquiring phase data greatly increases the hardware cost and the measurement resources, which significantly hinders the application of ISPs in engineering practice. Besides, phase measurements are susceptible to noise, which can cause the inversion to fail. Thus, inversion methods that use only phaseless data are preferred for engineering applications.

ISPs with phaseless data (PD), denoted as PD-ISPs, have been discussed and developed in [22]–[31]. For example, diffraction tomography based on the Born approximation with intensity-only data was first developed for the reconstruction of weakly scattering objects in [27], [28], [32]. When stronger scatterers are tackled in PD-ISPs, the nonlinearity and ill-posedness are more prominent than in FD-ISPs, owing to the reduced available information and the correspondingly higher-order cost function. The PD-ISP is then converted into an optimization problem that minimizes the residual error between the measured intensity data and the calculated one. The corresponding PD-ISPs can be solved by iterative approaches, such as Newton's method [33], [34], PD-SOM [35], PD-CSI [22] and PD-DRIM [30]. In both PD-SOM and PD-CSI, as in FD-ISPs, two residual errors, one describing the error in the data equation and one for the object equation, are used to construct the cost function, which improves the inversion ability for PD-ISPs. Although these methods can solve PD-ISPs to some extent, they still struggle to meet the requirements of real-time inversion and cannot produce satisfactory images when the unknown scatterers have high contrast or electrically large dimensions.

It is well known that deep neural networks (DNNs) perform well in computer vision, regression, image processing and object classification. With their strong nonlinear fitting ability, DNNs have gradually been applied to various research fields. For example, DNNs have been successfully used to solve inverse problems [36]–[41]. In fact, learning-by-examples (LBE) methods have been used in the past for dealing with ISPs [42]. Recently, DNNs based on convolutional neural networks (CNNs) have been developed to solve FD-ISPs [37]–[39]. In [37], deep-learning schemes for ISPs are implemented by training a U-net, which generates good quantitative results quickly; the relationship between the DNN architecture and iterative methods for nonlinear EM inverse scattering is also explored there. After that, an induced current learning method (ICLM), which incorporates merits of traditional iterative algorithms into the CNN architecture, was proposed to bridge the gap between physical knowledge and learning approaches [38]. To solve high-contrast ISPs, a three-stage Contrast Source Network (CS-Net) that combines the traditional SOM with CNNs was developed [39]. Also, in [40], a gradient learning scheme is used to invert transient electromagnetic data. These works show that DNNs can be successfully applied to nonlinear FD-ISPs and that deep-learning-based inversion can remarkably outperform conventional nonlinear inverse scattering methods in terms of both image quality and computational time.

FD-ISPs and PD-ISPs are two different problems, both of which are of wide concern in the engineering, physical, and mathematical communities [15], [22], [43]. Until now, DNNs have not been used to solve PD-ISPs in the literature. In this paper, we propose, for the first time, to use learning-based inversion to solve PD-ISPs, which have less available information and much higher nonlinearity than FD-ISPs. Compared with the previous work [37]–[39] on FD-ISPs, this work presents three different inversion schemes for U-net CNNs, which aim to reconstruct the unknown scatterers in the DoI when only the intensity data of the total field are collected and the incident field is available. The first scheme is the direct inversion scheme (DIS), which directly uses the intensity data of the total field to retrieve the profile of relative permittivity. In the second scheme, named PD-DICs, the dominant induced currents (DICs), which comprise only a small part of the signal subspace, are retrieved from phaseless data by the Levenberg-Marquardt (LM) method and used to form the input of the U-net CNN, which constructs an end-to-end mapping from a rough input image to the refined permittivity distribution. In the third scheme, the rough input image of the CNN comes from the output of contrast source inversion with phaseless data (denoted as PD-CSI). Different from [22], the induced currents are expanded in a Fourier basis and only some of the low-frequency components are used, so that the computational cost and resources can be reduced drastically. PD-DICs and PD-CSI differ completely from the FD-ISP methods in [37] in how the feature maps, or physical scattering modes, are extracted. By extracting feature maps from the phaseless data, an easier nonlinear problem with fewer unknowns is left to be solved by the CNNs. We test and compare the three schemes with both numerical and experimental tests. The tests show that PD-CSI has the best capability to tackle highly nonlinear ISPs, because more feature maps are extracted by PD-CSI.

The structure of this paper is as follows. In Section II, we introduce the formulation of the problem. In Section III, we describe the three schemes. In Sections IV and V, tests with numerical and experimental data are given in detail, respectively. In Section VI, we summarize our conclusions. Note that we use $\bar{\bar{X}}$ and $\bar{X}$ to denote the matrix and vector forms of the discretized operator or parameter $X$, respectively.

II. FORMULATION OF THE PROBLEM

Herein, a 2-D problem with TM-polarized incident waves is considered to illustrate the principle of the proposed methods (which can be straightforwardly generalized to 3-D ISPs). The 2-D configuration for ISPs is depicted in Fig. 1. In a free-space background with permittivity $\varepsilon_0$ and permeability $\mu_0$, the unknown objects are located in the domain of interest (DoI) $D$ ($D \subset \mathbb{R}^2$), and the properties of the unknown scatterers, i.e., their locations, shapes, and relative permittivity ($\varepsilon_r(r) = \varepsilon(r)/\varepsilon_0$), need to be determined. The DoI is illuminated successively by $N_i$ incidences from transmitters at $r_p$ ($p = 1, 2, \ldots, N_i$), which are located on the measurement domain $S$ outside the DoI. For each incidence, $N_r$ receivers located at $r_q$ ($q = 1, 2, \ldots, N_r$) along the measurement domain $S$ collect the scattered fields.

Fig. 1: Typical geometry of inverse scattering problems. [Figure labels: domain, transmitter, receiver.]

To solve ISPs numerically, the domain $D$ is discretized into $M = M_1 \times M_2$ rectangular subunits ($M_1$ and $M_2$ are the numbers of subunits along the x- and y-axes), whose centers are located at $r_1, r_2, \ldots, r_M$. The method of moments (MoM) with pulse basis functions is used to formulate the scattering problem. The aim of the ISPs considered here is to obtain the distribution of the constitutive parameter over the $M$ subunits from the measured phaseless total field data.

According to the Lippmann-Schwinger integral equation, the discretized matrix form of the total electric field in the DoI for the $p$th incidence, $\bar{E}^{tot}_p$, can be written as

$$\bar{E}^{tot}_p = \bar{E}^{inc}_p + \bar{\bar{G}}_D \cdot \bar{I}_p \tag{1}$$

where the total field is $\bar{E}^{tot}_p = [E^{tot}_p(r_1), \ldots, E^{tot}_p(r_M)]^T$ and the incident field is $\bar{E}^{inc}_p = [E^{inc}_p(r_1), \ldots, E^{inc}_p(r_M)]^T$. $\bar{\bar{G}}_D$ is the matrix form of the Green's function operator that maps the contrast source in the DoI (i.e., $\bar{I}_p$) to the scattered field in the DoI. The contrast source is defined by

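$$\bar{I}_p = \bar{\bar{\chi}} \cdot (\bar{E}^{inc}_p + \bar{\bar{G}}_D \cdot \bar{I}_p) \tag{2}$$

where $\bar{\bar{\chi}}$ is the diagonal matrix of the contrast function, and $\chi(r_m) = (\varepsilon(r_m) - \varepsilon_0)/\varepsilon_0$ is the contrast at $r_m$, $m = 1, 2, \ldots, M$, in the DoI.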

The scattered field at the receivers is then given by

$$\bar{E}^{sca}_p = \bar{\bar{G}}_S \cdot \bar{I}_p, \tag{3}$$

where $\bar{E}^{sca}_p$ denotes the scattered field for the $p$th incidence and $\bar{\bar{G}}_S$ is the 2-D free-space Green's function matrix that maps the contrast source in the DoI to the scattered field on the measurement domain.

Equations (2) and (3) are the two basic equations for ISPs, denoted as the object equation and the data equation, respectively. The aim of ISPs is to retrieve the contrast function $\chi$ in the DoI given the measurement data and the corresponding illumination fields. According to the measurement data, ISPs can be categorized into two types: those with a full set of data (FD) and those with only amplitude data, i.e., phaseless data (PD). FD-ISPs are solved with both the amplitude and phase information of the scattered field data, whereas for PD-ISPs we use intensity-only measurements of the total field for inversion.
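To make the roles of the object equation (2) and the data equation (3) concrete, the following is a minimal sketch that assumes a weak scatterer, so that a simple fixed-point (Born-series) iteration on (2) converges. It is not the MoM solver used in this paper; the function names and the toy matrices `G_D` and `G_S` are hypothetical placeholders chosen only for illustration.

```python
def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solve_contrast_source(chi, e_inc, G_D, n_iter=50):
    """Fixed-point iteration on (2): I <- chi * (E_inc + G_D I).

    chi holds the diagonal of the contrast matrix. The iteration only
    converges when ||chi G_D|| < 1, i.e. for weak scatterers (assumption).
    """
    current = [0j] * len(chi)
    for _ in range(n_iter):
        # total field in the DoI, eq. (1)
        e_tot = [ei + s for ei, s in zip(e_inc, matvec(G_D, current))]
        # contrast source, eq. (2)
        current = [c * e for c, e in zip(chi, e_tot)]
    return current


def scattered_field(G_S, current):
    """Data equation (3): E_sca = G_S I."""
    return matvec(G_S, current)


# Toy 3-cell domain and 2 receivers; all numbers are made up for illustration.
chi = [0.2, 0.0, 0.1]
e_inc = [1 + 0j, 0.5 + 0.5j, 1j]
G_D = [[0.10j, 0.05, 0.02], [0.05, 0.10j, 0.05], [0.02, 0.05, 0.10j]]
G_S = [[0.3, 0.2j, 0.1], [0.1, 0.2j, 0.3]]

I_p = solve_contrast_source(chi, e_inc, G_D)
E_sca = scattered_field(G_S, I_p)
```

For strong scatterers the fixed-point iteration may diverge, and a direct or Krylov solve of (2) would be needed instead.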

Herein, we use $F_p(r_q)$ to represent the squared amplitude (intensity) of the total field on $S$. For the $p$th transmitting antenna, it can be written as

$$\bar{F}_p = |\bar{E}^{rad}_p + \bar{E}^{sca}_p|^2 \tag{4}$$

where $\bar{E}^{rad}_p$ and $\bar{E}^{sca}_p$ represent, respectively, the radiated field (i.e., the incident electric field at the receiving antennas) and the scattered electric field on $S$ due to the $p$th transmitting antenna. The radiated field $\bar{E}^{rad}_p$ can be measured as full data only once before the experiments and saved for use in the inversion. Equation (4) is denoted as the intensity equation. Equations (2), (3) and (4) together constitute the governing equations for PD-ISPs.

The aim of PD-ISPs is to reconstruct the contrast function from the intensity of the total field $\bar{F}_p$ based on (2), (3) and (4). We define $\Psi(\cdot)$ as the forward modeling function, i.e., $\bar{E}^{sca}_p = \Psi(\chi)$. The cost function for PD-ISPs can then be written as

$$H(\chi) = \sum_{p=1}^{N_i} \left\| \bar{F}_p - |\bar{E}^{rad}_p + \Psi(\chi)|^2 \right\|^2 + \alpha R(\chi) \tag{5}$$

where $R(\chi)$ is a regularization term that incorporates prior information into the model to stabilize the nonlinear optimization, and $\alpha$ is a weighting parameter that balances the data-fitting term and the regularization term. As (5) shows, PD-ISPs are seriously nonlinear and ill-posed, which stems from the under-determined system of equations and the measurement noise. Consequently, such a problem has infinitely many solutions and the optimization is easily trapped in local minima. Several iterative optimization methods have been proposed to solve the nonlinear optimization for PD-ISPs, such as [25], [35]. These iterative inversion methods take a long time to reach a solution and depend strongly on the initialization.
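The intensity equation (4) and the data-fitting term of the cost function (5) can be sketched as follows. This is an illustrative sketch only: the scattered field is passed in directly instead of being computed by the forward model $\Psi(\chi)$, and the argument `reg` is a hypothetical stand-in for a precomputed value of $R(\chi)$, whose form is left open here.

```python
def intensity(e_rad, e_sca):
    """Intensity equation (4): F = |E_rad + E_sca|^2, element-wise."""
    return [abs(r + s) ** 2 for r, s in zip(e_rad, e_sca)]


def data_misfit(F_meas, e_rad, e_sca):
    """Data-fitting term of (5) for one incidence."""
    F_calc = intensity(e_rad, e_sca)
    return sum((fm - fc) ** 2 for fm, fc in zip(F_meas, F_calc))


def cost(F_meas_all, e_rad_all, e_sca_all, alpha=0.0, reg=0.0):
    """Cost (5): misfit summed over incidences plus alpha * R(chi).

    `reg` stands for a precomputed value of R(chi) (assumption).
    """
    fit = sum(data_misfit(F, r, s)
              for F, r, s in zip(F_meas_all, e_rad_all, e_sca_all))
    return fit + alpha * reg


# Toy data for one incidence and two receivers (made-up numbers).
e_rad = [1 + 0j, 0 + 1j]
e_sca = [0.5j, 0.5 + 0j]
F = intensity(e_rad, e_sca)

# The phase of the total field is discarded: conjugating the total field
# produces exactly the same phaseless data.
F_conj = intensity([r.conjugate() for r in e_rad],
                   [s.conjugate() for s in e_sca])
```

Because the misfit is quartic in the field, it is of higher order than the quadratic misfit of the full-data case, which is consistent with the stronger nonlinearity noted above.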

Recently, deep learning-based methods have been introduced to solve FD-ISPs [36]; they comprise two phases, off-line training and on-line testing. As demonstrated in [36], learning-based methods have clear advantages over conventional iterative optimization in terms of image quality and computational time. Although PD-ISPs are more susceptible to measurement noise and have higher nonlinearity than FD-ISPs, their solution plays a significant role in the physical, mathematical and engineering communities, because measuring only the amplitude of the field is more robust. In this paper, we propose, for the first time, a deep learning method for the inversion of PD-ISPs, with different schemes for the input of the neural network.

III. INVERSION OF DEEP LEARNING-BASED METHODS

The classic U-net CNN architecture, depicted in Fig. 2, is used to solve the PD-ISPs. U-net CNNs are widely used in image processing, recognition, and classification. As mentioned in [37], U-net CNNs are able to solve full-wave nonlinear FD-ISPs. In the following, the U-net architecture is introduced briefly.

The typical configuration of U-net is a U-shaped symmetric structure. The left side of the U-net is a contracting path. It consists of the repeated application of two 3 × 3 convolutions, each followed by batch normalization (BN) and a rectified linear unit (ReLU), and then a 2 × 2 max pooling operation; BN can effectively accelerate deep network training [44]. At each down-sampling step, the number of feature channels is doubled. The right side of the U-net is an expansive path. Every step in the expansive path consists of an up-sampling of the feature map followed by a 3 × 3 up-convolution that halves the number of feature channels, which is used to restore the image to its original size. The feature map obtained by each convolutional layer of the contracting path is concatenated to the corresponding up-sampling layer, so that the feature maps of every layer can be used effectively in subsequent calculations. The U-net architecture, which can predict the value of each pixel, was originally used for segmentation [45].
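The channel and spatial-size bookkeeping described above can be traced with a few lines of Python. This is only a sketch of the shape arithmetic, not a network implementation; it assumes padded convolutions, so that only pooling and up-sampling change the spatial size, and the function name `unet_shapes` and its default arguments are hypothetical.

```python
def unet_shapes(m=32, base_channels=64, depth=4):
    """Return (contracting, expansive) lists of (channels, height, width).

    Assumes padded convolutions, so that only pooling and up-sampling
    change the spatial size (the padding is not stated in the text).
    """
    contracting = []
    channels, size = base_channels, m
    for level in range(depth):
        contracting.append((channels, size, size))
        if level < depth - 1:
            channels *= 2   # feature channels double at each down-sampling
            size //= 2      # 2x2 max pooling halves each spatial dimension
    expansive = []
    for _ in range(depth - 1):
        channels //= 2      # 3x3 up-convolution halves the channels
        size *= 2           # up-sampling restores the spatial size
        expansive.append((channels, size, size))
    return contracting, expansive


down, up = unet_shapes()
```

With a 32 × 32 input and four levels, the contracting path ends at 512 channels on a 4 × 4 grid and the expansive path returns to 64 channels on the original 32 × 32 grid.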

A. Direct Inversion Scheme

In practice, acquiring intensity data is much easier than measuring both amplitude and phase. Different targets in the DoI produce different scattering responses, so the corresponding measured intensity data differ as well.

Fig. 2: U-net architecture for the PD-ISPs. [Figure legend: Conv 3×3 + BN + ReLU; max pooling 2×2; up-conv 3×3; copy and crop; conv 1×1. $N_{in}$: number of input channels; SD: spatial dimension. The numbers of channels range from 64 to 512, and the spatial dimension ranges from M×M at the input and output down to M/8 × M/8.]

Therefore, the simplest solution for PD-ISPs with U-net is to use the phaseless data directly as the input of the U-net CNN, whose output is the reconstructed image of the contrast function in the DoI. Different from [37], the input of the CNN is the intensity data of the total field without phase information. The numbers of input and output channels of the U-net are both set to 1, giving a single image-to-image mapping. Herein, the direct inversion scheme (DIS) serves as a baseline for comparison with the other schemes.

B. PD-DICs Scheme

Although the direct inversion scheme is easy to implement, it is a black-box method that simply maps the input data to the output variables, which makes it difficult for the U-net CNN to reach the globally optimal solution with a large number of unknowns but limited measurements. To make the U-net training easier and the end-to-end mapping for PD-ISPs more accurate, more physical information should be obtained in advance and supplied as the input, or as prior information, to the U-net CNN, so that a lighter load is left to the CNN. Consequently, inspired by the work in [37], instead of directly using the intensity data, an additional step is used to extract the dominant induced current, and the corresponding rough contrast function is taken as the input of the U-net CNN.

For the PD-DICs scheme, the first step is to calculate the dominant induced current, which contains most of the features of the unknown objects. PD-SOM has been proposed to solve ISPs with phaseless data. Similarly to [35], the dominant induced currents (a small part of the signal subspace) can be obtained. The procedure for PD-DICs is briefly introduced in the following.

Following the operation of SOM for FD-ISPs, we perform a singular value decomposition (SVD) of the operator $\bar{\bar{G}}_S$, as mentioned in [15]:

$$\bar{\bar{G}}_S = \bar{\bar{U}} \cdot \bar{\bar{S}} \cdot \bar{\bar{V}}^H \tag{6}$$

where $\bar{\bar{U}}$ is an $N_s \times N_s$ matrix, $\bar{\bar{S}}$ is an $N_s \times N$ matrix, $\bar{\bar{V}}$ is an $N \times N$ matrix and the superscript $H$ indicates the conjugate transpose. The induced current $\bar{I}_p$ is partitioned into two orthogonally complementary portions, namely the deterministic part of the induced currents (DPICs) $\bar{I}^d_p$ and the ambiguous part of the induced currents (APICs) $\bar{I}^a_p$:

$$\bar{I}_p = \bar{I}^d_p + \bar{I}^a_p \tag{7}$$

with $\bar{I}^d_p = \bar{\bar{V}}^s \cdot \bar{\alpha}^s_p$, where $\bar{\bar{V}}^s$ is the basis of the signal subspace, which comprises the first $L$ right singular vectors (columns of $\bar{\bar{V}}$), and $\bar{\alpha}^s_p$ is the vector of coefficients of the deterministic part. $L$ is the number of singular values in $\bar{\bar{S}}$ that are above a predefined noise-dependent threshold [15], [35]. In FD-SOM, the governing data equation (3) is linear, so the DPICs can be determined directly from the SVD. For PD-ISPs, however, the dominant contribution to the intensity of the total field comes from $\bar{E}^{rad}_p$ and the deterministic portion caused by $\bar{I}^d_p$. So, to obtain the DPICs, the objective function is formulated as a quadratic problem,

$$\bar{\alpha}^s_p = \arg\min_{\bar{\alpha}^s_p} \left\| \bar{F}_{m,p} - \bar{Y}_p \right\| \tag{8}$$

where $\bar{F}_{m,p}$ denotes the measured intensity (squared-amplitude) vector for the $p$th incidence and $\bar{Y}_p = |\bar{E}^{rad}_p + \bar{\bar{G}}_S \cdot \bar{\bar{V}}^s \bar{\alpha}^s_p|^2$. Consequently, we need to optimize (8) to get the coefficient vector $\bar{\alpha}^s_p$ and then obtain $\bar{I}^d_p$. Following [46], the Levenberg-Marquardt (LM) algorithm, which is a mixture of the Gauss-Newton algorithm and the method of gradient descent, is used to efficiently solve this nonlinear least-squares problem and obtain the optimal $\bar{\alpha}^s_p$.
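The LM iteration for (8) can be sketched in plain Python as below. This is a generic textbook LM loop written under stated assumptions, not the implementation of [46]: the matrix `G` stands for the precomputed product $\bar{\bar{G}}_S \cdot \bar{\bar{V}}^s$, the complex coefficients are split into real and imaginary parts, the damping factor is multiplied or divided by 10, and all names and toy numbers are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def residual_and_jacobian(x, G, e_rad, F_meas):
    """r = |E_rad + G alpha|^2 - F and its Jacobian w.r.t. [Re alpha, Im alpha]."""
    L = len(x) // 2
    alpha = [complex(x[l], x[L + l]) for l in range(L)]
    res, jac = [], []
    for g_row, e, f in zip(G, e_rad, F_meas):
        y = e + sum(g * a for g, a in zip(g_row, alpha))
        res.append(abs(y) ** 2 - f)
        d = [2 * (y.conjugate() * g) for g in g_row]   # d|y|^2 = 2 Re(y* dy)
        jac.append([v.real for v in d] + [-v.imag for v in d])
    return res, jac


def levenberg_marquardt(G, e_rad, F_meas, n_iter=50, mu=1e-2):
    """Minimise sum(r^2); returns the complex coefficients and the cost history."""
    n = 2 * len(G[0])
    x = [0.0] * n
    res, jac = residual_and_jacobian(x, G, e_rad, F_meas)
    cost = sum(r * r for r in res)
    history = [cost]
    for _ in range(n_iter):
        JtJ = [[sum(row[i] * row[j] for row in jac) + (mu if i == j else 0.0)
                for j in range(n)] for i in range(n)]
        Jtr = [-sum(row[i] * r for row, r in zip(jac, res)) for i in range(n)]
        step = solve_linear(JtJ, Jtr)
        x_new = [xi + si for xi, si in zip(x, step)]
        res_new, jac_new = residual_and_jacobian(x_new, G, e_rad, F_meas)
        cost_new = sum(r * r for r in res_new)
        if cost_new < cost:              # accept: behave more like Gauss-Newton
            x, res, jac, cost = x_new, res_new, jac_new, cost_new
            mu = max(mu / 10.0, 1e-12)
        else:                            # reject: behave more like gradient descent
            mu *= 10.0
        history.append(cost)
    L = n // 2
    return [complex(x[l], x[L + l]) for l in range(L)], history


# Toy problem: 5 receivers, L = 2 coefficients (made-up numbers).
G = [[1 + 0j, 0.5j], [0.5, 1 + 0j], [0.2j, 0.3], [0.4, -0.6j], [-0.3 + 0.2j, 0.7]]
e_rad = [2 + 0j, 2j, 1 + 1j, -2 + 0j, 1 - 1j]
alpha_true = [0.3 + 0.1j, -0.2 + 0.4j]
F_meas = [abs(e + sum(g * a for g, a in zip(row, alpha_true))) ** 2
          for row, e in zip(G, e_rad)]
alpha_est, history = levenberg_marquardt(G, e_rad, F_meas)
```

Only accepted steps change the iterate, so the recorded cost never increases; whether the iteration reaches the global minimum still depends on the starting point.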

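After that, we obtain the DPICs $\bar{I}^d_p$, and $\bar{E}^{tot}_p$ can be calculated according to (1). Then the $n$th element of the contrast function in the DoI can be obtained with

$$\chi_n = \left[ \sum_{p=1}^{N_i} \frac{(E^{tot}_{p,n})^*}{\|I^d_{p,n}\|} \cdot \frac{I^d_{p,n}}{\|I^d_{p,n}\|} \right] \Bigg/ \left[ \sum_{p=1}^{N_i} \frac{(E^{tot}_{p,n})^*}{\|I^d_{p,n}\|} \cdot \frac{E^{tot}_{p,n}}{\|I^d_{p,n}\|} \right] \tag{9}$$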

where $*$ indicates the complex conjugate. In PD-DICs, the APICs (i.e., $\bar{I}^a_p$) are ignored entirely. However, because only a small number of unknown variables (the coefficients of the dominant signal subspace, of size $L \times N_i$) need to be solved for in the optimization, the scheme converges fast and still retains most of the information for moderate scatterers. Finally, the preliminary image $\chi$ obtained by (9) is used as the input of the U-net CNN, and the output is the fitted contrast function of the DoI.
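The per-cell contrast estimate of (9) is a weighted least-squares fit of $I_{p,n} = \chi_n E^{tot}_{p,n}$ over the incidences. A minimal sketch follows; the normalization factors of (9) are exposed as an optional per-incidence weight, and using uniform weights by default is an assumption made here for simplicity. The function name and the toy values are hypothetical.

```python
def estimate_contrast(e_tot, currents, weights=None):
    """Least-squares contrast per cell from I_p = chi * E_tot_p.

    e_tot[p][n] and currents[p][n] are complex values for incidence p and
    cell n. weights[p] is a positive normalization per incidence
    (uniform when omitted, which is an assumption of this sketch).
    """
    n_inc, n_cells = len(e_tot), len(e_tot[0])
    if weights is None:
        weights = [1.0] * n_inc
    chi = []
    for n in range(n_cells):
        num = sum(w * e[n].conjugate() * i[n]
                  for w, e, i in zip(weights, e_tot, currents))
        den = sum(w * abs(e[n]) ** 2 for w, e in zip(weights, e_tot))
        chi.append(num / den)
    return chi


# Toy check with two incidences and two cells (made-up numbers).
chi_true = [0.5 + 0j, 0.2 + 0.1j]
e_tot = [[1 + 1j, 2 + 0j], [0.5j, 1 - 1j]]
currents = [[c * e for c, e in zip(chi_true, row)] for row in e_tot]
chi_est = estimate_contrast(e_tot, currents)
```

When the currents are exactly consistent with a single contrast, the estimate reproduces it for any positive weights; with truncated currents such as the DPICs, the result is only a rough image.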

C. PD-CSI Scheme

The PD-DICs scheme extracts some physical prior information before the CNN stage and passes it to the CNN. However, because the APICs are ignored, its ability to retrieve strong scatterers is significantly limited. Therefore, to improve the ability to solve highly nonlinear PD-ISPs, more high-frequency components should be included in the input images of the U-net. Thus, the induced current used to obtain the preliminary contrast function for the CNN input should be composed of both DPICs and APICs (or parts of the APICs).

Herein, PD-CSI with a Fourier-based expansion of the induced currents is proposed. The procedure of PD-CSI is similar to that of CSI. Different from [22], the Fourier-based expansion of the induced current is used, and one can control the number of Fourier coefficients to determine the components of the induced currents. The details of PD-CSI are as follows.

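With (4), we obtain the residual error of the intensity equation

$$\Delta^{intensity}_p = \left\| \bar{F}_{m,p} - |\bar{E}^{rad}_p + \bar{\bar{G}}_S \cdot \bar{I}_p|^2 \right\|^2 \tag{10}$$

and the residual error of the object equation is given by

$$\Delta^{current}_p = \left\| \bar{I}_p - \bar{\bar{\chi}} \cdot (\bar{E}^{inc}_p + \bar{\bar{G}}_D \cdot \bar{I}_p) \right\|^2 \tag{11}$$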

for PD-CSI. In this scheme, $\bar{I}_p$ is represented by Fourier coefficients, i.e., $\bar{I}_p = \bar{\bar{F}} \cdot \bar{\alpha}_p$, to control the portion of the induced current that is retained and to accelerate the calculation. $\bar{\bar{F}}$ is the Fourier-based expansion operator and $\bar{\alpha}_p$ is the vector of Fourier coefficients. For the 2-D case, the induced current can be expressed with a two-dimensional tensor of Fourier coefficients $\bar{\bar{\gamma}}_j$ as

$$\bar{I}_j = \mathrm{vec}\{\mathrm{IDFT}\{\bar{\bar{\gamma}}_j\}\}, \tag{12}$$

where $\bar{\bar{\gamma}}_j$ has non-zero elements corresponding to the low-frequency components and zero elements corresponding to the remaining high-frequency discrete Fourier bases, and $\mathrm{vec}\{\cdot\}$ is the vectorization operator. In the MATLAB convention, the tensor $\bar{\bar{\gamma}}_j$ has four nonzero blocks of size $M_F \times M_F$ in its four corners, so that $M_0 = 4 \times M_F^2$. Because $M_0 \ll M$, the number of unknown variables for the induced current is reduced dramatically.
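The corner-block layout of the retained Fourier coefficients and the expansion (12) can be illustrated as follows. This is a sketch under assumptions: the four blocks are taken as $M_F \times M_F$ on an $M_1 \times M_2$ coefficient grid, the inverse DFT is a naive implementation suitable only for tiny grids (an FFT would be used in practice), and the function names are hypothetical.

```python
import cmath


def low_freq_mask(m1, m2, mf):
    """True where a DFT coefficient is kept: MF x MF blocks in the four corners."""
    keep_rows = set(range(mf)) | set(range(m1 - mf, m1))
    keep_cols = set(range(mf)) | set(range(m2 - mf, m2))
    return [[(r in keep_rows) and (c in keep_cols) for c in range(m2)]
            for r in range(m1)]


def idft2(gamma):
    """Naive 2-D inverse DFT (O(N^4)); fine for a small illustration only."""
    m1, m2 = len(gamma), len(gamma[0])
    out = [[0j] * m2 for _ in range(m1)]
    for x in range(m1):
        for y in range(m2):
            acc = 0j
            for u in range(m1):
                for v in range(m2):
                    if gamma[u][v] != 0:
                        phase = 2j * cmath.pi * (u * x / m1 + v * y / m2)
                        acc += gamma[u][v] * cmath.exp(phase)
            out[x][y] = acc / (m1 * m2)
    return out


def vec(matrix):
    """Column-major vectorization, as in MATLAB's A(:)."""
    return [matrix[r][c] for c in range(len(matrix[0])) for r in range(len(matrix))]


# Number of retained coefficients for a 32 x 32 grid with MF = 7.
mask = low_freq_mask(32, 32, 7)
m0 = sum(sum(row) for row in mask)

# Tiny example of (12): a single DC coefficient gives a constant current.
gamma = [[0j] * 4 for _ in range(4)]
gamma[0][0] = 16 + 0j
current = vec(idft2(gamma))
```

For the 32 × 32 grid used later in the numerical tests, $M_F = 7$ gives $M_0 = 196$ retained coefficients out of $M = 1024$.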

We determine $\bar{\alpha}_p$ and $\bar{\bar{\chi}}$ by minimizing the objective function through alternating application of the conjugate gradient (CG) minimization method and the least-squares method to the respective unknowns.

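The main steps are as follows.

1) Construct the objective function

$$f(\bar{\alpha}_1, \bar{\alpha}_2, \ldots, \bar{\alpha}_{N_i}, \bar{\bar{\chi}}) = \sum_{p=1}^{N_i} \left( \frac{\Delta^{intensity}_p}{\|\bar{F}_{m,p}\|^2} + \frac{\Delta^{current}_p}{\|\bar{E}^{inc}_p\|^2} \right) \tag{13}$$

Initialization step, $t = 0$: $\bar{\bar{\chi}}_0 = 0$, $\bar{\alpha}_{p,0} = 0$, and the search direction $\bar{\rho}_{p,0} = 0$.

2) Update the iteration counter: $t = t + 1$.

3) Calculate the gradient of the objective function with respect to $\bar{\alpha}_p$:

$$\bar{g}_p = \nabla_{\bar{\alpha}_p} f = \frac{2 \bar{\bar{F}}^H \cdot [(\bar{\bar{G}}_S^H \cdot \bar{A}^*) \cdot (|\bar{A}|^2 - \bar{F}_{m,p})]}{\|\bar{F}_{m,p}\|^2} + \frac{2 [\bar{\bar{F}}^H \cdot (\bar{\bar{I}} - \bar{\bar{\chi}} \cdot \bar{\bar{G}}_D)^H] \cdot (\bar{\bar{F}} \cdot \bar{\alpha}_p - \bar{\bar{\chi}} \cdot \bar{B})}{\|\bar{E}^{inc}_p\|^2} \tag{14}$$

where $\bar{A} = \bar{E}^{rad}_p + \bar{\bar{G}}_S \cdot \bar{\bar{F}} \cdot \bar{\alpha}_p$, $\bar{B} = \bar{E}^{inc}_p + \bar{\bar{G}}_D \cdot \bar{\bar{F}} \cdot \bar{\alpha}_p$, and $\bar{\bar{I}}$ denotes the unit matrix. Substitute $\bar{\alpha}_{p,t-1}$ and $\bar{\bar{\chi}}_{t-1}$ into (14) to obtain the gradient value $\bar{g}_{p,t}$.

4) Determine the search direction with the Polak-Ribière CG directions,

$$\bar{\rho}_{p,t} = \bar{g}_{p,t} + \frac{\mathrm{Re}[(\bar{g}_{p,t} - \bar{g}_{p,t-1})^* \cdot \bar{g}_{p,t}]}{\|\bar{g}_{p,t-1}\|^2} \cdot \bar{\rho}_{p,t-1} \tag{15}$$

5) Determine the search step $\lambda_{p,t}$ by a line search with the update formula

$$\bar{\alpha}_{p,t} = \bar{\alpha}_{p,t-1} + \lambda_{p,t} \bar{\rho}_{p,t}. \tag{16}$$

Substitute (16) into the objective function (13) and calculate the derivative $\nabla_{\lambda_{p,t}} f$; then $\lambda_{p,t}$ is obtained by solving $\nabla_{\lambda_{p,t}} f = 0$, and $\bar{\alpha}_{p,t}$ follows from (16). After that, the induced current is updated with $\bar{I}_{p,t} = \bar{\bar{F}} \cdot \bar{\alpha}_{p,t}$ and the total field $\bar{E}^{tot}_{p,t}$ is obtained from (1).

6) The $n$th element of the contrast function is obtained by the least-squares method:

$$(\bar{\bar{\chi}}_t)_n = \left[ \sum_{p=1}^{N_i} \frac{(\bar{E}^{tot}_{p,t})^*_n}{\|\bar{E}^{inc}_p\|} \cdot \frac{(\bar{I}_{p,t})_n}{\|\bar{E}^{inc}_p\|} \right] \Bigg/ \left[ \sum_{p=1}^{N_i} \frac{(\bar{E}^{tot}_{p,t})^*_n}{\|\bar{E}^{inc}_p\|} \cdot \frac{(\bar{E}^{tot}_{p,t})_n}{\|\bar{E}^{inc}_p\|} \right] \tag{17}$$

7) If the termination condition is satisfied, stop the iteration. Otherwise, go back to step 2).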
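Steps 4) and 5) above can be sketched as two small helper functions. The sketch follows (15) and (16) as written, with the direction formed by adding the gradient, so that the sign of the step $\lambda_{p,t}$ is left to the line search. Falling back to the plain gradient when no previous gradient exists is an assumption of this sketch, and the function names are hypothetical.

```python
def polak_ribiere_direction(g_new, g_old, rho_old):
    """Search direction (15): rho_t = g_t + beta * rho_{t-1}."""
    denom = sum(abs(v) ** 2 for v in g_old)
    if denom == 0.0:
        # No previous gradient (first iteration): use the gradient itself.
        return list(g_new)
    beta = sum(((a - b).conjugate() * a).real
               for a, b in zip(g_new, g_old)) / denom
    return [a + beta * r for a, r in zip(g_new, rho_old)]


def update_coefficients(alpha_old, step, rho):
    """Update (16): alpha_t = alpha_{t-1} + lambda_t * rho_t."""
    return [a + step * r for a, r in zip(alpha_old, rho)]


# Toy values (made-up numbers) to trace one update.
g_old = [1 + 0j, 0j]
g_new = [1 + 0j, 2j]
rho_old = [1 + 0j, 0j]
rho_new = polak_ribiere_direction(g_new, g_old, rho_old)
alpha_new = update_coefficients([0j, 0j], -0.5, rho_new)
```

In this toy trace the Polak-Ribière factor equals 4, so the new direction is `[5, 2j]`, and a step of -0.5 moves the coefficients to `[-2.5, -1j]`.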

The PD-CSI inversion minimizes the objective function (13) by alternately updating the Fourier coefficients $\bar{\alpha}_p$ and the contrast $\bar{\bar{\chi}}$. Only a small number of Fourier coefficients, corresponding to the low-frequency components, are used in the inversion to obtain a rough image of the contrast function. The number of unknowns is greatly reduced and, by virtue of the Fourier bases, the fast Fourier transform (FFT) can be used directly in the calculation, so that the computational time and resources are reduced significantly. The rough image of the contrast $\chi$ is then used as the input of the CNN, and the CNN, with its powerful representational and learning capability, learns the high-frequency components during training. Through this PD-CSI preprocessing, the ability to solve highly nonlinear PD-ISPs is improved at the small cost of the computation performed before the network.

D. Computational Complexity

We compare the performance of the different schemes by analyzing the computational cost, the number of unknowns and the number of iterations. Since the DoI is discretized into $M$ pixels, the computational cost $C_1$ of iterative algorithms for solving (5) can be described as [21]

$$C_1 = O(N_{opt} N_i N_{for} M \log M) \tag{18}$$

where $M \log M$ is the computational cost of a matrix-vector multiplication with FFT in each iteration of the forward solver, $N_{for}$ is the number of iterations for solving the forward problem, and $N_{opt}$ denotes the number of iterations of the optimization procedure. In iterative methods, the key to solving ISPs with low computational complexity is to reduce the value of $N_{opt}$ [37].

DIS directly uses phaseless data as the input of the CNN, so its computational cost consists of the basic operations in the CNN, and the complexity is dominated by the convolutions. Specifically, if there are $Q_i$ input feature maps and $Q_o$ output feature maps, and the feature map size and the convolution kernel size are $M_1 \times M_1$ and $K_f \times K_f$ ($K_f = 3$ in this paper), respectively, the computational workload of the convolution layer is of the order of $O(M_1^2 K_f^2 Q_i Q_o)$ [37].

To compute the inputs of PD-DICs and PD-CSI, the computational cost is dominated by iteratively computing the product of the Green's function and the induced current (e.g., $\bar{\bar{G}}_D \cdot \bar{I}_p$). Their computational costs are therefore $O(N_{opt} N_i N_{for} M \log M)$ if FFT is applied in the matrix-vector multiplication of the iterative pre-calculation. However, the number of unknown variables for the network input of PD-DICs is reduced from $M \times N_i$ for the full induced current to $L \times N_i$ (i.e., the DPICs), where $L$ (e.g., 15) is much less than $M$. Likewise, the number of unknown variables of the induced currents for the network input of PD-CSI is reduced to $M_0 \times N_i$, where $M_0$ is usually smaller than 200. Owing to the reduced number of unknown variables, the number of iterations $N_{opt}$ and the computation time are drastically reduced compared with traditional iterative inversion methods. Herein, the typical numbers of iterations for PD-DICs and PD-CSI are about 20 and 50, which take approximately 0.9 s and 2 s, respectively. The numbers of input and output channels of the U-net for PD-DICs and PD-CSI are set to 2 and 1, respectively, so that fewer feature maps are used in the convolution layers. Consequently, the computational resources for the CNNs can be further reduced compared with the work in [37]. With GPU-based parallelization, the training procedure can be finished more efficiently.
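The orders of magnitude discussed in this subsection can be checked with simple arithmetic. The sketch below evaluates (18) and the convolution workload up to their unknown constant factors and counts the unknowns for the 32 × 32 grid and 16 incidences used in Section IV. Taking the logarithm to base 2 is an assumption, and the function names are hypothetical.

```python
import math


def iterative_cost(n_opt, n_i, n_for, m):
    """Operation-count estimate (18), up to a constant factor (log base 2 assumed)."""
    return n_opt * n_i * n_for * m * math.log2(m)


def conv_workload(m1, k_f, q_in, q_out):
    """Convolution-layer workload O(M1^2 Kf^2 Qi Qo), up to a constant factor."""
    return m1 ** 2 * k_f ** 2 * q_in * q_out


M, N_I = 32 * 32, 16
unknowns = {
    "full induced current": M * N_I,   # M x Ni
    "PD-DICs (L = 15)": 15 * N_I,      # L x Ni
    "PD-CSI (M0 = 196)": 196 * N_I,    # M0 x Ni
}
```

With these values the full induced current has 16384 unknowns, compared with 240 for PD-DICs and 3136 for PD-CSI.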

IV. NUMERICAL SIMULATIONS

This section presents numerical tests to evaluate the performance of the three CNN schemes proposed in Section III. In the training stage, the circular-cylinder data sets [37] are used for all the test examples. In all tests, reconstructions are conducted at a single frequency, 400 MHz, corresponding to a wavelength of λ = 0.75 m in the air background.

In the numerical tests, the DoI is a 2 × 2 m² square centered at the origin and discretized into 32 × 32 pixels. The homogeneous background is free space. There are 16 linearly polarized transmitters and 32 receivers equally spaced on a circle $S$ of radius 3 m centered at (0, 0) m; for each incidence, the 32 receivers measure the fields. The amplitudes of the total fields for the 16 incidences are generated numerically using MoM and recorded in a matrix $\bar{\bar{E}}^t_{AM} = |\bar{\bar{E}}^{rad} + \bar{\bar{E}}^{sca}|$ of size 32 × 16. Additive white Gaussian noise (AWGN) is added to the synthetic data. The level of AWGN is given by $\|\bar{\bar{n}}\|_F / \|\bar{\bar{E}}^t_{AM}\|_F$, where $\|\cdot\|_F$ represents the Frobenius norm of a matrix and $\bar{\bar{n}}$ represents the noise. The resultant noise-corrupted matrix $\bar{\bar{n}} + \bar{\bar{E}}^t_{AM}$ is used in place of the total field for inversion.

To assess the quality of the reconstruction results, the relative error of the reconstructed profile is defined as

$$Er_{tot} = \sqrt{\frac{1}{M} \sum_m \sum_n \left( \frac{\bar{\bar{\varepsilon}}^{inv}_{r;m,n} - \bar{\bar{\varepsilon}}^{tr}_{r;m,n}}{\bar{\bar{\varepsilon}}^{tr}_{r;m,n}} \right)^2}, \tag{19}$$

where $\bar{\bar{\varepsilon}}^{inv}_{r;m,n}$ is the reconstructed relative permittivity and $\bar{\bar{\varepsilon}}^{tr}_{r;m,n}$ is the true relative permittivity of the scatterers.
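The noise model and the error measure (19) can be sketched as follows. The sketch assumes real-valued Gaussian noise added to the amplitude data and scales it so that the Frobenius-norm ratio equals the requested level exactly; the function names and the `seed` argument are hypothetical.

```python
import math
import random


def frobenius(mat):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(abs(v) ** 2 for row in mat for v in row))


def add_awgn(amplitude, level, seed=0):
    """Add white Gaussian noise scaled so that ||n||_F / ||E||_F == level.

    Real-valued noise on amplitude data is an assumption of this sketch.
    """
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in amplitude]
    scale = level * frobenius(amplitude) / frobenius(noise)
    return [[a + scale * n for a, n in zip(ra, rn)]
            for ra, rn in zip(amplitude, noise)]


def relative_error(eps_inv, eps_true):
    """Relative error (19) between reconstructed and true permittivity maps."""
    m = sum(len(row) for row in eps_true)
    total = sum(((ei - et) / et) ** 2
                for ri, rt in zip(eps_inv, eps_true)
                for ei, et in zip(ri, rt))
    return math.sqrt(total / m)
```

For example, `add_awgn(E, 0.1)` returns data with a 10% noise level in the sense defined above, and `relative_error` returns zero only for a perfect reconstruction.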

A. Preparation for training

The training process is the key step for a neural network. During training, the neural network establishes a relationship between the input and output of the system. The quality of this relationship determines the representation ability of the neural network in the output space and the accuracy with which the nonlinear system function is approximated. To implement PD-DICs and PD-CSI, two empirical parameters, $L$ and $M_0$, must be determined first. According to the spectrum analysis with SVD, $L$ is decided by the noise level, as mentioned in [15], and $M_0$ determines how many low-frequency Fourier components are used to construct the induced currents in PD-CSI. Here $L = 20$ and $M_0 = 196$ are selected for the numerical tests.

As in [37], randomly generated circular cylinders (referred to as “cylinders” in the following) are used as the scatterers in the domain $D$ to produce the synthetic phaseless field data for training. The radius, number, location and relative permittivity of the cylinders are chosen at random. Note that the cylinders may overlap, but all cylinders must be located inside the DoI. Given the size of the DoI, the radii of the randomly generated cylinders are set between 0.15 m and 0.6 m, their number is between 1 and 3, and their relative permittivity is between 1.1 and 1.5. According to these rules, we numerically generate 5000 true images of randomly distributed cylindrical scatterers. By solving the forward problem numerically with MoM, the corresponding phaseless field data sets (i.e., $\bar{\bar{E}}^t_{AM}$) for the 5000 profiles are obtained. Among the 5000 samples, 4000 image pairs are randomly selected as the training set, and the other 1000 image pairs are used for testing. It is stressed that the same training set (cylinders only) is used in all the numerical simulations. A workstation with an Intel Core i7-8700K CPU, 32 GB of RAM, and a GeForce GTX 2080Ti GPU is used for the numerical simulations.
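The generation rules for the training profiles can be sketched as follows. The parameter ranges are those stated above; how overlapping cylinders combine is not specified, so the sketch lets a later cylinder overwrite an earlier one, which is an assumption. The function name, the use of Python's `random` module and the small sample count are for illustration only.

```python
import random


def random_cylinder_profile(n_pix=32, side=2.0, seed=None):
    """Relative-permittivity map with 1 to 3 random circular cylinders.

    Ranges follow the text: radius 0.15-0.6 m, permittivity 1.1-1.5.
    A later cylinder overwrites an earlier one where they overlap
    (assumption; the combination rule is not stated).
    """
    rng = random.Random(seed)
    half = side / 2.0
    cell = side / n_pix
    eps = [[1.0] * n_pix for _ in range(n_pix)]
    for _ in range(rng.randint(1, 3)):
        radius = rng.uniform(0.15, 0.6)
        # keep the whole cylinder inside the DoI
        cx = rng.uniform(-half + radius, half - radius)
        cy = rng.uniform(-half + radius, half - radius)
        eps_r = rng.uniform(1.1, 1.5)
        for i in range(n_pix):
            y = -half + (i + 0.5) * cell
            for j in range(n_pix):
                x = -half + (j + 0.5) * cell
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    eps[i][j] = eps_r
    return eps


# Ten profiles here instead of 5000, with the same 80/20 split as in the text.
profiles = [random_cylinder_profile(seed=k) for k in range(10)]
split = int(0.8 * len(profiles))
train, held_out = profiles[:split], profiles[split:]
```

Each profile would then be passed through the MoM forward solver to produce the corresponding phaseless data, which is not shown here.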

[Figure: training loss (0 to 2) versus epoch (0 to 200) for DIS, PD-DICs, and PD-CSI.]

Fig. 3: The MSE of the three schemes in the training.

In all the learning-based inversions, PyTorch, a deep learning framework in Python, is used to implement the proposed CNN schemes. GPUs are used to accelerate the training. Because PyTorch can conveniently train the U-nets on GPUs, the computational time spent on the training and test processes is significantly reduced.

The mean-square error (MSE) is used as the loss function for training the U-net CNNs. It can be expressed as

$$\mathrm{MSE} = \frac{1}{T_n}\sum_{i=1}^{T_n}\left(x_i - y_i\right)^2 \tag{20}$$

where $T_n$ is the total number of outputs, and $x_i$ and $y_i$ are the outputs of the last layer and the targets of the CNN schemes, respectively. The hyperparameters for training are as follows: the learning rate is set to 0.01, and the maximum number of epochs is set to 200.
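As a small numerical check of Eq. (20), the loss can be evaluated on a toy example. The function name `mse_loss` and the 2 × 2 arrays below are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def mse_loss(outputs, targets):
    """Eq. (20): mean of the squared differences over all T_n output values."""
    x = np.asarray(outputs, dtype=float).ravel()
    y = np.asarray(targets, dtype=float).ravel()
    return np.sum((x - y) ** 2) / x.size

# Toy 2 x 2 "image": network output versus target relative permittivity.
predicted = np.array([[1.0, 1.2], [1.4, 1.0]])
target = np.array([[1.0, 1.3], [1.5, 1.0]])
loss = mse_loss(predicted, target)   # (0 + 0.01 + 0.01 + 0) / 4, i.e. about 0.005
```

In PyTorch, `torch.nn.MSELoss()` with its default `reduction='mean'` computes the same average over all elements.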

TABLE I: Computational time for the training and the loss with different numbers of hidden layers

| Number of layers | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| Loss | 0.06 | 0.030 | 0.024 | 0.024 |
| Training time (seconds) | 439 | 548 | 669 | 1132 |

It is well known that DNNs with more hidden layers can achieve better nonlinear mapping capability and better prediction performance. However, more hidden layers also cost more computational time and resources, so there is a trade-off between the number of hidden layers and the imaging performance. TABLE I shows the loss and the training time for different numbers of layers in the neural network. As listed in TABLE I, the loss decreases as the number of hidden layers increases up to 4, but it no longer decreases noticeably when the number of layers exceeds 4; four layers is therefore the saturation point of the loss. In addition, the training time grows only moderately for 2, 3, and 4 layers and stays below 670 seconds, whereas it increases drastically when the number of layers is larger than 4. Considering both the computational time and the accuracy, the 4-layer U-net CNN is used for the inversion.
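The trade-off can be made explicit by differencing the entries of TABLE I. The short script below only restates the tabulated numbers; the variable names are illustrative.

```python
# Values copied from TABLE I.
layers = [2, 3, 4, 5]
loss = [0.06, 0.030, 0.024, 0.024]
time_s = [439, 548, 669, 1132]

steps = []
for k in range(1, len(layers)):
    loss_drop = loss[k - 1] - loss[k]          # gain in accuracy from one more layer
    extra_time = time_s[k] - time_s[k - 1]     # additional training time in seconds
    steps.append((loss_drop, extra_time))
    print(f"{layers[k - 1]} -> {layers[k]} layers: "
          f"loss drops by {loss_drop:.3f}, training takes {extra_time} s longer")
```

Going from 4 to 5 layers adds 463 s of training time with no reduction of the loss, which is the saturation behaviour described above.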

In addition, the number of training samples is significant for the performance of the neural network. Therefore, the learning capability of the proposed methods with different numbers of samples is discussed. For consistency, the ratio between the sizes of the training set and the testing set is fixed at 4:1. TABLE II lists the average relative errors on a testing set of 100 samples from the MNIST data (relative permittivity 1.5) and the training time with 4 hidden layers when the number of samples is 1000, 3000, and 5000, respectively. As can be seen from TABLE II, although the training time is shortest when the training set has 800 samples (1000 × 0.8 = 800), the error is much larger than in the other cases. As the size increases, the error gets smaller and the training time increases almost proportionally. There is no large improvement in accuracy when the number of samples increases from 3000 to 5000, so the numerical simulations indicate that 3000 samples is an approximate threshold in terms of the accuracy of the inversion. However, since the complexity of the problems to be solved is not known in advance, the set of 5000 samples is chosen to ensure enough samples for the training to converge. Consequently, the size of the training set used in this paper is 4000 (5000 × 0.8 = 4000), and the 4-layer U-net is trained for 200 epochs. The loss curves of the three schemes during training are depicted in Fig. 3. All three schemes converge well after 200 epochs, and the best convergence is achieved by the PD-CSI scheme.

TABLE II: Investigation for different numbers of samples with four hidden layers.

| Number of samples | 1000 | 3000 | 5000 |
|---|---|---|---|
| Average calculation error | 0.0820 | 0.0657 | 0.0648 |
| Training time (seconds) | 130 | 400 | 630 |
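The statement that the training time grows almost proportionally with the size of the training set can be checked directly from TABLE II. The sketch below applies the 4:1 split to each sample count and divides the tabulated time by the resulting number of training samples; the variable names are illustrative.

```python
# Values copied from TABLE II.
samples = [1000, 3000, 5000]
train_time_s = [130, 400, 630]

per_sample = {}
for n, t in zip(samples, train_time_s):
    n_train = n * 4 // 5                 # 4:1 split between training and testing sets
    per_sample[n] = t / n_train          # seconds per training sample over 200 epochs
    print(f"{n} samples -> {n_train} for training, {per_sample[n]:.4f} s per training sample")
```

The three ratios lie between roughly 0.157 and 0.167 seconds per training sample, so the cost per sample is nearly constant across the three set sizes.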

B. Tests With Cylinders

In Example 1, the relative permittivity is set between 1.1 and 1.5, and 5% AWGN is present in the phaseless total field for the test set. The training data set is noiseless. Reconstructed relative permittivity profiles of five typical tests with the three CNN schemes are presented in Fig. 4. It can be seen that, although DIS can retrieve some simple profiles, the reconstructed profiles with more than one cylinder are distorted and contain unexpected artifacts, especially in the overlapping cases. In comparison, the results reconstructed by PD-DICs and PD-CSI are much better. To better compare the imaging quality of DIS, PD-DICs, and PD-CSI in the reconstruction of cylinders, the reconstruction errors are reported in TABLE III. According to TABLE III, the reconstruction errors of PD-DICs and PD-CSI are also lower than those of DIS in nearly all tests.

[Figure: 5 × 4 grid of panels; rows test(1) to test(5), columns True, DIS, PD-DICs, and PD-CSI; axes x (m) and y (m).]

Fig. 4: Example 1: reconstructed relative permittivity profiles for DIS, PD-DICs, and PD-CSI, where the relative permittivity is between 1.1 and 1.5 with 5% Gaussian noise presented in the phaseless total field. The first column shows the true images for five representative tests.

TABLE III: Relative errors for the reconstructions in Fig. 4

| Target | test(1) | test(2) | test(3) | test(4) | test(5) |
|---|---|---|---|---|---|
| DIS | 0.0377 | 0.0904 | 0.0509 | 0.0764 | 0.0714 |
| PD-DICs | 0.0397 | 0.0339 | 0.0297 | 0.0510 | 0.0409 |
| PD-CSI | 0.0346 | 0.0347 | 0.0265 | 0.0510 | 0.0632 |

TABLE III also shows that, for weak scatterers (relative permittivity between 1 and 1.5) or simple cases, the reconstruction errors of PD-DICs and PD-CSI are at the same level.

[Figure: 5 × 4 grid of panels; rows test(1) to test(5), columns True, DIS, PD-DICs, and PD-CSI; axes x (m) and y (m).]

Fig. 5: Example 2: reconstructed relative permittivity profiles for DIS, PD-DICs, and PD-CSI, where the relative permittivity is 1.5 with 5% Gaussian noise presented in the phaseless total field.

C. Tests With MNIST Database

To validate the generalization capability of the proposed CNNs, the MNIST dataset is used to test the networks trained on the cylinder training set. The MNIST dataset contains 70000 images of handwritten digits, each with a size of 28 × 28 pixels. It is a common benchmark in machine learning, consisting of the ten handwritten digits from 0 to 9 written by 250 different people. The MNIST dataset can be downloaded at http://yann.lecun.com/exdb/mnist/.

In Example 2, the relative permittivity of the MNIST profiles is set to 1.5, with 5% AWGN added to the phaseless total field for the test. The aim is to reconstruct the profile rather than to recognize the digits. Reconstructed profiles of some representative examples are presented in Fig. 5, and the corresponding reconstruction errors are shown in TABLE IV.
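A minimal sketch of how such a test case could be prepared is given below. Both steps rest on assumptions that are not specified in this section: the digit is turned into a two-valued profile by thresholding the grey level at 0.5, and the 5% noise is modelled as zero-mean Gaussian noise whose standard deviation is 5% of the RMS value of the amplitude data. The helper names, the synthetic stroke standing in for an MNIST digit, and the 16 × 32 data array are hypothetical.

```python
import numpy as np

def digit_to_permittivity(digit, eps_r=1.5, threshold=0.5):
    """Map a grey-scale digit image in [0, 1] to a two-valued permittivity profile."""
    # Assumption: pixels brighter than `threshold` belong to the scatterer.
    return np.where(np.asarray(digit) > threshold, eps_r, 1.0)

def add_awgn(amplitude, level=0.05, rng=None):
    """Add zero-mean Gaussian noise with standard deviation `level` times the data RMS."""
    rng = np.random.default_rng() if rng is None else rng
    amplitude = np.asarray(amplitude, dtype=float)
    sigma = level * np.sqrt(np.mean(amplitude ** 2))
    return amplitude + sigma * rng.standard_normal(amplitude.shape)

# Stand-in for a 28 x 28 MNIST digit: a vertical stroke, similar to a "1".
digit = np.zeros((28, 28))
digit[4:24, 12:16] = 1.0
profile = digit_to_permittivity(digit)

# Stand-in for noiseless phaseless data (hypothetical 16 incidences x 32 receivers).
rng = np.random.default_rng(0)
clean = 1.0 + 0.1 * rng.random((16, 32))
noisy = add_awgn(clean, level=0.05, rng=rng)
```

In an actual test, `clean` would be replaced by the amplitude of the total field computed by the forward solver for `profile`.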

TABLE IV: Relative errors for the reconstructions of MNIST in Fig. 5

| Target | test(1) | test(2) | test(3) | test(4) | test(5) |
|---|---|---|---|---|---|
| DIS | 0.1013 | 0.1414 | 0.1231 | 0.1063 | 0.1221 |
| PD-DICs | 0.0777 | 0.0961 | 0.0686 | 0.0746 | 0.0710 |
| PD-CSI | 0.0541 | 0.0851 | 0.0612 | 0.0655 | 0.0599 |

It is found that, although the results are somewhat degraded compared with those obtained in Example 1, PD-DICs and PD-CSI can still reconstruct the profiles successfully. The profiles retrieved by PD-CSI are more accurate than those of PD-DICs, and the digits can be distinguished better. TABLE IV also shows that the errors of PD-CSI are smaller than those of PD-DICs. However, the profiles reconstructed by DIS are very blurred, and the digits can hardly be recognized at all, for example in test(4) and test(5). This confirms that the generalization capability of DIS is weak and that it is difficult for DIS to reconstruct scatterers whose shapes differ from those of the profiles in the training set.

[Figure: 3 × 5 grid of panels; rows True, PD-DICs, and PD-CSI, columns test(1) to test(5); axes x (m) and y (m).]

Fig. 6: Reconstructed relative permittivity profiles for PD-DICs and PD-CSI, where the relative permittivity is 1.8 with 5% Gaussian noise presented in the phaseless total field.

TABLE V: Relative errors for the reconstructions of MNIST in Fig. 6

| Target | test(1) | test(2) | test(3) | test(4) | test(5) |
|---|---|---|---|---|---|
| PD-DICs | 0.1495 | 0.1684 | 0.1462 | 0.1616 | 0.1420 |
| PD-CSI | 0.1135 | 0.1497 | 0.1210 | 0.1466 | 0.1075 |

To further distinguish between PD-DICs and PD-CSI, we test the MNIST data with the relative permittivity of the profiles raised to 1.8. In this case, the nonlinearity of the ISPs increases owing to the stronger multiple scattering effects. Fig. 6 presents the reconstructed relative permittivity profiles of five representative tests for both PD-DICs and PD-CSI. It can be seen clearly that the profiles obtained by PD-DICs are heavily deformed and cannot be recognized at all. In contrast, the profiles reconstructed by PD-CSI can be recognized well, although the retrieved relative permittivity is about 1.5, which is a little lower than the exact value. TABLE V also shows that the errors of PD-CSI are consistently lower than those of PD-DICs. PD-CSI therefore gives much better reconstruction performance than PD-DICs when solving ISPs with higher nonlinearity (higher contrast or electrically larger size).

[Figure: one panel (a) and two panels (b); axes x (m) and y (m).]

Fig. 7: “Austria” profile with relative permittivity 1.5: image reconstruction of “Austria” profile. (a) Ground truth of “Austria” profile, reconstructed profiles with (b) PD-DICs (left) and PD-CSI (right) with 5% Gaussian noise.

D. Tests With “Austria” Profile

To further compare the performance of PD-DICs and PD-CSI, a benchmark example (the “Austria” profile, denoted as Example 3) is used to validate the trained CNNs. The exact “Austria” profile is depicted in Fig. 7(a).

The “Austria” profile consists of two disks and one ring. Both disks have a radius of 0.2 m and are centered at (0.3, 0.6) m and (-0.3, 0.6) m. The ring has an inner radius of 0.3 m and an outer radius of 0.6 m. The relative permittivities of the background and the scatterers are 1 and 1.5, respectively.
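The geometry described above can be rasterized with a few lines of NumPy. This is a sketch under stated assumptions: the 64 × 64 grid and the function name `austria_profile` are illustrative, and the ring centre, which is not given in the text, is set to (0, -0.2) m following the commonly used definition of this benchmark.

```python
import numpy as np

def austria_profile(n_pix=64, half_width=1.0, eps_r=1.5, ring_centre=(0.0, -0.2)):
    """Rasterize the "Austria" profile on a square DoI spanning [-half_width, half_width]."""
    step = 2.0 * half_width / n_pix
    centres = -half_width + step * (np.arange(n_pix) + 0.5)
    x, y = np.meshgrid(centres, centres)     # x varies along columns, y along rows

    profile = np.ones((n_pix, n_pix))        # background relative permittivity 1
    # Two disks of radius 0.2 m centred at (0.3, 0.6) m and (-0.3, 0.6) m.
    for cx, cy in [(0.3, 0.6), (-0.3, 0.6)]:
        profile[(x - cx) ** 2 + (y - cy) ** 2 <= 0.2 ** 2] = eps_r
    # Ring with inner radius 0.3 m and outer radius 0.6 m.
    # Assumption: ring centre (0, -0.2) m, which the text does not state.
    r2 = (x - ring_centre[0]) ** 2 + (y - ring_centre[1]) ** 2
    profile[(r2 >= 0.3 ** 2) & (r2 <= 0.6 ** 2)] = eps_r
    return profile

austria = austria_profile()
austria_18 = austria_profile(eps_r=1.8)      # higher-contrast variant
```

With this ring centre, the disks and the ring do not overlap and the whole profile fits inside a 2 × 2 m DoI.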

The “Austria” profile is a challenging and well-known benchmark for inversion in ISPs [37]. Since the generalization capability of DIS was shown to be poor in the previous example, DIS is not tested here. Fig. 7(b) presents the reconstructed profiles for PD-DICs (left) and PD-CSI (right).

[Figure: one panel (a) and two panels (b); axes x (m) and y (m).]

Fig. 8: “Austria” profile with relative permittivity 1.8: image reconstruction of “Austria” profile. (a) Ground truth of “Austria” profile, reconstructed profiles with (b) PD-DICs (left) and PD-CSI (right) with 5% Gaussian noise.

Although the images from both PD-DICs and PD-CSI can be recognized, the disks and the center of the ring reconstructed by PD-DICs are seriously contaminated. In comparison, PD-CSI obtains a better profile, with a more accurate relative permittivity and better-preserved edges, as shown in Fig. 7(b). The reconstruction errors for PD-DICs and PD-CSI are 0.1034 and 0.0829, respectively. Consequently, PD-CSI performs better than PD-DICs when dealing with strong scatterers.

Similarly, we increase the relative permittivity of “Austria” to 1.8, which makes the nonlinearity of the ISP much higher. The results retrieved by PD-DICs and PD-CSI are shown in Fig. 8(b), left and right, respectively. The profile reconstructed by PD-DICs is seriously distorted, and the two disks can no longer be seen, whereas PD-CSI still reconstructs the shape of the “Austria” profile successfully. Because the maximum relative permittivity in the training is 1.5, the reconstructed relative permittivity is a little lower than 1.8. The reconstruction errors for PD-DICs and PD-CSI are 0.1946 and 0.1635, respectively. This example shows again that PD-CSI is superior to PD-DICs for more strongly nonlinear ISPs.

To further validate the robustness under less than ideal conditions, we analyze the case in which the size of the DoI in the tests differs from the setting used in the training. In the training, the DoI is a 2 × 2 m² square, whereas in the test the side length of the square is 2.2 m. Note that the larger the electrical size of the DoI, the more difficult the inversion. The reconstruction results obtained with the trained CNNs are depicted in Fig. 10. It can be seen clearly that, although there are some artifacts, the reconstructed profiles are still satisfactory for the “Austria” example with a relative permittivity of 1.6. When the relative permittivity is increased to 1.8, the retrieved profiles and relative permittivities become blurry for both PD-CSI and PD-DICs, but the basic shapes can still be observed.

[Figure: two panels each for (a), (b), and (c); axes x (m) and y (m).]

Fig. 10: Imaging results for “Austria” profiles with a DoI of 2.2 × 2.2 m² (different from the 2 × 2 m² setting in the training). (a) Ground truth, and the reconstruction results by (b) PD-CSI and (c) PD-DICs.

E. Tests With Lossy Scatterers

All the examples above concern the inversion of lossless dielectric profiles. To further validate the versatility of the proposed methods, we reconstruct lossy unknown scatterers with the proposed CNN schemes. The training setup is the same as in the lossless cases, except that complex permittivities are used. The real and imaginary parts of the relative permittivities are in the ranges of 1.5 to 2 and 0 to 1, respectively, following the examples in [47]. Note that here we choose L = 10 and M0 = 196 for PD-DICs and PD-CSI, respectively. The number of output channels of the CNNs is changed to 2, corresponding to the real part and the imaginary part of the images. Fig. 9 presents three representative true profiles and the corresponding profiles reconstructed by both PD-DICs and PD-CSI. Both CNN schemes achieve satisfactory results for lossy scatterers. In the complicated cases with more than two cylinders, PD-CSI performs much better than PD-DICs. TABLE VI lists the errors of the two methods, which further illustrates numerically that PD-CSI achieves better performance than PD-DICs.
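The two-channel representation mentioned above can be sketched as follows. The channel order (real part first, imaginary part second), the helper names, and the 4 × 4 toy profile are assumptions made for illustration only.

```python
import numpy as np

def to_two_channels(eps_complex):
    """Stack the real and imaginary parts of a complex profile as a (2, H, W) array."""
    eps_complex = np.asarray(eps_complex)
    return np.stack([eps_complex.real, eps_complex.imag], axis=0)

def from_two_channels(channels):
    """Recombine a (2, H, W) array into a complex (H, W) profile."""
    return channels[0] + 1j * channels[1]

# Toy 4 x 4 lossy profile: background 1 + 0j with one block of 1.8 + 0.6j.
eps = np.ones((4, 4), dtype=complex)
eps[1:3, 1:3] = 1.8 + 0.6j

target = to_two_channels(eps)            # training target with two channels
recovered = from_two_channels(target)    # complex profile recovered from the channels
```

Treating the two parts as separate real-valued channels lets the same real-valued U-net and MSE loss be reused without complex arithmetic.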

[Figure: 3 × 6 grid of panels; rows test(1) to test(3), columns True(R), True(I), PD-DICs(R), PD-DICs(I), PD-CSI(R), and PD-CSI(I); axes x (m) and y (m).]

Fig. 9: Lossy scatterers tests: reconstructed relative complex permittivity profiles from scattered fields with 5% noise for PD-DICs and PD-CSI, where the real part and imaginary part of relative permittivity are in the range of 1.5-2 and 0-1, respectively. It is noted that R and I in the figure denote real and imaginary parts, respectively.

TABLE VI: Relative errors for the reconstructions of lossy scatterers in Fig. 9

| Target | test(1) | test(2) | test(3) |
|---|---|---|---|
| PD-DICs | 0.0967 | 0.1831 | 0.1707 |
| PD-CSI | 0.0781 | 0.0730 | 0.1303 |

[Figure: two cylinders, labeled plastic and foam, with dimension labels 80 mm and 31 mm.]

Fig. 11: The probed object consists of a composition of cylindrical plastic (left) and foam (right) objects.

V. TESTING EXPERIMENTAL DATA

Finally, the “FoamDielExt” experimental data from 2 to 10 GHz in the TM case, provided by the Institut Fresnel, Marseille, France, are used for testing. Detailed information on the configuration of the experimental measurement setup is given in [48]. Fig. 11 shows the exact profile of “FoamDielExt” within a DoI of 20 cm × 20 cm. It consists of two cylinders: the red cylinder is a dielectric (plastic) with a relative permittivity of 3 ± 0.3 and a diameter of 80 mm, and the blue cylinder is a dielectric (foam) with a relative permittivity of 1.45 ± 0.15 and a diameter of 31 mm. The ± sign denotes the uncertainty of the relative permittivity in the experiment. Since the scatterers are lossless, the null imaginary part of the profile is not shown. In the inversion, a 32 × 32 grid mesh of the DoI is used. The data are calibrated with the method outlined in [37]. As mentioned in [37], L is chosen as 10 here, which is smaller than the value used in the previous tests, and M0 is set to 100.

In the experimental test, we use a new training data set, similar to the one used in Example 1, to train the U-net CNNs. Unlike in Example 1, the relative permittivities of the cylinders are set between 1.5 and 3.3. The CNNs are again trained on noiseless synthetic data, so that a large amount of experimental data is not required, and the networks are tested on the measured experimental data. Fig. 12 presents the relative permittivity profiles reconstructed by PD-DICs and PD-CSI at 6 and 8 GHz. At 6 GHz, both PD-DICs and PD-CSI successfully retrieve the scatterers in the DoI, although there are some artifacts between the two cylinders. The shapes and the retrieved relative permittivities are satisfactory, and the profile reconstructed by PD-CSI is a little better than that of PD-DICs. The relative errors are 0.1604 and 0.1513 for PD-DICs and PD-CSI, respectively.

When the frequency increases to 8 GHz, the nonlinearity of the ISP also increases. As shown in Fig. 12(c), although the positions and the shapes of the scatterers are accurately detected, the reconstructed relative permittivity of the foam differs greatly from the exact value, especially for PD-DICs. The relative errors at 8 GHz are 0.3351 and 0.3265 for PD-DICs and PD-CSI, respectively. The errors at 8 GHz are thus much larger than those at 6 GHz because of the higher nonlinearity of the ISP. At both 6 GHz and 8 GHz, the relative errors of PD-CSI are smaller than those of PD-DICs. The experimental inversion results therefore further support our conclusion that PD-CSI performs better than PD-DICs when dealing with highly nonlinear ISPs.
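The link between frequency and nonlinearity can be made concrete by expressing the experimental geometry in wavelengths. The sketch below uses only the nominal dimensions and permittivities quoted above and the standard relation that the wavelength inside a lossless dielectric is the free-space wavelength divided by the square root of the relative permittivity; the function and variable names are illustrative.

```python
C0 = 299_792_458.0          # speed of light in vacuum, m/s
DOI_SIDE = 0.20             # side of the square DoI, m
N_CELLS = 32                # cells per side of the inversion mesh
# Nominal values quoted in the text: (diameter in m, relative permittivity).
OBJECTS = {"plastic": (0.080, 3.0), "foam": (0.031, 1.45)}

def electrical_sizes(freq_ghz):
    """Return sizes in wavelengths at the given frequency (lossless media assumed)."""
    lam0 = C0 / (freq_ghz * 1e9)                  # free-space wavelength, m
    sizes = {
        "doi": DOI_SIDE / lam0,                   # DoI side in free-space wavelengths
        "cell": DOI_SIDE / N_CELLS / lam0,        # mesh cell in free-space wavelengths
    }
    for name, (diameter, eps_r) in OBJECTS.items():
        lam_in = lam0 / eps_r ** 0.5              # wavelength inside the dielectric
        sizes[name] = diameter / lam_in           # diameter in internal wavelengths
    return sizes

results = {f: electrical_sizes(f) for f in (6.0, 8.0)}
for f, s in results.items():
    print(f"{f:.0f} GHz:", {k: round(v, 2) for k, v in s.items()})
```

The side of the DoI corresponds to about 4.0 free-space wavelengths at 6 GHz and about 5.3 at 8 GHz, and the plastic cylinder grows from roughly 2.8 to 3.7 internal wavelengths across, which is consistent with the stronger nonlinearity observed at 8 GHz.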

[Figure: one panel (a), two panels (b), and two panels (c); axes x (m) and y (m).]

Fig. 12: Experimental data tests: image reconstruction of “FoamDielExt”. (a) True profile of “FoamDielExt”. (b) and (c) Reconstruction results at 6 GHz and 8 GHz, respectively, by PD-DICs and PD-CSI.

VI. CONCLUSION

The lack of phase information in the measured data makes it more challenging to solve ISPs. In this work, we proposed three inversion schemes based on U-net CNNs to solve phaseless ISPs: DIS, PD-DICs, and PD-CSI. DIS directly uses the phaseless data as the input of the neural network, whereas the latter two schemes first obtain a preliminary image within a small signal subspace using a phaseless inversion algorithm and then use this image as the input of the neural network. It is validated that the computation time of all the CNN schemes in reconstructing the profiles is much less than that of traditional iterative reconstruction. In the numerical tests, it is found that DIS can only reconstruct some simple profiles, and its generalization ability is much worse than that of the two other methods.

DIS acts like a “black-box” optimization method that works directly on the measured field data; it leaves much already known physical information for the neural network to learn, which burdens the network. PD-DICs and PD-CSI first process the measured phaseless data with a phaseless inversion algorithm, and the preliminary retrieved images are then taken as the input of the neural network. Because the number of unknown variables (i.e., induced currents) is far smaller than the number of pixels, the computational cost of the pre-inversion is significantly lower than that of traditional iterative inversion. Numerical and experimental results validate that PD-CSI and PD-DICs obtain satisfactory results, generalize well, and are robust against noise. These results can be attributed to the following two reasons: 1) The CNN avoids dealing directly with the phaseless measurement data, for which it would have to spend unnecessary effort on learning the underlying wave physics; as much as possible is extracted beforehand, and only the remainder is left to the CNN. 2) Using the feature maps (the roughly estimated images) obtained by PD-DICs and PD-CSI at low computational cost, a much easier nonlinear problem with fewer unknowns is left for the CNN to solve. Therefore, the CNNs only learn the relationship between the feature maps and the final target distribution, which is much easier than learning it directly from the scattered fields.

When the relative permittivity of the unknown scatterers in the test exceeds the range of the training set, PD-CSI performs consistently better than PD-DICs in the tests presented. In PD-CSI, some APICs (representing high-frequency components) are included to obtain the preliminary profile, whereas only DPICs are used to obtain the rough input images in PD-DICs. Consequently, PD-CSI has a stronger ability to solve highly nonlinear ISPs, as validated by the “Austria” example and the experimental test. This work provides a new learning-based method for solving phaseless ISPs, with potential for engineering applications such as biomedical imaging and microscopy.

REFERENCES

[1] H. Kagiwada, R. Kalaba, S. Timko, and S. Ueno, “Associate memoriesfor system identification: Inverse problems in remote sensing,” Mathe-matical and Computer Modelling, vol. 14, pp. 20-202, 1990.

[2] R. Zoughi, Microwave non-destructive testing and evaluation principles.Springer Science & Business Media, 2012.

[3] A. Quarteroni, L. Formaggia, and A. Veneziani, Complex systems inbiomedicine. New York, Milan: Springer, 2006.

[4] R. Persico, Introduction to ground penetrating radar: inverse scatteringand data processing. John Wiley & Sons, 2014.

[5] O. M. Bucci, L. Crocco, T. Isernia, and V. Pascazio, “Subsurface inversescattering problems: Quantifying qualifying and achieving the availableinformation,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 11, pp.2527-2537, 2001.

[6] Y. Wang and W. Chew, “An iterative solution of the two-dimensionalelectromagnetic inverse scattering problem,” Int. J. Imag. Syst. Technol.,vol. 1, no. 1, pp. 100-108, 1989.

[7] W. Zhang and A. Hoorfar, “Reconstruction of two-dimensional permit-tivity distribution with distorted Rytov iterative method,” IEEE AntennasWireless Propag. Lett., vol. 10, pp. 1072-1075, 2011.

[8] W. C. Chew and Y. M. Wang, “Reconstruction of two-dimensionalpermittivity distribution using the distorted Born iterative method,” IEEETrans. Med. Imag., vol. 9, no. 2, pp. 218-225, 1990.

[9] P. M. van den Berg and R. E. Kleinman, “A contrast source inversionmethod,” Inverse Prob., vol. 13, no. 6, p. 1607, 1997.

[10] M. Li, O. Semerci, and A. Abubaker, “A contrast source inversionmethod in the wavelet domain,” Inverse Prob., vol. 29, no.2, p. 025015,2013

[11] P. M. van den Berg and A. Abubakar, “Contrast source inversion method:State of art,” Prog. Electromagn. Res., vol. 34, pp. 189-218, 2001.

[12] P. Mojabi and J. LoVetri, “Overview and classification of some regu-larization techniques for the Gauss-Newton inversion method applied toinverse scattering problems,” IEEE Trans. Antenna Propag., vol. 57, no.9, pp. 2658-2665, 2009.

[13] P. Mojabi and J. LoVetri, “Comparison of TE and TM inversions inthe framework of the Gauss-Newton method,” IEEE Trans. AntennaPropag., vol. 58, no. 4, pp. 1336-1348, 2010.

[14] L. Pan, K. Agarwal, Y. Zhong, S. P. Yeo, and X. Chen, “Subspace-basedoptimization method for reconstructing extended scatterers: Transverseelectric case,” J. Opt. Soc. Amer. A, Opt. Image Sci., vol. 26, no. 9, pp.1932-1937, 2009.

14

[15] X. Chen, “Subspace-based optimization method for solving inversescat-tering problems,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 1, pp.42-49, 2010.

[16] X. Ye, X. Chen, Y. Zhong, and K. Agarwal, “Subspace-based optimiza-tion method for reconstructing perfectly electric conductors,” Progressin Electromagnetic Research, vol. 100, pp. 119-128, 2010.

[17] P. Rocca, M. Benedetti, M. Donelli, D. Franceschini, and A. Massa,“Evolutionary optimization as applied to inverse scattering problems,”Inv. Prob., vol. 25, no. 12, pp. 123003.1-123003.41, Dec. 2009.

[18] P. Rocca, G. Oliveri, and A. Massa, “Differential evolution as appliedto electromagnetics,” IEEE Antennas Propag. Mag., vol. 53, no. 1, pp.38-49, 2011.

[19] A. Massa, P. Rocca, and G. Oliveri, “Compressive sensing in electro-magnetics - a review,” IEEE Antennas Propag. Mag., vol. 57, no. 1, pp.224-238, 2015.

[20] G. Oliveri, M. Salucci, N. Anselmi, and A. Massa, “Compressive sensingas applied to inverse problems for imaging: theory, applications, currenttrends, and open challenges.,” IEEE Antennas Propag. Mag., vol. 59,no. 5, pp. 34-46, 2017.

[21] X. Chen. Computational methods for electromagnetic inverse scattering.Hoboken, NJ, USA: Wiley, 2018.

[22] L. Li, H. Zheng, and F. Li, “Two-dimensional contrast source inversionmethod with phaseless data: TM case,” IEEE Trans. Geosci. RemoteSens., vol. 47, no. 6, pp. 1719-1736, 2009.

[23] L. Crocco, “Inverse scattering from phaseless measurements of the totalfield on a closed curve,” J. Opt. Soc. Amer. A, Opt. Image Sci., vol. 20,pp. 622-631, 2006.

[24] G. Franceschini, M. Donelli, R. Azaro, and A. Massa, “Inversion ofphaseless total field data using a two-step strategy based on the iterativemultiscaling approach,” IEEE Trans. Geosci. Remote Sens., vol. 44, no.12, pp. 3527-3539, 2006.

[25] A. Litman and K. Belkebir, “Two-dimensional inverse profiling problemusing phaseless data,” J. Opt. Soc. Amer. A, Opt. Image Sci., vol. 23,no. 11, pp. 2737-2746, 2006.

[26] T. Takenaka, D. J. N. Wall, H. Harada, and M. Tanaka, “Reconstructionalgorithm of the refractive index of a cylindrical object from the intensitymeasurements of the total field,” Microw. Opt. Technol. Lett., vol. 14,no. 3, pp. 182-188, 1997.

[27] G. Gbur and E. Wolf, “Hybrid diffraction tomography without phaseinformation,” J. Opt. Soc. Amer. A, Opt. Image Sci., vol. 19, no. 11, pp.2149-2202, 2002

[28] M. H. Maleki, A. J. Devaney, and A. Schatzberg, “Phase-retrieval and intensity-only reconstruction algorithms from optical diffraction tomography,” J. Opt. Soc. Amer. A, Opt. Image Sci., vol. 10, no. 5, pp. 1086-1092, 1993.

[29] O. Bucci, L. Crocco, M. D'Urso, and T. Isernia, “Inverse scattering from phaseless measurements of the total field on open lines,” J. Opt. Soc. Amer. A, Opt. Image Sci., vol. 23, no. 10, pp. 2566-2577, 2006.

[30] L. Li, W. Zhang, and F. Li, “Tomographic reconstruction using the distorted Rytov iterative method with phaseless data,” IEEE Geosci. Remote Sens. Lett., vol. 5, no. 3, pp. 479-483, 2008.

[31] A. A. Govyadinov, G. Y. Panasyuk, and J. C. Schotland, “Phaseless three-dimensional optical nanoimaging,” Phys. Rev. Lett., vol. 103, no. 21, p. 213901, 2009.

[32] T. Takenaka, D. J. N. Wall, H. Harada, and M. Tanaka, “Reconstruction algorithm of the refractive index of a cylindrical object from the intensity measurements of the total field,” Microw. Opt. Technol. Lett., vol. 14, no. 3, pp. 182-188, 1997.

[33] G. Hislop, G. C. James, and A. Hellicar, “Phase retrieval of scattered fields,” IEEE Trans. Antennas Propag., vol. 55, no. 8, pp. 2332-2341, 2007.

[34] G. Franceschini, M. Donelli, R. Azaro, and A. Massa, “Inversion of phaseless total field data using a two-step strategy based on the iterative multiscaling approach,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 12, pp. 3527-3539, 2006.

[35] L. Pan, Y. Zhong, X. Chen, and S. P. Yeo, “Subspace-based optimization method for inverse scattering problems utilizing phaseless data,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 3, pp. 981-987, 2011.

[36] L. Li, L. G. Wang, F. L. Teixeira, C. Liu, A. Nehorai, and T. J. Cui, “DeepNIS: Deep neural network for nonlinear electromagnetic inverse scattering,” IEEE Trans. Antennas Propag., vol. 67, no. 3, pp. 1819-1825, 2019.

[37] Z. Wei and X. Chen, “Deep-learning schemes for full-wave nonlinear inverse scattering problems,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 4, pp. 1849-1860, 2019.

[38] Z. Wei and X. Chen, “Physics-inspired convolutional neural network for solving full-wave inverse scattering problems,” IEEE Trans. Antennas Propag., vol. 67, no. 9, pp. 6138-6148, 2019.

[39] Y. Sanghvi, Y. Kalepu, and U. Khankhoje, “Embedding deep learning in inverse scattering problems,” IEEE Trans. Comput. Imag., early access, 2019.

[40] R. Guo, X. Song, M. Li, F. Yang, S. Xu, and A. Abubakar, “Supervised descent learning technique for 2-D microwave imaging,” IEEE Trans. Antennas Propag., vol. 67, no. 5, pp. 3550-3554, 2019.

[41] A. Massa, D. Marcantonio, X. Chen, M. Li, and M. Salucci, “DNNs as applied to electromagnetics, antennas, and propagation - a review,” IEEE Antennas Wireless Propag. Lett., vol. 18, no. 11, pp. 2225-2229, 2019.

[42] A. Massa, G. Oliveri, M. Salucci, N. Anselmi, and P. Rocca, “Learning-by-examples techniques as applied to electromagnetics,” J. Electromagn. Waves Appl., vol. 32, no. 4, pp. 516-541, 2018.

[43] B. Zhang and H. Zhang, “Recovering scattering obstacles by multifrequency phaseless far-field data,” J. Comput. Phys., vol. 345, pp. 58-73, 2017.

[44] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. Int. Conf. Mach. Learn., 2015, pp. 448-456.

[45] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. 18th Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), 2015, pp. 234-241.

[46] X. Chen, “Application of signal-subspace and optimization methods in reconstructing extended scatterers,” J. Opt. Soc. Amer. A, Opt. Image Sci., vol. 26, no. 4, pp. 1022-1026, 2009.

[47] Y. Zhong and X. Chen, “An FFT twofold subspace-based optimization method for solving electromagnetic inverse scattering problems,” IEEE Trans. Antennas Propag., vol. 59, no. 3, pp. 914-927, 2011.

[48] J.-M. Geffrin, P. Sabouroux, and C. Eyraud, “Free space experimental scattering database continuation: Experimental set-up and measurement precision,” Inv. Probl., vol. 21, no. 6, pp. S117-S130, 2005.
