Bonding quality monitoring applying statistical modeling of Scanning White Light Interferometry data


Microelectronic Engineering 84 (2007) 2757–2768


Robert Schafer a,*, Henri Seppanen b, Ivan Kassamakov b,c, Edward Hæggstrom b, Peter Hauptmann a

a Institute of Micro and Sensor Systems, University of Magdeburg, P.O. Box 4120, 39016 Magdeburg, Germany

b Department of Physical Sciences, University of Helsinki, Helsinki, P.O. Box 64, FIN-00014, Finland

c Helsinki Institute of Physics, University of Helsinki, FIN-00014, Finland

Received 20 February 2007; received in revised form 22 May 2007; accepted 7 June 2007

Available online 13 June 2007

Abstract

Scanning White Light Interferometry (SWLI) is an optical and hence non-destructive method for precisely measuring surface profiles, while bonding profiles yield information about the bond quality as determined by the pull force required to break the mechanical interconnection. This work applied statistical models relating the pull force to bond profile data obtained by SWLI. The models permit rapid, non-destructive pull force estimation. The approach was tested on samples that were cut from parts intended for the ALICE/ITS/SSD module production (A Large Ion Collider Experiment at CERN LHC/Inner Tracking System/Silicon Strip Detector). The achieved prediction accuracy improved by 60% compared to the approach of Kassamakov et al. in 2007, where the pull force was predicted merely from bonding process parameters. The robustness of different model estimates, given by Ordinary Least Squares (OLS), Principal Component Regression (PCR), and Partial Least Squares (PLS), was compared. The predictive power of different model input parameters was analyzed using Ridge Traces. The experimental results were discussed and related to bonding physics.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Bonding; Interferometry; Statistical modeling; Pull force estimation

1. Introduction

An important quality parameter of a bond is the pull force required to mechanically disconnect the lead from the pad. Consequently, controlling the quality of a bonding process means assuring that the pull force exceeds a critical level. Direct pull force measurement destroys the interconnection. Therefore, traditional quality monitoring tests some samples out of a production batch, providing a statistical estimate of the failure probability. This method has two major drawbacks. First, a certain number of produced microelectronic devices are lost to destructive testing. Second, there is no online measurement examining each produced bond. Hence, there is a risk of undetected defects.

* Corresponding author. Tel.: +49 391 6712681; fax: +49 391 6712609. E-mail address: [email protected] (R. Schafer).

0167-9317/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.mee.2007.06.002

The pull force depends on the physical conditions at the bond site itself. Modeling the dependence between directly measured parameters characterizing the physical state of an interconnection and the resulting pull force permits indirect pull force measurement. Such characterizing parameters are provided by Scanning White Light Interferometry (SWLI) in terms of optically measured profile data, because the mechanical strength of a bond is systematically related to its geometric profile. Hence, SWLI provides non-destructively measured data characterizing the pull force.

A physical model of general applicability describing the pull force as a function of profile data is not available, since this relationship is specific to the bonding process as well as to the bonding tool. Therefore, statistical modeling was applied, approximating the pull force as a linear combination of characterizing parameters and corresponding coefficients. This model structure has two advantages. First, it is independent of changing bonding process conditions: a modified process only requires recalibrating the coefficients. Second, multiplying two vectors, parameters and coefficients, is fast. Thus, combining statistical modeling and Scanning White Light Interferometry yields a non-destructive method of indirect pull force measurement with online capability.

A similar approach was proposed by Kassamakov et al. in 2007 [1], where the pull force was predicted from bonding process parameters. The current paper demonstrates that the prediction error can be noticeably reduced by additionally considering SWLI profile data. Two different analyses of SWLI data, cross-section analysis and histogram analysis, were explored. The applicability of the most common linear model estimates, namely Ordinary Least Squares (OLS), Principal Component Regression (PCR), and Partial Least Squares (PLS), was compared. Ridge Traces were used to examine the predictive power of single profile characteristics. The obtained results were discussed and related to bond physics.

2. Non-destructive online-measurement of SpTAB profile data

2.1. Single point tape automated bonding (SpTAB)

In the SpTAB process, an aluminium (Al) lead is bonded to the Al base with an ultrasonic bonder [2]. Bumpless SpTAB connects thin Al leads to an Al pad. During bonding the TAB tool connects a single lead and presses it against the pad, accompanied by ultrasonic excitation. During the process, lead and pad connect after the native Al oxide is broken and cleaned [3]. Two bonders were used: (1) a semiautomatic bonder (Kulicke & Soffa 4523AD) [4] with an "x"-shaped tool [5], relying on three bonding parameters: Bonding Force (BF), Ultrasonic Power (USP), and Ultrasonic Time (UST), and (2) an automatic bonder (F&K Delvotec 6400) [6] with a "+"-shaped tool [5]. This bonder uses four bonding parameters: BF, USP, Touch Down Force (TDF), and Program Deformation (PRD).

Fig. 1. SWLI images of SpTAB samples. The left image represents the profile obtained for the "x"-shaped tool, denoted sample class A. Section 1 is the lead between the polyimide layer and base (III). Section 2 is the deformation caused by the bonding tool. Five lines indicate where the lead cross-sections were determined. The right image represents the profile obtained for the "+"-shaped tool, denoted as sample class B. Here, Section 1 is the lead residing on the polyimide layer (II). Section 3 shows the lead area deformed by the bonding tool. Section 4 represents the base where the bond is connected. Six lines indicate the cross-sections, whereas the dotted square shows the area of the histogram analysis.

2.2. Scanning White Light Interferometry

Scanning White Light Interferometry (SWLI) is a well-established method for surface profiling [7]. Interference fringes on a surface were created by a microscope objective (Nikon 50×) equipped with a built-in Mirau interferometer. The interference fringes occurred only in those areas of the sample where the optical path difference between the interferometer arms was smaller than the coherence length of the light source. The white light interference fringes are localized in space and therefore provide absolute location. The height information is contained in the fringe visibility, from which the height can be extracted. A closer view of SWLI operation is given in [8]. The SWLI instrument used for the measurements reported herein, providing 3D images of the SpTAB, was built at the University of Helsinki [9]. The lateral resolution and pixel size are 0.3 × 0.3 µm², whereas the depth resolution is 10 nm. Various analyses of the 3D image can be done. In this work we used cross-section and histogram analysis to derive lateral and depth parameters of the 3D image, in order to find a parameter set that is adequate to predict the pull force.

2.3. Cross-section analysis

Cross-section analysis provides lateral and depth information about the 3D image. Cross-sections along the lines indicated in the SWLI images yielded xy-curves with the sample lateral dimension on the x-axis and the vertical dimension on the y-axis (cf. Fig. 1). Height and width parameters used in statistical modeling were determined from the xy-curve. The cross-section locations were chosen to provide post-bonding data about the bond and lead state in locations that, by deformation and fracture mechanics theory of metals [10], were expected to be critical indicators of the bond strength. During the analysis the final cross-section locations were chosen manually. The repeatability of the two analysis methods is nearly equal, but compared to histogram analysis, cross-section analysis is less robust against noise or small height variations, e.g. in the base level definition of the image. Therefore median filtering was used in the cross-section analysis but not in the histogram analysis. As an example, Fig. 2 explains the cross-section related parameters of the line E shown in the right image of Fig. 1.

2.4. Histogram analysis

The histogram analysis, detailing the number of pixels representing a certain height, provides bond height/depth information. The histogram, which is robust against image noise, also provides the total isoaltitude surface (number of pixels, i.e. area residing at a certain depth) without position information. Histograms were taken from the region of interest (ROI) of the SWLI images. The ROI for the histogram analysis was chosen manually in such a way that the area covered the deformed lead. As an example, Fig. 3 shows the pixel height histogram of the area marked in the right image of Fig. 1.

Fig. 2. Cross-section on the line E, shown in Fig. 1 (right: sample class B), provides width (in µm) and height (in µm) information about the bond. For sample class B the cross-section analysis gives several parameters: base height (I, V), lead width (I, V), left and right bond height (II, IV) and total height (III). The base height is used as origin.

Fig. 3. Pixel height histogram in the area marked in Fig. 1 (right). Histogram analyses were done only on sample class B. Symbols marked in the figure show the histogram points that were used to analyze the histogram. Point O indicates base height (µm). Pixels with heights below the base were not used in the analysis. Points I, II and III were used to calculate heights of the bond. Point I shows the lowest height of the bond, whereas the area deformed by the bonding tool resides between I and II. Point III shows the highest parts of the bond, whereas the "+"-shaped area of the bond resides between II and III. For the factor analysis we used point height, number of pixels and maximum value of number of pixels between points I–II and II–III. Also the area related parameters, number of pixels between I–II and II–III, were used. The average pixel noise was calculated in point IV.

The repeatability of the SWLI measurements was characterized by a relative deviation of 0.4–7.7%. The repeatability analysis was performed on one randomly selected bond, from which five SWLI pictures were obtained and analyzed once each, providing the compounded variability estimate of the SWLI reading and analysis procedure, $s_{CVE}$. Furthermore, one of these pictures was randomly selected and analyzed three times, providing the analysis procedure variability estimate, $s_{AVE}$. The standard deviation was recorded as a measure of the variability. It was assumed that the total parameter variability emerges from two independent sources, namely the SWLI instrument reading variability (including sample arrangements) and the variability emerging from the SWLI picture analysis procedure. Because the variances of independent sources add, subtracting $s_{AVE}^2$ from $s_{CVE}^2$ gives an estimate of the SWLI reading variability, $s_{RVE} = (s_{CVE}^2 - s_{AVE}^2)^{0.5}$. The variability estimates of the cross-section data and the histogram data, as well as of the SWLI measurement and the SWLI analysis procedures, are shown in Table 1.
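The variance subtraction used for the reading variability can be written as a small helper. This is a minimal sketch; the function name and the input numbers are hypothetical and are not measurements from this work.

```python
import math

def reading_variability(s_cve: float, s_ave: float) -> float:
    """Return s_RVE = sqrt(s_CVE^2 - s_AVE^2), assuming independent error sources."""
    diff = s_cve ** 2 - s_ave ** 2
    if diff < 0:
        # The analysis alone cannot vary more than reading and analysis combined.
        raise ValueError("s_AVE exceeds s_CVE; the independence assumption fails")
    return math.sqrt(diff)

# Hypothetical relative deviations in percent, chosen only for illustration.
s_rve = reading_variability(5.0, 3.0)
print(s_rve)  # 4.0
```

The guard against a negative difference matters in practice because both inputs are estimated from very few repetitions (five and three).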

2.5. Samples

The main SpTAB parameters are dimensions and materials of lead and pad, bonding geometry from cross-sections and histogram analyses, as well as process parameters [2]. We used two sample classes with certain differences in bonding process, bonding geometries, sample dimensions and analyzing process (cf. Fig. 4). In class A the lead width was 82 µm, the polyimide cable was placed on top of the Al lead, and the bonding was lead-to-lead, whereas in class B the lead width was 34 µm, and the polyimide cable was sandwiched between the Al lead and the Al pad.

As samples we used cables, subhybrids and sensors that were cut from parts intended for the ALICE/ITS/SSD module production (A Large Ion Collider Experiment at CERN LHC/Inner Tracking System/Silicon Strip Detector) [11]. Our samples were cut smaller to facilitate testing. To avoid artifacts, parts of the samples near the cutting edge were not used for the tests. The samples were handled carefully to avoid random, spatial or time dependent contamination. Bonds were done in a clean room. The samples were handled with teflon tweezers and stored in separate closed boxes when not in use. For this reason, chemical and surface analyses, e.g. EDX, SIMS, XPS, are not within the scope of this paper. The possible impact of contamination and variation of material composition, considered negligible for the samples regarded herein, is to be explored in another contribution.

Table 1
Relative deviations for measurements and analysis

| Value type | Minimum relative deviation in % | Maximum relative deviation in % |
|---|---|---|
| SWLI measurement ($s_{CVE}$) | 0.4 | 7.7 |
| SWLI analysis ($s_{AVE}$) | 0.4 | 4.6 |
| Height measurements (cross-section) | 0.4 | 3.5 |
| Width measurements (cross-section) | 0.4 | 2.0 |
| Height measurement (histogram) | 0.7 | 7.7 |
| Area measurement (histogram) | 0.7 | 2.3 |

Fig. 4. Bonding geometries in sample classes B (left) and A (right). Light gray indicates polyimide layers, dark gray aluminum layers, and the bottom hatching silicon (left) or carbon fiber stiffener (right). Dotted lines indicate bending of the top Al layer during the bonding process. Left: On the top, a (14 ± 0.5) µm thick and (34 ± 2) µm wide pure Al lead on a 12 µm thick polyimide flex was connected to an Al pad on a silicon sensor. Right: On the top, a 12 µm thick polyimide flex resided on a 14 µm thick and 82 µm wide pure Al lead. The lead was connected to another 30 µm thick lead on a 20 µm thick polyimide that was connected to the carbon fiber stiffener.

2.6. Measurements

All bonds were created and tested with a similar procedure. (1) A single lead was bonded to the pad applying bonding parameters defined by Bonding Force (BF), Ultrasonic Power (USP), and Ultrasonic Time (UST) for sample class A, as well as BF, USP, Touch Down Force (TDF), and Program Deformation (PRD) for sample class B. Relying on earlier experience [1] and on the experimental design output, the bonding parameters were chosen to cover a parameter space that was known to be reasonably close to optimal values. (2) A SWLI image of the single bond was taken and (3) the bond was pull tested with defined pull speed and pull direction.

The bonds were non-destructively analyzed by SWLI to obtain various geometrical parameters. Cross-section and histogram analyses were used. Five cross-sections for sample class A, and six cross-sections as well as one histogram for sample class B, were analyzed from a single 3D SWLI image. The procedure to collect parameters from a SWLI image for the factor analysis comprises four steps: (1) tilting the image to make the base level horizontal; (2) manually defining the histogram area to cover the deformed bond area and saving the data for analysis (only for sample class B); (3) applying 3 × 3 median filtering twice to reduce the image noise level; (4) manually defining cross-section lines for the bond and the lead and saving them for factor analysis.
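The four preprocessing steps can be sketched with NumPy as follows. This is an illustrative sketch, not the analysis software used in this work. Three things are assumptions: leveling is done by a least-squares plane fit over the whole image, the manually chosen ROI is a rectangle, and a cross-section runs along one image row. All function names are hypothetical.

```python
import numpy as np

def level_image(z):
    """Step 1: subtract a least-squares plane so the base level is horizontal."""
    rows, cols = np.indices(z.shape)
    design = np.column_stack([rows.ravel(), cols.ravel(), np.ones(z.size)])
    coef, *_ = np.linalg.lstsq(design, z.ravel(), rcond=None)
    return z - (design @ coef).reshape(z.shape)

def height_histogram(z, roi, n_bins):
    """Step 2: pixel-height histogram inside a rectangular ROI (r0, r1, c0, c1)."""
    r0, r1, c0, c1 = roi
    return np.histogram(z[r0:r1, c0:c1], bins=n_bins)

def median3x3(z):
    """Step 3: one pass of a 3 x 3 median filter with replicated edges."""
    padded = np.pad(z, 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(-2, -1))

def cross_section(z, row):
    """Step 4: height profile along one image row."""
    return z[row, :]

# Synthetic height map: tilted base, a raised block standing in for the lead, noise.
rng = np.random.default_rng(0)
rows, cols = np.indices((40, 60))
surface = 0.02 * rows + 0.01 * cols
surface[15:25, 20:40] += 5.0
surface += rng.normal(0.0, 0.05, surface.shape)

leveled = level_image(surface)
counts, edges = height_histogram(leveled, (10, 30, 15, 45), n_bins=50)  # unfiltered
smoothed = median3x3(median3x3(leveled))                                # filtered twice
profile = cross_section(smoothed, 20)
```

The histogram is taken before filtering and the cross-section after it, matching the statement in Section 2.3 that median filtering was applied only for the cross-section analysis.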

3. Indirect pull force measurement

Kassamakov et al. [1] showed that there is an underlying dependence between the measured pull force and the bond process parameters. Modeling this dependence permits indirect pull force estimation. More accurate models can be obtained by additionally considering SWLI profile data. The information gain obtained by combining SWLI and process parameters was examined by comparing the predictive power of pure process data and pure SWLI profile data with the predictive power of a combined data set.

3.1. Linear regression

There is no physical model of general applicability describing the pull force, $F$, as a function of directly measured parameters, $\mathbf{x} = [x_n] \mid n = 1 \ldots N$, characterizing the physical state of the bond, since $F(\mathbf{x})$ is specific to both the bonding process and the applied bonding tool. In this case statistical modeling in terms of linear regression is a possible solution. The regression model

$$F(\mathbf{x}) = \sum_{n} a_n x_n + \varepsilon \qquad (1)$$

approximates the underlying dependence of the pull force on the characterizing parameters as a linear combination of descriptors, $\mathbf{x}$, and coefficients, $\mathbf{A} = [a_n]$, where $\varepsilon$ denotes Gaussian white noise. The coefficients can be calibrated given a set of $K$ known references, $(\mathbf{x}_k; F_k) \mid k = 1 \ldots K$, yielding the estimate $\hat{\mathbf{A}}$. The pull force is interpolated within an interval, $[\min(F_k); \max(F_k)]$, determined by the calibration. Given measured descriptors, $X_{m,n} = x_n(F_m) \mid m = 1 \ldots M$, the pull force $F_m$ is estimated by

$$\mathbf{F} = [F_m] = \mathbf{X}\hat{\mathbf{A}}. \qquad (2)$$
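The prediction of Eq. (2) is a single matrix-vector product, which is what makes the method fast. The sketch below also flags estimates that fall outside the calibrated interval, since the model only interpolates there. The function name and all numbers are hypothetical.

```python
import numpy as np

def predict_pull_force(X, a_hat, f_min, f_max):
    """Eq. (2): F = X a_hat, plus a mask of estimates inside [f_min, f_max]."""
    F = np.asarray(X, dtype=float) @ np.asarray(a_hat, dtype=float)
    inside = (F >= f_min) & (F <= f_max)
    return F, inside

# Hypothetical descriptors (rows = bonds) and coefficients, for illustration only.
X = np.array([[1.0, 2.0], [3.0, 1.0], [10.0, 10.0]])
a_hat = np.array([10.0, 20.0])
F, inside = predict_pull_force(X, a_hat, f_min=30.0, f_max=170.0)
print(F)       # [ 50.  50. 300.]
print(inside)  # [ True  True False]
```

An estimate flagged as outside the interval is an extrapolation and should be treated with more caution than the error benchmark suggests.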

3.2. Descriptor sets and data preparation

Section 2 explained how SpTAB profile data was obtained using Scanning White Light Interferometry. Cross-section analysis and histogram analysis were used to derive a manageable number of significant parameters from the recorded 3D images. These geometric data formed one major set of predicting groups. Together with the corresponding process parameters (BF, USP, UST, TDF, and PRD) as a second major set, the complete sets for sample classes A and B were obtained as shown in Tables 2 and 3, respectively.

Table 2
Parameter set of sample class A

| m | Group | Parameter | m | Group | Parameter | m | Group | Parameter |
|---|---|---|---|---|---|---|---|---|
| 1 | A | Bond force | 10 | D | Hole width | 19 | E | Lateral offset of the hole |
| 2 | A | Ultrasonic power | 11 | D | Lateral offset of the hole | 20 | E | Fitted bond height |
| 3 | A | Ultrasonic time | 12 | E | Lead thickness | 21 | F | Centre part width |
| 4 | B | Height | 13 | E | Left hole lowest height | 22 | F | Left hole width |
| 5 | B | Width | 14 | E | Right hole lowest height | 23 | F | Right hole width |
| 6 | C | Height | 15 | E | Left hole averaging height | 24 | F | Left hole depth |
| 7 | C | Width | 16 | E | Right hole averaging height | 25 | F | Right hole depth |
| 8 | D | Width | 17 | E | Width | 26 | F | Left spike height |
| 9 | D | Bond height | 18 | E | Centre part width | 27 | F | Right spike height |

Groups: (A) bond process parameters; (B) cross-section 1; (C) cross-section 2; (D) cross-section 3; (E) cross-section 4; (F) cross-section 5.
Type (I) = A; Type (II) = B–F; Type (III) = A–F.

Table 3
Parameter set of sample class B

| m | Group | Parameter | m | Group | Parameter | m | Group | Parameter |
|---|---|---|---|---|---|---|---|---|
| 1 | A | Bond force | 18 | B | Point III height | 35 | F | Left bond height |
| 2 | A | Ultrasonic power | 19 | B | Point III pixels | 36 | F | Right bond height |
| 3 | A | Touchdown force | 20 | B | Point IV height | 37 | F | Left bond depth |
| 4 | A | Program deformation | 21 | B | Point IV pixels | 38 | F | Right bond depth |
| 5 | B | I–II pixels (area) | 22 | C | Width | 39 | G | Width |
| 6 | B | II–III pixels (area) | 23 | C | Height | 40 | G | Height |
| 7 | B | Analyzed pixels (area) | 24 | C | Base level | 41 | G | Left bond height |
| 8 | B | Base O height | 25 | D | Width | 42 | G | Right bond height |
| 9 | B | Base O pixels | 26 | D | Base level | 43 | G | Left bond depth |
| 10 | B | Point I height | 27 | E | Width | 44 | G | Right bond depth |
| 11 | B | Point I pixels | 28 | E | Height | 45 | H | Width |
| 12 | B | I–II maximum height | 29 | E | Left bond height | 46 | H | Height |
| 13 | B | I–II pixels | 30 | E | Right bond height | 47 | H | Left bond height |
| 14 | B | Point II height | 31 | E | Left bond depth | 48 | H | Right bond height |
| 15 | B | Point II pixels | 32 | E | Right bond depth | 49 | H | Left bond depth |
| 16 | B | II–III maximum height | 33 | F | Width | 50 | H | Right bond depth |
| 17 | B | II–III pixels | 34 | F | Height | | | |

Groups: (A) bond process parameters; (B) histogram data; (C) cross-section 1; (D) cross-section 2; (E) cross-section 3; (F) cross-section 4; (G) cross-section 5; (H) cross-section 6.
Type (I) = A; Type (II-H) = B; Type (II-C) = C–H; Type (III) = A–H.

The tables define subsets of data, called groups, which sort single parameters either by their position within the area scanned by SWLI or by the type of data they belong to (e.g. histogram parameter, cross-sectional parameter or process parameter). Local mappings of groups, B–F for sample class A, and C–H for sample class B, are indicated in Fig. 1. Depending on the applied descriptors, different prediction types were declared, since SWLI and process parameters are data of different kinds. Process conditions represent bonding process configuration data, while geometric parameters are the result of a bonding process. Type (I) means that $\mathbf{X}$ comprises only bond process parameters, yielding a situation comparable to that described in [1]. Type (II) indicates that only geometric data obtained by SWLI is used. For sample class B, histogram and cross-sectional data are distinguished by type (II-H) and (II-C), respectively. Type (III) uses all available data, giving an idea of the information gained when type (I) data is used in combination with type (II) data.

Directly measured data blocks in terms of the descriptor matrix $\mathbf{X}$ and the pull force references, $\mathbf{F} = [F_k]$, can show uncorrelated offsets, $\Delta\mathbf{X}$ and $\Delta\mathbf{F}$. Estimating the coefficients and subsequently predicting the indirectly measured variable from biased data normally reduces the predictive power of a regression model. Therefore, an essential data preparation step is centering descriptors and modeled variables. This is done by subtracting the mean value of each column of both blocks, $\mathbf{X}$ and $\mathbf{F}$, from the respective column. Furthermore, each column is scaled to unit length. This prevents any column from being given a priori a greater importance than others when modeling. Scaling is important when predicting the pull force from SWLI data, because the descriptors are composed of diverse parameters, which may lead to matrix columns showing large differences in variance. In the following, $\mathbf{X}$ and $\mathbf{F}$ are assumed to be centered and scaled. Accordingly, the modeled variable needs to be de-scaled (multiplying each column by the respective standard deviation stored earlier) and de-centered (adding the corresponding offset) in order to obtain the original pull force range.
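A minimal sketch of this preparation step is given below. It assumes that each column is divided by its sample standard deviation, as the de-scaling description implies; scaling to unit Euclidean length differs only by a constant factor per column. The function names are hypothetical.

```python
import numpy as np

def fit_scaler(M):
    """Column means and sample standard deviations of a calibration block."""
    M = np.asarray(M, dtype=float)
    mean = M.mean(axis=0)
    std = M.std(axis=0, ddof=1)
    std = np.where(std == 0.0, 1.0, std)  # leave constant columns unscaled
    return mean, std

def transform(M, mean, std):
    """Center and scale with stored calibration statistics."""
    return (np.asarray(M, dtype=float) - mean) / std

def inverse_transform(M_scaled, mean, std):
    """De-scale and de-center, e.g. to recover the original pull force range."""
    return np.asarray(M_scaled, dtype=float) * std + mean

# Hypothetical descriptor block with columns of very different magnitude.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
mean, std = fit_scaler(X)
X_scaled = transform(X, mean, std)
print(X_scaled)  # both columns become [-1, 0, 1]
```

The statistics are computed once on the calibration block and reused unchanged for every later measurement; recomputing them on new data would shift the model.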

3.3. Comparing different linear model estimates

The prediction according to Eq. (2) requires a prior estimation of the linear model in a one-time calibration step. A brief outline of the most important linear model estimates, obtained by Ordinary Least Squares (OLS) [12], Principal Component Regression (PCR) [13], and Partial Least Squares (PLS) [14], comparing their predictive power when estimating the pull force, is presented in the following. A detailed view of linear modeling theory is given in the literature [15–17].


In our experiment we measured $K_A = 124$ and $K_B = 144$ pairs of describing parameters and a corresponding pull force value, $(\mathbf{x}_k; F_k) \mid k = 1 \ldots K_{A,B}$, for sample classes A and B, respectively. Data of odd measurement number,

$$\begin{pmatrix} \mathbf{X}_0 \\ \mathbf{F}_0 \end{pmatrix} = \begin{pmatrix} [X_{2j-1,n}] \\ [F_{2j-1}] \end{pmatrix}, \quad j = 1 \ldots \frac{K}{2}, \qquad (3)$$

were used to calibrate a certain linear model using OLS, PCR or PLS. Data of even measurement number remained a priori unknown to the calibrated model. These data were used to estimate an error benchmark indicating the predictive power of the model. In the following the predictive power of a model estimate, $\hat{\mathbf{A}}$, is benchmarked by the relative deviation

$$s_{rel} = \frac{s}{\bar{F}} = \sqrt{\frac{K}{2}} \left( \sum_{m=1}^{K/2} F_{2m} \right)^{-1} \sqrt{\sum_{m=1}^{K/2} \left( F_{2m} - \sum_{n=1}^{N} a_n x_{2m,n} \right)^{2}}, \qquad (4)$$

defined as the quotient of the standard deviation estimate, $s$, of the predicted pull force compared to the destructively measured references, $F_{2m}$, over the mean pull force. Defining the error benchmark according to Eq. (4) permits comparing predictions covering different pull force ranges, as obtained for both sample classes (class A: 30…170 mN; class B: 35…82 mN).
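The odd/even split of Eq. (3) and the benchmark of Eq. (4) can be sketched as follows. The sketch assumes that rows are stored in measurement order, so that array index 0 is measurement number 1, and that predictions have been de-scaled and de-centered before the benchmark is computed. Function names and numbers are hypothetical.

```python
import numpy as np

def split_odd_even(X, F):
    """Odd measurement numbers (1, 3, ...) calibrate; even ones (2, 4, ...) validate."""
    X, F = np.asarray(X, dtype=float), np.asarray(F, dtype=float)
    return (X[0::2], F[0::2]), (X[1::2], F[1::2])

def relative_deviation(F_ref, F_pred):
    """Eq. (4): RMS prediction error divided by the mean reference pull force."""
    F_ref = np.asarray(F_ref, dtype=float)
    F_pred = np.asarray(F_pred, dtype=float)
    s = np.sqrt(np.mean((F_ref - F_pred) ** 2))
    return s / F_ref.mean()

# Hypothetical references and predictions in mN.
s_rel = relative_deviation([100.0, 100.0], [90.0, 110.0])
print(s_rel)  # 0.1, i.e. 10 %
```

Writing the standard deviation as a mean over the $K/2$ validation samples and dividing by the mean reference reproduces the prefactor $\sqrt{K/2}\,(\sum_m F_{2m})^{-1}$ of Eq. (4).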

If the descriptor matrix has full column rank, the model estimate can be computed by

$$\hat{\mathbf{A}}_{OLS} = \left( \mathbf{X}_0^{\top} \mathbf{X}_0 \right)^{-1} \mathbf{X}_0^{\top} \mathbf{F}_0. \qquad (5)$$
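A short NumPy sketch of the OLS estimate follows. It uses a least-squares solver rather than forming the inverse of Eq. (5) explicitly, which gives the same result for full column rank and is numerically safer; this is an implementation choice of the sketch, not a statement about the software used in this work. The synthetic data are hypothetical.

```python
import numpy as np

def ols_estimate(X0, F0):
    """Least-squares solution of X0 A = F0; equals Eq. (5) for full column rank."""
    A, *_ = np.linalg.lstsq(X0, F0, rcond=None)
    return A

# Synthetic, noise-free calibration data with known coefficients.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(20, 3))
a_true = np.array([1.5, -2.0, 0.5])
F0 = X0 @ a_true

assert np.linalg.matrix_rank(X0) == X0.shape[1]       # full column rank
A_ols = ols_estimate(X0, F0)
A_normal = np.linalg.solve(X0.T @ X0, X0.T @ F0)      # literal form of Eq. (5)
```

Both `A_ols` and `A_normal` recover `a_true` here; they start to differ, and Eq. (5) breaks down, once columns of the descriptor matrix become nearly linearly dependent.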

In case of linearly dependent columns of the descriptor matrix, a reduced set of linear combinations has to be applied. This is usually done by a Principal Component Regression (PCR), where the Principal Components, $\mathbf{N} = \mathbf{X}_0 \mathbf{C}$, of the descriptors are used to run the inversion, yielding $\mathrm{diag}^{-1}(\lambda) = (\mathbf{N}^{\top} \mathbf{N})^{-1}$ instead of the inverted covariance matrix $(\mathbf{X}_0^{\top} \mathbf{X}_0)^{-1}$. $\mathbf{C}$ and $\lambda$ are the matrix of Eigenvectors and the vector of Eigenvalues of the covariance matrix $\mathbf{C}_{XX} = \mathbf{X}_0^{\top} \mathbf{X}_0$, respectively. The corresponding PCR estimate of the linear model is

$$\hat{\mathbf{A}}_{PCR} = \mathbf{C}\, \mathrm{diag}^{-1}(\lambda)\, \mathbf{C}^{\top} \mathbf{X}_0^{\top} \mathbf{F}_0. \qquad (6)$$

The PCR estimate is optimal when it can be guaranteed that the most significant principal components explain the major part of the variance occurring in $\mathbf{F}$. Otherwise the cross-covariance $\mathbf{C}_{FX} = \mathbf{F}_0^{\top} \mathbf{X}_0$ should be considered. This is efficiently done by Partial Least Squares (PLS). PLS iteratively projects the descriptors onto an orthogonal basis, $\mathbf{T}$, spanned by the PLS components. At each iteration step one PLS component, in terms of a new column of $\mathbf{T}$, is computed as the product of the most significant Eigenvector of $\mathbf{C}_{FX}$ and the error between $\mathbf{X}_0$ and its projection onto $\mathbf{T}$ of the previous step. This difference equals $\mathbf{X}_0$ initially. The subsequent cross-covariance matrix is computed using the error of $\mathbf{X}_0$ and its projection onto $\mathbf{T}$ of the current step (deflation). Deflation assures the orthogonality of $\mathbf{T}$ as well as a decreasing error while iterating. Literature [15] shows that $\mathbf{T}$ can be projected onto $\mathbf{X}_0$, i.e. $\mathbf{T} = \mathbf{X}_0 \mathbf{R}$. The respective coefficients are $\mathbf{R} = \mathbf{W}(\mathbf{P}^{\top} \mathbf{W})^{-1}$, where the columns of $\mathbf{W}$ hold the computed dominant Eigenvectors of $\mathbf{C}_{FX}$. $\mathbf{P} = \mathbf{X}_0^{\top} \mathbf{T} (\mathbf{T}^{\top} \mathbf{T})^{-1}$ yields the transposed coefficients of the projection of $\mathbf{X}_0$ onto $\mathbf{T}$. The PLS estimate, given by

$$\hat{\mathbf{A}}_{PLS} = \mathbf{R} (\mathbf{T}^{\top} \mathbf{T})^{-1} \mathbf{R}^{\top} \mathbf{X}_0^{\top} \mathbf{F}_0, \qquad (7)$$

is optimal in the sense that it best explains the cross-covariance of $\mathbf{X}_0$ and $\mathbf{F}_0$. In contrast, the PCR estimate aims to optimally explain the covariance of $\mathbf{X}_0$. Table 4 compares the prediction error, $s_{rel}$, of the three tested estimates, $\hat{\mathbf{A}}_{OLS}$, $\hat{\mathbf{A}}_{PCR}$, and $\hat{\mathbf{A}}_{PLS}$, for both sample classes, A and B, as well as for the predefined data types, (I), (II), (II-C), (II-H), and (III).

Table 4
Prediction errors of OLS, PCR and PLS estimates at different data types and sample classes

| Data type | OLS, class A (%) | OLS, class B (%) | PCR, class A (%) | PCR, class B (%) | PLS, class A (%) | PLS, class B (%) |
|---|---|---|---|---|---|---|
| I | 19.3 | 16.9 | 19.1 | 16.9 | 19.2 | 16.7 |
| II | 15.8 | – | 16.1 | – | 14.5 | – |
| II-C | – | 13.4 | – | 12.8 | – | 12.4 |
| II-H | – | 11.0 | – | 10.7 | – | 11.2 |
| III | 16.5 | 14.5 | 13.6 | 10.6 | 13.4 | 10.3 |

In general, the smallest prediction error was achieved when the PLS estimate was applied, followed by the PCR and the OLS estimate. Both process data and SWLI profile data provide descriptors of high multicollinearity, whereas the descriptive power of a principal component of the descriptor matrix does not necessarily correspond to the respective Eigenvalue. Hence, there is a latent structure describing the pull force as a function of directly measured parameters, which is estimated best by PLS. Partial Least Squares provided an error ($s_{rel}$) as low as 13.4% for sample class A and 10.3% for sample class B (cf. Fig. 5). In both cases, the best result was obtained when type (III) data was applied. That means that additional information becomes available when combining process data and bonding connection profile data.

Fig. 5. Predicted vs. reference pull force in case of sample class A and B when PLS derives the respective model estimate.

Fig. 6. Prediction error of the OLS, PCR, and PLS estimates as a function of Gaussian white measurement noise added to the descriptor data of parameter types (II), (II-C), (II-H), and (III).

Kassamakov et al. reported a similar linear model in [1], applying solely process parameters in a quadratic polynomial fit to predict the pull force, where the goal was to optimize the process configuration in terms of a maximum pull force. The predicted pull force values compared to destructively tested reference (mean) values reported in [1] showed an error of 33.4% according to Eq. (4). We regard this prediction error as the reference, since the experimental conditions, sample types and quality reported herein are comparable to the SpTAB process reported by Kassamakov. The reference error of 33.4% was achieved using an "x"-shaped tool, herein denoted as sample class A. Hence, this value must be compared with the smallest prediction error of sample class A, which is 13.4%. In conclusion, combining bonding process parameters and SWLI profile data of a bonding connection reduces the error by approximately 60%, since (33.4 − 13.4)/33.4 ≈ 0.60, when predicting the pull force using a linear model estimate obtained by PLS.
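The PCR and PLS estimates of Eqs. (6) and (7) can be sketched in NumPy as below. The sketch assumes centered and scaled data and a single response (the pull force), for which the dominant Eigenvector of the cross-covariance reduces to the normalized vector $\mathbf{X}^{\top}\mathbf{F}$. The number of retained components, `n_comp`, is a hypothetical parameter that the text does not specify, and the function names are illustrative.

```python
import numpy as np

def pcr_estimate(X0, F0, n_comp):
    """Eq. (6), restricted to the n_comp eigenvectors with the largest eigenvalues."""
    lam, C = np.linalg.eigh(X0.T @ X0)            # eigenvalues in ascending order
    keep = np.argsort(lam)[::-1][:n_comp]
    C, lam = C[:, keep], lam[keep]
    return C @ ((C.T @ (X0.T @ F0)) / lam)

def pls1_estimate(X0, F0, n_comp):
    """Eq. (7) for one response, deflating X after every component."""
    Xk = np.asarray(X0, dtype=float).copy()
    W, P = [], []
    for _ in range(n_comp):
        w = Xk.T @ F0                             # cross-covariance direction
        w /= np.linalg.norm(w)
        t = Xk @ w                                # new PLS component
        p = Xk.T @ t / (t @ t)                    # loading of X on t
        Xk = Xk - np.outer(t, p)                  # deflation
        W.append(w)
        P.append(p)
    W, P = np.column_stack(W), np.column_stack(P)
    R = W @ np.linalg.inv(P.T @ W)                # so that T = X0 R
    T = X0 @ R
    return R @ np.linalg.solve(T.T @ T, R.T @ (X0.T @ F0))

# Synthetic check: with all components retained, both estimates equal OLS.
rng = np.random.default_rng(1)
X0 = rng.normal(size=(30, 4))
F0 = X0 @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.normal(size=30)
A_ols, *_ = np.linalg.lstsq(X0, F0, rcond=None)
A_pcr = pcr_estimate(X0, F0, n_comp=4)
A_pls = pls1_estimate(X0, F0, n_comp=4)
```

With fewer components the two estimates diverge: PCR keeps the directions of largest descriptor variance, while PLS keeps the directions most covariant with the pull force, which is the distinction drawn above.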

Table 4 also shows that when solely applying profile data, a cross-sectional analysis is less predictive than a histogram analysis. This result was obtained for all estimates considered. Comparing both sample classes, a smaller prediction error was observed for class B samples than for class A samples. This is explained by the fact that the pull force range of class B (35–82 mN) is smaller than the range of class A (30–170 mN). A comparable number of references uniformly distributed over the respective range for both classes leads to a finer resolution for class B than for class A, yielding more detailed information when calibrating the linear model. This gives a more accurate prediction for class B than for class A.

Type (III) data leads to an increased error compared to type (II) data when OLS is applied. In contrast, additional information is extracted by PCR and PLS when combining process data and SWLI data. The Principal Components holding the direction of the major variance in X are rotated with respect to a Cartesian basis, spanned by the identity matrix I, when switching from a descriptor matrix of data type (II) to data type (III). When using the projection of F onto X, as done by OLS, the significant variances representing F are decreased along the vectors of I due to the rotation. This causes an increased model error of the OLS estimate when combining profile data and process data. However, also rotating the basis, which yields the Principal Components, N, of the combined set, extracts the significant variances holding more detailed information about the pull force than type (II). This is the reason why type (III) data improves the prediction accuracy using PCR and PLS, while OLS leads to an increased prediction error. This observation was independent of the sample class.

3.4. Measurement noise

An important issue concerning the applicability of linear modeling is the model robustness against Gaussian white noise added to the directly measured, predicting data. In order to examine the noise sensitivity of the OLS, PCR and PLS estimates obtained by Eqs. (5)–(7), respectively, as well as of the noise-sensitive data types (II), (II-C), (II-H), and (III), a random error was added to the respective descriptor matrix


$$\mathbf{X}_{\sigma} = \mathbf{X}\,[\mathbf{I} + \mathbf{E}(\sigma)] \mid \mathbf{E} \sim N(0, \sigma^{2}\mathbf{I}). \qquad (8)$$

Parameter $\sigma$ was varied in a range of 0–100%. Fig. 6 shows the model error of the three estimates, $\hat{\mathbf{A}}_{OLS}$, $\hat{\mathbf{A}}_{PCR}$, and $\hat{\mathbf{A}}_{PLS}$, as a function of Gaussian white noise added to the descriptor data according to Eq. (8). The model error of each prediction was computed separately for the data types considered. In general, the estimates show a comparable robustness against measurement noise. However, there are slight differences depending on the applied data type. Predictions using histogram data (II-H) are slightly less noise-sensitive (smaller slope) at higher noise levels than predictions applying cross-sectional data (II-C). The noise sensitivity of type (III) data is comparable to that of type (II-H) data. In conclusion, the estimates considered show an acceptable robustness against noise. They are also expected to work under industrial conditions where measured samples are part of a production batch.
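A noise sweep of this kind can be sketched as follows. Two points are assumptions of the sketch rather than statements from the text: Eq. (8) is read as element-wise relative noise, so each descriptor entry is scaled by $(1 + e)$ with $e \sim N(0, \sigma^2)$, and the noise is applied to the validation descriptors of an already calibrated model. Function names and data are hypothetical.

```python
import numpy as np

def add_relative_noise(X, sigma, rng):
    """Scale every descriptor entry by (1 + e), e ~ N(0, sigma^2)."""
    return X * (1.0 + rng.normal(0.0, sigma, size=X.shape))

def noise_sweep(X_val, F_val, A_hat, sigmas, seed=0):
    """Relative deviation, as in Eq. (4), of a fixed model for each noise level."""
    rng = np.random.default_rng(seed)
    errors = []
    for sigma in sigmas:
        F_pred = add_relative_noise(X_val, sigma, rng) @ A_hat
        s = np.sqrt(np.mean((F_val - F_pred) ** 2))
        errors.append(s / np.mean(F_val))
    return np.array(errors)

# Hypothetical noise-free validation set and model.
rng = np.random.default_rng(42)
X_val = rng.uniform(1.0, 2.0, size=(50, 3))
A_hat = np.array([10.0, 20.0, 30.0])
F_val = X_val @ A_hat
errors = noise_sweep(X_val, F_val, A_hat, sigmas=[0.0, 0.25, 0.5, 1.0])
```

Plotting `errors` against the noise levels for each estimate and data type would give curves of the kind described for Fig. 6; the slope of each curve is the noise sensitivity discussed above.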

4. Descriptor analysis

The previous sections have shown that statistical modeling is an adequate tool to predict the maximum sustainable pull force of bonding connections relying on SWLI data. So far, a coarse classification of parameters has been realized by introducing different types of predicting data. It turned out that histogram parameters predict the pull force rather well, leading to noise-robust models. However, regarding the quality of bonding connections, it is conceivable that some parameters are of greater importance than others.

An approximate estimate of the significance of each parameter affecting bond quality yields two benefits. First, one obtains valuable information, such as implications regarding bonding tool geometry and process parameter set, that may be used to optimize the bonding process. Second, one may generate a modified descriptor matrix with less important parameters left out. This reduces the computational effort dramatically, so that a low-cost microprocessor or an FPGA might be used to realize a hardware implementation of the statistical model in an industrial application. In this context, the question is: How to measure the importance of each descriptor?

There are a number of variable selection procedures reported in the literature, such as Forward Selection Procedures, Backward Selection Procedures, or Stepwise Methods [18,19]. All those methods are recommended to be used cautiously with respect to the individual regression problem. Answering this question for our regression problem is rather difficult due to the multicollinearity of the descriptors. Variable selection using Ridge Regression [22,23] appears to be an adequate tool, especially for the multicollinear case, because Ridge Regression generates an equation with stable coefficients in the sense that they are unaffected by slight variations in the estimation data. Similarly, the objectives of a good variable selection are to point out a set of variables that provides a clear understanding of the studied process and to obtain an equation with good prediction power. It appears that the objectives of Ridge Regression and variable selection fit. Consequently, Ridge Regression should be applicable to process variable selection. This is done by examining the Ridge Trace,

$$\hat{A}(b) = (X_0^T X_0 + bI)^{-1} X_0^T F_0, \tag{9}$$

which is a plot of regression coefficients against the Ridge coefficient $b = 0 \ldots 1$. The Ridge Trace is used to eliminate single variables from the descriptor set considering the following criteria:

1. Elimination of descriptors whose coefficients $a_n(b)$ are stable but small. Due to centering and scaling, the magnitude of various coefficients is comparable.

2. Elimination of descriptors with unstable coefficients tending to zero. Those descriptors do not maintain their predictive power.
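A Ridge Trace in the sense of Eq. (9) can be computed with a few lines of linear algebra. The following is a minimal sketch on synthetic, partly collinear descriptors; the helper names `ridge_trace` and `standardize`, the unit-length column scaling, and the grid of 21 values of b are assumptions, not details taken from the experiment.

```python
import numpy as np

def standardize(M):
    """Centre each column and scale it to unit length (assumed form of centering and scaling)."""
    M = M - M.mean(axis=0)
    return M / np.linalg.norm(M, axis=0)

def ridge_trace(X0, F0, b_values):
    """Evaluate (X0^T X0 + b I)^(-1) X0^T F0 for every b; one row per b, one column per descriptor."""
    n_desc = X0.shape[1]
    gram = X0.T @ X0
    rhs = X0.T @ F0
    return np.array([np.linalg.solve(gram + b * np.eye(n_desc), rhs) for b in b_values])

# Synthetic data: two nearly collinear descriptors and one independent descriptor.
rng = np.random.default_rng(2)
base = rng.normal(size=(50, 1))
X = np.hstack([base + 0.05 * rng.normal(size=(50, 1)),
               base + 0.05 * rng.normal(size=(50, 1)),
               rng.normal(size=(50, 1))])
F = X @ np.array([1.0, 1.0, 0.5]) + 0.1 * rng.normal(size=50)

X0 = standardize(X)
F0 = standardize(F.reshape(-1, 1))[:, 0]
b_values = np.linspace(0.0, 1.0, 21)
trace = ridge_trace(X0, F0, b_values)   # shape (21, 3)
print(trace[[0, -1]])                   # coefficients at b = 0 and at b = 1
```

At b = 0 the trace coincides with the OLS solution; as b grows, the coefficients of collinear descriptors typically change the most, which is the instability the two criteria are meant to detect.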

The second criterion has a qualitative characteristic. Unstable coefficients tending to zero are identifiable. We selected the absolute ratio of the maximum coefficient value, $\max[a_n(b)]$, over the coefficient value at maximum b, $a_n(1)$, to decide whether a coefficient maintains its predictive power or not. Each ratio is compared to a classifying level

$$r_{\mathrm{critical}} = \frac{1}{N} \sum_n \left| \frac{\max[a_n(b)]}{a_n(1)} \right|. \tag{10}$$

Coefficients showing a ratio above this level are classified “insignificant” due to instability, otherwise they are considered “significant”. The first criterion, to some degree, is sensitive to subjective judgment in the sense that there is no general definition of “small coefficients”. Selecting a certain threshold might be a possible solution. Higher thresholds would eliminate more descriptor variables due to insignificance while smaller thresholds would accept a larger descriptor set. We selected the average of the absolute coefficient values at maximum b,

$$t_{\mathrm{critical}} = \frac{1}{N} \sum_n |a_n(1)|, \tag{11}$$

as the threshold, since averaging coefficient magnitudes at a certain value of b permits comparing each coefficient behavior with the characteristic given by the complete set of coefficients. Maximum b was selected to assure that unstable coefficients do not dominate the threshold. Coefficients showing an absolute value above this level are classified “significant”, provided that the second criterion also classified them “significant”. Otherwise they are classified “insignificant”.
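The two criteria and the levels of Eqs. (10) and (11) can be sketched as a small classification routine. The function name `classify_descriptors` and the toy trace below are hypothetical; the maximum in Eq. (10) is taken over the signed coefficient values along b, which is a literal reading of the formula.

```python
import numpy as np

def classify_descriptors(trace):
    """Label each descriptor 's' or 'i' from a Ridge Trace (rows ordered from b = 0 to b = 1)."""
    at_max_b = trace[-1]                              # a_n(1)
    ratio = np.abs(trace.max(axis=0) / at_max_b)      # |max[a_n(b)] / a_n(1)|
    r_critical = ratio.mean()                         # Eq. (10)
    t_critical = np.abs(at_max_b).mean()              # Eq. (11)
    stable = ratio <= r_critical                      # second criterion: not unstable
    large = np.abs(at_max_b) > t_critical             # first criterion: not small
    labels = np.where(stable & large, "s", "i")
    return labels, r_critical, t_critical

# Toy trace with three descriptors (assumed values): stable and large,
# stable but small, and unstable (collapsing towards zero).
toy_trace = np.array([
    [0.80, 0.10, 1.50],   # b = 0
    [0.70, 0.10, 0.60],   # b = 0.5
    [0.60, 0.10, 0.05],   # b = 1
])
labels, r_critical, t_critical = classify_descriptors(toy_trace)
print(labels.tolist())    # ['s', 'i', 'i']
```

Only the first toy descriptor passes both tests: the second is stable but below the threshold of Eq. (11), and the third exceeds the ratio level of Eq. (10).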

Fig. 7 shows the Ridge Traces of coefficients $a_n(b) \mid n = 1 \ldots 17$, obtained for the histogram data of sample class B, corresponding to data group B (cf. Table 3, parameter numbers m = 5 … 21). Table 5 indicates the state of each descriptor variable, where “s” stands for significant and “i” represents insignificance. When using the “s”-marked histogram parameters only, the model error according to Eq. (4) was: 12.3% (OLS), 11.9% (PCR), and 12.1% (PLS), which is comparable to the prediction results obtained with the complete type (II-H) parameter set. In contrast, when using the “i”-marked histogram parameters only, the model error was: 16.5% (OLS), 16.4% (PCR), and 16.3% (PLS). These results confirm the distribution of predictive power over the parameter set as it has been obtained from the descriptor analysis using Ridge Traces as described above.

Fig. 7. Ridge Traces of coefficients $a_n(b) \mid n = 1 \ldots 17$ corresponding to the histogram data of sample class B. The marked traces (n = 5, 6, 7, 8, 15) above the critical threshold, $t_{\mathrm{critical}}$, indicate coefficients of high predictive power. These coefficients are related to significant parameters holding the major part of information about the pull force.

5. Discussion

In this section we discuss the obtained results. We compare the achieved pull force estimates to those achieved earlier [1] and we provide a physical explanation why certain analyzed parameters worked well. The aim was to identify whether linear regression can be used with optical non-destructive quality control parameters to predict the pull force of a SpTAB bonding connection. Three estimates of statistical modeling (OLS, PCR, PLS) were tested analyzing optical parameters obtained by SWLI together with the bonding process parameters. A destructively measured bond pull force value was used as a reference. Two different kinds of SpTAB samples (classes A and B) were analyzed separately. Because the sample classes differ in terms of dimension, geometry, bonding devices, bonding parameters, and bonding wedges, we claim that the sample classes are separate. Since the applied techniques of sample preparation, optical measurement (SWLI), and numerical analysis are equal for both sample classes, the results are comparable and indicate that the proposed method might feature general capability to predict bond quality.

Table 5
Significance of each descriptor variable (data type II-H, sample class B) obtained by Ridge Traces

| n | m | State | n | m | State | n | m | State | n | m | State | n | m | State |
|---|---|-------|---|---|-------|---|---|-------|---|---|-------|---|---|-------|
| 1 | 5 | i | 5 | 9 | s | 9 | 13 | i | 13 | 17 | i | 17 | 21 | i |
| 2 | 6 | i | 6 | 10 | s | 10 | 14 | i | 14 | 18 | i | | | |
| 3 | 7 | i | 7 | 11 | s | 11 | 15 | i | 15 | 19 | s | | | |
| 4 | 8 | i | 8 | 12 | s | 12 | 16 | i | 16 | 20 | i | | | |

Index n denotes the nth parameter of class B data type II-H, corresponding to parameter m shown in Table 3. State “s” denotes significant parameters while state “i” denotes insignificant parameters.

The importance of this research is increased by the fact that the samples were part of the ALICE/ITS/SSD modules, which means that the method works for real world samples. Moreover, the quality control methods described here have been used to test modules during production, which means that the test rate and prediction accuracy are sufficient for demanding applications. The samples were handled carefully to avoid bending, twisting or contamination, which could affect the results. The samples were cut and the part close to the cut line was not used in the tests. Therefore we consider the samples to be unbiased representatives of units encountered in high-tech production.

The capability of SWLI to measure hole depth permits its use for SpTAB bond height measurements. Even though 2D optical methods [20], relying on one or more cameras, have been used to measure wire dimensions and deformation in wire bond characterization, these methods cannot provide the bond depth and histogram parameters that are important in our case. However, the speed of the SWLI setup is slightly lower than that of 2D systems, since during 3D scanning several 2D frames are taken and analyzed. SWLI scan parameters are 1D or 2D physical distances or areas or single coordinate points. Distances and areas can be used in the analysis as they are, but absolute coordinate points must be compared to some kind of origin, e.g. in sample class B, the base height can be compared to the polyimide base height. The bond is caused by ultrasonic excitation and pressure of the bonding wedge. The bonding wedge deforms the lead, which makes it possible to predict the bond strength by lead deformation. The results shown in Table 4 confirm that SWLI profile data give an excellent representation of the pull force for both sample classes. When using reduced sets of linear combinations in terms of PCR and PLS the prediction accuracy was improved by combining bonding process parameters and measured SWLI profile data. In contrast, the OLS estimate yielded an increased prediction error when type (III) data was applied. As shown in Fig. 5 for the PLS estimate, when predicting the pull force of class B samples, there are outliers in a small range around 50 mN. This observation was independent of the linear model estimate. Hence, they might be explained by a systematic error when measuring the pull force or the profile data in that range.

Generally, the difference in model errors between sample classes can be due to several reasons. Since in the automatic bonders the bond controlling system controls various bonding parameters in real time, this kind of bonder might create more reproducible bonding connections than semi-automatic bonders. A higher reproducibility of different bond profiles clarifies the distinctive geometrical characteristics being responsible for the stability of the connection. Another possible reason, outside the scope of this paper, is the difference in bond breaking mechanisms, ‘lift off’ and ‘cut off’ [1], due to different tool geometries. In case of ‘cut off’, the relation between lead cut and lead deformation is more direct. On the other hand, with ‘lift off’, the strength of the bonded region cannot be assessed directly by optical means and one must rely on the fact that there exists a causal relation between bond deformation and bonded region. Sample class A exhibits more ‘lift offs’ than ‘cut offs’ and vice versa for sample class B.

The results, provided in Fig. 5 and Table 4 as well as the descriptor analysis of Section 4, show for each sample class that, statistically, it is highly unlikely that the pull force prediction is either material or surface state dependent. Since we obtained different prediction errors when varying the type of describing SWLI data, it is reasonable to assume that the measured surface geometry determines the pull force. If the measurements depended on artifacts in terms of material or surface impurity, first, similar results would be obtained for each of the considered data types and, second, the prediction error would be expected to be significantly larger due to a weak or vanishing correlation between bond geometry and pull force.

In the following, we propose a physical explanation for why certain parameters worked well (classified significant in Table 5). Parameter 9 (further parameter numbering relates to index m of Tables 3 and 5), Base O pixels: This parameter shows the maximum number of pixels corresponding to a certain Al bond pad height, parameter 8. The value of parameter 8 was determined by maximizing the value of parameter 9, tilting the sample using the analysis software. Parameter 8 served as a reference for all other bond height measurements. Physically, parameter 9 is related to the bond pad roughness and to the area seen in the SWLI image. The more the bond spreads out across the bond pad, the more the parameter 9 value decreases. A large bond can have a large bonded interface, which can increase the maximum sustained pull force compared to a small bond. On the other hand, a large spread can decrease the lead cross-section, which increases the probability of ‘cut off’ during a pull test and hence decreases the sustained pull force. Since the bond pad width and initial lead width are within certain limits, and since these dimensions are unrelated to the bonding process, the value of parameter 9 can change if either the bonded lead area (parameters 5 + 6) or the analyzed area (parameter 7) changes. The analysis shows that parameter 9 predicts the pull force better than parameters 5, 6 and 7. The reason for this is still unclear. The surface roughness varies little across samples.

Parameter 10, Point I height: This histogram analysis parameter shows the height above the bond pad. Parameter 10 shows the lowest point of the lead’s top surface and is hence related to the tool deformation depth (indentation into the lead). This depth and lead cross-section area are related, as are the bond height and the bonded area. Both the cross-section and the bonded area are related to the pull force. This is the physical explanation why parameter 10 estimates the pull force.

Parameter 11, point I pixels: This parameter shows the number of pixels residing on the specific height recorded by parameter 10. The physical reason why parameter 11 predicts pull force is unclear. However, since this parameter along with parameter 10 defines the height coordinate of the lowest point on the top surface of the bond lead (indentation), cf. Fig. 3, and since it was probed from a manually selected point on the histogram, parameter 11 might correlate with the reliability of the parameter 10 value. A large value of parameter 11 might indicate either that the chosen parameter 10 evaluation point on the histogram was poorly selected, or that noise, grooves caused by the bond tip, or increased surface roughness have made the parameter 10 value less significant for the analysis.

Parameter 12, I–II maximum height: This parameter indicates which height comprises the maximum number of pixels between the pre-defined points I and II. Physically, parameter 12 describes the mean height of the indented cavities above the bond pad (actually which height within the boundaries of the indentation cavities exhibits the largest projection onto the bond pad). The physics associated with this parameter is closely related to that conveyed by parameter 10 and it seems that parameters 10 and 12 are exchangeable because both strongly depend on the needle shape. However, since the parameter value is an average of several pixels and since it was selected automatically, it is more robust than e.g. parameter 10.

Parameter 19, point III pixels: Parameter 19 shows the number of pixels in the edge area of the bond lead, cf. Fig. 1. A physical explanation why this parameter predicts the pull force well is unavailable. However, parameter 19 indicates whether the bonding tool has introduced an edge between the deformed and undeformed part of the lead. A sharp and deep edge can decrease the sustained pull force, reducing the parameter 19 value.

We propose the following physical explanation why the SWLI-PLSR concept produced fairly accurate pull force predictions. The pull strength of a bond depends on the lead cross-section and the area bonded at the interface [1]. Since the lead material was the same for all samples, the measured pull force differences between samples can be considered to be due to differing lead geometries. To maximize a bond’s pull strength therefore means to maximize the smallest cross-section in the pull direction as well as the bonded area. However, during the ultrasonic excitation the bonding wedge deforms the lead, which decreases the cross-section and increases the bonded area. Low bonding parameters (small force, power and time) cause no lead deformation, i.e. the cross-section strength is high, but since the bonded area is small the total pull strength of the bond is low. High bonding parameters, on the other hand, produce a large bonded area, but deform the lead, reducing the cross-section, and hence the bond’s pull strength is low. As a result, intermediate process parameters provide maximum pull strength. However, the bonding process itself exhibits effects which can cause differences in bond pull strength even for constant machine process parameters. Such confounders are e.g.: contaminated bond surfaces, scratched leads, cantering of the wedge on the lead or the initial height of the lead from the pad. The influence of these effects alters the post-bonding bond geometry. A bonding surface contamination can cause slip between the bond interfaces during the bonding so that the wedge does not deform the lead as much as without slip. On the other hand, if the wedge position on the lead changes, the bond cross-section can change, too. Both these effects alter the pull strength of the bond and make it possible to differentiate the geometry of a damaged bond from that of a normal bond. The 3D SWLI image of the bond records the geometrical differences between the bonds, whereas statistical modeling extracts the most significant geometry parameters that predict maximum pull force. This paper deals only with SpTABs used with a few sample kinds. The generalizability of the reported results is only as large as the variations in the investigated process parameters, samples and devices. For example, we have not tested whether the same results can be obtained for different materials or bond types. This needs to be done.

The present research has shown which modeling method to use and which one to avoid, which SWLI parameters to combine, and what weights to give to the parameters in order to achieve the best correlation with the pull force value of the bond. We also know what prediction accuracy to expect and what the impact of noise is. All this knowledge can be used to non-destructively predict the pull force value of a SpTAB bond more precisely than earlier. Hence, it could also find use in zero-defect engineering.

6. Conclusion

The results obtained from the experiment analyzed in this paper show that profile data of a SpTAB bonding connection, measured by SWLI, relate to the pull force required to break the bonding connection. Statistical modeling can predict the pull force indirectly from geometrical bond data. The regression model approximates the relationship between the tool-specific bond pad geometry and physical conditions of the bonding connection, whereas the pull force depends on the physical conditions. Comparing cross-sectional and histogram data, the area-integrating histogram data hold the most significant information explaining the pull force. Due to its integrating characteristic, histogram data is more noise-robust than cross-sectional data. Additional information about the bond strength is provided when profile data and process parameters are combined. In conclusion, the process parameters affect not only the geometric data obtained by SWLI but also other parameters which were not measured directly, e.g. material parameters of the bond pad. Further work needs to deal with this issue in more detail. The best prediction was obtained applying PLS estimates and a combined set of process and profile data. Compared to the earlier publication by Kassamakov et al. [1], the linear modeling reported herein improves the prediction accuracy by 60%. Combining statistical modeling and SWLI yields a non-destructive technique to indirectly measure the pull force of a SpTAB bond.

Based on the results presented in this contribution, it would be interesting to broaden the study to other cases, such as wedge or ball bonds, considering different material compositions as well as bonding process parameters.

Acknowledgements

We would like to thank the ALICE collaboration (CERN) and V. Borshchov (SRTIIE) for kindly providing samples for these investigations. V. Borshchov and his group merit special thanks for valuable assistance during the measurements.

References

[1] I.V. Kassamakov, H.O. Seppanen, M.J. Oinonen, E.O. Hæggstrom, J.M. Osterberg, J.P. Aaltonen, H. Saarikko, Z.P. Radivojevic, Microelectron. Eng. 84 (2007) 114–123.

[2] J.H. Lau, Handbook of Tape Automated Bonding, vol. 2, Van Nostrand, New York, 1992, pp. 10.

[3] G.G. Harman, J. Albers, IEEE Trans. Parts, Hybrids, Packaging PHP-13 (December) (1977) 406–412.

[4] Kulicke & Soffa Industries, <http://www.kns.com>, accessed February 2007.

[5] Gaiser Tools Company, <http://www.gaisertool.com/catalog/7.pdf>, accessed February 2007.

[6] F&K Delvotec Bondtechnik GmbH, <http://www.fkdelvotecusa.com>, accessed February 2007.

[7] L. Deck, P. de Groot, Appl. Optics 33 (31) (1994) 7334–7338.

[8] J.C. Wyant, Proc. SPIE 4737 (July) (2002) 98–107.

[9] J.P. Aaltonen, I.V. Kassamakov, R. Kakanakov, M.V. Luukkala, TUKEVA Research Program on Future Mechanical Engineering 2000–2003, November 2003, pp. 120–126.

[10] R.W. Hertzberg, Deformation and Fracture Mechanics of Engineering Materials, fourth ed., Wiley, New York, 1996.

[11] M. Oinonen, et al., ALICE silicon strip detector module assembly with single-point TAB interconnections, 11th Work. on Electr. for LHC and Future Exper. LECC 2005, (12–16 September) Heidelberg, Germany, 2005, pp. 92–97.

[12] P. Brown, Measurement Regression and Calibration, Oxford University Press, Oxford, 1993, pp. 33–38.

[13] I.T. Jolliffe, Principal Component Analysis, second ed., Springer, New York, 2002, pp. 167–198.

[14] S. Wold, H. Martens, The multivariate calibration problem in chemistry solved by the PLS method, in: H. Wold, A. Ruhe, B. Kagstrom (Eds.), Lecture Notes in Mathematics: Matrix Pencils, Springer, Heidelberg, 1983, pp. 286–293.

[15] A. Phatak, S. De Jong, J. Chemometrics 11 (1997) 311–338.

[16] S. Wold, M. Sjostrom, L. Eriksson, Chemom. Intell. Lab. Sys. 58 (2001) 109–130.

[17] G. Seber, Linear Regression Analysis, Wiley, New York, 1977, pp. 42–54.

[18] S. Chatterjee, A.S. Hadi, B. Price, Regression Analysis by Example, third ed., John Wiley & Sons, New York, 2000, p. 13.

[19] P.H. Westfall, S.S. Young, D.K.J. Lin, Statist. Sinica 8 (1998) 101–117.

[20] A. Khotanzad, H. Banerjee, M.D. Srinath, IEEE Trans. Semicond. Manuf. 7 (4) (1994) 413–422.

[22] R.N. Draper, F. Pukelsheim, Technometrics 44 (3) (2002) 250–259, ISSN 0040–1706 CODEN TCMTA2.

[23] G.H. Golub, Matrix Computations, second ed., Hopkins University Press, Baltimore, London, 1998, pp. 565–566.
