Reliability Growth Planning: Its Concept, Applications, and Challenges

Tongdan Jin
Assistant Prof. of Industrial Engineering, Ingram School of Engineering
Texas State University-San Marcos

November 11, 2010

Source: asq.org/.../reliability-growth-planning-its-concept-applications-and-challenges.pdf



Contents

•  RGT vs. RGP

•  Design for Reliability

•  New Reliability Monitoring Metrics

•  Reliability Growth under Budget Constraints

•  Conclusion


RGT vs. RGP

[Figure: Product Life Cycle timeline with three phases (Design and Development; Prototype and Pilot Phase; Volume Production, Field Use and End of Life), showing the spans of Reliability Growth Testing (RGT) and Reliability Growth Planning (RGP).]


Why Is RGP Needed?

•  Design Cycle Shrinks

•  Cuts in Testing Budget

•  Different Design/Development Schedules

[Figure 3: Compressed System Design Cycle. Example: Automatic Test Equipment. Timeline from t0 to t4 with Basic subsys 1 to 3 (basic design), Adv. subsys 4 to 6, and volume manufacturing and shipping.]


System Reliability vs. Shipment

[Figure: System MTBF and field system populations (system installs) versus chronological time, with the target MTBF marked.]


[Diagram: Reliability Growth Planning Across Lifecycle Time. Labels: Design for Reliability; Driving Reliability Growth; design, hardware, software, mfg, process, NFF; failure mode Pareto; CA effectiveness; budget; optimization.]

Note: mfg=manufacturing, NFF=no fault found, CA=corrective action


Topic One: Design for Reliability

•  Component/Hardware Failures

•  Non-Component Failures: design weakness, software failures, manufacturing defects, process/handling issues, no-fault-found (NFF)


System Failure Mode Categories

[Figure: Failures Breakdown by Root-Cause Category. Bar chart on a 0% to 50% scale with series A, B, C, and D across the categories Hardware (components), Design, Mfg, Process, Software, and NFF.]


Different MTBF Scenarios

[Figure: MTBF versus time under different scenarios, with the target MTBF marked.]


Modeling Hardware Failure Rate

$$\lambda = \lambda_0 \, \pi_T \, \pi_E \, \pi_Q \, \pi_{FT} \, \pi_R$$

Where:

$\lambda_0$ = base failure rate.
$\pi_T$ = temperature factor.
$\pi_E$ = electrical stress factor.
$\pi_Q$ = quality factor.
$\pi_{FT}$ = fault tolerance factor.
$\pi_R$ = redundancy factor.

For a given design, $\pi_T$ and $\pi_E$ play essential roles in the actual component reliability.


Aggregate Failure Rate for Hardware

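$$\lambda_{hw} = \sum_{i=1}^{k} n_i \lambda_i = \sum_{i=1}^{k} n_i \lambda_{0i} \, \pi_{Ti} \, \pi_{Ei}$$

$$E[\lambda_{hw}] = \sum_{i=1}^{k} n_i \lambda_{0i} \, E[\pi_{Ti}] \, E[\pi_{Ei}]$$

$$\mathrm{var}(\lambda_{hw}) = \sum_{i=1}^{k} n_i^2 \lambda_{0i}^2 \, \mathrm{var}(\pi_{Ti} \pi_{Ei})$$

Where

$k$ = number of types of devices used in the product.

$n_i$ = quantity of the ith type of device used in the product.

$\lambda_{0i}$ = base failure rate for the ith type of device.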
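The mean and variance formulas above can be sketched in a few lines of Python. This is only an illustration: the part list and its factor moments are hypothetical, and the sketch assumes that the temperature and electrical stress factors are independent of each other and across device types, which is what the product of expectations suggests.

```python
def product_variance(mean_t, var_t, mean_e, var_e):
    """var(pi_T * pi_E), assuming pi_T and pi_E are independent."""
    return (var_t + mean_t**2) * (var_e + mean_e**2) - (mean_t * mean_e) ** 2


def hardware_rate_moments(parts):
    """Return (E[lambda_hw], var(lambda_hw)) for a list of device types.

    Each entry is (n_i, lambda_0i, E[pi_T], var(pi_T), E[pi_E], var(pi_E)).
    """
    mean = 0.0
    var = 0.0
    for n, lam0, mean_t, var_t, mean_e, var_e in parts:
        mean += n * lam0 * mean_t * mean_e
        var += (n * lam0) ** 2 * product_variance(mean_t, var_t, mean_e, var_e)
    return mean, var


# Hypothetical bill of materials (values are illustrative, not from the slides).
parts = [
    (10, 2e-8, 1.5, 0.04, 1.2, 0.01),
    (4, 5e-8, 2.0, 0.09, 1.0, 0.0),
]
mean_hw, var_hw = hardware_rate_moments(parts)
print(f"E[lambda_hw] = {mean_hw:.3e}, var(lambda_hw) = {var_hw:.3e}")
```

In practice the factor moments would come from measured distributions such as the ASIC temperature histogram on this slide.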

[Figure: ASIC Temperature Distribution. Histogram of quantity versus temperature in degrees Celsius (bins <65, [65, 70), [70, 75), [75, 80), [80, 85), [85, 90), >90) with a fitted pdf overlaid.]


Challenges in Modeling Non-Hardware Failures

1.  Quite often data is not well recorded

2.  Varies from one product line to another

3.  Process related

4.  Design experience

5.  Other random factors

Triangle Models for Non-Hardware Failures

$$
g(\lambda) =
\begin{cases}
\dfrac{2(\lambda - a)}{(b - a)(c - a)}, & a \le \lambda \le c \\[2ex]
\dfrac{2(b - \lambda)}{(b - a)(b - c)}, & c < \lambda \le b \\[2ex]
0, & \text{otherwise}
\end{cases}
$$

Where:

$a$ = the smallest possible value of the failure rate
$b$ = the largest possible value of the failure rate
$c$ = the most likely value, with $c = 3\bar{\lambda} - b - a$
$\bar{\lambda}$ = the sample mean for the dataset

[Figure: triangular density g(λ) over λ, rising from a to a peak of height h at c and falling back to zero at b.]

Example for Non-Hardware Failure Estimate

Example: Historical data from predecessor products show that the failure rates pertaining to manufacturing issues are (faults/hour): $1.2 \times 10^{-6}$, $1.4 \times 10^{-6}$, and $2.4 \times 10^{-6}$. Then:

$\bar{\lambda} = (1.2 \times 10^{-6} + 1.4 \times 10^{-6} + 2.4 \times 10^{-6})/3 \approx 1.67 \times 10^{-6}$

$a = 1.2 \times 10^{-6}$

$b = 2.4 \times 10^{-6}$

$c = 3\bar{\lambda} - b - a = 1.4 \times 10^{-6}$
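A minimal sketch of the triangle fit, using the three manufacturing failure rates listed in the example. The function names are illustrative choices, not part of the original slides.

```python
def triangle_parameters(rates):
    """Fit a, b, c of the triangle model from observed failure rates."""
    a = min(rates)
    b = max(rates)
    mean = sum(rates) / len(rates)
    c = 3 * mean - b - a  # most likely value implied by the sample mean
    return a, b, c, mean


def triangle_pdf(lam, a, b, c):
    """Triangular density g(lambda) with support [a, b] and mode c."""
    if lam < a or lam > b:
        return 0.0
    if lam < c:
        return 2 * (lam - a) / ((b - a) * (c - a))
    if lam > c:
        return 2 * (b - lam) / ((b - a) * (b - c))
    return 2 / (b - a)  # peak height at lam == c


rates = [1.2e-6, 1.4e-6, 2.4e-6]  # faults/hour, from the example
a, b, c, mean = triangle_parameters(rates)
print(f"a={a:.2e}, b={b:.2e}, c={c:.2e}, mean={mean:.3e}")
```

With only three data points, c coincides with the middle observation, since the sum of the data minus the minimum and maximum leaves the remaining value.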

Combining HW and Non-HW Failure Rate

$$\lambda_{sys} = \lambda_d + \lambda_s + \lambda_m + \lambda_p + \lambda_o + \sum_{i=1}^{k} n_i \lambda_i$$

Where:

$\lambda_d$ = failure rate of design weakness
$\lambda_s$ = failure rate of software
$\lambda_m$ = failure rate of manufacturing
$\lambda_p$ = failure rate of process
$\lambda_o$ = failure rate of other issues (e.g. NFF)
$k$ = total number of HW component types
$\lambda_i$ = failure rate for component type i

Confidence Intervals for Failure Rate

$$\lambda_{sys} = \lambda_d + \lambda_s + \lambda_m + \lambda_p + \lambda_o + \sum_{i=1}^{k} n_i \lambda_i$$

$$\sigma^2_{\lambda_{sys}} = \sigma^2_{\lambda_d} + \sigma^2_{\lambda_s} + \sigma^2_{\lambda_m} + \sigma^2_{\lambda_p} + \sigma^2_{\lambda_o} + \sum_{i=1}^{k} n_i^2 \sigma^2_{\lambda_i}$$

[Figure: distribution of $\lambda_{sys}$ with the points $\lambda_{sys} - 2\sigma_{\lambda_{sys}}$ and $\lambda_{sys} + 2\sigma_{\lambda_{sys}}$ marked.]

Application to Reliability Design (cont'd)

$$\mu_{sys} = E[\lambda_{HW}] + E[\lambda_{non\text{-}HW}] = 1.13 \times 10^{-5}$$

$$\sigma^2_{sys} = \mathrm{var}(\lambda_{HW}) + \mathrm{var}(\lambda_{non\text{-}HW}) = 2.23 \times 10^{-11}$$

[Figure: distribution of the system failure rate centered at $\mu_{sys}$; the upper 0.3% tail lies above $2.43 \times 10^{-5}$.]

MTBF with 99.7% Confidence

MTBF Estimate with Confidence

$$\Pr\{\mathrm{MTBF} \ge t\} \ge 99.7\%$$

$$\Pr\{\lambda_{sys} \le 1/t\} \ge 99.7\%, \qquad \lambda_{sys} = \frac{1}{\mathrm{MTBF}}$$

MTBF(99.7%) = 41,115 hours

Neutral MTBF Estimate

The mean PCB failure rate is $1.13 \times 10^{-5}$ faults/hour.

MTBF = $1/(1.13 \times 10^{-5})$ = 88,100 hours
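The two MTBF estimates can be approximated with a short calculation. This sketch assumes the system failure rate is approximately normally distributed, which is an assumption made here for illustration; the mean and variance are the values quoted for the PCB example, and the results land close to, but not exactly on, the rounded figures shown on the slides.

```python
from math import sqrt
from statistics import NormalDist

mu_sys = 1.13e-5    # mean system failure rate (faults/hour), from the example
var_sys = 2.23e-11  # variance of the system failure rate, from the example
confidence = 0.997

# One-sided upper bound on the failure rate under a normal approximation.
z = NormalDist().inv_cdf(confidence)
lambda_upper = mu_sys + z * sqrt(var_sys)

mtbf_with_confidence = 1 / lambda_upper  # MTBF met with about 99.7% confidence
mtbf_neutral = 1 / mu_sys                # MTBF from the mean failure rate

print(f"upper failure rate: {lambda_upper:.3e} faults/hour")
print(f"MTBF (99.7%): {mtbf_with_confidence:,.0f} hours")
print(f"neutral MTBF: {mtbf_neutral:,.0f} hours")
```

The gap between the two numbers is the point of the slide: quoting only the neutral estimate ignores the variation in the failure rate.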

Topic Two: Failure Mode Rate & Failure-In-Time


Pareto Chart for Failure Modes

Difficulties:

•  Static View

•  No Trend of Each Failure Mode

•  Fails to Reflect Product MTBF

[Figure: Pareto by Failure Mode From January to March. Quantity (0 to 14) and cumulative percentage (0% to 100%) for Relays, Resistors, No Fault Found, Cold Solder, Software Bug, and Op-Amp; bars split into No C/A, C/A In Process, and C/A Complete.]

[Figure: Pareto Chart by Failure Mode From April to June. Quantity (0 to 28) and cumulative percentage (0% to 100%) for Op-Amp, Resistors, Cold Solder, Relays, Software Bug, and No Fault Found; bars split into No C/A, C/A In Process, and C/A Complete.]

Note: C/A = corrective action

Failure Mode Rate (FMR)

$$\mathrm{FMR} = \frac{\text{failures for a type of FM}}{\text{field product installations}}$$


FMR Estimation: Example

$$\mathrm{FMR} = \frac{\text{failure quantity for a type of FM}}{\text{field product installations}}$$

For example: assume 120 PCBs were shipped and installed in the field in the first quarter, and 5 were returned as failures due to poor solder joints. The FMR for poor solder joints in the first quarter is then

$$\mathrm{FMR} = \frac{5}{120} = 0.042 \ \text{faults/board/quarter}$$
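The FMR calculation is a simple ratio; a minimal sketch with the solder-joint numbers from the example follows. The function name is an illustrative choice.

```python
def failure_mode_rate(failures, installations):
    """Failures of one failure mode per installed unit in the period."""
    if installations <= 0:
        raise ValueError("installations must be positive")
    return failures / installations


# 5 poor-solder-joint returns out of 120 PCBs installed in the first quarter.
fmr = failure_mode_rate(5, 120)
print(f"FMR = {fmr:.3f} faults/board/quarter")
```

Tracking this value per failure mode and per quarter gives the trend that a static Pareto chart cannot show.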

FMR Run Chart

[Figure: Failure Mode Rate (FMR) by Quarter. Failures per board (0.00 to 0.06) for relays, resistor, and op-amp from the 1st to the 4th quarter, with cumulative PCB shipment (0 to 400) on the secondary axis.]

Estimate MTBF using FMR Chart

| Quarter | 1st | 2nd | 3rd | 4th |
|---|---|---|---|---|
| Cumulative Shipment | 120 | 200 | 220 | 264 |
| Cum Run Hours | 262,080 | 436,800 | 480,480 | 576,576 |
| Cum FM Rate | 0.117 | 0.150 | 0.057 | 0.051 |
| Defective Boards | 14 | 30 | 12 | 13 |
| MTBF (hours) | 18,720 | 14,560 | 38,541 | 42,856 |

[Figure: 13 Weeks Rolling MTBF. MTBF in hours (0 to 45,000) and defective boards (failures, 0 to 100) from the 1st to the 4th quarter.]
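A hedged sketch of how the run hours and MTBF in the table relate, using only the first two quarters. It assumes each shipped board runs continuously for a 13-week quarter (13 × 7 × 24 = 2,184 hours), which reproduces the tabulated run hours but is an inference, not something the slides state.

```python
HOURS_PER_QUARTER = 13 * 7 * 24  # assumed continuous operation for 13 weeks


def quarterly_mtbf(cum_shipment, defective_boards):
    """Return (run hours, cumulative FM rate, MTBF) for one quarter."""
    run_hours = cum_shipment * HOURS_PER_QUARTER
    fm_rate = defective_boards / cum_shipment
    mtbf = run_hours / defective_boards
    return run_hours, fm_rate, mtbf


q1 = quarterly_mtbf(120, 14)
q2 = quarterly_mtbf(200, 30)
for name, (hours, rate, mtbf) in (("1st", q1), ("2nd", q2)):
    print(f"{name} Qtr: run hours={hours:,}, FM rate={rate:.3f}, MTBF={mtbf:,.0f} h")
```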

Estimate for PCB Failure Rate

$$\lambda_{sys} = \lambda_d + \lambda_s + \lambda_m + \lambda_p + \lambda_o + \sum_{i=1}^{k} n_i \lambda_i$$

$$\mathrm{FIT}_{sys} = \mathrm{FIT}_d + \mathrm{FIT}_s + \mathrm{FIT}_m + \mathrm{FIT}_p + \mathrm{FIT}_o + \sum_{i=1}^{k} \mathrm{FIT}_i$$

Notice: $\mathrm{FIT} = \lambda \times 10^{9}$

Where

$\lambda_d$ = failure rate of design errors
$\lambda_s$ = failure rate of software bugs
$\lambda_m$ = failure rate of manufacturing
$\lambda_p$ = failure rate of process
$\lambda_o$ = failure rate of other issues
$\lambda_i$ = failure rate for component type i
$k$ = total number of new component types
$n_i$ = quantity of component type i used in the product

FIT-Based Reliability Driven: Example (1)

| FM Category | Target MTBF (hrs) | Target FIT |
|---|---|---|
| Overall Product | 50,000 | 20,000 |
| Components (hardware) | 117,647 | 8,500 |
| Others (NFF) | 250,000 | 4,000 |
| Design | 333,333 | 3,000 |
| Manufacturing | 500,000 | 2,000 |
| Process | 666,667 | 1,500 |
| Software | 1,000,000 | 1,000 |

Notice: $\mathrm{FIT} = 10^{9} / \mathrm{MTBF}$
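The FIT allocation in the table can be checked with a few lines. This sketch simply applies FIT = 10^9 / MTBF to the tabulated category targets; the dictionary keys are the table's category names.

```python
def mtbf_from_fit(fit):
    """Convert failures per 10^9 hours (FIT) to MTBF in hours."""
    return 1e9 / fit


def fit_from_mtbf(mtbf_hours):
    """Convert MTBF in hours to FIT."""
    return 1e9 / mtbf_hours


category_target_fit = {
    "Components (hardware)": 8500,
    "Others (NFF)": 4000,
    "Design": 3000,
    "Manufacturing": 2000,
    "Process": 1500,
    "Software": 1000,
}

# Category FIT targets add up to the product target because failure rates add.
product_fit = sum(category_target_fit.values())
print(f"product FIT = {product_fit}, product MTBF = {mtbf_from_fit(product_fit):,.0f} h")
for name, fit in category_target_fit.items():
    print(f"{name}: {mtbf_from_fit(fit):,.0f} h")
```

Working in FIT rather than MTBF is convenient for budgeting because the category targets sum directly to the product target, whereas MTBF values do not.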

FIT-Based Reliability Driven: Example (2)

Product target FIT: PCB (20,000)

| Categorical FM FIT | Failure Mode | Target FIT | Current FIT | Ownership |
|---|---|---|---|---|
| Component (8,500) | Relay | 2,000 | 2,491 | Tom |
| | Op-Amp | 3,000 | 4,097 | Jones |
| | Resistor | 1,500 | 2,786 | Carlos |
| | DC-DC converter | 800 | 1,393 | Jesson |
| | ASICs | 1,200 | 1,716 | Jim |
| Design (3,000) | Eng Change Order | 1,300 | 2,383 | David |
| | FPGA Rev Upgrade | 900 | 1,643 | Kim |
| | Change relay type | 800 | 1,498 | John |
| Manufacturing (2,000) | Cold solder | 1,600 | 3,092 | Tony |
| | Backward component | 250 | 355 | Joe |
| | Fake component | 150 | 255 | Paul |
| Process (1,500) | Broken part | 700 | 942 | Jen |
| | Missing part | 300 | 447 | Chris |
| | OES | 500 | 515 | Andrew |
| Software (1,000) | Severe bugs | 200 | 398 | Eileen |
| | Medium bugs | 400 | 665 | Ed |
| | Trivial bugs | 400 | 497 | Eric |
| Others (4,000) | NFF | 3,000 | 457 | Mark |
| | PCFD | 1,000 | 1,669 | Jeff |

Topic Three: Reliability Growth Prediction & Corrective Action under Budget/Cost Constraints

Crow/AMSAA Growth Model

$$\hat{\beta} = \frac{N}{\sum_{i=1}^{N} \ln\left(\dfrac{t_s}{t_i}\right)}, \qquad \hat{\alpha} = \frac{N}{t_s^{\hat{\beta}}}$$

Failure Intensity:

$$\hat{\lambda}(t) = \hat{\alpha} \hat{\beta} \, t^{\hat{\beta} - 1}$$

Where $t_s$ = termination time and $t_i$ = ith failure arrival time.

Hypothesis Testing:

H0: β = 1, HPP

H1: β ≠ 1, NHPP

Reject H0 if

$$\hat{\beta} < \frac{2N}{\chi^2_{2N,\,1-\theta/2}} \quad \text{or} \quad \hat{\beta} > \frac{2N}{\chi^2_{2N,\,\theta/2}}$$

[Figure: Various Failure Intensity Models. Failure intensity (0 to 6) versus time (0 to 5) for beta = 1, beta = 0.5, and beta = 1.5; a note on the plot states that one parameter, whose symbol was lost in extraction, equals 1 for all curves.]

An Example

N = 10

| Cumulative Failures | Failure Arrival Time (hours) | Interarrival Time (hours) | ln(ts/ti) |
|---|---|---|---|
| 1 | 67 | 67 | 3.23 |
| 2 | 150 | 83 | 2.43 |
| 3 | 234 | 84 | 1.98 |
| 4 | 360 | 126 | 1.55 |
| 5 | 533 | 173 | 1.16 |
| 6 | 720 | 187 | 0.86 |
| 7 | 912 | 192 | 0.62 |
| 8 | 1102 | 190 | 0.43 |
| 9 | 1345 | 243 | 0.23 |
| 10 | 1632 | 287 | 0.04 |
| ts | 1700 | sum | 12.55 |

$$\hat{\beta} = \frac{N}{\sum_{i=1}^{N} \ln\left(\dfrac{t_s}{t_i}\right)} = 0.797, \qquad \hat{\alpha} = \frac{N}{t_s^{\hat{\beta}}} = 0.0266$$
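The estimates in this example can be reproduced with a short script. This is a minimal sketch of the time-terminated Crow/AMSAA maximum-likelihood estimators applied to the ten failure arrival times in the table; the function names are illustrative.

```python
from math import log


def crow_amsaa_mle(failure_times, t_end):
    """Time-terminated MLE of (alpha, beta) for the Crow/AMSAA model."""
    n = len(failure_times)
    beta = n / sum(log(t_end / t) for t in failure_times)
    alpha = n / t_end**beta
    return alpha, beta


def failure_intensity(t, alpha, beta):
    """Estimated failure intensity alpha * beta * t^(beta - 1)."""
    return alpha * beta * t ** (beta - 1)


arrival_times = [67, 150, 234, 360, 533, 720, 912, 1102, 1345, 1632]  # hours
t_s = 1700  # termination time in hours

alpha_hat, beta_hat = crow_amsaa_mle(arrival_times, t_s)
print(f"beta = {beta_hat:.3f}, alpha = {alpha_hat:.4f}")
print(f"intensity at t_s = {failure_intensity(t_s, alpha_hat, beta_hat):.5f} failures/hour")
```

A beta estimate below 1 indicates a decreasing failure intensity, which is consistent with the lengthening interarrival times in the table.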

Failure Modes (FM) Pareto Chart

Cumulative operating time is 4800 hours and the total number of failures is 14. Current MTBF = 4800/14 = 343 hours.

Which FM should be fixed, given a limited budget?

Given a $10 budget for corrective actions:

Option one: fix relays. MTBF = 4800/(14 − 2.5) = 417 hours

Option two: fix all others. MTBF = 4800/(14 − 9) = 960 hours


New Reliability Growth Model

1.  Failure mode based growth prediction

2.  Reliability growth subject to CA budget constraints

3.  No assumption of parametric models

4.  CA effectiveness function

Why Need the CA Effectiveness Function?

Limited resources ($) spent on CA due to:

1.  Retrofit
2.  ECO

Maximize reliability growth

CA effectiveness function

An Example: ECO or Retrofit

A type of relay used on a PCB module fails constantly due to a known failure mechanism. Two options are available for corrective action:

1.  Replace all on-board relays upon the failure return of the module
2.  Proactively recall all modules and replace the relays with a new type having much higher reliability

| CA Option | Cost ($) | CA Effectiveness |
|---|---|---|
| ECO | Low | Low |
| Retrofit | High | High |

Modeling CA Effectiveness

Effectiveness Model

$$h(x) = \left(\frac{x}{c}\right)^{b}$$

b and c to be determined

[Figure: effectiveness h(x), from 0 to 1, versus CA budget x ($), from 0 to c, with curves for b > 1, b = 1, and b < 1.]

$$\text{Effectiveness} = \frac{\text{failure rate before CA} - \text{failure rate after CA}}{\text{failure rate before CA}}$$

An Example

The current failure rate of a type of relay is $2 \times 10^{-8}$ faults per hour. Upon the implementation of CA, the rate is reduced to $5 \times 10^{-9}$. The CA effectiveness is therefore 0.75, that is

$$\frac{2 \times 10^{-8} - 5 \times 10^{-9}}{2 \times 10^{-8}} = 0.75$$
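The effectiveness calculation and the power-law model h(x) = (x/c)^b can be sketched as follows. The relay rates come from the example; the values of b and c in the last lines are purely hypothetical, and c is read here as the budget at which effectiveness reaches 1, which is an interpretation of the plot rather than a stated definition.

```python
def ca_effectiveness(rate_before, rate_after):
    """Fractional reduction in failure rate achieved by a corrective action."""
    return (rate_before - rate_after) / rate_before


def h(x, b, c):
    """Effectiveness model h(x) = (x / c)^b for a CA budget 0 <= x <= c."""
    if not 0 <= x <= c:
        raise ValueError("budget x must lie between 0 and c")
    return (x / c) ** b


# Relay example: 2e-8 faults/hour before CA, 5e-9 after.
effectiveness = ca_effectiveness(2e-8, 5e-9)
print(f"effectiveness = {effectiveness:.2f}")

# Hypothetical parameters: full effectiveness at c = $300.
for b in (0.5, 1.0, 2.0):
    print(f"b={b}: h(150) = {h(150, b, 300):.3f}")
```

With b < 1 most of the benefit arrives at small budgets, while b > 1 means a corrective action pays off only when it is funded close to c.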

Incorporate h(x) into System Failure Rate

$$\lambda_s(t) = \underbrace{\sum_{i=1}^{k} n_i \lambda_i(t)}_{\text{HW}} + \underbrace{\sum_{i=k+1}^{m} \lambda_i(t)}_{\text{Non-HW}}$$

$$h(x) = \left(\frac{x}{c}\right)^{b}$$

$$\lambda_{s,CA}(t) = \underbrace{\sum_{i=1}^{k} n_i \bigl(1 - h_i(x_i)\bigr) \lambda_i(t)}_{\text{HW}} + \underbrace{\sum_{i=k+1}^{m} \bigl(1 - h_i(x_i)\bigr) \lambda_i(t)}_{\text{Non-HW}}$$
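A small sketch of the post-CA system failure rate, evaluated at a single point in time. All rates, quantities, and effectiveness values below are hypothetical and serve only to show how the hardware and non-hardware sums combine.

```python
def system_rate_after_ca(hw_modes, non_hw_modes):
    """System failure rate after corrective actions.

    hw_modes: list of (n_i, lambda_i, h_i) for hardware component types.
    non_hw_modes: list of (lambda_i, h_i) for non-hardware failure modes.
    h_i is the CA effectiveness h_i(x_i) already evaluated at budget x_i.
    """
    hw_rate = sum(n * (1 - h) * lam for n, lam, h in hw_modes)
    non_hw_rate = sum((1 - h) * lam for lam, h in non_hw_modes)
    return hw_rate + non_hw_rate


# Hypothetical failure modes (not taken from the slides).
hw_modes = [(4, 2e-8, 0.75), (10, 1e-8, 0.0)]
non_hw_modes = [(5e-8, 0.5)]

rate_after = system_rate_after_ca(hw_modes, non_hw_modes)
rate_before = system_rate_after_ca(
    [(n, lam, 0.0) for n, lam, _ in hw_modes],
    [(lam, 0.0) for lam, _ in non_hw_modes],
)
print(f"before CA: {rate_before:.3e}, after CA: {rate_after:.3e} faults/hour")
```

Setting every h_i to zero recovers the rate without corrective action, which makes the function convenient for comparing budget allocations.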

Making The Prediction via MS Excel (I)

| Cum Failures by FM | Failure Mode | Wk 1 | Wk 2 | Wk 3 | Wk 4 | Wk 5 | Wk 6 | Wk 7 | Wk 8 |
|---|---|---|---|---|---|---|---|---|---|
| | Cum Operating Hours | 1680 | 3360 | 5040 | 6720 | 8400 | 10080 | 11760 | 13440 |
| 7 | Relay | 2 | 0 | 1 | 0 | 3 | 0 | 1 | 0 |
| 6 | Resistors | 1 | 1 | 0 | 1 | 0 | 2 | 0 | 1 |
| 4 | Op-amp | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
| 5 | Capacitor | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 2 |
| 2 | Design error | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 4 | Software bugs | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
| 6 | Cold solder | 0 | 2 | 0 | 1 | 0 | 0 | 1 | 2 |
| 2 | Bad process | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | NFF | 1 | 2 | 0 | 0 | 0 | 1 | 0 | 0 |
| | Latent failure mode | | | | | | | | |
| | Weekly failures | 6 | 5 | 3 | 4 | 5 | 5 | 5 | 7 |
| | Actual MTBF | 280 | 305 | 360 | 373 | 365 | 360 | 356 | 336 |
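The "Actual MTBF" row can be reproduced from the weekly failure counts as cumulative operating hours divided by cumulative failures. The sketch below is a Python stand-in for the spreadsheet arithmetic, using the eight weeks of data in the table; the 1,680 hours per week is taken from the cumulative operating hours row.

```python
from itertools import accumulate

HOURS_PER_WEEK = 1680  # cumulative operating hours grow by 1680 each week

weekly_failures = {
    "relay": [2, 0, 1, 0, 3, 0, 1, 0],
    "resistors": [1, 1, 0, 1, 0, 2, 0, 1],
    "op-amp": [0, 0, 0, 1, 0, 1, 1, 1],
    "capacitor": [1, 0, 0, 1, 1, 0, 0, 2],
    "design error": [0, 0, 1, 0, 0, 0, 1, 0],
    "software bugs": [1, 0, 0, 0, 1, 1, 1, 0],
    "cold solder": [0, 2, 0, 1, 0, 0, 1, 2],
    "bad process": [0, 0, 1, 0, 0, 0, 0, 1],
    "NFF": [1, 2, 0, 0, 0, 1, 0, 0],
}

# Total failures per week across all failure modes.
weekly_totals = [sum(week) for week in zip(*weekly_failures.values())]
cum_failures = list(accumulate(weekly_totals))
cum_hours = [HOURS_PER_WEEK * (week + 1) for week in range(len(weekly_totals))]

# Cumulative MTBF at the end of each week, rounded to whole hours.
actual_mtbf = [round(h / f) for h, f in zip(cum_hours, cum_failures)]
print(weekly_totals)
print(actual_mtbf)
```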

Making The Prediction via MS Excel (II)

| Required Budget ($) | Cost to Fix FM ($) | Target FM % | Cum Failures by FM | Failure Mode | Wk 1 | Wk 8 | Wk 9 | Wk 10 | Wk 11 | Wk 12 | Wk 13 | Wk 14 | Wk 15 | Wk 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | Cum Operating Hours | 1680 | 13440 | 15120 | 16800 | 18480 | 20160 | 21840 | 23520 | 25200 | 26880 |
| 150 | 300 | 50% | 7 | Relay | 2 | 0 | 1.0 | 0 | 0.5 | 0 | 1.5 | 0 | 0.5 | 0 |
| 500 | 500 | 0% | 6 | Resistors | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 100 | 200 | 50% | 4 | Op-amp | 0 | 1 | 0 | 0 | 0 | 0.5 | 0 | 0.5 | 0.5 | 0.5 |
| 350 | 350 | 0% | 5 | Capacitor | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 700 | 700 | 0% | 2 | Design error | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 125 | 250 | 50% | 4 | Software bugs | 1 | 0 | 0.5 | 0 | 0 | 0 | 0.5 | 0.5 | 0.5 | 0 |
| 100 | 100 | 0% | 6 | Cold solder | 0 | 2 | 0 | 0.0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 |
| 0 | 50 | 100% | 2 | Bad process | 0 | 1 | 0 | 0 | 1.0 | 0 | 0 | 0 | 0 | 1.0 |
| 225 | 450 | 50% | 4 | NFF | 1 | 0 | 0.5 | 1.0 | 0 | 0 | 0 | 0.5 | 0 | 0 |
| | | | | Latent failure mode | | | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2 | 0.2 |
| | | | | Weekly failures | 6 | 7 | 2 | 1 | 2 | 1 | 2 | 2 | 2 | 2 |
| | | | | Actual MTBF | 280 | 336 | | | | | | | | |
| 2250 (total) | | | | Predicted MTBF | | 336 | 318 | 348 | 372 | 404 | 420 | 439 | 457 | 473 |


Reliability Growth Planning Process

[Diagram: the system manufacturer, repair center, in-service systems, stocks, and retrofit team are connected by a Retrofit Loop and an ECO Loop (1. failure analysis; 2. CA decisions; 3. reliability prediction), supported by FRACAS.]

ECO = Engineering Change Order, CA = Corrective Actions

Conclusions

1.  Design for reliability (DFR) should incorporate hardware and non-hardware issues along with the variation of the failure rates.

2.  A trade-off should be made between reliability growth and the availability of CA resources.

3.  The CA effectiveness function links the CA budget with the expected failure mode reduction rate.

4.  A reliability database system such as FRACAS is essential for performing RGP.


Thanks! Questions/Comments?