Statistical Tools for Multivariate Six Sigma


  • Statistical Tools for Multivariate Six Sigma. Dr. Neil W. Polhemus, CTO & Director of Development, StatPoint, Inc.

  • The Challenge: The quality of an item or service usually depends on more than one characteristic.

    When the characteristics are not independent, considering each characteristic separately can give a misleading estimate of overall performance.

  • The Solution: Proper analysis of data from such processes requires the use of multivariate statistical techniques.

  • Outline

    Multivariate SPC: multivariate control charts; multivariate capability analysis

    Data exploration and modeling: principal components analysis (PCA); partial least squares (PLS); neural network classifiers

    Design of experiments (DOE): multivariate optimization

  • Example #1: Textile fiber

    Characteristic #1: tensile strength - 115 ± 1

    Characteristic #2: diameter - 1.05 ± 0.05

  • Sample Data: n = 100

  • Individuals Chart - strength

  • Individuals Chart - diameter

  • Capability Analysis - strength

  • Capability Analysis - diameter

  • Scatterplot

  • Multivariate Normal Distribution

  • Control Ellipse

  • Multivariate Capability: Determines the joint probability of being within the specification limits on all characteristics.

    Variable    Observed Beyond Spec.    Estimated Beyond Spec.    Estimated DPM
    strength    0.0%                     0.00307572%               30.7572
    diameter    0.0%                     0.00445939%               44.5939
    Joint       0.0%                     0.00703461%               70.3461
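The joint estimate in the table can be approximated numerically once a bivariate normal distribution has been fitted. The sketch below is one possible way to do this, not the presenter's software: it integrates over the first characteristic and uses the conditional normal distribution of the second. The means, standard deviations, and correlation are made-up illustrative values, since the fitted values are not given in the slides; the specification limits are those of Example #1 read as 115 ± 1 and 1.05 ± 0.05.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def joint_in_spec_prob(mu, sd, rho, lower, upper, steps=4000):
    """Probability that both characteristics of a bivariate normal fall
    inside their spec limits (requires |rho| < 1).

    Integrates the density of X1 over [lower[0], upper[0]] with the midpoint
    rule, weighting each slice by P(lower[1] <= X2 <= upper[1] | X1).
    """
    (m1, m2), (s1, s2) = mu, sd
    cond_sd = s2 * math.sqrt(1.0 - rho ** 2)
    h = (upper[0] - lower[0]) / steps
    total = 0.0
    for i in range(steps):
        x1 = lower[0] + (i + 0.5) * h
        z1 = (x1 - m1) / s1
        density = math.exp(-0.5 * z1 * z1) / (s1 * math.sqrt(2.0 * math.pi))
        cond_mean = m2 + rho * s2 * z1
        p2 = (norm_cdf((upper[1] - cond_mean) / cond_sd)
              - norm_cdf((lower[1] - cond_mean) / cond_sd))
        total += density * p2 * h
    return total


def dpm(prob_in_spec):
    """Defects per million implied by an in-spec probability."""
    return (1.0 - prob_in_spec) * 1e6


# ASSUMED (hypothetical) fitted parameters; not taken from the slides.
mu = (115.0, 1.05)
sd = (0.25, 0.0125)
rho = 0.6
lower, upper = (114.0, 1.00), (116.0, 1.10)

p_joint = joint_in_spec_prob(mu, sd, rho, lower, upper)
print(f"joint DPM: {dpm(p_joint):.2f}")
```

The DPM column is the estimated percentage beyond spec multiplied by 10,000, so 0.00703461% corresponds to 70.3461 DPM.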

  • Multivariate Capability

  • Capability Ellipse

  • Mult. Capability Indices: Defined to give the same DPM as in the univariate case.

    Capability Indices

    Index    Estimate
    MCP      1.27
    MCR      78.80
    DPM      70.3461
    Z        3.80696
    SQL      5.30696
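The tabulated indices appear to follow from the joint DPM alone. The relations used below are inferred from the numbers in the table rather than stated on the slide, so treat them as assumptions: Z is the standard normal quantile that leaves DPM/1,000,000 in the upper tail, MCP = Z/3, MCR = 100/MCP, and SQL = Z + 1.5.

```python
from statistics import NormalDist


def indices_from_dpm(dpm):
    """Capability indices implied by a joint DPM.

    ASSUMED relations (inferred from the table, not given in the slides):
    Z = upper-tail normal quantile, MCP = Z / 3, MCR = 100 / MCP,
    SQL = Z + 1.5.
    """
    z = NormalDist().inv_cdf(1.0 - dpm / 1e6)
    mcp = z / 3.0
    return {"Z": z, "MCP": mcp, "MCR": 100.0 / mcp, "SQL": z + 1.5}


result = indices_from_dpm(70.3461)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```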

  • Test for Normality

    Shapiro-Wilk P-Values

    Variable    P-Value
    strength    0.408004
    diameter    0.615164

  • More than 2 Characteristics: Calculate T-squared for each observation $x_i$:

    $$T_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x})$$

    where

    S = sample covariance matrix

    $\bar{x}$ = vector of sample means
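A minimal sketch of the T-squared calculation for individual observations, using the sample covariance matrix S and the vector of sample means as defined above. It uses only the standard library; the helper names and the six three-variable observations are made up for illustration.

```python
def mean_vector(rows):
    """Vector of sample means, one entry per variable."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance_matrix(rows):
    """Sample covariance matrix S (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    m = mean_vector(rows)
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * y[c] for c in range(i + 1, n))
        y[i] = (aug[i][n] - tail) / aug[i][i]
    return y


def t_squared(rows):
    """T-squared of every observation: d' S^-1 d with d = x - mean."""
    m = mean_vector(rows)
    s = covariance_matrix(rows)
    values = []
    for r in rows:
        d = [r[j] - m[j] for j in range(len(m))]
        y = solve(s, d)  # y = S^-1 d without forming the inverse
        values.append(sum(di * yi for di, yi in zip(d, y)))
    return values


# Hypothetical observations on three characteristics (made-up values).
data = [
    (115.1, 1.05, 20.2),
    (114.8, 1.04, 19.8),
    (115.3, 1.07, 20.5),
    (114.9, 1.05, 20.1),
    (115.0, 1.03, 19.9),
    (115.4, 1.06, 20.0),
]
for i, value in enumerate(t_squared(data), start=1):
    print(f"observation {i}: T-squared = {value:.3f}")
```

A useful sanity check is that, when S is computed from the same n observations, the T-squared values sum to (n - 1) times the number of variables.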

  • T-Squared Chart

  • T-Squared Decomposition: For each variable, subtracts from T-squared the value that T-squared would have if that variable were removed.

    Large values indicate that a variable has an important contribution.

    T-Squared Decomposition: Relative Contribution to T-Squared Signal

    Observation    T-Squared    diameter    strength
    17             26.3659      22.9655     25.951
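For two variables the decomposition has a simple closed form, sketched below as an illustration: with standardized deviations z1 and z2 and sample correlation rho, T-squared is (z1² - 2·rho·z1·z2 + z2²)/(1 - rho²), and removing one variable leaves the squared standardized deviation of the other. The function name and the data are hypothetical; the slide's observation 17 cannot be reproduced without the original data.

```python
import math


def decompose_t2(x1, x2, index):
    """T-squared of observation `index` for two variables, plus each
    variable's contribution (full T-squared minus T-squared without it)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)
    rho = s12 / math.sqrt(s11 * s22)
    z1 = (x1[index] - m1) / math.sqrt(s11)
    z2 = (x2[index] - m2) / math.sqrt(s22)
    t2 = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return {
        "t2": t2,
        "first": t2 - z2 * z2,   # drop in T-squared when x1 is removed
        "second": t2 - z1 * z1,  # drop in T-squared when x2 is removed
    }


# Hypothetical strength and diameter readings (made-up values).
strength = [115.1, 114.8, 115.3, 114.9, 115.0, 115.4, 114.7, 115.2]
diameter = [1.05, 1.04, 1.07, 1.05, 1.03, 1.06, 1.04, 1.08]

parts = decompose_t2(strength, diameter, index=5)
print({name: round(value, 3) for name, value in parts.items()})
```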

  • Control Ellipsoid

  • Multivariate EWMA Chart

  • Generalized Variance Chart: Plots the determinant of the variance-covariance matrix for data that are sampled in subgroups.
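The plotted statistic can be sketched as follows for the two-variable case, where the determinant of the subgroup covariance matrix is s11·s22 - s12². Control limits are not covered here, and the subgroups are made-up values for illustration.

```python
def generalized_variance(subgroup):
    """Determinant of the 2x2 sample covariance matrix of one subgroup,
    given as a list of (x1, x2) pairs."""
    n = len(subgroup)
    m1 = sum(a for a, _ in subgroup) / n
    m2 = sum(b for _, b in subgroup) / n
    s11 = sum((a - m1) ** 2 for a, _ in subgroup) / (n - 1)
    s22 = sum((b - m2) ** 2 for _, b in subgroup) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in subgroup) / (n - 1)
    return s11 * s22 - s12 ** 2


# Hypothetical subgroups of (strength, diameter) readings.
subgroups = [
    [(115.1, 1.05), (114.8, 1.04), (115.3, 1.07), (114.9, 1.05)],
    [(115.0, 1.03), (115.4, 1.06), (114.7, 1.04), (115.2, 1.08)],
]
for number, group in enumerate(subgroups, start=1):
    print(f"subgroup {number}: |S| = {generalized_variance(group):.3e}")
```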

  • Data Exploration and Modeling: When the number of variables is large, the dimensionality of the problem often makes it difficult to determine the underlying relationships.

    Reduction of dimensionality can be very helpful.

  • Example #2

  • Matrix Plot

  • Analysis Methods

    Predicting certain characteristics based on others (regression and ANOVA)

    Separating items into groups (classification)

    Detecting unusual items

  • Multiple Regression

    MPG City = 29.6315 + 0.28816*Engine Size - 0.00688362*Horsepower - 0.297446*Passengers - 0.0365723*Length + 0.280224*Wheelbase + 0.111526*Width - 0.139763*U Turn Space - 0.00984486*Weight

    Parameter       Estimate       Standard Error    T Statistic    P-Value
    CONSTANT        29.6315        12.9763           2.28351        0.0249
    Engine Size     0.28816        0.722918          0.398607       0.6912
    Horsepower      -0.00688362    0.0134153         -0.513119      0.6092
    Passengers      -0.297446      0.54754           -0.543241      0.5884
    Length          -0.0365723     0.0447211         -0.817786      0.4158
    Wheelbase       0.280224       0.124837          2.24472        0.0274
    Width           0.111526       0.218893          0.5095         0.6117
    U Turn Space    -0.139763      0.17926           -0.779668      0.4378
    Weight          -0.00984486    0.00192619        -5.11104       0.0000

    R-squared = 73.544 percent

    R-squared (adjusted for d.f.) = 71.0244 percent

    Standard Error of Est. = 3.02509

    Mean absolute error = 1.99256
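Two of the reported quantities can be checked by hand. The sketch below recomputes a T statistic as estimate divided by standard error, and the adjusted R-squared from the unadjusted value. The sample size n = 93 is an assumption: it is the number of cases reported in the classifier output later in the deck, and it happens to reproduce the adjusted figure.

```python
def t_statistic(estimate, std_error):
    """T statistic for testing that a coefficient equals zero."""
    return estimate / std_error


def adjusted_r_squared(r2_percent, n, n_predictors):
    """Adjusted R-squared (percent) from R-squared (percent)."""
    r2 = r2_percent / 100.0
    adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return 100.0 * adjusted


# Weight coefficient from the table above.
t_weight = t_statistic(-0.00984486, 0.00192619)
# n = 93 is ASSUMED (taken from the classifier output, not the regression).
adj = adjusted_r_squared(73.544, n=93, n_predictors=8)

print(f"T statistic for Weight: {t_weight:.4f}")
print(f"Adjusted R-squared: {adj:.4f} percent")
```

Only Weight, Wheelbase, and the constant have P-values below 0.05, which is typical when predictors are strongly correlated with one another.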

  • Principal Components: The goal of a principal components analysis (PCA) is to construct k linear combinations of the p variables X that contain the greatest variance:

    $$C_j = a_{j1} X_1 + a_{j2} X_2 + \cdots + a_{jp} X_p, \qquad j = 1, \ldots, k$$
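A minimal sketch of how the first component can be obtained: standardize the variables, form their correlation matrix, and find the eigenvector with the largest eigenvalue. Power iteration is used here only to keep the example dependency-free; statistical packages normally compute a full eigendecomposition. The data values are made up.

```python
import math


def correlation_matrix(rows):
    """Sample correlation matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_component(matrix, iterations=500):
    """Largest eigenvalue and its unit-length weight vector, found by
    power iteration on a symmetric matrix."""
    p = len(matrix)
    # Uneven start vector, so it is unlikely to be orthogonal to the answer.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
    eigenvalue = sum(v[a] * mv[a] for a in range(p))
    return eigenvalue, v


# Hypothetical (engine size, horsepower, weight) rows; made-up values.
cars = [
    (1.8, 140, 2705), (3.2, 200, 3560), (2.5, 160, 3100),
    (3.8, 190, 3600), (2.2, 110, 2900), (4.6, 220, 3900),
]
eigenvalue, weights = leading_component(correlation_matrix(cars))
share = 100.0 * eigenvalue / len(weights)
print(f"first eigenvalue: {eigenvalue:.4f} ({share:.1f}% of variance)")
print("weights:", [round(w, 4) for w in weights])
```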

  • Scree Plot: Shows the number of significant components.

  • Percentage Explained

    Principal Components Analysis

    Component Number    Eigenvalue    Percent of Variance    Cumulative Percentage
    1                   5.8263        72.829                 72.829
    2                   1.09626       13.703                 86.532
    3                   0.339796      4.247                  90.779
    4                   0.270321      3.379                  94.158
    5                   0.179286      2.241                  96.400
    6                   0.12342       1.543                  97.942
    7                   0.109412      1.368                  99.310
    8                   0.0552072     0.690                  100.000
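The percentage columns follow directly from the eigenvalues: each percentage is the eigenvalue divided by the sum of all eigenvalues. A short check using the tabulated values (the function name is illustrative):

```python
def percent_explained(eigenvalues):
    """Return (number, percent, cumulative percent) for each eigenvalue."""
    total = sum(eigenvalues)
    rows, cumulative = [], 0.0
    for number, value in enumerate(eigenvalues, start=1):
        percent = 100.0 * value / total
        cumulative += percent
        rows.append((number, percent, cumulative))
    return rows


# Eigenvalues copied from the table above.
eigenvalues = [5.8263, 1.09626, 0.339796, 0.270321,
               0.179286, 0.12342, 0.109412, 0.0552072]

for number, percent, cumulative in percent_explained(eigenvalues):
    print(f"{number}: {percent:7.3f}  {cumulative:8.3f}")
```

The eigenvalues sum to 8, the number of variables, which is what one expects when the analysis is run on standardized variables.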

  • Components

    Table of Component Weights

    Variable        Component 1    Component 2
    Engine Size     0.376856       -0.205144
    Horsepower      0.292144       -0.592729
    Passengers      0.239193       0.730749
    Length          0.369908       0.0429221
    Wheelbase       0.374826       0.259648
    Width           0.38949        -0.0422083
    U Turn Space    0.359702       -0.0256716
    Weight          0.396236       -0.0298902

    First component

    0.376856*Engine Size + 0.292144*Horsepower + 0.239193*Passengers + 0.369908*Length

    + 0.374826*Wheelbase + 0.38949*Width + 0.359702*U Turn Space + 0.396236*Weight

    Second component

    -0.205144*Engine Size - 0.592729*Horsepower + 0.730749*Passengers + 0.0429221*Length

    + 0.259648*Wheelbase - 0.0422083*Width - 0.0256716*U Turn Space - 0.0298902*Weight
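A component score for one car is the weighted sum written out above. The sketch below assumes the weights apply to standardized variables (value minus mean, divided by standard deviation), which is consistent with the eigenvalues summing to 8 but is not stated on the slide. It also checks that the two weight vectors have unit length and are orthogonal. The standardized values for the example car are hypothetical.

```python
# Component weights copied from the table of component weights.
WEIGHTS_1 = {
    "Engine Size": 0.376856, "Horsepower": 0.292144, "Passengers": 0.239193,
    "Length": 0.369908, "Wheelbase": 0.374826, "Width": 0.38949,
    "U Turn Space": 0.359702, "Weight": 0.396236,
}
WEIGHTS_2 = {
    "Engine Size": -0.205144, "Horsepower": -0.592729, "Passengers": 0.730749,
    "Length": 0.0429221, "Wheelbase": 0.259648, "Width": -0.0422083,
    "U Turn Space": -0.0256716, "Weight": -0.0298902,
}


def dot(u, v):
    """Dot product of two vectors stored as dicts with the same keys."""
    return sum(u[name] * v[name] for name in u)


def component_score(standardized, weights):
    """Score of one observation: weighted sum of standardized values."""
    return dot(weights, standardized)


print(f"|w1|^2 = {dot(WEIGHTS_1, WEIGHTS_1):.5f}")
print(f"|w2|^2 = {dot(WEIGHTS_2, WEIGHTS_2):.5f}")
print(f"w1 . w2 = {dot(WEIGHTS_1, WEIGHTS_2):.5f}")

# Hypothetical car, already standardized (made-up values).
car = {
    "Engine Size": 1.2, "Horsepower": 0.8, "Passengers": 0.5,
    "Length": 1.0, "Wheelbase": 0.9, "Width": 1.1,
    "U Turn Space": 0.7, "Weight": 1.3,
}
print(f"component 1 score: {component_score(car, WEIGHTS_1):.3f}")
print(f"component 2 score: {component_score(car, WEIGHTS_2):.3f}")
```

All first-component weights are positive and similar in size, while the second contrasts Passengers with Horsepower, which matches the names "size" and "unsportiness" used in the regression that follows.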

  • Interpretation

  • Principal Component Regression

    MPG City = 22.3656 - 1.84685*size + 0.567176*unsportiness

    Parameter       Estimate    Standard Error    T Statistic    P-Value
    CONSTANT        22.3656     0.353316          63.302         0.0000
    size            -1.84685    0.147168          -12.5492       0.0000
    unsportiness    0.567176    0.339277          1.67172        0.0981

    R-squared = 64.0399 percent

    R-squared (adjusted for d.f.) = 63.2408 percent

    Standard Error of Est. = 3.40726

    Mean absolute error = 2.26553

  • Partial Least Squares (PLS): Similar to PCA, except that it finds components that explain as much of the variance as possible in both the Xs and the Ys.

    May be used with many X variables, even when their number exceeds n.
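To make the idea concrete, here is a sketch of the first component of a single-response PLS fit, under stated assumptions: the data are centered but not rescaled, the weight vector is taken proportional to X'y (the direction of maximum covariance with y), and only one component is extracted. Real implementations extract further components after deflation and usually standardize the variables. All names and data are illustrative.

```python
import math


def pls1_first_component(x_rows, y):
    """First PLS component for one response.

    Returns (weights, slope, predict). Fails if X'y is exactly zero.
    """
    n, p = len(x_rows), len(x_rows[0])
    x_mean = [sum(r[j] for r in x_rows) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[r[j] - x_mean[j] for j in range(p)] for r in x_rows]
    yc = [value - y_mean for value in y]

    # Weight vector: direction in X space with maximum covariance with y.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(value * value for value in w))
    w = [value / norm for value in w]

    # Scores on that direction, then regress y on the scores.
    t = [sum(xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    slope = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        score = sum((row[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + slope * score

    return w, slope, predict


# Made-up data with more X variables (4) than observations (3).
x = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0], [2.0, 1.0, 0.0, 0.0]]
y = [1.0, 2.0, 3.0]
weights, slope, predict = pls1_first_component(x, y)
print("weights:", [round(value, 4) for value in weights])
print("prediction for first row:", round(predict(x[0]), 4))
```

Because only a single direction is estimated, the fit is still defined when p exceeds n, which ordinary least squares cannot handle.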

  • Component Extraction: Starts with the number of components equal to the minimum of p and (n-1).

  • Coefficient Plot

  • Model in Original Units

    MPG City = 50.0593 - 0.214083*Engine Size - 0.0347708*Horsepower

    - 0.884181*Passengers + 0.0294622*Length - 0.0362471*Wheelbase

    - 0.0882233*Width - 0.0282326*U Turn Space - 0.00391616*Weight

  • Classification: Principal components can also be used to classify new observations.

    A useful method for classification is a Bayesian classifier, which can be expressed as a neural network.

  • 6 Types of Automobiles

  • Neural Networks

  • Bayesian Classifier

    Begins with prior probabilities for membership in each group

    Uses a Parzen-like density estimator of the density function for each group

  • Options: The prior probabilities may be determined in several ways. A training set is usually used to find a good value for σ (the spacing parameter).
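The classifier described above can be sketched as follows. This is an illustration, not the presenter's implementation: a Gaussian kernel is assumed for the Parzen-like density estimate, each group needs at least two training points for the leave-one-out step, and the two-group data (points in the space of the first two component scores) are made up.

```python
import math


def group_scores(x, training, priors, sigma):
    """Prior times a Gaussian-kernel density estimate for each group
    (up to a constant shared by all groups)."""
    scores = {}
    for group, points in training.items():
        kernel_sum = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, pt))
                     / (2.0 * sigma ** 2))
            for pt in points
        )
        scores[group] = priors[group] * kernel_sum / len(points)
    return scores


def classify(x, training, priors, sigma):
    """Assign x to the group with the largest score."""
    scores = group_scores(x, training, priors, sigma)
    return max(scores, key=scores.get)


def leave_one_out_accuracy(training, priors, sigma):
    """Fraction of training points classified correctly when each point
    is held out in turn (a jackknife-style check for choosing sigma)."""
    correct = total = 0
    for group, points in training.items():
        for i, pt in enumerate(points):
            reduced = dict(training)
            reduced[group] = points[:i] + points[i + 1:]
            correct += classify(pt, reduced, priors, sigma) == group
            total += 1
    return correct / total


# Hypothetical training data: (component 1, component 2) scores.
training = {
    "Small": [(-1.0, 0.2), (-1.2, -0.1), (-0.8, 0.1)],
    "Large": [(1.0, 0.0), (1.3, 0.2), (0.9, -0.2)],
}
priors = {"Small": 0.5, "Large": 0.5}  # equal priors, one possible choice

for sigma in (0.1, 0.5, 2.0):
    accuracy = leave_one_out_accuracy(training, priors, sigma)
    print(f"sigma = {sigma}: leave-one-out accuracy = {accuracy:.2f}")
print(classify((-0.9, 0.0), training, priors, sigma=0.5))
```

A small σ makes the density estimate spiky and sensitive to individual training points; a large σ smooths the groups together, so σ is usually tuned on held-out cases as in the loop above.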

  • Output

    Number of cases in training set: 93

    Number of cases in validation set: 0

    Spacing parameter used: 0.0109375 (optimized by jackknifing during training)

    Training Set

    Type       Members    Percent Correctly Classified
    Compact    16         75.0
    Large      11         100.0
    Midsize    22         77.2727
    Small      21         76.1905
    Sporty     14         85.7143
    Van        9          100.0
    Total      93         82.7957
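The total row is the member-weighted average of the per-type rates, which can be verified from the table (the function name is illustrative):

```python
# (members, percent correctly classified) copied from the table above.
results = {
    "Compact": (16, 75.0),
    "Large": (11, 100.0),
    "Midsize": (22, 77.2727),
    "Small": (21, 76.1905),
    "Sporty": (14, 85.7143),
    "Van": (9, 100.0),
}


def overall_percent_correct(table):
    """Total members and overall percent correct across all types."""
    total = sum(members for members, _ in table.values())
    correct = sum(members * pct / 100.0 for members, pct in table.values())
    return total, 100.0 * correct / total


total, overall = overall_percent_correct(results)
print(f"{total} cases, {overall:.4f}% correctly classified")
```

This corresponds to 77 of the 93 training cases being classified correctly.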

  • Classification Regions

  • Changing Sigma

  • Overlay Plot

  • Outlier Detection

  • Cluster Analysis

  • Design of Experiments: When more than one characteristic is important, finding the optimal operating conditions usually requires a tradeoff of one characteristic for another.

    One approach to finding a single solution is to use desirability functions.
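As an illustration of the desirability approach, the sketch below uses one common form (the Derringer-Suich type), in which each response is mapped to a value between 0 and 1 and the overall desirability is the geometric mean. The slides do not say which form is used, and the limits and response values below are hypothetical.

```python
def desirability_maximize(y, low, high, weight=1.0):
    """Desirability for a response to be maximized: 0 at or below `low`,
    1 at or above `high`, a power curve in between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight


def desirability_target(y, low, target, high, weight=1.0):
    """Desirability for a response with a target value: 1 at `target`,
    falling to 0 at `low` and `high`."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** weight
    return ((high - y) / (high - target)) ** weight


def overall_desirability(values):
    """Geometric mean of the individual desirabilities."""
    product = 1.0
    for d in values:
        product *= d
    return product ** (1.0 / len(values))


# Hypothetical limits and predicted responses at one operating condition.
d_conversion = desirability_maximize(90.0, low=80.0, high=100.0)
d_activity = desirability_target(58.0, low=55.0, target=57.5, high=60.0)
print(f"d(conversion) = {d_conversion:.3f}")
print(f"d(activity)   = {d_activity:.3f}")
print(f"overall D     = {overall_desirability([d_conversion, d_activity]):.3f}")
```

Because the geometric mean is zero whenever any single desirability is zero, an operating condition that fails one requirement outright cannot be rescued by doing well on the others.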

  • Example #3: Myers and Montgomery (2002) describe an experiment on a chemical process:

  • Experiment

    run    time (minutes)    temperature (degrees C)    catalyst (percent)    conversion    activity
    1      10.0              170.0                      2.0                   74.0          53.2
    2      15.0              170.0                      2.0                   51.0          62.9
    3      10.0              200.0                      2.0                   88.0          53.4
    4      15.0              200.0                      2.0                   70.0          62.6
    5      10.0              170.0                      3.0                   71.0          57.3
    6      15.0              170.0                      3.0                   90.0          67.9
    7      10.0              200.0                      3.0                   66.0          59.8
    8      15.0              200.0                      3.0                   97.0          67.8
    9      8.3               185.0                      2.5                   76.0          59.1
    10     16.7              185.0                      2.5                   79.0          65.9
    11     12.5              160.0                      2.5                   85.0          60.0