• 1. Multivariate Statistical Analysis. Raul Caetano, M.D., Ph.D.
• 2. Types of Analysis. In most cases, data analysis is done to test the association between two variables (bivariate analysis). In health research, the aim is very frequently to establish the association between a risk factor and a disease, or between a therapy and an outcome (patient-oriented research).
• 3. 12-Month Prevalence of DSM-IV Alcohol Dependence by Age: NLAES 1992

        18-29 yrs   30-34 yrs   45-64 yrs   65+ yrs   All
Women   6           3           1           .2        1
Men     13          6           3           1         6

From Grant et al., 1994
• 4. Life is Not Simple. In reality, most statistical analyses test complex relationships in which more than two variables are considered. In health research, either multiple risk factors are under consideration or many subject variables must be considered in ascertaining the outcome of a clinical trial. The ultimate goal of these analyses is either explanation or prediction, i.e., more than just establishing an association.
• 5. Examples of Multivariate Analyses. Postulating progression of a specific type of lung cancer as a function of baseline stage at diagnosis, prior therapeutic regimens, age, and sex, among other clinical and/or demographic factors. Evaluating the likelihood of domestic violence, taking into account the age of the individuals, whether or not they consume alcohol, ethnic background, and level of education.
• 6. Multivariate Analyses. The idea (model) for the first example (progression of lung cancer) will be: Progression of cancer = overall effect (irrespective of the effect of other factors) + effect of baseline stage at diagnosis + effect of prior therapeutic regimens + effect of age and sex.
• 7. Multivariate Analyses: How. Each variable in the model is measured quantitatively for each individual. The different effects are then solved for using mathematical techniques (calculus and numerical methods) to quantify them and to test their significance.
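The additive model described on the two preceding slides can be sketched in code. This is a minimal illustration rather than the author's method: it assumes an ordinary least-squares fit, and the patient data, the variable coding (stage, age, sex), and the effect sizes are all hypothetical. The outcome is generated without noise so that the solved effects can be compared with the values used to build it; significance testing is not shown.

```python
# Illustrative sketch only: the data and effect sizes below are made up.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_least_squares(predictors, outcome):
    """Return [overall effect, effect 1, effect 2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in predictors]  # leading 1 = overall effect
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical patients: (baseline stage, age in years, sex coded 0/1).
patients = [(1, 50, 0), (2, 60, 1), (3, 55, 0), (1, 70, 1), (4, 65, 1), (2, 45, 0)]
# Hypothetical effects used to generate a noise-free progression score.
true_effects = [2.0, 1.5, 0.1, -1.0]
progression = [
    true_effects[0] + true_effects[1] * s + true_effects[2] * a + true_effects[3] * x
    for s, a, x in patients
]

effects = fit_least_squares(patients, progression)
for name, value in zip(["overall", "stage", "age", "sex"], effects):
    print(f"{name:8s} {value:.3f}")
```

With real data the outcome would contain noise, and the fitted effects would be estimates with standard errors rather than exact recoveries.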
• 8. Advantages of Multivariate Analyses. Multivariate analysis helps ensure that the results are not biased by other factors that would otherwise go unaccounted for. Case in point: it checks that a difference in cancer progression attributed to different therapies is not a mere reflection of differences in prior therapies, stage at diagnosis, and/or demographic factors.
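A small invented example may make the point about bias concrete. The sketch below uses stratification as a simple stand-in for a full multivariate model, and every record in it is hypothetical: progression is constructed to depend on stage alone, while therapy B is given mostly to late-stage patients.

```python
# Illustrative sketch only: all records below are invented.
from statistics import mean

# Each record: (therapy, stage at diagnosis, progression score).
# By construction, progression depends on stage alone (early = 2, late = 6),
# and therapy B was given mostly to late-stage patients.
records = (
    [("A", "early", 2)] * 8 + [("A", "late", 6)] * 2
    + [("B", "early", 2)] * 2 + [("B", "late", 6)] * 8
)


def mean_progression(therapy, stage=None):
    """Mean progression for a therapy, optionally within one stage."""
    return mean(
        score for t, s, score in records
        if t == therapy and (stage is None or s == stage)
    )


# Crude comparison ignores stage and suggests therapy B is worse.
crude_difference = mean_progression("B") - mean_progression("A")

# Adjusted comparison: compare therapies within each stage, then average.
stage_differences = [
    mean_progression("B", stage) - mean_progression("A", stage)
    for stage in ("early", "late")
]
adjusted_difference = mean(stage_differences)

print(f"crude difference:    {crude_difference:.1f}")
print(f"adjusted difference: {adjusted_difference:.1f}")
```

In this constructed data set the crude difference between therapies is about 2.4 points, yet it vanishes once patients are compared within the same stage, which is the kind of spurious effect a multivariate analysis is meant to expose.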
• 9. Which Technique? Statistical techniques are like tools: the task to be accomplished determines the selection of the tool. The selection of a statistical technique is based largely on the type of data under analysis (e.g., measurement level, type of distribution) and the reason (research question) for the analysis.
• 10. Choice of Technique and Data Distribution: Normal Curve [Figure: normal curve; x-axis from 0 to 20, y-axis from 0 to 0.25]
• 11. Choice of Technique and Data Distribution: Log-Normal Curve
• 12. Choice of Technique: Levels (Types) of Data. Nominal (Categorical) Measures: Categories are exhaustive and mutually exclusive (e.g., religion). Ordinal Measures: All of the above, plus the categories can be rank-ordered (e.g., social class). Interval Measures: All of the above, plus equal differences between measurement points (e.g., temperature). Ratio Measures: All of the above, plus a true zero point (e.g., weight).
• 13. The Link Between Measurement and Statistical Analysis. A chi-square test can be used to test the association between two nominal-level variables (e.g., gender and liver cirrhosis). A correlation can be used to assess the association between two interval- or ratio-level measures (e.g., height and weight).
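Both tests named on this slide are short enough to compute by hand. The sketch below is only an illustration: the 2x2 counts (gender by cirrhosis) and the height and weight values are invented, the chi-square is the uncorrected Pearson statistic, and the p-value formula used applies only to tables with one degree of freedom.

```python
# Illustrative sketch only: the counts and measurements are invented.
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two numeric sequences."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Nominal by nominal: rows = men, women; columns = cirrhosis, no cirrhosis.
cirrhosis_by_gender = [[30, 70], [10, 90]]
chi2, p = chi_square_2x2(cirrhosis_by_gender)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")

# Interval/ratio by interval/ratio: height (cm) and weight (kg).
heights = [150, 160, 170, 180, 190]
weights = [50, 60, 65, 80, 85]
r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

The choice of function follows the measurement level: counts in categories go to the chi-square, while paired numeric measurements go to the correlation.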
• 14. What Type of Measure Should I Use? Whenever possible, use interval or ratio measures. However, many attributes of individuals (e.g., gender, religion) or health outcomes (cured/not cured) cannot easily be measured by interval or ratio data. Thus, nominal or ordinal measures are also frequently used.
• 15. What Level for Multivariate Analysis? In the past, most techniques required at least interval data; that is, all measures in the analysis (both independent and dependent variables) had to have equal differences between points on every part of the scale. Nowadays many techniques (e.g., logistic regression) accept categorical, ordinal, interval, or ratio-level variables. These techniques are very popular in health research because of the difficulty of implementing interval or ratio measurements.
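To show how logistic regression handles a nominal outcome and a nominal predictor, here is a minimal sketch. It assumes a single 0/1 exposure and a 0/1 disease outcome, fits the model by Newton-Raphson, and uses invented counts; the function name `fit_logistic` is hypothetical and not taken from any library.

```python
# Illustrative sketch only: the counts are invented.
import math

# Grouped data: (exposure coded 0/1, number with disease, group size).
groups = [(0, 10, 100), (1, 30, 100)]


def fit_logistic(groups, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on grouped binary data."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # information matrix (negative Hessian)
        for x, cases, n in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            residual = cases - n * p
            weight = n * p * (1.0 - p)
            g0 += residual
            g1 += residual * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic(groups)
odds_ratio = math.exp(b1)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, odds ratio = {odds_ratio:.2f}")
```

With one binary predictor, the exponentiated slope equals the odds ratio from the 2x2 table, here (30 x 90) / (70 x 10), which gives a convenient check on the fit.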
• 16. Level of Measurement and Multivariate Statistical Technique

Independent Variable    Dependent Variable            Technique
Numerical               Numerical                     Multiple Regression
Nominal or Numerical    Nominal                       Logistic Regression
Nominal or Numerical    Numerical (censored)          Cox Regression
Nominal or Numerical    Numerical                     ANOVA, MANOVA
Nominal or Numerical    Nominal (2 or more values)    Discriminant Analysis
Numerical               No Dependent Variable         Factor and Cluster Analysis
• 17. Assumptions in Linear Regression
1) The relationship under analysis is linear.
2) The values of the independent variable are fixed (not a random variable).
3) The values of the dependent variable are random.
4) For each value of the independent variable, the values of the dependent variable are normally distributed.
5) The variance of the dependent variable is the same for all values of the independent variable.
• 18. The Basic Regression Model
Simple Linear Regression: Y = a + bX
Y = dependent variable
a = intercept
b = slope/regression coefficient (change in Y with a one-unit change in X)
X = predictor value
Multiple Linear Regression: Y = a + b1X1 + b2X2 + b3X3 + ...
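For the simple model Y = a + bX, the least-squares values of a and b have a closed form: b is the sum of cross-products of deviations divided by the sum of squared deviations of X, and a follows from the two means. The sketch below applies those formulas to a handful of invented data points; the function name is hypothetical.

```python
# Illustrative sketch only: the data points are invented.

def fit_simple_regression(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx              # slope: change in Y per one-unit change in X
    a = y_mean - b * x_mean    # intercept: predicted Y when X = 0
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b = fit_simple_regression(xs, ys)
print(f"Y = {a:.2f} + {b:.2f}X")
```

The multiple regression case follows the same least-squares principle but requires solving a system of equations, one per coefficient.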
• 19. The Regression Line [Figure: plot of Y against X showing the line Y = a + bX, with intercept a and slope b]
• 20. Prevalence of Alcohol Dependence by Frequency of Drinking 5 or More/Day [Figure: prevalence (y-axis, 0 to 60) by frequency of drinking 5 or more per day (x-axis, 0 to 12), plotted separately for men and women]. From Caetano et al., 1997
• 21. Liver-Related Deaths and Alcohol Consumption (per 10^6 people) [Figure: data points labeled USA, France, Germany, Mexico, USSR, Australia, Great Britain, Canada, and Japan]
• 22. Measurements of Total Lifetime Dose of Alcohol and Muscular Strength for 50 Alcoholic Men. Urbano Marquez et al., 1989
• 23. Scatterplot of the Total Lifetime Dose of Alcohol vs. Muscular Strength for the Observations
• 24. The Least Squares Regression Line Added to the Scatterplot. Regression Equation: Y = 26.4 - 0.296X
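The fitted equation can be read as a prediction rule. The sketch below plugs a few arbitrary dose values into the equation reported on the slide; the units of dose and strength are not given there, so none are assumed, and the function name is hypothetical.

```python
# Coefficients as reported on the slide; the dose values below are arbitrary.
INTERCEPT = 26.4   # predicted strength at a lifetime dose of zero
SLOPE = -0.296     # change in predicted strength per one-unit increase in dose


def predicted_strength(lifetime_dose):
    """Predicted muscular strength from Y = 26.4 - 0.296X."""
    return INTERCEPT + SLOPE * lifetime_dose


for dose in (0, 10, 20):
    print(dose, round(predicted_strength(dose), 2))
```

The negative slope means that each additional unit of lifetime dose lowers the predicted strength by 0.296 units; predictions far outside the range of doses observed in the 50 men would be extrapolations.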
• 25. Replicating Life's Complex Relationships in Data Analysis: Developing the Model. A model is a set of theoretically based and/or evidence-based associations between variables. Data analysis tests the extent to which the proposed theoretical associations are observed in the data.
