111
EXPLORATORY FACTOR ANALYSIS: MODEL SELECTION AND IDENTIFYING UNDERLYING SYMPTOMS By Matthew K. Cole A thesis submitted to Johns Hopkins University in conformity with the requirements for the degree of Master of Science. Baltimore, Maryland September, 2017

Exploratory Factor Analysis: Model Selection and

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Exploratory Factor Analysis: Model Selection and

EXPLORATORYFACTORANALYSIS:MODELSELECTIONANDIDENTIFYINGUNDERLYINGSYMPTOMS

By

MatthewK.Cole

AthesissubmittedtoJohnsHopkinsUniversityinconformitywiththerequirementsforthedegreeofMasterofScience.

Baltimore,Maryland

September,2017

Page 2: Exploratory Factor Analysis: Model Selection and

ii

Abstract:

Exploratoryfactoranalysis(EFA)isacommonyetpowerfultooltobetterunderstand

thetheoreticalstructureofasetofvariables.AcoreproblemofconductinganEFAis

determiningthenumberoffactors(m)toextractandexamine.Inthisthesis,weexaminedthe

performanceofexistingmethodsofestimatingmwhileproposingandassessingacross

validatedmethodforestimatingmacrossvarioussettings.Thesemethodswerethen

consideredinastudyincorporatingEFAtoassesstherelationshipandcategorizationofself-

reportedchronicrhinosinusitis(CRS)symptoms,acommonsinusinflammatorydisease,within

threecrosssectionalquestionnairesaswellaswithintheinthechangesinsymptomsbetween

questionnaires.

Acrossvalidatedapproach(trace)wasdevelopedbywhichmincreasesuntilthe

discrepancybetweentheimpliedcorrelationofapartitionofdataandtheobservedcorrelation

oftheotherdatapartitionincreases.Inordertoassesstheperformanceofthisnewmethodas

wellasother,commonapproaches,asimulationstudywasdesignedinwhichvalidfactor

loadingmatricesweresimulatedusinganewprocedure,andrandomsamplesweredrawn

fromtheirrespectivecorrelationmatrices.Thetracemethoddisplayedquicklyincreasing

accuracywhenmoresamplesweredrawn,aphenomenonnotobservedinothermethods.

TracewasalsoappliedtotheCRSdata,suggesting13factorstobeextracted,morethanother

methods.Thisnon-agreementpossiblyhighlightsthedifferencesinfactorextraction

interpretations,andthedifferentmeaningsof“correct”m.

AnEFAwascarriedoutonself-reportedCRSsymptomsaswellaschangesinsymptom

responsesovertimeinordertoidentifyanyrelationshipsbetweenorcategorizationofCRS

Page 3: Exploratory Factor Analysis: Model Selection and

iii

symptoms.Atotalof3535primarycarepatientswereincludedthisstudyhavingrespondedto

threequestionnairesof37repeatedquestionsspanninga16-monthperiod.Afterextracting

factorsfromallthreequestionnairesandtwosymptomdifferencescores,fivestablefactors

wereidentifiedineach.Thefactorsofcongestionanddischarge,facialpainandpressure,smell

loss,asthmaandconstitutionalaswellasearandeyesymptomswereconsistentwiththe

hypothesisthatCRSsymptomsaremeasuringseveraldistinctbiologicalprocesses.

Readers:KarenBandeen-Roche,BrianSchwartz

Page 4: Exploratory Factor Analysis: Model Selection and

iv

Acknowledgments:IwouldfirstliketothankmythesisadvisorsandreadersDr.KarenBandeen-RocheandDr.BrianSchwartzoftheJohnsHopkinsBloombergSchoolofPublicHealthfromwhomIhavelearnedsomuch.WiththeirguidanceandsupportIwasallowedtoworkanddevelopideasonmyownwhilebeingsteeredbackontrackwhenneeded.Imustalsoexpressmygratitudetomyparentswhohavesupportedandencouragedmethroughmystudies.Withouttheirsupport,thisworkwouldnotbepossible.

Page 5: Exploratory Factor Analysis: Model Selection and

v

TableofContents

Abstract:.................................................................................................................................ii

Acknowledgments:.................................................................................................................iv

Chapter1-Introduction.........................................................................................................1

Chapter2-ACross-ValidatedApproachtoExploratoryFactorAnalysisModelSelection........3Introduction.....................................................................................................................................3Background......................................................................................................................................7

NotationandAssumptions..................................................................................................................7EFAandPCA.........................................................................................................................................8ExistingEFAModelSelectionStrategies..............................................................................................8

NovelMethodforModelSelection.................................................................................................13SimulationStudy............................................................................................................................14Results...........................................................................................................................................16

ApplicationtotheCRSStudy.............................................................................................................17Discussion......................................................................................................................................19Futurework...................................................................................................................................23TablesandFigures..........................................................................................................................25Appendix........................................................................................................................................31

AdditionalFigures:.............................................................................................................................31SimulatingFactorModelCorrelationMatrices.................................................................................44

Chapter3-ExploratoryFactorAnalysisofCRSSymptoms.....................................................47Introduction...................................................................................................................................47Methods........................................................................................................................................49

Studypopulationanddesign.............................................................................................................49Datacollection...................................................................................................................................50Analyticvariables...............................................................................................................................51StatisticalAnalysis.............................................................................................................................51SensitivityAnalysisandDiagnostics..................................................................................................55

Results...........................................................................................................................................56Descriptionofstudysubjects............................................................................................................56Cross-sectionalEFAs..........................................................................................................................57LongitudinaldifferenceEFAs.............................................................................................................58FactorScores.....................................................................................................................................59

Discussion......................................................................................................................................60LimitationsandFurtherWork:..........................................................................................................66

Conclusion.....................................................................................................................................67Tables&Figures.............................................................................................................................69Appendix........................................................................................................................................80

Chapter4-Conclusion...........................................................................................................94

References............................................................................................................................97

Page 6: Exploratory Factor Analysis: Model Selection and

vi

ListofTablesChapter2:TABLE1.NUMBEROFCORRECTFACTORNUMBERASSESSMENTSBYSAMPLESIZEADJUSTEDBIC(SSBIC),

STANDARDBIC(BIC),KAISEREIGENVALUESGREATERTHAN1RULE(K1),PARALLELANALYSIS(PA),ANDTHEPROPOSEDMETHOD(TRACE)OUTOF100SIMULATIONREPLICATES..............................................................25

TABLE2ESTIMATEDNUMBEROFFACTORSFOREACHOFTHE3QUESTIONNAIRES(BASELINE,6MONTH,AND16MONTHFOLLOWUPS)FROMCOMMONLYUTILIZEDMETHODSINCLUDINGKAISEREIGENVALUESGREATERTHAN1RULE(K1),PARALLELANALYSIS(PA),STANDARDBIC(BIC),EMPIRICALBIC(EBIC),SAMPLESIZEADJUSTEDBIC(SSBIC),ANDTHEPROPOSEDMETHOD(TRACE)........................................................................27

Chapter3:TABLE1.DEMOGRAPHICINFORMATIONOFTHE3535PATIENTSINCLUDEDINTHECURRENTANALYSISANDTHE

4312PATIENTSWHORETURNEDTHEBASELINEQUESTIONNAIREBUTWERENOTINCLUDEDINTHECURRENTANALYSIS............................................................................................................................................................69

TABLE2.QUESTIONSFORTHETHREECROSS-SECTIONALQUESTIONNAIRES.QUESTIONRESPONSESWEREONA5-ITEMLIKERTSCALE*..........................................................................................................................................70

TABLE3.FACTORLOADINGSANDSYMPTOMCOMMONALTIESFROMTHEEXPLORATORYFACTORANALYSIS(EFA)OFTHE37PRESENCE,SEVERITY,ANDSECONDARYCRSSYMPTOMATBASELINE.THEEFAWASFITUSINGORDINARYLEASTSQUARESANDANOBLIMINROTATION(NUMBEROFPATIENTS=3535).LOADINGSLESSTHAN0.3WEREOMITTEDFORREADABILITY.COMMUNALITIESREPRESENTTHEFRACTIONOFEACHSYMPTOM’SVARIABILITYTHATWASCAPTUREDBYTHEUTILIZEDFIVEFACTORMODEL................................72

TABLEA1.FACTORLOADINGSANDSYMPTOMCOMMONALTIESFROMTHEEXPLORATORYFACTORANALYSIS(EFA)OFTHE37PRESENCE,SEVERITY,ANDSECONDARYCRSSYMPTOMAT6MONTHFOLLOWUP.THEEFAWASFITUSINGORDINARYLEASTSQUARESANDANOBLIMINROTATION(NUMBEROFPATIENTS=3535).LOADINGSLESSTHAN0.3WEREOMITTEDFORREADABILITY.COMMUNALITIESREPRESENTTHEFRACTIONOFEACHSYMPTOM’SVARIABILITYTHATWASCAPTUREDBYTHEUTILIZEDFIVEFACTORMODEL.................86

TABLEA2.FACTORLOADINGSANDSYMPTOMCOMMONALTIESFROMTHEEXPLORATORYFACTORANALYSIS(EFA)OFTHE37PRESENCE,SEVERITY,ANDSECONDARYCRSSYMPTOMAT16MONTHFOLLOWUP.THEEFAWASFITUSINGORDINARYLEASTSQUARESANDANOBLIMINROTATION(NUMBEROFPATIENTS=3535).LOADINGSLESSTHAN0.3WEREOMITTEDFORREADABILITY.COMMUNALITIESREPRESENTTHEFRACTIONOFEACHSYMPTOM’SVARIABILITYTHATWASCAPTUREDBYTHEUTILIZEDFIVEFACTORMODEL.................88

TABLEA3.FACTORLOADINGSANDSYMPTOMCOMMONALTIESFROMTHEEXPLORATORYFACTORANALYSIS(EFA)OFTHE37PRESENCE,SEVERITY,ANDSECONDARYCRSSYMPTOMCHANGESFROMBASELINETO6MONTHS.EFAWASFITUSINGORDINARYLEASTSQUARESANDANOBLIMINROTATION(NUMBEROFPATIENTS=3535).LOADINGSLESSTHAN0.3WEREOMITTEDFORREADABILITY.COMMUNALITIESREPRESENTTHEFRACTIONOFEACHSYMPTOM’SVARIABILITYTHATWASCAPTUREDBYTHEUTILIZEDFIVEFACTORMODEL............................................................................................................................................................................90

TABLEA4.SYMPTOMCOMMONALTIESFROMTHEEXPLORATORYFACTORANALYSIS(EFA)OFTHE37PRESENCE,SEVERITY,ANDSECONDARYCRSSYMPTOMCHANGESFROMBASELINETO6MONTHSAND6MONTHSTO16MONTHS.EFAWASFITUSINGORDINARYLEASTSQUARESANDANOBLIMINROTATION(NUMBEROFPATIENTS=3535)..............................................................................................................................................92

Page 7: Exploratory Factor Analysis: Model Selection and

vii

ListofFiguresChapter2:FIGURE1.STRONG“BLOCKED”FACTORLOADINGSUTILIZEDTOCREATETHECORRELATIONMATRIXINTHE

SIMULATIONSTUDY...........................................................................................................................................28FIGURE2.CORRELATIONMATRIXGENERATEDFROMTHESTRONG“BLOCKED”FACTORLOADINGSMATRIX.........29FIGURE3TRACEFUNCTION’SDISCREPANCYVALUESONTHEBASELINECRSDATA.VERTICALLINEDENOTESTHE

MINIMUMACHIEVEDAT13FACTORS...............................................................................................................30FIGUREA1.THEUTILIZEDSTRONGLOADINGMATRIX,CREATEDUSINGTHEDIRICHLETSIMULATIONPROCESS,

CONSISTINGOF5FACTORSAND25VARIABLES................................................................................................31FIGUREA2.THEUTILIZEDSTRONGCORRELATIONMATRIX,CREATEDFROMTHECORRESPONDINGSTRONG

LOADINGMATRIX..............................................................................................................................................32FIGUREA3.THEUTILIZEDMODERATELOADINGMATRIX,CREATEDUSINGTHEDIRICHLETSIMULATIONPROCESS,

CONSISTINGOF5FACTORSAND25VARIABLES................................................................................................33FIGUREA4.THEUTILIZEDMODERATECORRELATIONMATRIX,CREATEDFROMTHECORRESPONDINGMODERATE

LOADINGMATRIX..............................................................................................................................................34FIGUREA5.THEUTILIZEDWEAKLOADINGMATRIX,CREATEDUSINGTHEDIRICHLETSIMULATIONPROCESS,

CORRESPONDINGFROM5FACTORSAND25VARIABLES..................................................................................35FIGUREA6.THEUTILIZEDWEAKCORRELATIONMATRIX,CREATEDFROMTHECORRESPONDINGWEAKLOADING

MATRIX..............................................................................................................................................................36FIGUREA7.THEUTILIZEDMODERATE/LOWDIMENSIONALLOADINGMATRIX,CREATEDUSINGTHEDIRICHLET

SIMULATIONPROCESS,CONSISTINGOF5FACTORSAND11VARIABLES..........................................................37FIGUREA8.THEUTILIZEDMODERATE/LOWDIMENSIONALCORRELATIONMATRIX,CREATEDFROMTHE

CORRESPONDINGMODERATE/LOWDIMENSIONALLOADINGMATRIX............................................................38FIGUREA9.THEUTILIZEDMODERATE/DIFFERENTDIMENSIONLOADINGMATRIX,CREATEDUSINGTHEDIRICHLET

SIMULATIONPROCESS,CONSISTINGOF5FACTORSAND27VARIABLES..........................................................39FIGUREA10.THEUTILIZEDMODERATE/DIFFERENTDIMENSIONALCORRELATIONMATRIX,CREATEDFROMTHE

CORRESPONDINGMODERATE/DIFFERENTDIMENSIONALLOADINGMATRIX..................................................40FIGUREA11.THEUTILIZEDTENFACTORLOADINGMATRIX,CREATEDUSINGTHEDIRICHLETSIMULATIONPROCESS,

CONSISTINGOF10FACTORSAND100VARIABLES............................................................................................41FIGUREA12.THEUTILIZEDTENFACTORCORRELATIONMATRIX,CREATEDFROMTHECORRESPONDINGTEN

FACTORLOADINGMATRIX................................................................................................................................42FIGUREA13.TRACEFUNCTION’SDISCREPANCYVALUESONTHESTRONG“BLOCKED”FACTORLOADINGMATRIX.

VERTICALLINEDENOTESTHEMINIMUMACHIEVEDAT5FACTORS.................................................................43Chapter3:FIGURE1.LASAGNAPLOTDISPLAYINGTHEPROPORTIONOFINDIVIDUALSWITHEACHGIVENRESPONSETOTHE

QUESTION“ONAVERAGE,HOWOFTENINTHEPAST3MONTHSHAVEYOUHADPOST-NASALDRIP?”ATBASELINEAND6MONTHSAND16MONTHSLATER(1=NEVER,2=ONCEINAWHILE,3=SOMEOFTHETIME,4=MOSTOFTHETIME,5=ALLTHETIME).Y-AXISVALUESINDICATETHENUMBEROFPATIENTSWITHEACHPARTICULARRESPONSEATBASELINE......................................................................................................76

FIGURE2.INTER-FACTORCORRELATIONSATBASELINEFROMTHEBASELINEQUESTIONNAIREEXPLORATORYFACTORANALYSISFITVIAORDINARYLEASTSQUARESANDANOBLIMINROTATION......................................77

FIGURE3.FACTOR1(CONGESTIONANDDISCHARGE)SCORESBYCRSEPOSGROUPSATBASELINE.FACTORSCORESACROSSFACTORSANDCRSEPOSSGROUPS(CURRENTCRS,PREVIOUSCRS,NEVERCRS)FORTHE

Page 8: Exploratory Factor Analysis: Model Selection and

viii

CONGESTIONANDDISCHARGEFACTORATBASELINEWITHNUMBEROFINDIVIDUALS(N)INEACHGROUP.FACTORSCORESWEREESTIMATEDBYTHEITEMRESPONSETHEORY(IRT)BASEDSCORESMETHOD.X-AXISWASJITTEREDTOIMPROVEREADABILITY........................................................................................................78

FIGURE4.CONTINUOUSFACTORSCORESCATEGORIZEDTOSHOWLONGITUDINALCHANGEACROSSQUESTIONNAIRESFORFACTOR1(CONGESTIONANDDISCHARGE).FACTORSCORESWERECATEGORIZEDAS:FACTORSCORE<-1WEREASSIGNEDVALUESOF-2;BETWEEN-0.5AND-1,ASSIGNED-1;BETWEEN-0.5AND0.5,ASSIGNED0;BETWEEN0.5AND1,ASSIGNED1;AND>1,ASSIGNED2.Y-AXISLABELSINDICATETHENUMBEROFPATIENTSATBASELINEINEACHADJUSTEDFACTORSCOREGROUP.FACTORSCORESWEREESTIMATEDBYTHEIRTMETHOD......................................................................................................................79

FIGUREA1.SCREEPLOTFORTHEBASELINEQUESTIONNAIREDISPLAYINGEIGENVALUESONTHEY-AXISANDTHEIRCORRESPONDINGFACTORNUMBERONTHEX-AXIS........................................................................................81

FIGUREA2.SCREEPLOTFORTHE6MONTHFOLLOWUPQUESTIONNAIREDISPLAYINGEIGENVALUESONTHEY-AXISANDTHEIRCORRESPONDINGFACTORNUMBERONTHEX-AXIS..............................................................82

FIGUREA3.SCREEPLOTFORTHE16MONTHFOLLOWUPQUESTIONNAIREDISPLAYINGEIGENVALUESONTHEY-AXISANDTHEIRCORRESPONDINGFACTORNUMBERONTHEX-AXIS..............................................................83

FIGUREA4.SCREEPLOTFORTHEFIRSTDIFFERENCE(BASELINETO6MONTHQUESTIONNAIRES)DISPLAYINGEIGENVALUESONTHEY-AXISANDTHEIRCORRESPONDINGFACTORNUMBERONTHEX-AXIS......................84

FIGUREA5.SCREEPLOTFORTHESECONDDIFFERENCE(6TO16MONTHQUESTIONNAIRES)DISPLAYINGEIGENVALUESONTHEY-AXISANDTHEIRCORRESPONDINGFACTORNUMBERONTHEX-AXIS......................85

Page 9: Exploratory Factor Analysis: Model Selection and

1

Chapter1-Introduction

Exploratoryfactoranalysis(EFA)isastatisticalmethodutilizedtoinvestigateand

summarizethejointdistributionofacollectionofvariablesthroughtheestimationofthe

relationshipbetweentheseobservedvariablesandunobservedbuttheorizedfactors.Itrelies

ontheassumptionthatcovarianceamongmeasuredvariablesarisesfromasmallersetof

latentfactorswhichareassociated,tovaryingdegrees,toeachobservablevariable(knownas

thecommonfactormodelassumption).Thesemethodsarecommonlyutilizedwhenthereis

littletonoaprioriknowledgeaboutthelatentstructureassociatedwithvariables,andhave

beenemployedacrossavarietyofscientificdomains.Commonly,EFAsareemployedinan

attempttobetterunderstandrelatedphenomena,toallowresearcherstocreatescales,andas

anintuitivewaytostudywhatacollectionofobservationsismeasuring.

Despitebeingalong-establishedtechnique,considerabledifficultystillisencountered

whenemployingthisapproach.Themostcommonissuepractitionersfacewhileattemptingto

utilizeanEFAiswhichfactormodeltoutilize—largely,howmanyfactorstoincorporate.

Determiningthenumberoffactorstoinclude(𝑚)canbeadifficultproblemasasubtlechange

in𝑚canvastlyimpacttheresultsandinterpretationsoftheanalysis,andthechoiceof𝑚itself

canbeveryambiguous.Furthermore,theproblemofselecting𝑚hasbeenapproachedfroma

varietyofangles,noneofwhichhasearnedconsensusagreementasagoldstandardmethod.

Partofthisthesisincludesworkstudyingexistingmethodsforchoosingthenumberoffactors

(𝑚),whileproposingandexamininganewmethodwhichincorporatesacrossvalidated

approachtoselecting𝑚.

Page 10: Exploratory Factor Analysis: Model Selection and

2

Inaddition,weappliedtheprinciplesstudiedtoananalysisofchronicrhinosinusitis

(CRS)symptomdatacollectedfrompatientsidentifiedinalarge,integratedhealthsystem.A

totalof3535patientsrespondedto37questionspertainingtoaspectrumofCRSsymptoms

threetimes(baseline,6-monthand16-monthfollow-upquestionnaires).TheCRSstudy,infact,

motivatedthestatisticalwork.EFAwasconductedtobetterunderstandrelationshipsamong

andcategorizationofCRSandCRSrelatedsymptoms.EFAswereconductedforeachofthe

threequestionnairesaswellasforthechangeinreportedsymptomfrequency(baselineto6

months,and6monthsto16months).Wewereabletoidentifyfivesimilarfactorsineachof

theseanalyses,eachwithabiologicallyplausiblepathologicalexplanation,suggestingthat

theremayberealphenomenadrivingtheseobservations.Atthesametime,theselectionof

fiveasthefactornumberwasbetterevidencedinsomeperiodsthanothers,andbysome

methodsthanothers,illustratingthechallengesstudiedinourfirstpaper.

Thisthesiscomprisestwopapers,onethatutilizedEFAinordertoexplorethe

covariancestructureofself-reportedsymptomsrelatedtoCRSandcommon,co-morbid

conditions,whiletheotherstudiedthevariousmethodsofdeterminingthenumberoffactors

inEFAsettingsandproposedanothermethodtodoso.Itbeginswiththemethodologicalwork

andthenproceedstothedetailedCRSanalysis.Thesepapersworkintandembyexaminingthe

theoryandchallengesfromananalyticandphilosophicalstandpointofselectingthe“optimal”

numberoffactors,whileestimatingthenumberoffactorsandputtingEFAtoworkinareal-

worldsettinginvolvingacommondisease.Aconcludingchapterprovidessynthesisand

identifiesareasforfuturework.

Page 11: Exploratory Factor Analysis: Model Selection and

3

Chapter2-ACross-ValidatedApproachtoExploratoryFactorAnalysisModelSelection

Introduction

Frequentlyitisofpublichealthinteresttocharacterizeattributesofdatathatcannotbe

measureddirectly.Instead,agroupofobservablevariablesthatindirectlycharacterizethe

unobservedarecollected.Insuchsettings,itisoftenofimportancetoexploretheunderlying,

“latent”structureofdatainadditiontotheobservedmanifestvariables.Ideally,indoingsowe

canstudyunobserved,latentvariableswhichmaybemoreinterpretableorofagreater

importancethantheirmeasuredcounterparts.Onemethodofestimatinglatentstructureis

throughfactoranalysis(FA).Thismethodiscommonindiversefieldsrangingfrompsychology

andeconomicstohealthandspirituality(Fabrigar,Wegener,MacCallum,&Strahan,1999;

Hirose,Kawano,Konishi,&Ichikawa,2011;Underwood&Teresi,2002).

Therearetwogeneralcasesoffactoranalysis,exploratoryandconfirmatory.

Exploratoryfactoranalysis(EFA)aimstoidentifytheunderlyingstructureofvariableswithlittle

aprioriknowledgeofanysuchrelationship,whileconfirmatoryfactoranalysisisutilizedtotest

whetheraproposedlatentstructureadequatelyfitstheobserveddata.Whilebothareuseful

andpowerfultechniques,exploratoryfactoranalysisrequirestheadditionalstepofchoosing

𝑚,theestimatednumberoffactors(𝑚)characterizingtheobserveddatadistribution,amodel

selectionproblemwhichwillbeconsideredbelow.Thismodelselectionisimportantasboth

Page 12: Exploratory Factor Analysis: Model Selection and

4

thequantitativeandqualitativeresultsofanEFAmayrelyheavilyonthisselection,suchthat

differentspecificationsof𝑚maypotentiallyleadtoalteredinterpretationsandinferences.

Currentmethodsofdetermining𝑚varywithrespecttocomputationaswellastheory,some

utilizinglikelihoodbasedmethods,includingAkaikeInformationCriterion(AIC)andBayesian

InformationCriterion(BIC),toassessthehypothesisthatthedataisgeneratedfrommodels

withspecific𝑚,whileothersutilizepropertiesofcorrelationmatrices(intermsofeigenvalues),

orothercriteriatotestmodelfit.Somemethodsofestimating𝑚areuseful,butmosthave

drawbacksaswell.

TheFAapproachisbasedonthecommonfactormodelbywhichobservedvariablesare

conceptualizedtoarisefrom3components:commonfactors,uniquevariability,and

measurementerrors/noise(Brown,2014).Thecoreequationofthecommonfactormodelis

asfollowsinscalarnotation:

𝑦#,% = 𝜆%,(𝜈#,(

*

(+,

+ 𝜖#,%

where𝑦#,% isthevalueofvariable𝑗forperson𝑖,𝜈#,( isthe𝑔thfactorvariableforperson𝑖,𝜆%,(is

the“loading”ofthe𝑗thvariableontothe𝑔thfactor,and𝜖#,% istheresidualtermforperson𝑖and

variablejwhichremainsunexplainedbythefactormodelspecifictoour𝑗thvariable,thesum

ofuniquenaturalvariationandmeasurementerror(Brown,2014).Withoutrepeated

observations,thenoisetermisnotdistinguishablefromthenaturalvariationterm,andinthe

aboveequation,bothcomprise𝜖#,%.Thefactorsimpact,tovaryingdegreesasreflectedbytheir

loadings,manyobservedvariables.

Page 13: Exploratory Factor Analysis: Model Selection and

5

Itwasouraimtointroduceacrossvalidatedapproachbywhich𝑚waschosentobethe

valuewhichreducedthedifferencebetweendistributionsimpliedbyanestimatedfactormodel

andempiricallycharacterizingindependentlyheldoutdata.

Factoranalysismodelselectionisdifficultandcommonlyreliesonqualitativeorbiased

methods.Thereisaneedtoexplorealternativemethodsofidentifying𝑚whichdisplay

desirableproperties,suchascloseproximitytoanunderlyingdatadistributionor

reproducibility.Wealsoperceiveneedtoevaluatemethodsinsettingsinwhichthenumberof

observedvariablesislargerelativetothesamplesize.Aselaboratedshortly:Currentmethods

canbeinaccurateacrosssomeoralltestingattributes(e.g.correlationstrength,factor

structure,andsamplesize).Theproposedmethodutilizesacross-validatedapproachto

determinewhentheadditionofafactordoesnotsummarizeadditionalcommonvariabilityin

theEFAmodel.Inmorepreciseterms,theproposedmethodcontinuallyincreasesthenumber

offactors(increasing𝑚)untilthedifferencebetweentheobservedandproposedcorrelation

matrix,asmeasuredbyadiscrepancyfunction,increases.Cross-validationhelpsusavoidthe

followingpitfall:Asthenumberoffactors(𝑚)increasestowardsthenumberofvariables,𝑝,in

asinglesample,the'difference'betweentheobservedandimpliedcorrelationmatrixwill

decrease-evenifonlyduetonoise.Byincorporatingacrossvalidatedapproach,weexpect

thattheadditionofafactorthatonlycharacterizesnoiseinonepartitionshouldactually

worsenfitinanother.Whenthisphenomenonisobserved,weproposethatthepreviously

utilizednumberoffactorsisabetterfitforthedataathand.

Page 14: Exploratory Factor Analysis: Model Selection and

6

Thisstudywasmotivatedbyaprojectinvestigatingchronicrhinosinusitis(CRS),asinus

inflammatorydiseaseimpactingapproximately15%oftheUnitedStatesadultpopulation(Tan,

Kern,Schleimer,&Schwartz,2013).CRSiscommonlydefinedbythepresenceof4cardinal

symptomsassociatedwithsinusswellingpresidingforanextendedperiodoftime,butits

diagnosistypicallyalsorequiresobjectiveevidenceofinflammationsuchasbycomputerized

tomography(CT)scanning.Onebarriertotheeffectivediagnosisandtreatmentofthedisease

isthattheconnectionbetweenobjectiveinflammationandpatientsymptomsofsinus

opacificationisnotwellunderstood(Wjetal.,2012).Additionally,obtainingobjectiveevidence

canbedifficultinresource-limitedorhigh-volumesettings,makinganimprovedsymptom-

basedmethodofdiagnosisdesirable.Tobeginaddressingthesebarriers,themotivatingstudy

assessedalargesampleofpatientsfromalarge,integratedhealthsystemforpresenceand

severityofalargenumberofsinus-relatedsymptoms.EFAwasproposedasamethodto

summarizesymptomclusteringandhencefacilitatethesubsequentstudyofsymptom

relationshipswithobjectiveevidenceofCRS(Cole,Schwartz,&Bandeen-Roche,2017).

Intheremainderofourpaper,weexamineexistingmethodsforestimatingthenumber

offactorsinEFAsettingsasabackgroundforthisstudy,discussingsomepreviouslyestablished

strengthsandweaknessesofeach.Thenweintroducetheproposedmethodandprovidea

simulationstudycomparingefficacyofmethodsacrosspotentialcircumstancesincluding

samplesizes,correlationstrengthsanddistributions,aswellasnumberofvariables.Resultsare

comparedacrossmethods,correlationstructures,andsamplesizes.Finally,weapplythe

variousmethodsforselectingthenumberoffactorstotheCRSstudy.TheCRSfindings

themselvesarepresentedinthenextthesispaper.

Page 15: Exploratory Factor Analysis: Model Selection and

7

Background

NotationandAssumptions

Exploratoryfactoranalysisaimstorepresentamultivariatedatadistribution(commonly

throughthecorrelationmatrix,𝑅)accordingtothecommonfactormodel.Thismodelis

characterizedbyparameterswecollectivelylabelas“𝜃”,consistingofa𝑝×𝑚loadingmatrix,𝛬,

𝑝×𝑝inter-factorcorrelationmatrix𝛹,and𝑝×𝑝uniquevariance(diagonal)matrix,𝛥:.Because

each𝜃“implies”exactlyonecorrelationmatrix,𝑃 𝜃 ,whichisthemodel’scharacterizationof

thematrixofcorrelationsamongtheobservedvariables,wecanmakestatementsaboutan

EFA'simpliedcorrelationmatrixwhichhastheform𝑃(𝜃) = 𝛬𝛹𝛬′ + 𝛥?.

Thecommonfactormodelrepresentsthemeasuredvariablesasfunctionsoflatent

(unobserved)factorsaswellasmodelparameters,mostnotablyfactorloadings.Inmatrix

notation(scalarrepresentationhasbeenpreviouslyprovided)𝒚𝒊 = 𝛬𝜁 + 𝛿where𝒚𝒊isthe

(𝑝×1)vectorofobservedvariablesforperson𝑖,𝛬is,again,the(𝑝×𝑚)loadingmatrix,𝜁isthe

(𝑚×1)latentfactorvector,and𝛿isthe(𝑝×1)vectoroferrortermsforeachindividual.Each

element𝛿isassumedtobeindependentfromeachotherandindependentof𝜁aswell.

Commonassumptionsofthismodelarethattheerrortermsaremutuallyindependent

andindependentofthefactorvariables,andthatthecollectionofthefactoranderrorterm

variablesismultivariatenormallydistributed.Thevariancesofvariable-specificresidualterms,

Var(𝜖%),arereferredtoas"uniqueness"termsthatremainunexplainedbythecommonfactors,

leavingVar(𝑌%)-Var(𝜖%)asthe“common”orsharedvariance(“commonality”)inagiven

observedvariablethatisattributabletoitsfactorcontributions.

Page 16: Exploratory Factor Analysis: Model Selection and

8

Fortheremainderofthispaper,𝑁willdenotethesamplesize.

EFAandPCA

ItmaybeworthmakingaquicknoteofthedifferencebetweenEFAandprincipal

componentsanalysis(PCA),tworelatedandcommonlyconfused,yetdistinct,techniques.The

aimofPCAistoreducethedimensionalityofdatawhileretainingasmuchinformation

(variability)aspossiblethroughtheconversionofcorrelatedvariablesintoasetofuncorrelated

linearcombinationsof(oftenstandardized)observedvariablescalledprincipalcomponents.

EFAisamodel-basedanalysisthataimstoidentifytherelationshipbetweenhypothesized

latent,butpotentiallyrelatedfactors,andtheobservedvariables.Whilebothofthesemethods

aredimensionreductiontechniques,eachisusedtoanswerverydifferentquestions,andare

notinterchangeable.

ExistingEFAModelSelectionStrategies

TherearemanycommonapproachestoEFAmodelselection,eachselectingan𝑚based

onsomecriteriathattrytoidentifyan'optimal'numberoffactors,𝑚.Ifanappropriatechoice

for𝑚exists,choosing𝑚 > 𝑚iscalledoverfactoring.Overfactoringmayresultina

misunderstandingofreallatentconstructspresentindataastruefactorsmaybesuperfluously

splitintoseveralfactorsorfactorswhichhave'random'lowloadingvariablesmayappear

(Norris&Lecavalier,2010).Ontheotherhand,choosing𝑚 < 𝑚iscalledunderfactoring.

Underfactoringisconsideredtobeamoreseriousconcern,asobservedfactorswillload

Page 17: Exploratory Factor Analysis: Model Selection and

9

erroneouslyontofactorstowhichtheydon'tbelong,providingmisleadingevidenceforfactor

identity(Norris&Lecavalier,2010).

Severalstrategiesforestimating𝑚relyheavilyonassessingandinterpretingthe

eigenvaluesofacovarianceorcorrelationmatrix-eitheroftheobservedvariables,orthatis

estimatedtoarisefromthecommonfactorportionofthefactormodel.Fortheremainderof

thispaperwewillassumethatanalysesarebasedoncorrelationmatrices:thishasthe

advantageofstandardizingallvariancestoequaloneandallcovariationmeasurestolieona-1,

1range.Ineithercase,theeigenvalueofafactorisrepresentativeoftheamountofvariability

thefactorcontributestothesumofvariablevariancesinmostfactoringmethods(Norris&

Lecavalier,2010).Computationally,afactor'seigenvaluewithrespecttotheobservedvariable

correlationmatrix(whenfactorsareorthogonal)isitssumofsquaredloadings(Norris&

Lecavalier,2010).

Therearegraphicalmethodsthatemploycorrelationmatrixeigenvaluesaswell,suchas

Cattell’sscreetest(Cattell,1966).Thistestconsistsofexaminingaplotofeigenvaluesversus

factorindex(1,2,3,…)tovisuallyselectthenumberoffactorstoextract.Ideally,therewillbe

aninitialsteepdropineigenvaluesfollowedbyaclearlevelingoutinthetrendofremaining

decrease,creatingaclassic'elbow'shape(Cattell,1966).Thenumberofeigenvaluesbeforethe

elbowistheproposednumberoffactors.Whileintuitive,aclearproblemwiththistechniqueis

thatthereisnotalwaysaclear‘elbowshape’,leavingpotentialforvaryinginterpretationby

differentinvestigators.

Page 18: Exploratory Factor Analysis: Model Selection and

10

OneofthemostpopularmethodsforestimatingthenumberoffactorsistheKaisertest

(K1),whichisbasedon𝑒,, 𝑒J, . . . 𝑒L—theeigenvaluesoftheobservedvariablecorrelation

matrix.K1choosesthenumberoffactorstobeequaltothenumberofeigenvaluesgreater

than1,𝑚 = 𝛴,L𝐼(𝑒# > 1)(Cattell,1966;Kaiser,1960).Thismethodprovidesanintuitive

understandingoffactorretentionmethodsbywhichoneretainsthefactorsaccountingfor

morethanasinglestandardizedvariableworthofvariability(Velicer,Eaton,&Fava,2000).It

tendstooverestimatethenumberofcomponentsinPCAsettings,however,potentiallydueto

randomnoisepushing'borderline'eigenvaluesoverthe'threshold'of1(Veliceretal.,2000;

Zwick&Vejicer,1984).

ApopularextensionoftheK1testisparallelanalysis,bywhichobservedeigenvaluesare

compared—typically,plotted—againstameasureofcentraltendencyforeigenvaluesas

simulatedfrommanyrandomlygeneratedindependent(noise)correlationstructures

(Humphreys&Jr,1975;Timmerman&Lorenzo-Seva,2011).Themeasureofcentraltendency

maybemean,median,possiblysomeotherpercentile,orevenasinglerealizationofsimulated

eigenvalues(Humphreys&Jr,1975).Thefactorsselectedisthenumberofobserved

eigenvaluesthataregreaterthanthesimulatedeigenvalues(Horn,1965).Parallelanalysishas

beenconsideredtobeoneofthemostpowerfulandaccuratemethodsofdeterminingthe

numberoffactorstoextract,andhasbeenshowntodisplaybetterperformancethantheK1,

scree,andothermethodswhilebeingrelativelyeasytounderstand(Veliceretal.,2000;Zwick

&Vejicer,1984).Parallelanalysisissensitivetosamplesizehowever,withincreased𝑁

commonlyresultinginmorefactorsbeingretained,potentiallyoveranoptimalamount(Velicer

etal.,2000).

Page 19: Exploratory Factor Analysis: Model Selection and

11

Othermethodsdonotexplicitlyconsidereigenvaluesandinsteadtakeamore

traditionalmodelselectionapproach.Suchmethodsincludeevaluatinglikelihoodsand

functionsoflikelihoods(Preacher,Zhang,Kim,&Mels,2013).Thelikelihoodutilizedinfactor

analysis,oncelogarithmicallytransformed,isgivenas

𝑙𝑜𝑔(ℒ) = − ,J𝑁(𝑙𝑜𝑔(|𝑃 𝜃 |) + 𝑡𝑟(𝑃 𝜃 V,𝑅))wheneachofthe𝑁observationsaretakento

benormallydistributedandindependentofoneanother(Akaike,1987).Likelihoodratiotests

canbecreatedwhichassessthenullhypothesisthattheobserveddataaregeneratedfrom

factormodelwithaspecific𝑚(Preacheretal.,2013).Typically,onecontinuestoincrease𝑚

untilthefactormodelfitsthedata(lowest𝑚forwhichthenullisnotrejectedbythelikelihood

ratiotest).Unfortunately,thismethodcomeswithseveraldrawbacks.Largesamplesizescause

even'small'discrepanciesbetweenmodelandobserveddatatocausearejectionwhileinsmall

samplesizesituations,largediscrepanciesmaynotbeidentified,leavingperformancetobe

determinedlargelybythegivensamplesize(Norris&Lecavalier,2010).

Inadditiontolikelihoodratios,variousextensionsincludingtheAICandBICaswellas

BICderivativessuchassamplesizeadjustedBIC(SSBIC)andempiricalBIC(EBIC)havebeen

frequentlyutilizedinafactoranalysismodelselectionframework(Hiroseetal.,2011;Lopes&

West,2004;Press&Shigemasu,1999).

Forthekorthogonal-factormodel,𝐴𝐼𝐶(𝑚) = −2×𝑙𝑜𝑔(ℒ(𝑚)) + [2𝑝(𝑚 + 1) − 𝑚(𝑚 −

1)](Akaike,1987),𝐵𝐼𝐶(𝑚) = −2×𝑙𝑜𝑔(ℒ(𝑚)) + log(𝑛)[𝑝(𝑚 + 1) − 0.5𝑚(𝑚 − 1)](Lopes&

West,2004),and𝑆𝑆𝐵𝐼𝐶 = −2×𝑙𝑜𝑔 ℒ 𝑚 + 𝑚×log(𝑛 + 2 24)(Sclove,1987),where𝑚is

thenumberoffactors,𝑝isthenumberofobservedvariablesand𝑛isthenumberof

Page 20: Exploratory Factor Analysis: Model Selection and

12

observationsutilized(Akaike,1973).Althoughcloselyrelated,thesemethodscanproducevery

differentestimatesof𝑚(Hiroseetal.,2011).Thisisnotsurprising,becauseAICisdirected

towardoptimizingprediction,whereasBICwasdesignedtoidentify“true”modelcomplexity

(Akaike,1973;Schwarz,1978).Inaddition,otherextensionsofAICandBIChavebeen

produced,andappliedinthecontextoffactoranalysisincludingmethodssuchasgeneralized

BIC(Hiroseetal.,2011).

Bootstrappingmethodshavebeenproposedtoprovideanalternativemethodof

determiningthenumberoffactorswhileproducingameasureofuncertainty(Thompson,

1988).Somesuchproceduresdrawbootstrapsamplesandthenestimate𝑚foreachsample

usingacommonapproachsuchasparallelanalysis:bydoingthis,onecouldobtainabootstrap

intervalfor𝑚,whichcouldinformanappropriaterangeofvaluesfor𝑚.Insimilarfashion

bootstrapintervalsforcommonality,loadingsandinter-factorcorrelationmeasuresalsocould

beprovided.Other,recentmethodologicalworkhasshowntheefficacyofcrossvalidated

methodologiesaswell.A“bi-cross-validation”techniqueproposedbyOwen&Wang(2016)

randomlyholdsoutsubmatricesofthedatamatrix,againstwhichfactormodelpredictions

developedontheremainingcomponentsofthedatamatrixaretested.Thismethodhasbeen

showntooutperformavarietyofothermethods,evenparallelanalysis,undercertain

simulatedsituations(Owen&Wang,2016).Thismethodevaluatespredictionswithrespecttoa

submatrixoftheobserveddatamatrix,incontrasttothemethodweshortlypropose,which

evaluatespredictionswithrespecttoacorrelationmatrixbasedonasubsetofthesampled

observations.Italsousesadistinct,multi-stepproceduretodeveloppredictionsandhasa

Page 21: Exploratory Factor Analysis: Model Selection and

13

distinctgoalofrecoveringtheunderlyingfactorprediction,ratherthana“true”numberof

factors,m(Owen&Wang,2016).

NovelMethodforModelSelection

Intheordinaryleastsquaresmethodforestimatingthefactoranalysismodel,the

discrepancybetweenandobservedandfactor-impliedcorrelationmatricescanbemeasuredby

thefollowingdiscrepancyfunction(Lee,Zhang,&Edwards,2012):

𝑓 =12TRACE( 𝑅 − 𝑃 𝜃 )J ,

wherethetracefunctionisthesumofdiagonalmatrixentries.Wecanassessfactormodelfit

bytrackingthevalueof𝑓atvaryingnumbersoffactors.Iffittingandassessmentareconducted

withinone,samedataset,weexpectthat𝑓willimprove(decrease)as𝑚,thenumberof

factors,increases.Whencross-validationisapplied,however,withfittingconductedinone

subsetandtestingappliedinanother,weexpectthat𝑓willimproveas𝑚increasestoapoint,

butthenworsenasoneexceedsthedimensionneededtocharacterizethetruedata

distribution.Thespecificprocedureweproposeisasfollows:

Procedure:

1. Randomlysplitdataintoatrainingandtestingset(ℎ = 2)

2. Calculatecorrelationmatrixfortrainingandtestingset

Page 22: Exploratory Factor Analysis: Model Selection and

14

3. Fitfactormodels(using𝑚from2to𝑝)usingthetrainingmatrix

4. Computethevalueoftracefunctioncomparingtheimpliedcorrelationmatrixfromthe

trainingdatatoobserveddatatestcorrelationmatrix.

5. Findwhere𝑓increases,stopthere.Ourchoice(𝑚)isthelast'step'before𝑓increases.If𝑓

increasesindefinitely,ourbestestimateof𝑚willbe𝑝,indicatingthatafactormodelis

notaparsimoniousfitforthedataathand.

SimulationStudy

Weassessedtheeffectivenessofourproposedmethodrelativetocommonexisting

modelselectionmethodsinasimulationstudy.Randomsamplesfromfactormodelswith

known𝑚weregeneratedandtheproportionofsamplesforwhich𝑚 = 𝑚wasestimatedfora

varietyofcorrelationstructuresarisingfromdifferentfactormodels.Factormodelswerefit

usingtheordinaryleastsquares(OLS)methodasimplementedbythepsychRpackage(Leeet

al.,2012;RCoreTeam,2016;Revelle,2017).Itwasouraimtocomparefindingsover

correlationstructuresvaryinginseveraldifferentaspectsincluding:strengthsofloadings,

numberofobservedvariables,anddistributionofloadingsamongfactors.Theprocedure

outlinedbelowwasutilizedinordertoprovideloadings/correlationmatriceswithboth

structureandelementsofrandomnessinthehopesofapproximatingreal-worldsituations.

Ninefactorstructureswereutilized,sixfromasimulatedtheoreticalframeworkand

threeincorporatingtheimpliedcorrelationmatricesfromtheCRSEFAwhichmotivatedthis

Page 23: Exploratory Factor Analysis: Model Selection and

15

study.EFAstructuresforfive-factormodels(representingassessmentsatbaseline,6month

followup,16monthfollowup),wereutilizedinordertoprovidestructuretothissimulation

studywhichapproximatedacomplexempiricalscenario,whilebeingdeterminedinadvance

andthusfeasiblycapableofmodelselectionmethodaccuracy.Outofthesixtheoretical

matrices,three“blocked”matrices(namedweak,moderate,andstrong)contained25variables

and5factors,witheachfactorhavingexactly5variablesloadingontoitweakly,moderately,or

strongly(meanloadings=0.38,0.56,and0.71respectively)andremainingloadingsonly

minimally(meanloadings=0.15,0.11,and0.07respectively).A“moderate,lowdimensional”

matrixrepresentedamodelincluding5factorsand11observedvariables,with3variables

loadingmoderately(meanloadings=0.56)ontothefirstfactorwhiletheremainingfactors

contained2uniquevariablesloadingmoderately;minimalloadingswere0.11onaverage.A

“moderate,differentdimensional”matrixcontained5factorsand27variables,withfactors

loadingon10,7,5,3,and2variablesrespectivelywithmeanloadingof0.56whileminimal

loadingswere0.11onaverage.A10-factor,100variablematrixwasalsoutilizedwith10

variablesloadingheavilyontoeachfactor(meanloadings=0.037)and90variablesloading

minimally(meanloadings=0.007).

Theoreticalmatricesdescribedaboveweregeneratedasfollows.Bytreatingeachof𝑝

rowsoftheloadingmatrix𝛬asan𝑚dimensionalDirichletvectorwithparameters𝛼#,,, . . . 𝛼#,*

wecangenerateavalidloadingmatrixforwhichthesumofsquaresforeachrowislessthanor

equalto1(avoidingaHeywoodcase),andeachfactormatrixentryisbetween-1and1.Wecan

thencomputethecommonalitiesas𝛥: = 𝐼 − 𝑑𝑖𝑎𝑔(𝛬𝛬′).Thematrix:𝛬𝛬′ + 𝛥: representsour

simulatedcorrelationmatrixgeneratedfromaknownfactorstructure:Figures1and2illustrate

Page 24: Exploratory Factor Analysis: Model Selection and

16

theresultforthestrong“blocked”correlationdesign,andothersareillustratedinthe

appendix.Thisfactorstructureiscompletelydeterminedbythechoiceof𝑝,𝑚,andallDirichlet

parametervalues.Althoughthisapproachissimple,itiscapableofgeneratingawiderangeof

structures(seeappendix).

Foreachcorrelationmatrix,simulationrunswereconductedforsamplesizesofN=100,

300,500,700,1000.Ineachrun,samplesoftherespectivesize(N)weredrawnfroma

multivariatenormaldistributionwithmeanzeroandtherespectivecorrelationmatrixasits

covariancematrix.Usingthesesimulatedsamples,K1,parallelanalysis,BIC,samplesize

adjustedBIC,andtheproposedmethodwereappliedtodeterminethenumberoffactors.For

eachrun,100repetitionswereconducted,andthenumberofsuccessfulestimates

(alternatively,thepercentofcorrectestimations)wererecorded.

Results

Theproposedtracefunctionmethodperformedincreasinglywellwithincreasing

samplesizes.Insomecases(e.g.'moderate'blockedstructure)accuracyincreasedfrom

approximately5%at𝑁 = 100to100%at𝑁 = 1000,whilethedifferenceinaccuracyofthe

other,standardmethodsvariedlessacross𝑁(Table1).Weakerstructuresweremoredifficult

fortheproposedmethodtoidentify.Forthe'moderate,lowdim'matrix,accuracyvariedfrom

5to35%asoneincreasedthesamplesizefrom100to1000(Table1).

Whilethetracefunctionimproveditsestimationfrom100samplesto1000foreach

proposedcorrelationmatrix,theK1methodaswellastheSSBICmethoddidnotonceimprove

Page 25: Exploratory Factor Analysis: Model Selection and

17

fromthesamechangeinN(Table1).MeanwhiletheBICmethodimprovedonlyonceinthe9

proposedscenarioswhiletheparallelanalysisimprovedintwoofthe9(Table1).Overall,all

testedmethodsperformedverywellwithstrongercorrelationstructures,althoughBIC

performedcomparativelyworseintheCRSand10-factorsettings(Table1).

Therewerecaseswherestandardmethodsperformedoutstandinglywell.Inthestrong

correlationsimulations,theSSBIC,K1,andparallelanalysisapproachesperformedat100%

accuracyacrossN.SimulationsinvolvingthethreeCRScorrelationstructuresandthetenfactor

structuresawperfectaccuracyacrossNwithSSBICandK1methodswhilethetracemethod

achievedatleast97%accuracyatNof1000(Table1).Inthemoderatelowdimensionaland

moderatedifferentdimensionalmatricesatlowN,allmethodsperformedpoorly,withthe

tracefunction(N=100,accuracy=5%),andK1methodsachievingthebestresults(N=100,

accuracy=16%)ineachrespectivescenario(Table1).

ApplicationtotheCRSStudy

Ouranalysisaddresses3535GeisingerHealthSystempatientswhowerefollowedfora

durationof16months,eachofwhomwasselectedusingastratifiedsamplingmethoddesigned

tooversampleracialminoritiesandthosewithahighpropensityforCRSviaInternational

ClassificationofDiseases(ICD-9)andCurrentProceduralTerminologydefinedattributesin

electronichealthrecorddata(Hirschetal.,2017;Tustinetal.,2017).Eachofthesepatients

respondedtothreequestionnairescontaining37commonquestionsatbaseline,6months,and

16months.Ofthese37questions,21inquiredaboutthepresenceandseverityofCRSnasal

andsinuswhiletheremainingquestionsassessedpresenceofasthma,allergy,earand

Page 26: Exploratory Factor Analysis: Model Selection and

18

constitutionalsymptoms.Allofthequestionsinquiredaboutthefrequencyofexperiencinga

symptom,orthefrequencyofbeingbotheredbyasymptominagiventimespanandeach

questionwasansweredonthesameLikertscale(1=never,2=onceinawhile,3=someofthe

time,4=mostofthetime,or5=allthetime).Thesequestionswerespecificallydesignedto

predictsinusopacificationlocationandseverityusingonlyself-reportedsymptoms.Polychoric

correlationswerecalculatedfromeachofthesesurveys,andeachofthemethodsstudiedin

oursimulationstudywasappliedtodeterminethenumberoffactorstoextract.

K1andparallelanalysisindicated6,5,7,and5,5,6factorsforbaseline,6monthand16

monthquestionnairesrespectivelywhileBIC(15,16,14),SSBIC(17,17,20),andtrace(13,13,

16)suggestedsubstantiallyhighernumbersoffactorsforthesamethreequestionnaires.Scree

plotswerealsoexamined,whichappearedtosuggest5factorsineachsurvey.Thetrace

function(Figure3)mayindicatesomeambiguityinfactornumberchoice,achievingaminimum

at13factors,butonlymodestslopebelowandabovethisnumber–muchlessthaninother

simulatedscenarios(seeappendix).Weexamineda13-factorsolutionforthebaselineCRSEFA,

extractedusingthesameOLSmethodandobliminrotationasinthe5factorEFAs,inorderto

examinethequalitativedifferencesdrivenbythedifferencesinthenumberofextracted

factors:Twofactors’interpretationswereinvariant--thefacialpainandpressuresymptomand

smelllosssymptomfactors;otherfactorshowever,werereducedtoidentifyingsymptoms

relatedtosinglephenomenaororgans(nasalcongestion,ear,eye,fatigue,etc.)andaddressing

bothpresenceandseverity(whenpresentinthesurveys).

ThesubstantiveCRSfactoranalysiswasnotapproachedfromadogmaticfactormodel

vantagepoint,butratheraimedtoidentifybiologicallyfeasiblesymptomclusterswithinthe

Page 27: Exploratory Factor Analysis: Model Selection and

19

questionnaireEFAs.Assuch,screeplotsandparallelanalysiswereprioritized,andinaddition

wasviewedasbeingparsimoniouscomparedtoothermethods.Althoughparallelanalysisdid

notsuggestthesamenumberoffactorsforeachofthequestionnaires,itwasdecidedtodoso

forinterpretabilityandcomparisonreasons.FurtherelaborationisprovidedintheCRSEFA

chapter.

Discussion

Theresultsofoursimulationstudyprovidesomeinsightintotheefficacyofthe

proposedmethodaswellastheeffectivenessofother,commonlyutilizedmethods

incorporatedintothisstudy(K1,parallelanalysis,SSBIC,BIC).Itwasshownthat,withincreasing

samplesizes,thetracefunctionmethodperformedprogressivelybetter,eclipsingthe

performanceofothermethodsinmanytestedcircumstances(Table1).Interestingly,theother

methodsdidnotappeartovaryinperformancewithincreased𝑁acrossthemajorityoftested

circumstances(Table1).

Inthesesimulatedmatrices,thetracefunctionmethodoutperformedthethree

standardmethodswhenthestrengthoftheunderlyingfactorswassmall(Table1).The𝑁

requiredtoattainsuperiorperformancevaried,butseemedtobelessforweakercorrelation

structures(Table1).Forstrongercorrelationstructures,eachofthestandardmethodstested

producedconsistentlyveryaccurateresults(Table1).InallthreeCRSimpliedcorrelation

matrices,thesamplesizeadjustedBICaswellastheeigenvaluegreaterthan1method

achievedperfectaccuracyacrossallvaluesof𝑁,whilethestandardBICmeasureperformed

Page 28: Exploratory Factor Analysis: Model Selection and

20

verypoorly,withaccuraciesconstantlybelow10%.Althoughthetracefunctiondidnotattain

100%accuracyacrossallN,itimprovedfrom61,46and34%accuracyat𝑁 = 100to100,97,

and99%accuracyat𝑁 = 1000forbaseline,6month,and16monthCRSmatricesrespectively

(Table1).

TherehavebeenmanymethodsdevelopedtoselectthenumberoffactorsinanEFA

setting,allwithslightlydifferentinterpretations,strengthsandweaknesses.Frequently,

practitionersconductingEFAtreatthefactormodelliterallyandstrivetouncoverthe'true'𝑚

andevaluatetheirrespectiveloadingsaccordingly(Preacheretal.,2013).Webelievethisgoal

oftenisunclear,andpotentiallyunhelpful,fortworeasons.First,inpractice,researchersapply

severaldifferentcriteriatothetargettheyseek,includingaliteralnumberoffactorsunderlying

theobserveddata,themostinterpretablenumberoffactors,andvariousothercriteria.

Second,similarlytolinearregressions,modelsareonlyapproximate-withthatinmindwewill

understandthatwewillneverobserve'truth',onlyestimationsandapproximations(Preacher

etal.,2013).Assuch,afactor,onceidentified,isnotnecessarilyrealbutatbestauseful

approximationtoreal(biologicalorotherwise)phenomenon.

Searchingfortheliteralnumberoffactorsisarguablyinfeasible,outsideofextremely

controlledsettings,becauseobservedvariableslikelyareaffectedbyahugenumberof

contributing'factors'(latentorotherwise).Considerthenumberoffactorsunderlyingan

individual’stakehomeincome.Therearelikely'strong'factorsincluding:age,education,work

ethicandindustry.Buttherealsoisaprofusionof‘weak'factorssuchas,appearance,senseof

style,andvoicequalitywhichmayberelativelyunmeasurablebutcoulddrivestarkdifferences

inincomeamount.Notwithstandingthatfactormodelstypicallymustgreatlysimplifydata

Page 29: Exploratory Factor Analysis: Model Selection and

21

interrelationships,theestimationoffactorpresenceandidentitycanbeanextremelypowerful

tool,providinginsightintodiseasepathologiesandsymptomclassifications.Inthemotivating

study,EFAwasusedtobetterunderstandtherelationshipsbetweenCRSsymptomsandlatent

factors.Itwashypothesizedthatthesefactorsreflectedpathobiologicalphenomena,shedding

lightontowhatsymptomquestionresponsesaretrulymeasuringinthispopulation.

Asseeninthisstudy’sapplicationtotheCRSdata,differingmethodscouldpotentially

estimateawiderangeoffactors.Wehypothesizethatthisdiscrepancyreflectsthediffering

objectivefunctionsemployedbythevariousmethods.TheK1andparallelmethodsare

designedtoestimatethedimensionalityneededtorepresentsharedcovariationamongone’s

items,withoutexplicitreferencetoafactormodel.BICandthetracefunctionbothexplicitly

incorporatethefactormodelspecificationindeterminingfittotheempiricalcovariance

matrix—henceaddressFAassumptionsinadditiontodimensionality.Itmaynotthenbe

surprisingthatthesemethodsrequireahigherdimensionalitytoreproducetheempiricaldata

structure.Weconsiderthesensitivityofthedimensionalitychoicetothefactormodel

assumptionstobeinstructiveinthepresentcase.IntheCRSapplication,simpledimensionality

wasofinterest,ratherthanconsistencywithafactormodelperse,makingprioritizationof

thosemethodstunedtothisaswellasabiologicallymeaningfulinterpretationappropriate.

Whenextractingthe13factorsconsistentwiththetracemethodfromthebaselineCRSstudy,

moreover,themajorityoffactorsextractedappearedas‘singlesymptom’factors,addressing

singlephenomenaororgans.Theunderstandingofthesefactorsisnotparticularlyinteresting

fromabiologicalormedicalstandpoint,butmayrathersuggest‘latent’constructsassociated

withpatientresponses(similarsymptomquestionsareansweredsimilarly),whileother

Page 30: Exploratory Factor Analysis: Model Selection and

22

methodssuchasparallelanalysisprovideamoredesirableunderstandingoflatentconstructs

associatedwithactualpatientsymptoms.

Sincethe'true'numberoffactorsmayhavequestionablemeaninginsome

circumstances,somehavearguedthatsearchingforthe'optimal'numberoffactorsmaybethe

bestcourseofaction,basedonacriterioncenteredaroundachievingaspecificgoalsuchas

verisimilitudeorappearanceofreasonabletruth(Preacheretal.,2013).Othercriteriafor

“optimal”numbersoffactorsmayincludegeneralizability,theabilitytoattainsimilarresultson

anindependentdatacollectedfromthesamepopulation(Myung,2000),oraccurateand

precisedataapproximation(Owen&Wang,2016).

Themethodpresentedhereseekstoapproximateanunderlyingnumberoffactors

withoutforgoinggeneralizability.AfrequentproblemintheEFAframeworkisthatEFAsfrom

onesamplemaynotnecessarilymatchanotherEFAcarriedoutonanindependentsamplefrom

thesamepopulation.Byincorporatingandembracingtheideaofcrossvalidationand

generalizabilityfromthebeginning,thereisapossibilitythatcrossvalidatedapproachesmay

removevariabilityinthechoiceof𝑚resultinginmoreconsistentresultsacrossstudies

(Friedman,Hastie,&Tibshirani,2001).Theseconjectureswarrantfurtherresearchasthereare

manycasesandtypesofdatainwhichEFAframeworkswouldbeutilized.

AninterestingobservationfromoursimulationsisthatstandardmethodsSSBIC,BIC,

andK1,seemtoproducethesameerrorrateregardlessof𝑁(Table1).Thisempiricalresult

suggeststhatthesemethodsmaynotbeconsistentestimatorsof𝑚inthesetestedsettings.

Thetracefunctionontheotherhandnearlymonotonicallyincreasedinaccuracywith

Page 31: Exploratory Factor Analysis: Model Selection and

23

increasing𝑁(Table1).Inaddition,byutilizingah-foldcrossvalidatedprocedure,accuracymay

increasemoresteeplyasafunctionofNasℎ > 2hasbeenshowntoimproveestimation

accuracyandconvergence(Friedmanetal.,2001).Severalstudieshavesoughttoshow

consistencyintheEFAframework,astraditionalmethodssuchasBICmaynotbeconsistentfor

𝑚inallsettings(Bai&Ng,2002).Utilizingagreaterrangeof𝑁heremayenableustoshow

someempiricalconsistencywithsomeofthemethodsused,althoughonlythetracefunction

methodappearedtoapproachthisacrossavarietyofsampleandcorrelationstructuresettings.

Futurework

Oursimulationstudyprovidessomeinsightintotheperformanceofourcrossvalidated

discrepancyapproachtoEFAmodelselection.However,onlyasmallsamplingofpossible

correlationstructureswereutilized,allwithsimilartrue𝑚(𝑚 = 5or𝑚 = 10).Althoughwe

hypothesizethatthesignofanyloadingwillnotchangetheaccuracyofanyofthemethods

utilized,itisimportanttonoteallofthesimulatedmatricescontainedonlypositivefactor

loadings.

Expandingtheproposedmethodtoimplementh-foldcrossvalidationisalogicalnext

step.Currently,thedataissplitintotwoandmischosentobethenumberoffactorswhich

minimizesthediscrepancybetweenfitmodelandheldoutdata.Typically,allowingforan

increasednumberofdatapartitionsleadstomoreaccurateestimatesoftheerrorrate,andin

thiscase,mayproducemorestableestimatesof𝑚potentiallyatlowervaluesof𝑁aswell

(Friedmanetal.,2001).

Page 32: Exploratory Factor Analysis: Model Selection and

24

Additionally,itisimportanttoassesstheestimationbiasinallmethodsdiscussed.

Currently,accuracywasassessedonlyaswhetherornotthecorrectnumberoffactorswas

identified.Consideringtheimportanceofunderstandingamethod’stendencytooverorunder

factor,weplanalsotoassessthemagnitudeanddirectionofeachmethod’smisses.Ifover

factoringis,generally,betterthanunderfactoringforinstance,wemaywishtopenalize

overestimationof𝑚lessthananyunderestimation.Inaddition,missingthenumberoffactors

by1islikelylessdeleteriousthanby4,forexample,sothemagnitudebywhichanygiven

methodmis-estimatedshouldbeconsideredinfuturework.

Page 33: Exploratory Factor Analysis: Model Selection and

25

TablesandFiguresTable1.NumberofcorrectfactornumberassessmentsbysamplesizeadjustedBIC(SSBIC),standardBIC(BIC),Kaisereigenvaluesgreaterthan1rule(K1),parallelanalysis(PA),andtheproposedmethod(trace)outof100simulationreplicates.

Simulatedsamples

CorrelationStructure

SSBIC BIC K1 PA Trace

100 Strong 100 98 100 100 79

300 Strong 100 99 100 100 97

500 Strong 100 99 100 100 99

700 Strong 100 98 100 100 97

1000 Strong 100 98 100 100 99

100 Moderate 25 0 92 99 5

300 Moderate 20 0 83 100 69

500 Moderate 23 0 81 100 98

700 Moderate 16 0 90 99 98

1000 Moderate 24 0 86 100 100

100 Weak 2 0 38 75 0

300 Weak 0 0 34 89 30

500 Weak 0 0 44 79 72

700 Weak 1 0 40 82 89

1000 Weak 0 0 33 82 97

100 Moderate,lowdim 0 0 0 0 5

300 Moderate,lowdim 0 0 0 0 8

500 Moderate,lowdim 0 0 0 0 19

700 Moderate,lowdim 0 0 0 0 13

1000 Moderate,lowdim 0 0 0 0 35

100 Moderate,difdim 0 0 16 0 2

300 Moderate,difdim 0 0 14 0 11

500 Moderate,difdim 0 0 11 0 41

700 Moderate,difdim 0 0 14 0 54

1000 Moderate,difdim 0 0 14 0 75

100 CRSibl 100 7 100 100 61

300 CRSibl 100 5 100 100 100

500 CRSibl 100 7 100 99 100

Page 34: Exploratory Factor Analysis: Model Selection and

26

700 CRSibl 100 3 100 100 100

1000 CRSibl 100 9 100 99 100

100 CRSi6m 100 0 100 99 46

300 CRSi6m 100 0 100 100 94

500 CRSi6m 100 0 100 98 98

700 CRSi6m 100 0 100 97 95

1000 CRSi6m 100 0 100 99 97

100 CRSi16m 100 0 100 95 34

300 CRSi16m 100 0 100 95 96

500 CRSi16m 100 0 100 91 98

700 CRSi16m 100 0 100 87 96

1000 CRSi16m 100 0 100 91 99

100 10factor 100 0 100 95 34

300 10factor 100 0 100 95 96

500 10factor 100 0 100 91 98

700 10factor 100 0 100 87 96

1000 10factor 100 0 100 91 99

Page 35: Exploratory Factor Analysis: Model Selection and

27

Table2Estimatednumberoffactorsforeachofthe3questionnaires(baseline,6month,and16monthfollowups)fromcommonlyutilizedmethodsincludingKaisereigenvaluesgreaterthan1rule(K1),parallelanalysis(PA),standardBIC(BIC),empiricalBIC(EBIC),samplesizeadjustedBIC(SSBIC),andtheproposedmethod(TRACE).

Method Baseline 6-monthfollowup 16-monthfollowupK1 6 5 7PA 5 5 6BIC 15 16 14EBIC 8 8 8SSBIC 17 17 20TRACE 13 13 16

Page 36: Exploratory Factor Analysis: Model Selection and

28

Figure1.Strong“blocked”factorloadingsutilizedtocreatethecorrelationmatrixinthesimulationstudy.

Page 37: Exploratory Factor Analysis: Model Selection and

29

Figure2.Correlationmatrixgeneratedfromthestrong“blocked”factorloadingsmatrix.

Page 38: Exploratory Factor Analysis: Model Selection and

30

Figure3Tracefunction’sdiscrepancyvaluesonthebaselineCRSdata.Verticallinedenotestheminimumachievedat13factors.

Page 39: Exploratory Factor Analysis: Model Selection and

31

Appendix

AdditionalFigures:

FigureA1.Theutilizedstrongloadingmatrix,createdusingtheDirichletsimulationprocess,consistingof5factorsand25variables.

Page 40: Exploratory Factor Analysis: Model Selection and

32

FigureA2.Theutilizedstrongcorrelationmatrix,createdfromthecorrespondingstrongloadingmatrix

Page 41: Exploratory Factor Analysis: Model Selection and

33

FigureA3.Theutilizedmoderateloadingmatrix,createdusingtheDirichletsimulationprocess,consistingof5factorsand25variables.

Page 42: Exploratory Factor Analysis: Model Selection and

34

FigureA4.Theutilizedmoderatecorrelationmatrix,createdfromthecorrespondingmoderateloadingmatrix

Page 43: Exploratory Factor Analysis: Model Selection and

35

FigureA5.Theutilizedweakloadingmatrix,createdusingtheDirichletsimulationprocess,correspondingfrom5factorsand25variables.

Page 44: Exploratory Factor Analysis: Model Selection and

36

FigureA6.Theutilizedweakcorrelationmatrix,createdfromthecorrespondingweakloadingmatrix

Page 45: Exploratory Factor Analysis: Model Selection and

37

FigureA7.Theutilizedmoderate/lowdimensionalloadingmatrix,createdusingtheDirichletsimulationprocess,consistingof5factorsand11variables.

Page 46: Exploratory Factor Analysis: Model Selection and

38

FigureA8.Theutilizedmoderate/lowdimensionalcorrelationmatrix,createdfromthecorrespondingmoderate/lowdimensionalloadingmatrix

Page 47: Exploratory Factor Analysis: Model Selection and

39

FigureA9.Theutilizedmoderate/differentdimensionloadingmatrix,createdusingtheDirichletsimulationprocess,consistingof5factorsand27variables.

Page 48: Exploratory Factor Analysis: Model Selection and

40

FigureA10.Theutilizedmoderate/differentdimensionalcorrelationmatrix,createdfromthecorrespondingmoderate/differentdimensionalloadingmatrix

Page 49: Exploratory Factor Analysis: Model Selection and

41

FigureA11.Theutilizedtenfactorloadingmatrix,createdusingtheDirichletsimulationprocess,consistingof10factorsand100variables.

Page 50: Exploratory Factor Analysis: Model Selection and

42

FigureA12.Theutilizedtenfactorcorrelationmatrix,createdfromthecorrespondingtenfactorloadingmatrix

Page 51: Exploratory Factor Analysis: Model Selection and

43

FigureA13.Tracefunction’sdiscrepancyvaluesonthestrong“blocked”factorloadingmatrix.Verticallinedenotestheminimumachievedat5factors.

Page 52: Exploratory Factor Analysis: Model Selection and

44

SimulatingFactorModelCorrelationMatrices

Motivation

Insimulationsoffactoranalyses,itisimportanttobeabletorandomlygeneratevalid

correlationmatriceswhichstemfromsomeknownfactormodelinordertoassessmodel

selectionmethodsandotherattributesoffactoranalysisprocedures.

Computation

Acoreideaoffactoranalysisisthatwecanexplainvariabilityinourobserveddataby

meansofasmallernumberofunderlying,latentfactors,whichareassociatedwithobserved

variables.Mathematicallyspeaking,ourcorrelationmatrix,𝜌,canbebrokendownassuch:

𝜌 = 𝛬𝛹𝛬′ + 𝛥?

where𝛬isthe𝑝×𝑚matrixoffactorloadings,𝛹isthe(𝑚×𝑚)factorcorrelationmatrix,and

𝛥?isthematrixofuniquevariances(𝑝×𝑝diagonalmatrix).

Thiscanbefurtherwrittenas𝜌 = 𝛬𝛹𝛬′ + (𝐼 − diag(𝛬𝛹𝛬′))

Ifweareinterestedingeneratingarandom,structuredloadingsmatrix,thefollowing

procedureisproposed.Wecantreateachrow𝑖of𝛬asaDirichlet(𝛼#,,, . . . , 𝛼#,*)×Beta(𝑥#, 𝑦#)

whereeach𝛼#,% issomeproposedweightastohowstrongwewouldlike(onaverage)variable𝑖

toloadoneachfactor.

Theseconstraintsensurethat𝛬willbeavalidloadingsmatrixas:

Page 53: Exploratory Factor Analysis: Model Selection and

45

• Allloadingsarebetween-1and1

• Thesumofsquaredloadings(foravariable)islessthan1(avoidingaHaywoodcase)

Now,wewilllet𝜓bethe𝑚dimensionalidentitymatrix(allfactorsareorthogonal).

𝜌 = 𝛬𝛹𝛬′ + (𝐼 − diag(𝛬𝛹𝛬′))

= 𝛬𝛬′ + (𝐼 − diag(𝛬𝛬′))

Weknow𝛬𝛬′ispositivesemi-definiteaseachelementof𝛬isarealnumber.

Wealsoknow,apositivesemi-definitematrixplusamatrixofthesamedimensionwith

allnon-negativeentriesisalsopositivesemi-definite.Itfollowsthat𝜌ispositivesemi-definite.

Inaddition,becauseeachentryof𝛬isin[0,1],weknoweachelementof𝛬𝛬′willbebetween

[0,1]whilethe+(𝐼 − diag(𝛬𝛬′))termensuresthatthediagonalof𝜌areall1.Thus,𝜌isavalid

correlationmatrix,uniquelydeterminedby𝛬.

Use

Thisideaofbeingabletoconstructrandomcorrelationmatricesisimportanttocertain

simulationstudieswhereonemustgeneraterandomyetvalidcorrelationmatricesinwhichthe

truenumberoffactorsisknownandfixed.Becauseanycorrelationmatrixstemmingfromthis

methodwouldinherentlybeabletoperfectlydecomposeintothetruenumberoffactors,

samplingnoiseshouldbeadded.Thiscanbeaccomplishedbysamplingfromamultivariate

distribution(inthiscasemultivariatenormal)withthespecifiedcorrelationmatrix,them

computinga'simulatedempirical'correlationmatrixfromthesimulatedmultivariatedata.

Page 54: Exploratory Factor Analysis: Model Selection and

46

Furtherwork

Theabovemethodcanproducemanytypesofrandomcorrelationmatrices;however,

allloadingsmustbepositive,whichisnotrequiredoffactoranalysisloadings.Wemaybeable

toincorporatenegativeloadingsbyutilizingaBernoulliprocessbywhicheachcellof𝛬has

someprobabilityofbeingmultipliedby-1or1.Thiswouldallowfornegativeloadings(and

correlations)whileensuring𝜌remainsavalidcorrelationmatrix.

Page 55: Exploratory Factor Analysis: Model Selection and

47

Chapter3-ExploratoryFactorAnalysisofCRSSymptoms

Introduction

Chronicrhinosinusitis(CRS)isaninflammatoryconditioncharacterizedbynasaland

sinussymptoms,affecting15%oftheUnitedStatespopulation(Wjetal.,2012).Thereare

consideredtobefourcardinalsymptomsofthediseasewhichincludenasaldrainage(anterior

orposterior),nasalblockage(congestion),smellloss,andfacialpainorpressurelastingfor12

ormoreweeks(Browne,Hopkins,Slack,&Cano,2007;Tan,Kern,Schleimer,&Schwartz,2013).

TheEuropeanPositionPaperonRhinosinusitisandNasalPolyps(EPOS)diagnosismethodology

forCRSisbaseduponthepresenceofnasalobstructionordischargeandatleastoneother

symptomaswellasobjectiveevidenceofinflammationonsinuscomputerizedtomography(CT)

scanorsinusendoscopy,whichmayincludesinusorosteomeatalcomplexmucosalchanges,

presenceofnasalpolyps,ormucopurulentdischargefromthemiddlemeatus(Wjetal.,2012).

BecauseofthedifficultyofobtainingsinusCTorendoscopyinlarge-scalepopulationstudies,

EPOSalsohasanepidemiologicdefinitionofCRSbasedonsymptomsanddurationonly.

However,EPOSdoesnotspecifyhowtomeasuresymptomsintermsofseverity(e.g.,some

blockageorcompleteblockage;partialsmelllossorcompletesmellloss;thequantityof

discharge;theseverityofpain)orfrequency(e.g.,someofthetime,mostofthetime,orallof

thetime).

Nasalandsinussymptomslastingthreemonthsarequitecommon,andmanystudies

havereportedthatthereisnotastrongcorrelationbetweensuchsymptomsandobjective

Page 56: Exploratory Factor Analysis: Model Selection and

48

opacificationonsinusCTscans(Browneetal.,2007;Ferguson,Narita,Yu,Wagener,&

Gwaltney,2012;Wjetal.,2012).Upto40%ofthosewithsymptomsmeetingEPOScriteriafor

CRSdonothavesignificantsinusopacificationonCT(Fergusonetal.,2012).Thelackof

correlationofsymptomsmeetingEPOScriteriaforCRSandfindingsonsinusCTcouldbedueto

imprecisioninthewaysthatnasalandsinussymptomshavebeenmeasuredintermsof

severity,frequency,andduration(Hamilos,2011;Wjetal.,2012).Inaddition,therearefew

studiesthatexaminehownasal,sinus,andotherrelevantsymptomsrelatetooneanother

withinpatientscross-sectionallyorlongitudinally.Understandingtheserelationshipsamong

symptomsmayguidemoreprecisesymptommeasurementinwaysthatincreasethelikelihood

thatpatientswithcertainnasalandsinussymptomsalsohaveobjectiveevidenceof

opacification.

Weusedexploratoryfactoranalysis(EFA)toassessthelatentstructureofnasal,sinus

andothercommon,relevantsymptomsatcross-sectionforthreeseparatetimepoints,aswell

asthechangeinthesesymptomsovertime.Bylatentstructureofsymptoms,wemeanthe

otherwiseunseenpatientattributesdrivingthemanifestationofsymptoms.Whilepriorstudies

haveusedEFAappliedtoCRSsymptomsatonepointintime,theyhaveutilizedtheSino-nasal

OutcomeTest(SNOT)familyofquestionnaires,designedtoassesstreatmenteffectiveness

amongpatientsknowntohaveCRS(Browneetal.,2007;ClaireHopkins,Browne,Slack,Lund,&

Brown,2007).SNOTassessessymptomseverityinatwo-weekrecallwindow,socannotbe

usedtoevaluatecompliancewithEPOSdurationcriteria,anddoesnotevaluatesymptom

frequency(ClaireHopkinsetal.,2007;Wjetal.,2012).Thequestionnaireutilizedinthisstudy

incorporatedquestionsassessingfrequencyofEPOSdefinedsymptoms,aswellasfrequencyof

Page 57: Exploratory Factor Analysis: Model Selection and

49

severeandrelatedsymptoms,inordertoassessabroadrangeofpotentiallyCRS-associated

manifestations.Understandinghowsymptomsmaygroupatonepointintimeandchangeover

timecouldallowdevelopmentofmorepreciseapproachestosymptommeasurement;andalso

allowdevelopmentofdifferentbiologicrationalesforhowthesesymptomsmaygrouptheway

theydo.

Methods

Studypopulationanddesign

Atotalof200,769GeisingerClinicprimarycarepatientsovertheageof18yearswith

bothelectronichealthrecord(EHR)andrace/ethnicitydatawereeligibleforparticipationin

thisstudy.Fromthesepatients23,700werechosentoberecipientsofaseriesof

questionnairesutilizingasamplingschemethathasbeenpreviouslydescried(Hirschetal.,

2017;Tustinetal.,2017).Inbrief,usingastratifiedrandomsamplingmethodtoover-sample

bothracial/ethnicminoritiesaswellasthosewithhigherlikelihoodsofCRSusingInternational

ClassificationofDiseases(ICD-9)andCurrentProceduralTerminologycodesinEHRdata,

patientswereselectedtoreceiveself-administeredquestionnairesthroughthemail(Hirschet

al.,2017;Tustinetal.,2017).

Participantswhoreturnedthebaselinequestionnaire(n=7847)werefollowedfor16

months,fromApril2014toAugust2015,andreceivedtwoadditionalquestionnairesatsix

monthsand16months.Non-respondersweresentquestionnairesoneortwoadditionaltimes.

Thequestionnaireswerediverseintermsofinformationrequested,providinginformation

Page 58: Exploratory Factor Analysis: Model Selection and

50

aboutaspectrumofsymptomsincludingpresence,frequency,severity,andbotherofarange

ofsymptomsassociatedwithCRSandco-morbidconditionslikeheadachedisordersand

asthma(Table1,Hirschetal.,2017;Tustinetal.,2017).Eachquestionnaireincluded37

commonquestions,eachwiththesameresponseoptions(howoftenthesymptomoccurredin

thepastthreemonthsas1=never,2=onceinawhile,3=someofthetime,4=mostofthe

time,or5=allthetime;Table2).Atotalof21questionswereaboutthepresence,severity,

anddegreeofbotherofCRSnasalandsinussymptomswereincorporatedwhiletheremaining

questionsassessedpresenceoffourasthmasymptoms;fourallergysymptoms;threeear

symptoms;andfiveconstitutionalandotherrelatedsymptoms(Table2).

Datacollection

ThebaselinequestionnairewasmailedinApril2014,the6-monthfollow-upinOctober

2014,andthe16-monthfollow-upinAugust2015.Theseconsistedof94,87,and79questions

respectively,butthecurrentanalysisfocusedonthe37questionsthatwerecommontoall

three(Table2).Afterquestionnaireswerereceived,eachwasscannedandthendatawas

double-checkedandverified.Atotalof7834personsreturnedthebaselinequestionnaire

(respondingtoatleastoneofthe37questionsofinterest),4945returnedthe6-monthfollow-

upquestionnaire,and4584returnedthe16-monthfollow-upquestionnaire.

Skippatternswerepresentinthequestionnaires,bywhichpatientswouldbeaskedto

skipblocksofquestionsiftheresponsestothesequestionscouldbecompletelydeterminedby

previousresponses.Thisoccurredonlywhenpatientsindicatedthattheyhadnotexperienced

thesymptom(s)ofinterest,makingfurtherquestionspertainingtothatsymptomirrelevant.

Page 59: Exploratory Factor Analysis: Model Selection and

51

Theseskippatternswereaccountedforbyfillinginimpliedresponseswhenskip-pattern

missingnesswaspresent.

Analyticvariables

TheEuropeanPositionPaperonRhinosinusitisandNasalPolypssubjective(EPOSs)

criteriawereusedtoclassifypatientsascurrent,previous,orneverCRSbasedonpatient

reportedsymptomsfromonlythebaselinequestionnaire.EPOSscriteriarequirethreemonths

ofobstructionoranteriororposteriordischargewithoneotherofthecardinalsymptomsof

smellloss,facialpain,orfacialpressure,lastingthreeormoremonths.Patientswereclassified

usingquestionnaireresponses,specificallylifetimeandprevious3monthsofsymptoms,being

labeledascurrentCRS,iftheymetEPOSsCRScriteriainthethreemonthsbeforethebaseline

questionnaire;aspastCRSiftheymetthesecriteriaintheirlifetimebutnotinthethree

monthsbeforethebaselinequestionnaire;andneverCRSiftheynevermetthesecriteriain

theirlifetime.Thequestionnairehasbeenpreviouslydescribed,fromtheChronicRhinosinusitis

IntegrativeStudiesProgram(Hirschetal.,2017;Wjetal.,2012),andincludedincomeand

educationinformationatbaseline.Otherdemographiccharacteristicsincludingage,sex,and

race/ethnicity,aswellashealthinformationsuchasbodymassindex(BMI,measuredin

kg/m2),werecollectedviaelectronichealthrecorddata.

StatisticalAnalysis

Thegoalsoftheanalysisweretoidentifytheunderlyingstructure,ifpresent,amongthe

37surveyquestionsateachquestionnairetimepoint,andthenamongthechangeinthese

Page 60: Exploratory Factor Analysis: Model Selection and

52

symptomsovertime,frombaselineto6-monthfollow-upandfrom6-monthfollow-upto16-

monthfollow-upquestionnaires.Ofthe7847patientswhoreturnedthebaseline

questionnaire,theanalysisincludedthe3535patientswhoreturnedallthreequestionnaires

withnomorethan5missingvaluesforthe37questionsforanysinglequestionnaire.Wedid

notwanttoimputevaluesforsubjectswithmanymissingquestionssincetheprimarygoalof

theanalysiswastoevaluatetheunderlyinglatentstructureofthepatternsofsymptom

reporting,andalargeportionofpatientswereonlymissingasmallnumberofresponses.For

missingnessforsubjectswithfiveorfewermissingvalues,whichweassumedtobeatrandom,

multivariateimputationbychainedequationswasconductedtoimputemissingvaluesfor

patientquestionnairesthatwereincludedinthisstudy(3.5%),utilizingonlyinformationwithin

eachsurvey.ThisimputationwascarriedoutviathemiceRpackageusingthepredictivemean

matchingmethod(Buuren&Groothuis-Oudshoorn,2011).Oncedatawerefinalizedforeach

patientquestionnaire,twochangescoreswerecalculatedasthedifferencebetweeneach

person’sadjacentquestionnaires(baselineto6-monthand6-monthto16-month).

Duetotheexclusioncriteriautilizedinthisstudy,notallpatientswereincludedinthe

finalanalysis.Summarystatisticsofdemographic,health,andsocioeconomicinformationwas

computedandcomparedbetweentheincludedindividualsinthisanalysisandthosewhowere

excluded.Inaddition,lasagnaplotswereexaminedinordertovisuallyassessthetransitions

betweenindividualquestionresponsesoverthe3questionnairedurationofthestudyperiod

(Figure1,Swihartetal.,2010).

Exploratoryfactoranalysiswasutilizedasthereweremultiplehypothesesandlittlea

prioriknowledgeoftheunderlyingstructureofsymptomreporting.Recognizingtheordinal

Page 61: Exploratory Factor Analysis: Model Selection and

53

scalingofthedata,impliedPearson(polychoric)correlationswereestimatedamongthe37

questionsforthethreecross-sectionalquestionnaires,usingthequicktwostepprocedureas

implementedbythepsychRpackage(Revelle,2017).Thesecorrelationswerethenutilizedin

exploratoryfactoranalyseswithordinalvariables.Meanwhile,Pearsoncorrelationmatrices

werecalculatedforeachofthetwochangescoresasthedifferencescoredistributionappeared

symmetricandcontainedmorelevelsthanpracticalforpolychoriccorrelations.

ForeachofthefiveEFAs(threecross-sectionalandtwodifferences),afactoranalysis

wasconductedfittingloadingsestimatesandcommunalitiesapplyingtheordinary

(unweighted)leastsquares(OLS/ULS)proceduretocorrelationsestimatedasjustdescribed.An

obliminrotationforeachfactoranalysiswasutilizedinordertoallowforcorrelationsamong

factors(Revelle,2017).InEFAsettings,determiningthenumberoffactorsisakeystepin

identifyingfactorstructure.Commonly,manymethodsareutilizedinordertoassesswhich

selectionismostappropriate,eachwithdifferentoptimalitycriteriadrivingdifferent

interpretationsofresults.Inthisstudy,biologicalinterpretabilityandparsimonywerestressed,

inaccordancewithanalysesandconsiderationsprovidedinthepreviouschapter,the

qualitativemethodofexaminingscreeplotsandthequantitativeparallelanalysismethodwere

takentogethertodeterminingtheoptimalnumberoffactorstoextract.Thescreeplotdisplays

eigenvaluesofthecorrelationmatrixinrankorderbysizefromlargesttosmallest(x-axis=size

rank,y-axis=eigenvalues)toassessthelocationofaclear“elbow”shapewheretheslopeof

thecurvechangedfromrapiddeclineineigenvalueswithincreasingranktoaflatteningofthe

curve,asperCattell’sScreetest(Cattell,1966).Meanwhile,parallelanalysiscompares

eigenvaluesfromrandomdatamatriceswithuncorrelateditemresponseswithobserved

Page 62: Exploratory Factor Analysis: Model Selection and

54

eigenvalues:thenumberofrankedobservedeigenvaluesgreaterthantherandomlygenerated

onesisthenumberoffactorsretained(Humphreys&Jr,1975).Oncefactorloadingswere

extracted,factorscoreswereestimatedforeachidentifiedfactorforeachpatientusingitem

responsetheory(IRT)basedscoresforpolytomousitemsforeachofthethreesurveys(Kamata

&Bauer,2008).Theseestimatedfactorscoreswerecomputedasameasureofthestrengthof

eachlatentfactorforeachpatient.TheseestimatedfactorscoreswerecomparedacrossEPOSs

CRSstatusgroups(current,previous,never).Amultivariateanalysisofvariance(MANOVA)was

fitinordertocomparethemeanmultivariatefactorscore(vectoroftheestimatedfactor

scores)betweenEPOSsCRSstatusgroupsforfactorscoreswhichappearedtofollowan

approximatelynormaldistribution.Onefactorscorehadamixed-scaledistribution,witha

considerableproportionofindividualshavingalow(attheIRT-lowerbound)valueandthe

remainingindividualsdistributedrelativelycontinuouslyamonghighervalues.Torelatethis

scoretoEPOSstatus,alogisticregressionwascomputedtoestimatetheoddsofhavingalow

factorscoreasafunctionofEPOSsCRSgroup,andalinearregressionwasusedtoestimatethe

meanfactorscoresbyEPOSsCRSgroupforthosepatientswhodidnothavethelowerbound

factorscore.

Wealsosoughttoassesswhetherornotthebaseline–6monthdifferencecaptured

morevariabilitythanthe6month–16monthdifference.Tothisend,eachdifferenceEFA

communalitywasextractedandthencomparedbytimeperiod(baseline-6monthversus6

month-16month),usingaWilcoxonsignedranktest.Wehypothesizedhighermean

communalityvaluesinthebaseline–6monthperiodEFAthanthe6month–16monthperiod,

correspondingtohigherstabilityoverashorterperiodforchange.

Page 63: Exploratory Factor Analysis: Model Selection and

55

SensitivityAnalysisandDiagnostics

Diagnostics

Kaiser-Meyer-Olkin(KMO)factoradequacywasevaluatedforeachcomputed

correlationmatrixtofurtherassesstheappropriatenessoffactoranalysis.KMOmeansquare

error(MSA)statisticsof0.96,0.96,0.95wereobservedforbaseline,6-monthfollow-up,and

16-monthfollow-upquestionnaires,respectively.Thetwochangescoredifferencecorrelation

matricesyieldedKMOMSAsof0.91and0.90forthefirstandseconddifferences,respectively.

TheseKMOstatistics,allofwhichweregreaterthan0.9,indicatedaveryhighdegreeof

commonvariance,andsupportedaconclusionthatourcovariancematriceswereverywell

suitedtobesubjectedtofactoranalysis.

Sensitivitytofactoringmethod

Eachfactoranalysiswasrefitusingweightedleastsquares,principalfactors,maximum

likelihood,andgeneralizedleastsquarestoensurethequalitativeinterpretationoftheloadings

wasnotconditionalonthefactoringmethod.Weselectedordinaryleastsquaresasthefinal

factoringmethodasOLSproducesunbiasedrotatedfactorloadings,andhasdesirable

characterizesatlargesamplesizes(Lee,Zhang,&Edwards,2012).Loadingmatrices,

communalities,andinter-factorcorrelationmatriceswereexamined.Loadingsmatrices,which

maycontainentriesrangingfrom-1to1,provideameasureofthestrengthoftherelationship

betweeneachquestionandeachoftheextractedfactors,whilethecommonalitiesforeach

Page 64: Exploratory Factor Analysis: Model Selection and

56

question,whichrangefrom0to1,areinterpretedasthefractionofhowmucheachquestion’s

variabilityisexplainedbytheutilizedfactormodel.Finally,inter-factorcorrelationmatrices

examinedeachfactor’srelationswiththeotherfactorsthatwerederivedfromthefinalEFA.

Imputation

Toevaluatethesensitivityofresultstomissingdataandimputation,atotalof100

imputeddatasetsweregeneratedfromtheoriginaldatasetwithmissingnessusingthesame

multipleimputationmethodology(mice)aspreviouslystated.Latentcontinuous(polychoric)

correlationswerecalculatedandcomparedacrossimputeddatasetsforeachquestionnaire

item.Eachofthese666(allofthebivariatecorrelationsamongtheresponsestothe37

questions)pairwisecorrelations’standarddeviationswerecomputedusingthe100imputed

datasets,andwereexamined.Acrossthesepairwisecorrelations,99.5%ofstandarddeviations

werebelow0.0064,0.0089,and0.0036forthebaseline,6-monthfollow-up,and16-month

follow-upquestionnaires,respectively,suggestingthattheimpactofrandomimputationon

correlationmatricesandsubsequentfactoranalyseswasminimal.

Results

Descriptionofstudysubjects

The3535patientsincludedintheanalysiswerefirstcomparedtothe4312respondents

ofthebaselinequestionnairewhowerenotincluded(Table1).Thetwogroupsweresimilaron

sexdistribution(37.8%vs.36.9%male,respectively)andmeanbodymassindex(BMI,30.0vs.

30.3kg/m2,respectively).However,includedandexcludedpatientsdifferedonanumberof

Page 65: Exploratory Factor Analysis: Model Selection and

57

otherstudyvariables,includingage(57.5vs.53.2yearsonaverage,respectively),race/ethnicity

(94.0%vs.87.5%white,respectively),andsocioeconomicstatus(32.9%vs.25.1%earnedover

$50,000annually,respectively).

Itwasobservedthatacrosstime,individualsexperiencedvaryingdegreesofchanging

symptoms,asexemplifiedbyresponsestothequestion(number3)aboutthefrequencyofpost

nasaldripacrossvisits(Figure1).Althoughsymptomsatbaselinegenerallypredictedsymptoms

overtime,itwascommonforsymptomstochangebyonefrequencycategory,andsome

patientschangedbytwoormore.Thosewhoanswered“never”havingpostnasaldripinthe

previous3monthsatbaselinetypicallyrespondedhavinglowfrequency(neveroronceina

while)ofthesymptomat6monthsand16months(Figure1).Similarly,thosewhoresponded

experiencingthesymptom“allofthetime”atbaseline,weremorelikelytoexperiencethe

symptomoftenat6monthsand16months.Thispatternwasevidentinotherquestionsaswell

(resultsnotshown);dataaredisplayedforquestion3becauseithadarelativelyuniform

distributionofresponsesatbaseline(theotherquestionshadlargerproportionsofsubjects

whoreportedneverexperiencingthesymptom).

Cross-sectionalEFAs

Foreachofthethreecross-sectionalEFAs,screeplotresultssupportedtheextractionof

fivefactors(seeappendix).Parallelanalysissuggestedtheretentionoffivefactorsforbaseline

and6monthquestionnaires,andsixfactorsforthe16-monthfollowup.Forcomparability,5

factorswereextractedfromeachofthesequestionnaires.Eachofthestructuresinthesethree

EFAswassimilar,andthecontentofthefivefactorswasthesameforeach,withonefactor

Page 66: Exploratory Factor Analysis: Model Selection and

58

eachforcongestionanddischargesymptoms,painandpressuresymptoms(including

headache),asthmaandconstitutionalsymptoms,earandeyesymptoms,andsmellloss(Table

3forbaselineEFA,othertwocross-sectionalEFAs,seeappendix).Factorloadings,orthedegree

towhichanyspecificquestionwasrelatedtoaspecificlatentfactor,wereconsistentacrossall

threequestionnaires(similartoresultsinTable3forbaseline,othercross-sectionalEFAresults

notshown).Mostobservedcommunalitieswerehigh,indicatingthatthefactormodelswell-

representedthesequestionswhichwereincludedintheanalysis(Table3).Afewlow

communalitieswereobservedforbadbreath(0.26),fever(0.34),cold/flusymptoms(0.37)and

fatigue(0.39),suggestingthatourfive-factormodeldidnotaccountformuchofthevariability

inthesesymptoms,andthustheydidnotloadheavilyonanysinglefactor.Afterusingthe

baselinemodeltoestimatefactorswithinindividualsatbaseline,theinter-factorcorrelations

resultingfromobliminrotationrangedfrom0.30to0.64(Figure2).

LongitudinaldifferenceEFAs

Analysisresultsalsosupportedfive-factormodelsforeachofthetwolongitudinal

differenceEFAs.Symptomsidentifiedtoloadonsinglefactorsinthedifferenceanalyses

indicatesthatthesesymptomschangetogetherandinthesamedirectionovertime.Notably,

thetwodifferenceEFAs(bothfor6and10monthdurations)yieldednearlyidenticalfactors

(Table4forchangefrom6-monthto16-monthquestionnaires,seeappendix)whichdisplayeda

fairlysimilarstructuretothefactorsidentifiedineachofthecross-sectionalEFAs(Table3).In

ordertocomparemodelfitbetweenthetwodifferenceEFAs,aWilcoxonsignedranktestwas

utilizedtotestthehypothesisthatthebaselineto6-monthdifferenceEFAexplainedmore

Page 67: Exploratory Factor Analysis: Model Selection and

59

variabilitythanthe6to16-monthdifferenceEFAbyexaminingthedifferenceinindividual

variablecommonalitiesbetweenthetwoEFAs.Thebaselineto6monthdifferenceEFAhada

significantlygreateraveragecommunalitiesthanthe6monthto16monthdifferenceEFA(p-

value=0.002;seeappendix).

FactorScores

UtilizingthefactorsfromthebaselinequestionnaireEFA,factorscoreswereestimated

andcomparedbetweenEPOSsCRSgroups(current,previous,andneveratbaseline;Figure4

forfactor1,resultsforotherfactorsnotshown).AMANOVAwasfit,comparingthefourfactor

scoresinthethreeCRSstatusgroupssimultaneously(omittingthe4thfactor,smellloss,asit

appearedtobenon-normallydistributed),andthisindicatedasignificantdifferencebetween

meanfactorscoresbetweengroups(p-value<0.001;Figure3).Weobservedfactorscores

werehigher,indescendingorder,forcurrent,past,thennever.WealsoobservedthatCRS

Factorscoresshowedaweaker,butsimilarstructureasindividualquestionnaireresponses,

withlowfactorscorevaluescorrelatingmorestronglywithlowscoresorresponseinthe

followingsurveyandviceversa(Figure2,4).Utilizingthefactorscoresfromthe4thfactor(smell

loss),tworegressionswereconducted,alogisticregressionestimatingtheoddsofhavinga

lowerboundedfactorscore(-4)asafunctionofEPOSsCRSstatus,andalinearregression

comparingaveragefactorscoresforthosewithfactorscoresabove-4asafunctionofEPOSs

CRSstatus.ThosewithEPOSsCRScurrentatbaselinehadanoddsratioofexperiencinglower

boundfactorscoresof0.05(95%CI:0.04,0.06)comparedwithEPOSsnever,andEPOSs

previoushadanoddsratioofexperiencinglowerboundedfactor4factorscoresof0.15(95%

Page 68: Exploratory Factor Analysis: Model Selection and

60

CI:0.12,0.18)comparedwiththeEPOSsnevergroup.Forthoseabovethelowerbounded

factorscore,alinearregressionwasconducted.ThoseatEPOSsCRScurrenthadafactor4

score0.48higherthanthoseatEPOSsCRSneveratbaseline(95%CI:0.35,0.60)whilethoseat

EPOSsCRSprevioushadafactor4factorscoreof0.22higherthanthoseatEPOSsCRSneverat

baseline(95%CI:0.1,0.34).

Discussion

Exploratoryfactoranalysiswasconductedasameasurementexercisetobetter

understandtherelationshipandcategorizationofnasalandsinus,asthma,headache,

constitutional,allergy,andearsymptomsutilizingbothcrosssectionalsymptomquestionnaire

responsesandchangesinsymptomresponsesovertime.AllfiveEFAspresentedconsistent

findingsoffiveunderlyingfactorsthatwereidentifiableascongestionanddischarge,painand

pressure,asthmaandconstitutional,earandeye,andsmelllossfactors.ThebaselineEFAwas

usedtoestimatefivefactorscoreswithinsubjects,andallfivewerehigherinsubjectswhomet

EPOSscurrentCRScriteriaandlowestinthosewhometEPOSsneverCRScriteria.The37

questionsutilizedinthisanalysisweredevelopedtocaptureawiderangeofoverlapping

symptomsthatoccurinseveralco-morbidconditions;andtoevaluateboththefrequencyand

severityofsymptoms.Understandinghowsymptomsclusterwithinvisitandacrossvisitscan

provideusefulinformationthatcanaidclinicalpractice,informsymptommeasurementinCRS

patients,andleadtohypothesesaboutthepathobiologyunderlyingthesesymptoms.

WehypothesizedseveralpatternsofresultsintheEFAs.Onetheonehand,ifthereisan

underlyingconstructofCRSthatcanbemeasuredwithsixcardinalsymptomsthataremainly

Page 69: Exploratory Factor Analysis: Model Selection and

61

interchangeable,astheEPOScriteriasuggest,thenthesecardinalCRSsymptomscouldbe

expectedtoloadonasinglefactor.Ontheotherhand,therearespecificsinusesinwhich

inflammationhasbeenassociatedwithspecificsymptoms,suchasmaxillarysinusinflammation

associatedwithfacialpainandpressureandethmoidsinusinflammationassociatedwithsmell

loss.Thispathobiologicconsiderationwouldsuggestthatatleastthreefactorswouldbe

identifiedwiththecardinalCRSsymptoms.Wefoundthatthe37symptomsidentifiedfive

factors,andthesixcardinalCRSsymptomsloadedonthreefactors,bothincross-sectionaland

longitudinalEFAmodels.Thisstability,andcoherencetorealbiologicalprocessesmayprovide

someevidencethatthesefivefactorsmayhaveanunderlyingcommonpathobiology.

ThreeofthefivefactorswerecomposedofquestionsthatarecomponentsoftheEPOSs

criteriaforCRS,specificallythenasalcongestionanddischarge,facialpainandpressure,and

smelllossfactors,eachofwhichisoneofthecardinalEPOSsCRSsymptoms(Wjetal.,2012).As

thesesymptomsloadedondifferentfactors,itispossiblethattheunderlyingpathobiologymay

bedifferentfromoneanother.Whiletherewasclearevidencethattherewassome

longitudinalchangeinsymptomreporting,largetransitions(twoormorestepsontheLikert

scale)werenotverycommon.Thissuggeststhatmostpatientshavelongdurationsymptoms

andperhapsachronicpathobiologicprocess.Wehavepreviouslyreportedthatusingthe

cardinalsymptomstodefineEPOSsCRScategories(current,past,andnever)resultedinlarge

transitionsinpatientsmeetingcriteriaforthesecategoriesovertime;forexample,amongthe

subjectswhometEPOSscurrentCRSatbaseline,almosthalfdidnotmeetcriteriaforcurrent

CRSsixmonthslater(Sundaresan,2017,inPress).

Page 70: Exploratory Factor Analysis: Model Selection and

62

InCRS,thesinusesareinflamedandswollen,andassuchwecanconsiderthesefactors

inthecontextofparanasalsinusopacification.Themaxillaryandethmoidsinuses,when

congestedorinflamed,canpresentsymptomsoffacialswellingandpain,whichwerepresentin

thefacialpainandpressurefactor(Waldetal.,1981).Sphenoidopacification,although

comparativelyrare,canbeassociatedwithsometimessevereheadaches(Sieskiewiczetal.,

2011).Frontalandethmoidsinusitiscanbeassociatedwithsmelllossandnasaldischarge,

consistentwiththesmelllossfactorwhichwasobserved(Chang,Lee,Mo,Lee,&Kim,2009).

Thesinussymptomsthattendedtoclusterinourfivefactorshavegenerallystrongsinus

opacificationcorrelates(Changetal.,2009;Sieskiewiczetal.,2011;Waldetal.,1981).Itis

possiblethatthefactorsthatwereidentifiedbytheEFAproceduremaybeassociatedwith

sinusopacification,ananalysisthatiscurrentlyunderway.Whilethefactoranalysiswas

performedinover3500subjects,sinusCTscanswereobtainedfrom646ofthesesubjects.

Furthermore,theearandeyesymptomsseentopredominateinfactor5maybemeasuring

allergypresenceandseverity.

Becauseofthefactorrotationmethodchosen,non-orthogonalitybetweenfactorswas

allowedasameanstoseparatesymptomsintodistinctfactorsinsofaraspossible.Thus,

examiningthecorrelationbetweenfactorsmayprovidesomeinsightintothesymptom

relationships.Thecongestionanddischargefactorwasmoderatelycorrelatedwithboththe

painandpressurefactor(𝜌 =0.67)aswellastheearandeyesymptomfactor(𝜌 =0.63,Figure

1).Thesecorrelationsarenotentirelyunexpectedasthereareseveralpathobiologiesthat

coulddrivethesepatterns;otherdriversconnectingreportingofdifferentsymptomscouldalso

beatwork.

Page 71: Exploratory Factor Analysis: Model Selection and

63

Mostquestionnaireresponseswerewellrepresentedbytheobservedfactormodelas

measuredbythecommunalities,whichgiveaquantitativemeasureofthevariabilityofeach

questionexplainedbyourfinalfive-factormodel.IntheEFAofthebaselinequestionnaire,

mostoftheEPOSscorequestionshadcommonalityvaluesabove0.6,indicatingthatthemodel

accountedforamajorityofthevarianceinthereportingofthesesymptoms.Incontrast,

severalsymptomsdidnotloadheavilyontoanyfactorsinthebaselineEFA,andassuchalso

hadthelowestcommonalitiesof0.26(badbreath),0.37(coldsymptoms),and0.39(fatigue),

suggestingthatthemodelfailedtocapturemuchofthevariabilityinthosequestions.InanEFA

setting,wedonotnecessarilyexpectallcommunalitiestobehigh.Instead,theselow

communalityvaluesrevealthateitherthedriversofthesevariablesweredifferentthanthefive

observedfactors(highuniquevariance)orthatthesesymptomsweresubjecttohigher

measurementerrorthanothers.

Itiscommon,insocialandmedicalsciences,forfactoranalysestoincludedatafroma

singlepointintimeorinacontextwheretimeisunimportant(Browneetal.,2007).Inthis

study,wewereabletoincorporaterepeatedobservations,providinguswithnotonlythree

responsesforeachsymptomquestion,butalsoexplicitmeasuresofhowsymptomschanged

overtime.EFAtheoryhypothesizesthattherearerealunderlyingmechanisms,including

commonpathobiology,reportingphenomena,orotherreasonswhichmanifesteditselfinthe

clusteringofsymptomsintotheobservedfactors.Ifthishypothesiswascorrect,wewould

expectfactorcomposition(loadings)tobeinvarianttotime(i.e.noseasonality);andifthere

wassufficientvariationinsymptomreportingovertime,wewouldexpecttonotonlysee

factorspresentthemselvesacrosstime,butwewouldexpectthechangesinsymptomstodoso

Page 72: Exploratory Factor Analysis: Model Selection and

64

accordingtothesesamefactors.Withoutsufficientsymptomchanging,wewouldlikelynot

havestrongenoughdifferencestoidentifythesesamefactors.Inthisstudy,wedidobservethe

samefactorsinEFAsofcross-sectionalresponsesaswellasinthedifferencesinreportingover

time,afindingconsistentwiththeideathatthesearerealconstructsdrivingtheobserved

symptoms.InthedifferenceEFAs,therewerealmostalwayslowercommonalitiescompared

withthecrosssectionalEFAs.Becauseresponsestoquestionnaireitemsovertimedidnot

evidencelargechanges(i.e.,mostchangescoresfellbetween-1and+1,andmanyat0),we

wouldexpectthecommonalitiesindifferencestobesmallerthanfortheircross-sectional

counterparts.Thesedifferencesincommunalitiesalsocouldbeduetothedifferentcorrelations

utilized,polychoric(impliedPearsoncorrelations)versusPearsoncorrelations.Inaddition,

measurementandreportingerrorwerelikelylargerinthedifferencemeasuresaswewere

combiningtogetherpotentiallytwo(notnecessarilyindependent)errorterms.

Thecommunalitiesinthedifferencescoreswereconsiderablylowerthanthoseinthe

crosssectionalEFAsrangingfrom0.09(fever)to0.75(smellloss)inthechangefrom6-month

to16-monthEFAandfrom0.26(badbreath)to0.95(smellloss)inthebaselineEFA.These

commonalitiesshowthatsmelllosswasaverypersistentsymptom.Thetimedurationinthe

twochangeEFAswerenotthesame;thefirstchangemeasurewasoversixmonthsandthe

secondwasover10months.Wewouldexpectcommonalitiestobelowerforthelonger

durationEFA,andwefoundthistobethecase.Themeancommunalityofthefirstchange

measure(0.392)wassignificantlylargerthanforthesecond(0.356,p-value=0.001),whichwas

inlinewithexpectations,suggestingthatthedifferencefrombaselineto6months(6month

Page 73: Exploratory Factor Analysis: Model Selection and

65

duration)capturedvariabilitybetterthanthedifferencefrom6monthto16month

questionnaires(10monthduration).

ThemultidimensionalmeanfactorscoreswerecomparedbetweenthethreeEPOSsCRS

groups(current,past,never)usingMANOVA,logisticregression,andlinearregression:a

significantdifferencewasfound,indicatingadifferenceinfactorscoredistributionsbetween

theEPOSsCRSgroups(P-value<0.001).Forallfivefactors,themeanestimatedfactorscores

tendedtobehighestamongcurrentCRSsubjects,nexthighestforpastCRS,andlowestfor

neverCRS.ThisresultisnotinitselfsurprisingastheCRSgroupshereweredeterminedbythe

EPOSsdefinitionofthedisease,whichitselfisbasedonmanyofthesymptomsinthefactors.

However,theEFAincludedmanyquestionsbeyondthoseusedtodefineEPOSsCRSstatus.The

higherfactorscorescomprisedofeye,ear,asthma,constitutional,andheadachesymptoms

mayrepresentthecommonco-occurrenceofallergy,asthma,andheadachedisorders,for

example,amongCRSpatients.

WhiletherehasbeensomepriorworkonCRSfactorsatasinglepointintimewiththe

SNOT-20andSNOT-22questionnaires,therehasbeennopriorworkonCRSfactorsusing

longitudinalinformationonsymptoms(Browneetal.,2007).Previousstudieshaveexamined

thedecompositionofCRSandrelatedsymptomsusingtheSino-nasalOutcomeTest(SNOT)-20

and(SNOT)-22questionnairewhichmeasures“symptomsandsocial/emotionalconsequences

ofrhinosinusitis”througharangeofsymptomandhealth-relatedqualityoflifequestionsin20

or22Likert-scalequestions(DeConde,Bodner,Mace,&Smith,2014;C.Hopkinsetal.,2006).

Thesequestionsasktheparticipantstoconsiderphysical,functional,andemotionalsymptoms

theyhaveexperiencedintheprevious2-weekperiod(DeCondeetal.,2014;C.Hopkinsetal.,

Page 74: Exploratory Factor Analysis: Model Selection and

66

2006).SNOTwasdesignedtoprovideasinglemeasureofpatientqualityoflifeandCRS-related

symptomseverity,implicitlysuggestingthateachquestionprovidesinformationregardinga

singleCRSconstructorfactor(Browneetal.,2007).OnestudyfoundquestionsfromtheSNOT-

22decomposedintofiveclearrhinologicsymptoms,extranasalrhinologicsymptoms,ear&

facialsymptoms,psychologicaldysfunction,andsleepdysfunctionfactors,notasinglefactoras

themissionoftheSNOTsurveysmaysuggest(DeCondeetal.,2014).Similarly,ananalysisof

SNOT-20revealedfourunderlyinglatentfactors,rhinologicalsymptoms,earandfacial

symptoms,sleepfunction,andpsychologicalfunction(Browneetal.,2007).Takentogether,

bothofthesestudiessuggestedthatquestionsetstypicallythoughtofmeasuringonlyCRS

symptomseverityorCRS-relatedqualityoflife,wereactuallymeasuringavarietyof

unobserveddimensions.Althoughthesetof37questionsutilizedinourstudyismuchdifferent

inscopeandaimthantheSNOTquestionnaires,theresultswereconsistentinrevealingseveral

factors.

LimitationsandFurtherWork:

Weobservedbothsimilaritiesanddifferencesbetweenpatientcharacteristicsinthe

subjectwhocompletedthebaselinequestionnairewhowereincludedandexcludedinthe

EFAs.SubjectsintheEFAanalysis,whoreturnedallthreequestionnaireswithoutexcessive

missingdata,weremorelikelytobewhite,morehighlyeducated,andwithhigherincomes.

Thismayhaveresultedinselectionbiasthatcouldhaveinfluencedtheresults.Thisproject

reliedextensivelyonquestionnairequestionresponses.Whiledirectandeasytointerpretor

compare,surveymethodologiessimilartothisencouragerespondentstoonlyreportquestions

Page 75: Exploratory Factor Analysis: Model Selection and

67

thatwereaskedabout,potentiallymissingsymptomassociationsandrelationshipswithlatent

factors.Inaddition,theisthepotentialofsamesourcebiasimpactingresultsbywhichsome

individualsreportinasystemicmannernotnecessarilyassociatedwithsymptoms,suchas

someindividualsalwaysorneverreportingexperiencingsymptoms.Finally,whilewefound

strongevidenceofclusteringamong37symptomswithinvisitsandovertime,theultimate

utilityofthefindingswillbeincomparisontosinusopacification,whichawaitsfurtheranalysis

inasubsectoftheincludedsubjects.

ThisEFAgeneratedseveralhypotheses,mainlythattheunderlyingfactorsidentifiedby

theprocedurearemeasuringrealbiologicalphenomena,includingdistinctsinusopacification,

allergies,andasthmaseverity.Whileinteresting,studiesexaminingobjectivemeasuresofthe

presenceoftheseconditionsalongwiththesesymptomquestionsareneededtoprovide

substantiveevidenceofthisrelationship.

Conclusion

Inananalysisof37nasalandsinus,allergy,ear,asthma,headache,andconstitutional

symptoms,weidentifiedfiveunderlyingfactors–congestionanddischarge,painandpressure,

asthmaandconstitutional,earandeye,andsmellloss–thatwereconsistentinthreecross-

sectionalandtwolongitudinalchangeEFAs.Questionsassessedpresence,severity,bother,and

frequencyofall37symptoms.Themodelsgenerallyexplainedalargeproportionofthe

variationinthesesymptomswithinvisits,andsymptomslikesmelllossshowedmuch

persistenceacrossvisits.ThefindingshaveimplicationsforhowtoidentifypatientswithCRS

usingquestionnairesandmaysuggestsignificantmisclassificationinEPOSapproachestoCRS

Page 76: Exploratory Factor Analysis: Model Selection and

68

identification.TheymayexplainwhypatientswhomeetEPOScriteriaforCRSoftendonothave

evidenceofsinusopacification.MoredirectevidenceawaitsanalysisofthesinusCTdataina

subsetofthesesubjects.

Page 77: Exploratory Factor Analysis: Model Selection and

69

Tables&Figures

Table1.Demographicinformationofthe3535patientsincludedinthecurrentanalysisandthe4312patientswhoreturnedthebaselinequestionnairebutwerenotincludedinthecurrentanalysis.

Excluded IncludedMale,n(%) 1591(36.9) 1335(37.8)

Age,years,mean(SD) 53.2(16.8) 57.5(14.8)

SmokingstatusNever,n(%)Former,n(%)Current,n(%)

2253(52.2)1299(30.1)760(17.6)

2053(58.1)1100(31.1)382(10.8)

Income:<$25,000,n(%)$25,000-$50,000,n(%)>$50,000,n(%)

1599(37.1)1098(25.5)1083(25.1)

1021(28.9)970(27.4)1163(32.9)

Bodymassindex,kg/m2,mean(SD)

30.3(7.05) 30.0(6.93)

EducationlevelHighschool,n(%)Somecollege,n(%)Collegegraduate,n(%)

1608(37.3)1364(31.6)977(22.7)

1209(34.2)979(27.7)1171(33.1)

Race/ethnicityWhite,n(%)Black,n(%)Hispanic,n(%)

3372(87.5)264(6.1)276(6.4)

3323(94)78(2.2)134(3.8)

Page 78: Exploratory Factor Analysis: Model Selection and

70

Table2.Questionsforthethreecross-sectionalquestionnaires.Questionresponseswereona5-itemLikertscale*

Item# Questiontext

Onaverage,howofteninthepast*monthshaveyouhad…

1 …blockageofyournasalpassages(nasalcongestion)?

2 …nasaldischargethatwasyelloworgreenincolor?

3 …post-nasaldrip?

4 …lossofsenseofsmell?

5 …facialpain?

6 …facialpressure?

Checktheboxthatdescribeshowofteneachproblemhashappenedinthepast†months,onaverage

7 …bothofmynasalpassageshaveblockage

8 …atleastoneofmynasalpassagesiscompletelyblocked

9 …Ihavebeenverybotheredby,myblockednasalpassage(s)

10 …Ihavealotofnasaldischarge

11 …Ihavetoblowmynosemorethan10timesadaybecauseofmynasaldischarge

12 …Ihavebeenverybotheredbymynasaldischarge

13 …IhavebeencoughingafterIeatorliedown

14 …Ihavehadmucusinmythroatthatfeltlikealumporblockage

15 …Ihavebeenverybotheredbymypost-nasaldrip

16 …Ihavenotbeenabletosmellanything

17 …Ihavebeenverybotheredbymylossofsenseofsmell

18 …Onascaleof0to10,myfacialpainhasbeenatleasta5(0=nopain,10=worstpain)

19 …Ihavebeenverybotheredbymyfacialpain

20 …Myfacialpressurehasbeensevere

21 …Ihavebeenverybotheredbymyfacialpressure

Checktheboxthatdescribeshowoften,onaverage,youhadthefollowinginthepast†months…

22 …headaches

23 …fevers

24 …coughing

25 …badbreath

26 …fatigue

27 …nasalitching

28 …sneezing

29 …eyeitching

30 …eyetearing

31 …earfullness

32 …earpain

33 …earpressure

34 …wheezing(breathingwithwhistlingsoundinchest)

35 …chesttightness

36 …shortnessofbreath

Page 79: Exploratory Factor Analysis: Model Selection and

71

37 …cold/flusymptoms

*1=Never,2=Onceinawhile,3=Someofthetime,4=Mostofthetimeand5=Allthetime.†Forthebaselineand16-monthfollow-up=3months,forthe6-monthfollow-up=6months.

Page 80: Exploratory Factor Analysis: Model Selection and

72

Table3.Factorloadingsandsymptomcommonaltiesfromtheexploratoryfactoranalysis(EFA)ofthe37presence,severity,andsecondaryCRSsymptomatbaseline.TheEFAwasfitusingordinaryleastsquaresandanobliminrotation(numberofpatients=3535).Loadingslessthan0.3wereomittedforreadability.Communalitiesrepresentthefractionofeachsymptom’svariabilitythatwascapturedbytheutilizedfivefactormodel.

# ItemLabel Factor1 Factor2 Factor3 Factor4 Factor5 Communalities1 Blockage 0.65 0.802 Dischargediscolored 0.49 0.613 PND 0.84 0.784 Smellloss 0.95 0.895 Facialpain 0.83 0.856 Facialpressure 0.76 0.877 Blockagebothsides 0.58 0.728 Blockagecomplete 0.55 0.739 Blockagebothered 0.61 0.8110 Dischargealot 0.86 0.8411 Blownose10xdaily 0.82 0.7612 Dischargebothered 0.84 0.8413 Coughliedown 0.72 0.7314 Lumpinthroat 0.69 0.7315 PNDbothered 0.84 0.8516 Smelllosscomplete 0.97 0.9517 Smelllossbothered 0.92 0.9118 Facialpain5+ 0.83 0.9019 Facialpainbothered 0.84 0.9120 Facialpressuresevere 0.77 0.8621 Facialpressure

bothered 0.78 0.90

22 Headaches 0.67 0.4823 Fever 0.43 0.3424 Coughing 0.46 0.5 0.5325 Badbreath 0.2626 Fatigue 0.3927 Nasalitching 0.56 0.5328 Sneezing 0.31 0.54 0.5129 Eyeitching 0.72 0.6230 Eyetearing 0.6 0.4931 Earfullness 0.35 0.54 0.6232 Earpain 0.51 0.49 0.6533 Earpressure 0.47 0.46 0.6334 Wheezing 0.8 0.6635 Chesttightness 0.85 0.7836 Shortnessofbreath 0.82 0.68

Page 81: Exploratory Factor Analysis: Model Selection and

73

37 Cold/flusymptoms 0.44 0.37

Page 82: Exploratory Factor Analysis: Model Selection and

74

Table4.Factorloadingsandsymptomcommonaltiesfromtheexploratoryfactoranalysis(EFA)ofthe37presence,severity,andsecondaryCRSsymptomchangesfrom6to16months.EFAwasfitusingordinaryleastsquaresandanobliminrotation(numberofpatients=3535).Loadingslessthan0.3wereomittedforreadability.Communalitiesrepresentthefractionofeachsymptom’svariabilitythatwascapturedbytheutilizedfivefactormodel.

# ItemLabel Factor1 Factor2 Factor3 Factor4 Factor5 Communalities1 Blockage 0.46 0.32 Dischargediscolored 0.32 0.193 PND 0.49 0.284 Smellloss 0.68 0.475 Facialpain 0.66 0.476 Facialpressure 0.59 0.417 Blockagebothsides 0.43 0.288 Blockagecomplete 0.34 0.239 Blockagebothered 0.52 0.410 Dischargealot 0.72 0.511 Blownose10xdaily 0.66 0.4312 Dischargebothered 0.75 0.5413 Coughliedown 0.34 0.33 0.2914 Lumpinthroat 0.36 0.2715 PNDbothered 0.57 0.416 Smelllosscomplete 0.84 0.6917 Smelllossbothered 0.68 0.4818 Facialpain5+ 0.78 0.619 Facialpainbothered 0.79 0.6320 Facialpressuresevere 0.65 0.4421 Facialpressure

bothered 0.72 0.54

22 Headaches 0.1423 Fever 0.124 Coughing 0.42 0.2925 Badbreath 0.1226 Fatigue 0.1427 Nasalitching 0.35 0.1928 Sneezing 0.38 0.2529 Eyeitching 0.5 0.2930 Eyetearing 0.51 0.331 Earfullness 0.64 0.4132 Earpain 0.53 0.3333 Earpressure 0.63 0.3934 Wheezing 0.58 0.3335 Chesttightness 0.64 0.4

Page 83: Exploratory Factor Analysis: Model Selection and

75

36 Shortnessofbreath 0.6 0.3737 Cold/flusymptoms 0.35 0.24

Page 84: Exploratory Factor Analysis: Model Selection and

76

Figure1.Lasagnaplotdisplayingtheproportionofindividualswitheachgivenresponsetothequestion“Onaverage,howofteninthepast3monthshaveyouhadpost-nasaldrip?”atbaselineand6monthsand16monthslater(1=Never,2=Onceinawhile,3=Someofthetime,4=Mostofthetime,5=Allthetime).Y-axisvaluesindicatethenumberofpatientswitheachparticularresponseatbaseline.

Page 85: Exploratory Factor Analysis: Model Selection and

77

Figure2.Inter-factorcorrelationsatbaselinefromthebaselinequestionnaireexploratoryfactoranalysisfitviaordinaryleastsquaresandanobliminrotation.

Page 86: Exploratory Factor Analysis: Model Selection and

78

Figure3.Factor1(congestionanddischarge)scoresbyCRSEPOSgroupsatbaseline.FactorscoresacrossfactorsandCRSEPOSsgroups(currentCRS,previousCRS,neverCRS)forthecongestionanddischargefactoratbaselinewithnumberofindividuals(N)ineachgroup.FactorscoreswereestimatedbytheItemResponseTheory(IRT)basedscoresmethod.X-axiswasjitteredtoimprovereadability.

Page 87: Exploratory Factor Analysis: Model Selection and

79

Figure4.Continuousfactorscorescategorizedtoshowlongitudinalchangeacrossquestionnairesforfactor1(congestionanddischarge).Factorscoreswerecategorizedas:factorscore<-1wereassignedvaluesof-2;between-0.5and-1,assigned-1;between-0.5and0.5,assigned0;between0.5and1,assigned1;and>1,assigned2.Y-axislabelsindicatethenumberofpatientsatbaselineineachadjustedfactorscoregroup.FactorscoreswereestimatedbytheIRTmethod.

Page 88: Exploratory Factor Analysis: Model Selection and

80

Appendix

Page 89: Exploratory Factor Analysis: Model Selection and

81

FigureA1.Screeplotforthebaselinequestionnairedisplayingeigenvaluesonthey-axisandtheircorrespondingfactornumberonthex-axis.

Page 90: Exploratory Factor Analysis: Model Selection and

82

FigureA2.Screeplotforthe6monthfollowupquestionnairedisplayingeigenvaluesonthey-axisandtheircorrespondingfactornumberonthex-axis.

Page 91: Exploratory Factor Analysis: Model Selection and

83

FigureA3.Screeplotforthe16monthfollowupquestionnairedisplayingeigenvaluesonthey-axisandtheircorrespondingfactornumberonthex-axis.

Page 92: Exploratory Factor Analysis: Model Selection and

84

FigureA4.Screeplotforthefirstdifference(baselineto6monthquestionnaires)displayingeigenvaluesonthey-axisandtheircorrespondingfactornumberonthex-axis.

Page 93: Exploratory Factor Analysis: Model Selection and

85

FigureA5.Screeplotfortheseconddifference(6to16monthquestionnaires)displayingeigenvaluesonthey-axisandtheircorrespondingfactornumberonthex-axis.

Page 94: Exploratory Factor Analysis: Model Selection and

86

TableA1.Factorloadingsandsymptomcommonaltiesfromtheexploratoryfactoranalysis(EFA)ofthe37presence,severity,andsecondaryCRSsymptomat6monthfollowup.TheEFAwasfitusingordinaryleastsquaresandanobliminrotation(numberofpatients=3535).Loadingslessthan0.3wereomittedforreadability.Communalitiesrepresentthefractionofeachsymptom’svariabilitythatwascapturedbytheutilizedfivefactormodel.

# ItemLabel Factor1 Factor2 Factor3 Factor4 Factor5 Communalities1 Blockage 0.51 0.65 0.652 Dischargediscolored 0.33 0.46 0.463 PND 0.76 0.65 0.654 Smellloss 0.98 0.9 0.95 Facialpain 0.89 0.84 0.846 Facialpressure 0.86 0.85 0.857 Blockagebothsides 0.5 0.61 0.618 Blockagecomplete 0.39 0.53 0.539 Blockagebothered 0.6 0.78 0.7810 Dischargealot 0.87 0.81 0.8111 Blownose10xdaily 0.82 0.7 0.712 Dischargebothered 0.85 0.8213 Coughliedown 0.53 0.34 0.6414 Lumpinthroat 0.55 0.6415 PNDbothered 0.79 0.7816 Smelllosscomplete 0.97 0.9417 Smelllossbothered 0.92 0.8918 Facialpain5+ 0.87 0.8819 Facialpainbothered 0.89 0.9120 Facialpressure

severe 0.83 0.84

21 Facialpressurebothered

0.85 0.88

22 Headaches 0.61 0.4923 Fever 0.39 0.3824 Coughing 0.4 0.5 0.5725 Badbreath 0.3326 Fatigue 0.4327 Nasalitching 0.48 0.51

Page 95: Exploratory Factor Analysis: Model Selection and

87

28 Sneezing 0.37 0.47 0.5329 Eyeitching 0.68 0.6130 Eyetearing 0.6 0.531 Earfullness 0.71 0.6932 Earpain 0.31 0.64 0.6833 Earpressure 0.66 0.6934 Wheezing 0.81 0.6835 Chesttightness 0.87 0.8136 Shortnessofbreath 0.87 0.7337 Cold/flusymptoms 0.45 0.46

Page 96: Exploratory Factor Analysis: Model Selection and

88

TableA2.Factorloadingsandsymptomcommonaltiesfromtheexploratoryfactoranalysis(EFA)ofthe37presence,severity,andsecondaryCRSsymptomat16monthfollowup.TheEFAwasfitusingordinaryleastsquaresandanobliminrotation(numberofpatients=3535).Loadingslessthan0.3wereomittedforreadability.Communalitiesrepresentthefractionofeachsymptom’svariabilitythatwascapturedbytheutilizedfivefactormodel.

# ItemLabel Factor1 Factor2 Factor3 Factor4 Factor5 Communalities1 Blockage 0.34 0.42 0.632 Dischargediscolored 0.3 0.483 PND 0.68 0.644 Smellloss 0.98 0.95 Facialpain 0.88 0.866 Facialpressure 0.88 0.867 Blockagebothsides 0.33 0.34 0.578 Blockagecomplete 0.34 0.38 0.579 Blockagebothered 0.4 0.43 0.7210 Dischargealot 0.84 0.7811 Blownose10xdaily 0.81 0.6712 Dischargebothered 0.85 0.813 Coughliedown 0.38 0.42 0.5714 Lumpinthroat 0.38 0.5715 PNDbothered 0.68 0.7216 Smelllosscomplete 0.99 0.9417 Smelllossbothered 0.93 0.918 Facialpain5+ 0.92 0.8919 Facialpainbothered 0.92 0.8920 Facialpressure

severe0.87 0.81

21 Facialpressurebothered

0.89 0.86

22 Headaches 0.58 0.4923 Fever 0.35 0.3524 Coughing 0.34 0.52 0.625 Badbreath 0.3526 Fatigue 0.3 0.4527 Nasalitching 0.46 0.5628 Sneezing 0.42 0.44 0.5629 Eyeitching 0.63 0.6130 Eyetearing 0.55 0.5231 Earfullness 0.59 0.6832 Earpain 0.41 0.5 0.68

Page 97: Exploratory Factor Analysis: Model Selection and

89

33 Earpressure 0.34 0.54 0.6834 Wheezing 0.85 0.6935 Chesttightness 0.85 0.7536 Shortnessofbreath 0.89 0.7237 Cold/flusymptoms 0.46 0.49

Page 98: Exploratory Factor Analysis: Model Selection and

90

TableA3.Factorloadingsandsymptomcommonaltiesfromtheexploratoryfactoranalysis(EFA)ofthe37presence,severity,andsecondaryCRSsymptomchangesfrombaselineto6months.EFAwasfitusingordinaryleastsquaresandanobliminrotation(numberofpatients=3535).Loadingslessthan0.3wereomittedforreadability.Communalitiesrepresentthefractionofeachsymptom’svariabilitythatwascapturedbytheutilizedfivefactormodel.

# ItemLabel Factor1 Factor2 Factor3 Factor4 Factor5 Communalities1 Blockage 0.68 0.492 Dischargediscolored 0.43 0.243 PND 0.7 0.454 Smellloss 0.63 0.425 Facialpain 0.67 0.516 Facialpressure 0.61 0.517 Blockagebothsides 0.58 0.398 Blockagecomplete 0.49 0.349 Blockagebothered 0.58 0.4610 Dischargealot 0.79 0.5711 Blownose10xdaily 0.73 0.5212 Dischargebothered 0.75 0.5713 Coughliedown 0.49 0.3414 Lumpinthroat 0.52 0.3715 PNDbothered 0.71 0.5516 Smelllosscomplete 0.87 0.7517 Smelllossbothered 0.71 0.5218 Facialpain5+ 0.79 0.6219 Facialpainbothered 0.83 0.6820 Facialpressure

severe 0.71 0.49

21 Facialpressurebothered

0.79 0.62

22 Headaches 0.1223 Fever 0.0924 Coughing 0.48 0.325 Badbreath 0.1326 Fatigue 0.1427 Nasalitching 0.1528 Sneezing 0.31 0.2229 Eyeitching 0.2230 Eyetearing 0.2131 Earfullness 0.64 0.43

Page 99: Exploratory Factor Analysis: Model Selection and

91

32 Earpain 0.63 0.4133 Earpressure 0.74 0.5334 Wheezing 0.54 0.335 Chesttightness 0.59 0.3436 Shortnessofbreath 0.57 0.3337 Cold/flusymptoms 0.39 0.2

Page 100: Exploratory Factor Analysis: Model Selection and

92

TableA4.Symptomcommonaltiesfromtheexploratoryfactoranalysis(EFA)ofthe37presence,severity,andsecondaryCRSsymptomchangesfrombaselineto6monthsand6monthsto16months.EFAwasfitusingordinaryleastsquaresandanobliminrotation(numberofpatients=3535).

# ItemLabel Baseline–6monthfollowup 6-16monthfollowup1 Blockage 0.49 0.32 Dischargediscolored 0.24 0.193 PND 0.45 0.284 Smellloss 0.42 0.475 Facialpain 0.51 0.476 Facialpressure 0.51 0.417 Blockagebothsides 0.39 0.288 Blockagecomplete 0.34 0.239 Blockagebothered 0.46 0.410 Dischargealot 0.57 0.511 Blownose10xdaily 0.52 0.4312 Dischargebothered 0.57 0.5413 Coughliedown 0.34 0.2914 Lumpinthroat 0.37 0.2715 PNDbothered 0.55 0.416 Smelllosscomplete 0.75 0.6917 Smelllossbothered 0.52 0.4818 Facialpain5+ 0.62 0.619 Facialpainbothered 0.68 0.6320 Facialpressuresevere 0.49 0.4421 Facialpressure

bothered0.62

0.5422 Headaches 0.12 0.1423 Fever 0.09 0.124 Coughing 0.3 0.2925 Badbreath 0.13 0.1226 Fatigue 0.14 0.1427 Nasalitching 0.15 0.1928 Sneezing 0.22 0.2529 Eyeitching 0.22 0.2930 Eyetearing 0.21 0.331 Earfullness 0.43 0.41

Page 101: Exploratory Factor Analysis: Model Selection and

93

32 Earpain 0.41 0.3333 Earpressure 0.53 0.3934 Wheezing 0.3 0.3335 Chesttightness 0.34 0.436 Shortnessofbreath 0.33 0.3737 Cold/flusymptoms 0.2 0.24

Page 102: Exploratory Factor Analysis: Model Selection and

94

Chapter4-Conclusion

Therearemanyusesfor,andmethodsof,conductingEFA.Inthisthesis,Ihave

proposedanewmethodtoidentifythenumberoffactorstoextract,studieditsperformancein

applicationtocertaindatastructures,andappliedEFAmodelselectionandfactorextraction

methodstoestimatelatentstructureinsymptomscommoninCRSanditsrelatedco-morbid

conditions.

TheproposedmethodfordeterminingthenumberoffactorstoextractduringanEFA

addstothevastliteratureaddressingtheproblemofestimating𝑚andhowtonavigatethis

situation.Thisnew𝑚-estimationprocedureperformedwellunderavarietyofsimulated

testingconditionswhichvariedwithregardtosamplesize(𝑁),datadimensionality(𝑃),and

strengthofcorrelationstructure.Thus,thismethodmaybeaviableandversatileoptionof

estimatingtheunderlyingfactormodelwhensamplesizeissufficientlylarge.

TheCRSsymptomEFAshedlightonthestudiedsymptoms,whichdecomposedintofive

interpretablefactors,generatingseveralhypothesizedbiologicalfactorunderpinnings.We

wereabletoidentifycongestionanddischarge,smellloss,earandeye,asthmaand

constitutional,andfacialpainandpressuresymptomfactors.Thesefactorsareconsistentwith

understandingofbiologyandpathologicalprocessesinindividualsinuses.

OurCRSstudyutilizedCattell’sscreetest(5,5,and5factors)andparallelanalysis(5,5,

and6factors)inordertodeterminethenumberoffactorstoextractforthebaseline,6-month

follow-up,and16-monthfollow-upquestionnaires.Interestingly,thesemethodsestimated

modestlydifferent𝑚comparedwiththeKaisereigenvaluegreaterthan1rule(K1;6,5,and7

factors)andquitedifferentmcomparedwithothercommonlyutilizedmethodsincludingthe

Bayesianinformationcriterion(BIC;15,16,and14factors)andsamplesizeadjustedBIC(SSBIC;

17,17,and20factors),forbaseline,6-month,and16-monthfollow-upquestionnaires,

respectively.Ournewlyproposedtracemethodalsoproducedanoptimalfactorcardinalityfar

removedfromthosepresentedintheCRSpaper(13,13,and16forbaseline,6-month,and16-

monthquestionnaires,respectively).InChapter2wehypothesizedthatthesedifferencesmay

beexplainedbydifferingstandardsoffitimplicatedbythedifferentlevelsofspecificity

Page 103: Exploratory Factor Analysis: Model Selection and

95

(dimensionalityversusdistributionalform)addressedbythemethods’objectivecriteria.

Furtherresearchisneededtoelucidatethisconjecture.

Determiningwhichmethodtoutilizefor𝑚-estimationisdifficultforseveralreasons.

Firstly,therearealargenumberofpotentialoptionsforestimatingthenumberoffactorswith

potentiallydifferenttheoreticalfoundationsincludinglikelihood-basedmethods,eigenvalue-

basedmethods,graphicalmethods,andcross-validatedorbootstrapmethods.Investigators

mustfirstconsiderthepurposeoftheiranalysiswhendecidingwhichmethodtoutilize.If

interpretabilityorconcisenessisofparamountimportance,onemayconsidermethodsaligned

withthisideal.Otherwise,forexample,ifoneisplacingemphasisonidentifyingthenumberof

factorsinaFAmodelhypothesizedtoliterallyunderliethedata,methodsattunedtothatgoal

suchasBICorTRACEshouldbeconsidered.Thistargetdeterminationisimportant,asitwill

drivetheresultsandinferencedownstreamintheanalysis.Thisthesishasshownthatthe

choiceof𝑚cansubstantivelyimpactqualitativeandquantitativechangesinloadingandfactor

interpretations.Assuch,thischoicedirectlyinfluenceswhethertheresearcher’sdesiredgoalis

attainedwithrespecttounbiasedestimation,verisimilitude,generalizability,orinterpretability.

TheresultsarethusofhighimportancetoresearchersconductingEFAs.

Werecommendthatfutureworkfocusonthedecisionofwhichmethod(s)tousewhen

attemptingtofind𝑚inEFAsettings.Thebestprocessofchoosingwhichmethodofestimating

𝑚mayverywellbe,firstlyidentifyingwhatinterpretationof𝑚isrelevantforthecurrent

study,narrowingthefieldofpotentialmethods.Followingthis,apractitionerwilllikelystillbe

facedwithchoosingbetweenseveralmethodswhichmayperformdifferentlyinapplicationto

theobserveddata.Itisclearfromthesimulationstudythatundercertain,possiblyidentifiable

conditions,methodsmayoutperformorunderperformcomparedtotheiraverageefficacy

acrossconditions.Becausethestrengthofcorrelationsandsamplesizeofobserveddatawere

strongdriversoftheefficacyofcomparativemethods,theseattributesalongwithothers

shouldshedlightonwhichmethodismostappropriate.Thus,itmightbethatobserved

correlationmatrixattributescouldbeutilizedwithinasingleanalysistodeterminewhich

methodswouldperformbestandfutureworkinthisareaalsowouldbevaluable.Apractitioner

couldthenchoosebetweenmethodswithanunderstandingandanticipationofwhichmethods

Page 104: Exploratory Factor Analysis: Model Selection and

96

maybemostappropriatefortheirspecificdataathand.Finally,agreementbetweenmethods

mayprovetobeevidencethattheagreedupon𝑚isdesirablecomparedtootherpossibilities.

Simulationstudiessuchastheonedescribedinthemethodsportionofthethesiscanaddress

thesequestionsforus,bytestinghypothesizedmethodsagainstaknowntruthwegenerate.

ThisthesiswasabletoidentifysimilarlatentstructureandfactoridentityinthreeCRS

symptomquestionnaireadministrations,aswellasthechangesinsymptomresponsescores

betweenadministrations.TheseEFAswereconsistentwiththehypothesisthathypothesized

biopathologicalphenomenaunderlaytheobservedsymptomresponses.However,objective

sinusinflammationdatamustbeincorporatedinordertoadequatelyassessthishypothesis.

ThetracemethodshowedpromiseasaviableadditionalmethodforEFAmodel

selection,outperformingmanycommonlyutilizedmethodsacrossseveralsimulation

conditions.However,inthediverserangeoffieldswhereEFAisutilized,thesimulated

scenariosweresmallinscope,asthenumberoffactorsassessedwasalwaysbetween5and10,

thenumberofvariablesutilizedwasbetween11and100,andthenumberofsimulated

sampleswasbetween100and1000.ThisthesisbringstolightalternativeapproachestoEFA

andEFAmodelselectionthatwehopewillproveusefulastheyarefurtherrefined.

Page 105: Exploratory Factor Analysis: Model Selection and

97

ReferencesAkaike,H.(1973).MaximumlikelihoodidentificationofGaussianautoregressivemoving

averagemodels.Biometrika,255–265.

Akaike,H.(1987).FactoranalysisandAIC.Psychometrika,52(3),317–332.

https://doi.org/10.1007/BF02294359

Bai,J.,&Ng,S.(2002).DeterminingtheNumberofFactorsinApproximateFactorModels.

Econometrica,70(1),191–221.https://doi.org/10.1111/1468-0262.00273

Brown,T.A.(2014).Confirmatoryfactoranalysisforappliedresearch.GuilfordPublications.

Browne,J.P.,Hopkins,C.,Slack,R.,&Cano,S.J.(2007).TheSino-NasalOutcomeTest(SNOT):

Canwemakeitmoreclinicallymeaningful?Otolaryngology–headandNeckSurgery,

136(5),736–741.

Buuren,S.van,&Groothuis-Oudshoorn,K.(2011).mice:MultivariateImputationbyChained

EquationsinR.JournalofStatisticalSoftware,45(3),1–67.

Cattell,R.B.(1966).Thescreetestforthenumberoffactors.MultivariateBehavioralResearch,

1(2),245–276.

Chang,H.,Lee,H.J.,Mo,J.-H.,Lee,C.H.,&Kim,J.-W.(2009).Clinicalimplicationofthe

olfactorycleftinpatientswithchronicrhinosinusitisandolfactoryloss.Archivesof

Otolaryngology–Head&NeckSurgery,135(10),988–992.

Cole,M.,Schwartz,B.,&Bandeen-Roche,K.(2017).ExploratoryFactorAnalysisofCRS

Symptoms.

Page 106: Exploratory Factor Analysis: Model Selection and

98

DeConde,A.S.,Bodner,T.E.,Mace,J.C.,&Smith,T.L.(2014).ResponseShiftinQualityofLife

AfterEndoscopicSinusSurgeryforChronicRhinosinusitis.JAMAOtolaryngology–Head&

NeckSurgery,140(8),712–719.https://doi.org/10.1001/jamaoto.2014.1045

Fabrigar,L.R.,Wegener,D.T.,MacCallum,R.C.,&Strahan,E.J.(1999).Evaluatingtheuseof

exploratoryfactoranalysisinpsychologicalresearch.PsychologicalMethods,4(3),272–

299.https://doi.org/10.1037/1082-989X.4.3.272

Ferguson,B.J.,Narita,M.,Yu,V.L.,Wagener,M.M.,&Gwaltney,J.M.(2012).Prospective

ObservationalStudyofChronicRhinosinusitis:EnvironmentalTriggersandAntibiotic

Implications.ClinicalInfectiousDiseases,54(1),62–68.

https://doi.org/10.1093/cid/cir747

Friedman,J.,Hastie,T.,&Tibshirani,R.(2001).Theelementsofstatisticallearning(Vol.1).

SpringerseriesinstatisticsSpringer,Berlin.

Hamilos,D.L.(2011).Chronicrhinosinusitis:Epidemiologyandmedicalmanagement.Journalof

AllergyandClinicalImmunology,128(4),693–707.

https://doi.org/10.1016/j.jaci.2011.08.004

Hirose,K.,Kawano,S.,Konishi,S.,&Ichikawa,M.(2011).Bayesianinformationcriterionand

selectionofthenumberoffactorsinfactoranalysismodels.JournalofDataScience,

9(2),243–259.

Hirsch,A.G.,Stewart,W.F.,Sundaresan,A.S.,Young,A.J.,Kennedy,T.L.,ScottGreene,J.,…

Schwartz,B.S.(2017).Nasalandsinussymptomsandchronicrhinosinusitisina

population-basedsample.Allergy,72(2),274–281.https://doi.org/10.1111/all.13042

Page 107: Exploratory Factor Analysis: Model Selection and

99

Hopkins,C.,Browne,J.P.,Slack,R.,Lund,V.,&Brown,P.(2007).TheLund-Mackaystaging

systemforchronicrhinosinusitis:Howisitusedandwhatdoesitpredict?

Otolaryngology-HeadandNeckSurgery,137(4),555–561.

https://doi.org/10.1016/j.otohns.2007.02.004

Hopkins,C.,Browne,J.P.,Slack,R.,Lund,V.,Topham,J.,Reeves,B.,…vanderMeulen,J.

(2006).Thenationalcomparativeauditofsurgeryfornasalpolyposisandchronic

rhinosinusitis.ClinicalOtolaryngology:OfficialJournalofENT-UK ;OfficialJournalof

NetherlandsSocietyforOto-Rhino-Laryngology&Cervico-FacialSurgery,31(5),390–398.

https://doi.org/10.1111/j.1749-4486.2006.01275.x

Horn,J.L.(1965).Arationaleandtestforthenumberoffactorsinfactoranalysis.

Psychometrika,30(2),179–185.https://doi.org/10.1007/BF02289447

Humphreys,L.G.,&Jr,R.G.M.(1975).AnInvestigationoftheParallelAnalysisCriterionfor

DeterminingtheNumberofCommonFactors.MultivariateBehavioralResearch,10(2),

193–205.https://doi.org/10.1207/s15327906mbr1002_5

Kaiser,H.F.(1960).TheApplicationofElectronicComputerstoFactorAnalysis.Educationaland

PsychologicalMeasurement,20(1),141–151.

https://doi.org/10.1177/001316446002000116

Kamata,A.,&Bauer,D.J.(2008).Anoteontherelationbetweenfactoranalyticanditem

responsetheorymodels.StructuralEquationModeling,15(1),136–153.

Lee,C.-T.,Zhang,G.,&Edwards,M.C.(2012).OrdinaryLeastSquaresEstimationofParameters

inExploratoryFactorAnalysisWithOrdinalData.MultivariateBehavioralResearch,

47(2),314–339.https://doi.org/10.1080/00273171.2012.658340

Page 108: Exploratory Factor Analysis: Model Selection and

100

Lopes,H.F.,&West,M.(2004).BayesianModelAssessmentinFactorAnalysis.StatisticaSinica,

14(1),41–67.

Myung,I.J.(2000).TheImportanceofComplexityinModelSelection.JournalofMathematical

Psychology,44(1),190–204.https://doi.org/10.1006/jmps.1999.1283

Norris,M.,&Lecavalier,L.(2010).EvaluatingtheUseofExploratoryFactorAnalysisin

DevelopmentalDisabilityPsychologicalResearch.JournalofAutismandDevelopmental

Disorders,40(1),8–20.https://doi.org/10.1007/s10803-009-0816-2

Owen,A.B.,&Wang,J.(2016).Bi-Cross-ValidationforFactorAnalysis.StatisticalScience,31(1),

119–139.https://doi.org/10.1214/15-STS539

Preacher,K.J.,Zhang,G.,Kim,C.,&Mels,G.(2013).Choosingtheoptimalnumberoffactorsin

exploratoryfactoranalysis:Amodelselectionperspective.MultivariateBehavioral

Research,48(1),28–56.

Press,S.J.,&Shigemasu,K.(1999).Anoteonchoosingthenumberoffactors.Communications

inStatistics-TheoryandMethods,28(7),1653–1670.

RCoreTeam.(2016).R:ALanguageandEnvironmentforStatisticalComputing.Vienna,Austria:

RFoundationforStatisticalComputing.Retrievedfromhttps://www.R-project.org/

Revelle,W.(2017).psych:ProceduresforPsychological,Psychometric,andPersonality

Research.Evanston,Illinois:NorthwesternUniversity.Retrievedfromhttps://CRAN.R-

project.org/package=psych

Schwarz,G.,&others.(1978).Estimatingthedimensionofamodel.TheAnnalsofStatistics,

6(2),461–464.

Page 109: Exploratory Factor Analysis: Model Selection and

101

Sclove,S.L.(1987).Applicationofmodel-selectioncriteriatosomeproblemsinmultivariate

analysis.Psychometrika,52(3),333–343.

Sieskiewicz,A.,Lyson,T.,Olszewska,E.,Chlabicz,M.,Buonamassa,S.,&Rogowski,M.(2011).

Isolatedsphenoidsinuspathologies–theproblemofdelayeddiagnosis.MedicalScience

Monitor:InternationalMedicalJournalofExperimentalandClinicalResearch,17(3),

CR179.

Sundaresan,A.,Hirsch,A.,Young,A.,Tan,B.,Schleimer,R.,Kern,R.,…Schwartz,B.(2017).

LongitudinalEvaluationofChronicRhinosinusitisSymptomsinaPopulation-based

Sample.

Swihart,B.J.,Caffo,B.,James,B.D.,Strand,M.,Schwartz,B.S.,&Punjabi,N.M.(2010).

Lasagnaplots:asaucyalternativetospaghettiplots.Epidemiology(Cambridge,Mass.),

21(5),621.

Tan,B.K.,Kern,R.C.,Schleimer,R.P.,&Schwartz,B.S.(2013).ChronicRhinosinusitis:The

UnrecognizedEpidemic.AmericanJournalofRespiratoryandCriticalCareMedicine,

188(11),1275–1277.https://doi.org/10.1164/rccm.201308-1500ED

Timmerman,M.E.,&Lorenzo-Seva,U.(2011).Dimensionalityassessmentofordered

polytomousitemswithparallelanalysis.PsychologicalMethods,16(2),209–220.

https://doi.org/10.1037/a0023353

Tustin,A.W.,Hirsch,A.G.,Rasmussen,S.G.,Casey,J.A.,Bandeen-Roche,K.,&Schwartz,B.S.

(2017).AssociationsbetweenUnconventionalNaturalGasDevelopmentandNasaland

Sinus,MigraineHeadache,andFatigueSymptomsinPennsylvania.Environmental

HealthPerspectives,125(2),189–197.https://doi.org/10.1289/EHP281

Page 110: Exploratory Factor Analysis: Model Selection and

102

Underwood,L.G.,&Teresi,J.A.(2002).Thedailyspiritualexperiencescale:development,

theoreticaldescription,reliability,exploratoryfactoranalysis,andpreliminaryconstruct

validityusinghealth-relateddata.AnnalsofBehavioralMedicine,24(1),22–33.

https://doi.org/10.1207/S15324796ABM2401_04

Velicer,W.F.,Eaton,C.A.,&Fava,J.L.(2000).ConstructExplicationthroughFactoror

ComponentAnalysis:AReviewandEvaluationofAlternativeProceduresfor

DeterminingtheNumberofFactorsorComponents.InR.D.Goffin&E.Helmes(Eds.),

ProblemsandSolutionsinHumanAssessment(pp.41–71).SpringerUS.

https://doi.org/10.1007/978-1-4615-4397-8_3

Wald,E.R.,Milmoe,G.J.,Bowen,A.,Ledesma-Medina,J.,Salamon,N.,&Bluestone,C.D.

(1981).Acutemaxillarysinusitisinchildren.NewEnglandJournalofMedicine,304(13),

749–754.

Wj,F.,Vj,L.,J,M.,C,B.,I,A.,F,B.,…Pj,W.(2012a).EPOS2012:Europeanpositionpaperon

rhinosinusitisandnasalpolyps2012.Asummaryforotorhinolaryngologists.Rhinology,

50(1),1–12.https://doi.org/10.4193/Rhino50E2

Wj,F.,Vj,L.,J,M.,C,B.,I,A.,F,B.,…Pj,W.(2012b).EuropeanPositionPaperonRhinosinusitis

andNasalPolyps2012.Rhinology.Supplement,(23),3pprecedingtableofcontents,1-

298.

Zwick,W.,&Vejicer,W.(1984).AComparisonofFiveMethodsforDeterminingtheNumberof

ComponentsinDataSets.PsychologicalBulletin,99(3).

Page 111: Exploratory Factor Analysis: Model Selection and

103

BiographyMatthewK.Colewasbornin1993intheUSA.

MattcompletedhisundergraduateworkatSacredHeartUniversityinFairfield,

Connecticut,wherehemajoredinBiologyandMathematics.Duringhisundergraduate

education,hespentsometimestudyingabroadinItalyandGermanyandspenthisother

summersresearchingtheecologyoftheAmericanHorseshoeCrabLimuluspolyphemusinLong

IslandSound.

In2015,MattbeganhisSc.M.atJohnsHopkinsUniversity.Hewasateachingassistant

fortheStatisticalMethodsinPublicHealthcoursesequence.