3 The Research Process

RELIABILITY AND VALIDITY
Internal Validity and Confounds
External Validity

HYPOTHESES
HYPOTHESIS TESTING

Errors in Hypothesis Testing
The Probability of a Type I Error
The Probability of a Type II Error
Why We Don't Accept the Null Hypothesis

HOW TO DO SCIENCE
SUMMARY

The scientific method is the process by which scientists, including psychologists, collect information and draw conclusions about their disciplines. In this method, observations are made systematically and objectively, so that the results will be as meaningful as possible.

When using the scientific method in psychology, the researcher often tries to determine the effect of some factor on some type of behavior. In other words, the researcher wants to know if a change in an independent variable will cause a change in a dependent variable.

It is important to be precise and concrete when designing a study using the scientific method. This precision and clarity allow the researcher to more readily foresee pitfalls, ambiguity, and confounds that could render the results meaningless.

One important way to avoid confounds and ambiguity in research is by carefully defining all of the important concepts. Perhaps a researcher is interested in the effect of stress on work efficiency. The researcher plans to study this effect by inducing stress in half of the participants and then measure all of the participants' performance on some task. The first step is to define the terms "stress" and "work efficiency." Dictionary definitions are not precise enough for a researcher's needs. What is required is an operational definition, a definition that tells the reader exactly what was done to produce a phenomenon or to measure some variable. In this example, the researcher needs to explain how stress will be induced and exactly how performance will be measured. The researcher may intend to induce stress in one group of participants by telling them that they will be videotaped as they give an impromptu speech. This information about the videotaped

speech is the operational definition of stress in this experiment; it [...] The researcher will measure work efficiency by the number of anagrams from a list of 10 that are solved during a 3-minute interval. This operational definition of work efficiency tells the reader what task was performed, for how long it was performed, and what measurement was made. Together, these [...]

An operational definition should be so clear that a person who is not [...] the investigation can understand the definition and [...] use it in the same manner. By clearly specifying how [...]

An operational definition, of course, has to [...] participants that they will [...] operational definition would be of [...] motivational speakers [...] speaking in public [...] find it especially stressful.


RELIABILITY AND VALIDITY

Reliability is a key concept in research. Just as a reliable vehicle will start each time the ignition key is turned, a reliable measure is consistent. In other words, different researchers who use the same procedure to measure the same phenomenon should obtain the same results if the procedure is reliable. When used with comparable participants, a reliable operational definition should yield similar results each time.

Validity is the extent to which a measurement technique measures what it purports to measure. An operational definition is likely to yield valid results if it corresponds closely to what is to be measured. Thus, measuring work efficiency by the number of anagrams completed in 3 minutes may be a valid measure if the results are meant to generalize to work based on written language. The same measurement technique would probably be an invalid measure if it were meant to generalize to physical labor because anagram solving and physical labor are not very closely related.

Internal Validity and Confounds

A specific type of validity that is important in scientific research is internal validity. Internal validity is the extent to which the design of an experiment ensures that the independent variable, and not some other variable or variables, caused the measured difference in the dependent variables. In other words, an internally valid study has no problems that would confound the results. A confound, as described in chapter 1, is a factor that yields alternative explanations for a study's results and thus limits its internal validity. Internal validity is maximized by eliminating confounds. Experienced researchers automatically watch for some common confounds and design their studies so that these confounds are avoided or controlled. For example, an inexperienced researcher may wish to compare performance on a simple task under two temperature conditions: warm and cool. One research assistant is responsible for the cool condition, and another is responsible for the warm condition. If performance is found to be better in the cooler condition, it might be because the temperature had an effect on behavior, or it might be that the research assistants affected the participants' behavior in some manner. Perhaps one research assistant was more neatly dressed than the other and the participants with the neater assistant took the project more seriously. This would be a confound called an experimenter effect. An experienced researcher might foresee this problem and avoid it by using only one assistant or by keeping both assistants but having each collect half their data in the warm condition and half in the cool condition.
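The second remedy described above, keeping both assistants but having each run half of their sessions in each temperature condition, can be sketched in a few lines of Python. This is only an illustration: the function name `counterbalanced_schedule`, the assistant labels "A" and "B", and the total of 20 participants are assumptions made for the example, not details from the chapter.

```python
import random
from collections import Counter

def counterbalanced_schedule(n_participants, assistants=("A", "B"),
                             conditions=("warm", "cool"), seed=0):
    """Return a list of (assistant, condition) sessions, one per participant.

    Every assistant runs every condition equally often, so any effect of
    the assistant is spread evenly over both temperature conditions.
    """
    cells = [(a, c) for a in assistants for c in conditions]
    if n_participants % len(cells) != 0:
        raise ValueError("n_participants must be a multiple of the number of cells")
    sessions = cells * (n_participants // len(cells))
    random.Random(seed).shuffle(sessions)  # randomize the running order
    return sessions

schedule = counterbalanced_schedule(20)   # hypothetical study with 20 participants
counts = Counter(schedule)                # sessions per (assistant, condition) cell
```

Because each assistant appears equally often in the warm and cool conditions, a difference between conditions can no longer be explained by which assistant happened to run the session.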

Which of the following is the most complete operational definition?
a. Learning defined as the amount of information retained after a [...]
b. [...] correct responses on a 25-question [...]
c. Learning defined as studying for [...] in a lecture.
d. Learning defined as a relatively permanent change in behavior that occurs because of experience.

Researchers must also ensure that the study is not confounded by demand characteristics. Demand characteristics are the cues participants use to determine what is expected of them in a study. Suppose that to study the effect of mood on sense of wellness, a researcher induces either a positive or a negative mood and then asks the participant some questions about how healthy he or she feels. A participant in this study might very well perceive that the researcher expects mood to affect the responses and may try to help the researcher by responding as the researcher expects. To avoid this problem, the researcher would want to take special steps to dissociate the two parts of the study, perhaps by having a confederate act as if the questions about health are for a different study entirely.

There are numerous other potential confounds; each can threaten the internal validity of a study. We'll discuss more threats to internal validity in later chapters, especially chapters 5 and 6.

External Validity

Another important goal of research projects is external validity. External validity is the generalizability of the results of an investigation beyond the specific participants, measures, and site of the research. For example, a study with results that generalize to all English-speaking adults has greater external validity than a study with results that generalize to English-speaking college students. There is no rule of thumb, however, about how externally valid a study needs to be. Many useful research ideas come from studies with little external validity. Any investigation needs to have some external validity, though; an experiment with results irrelevant beyond the particular participants in the study is of little or no value.

The controls needed to create an internally valid study can sometimes limit the external validity of the study. For example, suppose an investigator wishes to research the effect of hypnosis on pain tolerance. In an experimental group, each participant will be hypnotized and given the suggestion that he or she cannot feel pain; then, each participant will submerge his or her arm in a bucket of ice water. Participants in the control group will not be hypnotized, but each person will also submerge an arm in the ice water. The dependent variable is the length of time that each participant keeps his or her arm in the water.

The investigator is aware that factors other than the independent variables could possibly affect the outcome of this study; these are called extraneous variables. In the present case, the sex of the experimenter and the sex of the participants are extraneous variables. The sex of the experimenter might affect how long the participant is willing to tolerate the pain of the ice water. For instance, male participants may withstand the pain for a longer period of time in front of a male experimenter than in front of a female. Also, male participants may feel honor-bound to sustain pain longer than would female participants. How should the researcher deal with these problems? Should both male and female experimenters be used? The sex of the experimenter could be balanced across the control and experimental groups, and the researcher could also make certain that half of the people in each group are tested by a member of the same sex and half by a member of the opposite sex. Should both male and female participants be involved in the study, or should it be limited to only one sex? Using both male and female participants and male and female experimenters increases the external validity of the experiment, but complicates the design of the study and requires more participants and more time for the study to be conducted. There is no correct answer to this problem. Some researchers will choose greater external validity, while others will opt for a simpler, quicker study.

The external validity of a study can also be affected by the manner in which the participants are selected for the project. In research, we talk about selecting a sample of participants from a larger population. A population is all of the organisms (usually people, sometimes animals) to which the researcher wishes to be able to generalize the research results. A sample is a subset of the population; the goal is for the sample to represent the population. The larger the population represented by the sample of participants, the greater the external validity of the study. An effective procedure is to identify participants from a population by random selection. In random selection, all members of the population are equally likely to be chosen. This procedure maximizes the probability that the sample is representative of the population, as long as a sufficient number of participants are chosen. Choosing five people from 5,000 possible participants is not likely to yield a sample that is representative of the population.

More often than not, participants are not selected randomly; instead, they come from a readily available pool of potential volunteers, such as college students. It is very common for researchers to solicit volunteers from introductory psychology classes. This type of sample is called a convenience sample (or an accidental sample). In convenience sampling, participants are not randomly chosen, but instead happen to be in the right place at the right time. Once a group of volunteers has been identified, the participants are assigned to different experimental conditions (the different levels of the independent variable). The most common way of assigning the participants to the conditions is by random assignment. Random assignment is the use of a procedure, perhaps as simple as flipping a coin, such that each participant is equally likely to be assigned to any of the conditions. Notice how this differs from random selection. Random selection describes how participants are chosen from the population; random assignment describes how participants are assigned to experimental conditions.

Does convenience sampling automatically reduce external validity? It depends on the research. If a researcher is investigating the political concerns of 18- to 22-year-olds, then using only college students of that age range will limit the external validity of the study; the results cannot be generalized to 18- to 22-year-olds who do not attend college. On the other hand, research into physiological or perceptual processes, which are assumed to be pretty much the same whether an individual is in college or not, would be likely to have reasonable external validity even if the participants were exclusively college students. Finally, the external validity of a study is open to testing. We simply repeat the work in a different context to see if the results can be generalized.
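The distinction drawn above between random selection (who gets into the study) and random assignment (which condition each participant receives) can be made concrete with a short Python sketch. The population of 5,000 ID numbers, the sample size of 100, and the function names `select_sample` and `assign_conditions` are all hypothetical choices made for illustration.

```python
import random

def select_sample(population, sample_size, rng):
    """Random selection: every member is equally likely to be chosen."""
    return rng.sample(population, sample_size)

def assign_conditions(participants, conditions, rng):
    """Random assignment: shuffle, then deal participants into conditions.

    Dealing round-robin after the shuffle keeps group sizes as equal as
    possible while leaving each participant equally likely to land in any
    condition.
    """
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups

rng = random.Random(42)                      # fixed seed so the run is repeatable
population = list(range(1, 5001))            # hypothetical ID numbers 1..5000
sample = select_sample(population, 100, rng)
groups = assign_conditions(sample, ["experimental", "control"], rng)
```

With a convenience sample, only the first step changes: the list of volunteers replaces the randomly selected sample, and random assignment proceeds exactly as before.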

Careful and precise planning is necessary when conducting research by the scientific method. Only by planning ahead and thinking critically can a researcher avoid design flaws and make choices that will maximize a study's internal and external validity. Actually, designing projects devoid of confounds can be something of a brain teaser; for me, it makes up half the fun of doing research.

HYPOTHESES

The other half of the fun in research is learning new things by testing your ideas. Suppose that a researcher is interested in the relationship between summer programs and the intelligence of grade-school children. In particular, this researcher wishes to know whether those who participate in a summer program where students can pick from among a number of intellectual topics are smarter than most people. This is the research question. On the basis of this question, the researcher forms one or more hypotheses (or predictions). In this case, the researcher may hypothesize that the IQ scores of students in the summer program will be higher than those of the population in general. This is the researcher's hypothesis.

To be precise, two hypotheses are involved because there are two sides to every question: what the researcher expects and what the researcher does not expect. One of these hypotheses is called the null hypothesis (represented by H0), and the other is called the alternative (or research) hypothesis (represented by H1 or sometimes Ha). The null hypothesis is the prediction that there is no difference between the groups being compared. We would expect the null hypothesis to be correct if the population from which the sample is taken is the same as the population with which it is being compared. In our example, if the students in the summer program are actually a representative sample of the general population, the students' IQ scores will be roughly equivalent to the IQ scores of the general population. The null hypothesis is typically what the researcher does not expect to find; a researcher does not usually predict the null hypothesis.

The alternative hypothesis is the prediction the researcher makes about the results of the research. It states that there is a difference between the scores of the groups being compared. In other words, it states that the sample is not representative of that particular population's scores, but instead better represents some other population's scores. There are two types of alternative hypotheses. In one type, the researcher simply predicts that the two groups being compared will differ, but does not predict the direction of that difference; the researcher does not predict which group will score higher or lower. This is called a two-tailed hypothesis. To clarify why it is said to be two-tailed, consider the normal curve in figure 3.1. In the middle of the curve is the population mean; in the case of the IQ example, that would be 100. If a sample mean (an average of the sample members' IQ scores) were much higher than 100, it would fall far to the right of the mean, up in the positive tail of the distribution. If a sample mean were much lower than 100, it would fall far to the left of the mean, down in the negative tail of the distribution. If a researcher simply predicts that a sample mean will be different from the population mean and does not predict whether it will be higher or lower, the researcher is predicting that it will fall in one of the two tails of the distribution. Thus, an alternative hypothesis that does not predict the direction of the difference is called a two-tailed hypothesis.

Figure 3.1 The normal distribution of IQ scores (horizontal axis labeled 55, 70, 100, 115, 130, 145)

As you may have guessed, if the researcher predicts the direction of the difference (for example, if the researcher predicts that the mean IQ of college students will be higher than the population mean), this is a one-tailed hypothesis. The researcher predicts in which tail of the distribution the sample mean is expected to fall. In our example, the alternative hypothesis is that the students in the summer program will have IQ scores greater than those of the general population. (What would the two-tailed alternative hypothesis be? What would the other one-tailed hypothesis be?)
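For readers who find symbols helpful, the hypotheses in the summer-program example can be written compactly. The notation is an assumption introduced here rather than the chapter's own: the symbol μ stands for the mean IQ of the population of summer-program students, and 100 is the general-population mean given in the text.

```latex
\begin{align*}
H_0 &: \mu = 100 && \text{(null: no difference from the general population)} \\
H_1 &: \mu > 100 && \text{(one-tailed, the prediction in the example)} \\
H_1 &: \mu \neq 100 && \text{(two-tailed: some difference, direction unspecified)} \\
H_1 &: \mu < 100 && \text{(the other one-tailed alternative)}
\end{align*}
```

Only one of the three alternatives is chosen for a given study, and it is paired with the same null hypothesis in every case.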

A researcher hypothesizes that a sample of families from the Midwest differs in size from the national average family size. What are the null and alternative hypotheses?


HYPOTHESIS TESTING

Although scientific research is designed to determine if the alternative hypothesis is supportable, hypothesis testing actually involves testing the null hypothesis, not the alternative hypothesis. If the difference between the groups being compared is so large that the difference is unlikely to have been caused by chance, then the groups being compared are unlikely to represent the same population and the null hypothesis is rejected. If the null hypothesis is rejected, the alternative hypothesis is supported. On the other hand, if the difference between the groups is so small that the difference is not unlikely to have occurred simply by chance, we fail to reject the null hypothesis. If the null hypothesis is not rejected, the alternative hypothesis cannot be supported.

In our example, the researcher has predicted that the mean IQ scores of summer-program students will be greater than the population mean of 100. This is a one-tailed alternative hypothesis. The null hypothesis is always that there is no difference between the groups being compared. In this case, the null hypothesis is that the sample mean will be no different from the general population mean. If we collect our data and find a mean that is greater than 100 (the mean IQ for the general population) by more than could reasonably be expected by chance, then we can reject the null hypothesis. When we do this, we are saying that the null hypothesis is wrong. Because we have rejected the null hypothesis and because the sample mean is greater than the population mean, as was predicted, we support our alternative hypothesis. In other words, the evidence suggests that the sample of summer-program students represents a population that scores higher on the IQ test than the general population.

On the other hand, if we collect our data and the mean IQ score does not differ from the population mean by more than could reasonably be expected by chance, then we fail to reject the null hypothesis and also fail to support our alternative hypothesis.
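One way to make "more than could reasonably be expected by chance" concrete is a one-sample z test, sketched below in Python. The chapter does not name a specific test, so treat this as one possible illustration. The population standard deviation of 15, the .05 cutoff, the 16 scores, and the function name `one_tailed_z_test` are all assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

POPULATION_MEAN = 100   # mean IQ of the general population (from the text)
POPULATION_SD = 15      # assumed for this sketch; not stated in the text
ALPHA = 0.05            # assumed cutoff for "unlikely by chance"

def one_tailed_z_test(scores, mu=POPULATION_MEAN, sigma=POPULATION_SD, alpha=ALPHA):
    """Test H0 (no difference from mu) against H1 (sample mean is greater)."""
    n = len(scores)
    sample_mean = sum(scores) / n
    standard_error = sigma / sqrt(n)          # spread of sample means under H0
    z = (sample_mean - mu) / standard_error
    p_value = 1 - NormalDist().cdf(z)         # chance of a mean this large under H0
    return z, p_value, p_value < alpha

# Made-up IQ scores for 16 summer-program students (illustration only).
scores = [112, 98, 105, 121, 109, 117, 95, 108,
          113, 102, 119, 110, 99, 116, 107, 111]
z, p_value, reject_null = one_tailed_z_test(scores)
```

When `reject_null` is true, the null hypothesis is rejected and the alternative is supported; when it is false, we fail to reject the null hypothesis and fail to support the alternative.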

Errors in Hypothesis Testing

Researchers carefully design their studies so that they answer their research questions by either supporting or failing to support their alternative hypotheses. However, because researchers are not omniscient, it is possible to reject the null hypothesis when it really is true. A researcher may conclude that two populations differ when in fact they do not. Another possible error is to find no difference in a study when a difference between the populations truly exists.

For any research problem there are two possibilities: either that the null hypothesis is correct and there is no difference between the populations or that the null hypothesis is false and there is a difference between the populations. The researcher, however, never knows the truth. Look at figure 3.2. Along the top is the truth (which the researcher can never know), and along the left side are the researcher's two decision choices: to reject the null hypothesis or fail to reject it. This allows four possible outcomes: two ways for the researcher to be correct, and two ways to be wrong.

The two ways to be correct are straightforward. First, the researcher can reject the null hypothesis when, in reality, it is false; that is, the researcher finds a true difference between the groups being compared. Second, the researcher might fail to reject the null hypothesis when, in fact, the null hypothesis is true. In this case, the researcher would not detect a difference between the groups being compared, and, in reality, there is no difference between the groups.

The two possible errors are to reject the null hypothesis when it is true (a Type I error) and to fail to reject the null hypothesis when it is false (a Type II error).

The Type I error is to find a difference between the groups being compared that does not truly exist in the population. Regardless of how well designed a study might be, a difference is sometimes detected between sample groups that does not reflect an actual difference in the populations. For example, we might find that the mean IQ score of our sample of summer-program students is higher than that of the general population. But perhaps our sample of summer-program students just happened to be bright students, and there truly isn't a difference between the IQ scores of the overall population of summer-program students and the general public. Because a difference was identified that does not truly exist, we have made a Type I error. To the extent that the results of a study have immediate ramifications (for instance, if important changes to the curriculum are made on the basis of IQ scores), Type I errors can be very serious indeed.

Figure 3.2 The four possible research outcomes

                                        THE TRUTH
THE DECISION         The null hypothesis is true    The null hypothesis is false
Reject H0            Type I error (α)               Correct decision
Fail to reject H0    Correct decision               Type II error


The Type II error is to fail to detect a difference between the sample groups when a difference truly exists between the populations. We would have made a Type II error if our sample of summer-program students did not have a mean IQ score significantly greater than the mean IQ for the general population when, in fact, the population of summer-program students did have a higher IQ than the general population. Our study would have failed to detect a difference that actually exists. This can happen for a number of reasons. Perhaps our sample included the less intelligent of the summer-program students. Perhaps our IQ test was administered in a nonstandard way that caused greater variation in the scores than if it had been conducted in the standard way. Still another possibility is that we included too few students in our sample to detect the difference.

Type II errors are often seen as less serious than Type I errors. If a difference truly exists but is not identified in one research project, continued research is likely to detect the difference. On the other hand, a Type I error is seen as something to be avoided. The results of applied research affect policy and practice in many areas of our life, such as education, medicine, and government. The results of basic research further our body of knowledge and move along the development and advancement of theory that affects applied research. Researchers set their standards high to avoid making Type I errors, that is, to avoid finding differences between comparison groups that don't actually exist in the populations. We need to keep the odds high that advances in research and any changes in policy or practice are based on real results, not erroneous results.

An analogy with the U.S. justice system may clarify the significance of Type I and Type II errors. Consider the case of a person accused of a crime. The null hypothesis is that an accused person is innocent; the accused person is no different from the general population. The alternative hypothesis is that the accused is guilty; the accused person is different from the general population, a deviant. In the United States, it is considered a more serious error to convict an innocent person than to acquit a guilty person; that is, it is more serious to find a difference that does not exist (a Type I error) than to fail to find a difference that really is there (a Type II error).

A researcher collects information on family size and concludes, on the basis of the data, that Midwestern families are larger than the average family in the United States. However, unbeknownst to the researcher, the sample includes several unusually large families, and in reality, Midwestern families are no larger than the national average. What type of error was made?


The Probability of a Type I Error

The probability of making a Type I error is called alpha (α). The acceptable alpha level is typically chosen by the researcher; in the social and behavioral sciences, it has traditionally been set at .05. In other words, researchers in the social and behavioral sciences are willing to accept a 5% risk of making a Type I error. With alpha set at .05, a difference between the groups that is large enough for us to reject the null hypothesis will occur by chance only 5 times out of 100 when the null hypothesis is true. A difference this large is said to be a significant difference.
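The claim that a significant difference turns up by chance about 5 times in 100 when the null hypothesis is true can be checked with a small simulation. The sketch below is illustrative only: it assumes IQ scores are normally distributed with a standard deviation of 15, uses samples of 25, and applies a one-tailed z test, none of which is specified in the text. The function name `type_i_error_rate` is likewise hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(trials=10_000, n=25, mu=100, sigma=15, alpha=0.05, seed=1):
    """Estimate how often a true null hypothesis is rejected by chance."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha)   # one-tailed cutoff
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw the sample from the general population, so H0 really is true.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu) / standard_error
        if z > critical_z:
            rejections += 1        # a Type I error: "difference" found by chance
    return rejections / trials

rate = type_i_error_rate()
```

Over many simulated studies the proportion of false rejections should settle close to the chosen alpha, which is exactly what setting alpha at .05 is meant to guarantee.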

Let's consider our summer-program example again. The normal distribution in figure 3.3 represents the sampling distribution of IQ scores in the general public. (The sampling distribution is the distribution of sample means, as opposed to a distribution of individual scores.) If the null hypothesis is true, the mean IQ score for the sample of summer-program students will be included as part of this distribution. However, if the alternative hypothesis (that the population mean IQ of the summer-program students is greater than the mean for the general population) is correct, the mean for our sample better represents a different distribution. To determine whether the population mean of the summer-program students is greater than or equal to the population mean of the general public, we compare our sample mean to the population mean of the general public. If our sample mean is so great that it falls in the top 5% of the sampling distribution for the general public, we infer that there was only a 5% chance of a mean this large being drawn from that population. Having chosen α = .05, we then reject H0 and support H1. The 5% of the distribution that is shaded in figure 3.3 represents α and is called the region of rejection. If a score falls within the region of rejection, the null hypothesis is rejected.

Figure 3.3 Region of rejection for a one-tailed hypothesis (shaded tail labeled "Region of rejection")

In our example, the alternative hypothesis was one-tailed. With a one-tailed hypothesis, the region of rejection lies at one end of the distribution. For a two-tailed hypothesis, the region of rejection is split equally between the two tails: 2.5% in one tail and 2.5% in the other tail when α = .05 (figure 3.4). If our sample mean is so great that it falls in the top 2.5% of the sampling distribution for the general public, or it is so small that it falls in the bottom 2.5% of the sampling distribution, then we infer that there was only a 5% chance (5% because the two regions of rejection add up to 5% of the distribution) of a mean this extreme being drawn from that population. Having chosen α = .05, we then reject H0 and support H1.
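The one-tailed and two-tailed regions of rejection described above can be located numerically. The following Python sketch uses the standard normal distribution from the standard library; the sample size of 25, the standard deviation of 15, and the function name `rejection_cutoffs_for_mean` are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05
standard_normal = NormalDist()          # mean 0, standard deviation 1

# One-tailed: the whole 5% sits in the upper tail.
one_tailed_cutoff = standard_normal.inv_cdf(1 - ALPHA)
# Two-tailed: 2.5% in each tail, so the cutoffs are symmetric about the mean.
two_tailed_cutoff = standard_normal.inv_cdf(1 - ALPHA / 2)

def rejection_cutoffs_for_mean(n, mu=100, sigma=15):
    """Translate the z cutoffs into sample-mean IQ values for samples of size n."""
    standard_error = sigma / sqrt(n)
    return {
        "one_tailed_upper": mu + one_tailed_cutoff * standard_error,
        "two_tailed_lower": mu - two_tailed_cutoff * standard_error,
        "two_tailed_upper": mu + two_tailed_cutoff * standard_error,
    }

cutoffs = rejection_cutoffs_for_mean(n=25)
```

Because the two-tailed test splits alpha between the tails, its upper cutoff sits farther from the mean than the one-tailed cutoff. Under these assumptions, a sample mean can therefore be significant in a one-tailed test and yet fall short in a two-tailed test.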

Figure 3.4 Regions of rejection for a two-tailed hypothesis

Figure 3.5 A representation of power, beta, and alpha (the two distributions are labeled H0 and H1)

The Probability of a Type II Error

The probability of making a Type II error is called beta (β). Beta is a measure of the likelihood of not finding a difference that truly exists. The opposite of β is called power and is calculated as 1 - β. Power is the likelihood of finding a true difference. In general, researchers want to design studies that are high in power and have a low β. However, β, α, and power are interconnected, as an examination of figure 3.5 makes clear.

In figure 3.5, the distribution on the left represents the distribution of sample means when the null hypothesis is correct. The distribution on the right represents the distribution of sample means when the alternative hypothesis is correct. In terms of our summer-program example, the distribution on the left is the distribution of mean IQ scores for the general public; this distribution would include the sample of summer-program students if they are not significantly different from the general public. The distribution of sample means on the right represents the mean IQ scores of the population of summer-program students if they do score significantly higher than the general public.

The darker shaded area labeled "alpha" is the top 5% of the null hypothesis distribution. If our sample's mean is so large that it falls within the top 5% of the null hypothesis distribution, then we say that it is unlikely to belong to that distribution, and we reject the null hypothesis.

The lighter shaded area of the alternative hypothesis distribution (to the left of alpha) represents beta. This is the probability of making a Type II error. If a mean is too small to land in the region of rejection, but actually does belong to a separate population, then it will fall within the beta region. The researchers will fail to reject H0, even though it is false, and thus will make a Type II error. Whenever possible, a researcher attempts to increase the power of a project, in order to increase the likelihood of rejecting a false null hypothesis.

Consider figure 3.5 again. Power can be increased by reducing beta (imagine moving the line delineating beta and alpha to the left), and beta can be reduced by increasing alpha (moving the line delineating beta and alpha to the left). Often, however, increasing alpha is not a realistic option. Only rarely will a researcher (and perhaps more importantly, the researcher's colleagues) trust the results of a study where alpha is greater than .05. A relatively simple way for a researcher to increase the power of a study is to increase the sample size. The larger the size of the samples in a study, the easier it is to find a significant difference using statistical tests. Statistically, a small difference may indicate a significant difference if many participants were involved in the study. If the same small difference is based on only a few participants, the statistical test is more likely to suggest that the results could have happened just by chance.
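The relationship among alpha, beta, power, and sample size described above can be explored with a short calculation. This Python sketch assumes a one-tailed z test, a population standard deviation of 15, and a hypothetical true mean of 105 for summer-program students; these values and the function name `power_one_tailed` are illustrative assumptions, not figures from the chapter.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tailed(true_mean, n, mu0=100, sigma=15, alpha=0.05):
    """Power of a one-tailed z test that the population mean exceeds mu0.

    true_mean is the (hypothetical) mean of the population the sample
    really comes from; beta is the chance of missing that difference.
    """
    standard_error = sigma / sqrt(n)
    # Smallest sample mean that lands in the region of rejection under H0.
    critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Beta: probability that a sample mean from the true distribution
    # falls below that cutoff, so H0 is (wrongly) not rejected.
    beta = NormalDist(true_mean, standard_error).cdf(critical_mean)
    return 1 - beta

small_sample = power_one_tailed(true_mean=105, n=25)
large_sample = power_one_tailed(true_mean=105, n=100)
looser_alpha = power_one_tailed(true_mean=105, n=25, alpha=0.10)
```

Comparing `small_sample` with `large_sample` shows power rising as the sample grows, and comparing `small_sample` with `looser_alpha` shows that raising alpha also raises power, at the cost of more Type I errors.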

Why We Don't Accept the Null Hypothesis

You may be wondering why we keep saying that we fail to reject the null hypothesis, instead of simply stating that we accept the null hypothesis. If we reject the null hypothesis, we know it is because our finding was relatively unlikely to occur by chance alone. But, if we do not reject the null hypothesis, what does that mean? The null hypothesis says there is no significant difference between our sample mean and the population mean. If we do not reject the null hypothesis, does that mean that our sample's scores are equal to the population's scores? Not necessarily. By failing to reject the null hypothesis, we have failed to find a significant difference, but that does not mean we have found an equality. There can be a number of reasons for failing to find a difference, that is, for failing to reject the null hypothesis. It could be because we made a Type II error. Perhaps our method of data collection was not sensitive enough to detect the difference, or we needed a larger sample to detect the difference consistently. Perhaps, simply by chance, our sample was such that its mean was not significantly different from the population's score, or perhaps a confound in our study caused our results to come out differently than expected. Any of these reasons and more could cause a Type II error and make us fail to reject the null hypothesis when it is false. Of course, there is also the possibility that we failed to reject the null hypothesis because the null hypothesis is actually true. How can we tell if the null hypothesis is true or if we have made a Type II error? We can't, and for this reason it is risky to accept the null hypothesis as true when no difference is detected. Similarly, it is risky to predict no difference between our sample and the population. If we find no difference, we cannot know by this one study if it is because our prediction was accurate or because we made a Type II error.

If a researcher does reject the null hypothesis, how much does that support the alternative hypothesis? Support for the alternative hypothesis means that the identified difference was so large as to be unlikely to have occurred by chance. If the difference didn't occur by chance, why did it occur? Explaining the difference is the researcher's task. One researcher may believe that summer-program students are smarter than the general public, while another researcher may think that the summer program serves to increase students' IQ scores. On the basis of their beliefs, both of these researchers are likely to predict that the mean IQ score for a sample of summer-program students will be greater than 100. Suppose that both researchers collect and analyze some data, and both find results that are consistent with their predictions. Each researcher can be sure that only 5 times out of 100 would the mean of the summer-program students' IQ scores be significantly greater than the population mean by chance. However, neither can be totally confident that his or her explanation for the results is correct. Rejection of the null hypothesis and support of the alternative hypothesis lend confidence to the results found, but not to the explanation given. The explanation may or may not be correct; it is vulnerable to all of the subjectivity, wishful thinking, and faulty reasoning that humans are heir to. The best explanation will emerge only after other carefully designed investigations have been conducted.
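The "5 times out of 100" figure can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: IQ scores are treated as normally distributed with a mean of 100 and an assumed standard deviation of 15, the test is a one-tailed z-test with alpha = .05, and the function name false_alarm_rate is hypothetical. Every simulated sample is drawn from the general population, so the null hypothesis is true and each rejection is a Type I error.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_alarm_rate(n=25, studies=10_000, alpha=0.05, seed=1):
    """Share of simulated studies that reject a null hypothesis that is true."""
    mu0, sigma = 100, 15          # assumed population mean and standard deviation
    rng = random.Random(seed)     # fixed seed so the run is repeatable
    # Sample means above this cutoff fall in the region of rejection.
    cutoff = NormalDist(mu0, sigma / sqrt(n)).inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(studies):
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        if sample_mean > cutoff:
            rejections += 1       # a Type I error, because the null is true here
    return rejections / studies


print(false_alarm_rate())  # should land close to alpha
```

The simulation says nothing about why a real difference occurs. It only shows how often chance alone produces a "significant" result, which is the sense in which rejecting the null hypothesis lends confidence to the result but not to the explanation.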

HOW TO DO SCIENCE

Conducting scientific research, like any project, involves a series of steps. In general, the steps are the same for any scientific research project; only the specifics differ from project to project. These steps are outlined in figure 3.6.

Figure 3.6 The steps in conducting scientific research

Step 1: Identify a topic
Step 2: Learn about the topic
Step 3: Form a hypothesis
Step 4: Design the study
Step 5: Collect data
Step 6: Analyze data
Step 7: Interpret results
Step 8: Communicate results

The first step is to identify the topic to be studied. At first this can be somewhat difficult, not because there are so few topics to choose from, but because there are so many.

One way to begin is to think about courses you have had in psychology and related fields. What courses were your favorites? Thumb through your old introductory psychology textbook; which chapters did you find most fascinating? Another approach is to consult with faculty or other students who are conducting research. Often, there are more projects that need to be done than any single person has time to address. Choose a topic to research that you find particularly interesting; research is a time-consuming task and can become tedious if you aren't especially curious about the questions you have asked.

Once you have identified a topic, the second step is to learn about what has already been done in the area. The library is your primary source for this information. The results of previous research on your topic may be found in whole books or in book chapters. Research journals publish descriptions of individual research projects, but they also publish review articles, which describe the results of many projects. (For more information about journal articles and literature searches, see appendix C.) Courses and textbooks can also serve as sources of information about an area. Less frequently considered, but often very worthwhile, is actual correspondence with the experts and researchers in a field. These individuals can provide valuable information about details and nuances of their work that would be unavailable elsewhere.

The impetus for specific research projects may appear during your review of the area. Perhaps you find the results of one study doubtful and wish to replicate it. Maybe you'd like to try to detect a particular phenomenon under a different set of circumstances. You might decide to combine ideas from two different studies or to conduct the next in a series of projects. The only way to learn from others' experiences is to discover what others have done.

The third step is to focus on a specific research question and form a hypothesis. This entails narrowing your focus from a general area of research to a specific question that you want to answer. Your predicted answer to the research question is the hypothesis. Hypotheses can be derived from theory or from previous research or may simply reflect curiosity about a topic.

Perhaps you have been learning about the research conducted on eating disorders and, during the same time period, you learned about operant conditioning in one of your psychology classes. You might wonder if operant conditioning can be used to modify eating behaviors by using positive reinforcement to increase eating. A hypothesis needs to be precisely stated in a testable manner; therefore, this hypothesis needs to be honed some. Perhaps it develops into the following statement: Participants who receive positive reinforcement contingent upon eating will consume more food than do participants who receive no positive reinforcement. As you learn more about research, you'll see that the research question and hypothesis guide how a study is designed.

The fourth step involves designing your study so that the results will either support or refute your hypothesis. This is when you decide exactly how you will make your observations. Here you define your terms. Continuing with our eating example, the terms "positive reinforcement contingent upon eating" and "consume more food" need to be operationally defined. Maybe positive reinforcement will be defined as complimentary statements about the participant's hair and clothing made within one second after the participant eats a potato chip. Having thus defined the food as potato chips, we might measure the consumption of potato chips in grams. The research design implied in this hypothesis involves an experimental group (which receives the positive reinforcement) and a control group. Many other types of research designs can be used to test hypotheses; each has its own advantages and disadvantages. The choice of research design often reflects a balance between the benefits and pitfalls of the design, the practical concerns of the particular situation, and personal preference.

Many other specific decisions about your study must also be made during this stage. Who will be the participants in your study? How many will you need? Will they be tested together, in small groups, or individually? How will the potato chips be presented? Will the same experimenter interact with all of the participants? Where and when will this experiment take place? How long will it take? Should the participants be asked to refrain from eating for some amount of time prior to the study? When will the participants be told the true purpose of the study? As the questions are answered and the experiment begins to take shape, it is important to keep a wary eye out for potential confounds. The challenge is to design your study so that the results either clearly support or do not support your hypothesis.

The fifth step in conducting scientific research is to actually make your observations and collect your data according to the procedures prescribed in your research design. Here is where attention to detail during the design stage pays off. It is often unwise to change the procedures after a study is underway, as this makes it more difficult to interpret the results. However, even the most experienced researchers are occasionally surprised by problems that arise while the data are being collected, and sometimes this means scrapping the project and redesigning the study. Surprises are not necessarily bad, though, for with every surprise comes a bit of new information and perhaps the seed of new research efforts.

In the sixth step, the data that have been collected are summarized and analyzed to determine whether the results support the hypothesis. This process is called statistical analysis. By using statistical analysis, you can determine how likely or unlikely it is that your results are due to chance and with how much confidence you can state that your results reflect reality.
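One way to see what "how likely it is that your results are due to chance" means is a permutation test, sketched below for the potato chip example. This is an illustration, not the analysis the chapter prescribes: the gram values are invented purely for demonstration, and the function name permutation_p_value is hypothetical. The idea is to shuffle the group labels many times and count how often chance alone produces a difference at least as large as the one observed.

```python
import random
from statistics import mean


def permutation_p_value(experimental, control, shuffles=10_000, seed=0):
    """One-tailed p: share of random relabelings whose mean difference
    is at least as large as the observed difference."""
    rng = random.Random(seed)
    observed = mean(experimental) - mean(control)
    pooled = list(experimental) + list(control)
    n_exp = len(experimental)
    at_least_as_large = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = mean(pooled[:n_exp]) - mean(pooled[n_exp:])
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / shuffles


# Invented data: grams of potato chips eaten by each participant.
reinforced = [42, 55, 38, 61, 47, 52]
no_reinforcement = [35, 40, 31, 44, 39, 36]
p = permutation_p_value(reinforced, no_reinforcement)
print("reject the null hypothesis at alpha = .05:", p < 0.05)
```

A small p means that random relabeling rarely reproduces the observed difference, which is the sense in which the result is unlikely to be due to chance. Researchers more commonly use tests such as the t-test for this design; the permutation approach is shown here only because its logic is easy to follow.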

The seventh step involves interpreting the results of the statistical analyses and drawing conclusions about your hypotheses. Here you determine the implications of your results in relation to the topic you focused on in step 1.

Finally, the eighth step is to communicate your research results to others. In psychology, this is done a number of ways, including conference presentations and publications. Psychology conferences are an important venue for presenting research. Many, but not all, conferences review submitted projects and allow the authors of the best projects to present their work. Other conferences allow all members of the sponsoring organization to present research. Also, numerous student conferences allow undergraduates to present research to their peers from other institutions. All of these types of conferences provide excellent opportunities to gain up-to-date information, to meet with people who are researching an area in which you are interested, and to become enthused and inspired to conduct research.

Probably the most prestigious way to communicate research results is by publishing an article in a scholarly journal. Other researchers review the proposed article and provide the journal editor and author with feedback about how to improve the project and/or manuscript; they also provide their opinion about whether the article should be published. (Appendix C relays more information about this topic.) A published research article has been read by a number of professionals and typically (but not always) represents excellent research work.

Regardless of whether a research project results in a publication or presentation, doing research inevitably provides the researcher with new information, new insights, or simply new questions. Then the cycle begins again, as a new research project begins to grow in the researcher's mind.

SUMMARY

Conducting scientific research involves being precise about what is studied and how it is studied so that confounds can be avoided. An important step is to carefully define all important terms using operational definitions. Operational definitions differ from dictionary definitions in that they describe the exact procedures used to produce a phenomenon or measure some variable.

A good study not only has clearly defined terms but also provides consistent results. The production of consistent results is called reliability. As important as reliability is validity. Validity is the extent to which a measurement tool or technique measures what it purports to measure. A study that is not valid and/or is not reliable is of no use to the researcher.

When a study is designed well, so as to provide reliable and valid data for which there is only one explanation, then the study is said to have good internal validity. Internal validity can be threatened by the existence of confounds such as experimenter effects or demand characteristics. If the results of a study may be generalized beyond the original set of participants, it is said to have strong external validity.

One way to increase the external validity of a study is by choosing a sample carefully. Random selection maximizes the probability that the sample is representative of the population. However, convenience sampling is used more often and is typically followed by random assignment of participants to different experimental conditions.
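The difference between random selection and random assignment can be sketched in a few lines. This is a minimal illustration that assumes a complete roster of the population is available. The roster, the sample size of 40, and the function names are hypothetical.

```python
import random


def random_selection(population, sample_size, rng):
    """Draw a sample in which every member has an equal chance of inclusion."""
    return rng.sample(population, sample_size)


def random_assignment(participants, rng):
    """Shuffle the participants, then split them into two conditions."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


rng = random.Random(42)  # fixed seed so the example is repeatable
roster = [f"student_{i:03d}" for i in range(500)]  # hypothetical population list
sample = random_selection(roster, 40, rng)
experimental, control = random_assignment(sample, rng)
print(len(experimental), len(control))
```

With a convenience sample, only the second function would apply: the participants are whoever is available, but chance still decides which condition each one receives.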

Much research in psychology and the other sciences is based on hypothesis testing. The null hypothesis states that there is no effect of the independent variable on the dependent variable. If two groups are being compared, the null hypothesis states that there will be no significant difference between the two groups.

The alternative hypothesis is typically the researcher's prediction. The researcher might predict that the independent variable will cause an increase in the dependent variable; that would then be the alternative hypothesis. This particular example would be a one-tailed hypothesis, because it predicts the direction of the difference. A two-tailed hypothesis predicts a difference but does not predict its direction.

The researcher actually tests the null hypothesis. If the researcher finds strong enough evidence, he or she will reject the null hypothesis and thus support the alternative hypothesis. Without strong evidence, the researcher fails to reject the null hypothesis.

The null hypothesis is rejected, or is not rejected, on the basis of probabilities. If the results would be sufficiently unlikely to occur by chance, the researcher will reject the null hypothesis. If in reality the null hypothesis is true, however, the researcher has made a Type I error. If the researcher fails to reject the null hypothesis when it is actually false, the researcher has made a Type II error. Researchers never know if they have made one of these errors, but they do take measures to reduce the probability of doing so. The most straightforward way to reduce the probability of a Type II error without increasing the probability of a Type I error is to increase the number of participants in the study.

Conducting research is not a linear task, but instead tends to be circular; the results of one study affect the way in which the next study is designed and interpreted.

IMPORTANT TERMS AND CONCEPTS

alpha (α)
alternative (or research) hypothesis
beta (β)
confound
convenience (or accidental) sample
demand characteristics
experimenter effect
external validity
extraneous variables
internal validity
null hypothesis
one-tailed hypothesis
operational definition
population
power
random assignment
random selection
region of rejection
reliability
sample
sampling distribution
significant difference
statistical analysis
two-tailed hypothesis
Type I error
Type II error
validity

EXERCISES

1. A researcher wishes to look at the effect of stress on fidgeting. What terms need to be operationally defined? What are some possible operational definitions?

2. What is the difference between reliability and validity? Can a study be valid and not reliable? Can a study provide reliable data but not valid data?

3. Suppose I have conducted a study in which participants were asked to perform a mood induction task that created a happy, sad, or neutral mood. The participants were then asked to complete a questionnaire about their sense of wellness. Why might demand characteristics be a problem in this study? How could demand characteristics affect the results?

4. A researcher wants to create a random sample of students at Smart U. A friend suggests that the researcher walk across campus and approach every third person she encounters. Is this a random sample? If not, what type of sample is it? Can you develop a procedure to create a random sample?

5. What is the difference between a null hypothesis and an alternative hypothesis?

6. I want to investigate the effect of chocolate on mood. One group of participants eats a chocolate bar before completing a mood scale; the other participants complete the mood scale without first eating chocolate. I
