

Theory & Psychology 22(4) 467–485

© The Author(s) 2012
Reprints and permission: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0959354311429997
tap.sagepub.com

What measurement is all about

Uwe Saint-Mont, FH Nordhausen

Abstract
The nature of psychological measurement is still the subject of fierce controversy. A rather philosophical debate has been going on in this journal; therefore a closer look at physicists’ ideas on measurement may be helpful. In particular, we will try to clarify matters with the help of the crucial concepts of access (validity), precision (reliability), and invariance.

Keywords: epistemology, invariance, measurement, reliability, validity

A recent discussion

During October 2009, the most frequently read article in Theory & Psychology was Trendler (2009), which builds on Michell (2000, 2004). These articles deal with quantification, measurement (theory), and (normal and pathological) science. Closely related are a number of contributions of Michell’s, in particular Michell (2003a, 2003b) and the discussion it provoked in this journal (Borsboom & Mellenbergh, 2004; W.P. Fisher, 2003; Hoshmand, 2003; Kyngdon, 2008; Martin, 2003; Michell, 2005, 2008a; Niaz, 2005).

I wholeheartedly agree with Michell (2005, p. 261) that (a) qualitative methods are rather complementary to and certainly no alternative to quantitative methods and (b) one should not link them automatically to a certain philosophy (be it realist, representationalist, empiricist, positivist, antirealist, or any other).

Moreover, it is wise to distinguish some slogan, in particular the “quantitative imperative,” from the actual compliance with it or with other traditional scientific values (Michell, 2003b, p. 50) by some group of scientists such as, for example, mainstream psychometricians. Michell (2005) is certainly right when he says that “methodological imperatives have no necessary place in science.” There should only be “methodological rules [which] will always be conditional upon relevant features of the phenomenon investigated” (p. 258).

Corresponding author: Uwe Saint-Mont, Fachbereich Wirtschafts- und Sozialwissenschaften, Fachhochschule Nordhausen, Weinberghof 4, 99734 Nordhausen, Germany. Email: [email protected]

Downloaded from tap.sagepub.com at TEXAS SOUTHERN UNIVERSITY on December 16, 2014

I start to feel uneasy, however, when the last sentence continues with “(e.g., whether the features of interest are quantitative or not)” (Michell, 2005, p. 258). Michell (2004) had already repeated the thesis “that mainstream psychometricians have never seriously attempted to test [the] hypothesis … that psychological attributes … are quantitative,” and he went on to exclude explicitly “the tradition stemming from the writings of Luce and Suppes” (p. 122; e.g., Krantz, Luce, Suppes, & Tversky, 1971). In other words, to him, the statement that a psychological attribute is quantitative is a (basic) hypothesis which should be tested empirically: “[T]he hypothesis that psychological attributes are quantitative is accepted as true by mainstream psychometricians, not on the basis of adequate evidence but for extraneous reasons” (Michell, 2008b, p. 12).

Michell (2000, 2003a, 2003b, 2005) extends the historical approach, reaching the conclusion that the idea that everything has to be quantitative can be found—among many others—in Kelvin (1891) and Thorndike (1918, p. 16). The closely related philosophical position is Pythagoreanism (Michell, 2005, p. 260f.), about which he writes:

The quantitative imperative is the view that studying something scientifically means measuring it. Measurement is thought to be a necessary part of science and non-quantitative methods are thought to be pre-scientific. This imperative is motivated by the idea that all attributes are fundamentally quantitative, an idea originating with the pre-Socratic Pythagoreans. (Huffman, 1999, as cited in Michell, 2003a, p. 6)

The critical attitude is taken up by Martin (2003), who emphasizes that “[p]sychological phenomena are meaningful, relational, non-extensive, interactive, socioculturally and historically constituted phenomena with moral and political significance. All of these attributes of psychological phenomena are non-quantitative” (p. 36).

In the hands of Trendler (2009) this becomes: “Psychological phenomena are neither manipulable nor controllable to the required extent. Therefore they are not measurable. … Recently, Joel Michell (2000, 2004) has described quantitative psychology … as a pathological science” (p. 579).

Basics on measurement

The term “quantitative,” and the use of mathematics in the sciences generally, is a tricky matter. To begin with, Trendler (2009) stresses the importance of the experiment and experimental apparatus, which he calls the Galilean Revolution. However, there is a drawback, the Millean Quantity Objection (his words): “psychological phenomena are not dependent or cannot be made to depend on a manageable set of conditions” (p. 590). Hence,

in psychology the extremely successful Galilean method reaches the limits for its successful application. … We must … recognize that Norman Campbell (1920, 1928) was right after all about the non-measurability of psychological attributes. … In psychology we cannot even satisfy Campbell’s first law of measurement … which contains the demand for equivalence between magnitudes. (p. 592)


A number of objections can be made to this very pessimistic view. In particular, a lot of scientific progress has come about without Galilean experimentation. Granted, it is unpleasant not to be able to conduct experiments. Nevertheless, astronomy is a highly respected science and Darwin’s theory of evolution has become a cornerstone of modern biology. Thus, even without strict experiments, observational data and experience can be highly informative.

Moreover, it is rather odd to praise Galileo and modern science, on the one hand, but to be highly critical with respect to quantitative methods, on the other. Upon introducing the experimental method, Galileo also emphasized the paramount importance of mathematics:

Philosophy is written in this enormous book which is continually open before our eyes (I mean the universe), but it cannot be understood unless one first understands the language and recognizes the characters with which it is written. It is written in a mathematical language, and its characters are triangles, circles, and other geometric figures. Without knowledge of this medium it is impossible to understand a single word of it; without this knowledge it is like wandering hopelessly through a dark labyrinth. (Galilei, 1623, pp. 16–17)1

It seems highly selective to embrace the first part of this message while ignoring the second, equally important one. In the same vein it is also rather artificial to distinguish between modern (philosophy of) mathematics, focusing on structure, and the more traditional view of mathematics as the science of quantity (Michell, 2000, p. 652). No mathematician, whether he or she be “applied” or “pure,” draws a line between “qualitative” structure and “quantitative” amount or number. Of course, fields like networks, graphs, or even nonparametric statistics may at times have a rather qualitative flavour to them, but this is not to say that they are afraid of quantity or even averse to quantifying.

The basic idea that “whatever exists at all exists in some amount” (Thorndike, 1918, p. 16) is, in this generality, an a priori belief, and one might call it an imperative (Michell, 2005). Nevertheless, it would be a scientific sensation if we encountered some phenomenon that existed, but did not do so in some amount. Seen that way, Thorndike’s opinion is founded on hundreds of years of experience and scientific investigation which, as Michell (2003a, 2003b) himself points out, has proven to be enormously successful.

Moreover, even in principle, it is hard to imagine some object or phenomenon that, although existent, had no extension. Even a strictly dichotomous variable has some structure, and such variables are often associated with highly informative quantitative attributes. Think of gender: although it is a binary concept, the differences between men and women can be quantified in many respects. Thus it seems a rather fruitless endeavour to discuss the fundamental question as to whether each and every attribute “really” is quantitative or not.

All we can do is observe the phenomena we are interested in, which almost inevitably leads to Stevens’ (1951) famous definition: “The most liberal and useful definition of measurement is the assignment of numerals to things so as to represent facts and conventions about them” (p. 29). Campbell (1921/1953)2 agrees: “What is measurement? ... It may be defined, in general, as the assignment of numbers to represent properties” (p. 110). Not only in principle, but rather in detail, Menger (1955/2007),3 a former member of the famous Vienna Circle, follows this line of thought and starts his chapter on “the application of calculus to science” with “quantities,” which he explains as follows:

In the physical and social universe, man is surrounded by countless things with which numbers are somehow paired. ... Any such pair consisting of a thing and a number (in this order) will hereinafter be referred to as a quantity—the first member in the pair as the object, the second member as the value of the quantity. Thus quantity = (object, value). (p. 167)

In this very broad sense we seem to be able to “quantify” almost everything. Thus the real issue is not whether such a weak kind of “measurement” is possible at all. The concrete point of concern is rather (or should be) whether a certain procedure is able to give valuable information about the object we wish to study. If we cannot access the phenomenon properly, the numbers will be deficient and, in the worst case, give us no clue whatsoever.

Measurement and invariance theory

In Stevens’ theory, “measurement” is always idealized as a scale s between an empirical and a numerical relational system: that is, an empirical relation between two empirical objects holds if and only if the corresponding mathematical relation holds between the corresponding mathematical objects. Formally, one may write:

Empirical objects a, b are related to one another (e.g., a ≺ b) if and only if their assigned numbers s(a), s(b) are related to one another (e.g., s(a) < s(b)).
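As a minimal sketch, this representation condition can be checked mechanically for a finite empirical system. Everything below (the objects, the observed ordering, and the assignment s) is a hypothetical illustration, not taken from the article:

```python
# Sketch: checking Stevens' representation condition for an ordinal scale.
# The empirical system and the assignment s are invented for illustration.

# Empirical objects and the observed empirical order ("x precedes y")
objects = ["a", "b", "c"]
empirical_order = {("a", "b"), ("a", "c"), ("b", "c")}  # a precedes b precedes c

# A candidate scale: numbers assigned to objects
s = {"a": 1.0, "b": 2.5, "c": 7.0}

def is_homomorphism(objects, empirical_order, s):
    """True iff, for all pairs, x precedes y empirically <=> s(x) < s(y)."""
    for x in objects:
        for y in objects:
            if x == y:
                continue
            if ((x, y) in empirical_order) != (s[x] < s[y]):
                return False
    return True

print(is_homomorphism(objects, empirical_order, s))  # True: s represents the order
```

Any other assignment that preserves the same order (e.g., doubling every value) would pass the check as well, which is exactly why such a scale is unique only up to monotone transformations.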

In this sense, the measurement process is perfect and measurement becomes “fundamental” in that it is the structure of the empirical system which determines what “level” of measurement is possible.4 In particular, no procedure whatsoever can yield smooth quantitative information on a phenomenon that has a coarse, discrete structure. Suppes and Zinnes (1968) give further examples:

In general, any [emphasis added] empirical procedure for measuring mass does not determine the unit of mass. The choice of unit is an empirically arbitrary decision made by an individual or group of individuals. ... An empirical procedure for measuring temperature by use of a thermometer [emphasis added] determines neither a unit nor an origin. (p. 9)

Michell (2008b) is also perfectly clear about this: “[L]atent traits are still implicitly taken to be quantitative and the distinction between parametric and nonparametric models is thought to reside only in what we are able to infer about the traits” (p. 13).

It should be noted that the above argument has shifted attention from the performance of the real measurement procedure to the structure of the attribute being measured. In other words, Stevens’ theory diverts attention away from the primary problem of access (how do I have to measure in order to get hold of some phenomenon; in particular, what is my measurement device doing?) and devotes much thought to the question of what may be said (at best) about a certain attribute with a given structure. This gives measurement theory a distinctive normative flavour, to which we will turn in a moment.


In order to distinguish between different kinds of measurement, one needs to look in more detail at the scale representing the measurement process. The latter also influences the result, in that it may “blur” the “real” structure. Think of temperature: before Kelvin, numerous scales had been proposed.5 It turned out that, although superficially different, they could all be transformed into one another by means of a simple linear function: for example, °F = (9/5) °C + 32. In other words, all methods to access temperature are equivalent up to origin and unit of measurement, which are conventions. Therefore it makes sense to say that an additional 5°C is just the same as an increase of 9°F. However, it does not make sense to think that 20°C is twice as hot as 10°C because the corresponding numbers in °F are 68 and 50.
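The arithmetic behind this example is easy to verify. A small sketch (nothing assumed beyond the conversion formula above):

```python
# Interval scales: differences in degrees Celsius translate into fixed
# differences in degrees Fahrenheit, but ratios ("twice as hot") do not survive.
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return 9 * c / 5 + 32

# An increase of 5 C is always an increase of 9 F, wherever on the scale it occurs ...
print(c_to_f(25) - c_to_f(20))  # 9.0
print(c_to_f(10) - c_to_f(5))   # 9.0

# ... yet 20 C is not "twice" 10 C once expressed in Fahrenheit: 68 vs. 50.
print(c_to_f(20), c_to_f(10))   # 68.0 50.0
print(c_to_f(20) / c_to_f(10))  # 1.36, not 2.0
```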

Kelvin is celebrated for his absolute temperature scale, which gives the origin a physical meaning. In the jargon of measurement theory, he managed to advance the measurement of temperature from an interval scale to a ratio scale, where it is meaningful to say that 30°K is just three times as “hot” as 10°K. Mathematically speaking, any other “ratio” measurement procedure of temperature with the unit °T, say, has to be connected to Kelvin’s scale by means of a linear transformation of the form °T = a·°K, where a > 0. Stevens’ well-known typology of scales extends the two kinds of scales just considered, and thus provides a way to deal with the question as to how much we can trust our measurements (Table 1).

In particular, Stevens’ classification distinguishes between information on the quantitative and the qualitative level: that is, ratio and interval scales, on the one hand, versus ordinal and nominal scales, on the other hand. Michell builds on this tradition and explicitly refers to “ordinal versus quantitative information” (Michell, 2008b, p. 13) and to quantitative attributes “(as opposed to merely ordinal)” (Michell, 2008a, p. 122).

Invariance

At the heart of any “change of unit” or appropriate transformation lies the idea of invariance, which has become more and more important in physics over the years. Wigner (1949) makes the point clearly:

[Einstein’s] papers on special relativity also mark the reversal of a trend: until then, the principles of invariance were derived from the laws of motion. ... It is now natural for us to try to derive the laws of nature and to test their validity by means of the laws of invariance, rather than to derive the laws of invariance from what we believe to be the laws of nature. (p. 522)

Nowadays physicists use the term “symmetry” to express the idea: “By symmetry we mean the existence of different viewpoints from which the system appears the same. It is only slightly overstating the case to say that physics is the study of symmetry” (Anderson, 1972, p. 394). Campbell (1920, 1928), also a physicist, was inspired by exactly this kind of argument to set the “gold standard” of measurement: that is, given a set of measurement procedures, he observed that they are all related to one another by a group of transformations. The corresponding group of transformations is called admissible, and one may say that measurement (of a certain attribute on a certain level) is unique up to a class of transformations. In the case of temperature, all classical procedures used the same kind of physical effect, that is, the expansion of some substance upon a rise of temperature, in order to access the physical phenomenon of interest. Therefore it is not surprising that the scales they yield are equivalent. Yet it is interesting that they are all related to one another by means of a linear transformation.

Table 1

Class of transformations: | (Strictly) linear | Affine   | Monotone | 1:1
Associated scale type:    | Ratio             | Interval | Ordinal  | Nominal

This brings about the normative feature of measurement theory, classically expressed by Luce (1959):

If the interpretation of a particular statistic or statistical test is altered when admissible scale transformations are applied, then our substantive conclusions will depend on which arbitrary representation [emphasis added] we have used in making our calculations. Most scientists, when they understand the problem, feel that they should shun such statistics and rely only upon those that exhibit the appropriate invariances for the scale type at hand. Both the geometric and the arithmetic means are legitimate in this sense for ratio scales (unit arbitrary), only the latter is legitimate for interval scales (unit and zero arbitrary), and neither for ordinal scales. (p. 84)
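Luce’s criterion can be illustrated with a small numerical sketch (the group data below are invented): under an admissible interval-scale transformation x ↦ ax + b, a comparison of arithmetic means is preserved, while a comparison of geometric means may flip.

```python
import math

# Invented data for two groups measured on an interval scale.
group1 = [1.0, 9.0]
group2 = [4.0, 4.0]

def amean(xs):
    return sum(xs) / len(xs)

def gmean(xs):
    return math.prod(xs) ** (1 / len(xs))

def affine(xs, a=2.0, b=10.0):
    """An admissible interval-scale transformation x -> a*x + b (a > 0)."""
    return [a * x + b for x in xs]

# Arithmetic means: the verdict survives the change of representation.
print(amean(group1) > amean(group2))                  # True  (5 > 4)
print(amean(affine(group1)) > amean(affine(group2)))  # True  (20 > 18)

# Geometric means: the verdict flips, so the statistic is not
# legitimate for interval scales in Luce's sense.
print(gmean(group1) > gmean(group2))                  # False (3 < 4)
print(gmean(affine(group1)) > gmean(affine(group2)))  # True
```

The same check with a strictly linear transformation (b = 0, a ratio-scale change of unit) would leave both comparisons intact, matching Luce’s claim that both means are legitimate for ratio scales.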

Measurement: The real thing

The recent contributions to this journal and also the standard textbooks show that the invariance conception of measurement is still prevalent. Yet this attitude has been much criticized, in particular by Duncan (1984), to whom we owe the title of this section (see his chap. 5), and Tukey. (For an overview of the arguments see Velleman & Wilkinson, 1993; a more extensive discussion can be found in Saint-Mont, 2011.)

To begin with, coming from physics, Campbell insisted that “real” measurement had to be at least on an interval scale (in Stevens’ terminology). In a similar vein, statistician John Tukey, addressing psychologists, writes with respect to the quantitative vs. qualitative distinction:

Bear in mind a simple fact: the great majority of the useful facts that physics has learned—and recorded in numbers—are specific and detailed, not global and general. The qualitative properties of things have proved much less important than the quantitative ones. Why should this not hold true for people? I believe that just this will prove to be so, but not without much effort. Even if the task is hard, is it not past time to begin, especially in selected, more or less well-understood, subfields? (as cited in Jones, 1986b, p. 728)

Years later he reiterates:

For pure intellectual curiosity—and perhaps for writing treatises for the intellectually curious—it may be that confident directions (up, uncertain, or down) can suffice. It would be good to understand why many psychologists, for example, seem to be content with only confident directions. Is this a desire for abstract knowledge? Or a sign of inability to make use of more quantitative results? Or an unwillingness to price (or a lack of experience in pricing) comparisons as a basis for real-world actions? Or a belief that qualitative knowledge is all that psychologists can hope for? Or what? (Tukey, 1991, p. 104)

Following this line of thought (or Stevens’ typology), Duncan (1984) criticizes nominal scales because they only distinguish between equal and unequal. Thus all three critics are less impressed by the merits of invariance than troubled by the lack of information.

A look at the actual historical development also reveals that invariance considerations did not come first. For thousands of years, geometry had been practised successfully when Felix Klein finally used invariance to classify the field in his “Erlanger Programm” (Narens, 2002; Schönemann, 1994). As Wigner (1949) explains, it was not until the 20th century that the idea became prominent in theoretical physics, and the understanding of the measurement process followed the same track. Schönemann (1994) writes: “For many centuries, natural scientists were relatively unconcerned about the philosophical status of their measurements. Notwithstanding [sic], they were making good progress towards building up a quantitative science” (p. 152). Tukey (1986) gives a concrete example:

How were temperatures measured? With one of any of several kinds of thermometers. ... Would there be agreement between the different kinds of thermometers? Approximate agreement, yes; exact agreement, certainly not. Would any one kind have sufficient theoretical support to be chosen as the standard over all others? No. Clearly temperature was not measured on an interval scale in those days. But equally clear, it made good sense to ... calculate the arithmetic mean of a group of temperatures. Temperature was not measured on a mere ordinal scale. It was measured on a scale which, though not an interval scale, was still quite well defined. (p. 246)

Given this background, Tukey (1986) comments on Campbell:

Just as some have done for mathematics, measurement may be divided into “monastic” and “secular.” The analogy of the “high church” view, which we naturally call the “high monastery” view, is surely that provided by Norman R. Campbell [... His contributions] have been the source, proximate or remote, of many fears that assignment of numbers, many of which would have been perfectly useful, were not “measurements.” ... There is little doubt that measurement which fulfills Campbell’s requirements, exactly or approximately, is measurement which deserves the highest social status, the highest prestige that we can today imagine. (p. 248)

There are several lessons to be learned from this:

  • A formalization of the measurement process followed by invariance considerations does not come first. Rather, these developments prove to be useful in structuring existing substantial results.

• A successful application of the invariance idea should depend on the field under study.

• A rigid or even “fundamental” point of view is to no avail.


Thus the badmandment “be exactly wrong, rather than approximately right” (Tukey, 1986, p. 201) seems to summarize quite well Tukey’s reply to Luce’s viewpoint given above:

The view thus summarized is a dangerous one. If generally adopted it would not only lead to inefficient analysis of data, but it would also lead to failure to give any answer at all to questions whose answers are perfectly good, though slightly approximate. All this loss for essentially no gain. [Luce’s point of view shows] a lack of adequate recognition that knowledge is approximate, not precise. ... An oversimplified and overpurified view of what measurements are like cannot be allowed to dictate how data is to be analyzed. (Tukey, 1986, pp. 243–247)

In the same vein Michell (1999) translates a passage from Galilei (1612):

We must not ask nature to accommodate herself to what might seem to us the best disposition and order, but must adapt our intellect to what she has made, certain that such is the best and not something else.6 (p. xiii)

Let us summarize with Schönemann (1994) that ever since Galileo “most knowledgeable people agree that the success of Western science ... is bound up with the successful application of mathematics, which, in turn, builds on the successful quantification of the phenomena it attempts to describe” (pp. 150, 152). Therefore the famous maxim “to measure what is measurable and to render measurable what is not yet so”7 catches the essence of successful empirical science.

Because of its first part, Michell (2008b) is right in citing Boring (1920, p. 33) that “it is senseless to seek in the logical process of mathematical elaboration a psychologically significant precision that was not present in the psychological setting of the problem” (p. 15). This also seems to be the main point in Martin (2003), who warns against quantitative methods in general because they typically “water down” some phenomenon of interest. I agree with him that questionnaires are a narrow instrument for accessing psychological attributes and that “good critical, historical, conceptual, interpretative and narrative research in psychology (all qualitative)” (p. 36) may at times prove helpful.

However, the second part of the maxim explains why it is hardly surprising that little progress can be achieved with these methods. By its very nature, qualitative research is imprecise and it is very difficult to break through the surface of the phenomena studied using nothing but “soft” natural language. Thus it is (at least) questionable whether such methods really provide an attractive alternative.

Invariance (“measurement”) theory diverts attention even more thoroughly from facing up to Galileo’s challenge (render measurable!), leading to rather fruitless mathematics and fundamental doubt: for example,

• Trendler (2009): “Psychological phenomena [emphasis added] ... are neither manipulable nor are they controllable to the extent necessary for an empirically meaningful application of measurement theory. Hence they are not measurable [emphasis added]” (p. 592).

• Michell (2008b): “We do not yet know [emphasis added] whether psychometrics actually has a subject [emphasis added]” (p. 22).


Instead of considering such rather fundamental questions, almost inevitably leading to all-or-nothing statements, dividing public opinion, and inviting endless debate, one had better reformulate the query. The question whether an attribute (really) is “quantitative” or not can hardly be answered in a reasonable way, yet it pays to ask narrower and more detailed questions that may be answered with the help of empirical data: for example, whether a certain attribute has a specific structure (be it “qualitative” or “quantitative”).

Reliable access

“Scientists want to find out how natural systems work” (Michell, 2008b, p. 7), and in order to do so, they have to gain access to the system. That is exactly what measurement is about: establishing a reliable link to the phenomenon under study. “Measuring the right things on a communicable scale lets us stockpile information about amounts” (Tukey, as cited in Jones, 1986b, p. 729).

Jeffreys (1973) expresses this idea in rather technical language:

It is widely supposed that dimensions are concerned entirely with transformations of units. This is not so. Dimensions of ... magnitudes arise through the method of measurement itself; and even if we never had to change units the dimensions [emphasis added] of a derived magnitude arise in describing the property it measures [emphasis added]. Dimensions do help in transformations of units, but dimensions come first [emphasis added]. (p. 94)

Thus invariance arguments are not only downstream from a historical point of view; much more importantly they are also second from a logical point of view. First comes the property being measured. Jeffreys (1973) gives an example:

When we say that a density is 1.34 grams per cubic centimetre, the expression “1.34 grams per cubic centimetre” must be taken as a whole; no item in it, neither “1.34” nor “grams” nor “cubic centimetre,” can be changed without altering the meaning of the whole. For this reason it is misleading to speak, as is often done in writings on the theory of dimensions, of a “mere change of units.” There is no such thing as a mere change of units. If we alter a unit without altering the number in the measure, we are speaking of a different physical system, and cannot assert anything about it without a physical law to guide us; while if we already know the physical law, a change of units tells us nothing that we cannot find out by keeping the same units and altering the numerical measure. (p. 91)

In the social sciences, linear statistical analyses are very common. Now, regression coefficients depend on the unit of measurement, whereas correlation coefficients do not. Is it thus wise to prefer the latter? Tukey strongly disagrees:

If we wish to seek for constancies, then, regression coefficients are much more likely to serve us than correlation coefficients. Why then are correlation coefficients so attractive? Only bad reasons seem to come to mind. Worst of all, probably, is the absence of any need to think about the units for either variable. Given two perfectly meaningless variables one is reminded of their meaninglessness when a regression coefficient is given, since one wonders how to interpret the value. A correlation coefficient is less likely to bring up the unpleasant truth—we THINK we know what r = -.7 means. DO WE? How often? Sweeping things under the rug is the enemy of good data analysis. Often, using the correlation coefficient is “sweeping under the rug” with a vengeance. Being so disinterested in our variables that we do not care about their units can hardly be desirable. (as cited in Jones, 1986b, p. 734)
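Tukey’s contrast is easy to demonstrate numerically. In the sketch below (the height/weight figures are invented), rescaling the predictor rescales the regression slope, while the correlation coefficient hides the change of units entirely:

```python
# Tukey's point: the regression slope carries units and changes with them;
# the correlation coefficient silently stays the same. Data are invented.
def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

height_m = [1.6, 1.7, 1.8, 1.9]          # metres
weight = [60.0, 68.0, 74.0, 85.0]        # kilograms
height_cm = [100 * h for h in height_m]  # a "mere" change of units

print(slope(height_m, weight))   # ~81   (kg per metre: the unit forces interpretation)
print(slope(height_cm, weight))  # ~0.81 (kg per centimetre)
print(corr(height_m, weight))    # identical for both unit choices
print(corr(height_cm, weight))
```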

Validity

Accessing a physical parameter can be easy. Just count heartbeats, step on a weighing machine, or apply a thermometer to the right place. Of course, this need not be the case in other sciences. With respect to psychological measurement, Meehl (1978) gives a nice example:

It is as if we were interested in the effect of sunlight on the mating behavior of birds, but not being able to get directly at either of these two things, we settle for correlating a proxy variable like field-mice density (because the birds tend to destroy the field mice) with, say, incidence of human skin cancer (since you can get that by spending too much time in the sun!). You may think this analogy is dreadfully unfair; but I think it is a good one. (p. 823)

Even if a variable can be operationalized rather easily, it may be rather questionable whether a standard procedure—for example, a questionnaire on objectionable behaviour—will give valid answers. In the case of drug abuse, this has recently been shown explicitly:

Trends in drug abuse are currently estimated indirectly, mainly by large-scale social, medical, and crime statistics that may be biased or too generic. We thus tested a more direct approach based on “field evidence” of cocaine use by the general population. ... [C]ocaine and its main urinary metabolite ... were measured by mass spectrometry in water samples collected from the River Po. (Zuccato et al., 2005, Abstract, 1st and 2nd para.)

The authors checked the quality of their analysis and wrote: “The fair correspondence of surface water and waste water findings, despite the different settings and assumptions, suggests that our approach is reliable, and our estimates realistic” (Discussion, para. 5). Moreover, in the meantime the results could be replicated without much effort elsewhere and are striking: actually, according to the new pharmacological methods, the amount of drugs consumed is several times as high as previously estimated.

Typically, psychologists refer to validity when discussing such issues. Here is a classical definition: “The problem of validity is that of whether a test really measures what it purports to measure” (Kelley, 1927, p. 14).8 It should be clear, however, that validity is a basic concern for any measurement process and empirical science in general, as, without validity, we are totally missing our target. Cohen, Cohen, Aiken, and West (1999) say:

One of the most fundamental tasks in building a science is the establishment of standard operationalizations of the major constructs used in its theory. ... Measurement generally begins with some arbitrary reference unit around which information builds and familiarity is created. (p. 315)

Figuratively, this has been described as “sinking piles into a swamp” (Popper, 1959). However, having cited Popper, Meehl (1990) goes on to say that, “[u]nfortunately in the social sciences, the situation is more like standing on sand while you are shoveling sand ... and, alas, in soft psychology the sand is frequently quicksand” (p. 127).


Bias and precision

Instead of (a lack of) validity, statisticians are concerned about “bias”: that is, whether or not a substantial error can be avoided in the measurement process. Therefore, to a large extent, “unbiased” is the statistical equivalent to “valid.” However, as the above examples show, there is much more to validity than just deviation, and one should not only think of validity in terms of (numerical) “error.”

Conversely, even statistics’ narrowest formal model—that is, a random variable T estimating some unknown constant µ9—offers the advantage of distinguishing clearly between systematic error (“bias” in a narrow sense) and unsystematic error (variability, lack of precision). In the social sciences the latter is often called reliability and, like validity, has in general a broader meaning. In particular, reliability is closely related to replicability and internal consistency (Cronbach, Rajaratnam, & Gleser, 1963).
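The distinction between the two kinds of error can be made concrete with a minimal simulation sketch (illustrative only, not from the article): a “measurement device” that reads a true value µ with a constant systematic offset plus random noise. The numbers chosen here (µ, the offset, the noise level) are arbitrary assumptions for the demonstration.

```python
import random
import statistics

random.seed(0)

mu = 10.0        # unknown constant to be measured
offset = 0.5     # systematic error (bias in the narrow sense)
noise_sd = 2.0   # unsystematic error (variability, lack of precision)

# Each reading is the true value, shifted by the constant offset,
# plus fresh random noise.
readings = [mu + offset + random.gauss(0.0, noise_sd) for _ in range(100_000)]

systematic_error = statistics.mean(readings) - mu   # estimates the offset
unsystematic_error = statistics.stdev(readings)     # estimates the noise SD

print(round(systematic_error, 2))    # close to 0.5
print(round(unsystematic_error, 2))  # close to 2.0
```

No amount of averaging removes the systematic part: the mean of the readings converges to µ + 0.5, not to µ, while the standard deviation captures the unsystematic part.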

Which of the two concepts is more important? Feldman Barrett (2009, p. 315) uses the terms meaningfulness (instead of validity) and effectiveness (instead of reliability). Thus it is obvious that meaning should be put in first place. Leaving nuances aside, the statistical perspective leads to the same conclusion: “The reduction of bias should, I think, be regarded as the primary objective—a highly precise estimate of the wrong quantity is not much help” (Cochran as cited in Rubin, 2006, p. 22).10

The traditional Neyman–Pearson theory of unbiased estimation follows this line of thought: first, it restricts attention to the class of all estimators without a systematic bias; then it goes on to select the one with the best (smallest) variance.
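This two-step selection can be illustrated with a small simulation (an illustrative sketch, not from the article): for a normal distribution, both the sample mean and the sample median are unbiased for the centre, so the Neyman–Pearson criterion picks the one with the smaller sampling variance, namely the mean.

```python
import random
import statistics

random.seed(1)

mu, sd, n, reps = 0.0, 1.0, 25, 4000
means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sd) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Both estimators are (approximately) unbiased for mu = 0 ...
print(round(statistics.mean(means), 2), round(statistics.mean(medians), 2))
# ... but the mean has the smaller sampling variance (for normal data the
# asymptotic ratio is 2/pi, roughly 0.64), so it is preferred.
print(statistics.variance(means) < statistics.variance(medians))  # True
```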

However, a large unsystematic error isn’t of much help either, and therefore, in a sense, reliability (precision) is a necessary condition for validity (meaning):11 A bad measurement device will never be able to pin down a phenomenon, and a hardly operationalizable concept will never attain some well-circumscribed denotation. Moreover, systematic and unsystematic deviations are closely related, in particular by the well-known formula for the mean squared error (MSE): that is,

MSE(T) = E(T − µ)² = Bias²(T) + σ²(T)

Hence it seems unwise to minimize the bias if this inflates, at the same time, the variance σ² of the estimator. Tukey gives a related argument:

We have often been guided by a purer-than-thou philosophy of “unbiased estimation of something, whether or not it be what we really want to estimate!” A biased estimate of what we really want to estimate can be more useful ... than an unbiased estimate of something we don’t want. (cited in Jones, 1986a, p. 110)
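Tukey’s point can be checked numerically in a small sketch (illustrative, not from the article): for normal data, the “biased” variance estimator that divides by n has a smaller mean squared error than the unbiased one that divides by n − 1, because its lower variance more than compensates for its bias.

```python
import random
import statistics

random.seed(2)

sigma2, n, reps = 1.0, 10, 20_000
err_unbiased, err_biased = [], []
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    s2_unbiased = statistics.variance(x)       # divides by n - 1 (unbiased)
    s2_biased = s2_unbiased * (n - 1) / n      # divides by n (biased)
    err_unbiased.append((s2_unbiased - sigma2) ** 2)
    err_biased.append((s2_biased - sigma2) ** 2)

mse_unbiased = statistics.mean(err_unbiased)   # theory: 2*sigma^4/(n-1) = 0.222
mse_biased = statistics.mean(err_biased)       # theory: (2n-1)*sigma^4/n^2 = 0.19
print(mse_biased < mse_unbiased)  # True: the biased estimator wins on MSE
```

A biased estimate of what we want can indeed be “more useful” than an unbiased one, once the total error is what counts.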

Invariance and reliability

It should have become clear by now that validity, in particular in the guise of unbiasedness, cannot be the whole story. In a nutshell, in order to render something measurable, reliability is important too and invariance arguments are helpful.


Underlining this, there is a strong invariance argument against the traditional “unbiased” way of proceeding. Given a nonlinear transformation θ = g(µ), say, the corresponding estimate g(T) is no longer unbiased! In particular, if σ̂ is an unbiased estimate of the standard deviation, σ̂² will be a biased estimate of the variance. R.A. Fisher (1973)12 concluded:

This consideration would have eliminated such criteria as the estimate should be “unbiased,” meaning that the average value of the estimate should be equal to the true estimand; for if this were true of any parameter, it could not also be true of, for example, its square. (p. 146)
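Fisher’s argument is easy to verify numerically (an illustrative sketch, not from the article): the sample mean T is unbiased for µ, but its square T² systematically overestimates µ², since E(T²) = µ² + Var(T) = µ² + σ²/n.

```python
import random
import statistics

random.seed(3)

mu, sigma, n, reps = 2.0, 1.0, 5, 100_000
t_values = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    t_values.append(statistics.mean(sample))  # T: unbiased for mu

# T is unbiased for mu = 2.0 ...
print(round(statistics.mean(t_values), 2))                 # ~ 2.0
# ... but T^2 is biased for mu^2 = 4.0: its mean is mu^2 + sigma^2/n = 4.2.
print(round(statistics.mean(t * t for t in t_values), 2))  # ~ 4.2, not 4.0
```

Unbiasedness is thus not invariant under nonlinear transformations; it cannot hold simultaneously for a parameter and its square.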

At least if validity is not an issue, it even seems to be reasonable to focus on reliability. That is exactly what physicists do. Nobel laureate Robert B. Laughlin (2005) elaborates:

[I]n physics, correct perceptions differ from mistaken ones in that they get clearer when the experimental accuracy is improved. This simple idea captures the essence of the physicist’s mind and explains why they are always so obsessed with mathematics and numbers: through precision, one exposes falsehood. A subtle but inevitable consequence of this attitude is that truth and measurement technology are inextricably linked. Exactly what you measure, how the machine works, how one decimates the errors, what uncontrolled factors set the reproducibility ceiling, and so forth matter more than the underlying concept [emphasis added]. (p. 14)

Laughlin goes on to say that physics is built on no more than 10–20 enormously accurate experiments! Contrary to this, if a conjecture “is not even wrong” (a quote attributed to Pauli),13 it is too vague for a firm decision to be reached, and physicists are very suspicious if a procedure cannot be made reliable or a concept remains imprecise despite all efforts to the contrary.

Cargo cult science

Excellent experiments lead to valid and reliable data. Feynman (1997) gives an example relevant to psychology:

[T]here have been many experiments running rats through all kinds of mazes, and so on—with little clear result. But in 1937 a man named Young did a very interesting one. He had a long corridor with doors all along one side where the rats came in, and doors along the other side where the food was. He wanted to see if he could train the rats to go in at the third door down from wherever he started them off. No. The rats went immediately to the door where the food had been the time before.

The question was, how did the rats know, because the corridor was so beautifully built and so uniform, that this was the same door as before? Obviously there was something about the door that was different from the other doors. So he painted the doors very carefully, arranging the textures on the faces of the doors exactly the same. Still the rats could tell. Then he thought maybe the rats were smelling the food, so he used chemicals to change the smell after each run. Still the rats could tell. Then he realized the rats might be able to tell by seeing the lights and the arrangement in the laboratory like any commonsense person. So he covered the corridor, and still the rats could tell.


He finally found that they could tell by the way the floor sounded when they ran over it. And he could only fix that by putting his corridor in sand. So he covered one after another of all possible clues and finally was able to fool the rats so that they had to learn to go in the third door. If he relaxed any of his conditions, the rats could tell. (p. 338)

If psychologists had understood the logic of measurement, the experiment just described would have become a classic in the field. All measures described by Young would be in use to this day, guaranteeing reliable data and thus leading to valid conclusions about learning in rats. Feynman (1997) says:

Now, from a scientific standpoint, that is a number-one experiment. That is the experiment that makes rat-running experiments sensible, because it uncovers the clues that the rat is really using—not what you think it’s using. And that is the experiment that tells exactly what conditions you have to use in order to be careful and control everything in an experiment with rat-running. (p. 338)

However, others did not build on these fundamental results, and Feynman continues:

I looked up the subsequent history of this research. The next experiment, and the one after that, never referred to Mr. Young. They never used any of his criteria of putting the corridor on sand, or being very careful. They just went right on running the rats in the same old way, and paid no attention to the great discoveries of Mr. Young, and his papers are not referred to, because he didn’t discover anything about the rats. In fact, he discovered all the things you have to do to discover something about rats. But not paying attention to experiments like that is a characteristic example of cargo cult science. (p. 338)

Going from bad to worse, hardly anybody seems to worry if “improving the precision of our observational conditions decreases the precision of our observations” (Michell, 2008b, p. 15).14 Moreover, this kind of paradox—improved input leading to diminished output—is not restricted to psychometrics. More than 40 years ago, Meehl (1967) described it with respect to the omnipresent testing of statistical hypotheses:

In physics, the null hypothesis corresponds to a consequence of a substantive theory. Increasing the number of observations increases precision and is therefore “setting up a more difficult observational hurdle for the theory T to surmount”. (p. 113)

In other words, it should be clear that

[t]here are no inferential grounds whatsoever for preferring a small sample ... the larger the sample the better. ... The larger the sample size the more stable the estimate of effect size; the better the information, the sounder the basis from which to make a decision. (Oakes, 1986, pp. 29, 32)

However, in mainstream psychology it’s just the other way around: the null hypothesis is usually a “nil hypothesis,” corresponding to chance. Thus the better the experiment, the weaker the empirical check. Meehl (1967) writes:

In the physical sciences, the usual result of an improvement of experimental design, instrumentation, or numerical mass of data, is to increase the difficulty of the “observational hurdle” which the physical theory of interest must successfully surmount; whereas, in psychology and some of the allied behavior sciences, the usual effect of such improvement in experimental precision is to provide an easier hurdle for the theory to surmount. (p. 103)

Even worse, given enough data, almost any substantive hypothesis may be “confirmed.” Gelman, Carlin, Stern, and Rubin (2004) say:

Null hypotheses of no difference are usually known to be false before the data are collected; when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science. (p. 193)

Thus they repeat what Meehl (1978) noticed much earlier: “Putting it crudely, if you have enough cases and your measures are not totally unreliable, the null hypothesis will always be falsified, regardless of the truth of the substantive theory” (p. 822). Methodology that is flawed to such an extent that it is thoroughly misleading should be called wronger than wrong. It is rather frightening that psychology, despite much effort to the contrary, has not even been able to overcome its “test ritual” (Gigerenzer, 2004; Salsburg, 1985; Sedlmeier, 1996). I agree with Meehl (1978), who concluded:

I believe that the almost universal reliance on merely refuting the null hypothesis as the standard method for corroborating substantive theories in the soft areas is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology. (p. 817)
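How rejection of a nil hypothesis comes to reflect nothing but sample size can be shown in a few lines (an illustrative sketch, not from the article): a practically negligible but nonzero true effect is tested against H0: µ = 0 with a two-sided z-test, and the p-value is driven towards zero simply by increasing n.

```python
import math
import random

random.seed(4)

def p_value(sample_mean, n, sd=1.0):
    """Two-sided p-value of H0: mu = 0 for a normal sample with known sd."""
    z = sample_mean * math.sqrt(n) / sd
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))) is the standard normal CDF.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

true_effect = 0.02   # scientifically negligible, but not exactly zero
for n in (100, 10_000, 1_000_000):
    xs = [random.gauss(true_effect, 1.0) for _ in range(n)]
    m = sum(xs) / n
    print(n, round(p_value(m, n), 4))
# With n large enough the nil hypothesis is rejected at any conventional
# level -- the rejection reflects n, not the importance of the effect.
```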

Methodological consequences

Measurement in psychology should by no means be restricted to invariance (“measure-ment”) theory or even purely mathematical considerations (“abstract” measurement theory). As the physicist Jaynes (2003) put it:

[N]othing could be more pathetically mistaken than the prefatory claim ... that mathematical rigor “guarantees the correctness of the results.” On the contrary, much experience teaches us that the more one concentrates on the appearance of mathematical rigor, the less attention one pays to the validity of the premises in the real world, and the more likely one is to reach final conclusions that are absurdly wrong in the real world. (p. 674)15

(Prior) normative requirements, theory detached from data, or even philosophical arguments won’t help either. It is futile to evade the real problems using a time-honoured ritual or “a definition [of the measurement process] made to measure” (Michell, 1999, p. 162). The kind of operationalism which this “misreading” of The Logic of Modern Physics (Bridgman, 1927) evoked16 could only lead to unpleasant philosophical consequences; in particular: “Thus, an attribute is defined by its measuring procedure, no more and no less, and has no ‘real’ existence beyond that” (Hand, 1996, p. 453).17

Much worse, this legacy forces contemporary authors such as Borsboom (2005) to fight their way through the foundational morass of true scores, latent variables, and scales, just to finally hit upon the basic but crucial concept of validity. Instead of focusing on the real issues, like physicists—in particular, “what is my measurement device really doing, and how could I improve it?”—all kinds of rather artificial questions have to be addressed by psychologists working in the field. At the end of the day, Jaynes’ (1976) words with respect to a basic statistical dispute are also valid here:

Philosophical disputation may be great fun; but through recorded history its score for actually solving problems is, I believe, precisely zero. Anybody who genuinely wants to see these issues resolved must recognize the need for a better method. (p. 230)

What is really needed is reliable access to the phenomena of interest. In this vein, experimental psychology following Wundt has been much more successful than psychoanalysis following Freud. And quantitative methods, be they narrow or based on “proxy” data, have led to deeper insights and better predictions than qualitative or narrative methodology. Barrett (2008) gives a nice example: “The best tool for predicting violent recidivism ... turns out to be a straightforward behavioral checklist, using a simple integer importance-weighting scheme for its items, which, when summed, produces a classification accuracy of 72%” (p. 81).

Following Galileo, the measurement of what could be measured has given us many insights, and we have worked on better tools in order to make measurable what has lain beyond our reach. The simple idea to ask people what they are experiencing, thinking, and planning was a step in that direction and led to psychological tests. These standardized procedures have stood the test of time and have provided us with detailed information on many psychological attributes. Alas, they are subject to cheating, and far too many important psychological processes cannot be accessed that way.

Thus an even more direct approach would be better still and has been provided by methods monitoring brain activity. Starting with the EEG, these roads of access have proved to be immensely fruitful, yielding undisputed cumulative progress. Therefore modern neuroimaging, which has already led to a flood of publications, seems to be the best way of studying the mind in the future. The faster and more precise these visual methods are going to become, the more we will understand how the brain works and how it brings about all kinds of psychological phenomena.

With respect to measurement in general, the moral also seems to be clear: access (meaningfulness, validity) is crucial, precision (effectiveness, reliability) is important, and one should not forget about invariance properties. Thus the watchword should be: accurate information first, symmetries second, mathematical sophistication and philosophical debates third.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Notes

1. My translation. Galileo Galilei wrote:

La filosofia è scritta in questo grandissimo libro che continuamente ci sta aperto innanzi a gli occhi (io dico l’universo), ma non si può intendere se prima non s’impara a intender la lingua, e conoscer i caratteri, ne’ quali è scritto. Egli è scritto in lingua matematica, e i caratteri sono triangoli, cerchi, ed altre figure geometriche, senza i quali mezi è impossibile a intenderne umanamente parola; senza questi è un aggirarsi vanamente per un’oscuro laberinto.

2. See Schönemann (1994, p. 152).

3. See also the reference to Carnap in Michell (2000, p. 15).

4. The definition just given describes an analogy between empirical relations, on the one hand, and formal relations, on the other. Thus the formal structure reflects or represents the empirical one. Mathematically speaking, a scale is a homomorphism. More details and elementary examples may be found in Pfanzagl (1968, pp. 6ff.).

5. See http://en.wikipedia.org/wiki/Template:Comparison_of_temperature_scales for a nice overview.

6. Here’s the original:

Circa il qual particolare, io voglio solamente rappresentare a V. E. quello che egli sa molto meglio di me, et è che noi non doviamo desiderare che la natura si accomodi a quello che parrebbe meglio disposto et ordinato a noi, ma conviene che noi accomodiamo l’intelletto nostro a quello che ella ha fatto, sicuri tale esser l’ottimo et non altro; e perchè ella si è compiaciuta di far muover le stelle erranti circa centri diversi, possiamo esser sicuri che simile costitutione sia perfettissima et ammirabile, et che l’altra sarebbe priva d’ogni eleganza, incongrua e puerile.

7. My emphasis. Usually, the maxim is attributed to Galileo, but he may never have said it (see Kleinert, 1988).

8. See Hood (2009) for many more facets of the concept.

9. A close cousin of the classical test model X = T + E (e.g., Borsboom, 2005, p. 14).

10. Notice that Tukey’s badmandment is very similar.

11. See, e.g., Hood (2009, p. 462).

12. See also Bennett (1990, p. 58) and R.A. Fisher (1922).

13. See Peierls (1960, p. 186).

14. See also Barrett (2008, p. 80).

15. Also see Estes (1975, p. 273), Guttman (1981, p. 57), and Schönemann (1994, pp. 150, 155).

16. See Koch (1992) for details.

17. Hand cites Dingle (1950), who elaborates this further.

References

Anderson, P.W. (1972). More is different: Broken symmetry and the nature of the hierarchical structure of science. Science, 177(4047), 393–396.

Barrett, P. (2008). The consequence of sustaining a pathology: Scientific stagnation [Peer commentary on the paper “Is psychometrics a pathological science?” by J. Michell]. Measurement, 6, 78–123.

Bennett, J.H. (1990). Statistical inference and analysis: Selected correspondence of R.A. Fisher. Oxford, UK: Clarendon Press.

Boring, E.G. (1920). The logic of the normal law of error in mental measurement. American Journal of Psychology, 31, 1–33.

Borsboom, D. (2005). Measuring the mind: Conceptual issues in contemporary psychometrics. Cambridge, UK: Cambridge University Press.

Borsboom, D., & Mellenbergh, G.J. (2004). Why psychometrics is not pathological: A comment on Michell. Theory & Psychology, 14, 105–120.

Bridgman, P.W. (1927). The logic of modern physics. New York, NY: Macmillan.

Campbell, N.R. (1920). Physics: The elements. Cambridge, UK: Cambridge University Press.

Campbell, N.R. (1928). An account of the principles of measurement and calculation. London, UK: Longmans, Green.

Campbell, N.R. (1953). What is science? New York, NY: Dover Reprint. (Original work published 1921)

Cohen, P., Cohen, J., Aiken, L.S., & West, S.G. (1999). The problem of units and the circumstance for POMP. Multivariate Behavioral Research, 34, 315–346.

Cronbach, L.J., Rajaratnam, N., & Gleser, G.C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137–163.

Dingle, H. (1950). A theory of measurement. British Journal for the Philosophy of Science, 1, 5–26.

Duncan, O.D. (1984). Notes on social measurement: Historical and critical. New York, NY: Russell Sage Foundation.

Estes, W.K. (1975). Some targets for mathematical psychology. Journal of Mathematical Psychology, 12, 263–282.

Feldman Barrett, L. (2009). Understanding the mind by measuring the brain: Lessons from measuring behavior [Peer commentary on the paper “Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition” by E.H. Vul, P. Winkielman, & H. Pashler]. Perspectives on Psychological Science, 4, 314–318.

Feynman, R.P. (1997). Surely you’re joking, Mr. Feynman: Adventures of a curious character. London, UK: W.W. Norton.

Fisher, R.A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Ser. A, 222, 309–368.

Fisher, R.A. (1973). Statistical methods and scientific inference (3rd ed.). New York, NY: Hafner Publishing Company.

Fisher, W.P. (2003). Mathematics, measurement, metaphor and metaphysics II: Accounting for Galileo’s “fateful omission”. Theory & Psychology, 13, 791–828.

Galilei, G. (1612). Galileo a Federico Cesi in Roma [Letter to Federico Cesi]. In A. Favaro (Ed.), Galileo Galilei: Le Opere, Edizione nazionale, Florenz (1890–1909) [Galileo Galilei: The works, national edition, Florence (1890–1909)] (sec. 716, pp. 285–286). Retrieved from http://www.liberliber.it/biblioteca/g/galilei/le_opere_volume_xi_carteggio_1611_1613/pdf/le_ope_p.pdf

Galilei, G. (1623). Il Saggiatore [The assay balance]. In A. Favaro (Ed.), Galileo Galilei: Le Opere, Edizione nazionale, Florenz (1890–1909) [Galileo Galilei: The works, national edition, Florence (1890–1909)]. Retrieved from http://www.liberliber.it/biblioteca/g/galilei/il_saggiatore/pdf/il_sag_p.pdf

Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B. (2004). Bayesian data analysis. Boca Raton, FL: CRC Press.

Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33, 587–606.

Guttman, L. (1981). What is not what in theory construction. In I. Borg (Ed.), Multidimensional data representations: When and why (pp. 47–64). Ann Arbor, MI: Mathesis.

Hand, D.J. (1996). Statistics and the theory of measurement. Journal of the Royal Statistical Society, Ser. A, 159, 445–492.

Hood, S.B. (2009). Validity in psychological testing. Theory & Psychology, 19, 451–473.

Hoshmand, L.T. (2003). Can lessons of history and logical analysis ensure progress in psychological science? Theory & Psychology, 13, 39–44.

Huffman, C.A. (1999). The Pythagorean tradition. In A.A. Long (Ed.), The Cambridge companion to early Greek philosophy (pp. 66–97). Cambridge, UK: Cambridge University Press.

Jaynes, E.T. (1976). Confidence intervals vs. Bayesian intervals. In W.L. Harper & C.A. Hooker (Eds.), Foundations of probability theory, statistical inference, and statistical theories of science (pp. 175–257). Dordrecht, The Netherlands: Reidel.

Jaynes, E.T. (2003). Probability theory: The logic of science. Cambridge, UK: Cambridge University Press.

Jeffreys, H. (1973). Scientific inference (3rd ed.). Cambridge, UK: Cambridge University Press.

Jones, L.V. (Ed.). (1986a). The collected works of J.W. Tukey: Vol. 3. Philosophy and principles of data analysis: 1949–1964. London, UK: Chapman & Hall.

Jones, L.V. (Ed.). (1986b). The collected works of J.W. Tukey: Vol. 4. Philosophy and principles of data analysis: 1965–1986. London, UK: Chapman & Hall.

Kelley, T.L. (1927). Interpretation of educational measurements. New York, NY: Macmillan.

Kelvin, W.T. (1891). Popular lectures and addresses (Vol. 1). London, UK: Macmillan.

Kleinert, A. (1988). “Messen, was messbar ist”: Über ein angebliches Galilei-Zitat [“Measuring what can be measured”: A quotation attributed to Galileo]. Berichte zur Wissenschaftsgeschichte, 11, 253–255.

Koch, S. (1992). Psychology’s Bridgman vs. Bridgman’s Bridgman: An essay in reconstruction. Theory & Psychology, 2, 261–290.

Krantz, D.H., Luce, R.D., Suppes, P., & Tversky, A. (1971). Foundations of measurement (Vol. 1). New York, NY: Academic Press.

Kyngdon, A. (2008). Conjoint measurement, error and the Rasch model. Theory & Psychology, 18, 125–131.

Laughlin, R.B. (2005). A different universe: Reinventing physics from the bottom down. New York, NY: Basic Books.

Luce, R. (1959). On the possible psychophysical laws. Psychological Review, 66, 81–95.

Martin, J. (2003). Positivism, quantification and the phenomena of psychology. Theory & Psychology, 13, 33–38.

Meehl, P.E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.

Meehl, P.E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.

Meehl, P.E. (1990). Appraising and amending theories: The strategy of Lakatosian defence and two principles that warrant it. Psychological Inquiry, 1, 108–141.

Menger, K. (2007). Calculus: A modern approach. Ginn, IL: Dover. (Original work published 1955)

Michell, J. (1999). Measurement in psychology: A critical history of a methodological concept. Cambridge, UK: Cambridge University Press.

Michell, J. (2000). Normal science, pathological science and psychometrics. Theory & Psychology, 10, 639–667.

Michell, J. (2003a). The quantitative imperative: Positivism, naïve realism and the place of qualitative methods in psychology. Theory & Psychology, 13, 5–31.

Michell, J. (2003b). Pragmatism, positivism, and the quantitative imperative. Theory & Psychology, 13, 45–52.

Michell, J. (2004). Item response models, pathological science and the shape of error: Reply to Borsboom and Mellenbergh. Theory & Psychology, 14, 121–129.

Michell, J. (2005). The meaning of the quantitative imperative: A response to Niaz. Theory & Psychology, 15, 257–263.

Michell, J. (2008a). Conjoint measurement and the Rasch paradox: A response to Kyngdon. Theory & Psychology, 18, 119–124.

Michell, J. (2008b). Is psychometrics pathological science? Measurement, 6, 7–24.

Narens, L. (2002). Theories of meaningfulness. London, UK: Erlbaum.

Niaz, M. (2005). The quantitative imperative vs. the imperative of presuppositions. Theory & Psychology, 15, 247–256.

Oakes, M. (1986). Statistical inference: A commentary for the social and behavioral sciences. New York, NY: Wiley.

Peierls, R.E. (1960). Wolfgang Ernst Pauli. 1900–1958. Biographical Memoirs of Fellows of the Royal Society, 5, 174–192.

Pfanzagl, J. (1968). Theory of measurement. Würzburg, Germany: Physica Verlag.

Popper, K.R. (1959). The logic of scientific discovery. New York, NY: Basic Books.

Rubin, D.B. (2006). Matched sampling for causal effects. Cambridge, UK: Cambridge University Press.

Saint-Mont, U. (2011). Statistik im Forschungsprozess: Eine Philosophie der Statistik als Baustein einer integrativen Wissenschaftstheorie [Statistics in the process of research: A philosophy of statistics meant as a building block for an integrative philosophy of science]. Heidelberg, Germany: Springer.

Salsburg, D.S. (1985). The religion of statistics as practiced in medical journals. The American Statistician, 39, 220–223.

Schönemann, P.H. (1994). Measurement: The reasonable ineffectiveness of mathematics in the social sciences. In I. Borg & P. Mohler (Eds.), Trends and perspectives in empirical social research (pp. 149–160). Berlin, Germany: Walter de Gruyter.

Sedlmeier, P. (1996). Jenseits des Signifikanztest-Rituals: Ergänzungen und Alternativen [Beyond the ritual of significance testing: Alternative and supplementary methods]. Methods of Psychological Research Online, 1(4), 41–63.

Stevens, S.S. (1951). Mathematics, measurement and psychophysics. In S.S. Stevens (Ed.), Handbook of experimental psychology (pp. 1–49). New York, NY: Wiley.

Suppes, P., & Zinnes, J.L. (1968). Basic measurement theory. In R.D. Luce, R.R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology: Vol. 1 (pp. 3–76). New York, NY: Wiley.

Thorndike, E.L. (1918). The nature, purposes, and general methods of measurements of educational products. In G.M. Wipple (Ed.), Seventeenth yearbook of the national society for the study of education: Vol. 2 (pp. 16–24). Bloomington, IL: Public School Publishing.

Trendler, G. (2009). Measurement theory, psychology and the revolution that cannot happen. Theory & Psychology, 19, 579–599.

Tukey, J.W. (1986). Data analysis and behavioral science or learning to bear the quantitative man’s burden by shunning badmandments. In L.V. Jones (Ed.), The collected works of J.W. Tukey: Vol. III. Philosophy and principles of data analysis: 1949–1964 (pp. 187–390). London, UK: Chapman & Hall.

Tukey, J.W. (1991). The philosophy of multiple comparisons. Statistical Science, 6, 100–116.

Velleman, P.F., & Wilkinson, L. (1993). Nominal, ordinal, interval, and ratio typologies are misleading. The American Statistician, 47, 65–72.

Wigner, E. (1949). Invariance in physical theory. Proceedings of the American Philosophical Society, 93, 521–526.

Zuccato, E., Chiabrando, C., Castiglioni, S., Calamari, D., Bagnati, R., Schiarea, S., & Fanelli, R. (2005). Cocaine in surface waters: A new evidence-based tool to monitor community drug abuse. Environmental Health, 4(14). doi: 10.1186/1476-069X-4-14. Retrieved from http://www.ehjournal.net/content/4/1/14

Uwe Saint-Mont is Professor of Statistics and Computer Sciences at the University of Applied Sciences, Nordhausen. His research interests include statistics, psychology, and the philosophy of science. Address: Fachbereich Wirtschafts- und Sozialwissenschaften, Fachhochschule Nordhausen, Weinberghof 4, 99734 Nordhausen, Germany. Email: [email protected]
