Fundamental Measurement for Outcome Evaluation

UNIVERSITY OF SARAJEVO
FACULTY OF SPORT AND PHYSICAL EDUCATION

Subject: Measurement
Topic: Fundamental Measurement for Outcome Evaluation

Professor: Doc. Dr. Kazazovi Elvir
Student: Alihoda Muamer

Sarajevo, 2011

All measures are numbers. But not all numbers are measures. Some numbers, like car licenses, are only labels. Others, like Room #215, name a place. They may help us find Room #215. But they do not tell us how many steps it takes to get there. Other numbers come as counts, scores and ranks, like 10 hospital days, 17 PECS scale rating points and 3rd place. Because we obtain these numbers by direct observation, they are the reality that we see happen. As a result, numbers of this kind are often mistaken for measures and then misused in statistical analyses for which they are quite unsuited. Counts, scores and ranks do not qualify as measures because the very "reality" of the concrete units counted guarantees that their importance to our reason for counting them cannot be uniform. Some "days" are worth more than others. How far is it from 1st to 2nd place?

Finally, there are numbers which deserve to be called "measures". These are counts of abstract generic units, like 3000 dollars, 99 degrees and 50 pounds. Each unit is a perfect idea which can only be approximated in practice. But its good-enough-for-practice approximation is what produces the kind of numbers that make the work of physical scientists, engineers, architects, businessmen, tailors and cooks useful and productive. Genuine "measures", not mere "scores", are the kind of numbers we need for outcome evaluation.

When we use numbers to compare outcomes, we do arithmetic with them. We add and divide to calculate means. We subtract to compare alternatives and to measure improvement. We calculate rates of change to estimate costs and to weigh the effects of exposure and treatment. Arithmetic done with numerical labels makes nonsense. As we proceed we will see that arithmetic done with counts, scores and ranks can also be tragically misleading. We must learn how to overcome this problem by the routine application of a simple mathematical model which constructs abstract linear measures from the concrete raw data for which counts, scores and ranks are the medium. For arithmetic and statistical analysis to be useful, it must be done with equal-interval, constant-unit, linear measures. Nothing else will do!

Which Numbers Are Measures?

How can we tell the difference between numbers which are measures and numbers which are not? Measures are the kind of numbers that cooperate with arithmetic, that can be added and subtracted, multiplied and divided. No one who uses numbers as labels, like license plates, would think of doing arithmetic with them, except as a joke. The other use of numbers, as counts, scores and ranks, however, tempts many to do arithmetic.

Ratings mark an order of increase. There is usually little doubt that a rating of two is meant to mean more than a rating of one and less than a rating of three. The unanswered question for ratings is not which is more but, from the point of view of actually doing arithmetic: How much more? How much less? We may be able to trust the order of ratings. Indeed, the best of our raw data inputs are no more than ratings. But, if we are realistic about the situation, we must admit that our ideas about the spacing between ratings are vague. We may feel sure that a rating of two is more than a rating of one. But we do not know how much more. As we think this over we are bound to realize that, although ratings specify order, they do not have the numerical properties necessary to serve arithmetic. Subtractions of ratings are meaningless. So are sums of ratings, and hence also means and standard deviations of ratings.

Another kind of observed number, the raw score, is often misused as though it were a measure, because it is a number that counts up ratings. But counting does not make raw scores any better for arithmetic analysis than the ratings they count. When we add up raw scores we are counting "right" answers. But there is no reason at all to suppose that the right answers counted are all the same size.

This realization is far from recent. In 1904 the founder of educational measurement, the American psychologist Edward Thorndike, observed:

"If one attempts to measure even so simple a thing as spelling, one is hampered by the fact that there exist no units in which to measure. One may arbitrarily make up a list of words and observe ability by the number spelled correctly. But if one examines such a list one is struck by the inequality of the units. All results based on the equality of any one word with any other are necessarily inaccurate." (1904, p.7)

Thorndike saw the unavoidable ambiguity in counting concrete events, however indicative they might seem. One might observe signs of spelling ability. But one would not have measured spelling, not yet (Engelhard 1984, 1991, 1994).

The problem of what to count, that is, entity ambiguity, is ubiquitous in science, commerce and cooking. What, come to think of it, is an apple? How many little apples make a big one? How rotten can an apple be and still get counted? Why don't three apples always cost the same amount? With apples, we solve entity ambiguity by renouncing the concrete apple count and turning, instead, to abstract apple volume or, better yet, apple weight (Wright 1992, 1994). This ambiguity as to the value of what is counted makes raw scores unsuitable for the arithmetic so mistakenly done with them.

What kind of numbers, then, are good enough to do arithmetic with? Only numbers that approximate the idea of a perfectly uniform unit, which makes a difference of one between any two numbers always mean exactly the same amount. Only then can we do arithmetic with numbers. Only then can we make sense of means and standard deviations. Only then do we have numbers good enough to use as measures.

But numbers like that do not occur naturally. Their perfection is only an idea, an abstraction, a triumph of imagination. To obtain real values for such numbers we must invent devices that produce approximations of our "perfect" unit good enough to be useful. The yardstick is the canonical example. We are content to use the inch marks on most yardsticks as though they were perfectly spaced exactly one inch apart. We know, of course, that were we to look closely we could prove beyond doubt that the inch marks on any particular yardstick vary slightly in their spacing. Do we then abandon the yardstick? Absolutely not! All that is required is that the inch marks approximate uniformity well enough to keep the yardstick useful as a device for measuring length, say, to the nearest inch.

Many social scientists are confused about the difference between ratings, raw scores and measures. They mistake ordinal ratings and scores for measures and try to understand their data through linear analyses of these ordinal values. Their results are unavoidably ambiguous in arbitrary ways that remain unknown to these researchers. Small wonder there is so much confusion and contradiction, and so little progress, in contemporary social science. Outcome measurement cannot afford to suffer this foolish mistake. The mistake is entirely unnecessary. There is a simple, effective and easily applied method for constructing good approximations of abstract measures from concrete ordinal observations like raw scores and ratings.

This chapter discusses the unavoidable ambiguities in data collection, explains how raw scores and Likert scales mislead when they are mistaken for measures, explains how raw, concrete observations like ratings can be used to construct measures, and gives some examples of the useful maps and scoring forms that follow from the construction of measures.

Ratings

To measure treatment outcomes, we rate patients' typical functional independence on typical tasks of daily living according to scales like:

1 - Maximal dependence
2 - Moderate assistance
3 - Minimal assistance
4 - Requires supervision
5 - Limited independence
6 - Functional independence
7 - Complete independence

When we use these ratings to evaluate a patient, we assume that we:

a - judge the patient correctly,
b - according to reproducible criteria,
c - with ratings accurately recorded,
d - in terms of evenly spaced levels like 1,2,3,4,5,6,7,
e - which add up to scores as good as measures.

But our assumptions are naive. Our ratings are no better than:

a' - educated guesses,
b' - according to fluctuating personal criteria,
c' - not always recorded correctly,
d' - in ordinal ratings,
e' - which do not add up to measures.

Raw Scores Are Not Measures

Thorndike was aware not only of the "inequality of the units" counted, but also of the non-linearity of any resulting "raw scores". Raw scores are bounded to begin at "none right" and to end at "all right". But the linear measures we intend raw scores to imply have no such bounds. The monotonically increasing ogival exchange between raw scores and measures is shown in Figures 1 and 2. The horizontal x-axes in these figures are defined by linear measures. Their vertical y-axes are defined by raw score percents in Figure 1 and by raw score points in Figure 2.

Figure 1. Extreme Raw Scores are Biased against Measures

Figure 2. Measure Changes Can Reverse Raw Score Implications
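The ogival exchange between raw scores and measures can be illustrated numerically. The following is a minimal sketch, assuming the simplest possible case in which percent correct maps to a measure through a single logistic ogive (the log-odds, or logit). The curve in Figure 1 is a test curve and need not match this form exactly, so the numbers below are illustrative only. The function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion p strictly between 0 and 1 (assumed ogive)."""
    return math.log(p / (1.0 - p))


def measure_gain(p_before, p_after):
    """Linear (logit) distance implied by a change in proportion correct."""
    return logit(p_after) - logit(p_before)


# The same 10-point raw score gain, once near the center and once near the top.
central = measure_gain(0.45, 0.55)
extreme = measure_gain(0.88, 0.98)
ratio = extreme / central

print(f"central gain: {central:.2f} logits")
print(f"extreme gain: {extreme:.2f} logits")
print(f"ratio: {ratio:.1f}")
```

Under this assumption the extreme gain is between four and five times the central gain, even though both raw score changes are 10 percentage points.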

The exchange of one more right answer for a measure increment is steepest where items are dense, usually toward the middle of a test near 50% right. At the extremes of 0% and 100% right, however, the exchange becomes flat. This means that, for a symmetric set of item difficulties, one more right answer implies the least measure increment near 50% but an infinite increment at each extreme.

Imagine a situation in which, to measure treatment outcome, you inadvertently use a set of items that are too easy to score on, even before treatment begins. This unrealized mistargeting will make even the most effective treatment appear ineffective. The same will happen when the set of items is too hard to score on. Indeed, the apparent effectiveness of a treatment in raw scores will depend entirely on how your items are targeted on the sample of patients you happen to examine. The size of the raw score increase produced by measure differences between patients and between treatments depends entirely on the targeting of the test.

Figure 1 shows a typical curve of raw score plotted against measure. This curve describes the non-linear relation of the raw score (in percent correct) to the linear measure of functional ability. Notice that the horizontal measure distance between the vertical scores of 88% and 98% is five times greater than the distance between scores of 45% and 55%, even though the raw score differences are an equivalent 10%.

Were we to evaluate the relative effectiveness of alternative treatments A and B in terms of their raw scores, with A centered at a measure of 0 logits and B centered at 4.4 logits on the x-axis, we would be misled into concluding that the two treatments were equally effective, even though their measures show that treatment B is, in fact, five times more effective than treatment A.

Is there any way we can confirm that the raw score bias ratio implied in Figure 1 is really five? To see that "five" is about right, imagine applying a new set of items which measure along the same variable but are harder to score on. Imagine the curve of this harder test to have the same shape as the curve in Figure 1 but shifted 4.4 logits to the right, so that it is centered on the 2.8 logit measure change of treatment B. Now, when we use the new curve to look up the y-axis percents intercepted by the x-axis measures at 3 and 5.8 logits, we see that they imply a raw score increase from 25% to 75% correct, instead of 45% to 55%. That is a change of 50%, which is indeed five times greater than the previous change of 10%.

Thus, when we compare raw score changes by re-examining each change with a test targeted on the region of that change, we obtain raw score ratios that resemble the measure ratios made explicit in Figure 1 for a change anywhere along any test. This shows how linear measures, which can be constructed from raw scores, correct the raw score bias against off-target changes and so protect us from being misled by the unavoidable vagaries of mistargeting.

Figure 2, which shows two test curves spaced about 20 linear units apart, illustrates a situation in which the apparent advantage of a treatment effect on Group A from A1 to A2 of 4 raw score points over a treatment effect on Group B from B1 to B2 of only 2 raw score points is, in fact, reversed when these same raw scores are converted into equated linear measures. Now the advantage lies with the six linear unit change of Group B over the mere two linear unit change of Group A.

This interpretation is confirmed when we examine the second, dashed test curve centered 20 linear units to the right of the solid test curve centered at zero. On this "harder" test we see that the targeted raw score increase for Group B of 12 points is three times the targeted raw score increase for Group A of only 4 points.

This example shows that, when we rely on raw scores alone to judge the relative size of outcome changes that take place at different distances from the center of the test curve we happen to be working with, we can reach conclusions so misleading that they reverse our findings. We can conclude the opposite of what our data, when converted from ordinal raw scores to linear measures, make plain.

Table 1 documents the size of the raw score bias against extreme measures for a test with normally distributed item difficulties and a test with uniformly distributed item difficulties. The table uses ratios to compare the measure increment corresponding to one more rating step at the most extreme step with the measure increment corresponding to one more step at the smallest central step.

Table 1
Measure Increment Ratios for One More Right Answer at the Largest and Smallest Rating Steps

Number of Steps*    Normal Test    Uniform Test
      10                2.0            3.5
      25                4.6            4.5
      50                8.9            6.0
     100               17.6            8.0

* E.g., a 7-category rating scale supplies 6 steps per item. 13 such items produce 6 x 13 = 78 steps. These calculations are explained on pages 143-151 of Best Test Design (Wright & Stone 1979). The ratio for a normal test of L items is:

$\log_e\{2(L-1)/(L-2)\} \, / \, \log_e\{(L+2)/(L-2)\}$
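The footnote's ratio for the normal test can be checked against the Normal Test column of Table 1. The sketch below simply evaluates that expression, log{2(L-1)/(L-2)} / log{(L+2)/(L-2)}, for the step counts listed in the table. The function name is hypothetical, and no formula is given in the text for the uniform test, so that column is not reproduced here.

```python
import math


def normal_test_ratio(count):
    """Evaluate the footnote's ratio for a normal test with the given count L."""
    if count <= 2:
        raise ValueError("the expression is only defined for counts above 2")
    numerator = math.log(2 * (count - 1) / (count - 2))
    denominator = math.log((count + 2) / (count - 2))
    return numerator / denominator


ratios = {count: round(normal_test_ratio(count), 1) for count in (10, 25, 50, 100)}
for count, ratio in ratios.items():
    print(count, ratio)
# prints:
# 10 2.0
# 25 4.6
# 50 8.9
# 100 17.6
```

The four values agree with the Normal Test column, which supports reading the damaged term in the footnote as (L+2)/(L-2).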

We can see in Table 1 that, even when items are spread out in uniform increments of difficulty, the raw score bias against measure increments at the extremes can easily be a factor of 5 or more. When items cluster in the middle of the test, the usual case, the bias can reach a factor of 10 or more. Should we happen to be comparing treatment outcomes where the treatment effects are centered at different levels of functional independence, the less effective treatment could easily appear five times better than the more effective treatment, simply because the treatment effects were targeted differently by the test we happened to use.

Raw score bias is not limited to dichotomous responses. Because of the influence of additional within-item steps, the bias is even more severe for partial credit items, rating scales and, of course, the infamous Likert scale, the misuse of which pushed Thurstone's seminal 1920s work on how to transform raw scores into linear measures out of use.

These examples of raw score bias in favor of central scores and against extreme scores show that raw scores are target biased and sample dependent (Wright & Stone 1979, Wright & Masters 1982, Wright & Linacre 1989). Any statistical method, like linear regression, analysis of variance, generalizability, LISREL or factor analysis, that misuses non-linear raw scores or Likert scales as though they were linear measures will have its output systematically distorted by this bias. Like the raw scores on which they are based, all results will be target biased and sample dependent, and hence inferentially ambiguous. Small wonder that so much so-called social science is no more than transient description of never-to-be-reencountered situations, easily contradicted by almost any replication.

The obvious and easy-to-practice (Wright & Linacre 1997, Linacre & Wright 1997) law of measurement is that:

Before applying linear statistical methods to concrete raw data, we must first use a measurement model to construct, from the observed raw data, abstract sample-free and test-free linear measures.

There are two additional advantages of model-governed linearization which are decisive for successful scientific research. Each measure and calibration estimated by the measurement model is now accompanied by a realistic estimate of its precision and by a mean square residual-from-expectation estimate of the extent to which its data pattern fits the measurement model, that is, of its statistical validity. When we then proceed to plot results and apply linear statistics to study relationships among measures, we not only have linear measures to work with, but we also know their precision and validity.

Table 2 shows the differences between concrete ordinal raw scores and abstract interval linear measures.

Additivity is the first difference. Without additivity we cannot use ordinary arithmetic to analyze our results. We cannot apply the usual linear statistics of analysis of variance or regression without suffering unresolvable ambiguities caused by raw score bias against off-target measures.

Continuity is the second difference. Raw scores are forced to vary discontinuously in integer steps corresponding to one more or one less observation. Fractions are unobservable. Measures, on the other hand, as abstract representations of theoretical constructs, are continuous in our imagination and so also in our mathematical analyses. In practice, of course, we can estimate values for our measures only as finely as the measuring devices we can build to approximate them allow. But that makes discontinuity entirely a matter of instrumentation engineering. Our idea of our measures remains continuous.

Status refers to the inexorable reality that raw scores are bound to be no more than finite examples of what we are looking for, and hence forever incomplete. Measures, on the other hand, are, in our stochastic conception of them, complete ideas.

Control refers to the possibility, when working with measures estimated from raw scores, of comparing the observation values expected from the measures by our measurement model with the observations actually obtained. This enables continuous, on-line supervision of the empirical validity of our theoretical measures.

Generality follows from interpreting our raw data as examples of an enduring stochastic process which is strictly governed by conjointly estimable measurement parameters. This is what enables us to take the inferential step from finite, concrete, situation-bound experience to infinitely reproducible, abstract, situation-free ideas.

Table 2
Raw Scores Are NOT Measures

               CONCRETE ORDINAL RAW SCORES    ABSTRACT LINEAR MEASURES
ADDITIVITY:    non-additive                   additive
               non-linear                     linear
               bent                           straight
CONTINUITY:    discrete                       continuous
               lumpy                          smooth
STATUS:        incomplete                     complete
               raw                            refined
CONTROL:       unsupervised                   supervised
               unvalidated                    validated
               wild                           tamed
GENERALITY:    local                          general
               concrete                       abstract
               irreproducible                 reproducible
               test-bound                     test-free
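The "control" contrast in Table 2 rests on comparing what was observed with what the measurement model expects. The text names a mean square residual-from-expectation but gives no formula, so the sketch below is an assumption throughout: it uses the dichotomous logistic form P = exp(B - D) / (1 + exp(B - D)) and an unweighted average of squared standardized residuals, which is one common way to form such a statistic. All function names and the example values are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Assumed dichotomous model: probability of a right answer."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def mean_square_fit(responses, ability, difficulties):
    """Average squared standardized residual of 0/1 responses (assumed form)."""
    total = 0.0
    for observed, difficulty in zip(responses, difficulties):
        expected = rasch_probability(ability, difficulty)
        variance = expected * (1.0 - expected)
        total += (observed - expected) ** 2 / variance
    return total / len(responses)


# Five items ordered from easy to hard, answered by a person at 0 logits.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
orderly = mean_square_fit([1, 1, 1, 0, 0], 0.0, difficulties)     # easy items right
disorderly = mean_square_fit([0, 0, 1, 1, 1], 0.0, difficulties)  # hard items right

print(f"orderly pattern:    {orderly:.2f}")
print(f"disorderly pattern: {disorderly:.2f}")
```

Both patterns have the same raw score of 3, yet the pattern that contradicts the item order yields a mean square far above 1 while the orderly pattern stays below 1. This is the kind of information a raw score alone cannot supply.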

The History of Fundamental Measurement

The general ideas summarized in Table 2 draw us into the scientific history of fundamental measurement.

Concatenation

In 1920 the English physicist Norman Campbell concluded that "fundamental" measurement (on which the success of physics was based) required, at least by analogy, the possibility of a physical concatenation, like joining the ends of sticks to concatenate length or piling bricks to concatenate weight.

Sufficiency

In 1920 the English mathematician Ronald Fisher, while developing his "likelihood" version of inverse probability to construct maximum likelihood estimation, discovered a statistic so "sufficient" that it exhausted all the information concerning its modeled parameter from the data in hand. Statistics which exhaust all modeled information enable conditional formulations by which a value for each parameter can be estimated independently of all other parameters in the model. This requirement for the construction of fundamental measurement follows because the presence of a parameter in the model can be replaced by its sufficient statistic. Fisher's sufficiency enables independent parameter estimation for models that incorporate many different parameters (Andersen 1977). This leads to a second law of measurement:

When a measurement model employs parameters for which there are no sufficient statistics, that model cannot construct useful measurement because it cannot estimate its parameters independently of one another.

Divisibility

In 1924 the French mathematician Paul Levy (1937) proved that the construction of an inferentially stable law required infinitely divisible parameters. Levy's divisibility is logarithmically equivalent to the conjoint additivity (Luce & Tukey 1964) which we now recognize as the mathematical generalization of the concatenation Campbell required for fundamental measurement. Levy's conclusions were reinforced in 1932 when the Russian mathematician A. N. Kolmogorov (1950, pp. 9 & 57) proved that independence of parameter estimates also required divisibility, this time in the form of additive decomposition.

Thurstone

Between 1925 and 1932 the American electrical engineer Louis Thurstone published 24 articles and a book on how to construct psychological measures and developed mathematical methods which came close to satisfying every measurement requirement of which Thurstone was aware.

Unidimensionality:
"The measurement of any object or entity describes only one attribute of the object measured. This is a universal characteristic of all measurement." (Thurstone 1931, p.257)

Linearity:
"The very idea of measurement implies a linear continuum of some sort such as length, price, volume, weight, age. When the idea of measurement is applied to scholastic achievement, for example, it is necessary to force the qualitative variations into a scholastic linear scale of some kind." (Thurstone & Chave 1929, p.11)

Abstraction:
"The linear continuum which is implied in all measurement is always an abstraction ... There is a popular fallacy that a unit of measurement is a thing - such as a piece of yardstick. This is not so. A unit of measurement is always a process of some kind which can be repeated without modification in the different parts of the measurement continuum." (Thurstone 1931, p.257)

Sample-free calibration:
"The scale must transcend the group measured. One crucial test must be applied to our method of measuring attitudes before it can be accepted as valid. A measuring instrument must not be seriously affected in its measuring function by the object of measurement ... Within the range of objects ... intended, its function must be independent of the object of measurement." (Thurstone 1928, p.547)

Test-free measurement:
"It should be possible to omit several test questions at different levels of the scale without affecting the individual score (measure) ... It should not be required to submit every subject to the whole range of the scale. The starting point and the terminal point ... should not directly affect the individual score (measure)." (Thurstone 1926, p.446)

Case V of Thurstone's Law of Comparative Judgement (Thurstone 1927) is a fundamental measurement solution for the analysis of paired comparisons.

Guttman

In 1944 the American sociologist Louis Guttman pointed out that the meaning of any raw score, including those from Likert scales, remains ambiguous unless the score specifies every response in the pattern on which it is based.

"If a person endorses a more extreme statement, he should endorse all less extreme statements if the statements are to be considered a scale ... We shall call a set of items of common content a scale if [and only if] a person with a higher rank than another person is just as high or higher on every item than the other person." (Guttman 1950, p.62)

According to Guttman, only data which manifest this kind of perfect conjoint transitivity can produce unambiguous measures. Notice the similarity in motivation between Guttman's "scalability" and Ronald Fisher's "sufficiency". Both require that an unambiguous statistic exhaust the information to which it is said to refer.

Rasch

In 1953 the Danish mathematician Georg Rasch (1960) found that the only way he could compare past performances on different tests of oral reading was to apply the exponential additivity of Poisson's 1837 distribution (Stigler 1986, pp.182-183) to data produced by a new sample of students responding simultaneously to both tests. Rasch used Poisson because it was the only distribution he could think of that enabled the equation of the two tests to be entirely independent of the obviously arbitrary distribution of the reading abilities of the new sample.

As Rasch worked out his mathematical solution to what became an unexpectedly successful test equating, he discovered that the mathematics of the probability process, the measurement model, must be restricted to formulations which produce sufficient statistics. Only when his parameters had sufficient statistics could he use these statistics to remove unwanted person parameters from his estimation equations and so obtain estimates of his test parameters which were independent of the values or distributions of whatever other parameters were at work in the measurement model.

Rasch's description of the conjoint transitivity he requires of the probabilities defined by his measurement model shows that he constructed a stochastic solution to the otherwise impossible problem of living up to Guttman's deterministic requirement for the existence of a useful rating scale.

"A person having a greater ability than another should have the greater probability of solving any item of the type in question, and similarly, one item being more difficult than another one means that for any person the probability of solving the second item correctly is the greater one." (Rasch 1960, p.117)

Rasch completes his measurement model on pages 117-122 of his 1960 book. His "measuring function" on page 118 specifies the multiplicative definition of fundamental measurement for dichotomous observations as:

$f(P) = b/d$

where P is the probability of a correct solution, f(P) is a function of P still to be determined, b is a ratio measure of person ability, and d is a ratio calibration of item difficulty.

Rasch explains this model as an inverse probability.

"The model deals with the probability of a correct solution, which may be taken as the imagined outcome of an indefinitely long series of trials ... The formula says that in order that the concepts b and d could be at all considered meaningful, f(P), as derived in some way from P, should equal the ratio between b and d." (Rasch 1960, p.118)

And, after pointing out that a normal probit, even with its second parameter set to one, will be too "complicated" to serve as the measuring function f(P), he asks: "Does there exist such a function, f(P), that f(P) = b/d is fulfilled?" (Rasch 1960, p.119)

Because "an additive system ... is simpler than the original ... multiplicative system," Rasch takes logarithms:

$\log_e\{f(P)\} = \log_e b - \log_e d = B - D$

which "for technical advantage" he expresses as the logit

$L = \log_e\{P/(1-P)\}$

The question has now reached its final form: "Does there exist a function g(L) of the variable L which forms an additive system in parameters for person B and parameters for items -D such that

$g(L) = B - D$" (Rasch 1960, pp.119-120)

Asking "whether the measuring function for a test, if it exists at all, is uniquely determined", Rasch proves that

$f(P) = C\{f_0(P)\}^A$

"is a measuring function for any positive values of C and A, if $f_0(P)$ is so", which "contains all the possible measuring functions which can be constructed from $f_0(P)$." Hence: "By suitable choice of dimension and unit, i.e. of A and C for f(P), it is possible to make the b's and d's vary within any positive interval which may for some reason be deemed convenient." (Rasch 1960, p.121)

Because of "the validity of a separability theorem (sufficiency):

It is possible to arrange the observational situation in such a way that from the responses of a number of persons to the set of items in question we may derive two sets of quantities, the distributions of which depend only on the item parameters, and only on the personal parameters, respectively. Furthermore, the conditional distribution of the whole set of data for given values of the two sets of quantities does not depend on any of the parameters." (Rasch 1960, p.122)

"With respect to separability the choice of this model has been lucky. Had we for instance assumed the 'Normal-Ogive Model' with all $s_i = 1$, which numerically may be hard to distinguish from the logistic, then the separability theorem would have broken down. And the same would, in fact, happen for any other conformity model which is not equivalent, in the sense of $f(P) = C\{f_0(P)\}^A$, to $f(P) = b/d$ ... as regards separability. The possible distributions are limited to rather simple types, but lead to rather far-reaching generalizations of the Poisson process." (Rasch 1960, p.122)

By 1960 Rasch had proven that formulations in the compound Poisson family, such as Bernoulli's binomial, were both sufficient and, more surprisingly, necessary for the construction of stable measurement. Rasch had found that the "multiplicative Poisson" was the only mathematical solution to the second step in inference, the formulation of an objective, sample-free and test-free measurement model.

The implications of Rasch's discovery have taken many years to reach practice (Wright 1968, 1977, 1984, Masters & Wright 1984). Even today there are social scientists who do not understand, or do not benefit from, what Campbell, Levy, Kolmogorov, Fisher and Rasch have proven (Wright 1992).

Conjoint Additivity

Americans working on the mathematical foundations of measurement were unaware of Rasch's achievements. Their work came to a head with the proof by the mathematical psychologists Duncan Luce and John Tukey (1964) that Campbell's concatenation was a physical realization of a general mathematical rule which, in its formulation, is "the" definition of fundamental measurement. They called their formulation, which is necessary and sufficient for useful measurement, "conjoint additivity".

"The essential character of ... the fundamental measurement of extensive quantities is described by an axiomatization for the comparison of effects of arbitrary combinations of 'quantities' of a single specified kind ... Measurement on a ratio scale follows from such axioms. The essential character of simultaneous conjoint measurement is described by an axiomatization for the comparison of effects of pairs formed from two specified kinds of 'quantities' ... Measurement on interval scales which have a common unit follows from these axioms.

A close relation exists between conjoint measurement and the establishment of response measures in a two-way table ... for which the 'effects of columns' and the 'effects of rows' are additive. Indeed the discovery of such measures ... may be viewed as the discovery, via conjoint measurement, of fundamental measures of the row and column variables." (Luce & Tukey 1964, p.1)

"In spite of the practical advantages of such response measures, objections have been raised to their quest ... The axioms of simultaneous conjoint measurement overcome these objections ... Additivity is just as axiomatizable as concatenation ... in terms of axioms that lead to ... interval and ratio scales.

In ... the behavioral and biological sciences, where factors producing orderable effects and responses deserve both more useful and more fundamental measurement, the moral seems clear: when no natural concatenation operation exists, one should try to discover a way to measure factors and responses such that the 'effects' of different factors are additive." (Luce & Tukey 1964, p.4)

Although Luce and Tukey seem to have been unaware of Rasch's work, others (Brogden 1977, Perline, Wright & Wainer 1979) have noted that:

"The Rasch model is a special case of additive conjoint measurement ... a fit of the Rasch model implies that the cancellation axiom (i.e. conjoint transitivity) will be satisfied ... It then follows that items and persons are measured on an interval scale with a common unit." (Brogden 1977, p.633)

Measurement Models

Our data come to us in the form of nominal response categories like:

yes/no, present/absent, always/usually/sometimes/never, right/wrong, strongly agree/agree/disagree/strongly disagree.

The labels we choose for these categories suggest an ordering from less to more: more yes, more presence, more occurrence, more rightness, more agreement. Without thinking much about it, we take for granted that this kind of labeling necessarily establishes a reliable hierarchy of ordinal response categories, an ordered rating scale. Whether empirical responses to such labels are, in fact, actually distinct or even in their expected order, however, remains to be discovered when the data are subsequently studied with an articulate measurement model.

It is not only the unavoidable ambiguity of what is counted nor our lack of knowledge as to the functioning distances between the ordered categories that mislead us. The response counts cannot form a linear scale. Not only are they restricted to occur as integers between none and all. Not only are they systematically biased against off-target measures.
But, because, at best, they are counts, their natural quantitative comparison will be as ratios rather than differences. Means and standard deviations calculated from these ranks are systematically misleading.

There are serious problems in our initial raw data: ambiguity of entity, non-linearity and confusion of source (Is it the smart person or the easy item that produces the "right" answer?). In addition, it is not these particular data which interest us. Our needs focus on what these data imply about future data which, in the service of inference, are by definition "missing". We take the inverse probability step to inference by addressing each piece of observed data, $x_{ni}$, as a stochastic consequence of its modeled probability of occurring, $P_{nix}$.

We take the mathematical step to inference by connecting $P_{nix}$ to a function which specifies how the measurement parameters in which we are interested might govern $P_{nix}$. Our parameters could be $B_n$, the location measure of person n on the continuum of reference, $D_i$, the location calibration of item i on the same continuum, and $F_x$, the threshold of the transition from category (x-1) to category (x).

The necessary and sufficient formulations then are:

$P_{nix} / P_{nix-1} \equiv b_n / (d_i f_x)$

$\log_e(P_{nix} / P_{nix-1}) \equiv B_n - D_i - F_x$

in which the symbol $\equiv$ means "by definition" rather than merely "equals".

The first formulation shows how this model meets the Levy/Kolmogorov divisibility requirement. The second formulation shows how, in log-odds form, this model meets the Campbell/Luce/Tukey conjoint additivity requirement. On the left we see the replacement of $x_{ni}$ by its Bernoulli/Bayes/Laplace stochastic proxy $P_{nix}$. On the right of the second formulation we see the conjoint additivity which produces parameter estimates in the linear form to which our eyes, hands and feet are so naturally accustomed.

Do not forget that when we want to see what we mean, we draw a picture because only seeing is believing.
But the only pictures we see successfully are graphs of linear measures. Graphs of ratios mislead us. Try as we might, our eyes cannot "see" things that way. Needless to say, what we cannot see we cannot understand, let alone believe.

Indeed, Fechner (1860) showed that when we experience any kind of ratio - light, sound or pain - our nervous system "takes its logarithm" so that we can "see how it feels" on a linear scale. Nor was Fechner the first to notice this neurological phenomenon. When tuned according to the Pythagorean scale, musical instruments sounded out of tune at each change of key. Pythagorean tuning was key-dependent. This inconvenience was resolved in the 17th century, 200 years before Fechner's work, by tuning instruments to notes which increased in frequency by equal ratios. Equal ratio tuning produces an "equally tempered" scale of notes which sound equally spaced in any key and so are sufficiently "objective" to be "key-free", as it were. Bach's motive for writing "The Well-Tempered Clavier" was to demonstrate the value of this invention.

Inverse Probability in Practice

The use of inverse probability to implement inference redirects our attention away from seeking models to fit data toward finding data that fit the particular models by which we define measurement. Binomial transition odds like $P_{nijkx} / P_{nijk(x-1)}$ propel inferential meaning. Our raw data are recorded in terms of categories like right/wrong and agree/disagree. We label these categories X = 0, 1, 2, 3, 4, ... so that each X label counts a step up along the intended order of categories like:

WRONG  X = 0
RIGHT  X = 1

STRONGLY DISAGREE  X = 0
DISAGREE           X = 1
AGREE              X = 2
STRONGLY AGREE     X = 3

We then connect X to the circumstances in which we are trying to measure by subscripting X, as in $X_{nijk}$, so that $X_{nijk}$ can stand for a rating earned by performer n on item i from judge j for task k.

Then $P_{nijkx}$ can be the inverse probability that performer n gets rated X on item i by judge j for task k.

The transition odds that the rating is X rather than X - 1 become $P_{nijkx} / P_{nijk(x-1)}$, as in:

[Diagram not reproduced: a rating scale marked X - 2, X - 1, X, X + 1, with $P_{nijk(x-1)}$ above category X - 1 and $P_{nijkx}$ above category X.]
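A minimal sketch of how such transition odds behave, using the simpler person-item-step form $\log_e(P_{nix}/P_{ni(x-1)}) = B_n - D_i - F_x$ given under Measurement Models. The function name, the parameter values and the use of logits as the unit are assumptions made for illustration only; none of them come from the PECS analysis.

```python
import math

def category_probabilities(b, d, thresholds):
    """Return P(X = 0), ..., P(X = m) for person measure b, item
    calibration d and step thresholds F_1..F_m (all in logits)."""
    # The log of P_x / P_0 is the running sum of (b - d - F_k), k = 1..x.
    cumulative = [0.0]
    for f in thresholds:
        cumulative.append(cumulative[-1] + (b - d - f))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values: person at 1.0, item at 0.2, three step thresholds.
B, D, F = 1.0, 0.2, [-1.5, 0.0, 1.5]
probs = category_probabilities(B, D, F)

# Log of the odds of category x rather than category x - 1.
log_odds = [math.log(probs[x] / probs[x - 1]) for x in range(1, len(probs))]
```

Each entry of `log_odds` reproduces the corresponding `B - D - F[x - 1]` up to rounding, which is the additive form the model asserts.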

We then "explain" the logarithm of these transition odds as the consequence of a conjointly additive parameter composition like $B_n - D_i - C_j - A_k - F_x$ so that our measurement model becomes:

$$\log_e(P_{nijkx} / P_{nijk(x-1)}) \equiv B_n - D_i - C_j - A_k - F_x$$

This conjoint additivity provides inferential stability.

Three Essential Statistics and their Representation

To use measures wisely we need to know three things about every measure, its:

1. Location on the linear measurement scale, AMOUNT.
2. Range of reasonable values, PRECISION.
3. Empirical coherence, VALIDITY.

Finally, to "see" what our statistics mean, we need to:

4. Plot them into an informative PICTURE.

To estimate the AMOUNT of the measure is, of course, our motivation for constructing it in the first place. But we must realize that no measure, however carefully constructed, can be exact. There is always some error in the measure. We need to know how big this error is so that we can keep in mind the PRECISION of the measure as we work with it.

There are two main sources of measurement error. The first is an intrinsic component of the stochasticity of our measurement model. The binomial basis for the inverse probability dictates an entirely expected level of measure error. The magnitude of this error component is governed, first, by the number of replications, that is, the number of observable steps or the number of rating forms completed, and second, by the targeting of items on persons.

But that modeled and hence expected error is not all. We can only obtain the data for our measures in a real situation which is inevitably fraught with potentially interfering circumstances. We cannot know ahead of time how much these circumstances muddy our measures. Things are bound to be slightly different every time. 
Fortunately, the fit statistics of our measurement model give us an excellent indication of how much unplanned-for disturbance we actually encounter at each application.

Thus a second error component, determined by the situation in which our raw data are obtained and always decreasing the actual precision of our measures, must be factored into the mathematically modeled precision to produce a precision estimate which is realistic.

Finally, the same fit statistics which help us bring our measure precision into rapport with the actual situation in which the raw data were obtained also indicate the general validity of the measure. When the pattern of observed data comes close to the expected values predicted by the measurement model, then we can see that our measure and its error are valid. But, when some of the observed values wander far from expectation, then we cannot overlook the fact that something has interfered with the data collection for our measure and so made our measure less valid than we might have wished.

A useful feature of the comparisons between observed and expected raw responses is that the specificities of these discrepancies, which person, which item, often show us what caused the interference and so suggest how we might control the intrusion of further interferences of this kind.

To illustrate with a homely example, imagine that, in order to evaluate my "miracle" diet, I weigh myself five times each morning and record the following readings from my bathroom scale:

On Monday I read, in pounds: 180 - 179 - 178 - 181 - 182.
Mean = 180, Error = 1, Range = 178-182

The five readings cluster nicely. It is obvious that 180 is a rather precise estimate of my Monday weight - to the nearest pound.

On Tuesday, however, the weights I read are different: 180 - 175 - 170 - 185 - 190.
Mean = 180, Error = 5, Range = 170-190

Results which are still valid but somewhat imprecise. The way I used my scale on Tuesday was obviously not as accurate as on Monday. 
Am I jumping on the scale too roughly? My best Tuesday estimate of my weight is still 180, but now only to the nearest 5 pounds, perhaps too crude to detect any success from my diet.

So on Wednesday I am careful how I step on the scale, but, alas, something else goes wrong: 177 - 174 - 200 - 176 - 173.
Mean = 180, Error = ???, Range = 173-200

These results must be invalid! One of my five weighings does not make sense. How could I suddenly weigh 20 pounds more? Was that the moment my wife, trying to see how much I weighed, leaned on my shoulder as I stood on the scale? One thing for sure, that reading of 200 is out of line with the other four readings and has to be reconsidered.

Now, look at how nice and sensible my results become when I omit that wild 200: 177 - 174 - 176 - 173.
Mean = 175, Error = 1, Range = 173-177

Once again both valid and precise. And, glory be, I'm 5 pounds lighter! My diet is working!

We discovered the measurement hazards in my weighing story by reading my numbers carefully. But reading numbers takes concentration. Few of us do it well. I'll bet that, had I not dramatized that errant "200", you might have missed it and come to the wrong conclusion about the success of my diet. To avoid the easy mistake of misreading tables of numbers, it is always a good idea to make a picture of the numbers in some kind of plot. Plots of numbers are always better than tables.

Let's see what my weighings look like when I plot them. Figure 3 tells my weighing story at a glance. No need to read anything carefully. The increase of uncertainty from Monday to Tuesday is obvious and the irregularity of that "200" on Wednesday is glaring. You cannot miss either of them. You can also see, directly, that once the "200" is excluded from Wednesday's readings, Wednesday's precision is as good as Monday's and that I have definitely lost some weight.

Figure 3. Three Days on Ben's Bathroom Scale
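The weighing story can be checked with a few lines of Python. This is only a sketch: the text does not say how its "Error" figures were computed, so the code reports mean and range and flags wild readings with a rule assumed here (more than `k` median absolute deviations from the median, with a floor of one pound). The function name and the value of `k` are likewise illustrative assumptions.

```python
from statistics import mean, median

def summarize(readings, k=5):
    """Return the mean, the (min, max) range and any readings lying more
    than k median absolute deviations (MADs) from the median."""
    center = median(readings)
    mad = median(abs(r - center) for r in readings)
    # Assumed floor of 1 pound so that a zero MAD cannot flag everything.
    limit = k * max(mad, 1)
    outliers = [r for r in readings if abs(r - center) > limit]
    return mean(readings), (min(readings), max(readings)), outliers

days = {
    "Monday": [180, 179, 178, 181, 182],
    "Tuesday": [180, 175, 170, 185, 190],
    "Wednesday": [177, 174, 200, 176, 173],
}
results = {day: summarize(r) for day, r in days.items()}

# Drop the flagged Wednesday reading and recompute the mean.
trimmed = [r for r in days["Wednesday"] if r not in results["Wednesday"][2]]
trimmed_mean = mean(trimmed)
```

All three days have the same mean of 180, only Wednesday produces a flagged reading (the 200), and the trimmed Wednesday mean is 175, which matches the story told above.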

Why did I weigh myself five times each morning? By now you should be able to answer that question quite easily. What if I had weighed myself only once and that one reading turned out to be the 200? How misled I would have been. Ask yourself the same question in a more dramatic context. How many tosses of my Lucky Quarter would you demand to check for fairness before you bet your life on it? Would once or twice be enough? Not likely!

What does that mean about scores and measures? One observation, one score from one item, is never enough! Neither is one mere second opinion enough! To make a wise or even sensible decision we must obtain several independent replications of the relevant measures before we act!

The moral of this story is that:

We need MEASURES not scores, else change is without evidence.
We need to know the PRECISION of our measures, else their implications remain obscure.
We need to verify the VALIDITY of our measurement process by obtaining several independent replications, else meaning is uncertain.
No matter how smart we are, we need more than one observation, more than one opinion. We need REPLICATIONS.
Finally, a plot is worth a thousand numbers. Indeed a good PICTURE may be the only way to "see" what a set of numbers means.

A Fundamental Measure for Applied Self-Care

To do this kind of analysis with your data, you record your category ratings on disk and analyze them with a computer program like BIGSTEPS or FACETS (Wright & Linacre 1997, Linacre & Wright 1997). This kind of analysis will give you tables, maps, keys and files of conjoint linear measures.

The following example of Rasch BIGSTEPS analysis comes from 3128 administrations of the 8 item PECS Applied Self-Care LifeScale. 
This scale evaluates eight aspects of self-care:

BOWEL Program
URINARY Program
SKIN CARE Program
Health COGNIZANCE
Health ACTIVITY
Health EDUCATION
SAFETY Awareness
MEDICATION Knowledge

by asking a nurse to rate the patient's competence on each of the eight items according to a seven category rating scale intended to bring out gradients of competence for each item, like these categories for the Bowel Program item:

BOWEL Program effectiveness concerns regulation of bowel elimination. Prevention of complications includes: regulation of food and fluids; high fiber diet; medications for stimulation or prevention of diarrhea; digital stimulation; and colostomy care.

RATING  CATEGORY     DEFINITION
1       INEFFECTIVE: Less than 25% effective.
2       DEPENDENT:   25% - 49% effective.
3       DEPENDENT:   50% - 74% effective.
4       DEPENDENT:   75% - 100% effective.
5       INDEPENDENT: 50% - 74% effective.
6       INDEPENDENT: 75% - 100% effective.
7       NORMAL:      Self maintenance.

The data matrix has 3128 rows, a row for each patient n, and 8 columns, a column for each item i. The cell entry $x_{ni}$ is an ordinal rating from 1 to 7 of patient n on item i. BIGSTEPS analyzes this matrix of 25,024 raw data points to produce the best possible:

1. 8 item calibrations to define the PECS Applied Self-Care construct,
2. for each item, 6 rating step calibrations to define its step structure, and
3. 3,128 measures of the extent of each patient's self-care.

The analysis not only extracts the best possible linear measurement framework, but also reduces the complexity of the data from 25,024 raw ordinal data points to a mere 8 item calibrations + 48 item step calibrations + 3,128 person measures. All 3,184 of these estimates are expressed in linear metrics on a common scale which measures a single dimension of "self-care"!

Table 3. Summary of the BIGSTEPS Analysis of the PECS LifeScales: Applied Self-Care

ENTERED: 3128 PATIENTS   ANALYZED: 2145 PATIENTS   8 ITEMS   56 CATEGORIES

SUMMARY OF 2145 MEASURED (NON-EXTREME) PATIENTS

        RAW                        MODEL   INFIT   OUTFIT
        SCORE    COUNT   MEASURE   ERROR   MNSQ    MNSQ
MEAN     26.2      7.5     41.56    4.94    .88     .89
SD       11.0       .9     22.09    1.35   1.06    1.10

MODEL RMSE 5.12   ADJ.SD 21.49   SEPARATION 4.19   RELIABILITY .95
REAL RMSE  5.80   ADJ.SD 21.32   SEPARATION 3.68   RELIABILITY .93
SE OF PERSON MEAN .48
MAXIMUM EXTREME SCORE: 5 PATIENTS   MINIMUM EXTREME SCORE: 94 PATIENTS
LACKING RESPONSES: 884 PATIENTS     VALID RESPONSES: 94.3%

SUMMARY OF 8 MEASURED (NON-EXTREME) ITEMS

        RAW                        MODEL   INFIT   OUTFIT
        SCORE    COUNT   MEASURE   ERROR   MNSQ    MNSQ
MEAN   7034.1   2023.1     50.01     .28    .83     .88
SD      829.2    148.5      4.83     .02    .23     .24

MODEL RMSE .29   ADJ.SD 4.82   SEPARATION 16.91   RELIABILITY 1.00
REAL RMSE  .29   ADJ.SD 4.82   SEPARATION 16.64   RELIABILITY 1.00
SE OF ITEM MEAN 1.82

Output from BIGSTEPS (Wright, B.D. & Linacre, J.M. 1997)
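The person rows of Table 3 are consistent with the relations commonly used in Rasch summaries: the adjusted ("true") variance is the observed variance of the measures minus the mean square error, separation is the adjusted SD divided by the RMSE, and reliability is the adjusted variance divided by the observed variance. The sketch below assumes those relations rather than quoting the BIGSTEPS manual, and the function name is illustrative.

```python
import math

def separation_and_reliability(observed_sd, rmse):
    """Return (adjusted SD, separation, reliability) from the observed SD
    of the measures and their root mean square error."""
    true_variance = observed_sd ** 2 - rmse ** 2   # error variance removed
    adjusted_sd = math.sqrt(true_variance)
    separation = adjusted_sd / rmse                # spread in error units
    reliability = true_variance / observed_sd ** 2
    return adjusted_sd, separation, reliability

# Person values from Table 3: observed SD 22.09, model RMSE 5.12, real RMSE 5.80.
model = separation_and_reliability(22.09, 5.12)
real = separation_and_reliability(22.09, 5.80)
```

Rounded to two decimals, both calls land within rounding error of the ADJ.SD, SEPARATION and RELIABILITY entries printed in the table, which suggests the table was built the same way.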

What follows are excerpts from Richard Smith's BIGSTEPS analysis of these data. It is unreasonable for you to expect yourself to master every detail shown in these excerpts. Instead, I urge you to sit back and notice, to whatever extent is comfortable for you, the various ways this kind of analysis can bring your inevitably complicated data into a few well-organized tables and pictures.

Table 3 summarizes the results of an 87% reduction of the raw ordinal data and describes the summary characteristics of its reconstruction into a unidimensional measurement framework. Table 3 contains more information than we can discuss here, but there are two points to note:

1. 2145 patients are measured at non-extreme scores and among these the data completion is 94.3%.
2. The reliability of this 8 item scale to separate the self-care measures of these 2145 patients is a high .93.

Figure 4. A MAP of PECS LifeScales: Applied Self-Care

[Map not reproduced. Columns: MEASURE (0 to 90), PATIENTS (frequency distribution with M, S and Q markers) and RATING on the APPLIED SELF-CARE items. At each of the rating levels 6, 4, 3 and 1 the items appear, from highest to lowest measure, in the order KNOWS MEDICATIONS, HEALTH ACTIVITY, SAFETY AWARENESS, SKIN CARE PROGRAM, BOWEL PROGRAM, URINARY PROGRAM.]


To see the meaning of the BIGSTEPS definition of the PECS Self-Care construct, we plot the item calibrations and patient measures together on the MAP in Figure 4. The left column benchmarks the linear units of the measurement framework, scaled for this analysis to run from -20 to 120. The MAP is focused on the region from 0 to 90.

The second column shows the frequency distribution of the patients who measure between 0 and 90. The symbols M, S and Q mark the mean patient measure at M, plus and minus one standard deviation at each S and plus and minus two standard deviations at each Q. Finally, on the right, six of the eight items defining this self-care construct are shown in their calibration order four times, once at each of the rating levels 1 at "Ineffective", 3 and 4 at "Dependent" and 6 at "Independent".

This was done so that you could see how this definition of self-care moves up from ratings of ineffectiveness in the 0 to 10 measure region, through two successive levels of dependence in the 25 to 60 measure region, to ratings of independence in the 75 to 90 measure region.

Figure 4 shows only six of the eight items because the two omitted items calibrate on top of items that are shown: EDUCATION at the same high level as MEDICATIONS, and COGNIZANCE at the same level as SKIN CARE a bit lower down.

The mapped hierarchy of the 6 items begins with the URINARY and BOWEL programs, which are the easiest to rate well on, and moves up through SKIN CARE (and COGNIZANCE), SAFETY and ACTIVITY to reach MEDICATIONS (and EDUCATION), which are the hardest self-care programs to rate well on.

The practical application of this empirical hierarchy is that self-care education has the best chance of success when it begins at the easy end with URINARY and BOWEL programs and only reaches up to the more challenging ACTIVITY and MEDICATIONS programs after the easier programs are well established.

Table 4. Calibrations for the Eight Items Defining PECS LifeScales: Applied Self-Care

ITEMS STATISTICS: MEASURE ORDER

ENTRY   RAW                               INFIT   OUTFIT
NUM     SCORE   COUNT   MEASURE   ERROR   MNSQ    MNSQ    PTBIS   ITEM
                                                                  (HARDEST ITEM)
8       5964    1866    57.4      .3       .96     .97    .82     KNOWS MEDICATIONS
6       5828    1832    57.1      .3       .66     .68    .89     HEALTH EDUCATION
5       6294    1800    52.1      .3       .72     .74    .87     HEALTH ACTIVITY
7       7458    2137    48.9      .3       .71     .78    .86     SAFETY AWARENESS
4       7413    2135    48.2      .3       .47     .49    .91     HEALTH COGNIZANCE
3       7290    2139    48.1      .3      1.22    1.29    .77     SKIN CARE PROGRAM
1       7858    2141    44.6      .3       .82     .92    .83     BOWEL PROGRAM
2       8168    2135    43.7      .3      1.09    1.16    .80     URINARY PROGRAM
                                                                  (EASIEST ITEM)
MEAN    7034.   2023.   50.0      .3       .83     .88
SD       829.    148.    4.8      .0       .23     .24

INPUT: 3128 PATIENTS, 8 ITEMS
ANALYZED: 2145 PATIENTS, 8 ITEMS, 56 CATEGORIES
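The INFIT and OUTFIT mean squares in Table 4 compare observed responses with the values the model expects. The sketch below shows the conventional definitions for the simplest, dichotomous case: outfit is the average squared standardized residual, and infit is the sum of squared residuals divided by the sum of modeled variances. That BIGSTEPS computes its rating scale statistics in exactly this way is an assumption here, and the function name and the data are invented for illustration.

```python
def fit_mean_squares(observed, expected_p):
    """Infit and outfit mean squares for 0/1 responses, given the modeled
    probability of a 1 for each response."""
    squared_residuals = [(x - p) ** 2 for x, p in zip(observed, expected_p)]
    variances = [p * (1 - p) for p in expected_p]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(observed)
    # Infit: information-weighted, so on-target responses count for more.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit

# Hypothetical modeled probabilities for five responses, easy to hard.
p = [0.9, 0.8, 0.5, 0.2, 0.1]
orderly = fit_mean_squares([1, 1, 1, 0, 0], p)      # succeeds where expected
surprising = fit_mean_squares([0, 1, 1, 0, 1], p)   # fails easy, passes hard
coin_flips = fit_mean_squares([1, 0, 1, 1, 0], [0.5] * 5)
```

With these numbers the orderly pattern gives mean squares below 1, the surprising pattern gives values well above 1 with outfit larger than infit, and the all-on-target case gives exactly 1, the expected value mentioned for Table 4.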


Table 4 lists the item calibrations from which the right side of Figure 4 was mapped. This table shows all 8 items in their difficulty order and lists their calibrations, calibration standard errors and the fit statistics by which the validity of these calibrations can be judged. The fit statistics, which have expected values of 1.00, show that the only item afflicted with calibration uncertainty is the SKIN CARE program item with mean square residual ratios of 1.22 and 1.29.

These fit statistics suggest that skin care may sometimes interact idiosyncratically with other patient characteristics like age, sex or impairment. Further examination of other BIGSTEPS output can be used to identify the particular patients who manifest the effects of such an interaction and hence bring out the individual diagnostics which are most helpful to these particular patients.

The analysis of the extent to which any particular set of data cooperates with a measurement model to define a desired variable and estimate useful measures along that variable is decisive in the construction of good measures. We cannot do justice to this important topic here. But an excellent place to look for extensive and articulate explanations and applications is in the published work of Richard Smith (Smith 1985, 1986, 1988, 1991, 1994, 1996).

Figure 5. Rating Category Step Structure: BOWEL. PECS LifeScales: Applied Self-Care Construct

[Plot not reproduced. BOWEL PROGRAM RATING CATEGORY PROBABILITIES: probability of response (0 to 1.0) for categories 1 to 7, labeled Ineffective, Dependent, Independent and Normal, against MEASURE from -25 to 115. Below the plot: the frequency distribution of PATIENT measures with Q, S, M, S, Q markers.]

BOWEL PROGRAM RATING STEP CALIBRATIONS

CATEGORY      STEP    OBSERVED   STEP      STEP    EXPECTED SCORE MEASURES
LABEL         VALUE   COUNT      MEASURE   ERROR   STEP-.5   AT STEP   STEP+.5

Ineffective   1        618       NONE                        ( -6 )    13
2             2       1049        7        .6      13        13        22
Dependent     3       1794       21        .4      22        30        40
4             4       2503       38        .3      40        50        58
Independent   5        860       66        .4      58        63        67
6             6        727       66        .4      67        72        78
Normal        7       1010       70        .5      78        ( 84 )

Column footers in the original output: mode, mean.


Figure 5 illustrates the way BIGSTEPS analyzes how this set of categories is used to obtain patient ratings on each item. The item chosen here is one of the easiest, the BOWEL program. The plot in the top half of Figure 5 shows how the probabilities of category use from 1 up to 7 move to the right across the variable from low measures at -25 to high measures at +115. We can see that each rating category in sequence has a modal region of greatest probability except for categories 5 and 6, which can be seen to be underused on this item.

This observation about the usage of categories 5 and 6 could lead us to question their distinction and, perhaps, to consider combining them into one category of "independence".

Below the plot is the frequency distribution of the 8561 patient ratings used for this analysis. At the bottom are observed counts, step difficulty calibrations and measures expected at each rating level from 1 to 7. In the column labeled "OBSERVED COUNT" we see that the uses of categories 5 and 6 at counts of 860 and 727 fall far below the uses of category 4 at a count of 2503. That is why the curves for those two categories do not surface at the top of Figure 5.

Figure 6. Rating Category Step Structure: PATIENT EDUCATION. PECS LifeScales: Applied Self-Care Construct

[Plot not reproduced. PATIENT EDUCATION RATING CATEGORY PROBABILITIES: probability of response (0 to 1.0) for categories 1 to 7, labeled Ineffective, Dependent, Independent and Normal, against MEASURE from -13 to 127. Below the plot: the frequency distribution of PATIENT measures with Q, S, M, S, Q markers.]

PATIENT EDUCATION RATING STEP CALIBRATIONS

CATEGORY      STEP    OBSERVED   STEP      STEP    EXPECTED SCORE MEASURES
LABEL         VALUE   COUNT      MEASURE   ERROR   STEP-.5   AT STEP   STEP+.5

Ineffective   1        849       NONE                        ( -1 )    18
2             2       1727        11       .5      18        22        33
Dependent     3       2018        33       .4      33        42        50
4             4       1403        50       .4      50        57        64
Independent   5       1233        63       .4      64        70        79
6             6        799        86       .5      79        92        108
Normal        7         92       107      1.2      108       ( 118 )

Column footers in the original output: mode, mean.


Figure 6 gives the same information for one of the hardest items, EDUCATION. We can see that the measure scale under the plot has moved up 12 units and that the step difficulties and expected measures have also increased. In the category curves for this item every category is seen to have its day in the sun. We would not be tempted to combine any of these categories.

Figure 7. KEY Forms for A Typical and An Atypical Patient

[KEY forms not reproduced. On each form the eight items are listed from hardest to easiest, each with its rating categories 1 to 7 laid out along the measure scale from -20 to 120. Expected ratings are circled, shown here in parentheses; deficient ratings are boxed, shown here in square brackets.]

TYPICAL PATIENT - Raw Score 36, Estimated Measure 60
NO DIAGNOSTIC DEFICIENCIES
PECS LifeScales: Applied Self-Care

KNOWS MEDICATION     (4)
PATIENT EDUCATION    (4)
SOCIAL ACTIVITY      (4)
SAFETY AWARENESS     (4)
SELF-CARE PROGRAM    (5)
SKIN CARE PROGRAM    (5)
BOWEL PROGRAM        (5)
URINARY PROGRAM      (5)

PERSON MEASURE (60), PERCENTILE (70), BASED ON 8561 PATIENTS

ATYPICAL PATIENT - Raw Score 36, Implied Measure 70
TWO DIAGNOSTIC DEFICIENCIES
PECS LifeScales: Applied Self-Care

KNOWS MEDICATION     [1]
PATIENT EDUCATION    [1]
SOCIAL ACTIVITY      (5)
SAFETY AWARENESS     (5)
SELF-CARE PROGRAM    (6)
SKIN CARE PROGRAM    (6)
BOWEL PROGRAM        (6)
URINARY PROGRAM      (6)

DEFICIENCY MEASURE (0), PERCENTILE (8)
PERSON MEASURE (73), PERCENTILE (85), BASED ON 8561 PATIENTS
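The point of Figure 7, that one raw score can hide very different rating patterns, can be illustrated with the ratings shown on the two KEY forms. The flagging rule below (a rating three or more categories under the patient's own median rating) is a crude stand-in assumed for this sketch. It is not the Rasch fit analysis, which compares each rating with the value the model expects given the item's difficulty.

```python
from statistics import median

ITEMS = [
    "KNOWS MEDICATION", "PATIENT EDUCATION", "SOCIAL ACTIVITY",
    "SAFETY AWARENESS", "SELF-CARE PROGRAM", "SKIN CARE PROGRAM",
    "BOWEL PROGRAM", "URINARY PROGRAM",
]
typical = [4, 4, 4, 4, 5, 5, 5, 5]    # ratings on the typical KEY form
atypical = [1, 1, 5, 5, 6, 6, 6, 6]   # ratings on the atypical KEY form

def flag_low_ratings(ratings, gap=3):
    """List items rated at least `gap` categories below the patient's
    median rating (an assumed, illustrative rule)."""
    center = median(ratings)
    return [item for item, r in zip(ITEMS, ratings) if center - r >= gap]

raw_scores = (sum(typical), sum(atypical))
typical_flags = flag_low_ratings(typical)
atypical_flags = flag_low_ratings(atypical)
```

Both patients have a raw score of 36, yet only the atypical pattern produces flags, and they fall on the medication and education items, the two deficiencies marked on the form.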


Maps and summaries of item and category performance are essential to our analysis of the measurement quality of our test. But our ultimate aim, of course, is the measurement and diagnosis of each of our patients. This concern brings us to the diagnostic KEY forms illustrated in Figures 7 and 8.

After the construct MAP of Figure 4, KEY forms are the second most useful outcome of a BIGSTEPS analysis. Figure 7 shows two patients who received the same rating scores of 36 raw score points, but who differ decisively in the best estimates of their self-care measures and who also differ in the diagnostic implications of their particular self-care ratings.

The typical patient at the top of Figure 7 measures at 60 self-care units, which puts them at the 70th percentile among a normative group of 8561 patients. This patient is verging on self-care independence except for some additional help with the hardest programs, medication and education.

The atypical patient at the bottom of Figure 7, with the same raw score, however, appears differently when recorded on their own KEY form. When their abysmal ineffectiveness in the medication and education programs is set aside, they measure up at 73 units and the 85th percentile. They are well into independence on everything except education and medication. But, in those two aspects of self-care, they are dramatically deficient. Obviously their self-care education must concentrate on the earliest levels of these two areas.

Figure 8. Diagnostic KEY Form for An Alzheimer Patient

[KEY form not reproduced. The fourteen items are listed with their rating categories 1 to 7 laid out along the measure scale from -20 to 160. Expected ratings are circled, shown here in parentheses; deficient ratings are boxed, shown here in square brackets.]

PATIENT - Raw Score 66, Implied Measure 67
THREE DIAGNOSTIC DEFICIENCIES
PECS LifeScales: Cognition and Communication

SOLVE PROBLEMS             (4)
PRODUCE WRITTEN LANG.      (5)
PERCEPT. & COG. DEFICIT    (5)
READ WRITTEN LANG.         (5)
SHORT TERM MEMORY          [2]
PRODUCE SPOKEN LANG.       (6)
VISUAL SPATIAL PROCESS     (5)
ATTENTION/CONCENTRATION    [2]
VERBAL LING. PROCESSING    (6)
BASIC INTELL. SKILLS       (5)
COMPREHEND SPOKEN LANG.    (6)
LONG-TERM MEMORY           (6)
ORIENTATION                [2]
ALERTNESS COMA STATE       (7)

DEFICIENCY MEASURE (20), PERCENTILE (0)
PERSON MEASURE (67), PERCENTILE (55), BASED ON 9600 PATIENTS


Figure 8 shows a KEY form for another construct, Cognition and Communication. This patient is also atypical, this time in a particular way which we may have learned implies a particular diagnosis. The deficiencies for this patient, short term memory, attention/concentration and orientation, suggest the possibility of Alzheimer's.

Notice how well, when a firm well-labeled frame of reference is constructed, we can then become aware of and attend to idiosyncratic aspects of patient status. The two patients in Figure 7 are not treated the same because they happen to have the same raw scores of 36. On the contrary, when we have a frame of reference we can ask about more than their score. We can ask how they got the score. And, when their pattern of ratings contains values which, because of the structure of our measurement frame of reference, are unexpected, we can identify and respond to the details of their particular personal needs.

Methodological Summary

1. Ratings are chancy. But they are all we can observe. They may contain good information. But it is always in a fuzzy state.
2. Raw scores are not measures. Raw scores are biased against off target performance, are sample dependent, inferentially unstable and non-linear.
3. Inverse probability explained by conjointly additive parameters enables the construction of clear measures from fuzzy raw scores.
4. These measures, in turn, enable MAPs which define the variable completely in a one page, easy to grasp and remember, graphical report on the construct which our analysis has realized as a measurable variable.
5. The measures also enable individual KEYs which apply the variable sensitively and individually to each and every person to bring out their personal particulars in an easy to review, understand and work from one page graphical person report.

CONCLUSION

A weight of seven was a tenet of faith among seventh century Muslims. Muslim leaders were censured for using less "righteous" standards (Sears, 1997). 
The Caliph of the Muslim world, 'Umar b. 'Abd al-'Aziz, instructs his governor in al-Kufa that:

The people of al-Kufa have been struck with trial, hardship, oppressive governments and wicked practices set upon them by evil tax collectors. The more righteous law is justice and good conduct... I order you to take in taxes only the weight of seven. (Damascus, 723)

The Magna Carta of John, King of England, requires that:

There shall be one measure of wine throughout Our kingdom, and one of ale, and one measure of corn, to wit, the London quarter, and one breadth of cloth,..., to wit, two ells within the selvages. As with measures so shall it be with weights. (Runnymede, 1215)

Thus we see that commerce and politics were the first source of stable units for length, area, volume and weight. The steam engine added temperature and pressure. The subsequent successes of science stand on these commercial and engineering achievements. When we recall the long standing political and moral history of units of taxation and trade we realize that when units are unequal, when they vary from time to time and place to place, it is not only unfair. It is also immoral. So too with the misuse of necessarily unequal and so unfair raw score units, when they are analyzed as though they were fair measures.

The main purpose of measurement is inference. We measure to inform and specify our plans for what to do next. If our measures are unreliable, if our units vary in unknown ways, our plans must go astray. This point might seem small. Indeed, it has been belittled by many, presumably knowledgeable, social scientists as not worth worrying about. But, far from being a small point, it is a decisive one! We will not build a useful, let alone moral, social science until we stop deluding ourselves by analyzing raw scores as though they were measures.

Conclusion

The concrete measures which help us make life better are so familiar that we seldom think about "how" or "why" they work. 
Although the mathematics of measurement did not initiate its practice, it is the mathematics of measurement which provides the ultimate foundation for practice and the final logic by which useful measurement evolves and thrives. A mathematical history of measurement, however, takes us behind concrete practice to the theoretical requirements which make the practical success of measurement possible. There we discover that:

1. Measures are inferences,
2. Obtained by stochastic approximations,
3. Of one dimensional quantities,
4. Counted in abstract units, of sizes which are
5. Intended to be undisturbed by extraneous factors.

To meet these requirements mathematically, measurement must be an inference of values for infinitely divisible parameters which define the transition odds between observable increments of a theoretical variable.

Table 5 summarizes an anatomy of inference according to four obstacles which stand between raw data and the stable inference of measures they might imply.

Table 5. An Anatomy of Inference

OBSTACLES                    SOLUTIONS                  INVENTORS

UNCERTAINTY                  PROBABILITY                Bernoulli 1713
  have -> want                 binomial odds            Bayes 1764
  now -> later                 regular irregularity     Laplace 1774
  statistic -> parameter       misfit detection         Poisson 1837

DISTORTION                   ADDITIVITY                 Fechner 1860
  non-linearity                linearity                Helmholtz 1887
  unequal intervals            concatenation            N. Campbell 1920
  incommensurability           conjoint additivity      Luce / Tukey 1964

CONFUSION                    SEPARABILITY               Rasch 1958
  interdependence              sufficiency              R.A. Fisher 1920
  interaction                  invariance               Thurstone 1925
  confounding                  conjoint order           Guttman 1944

AMBIGUITY                    DIVISIBILITY               Levy 1924
  of entity, interval          independence             Kolmogorov 1932
  and aggregation              stability                Bookstein 1992
                               reproducibility          de Finetti 1931
                               exchangeability

For Bernoulli, Bayes, Laplace, Poisson and Helmholtz see Stigler (1986).

Uncertainty is the motivation for inference. The future is uncertain by definition. We have only the past by which to foresee it. Our solution is to capture uncertainty in a skein of imaginary probability distributions which regularize the irregularities that disrupt connections between what seems certain now but is certainly uncertain later. We call this step "inverse probability".

Distortion interferes with the transition from observation to conceptualization. Our ability to figure things out comes from our faculty for visualization. Our power of visualization evolved from the survival value of body navigation through the two dimensional space in which we live. Our antidote to distortion is to represent our observations of experience in the linear form that makes them look like the space in front of us. To "see" what experience "means", we "map" it.

Confusion is caused by interdependencies. As we look for tomorrow's probabilities in yesterday's lessons, confusing interactions intrude. Our resolution of confusion is to simplify the complexity we experience into a few shrewdly crafted "dimensions". The authority of these dimensions is their utility. Final "Truths" are unknowable. But, when our inventions work, we find them "useful". And when they continue to work, we come to believe in them and to call them "real" and "true".

The method we use to control confusion is to enforce unidimensionality. We define and measure one invented dimension at a time. The necessary mathematics is parameter separability. Models which introduce putative "causes" as separately estimable parameters are our laws of quantification. These models define measurement, determine what is measurable, decide which data are useful and expose data which are not.

Ambiguity, a fourth obstacle to inference, occurs because we can never determine exactly which particular definitions of existential entities are the "right" ones. As a result the only measurement models that can work are models that are indifferent to level of composition. 
Bookstein (1992, 1996, 1997) shows that to accomplish this the models must embody parameter divisibility or additivity as in:

H(xy) = H(x)H(y) and G(x+y) = G(x)+G(y)

Fortunately the mathematical solutions to Ambiguity, Confusion and Distortion are identical. The parameters that govern the probabilities of the data must appear in either a divisible or additive form.

Inverse Probability

A critical turning point in the mathematical history of measurement is the application of Jacob Bernoulli's 1713 binomial distribution as an inverse probability for interpreting the implications of observed events (Thomas Bayes, 1764; Pierre Laplace, 1774; in Stigler 1986, pp. 63-67, 99-105). The data in hand are the least of what we seek. Our interests go beyond to what these data imply about other data still unmet, but important to foresee. When we read our weight as 180 pounds, we take that number, not as a one-time, local description of a particular stepping on the scale, but as our "weight" for now, just before now, and, inferentially, for a while to come.

The first problem of inference is how to infer values for these other data, which, by the meaning of "inference", are currently "missing". Since the purpose of inference is to estimate what future data might be like before they occur, methods which require complete data cannot be methods of inference. This realization engenders a third law of measurement:

Any statistical method nominated to serve inference which requires complete data, by this very requirement, disqualifies itself as an inferential method.

But, if what we want to know is "missing", how can we use the data in hand to make useful inferences about the "missing" data they might imply? Inverse probability reconceives our raw observations as a probable consequence of a relevant stochastic process with a useful formulation. The apparent determinism of formulae like F = MA depends on the prior construction of relatively precise measures of F and A.
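The inverse use of Bernoulli's binomial distribution described above can be illustrated with a minimal Python sketch. It is not taken from the memorandum: the flat prior over the unknown success probability, the grid approximation and the function names are all assumptions made for illustration. Given a few observed successes and failures, the sketch asks which underlying probabilities could have produced them and then predicts a not yet observed event.

```python
from math import comb


def binomial_likelihood(p, successes, trials):
    """Probability of the observed count if the success probability were p."""
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


def inverse_probability(successes, trials, grid_size=1001):
    """Weights over candidate values of p, given the data in hand.

    Assumption: every candidate p on the grid is equally plausible before
    the data are seen (a flat prior, in the spirit of Bayes and Laplace).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [binomial_likelihood(p, successes, trials) for p in grid]
    total = sum(weights)
    posterior = [w / total for w in weights]
    return grid, posterior


def predict_next_success(successes, trials):
    """Expected probability that the next, still missing, observation succeeds."""
    grid, posterior = inverse_probability(successes, trials)
    return sum(p * w for p, w in zip(grid, posterior))


if __name__ == "__main__":
    # 7 successes in 10 trials: an inference about an observation not yet made.
    print(predict_next_success(7, 10))
```

Under the flat-prior assumption the prediction agrees closely with Laplace's rule of succession, (successes + 1) / (trials + 2), rather than with the raw proportion. The point of the sketch is that the estimate concerns data not yet observed and never requires a complete data set.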
The first step from raw observation to inference is to identify the stochastic process by which an inverse probability can be defined. Bernoulli's binomial distribution is the simplest and most widely used process. Mathematical analysis proves that the compound Poisson is the parent of all such measuring distributions.

Conjoint Additivity

The second step to inference is to discover what mathematical models can determine the stochastic process in a way that enables a stable, ambiguity resilient estimation of the model's parameters from the data in hand. At first glance, this step looks obscure. Its twentieth century history has followed so many paths, traveled by so many mathematicians and physicists, that one might suppose there were no clear second step but only a jumble of unconnected possibilities along with their seemingly separate mathematical resolutions.

Fortunately, reflection on the motivations for these paths and examination of their mathematics leads to a reassuring simplification. Although each path was motivated by a particular concern as to what inference must overcome to succeed, all solutions end up with the same simple, easy to understand, easy to use formulation. The mathematical function which governs the inferential stochastic process must specify parameters which are either infinitely divisible or conjointly additive, i.e. separable. That's all there is to it!

Some fundamental laws of measurement emerge as we explore the definition and necessities of inference:

Any statistical method nominated to serve inference which turns out to require complete data, by this very requirement, disqualifies itself as an inferential method.

When a model employs parameters for which there are no sufficient statistics, that model cannot construct useful measurement because it cannot estimate its parameters independently.

Before applying linear statistical methods to raw data, one must first use a measurement model to construct, from the observed raw data, coherent sample and test free linear measures.

Practical solutions to Thurstone's five requirements:

1. Measures must be linear, so that arithmetic can be done with them.
2. Item calibrations must not depend on whose responses are used to estimate them - must be sample free.
3. Person measures must not depend on which items they happened to take - must be test free.
4. Missing data must not matter.
5. The method must be easy to apply.

were latent in Campbell's 1920 analysis of concatenation, Fisher's 1920 invention of sufficiency and the functional divisibility of Levy and Kolmogorov. Stable inference theory was realized practically by Rasch's 1953 application of the additive Poisson model to the equation of alternative tests of oral reading.

Rasch's original model has since been extended to address every imaginable kind of raw observation: dichotomies, rating scales, partial credits, binomial and Poisson counts (Masters & Wright 1984) in every reasonable observational situation, i.e. ratings faceted to: persons, items, judges and tasks. Today versatile computer programs are available which make thorough applications of Rasch's "measuring functions" so easy, immediate and accessible to every student of outcome measurement that there is no excuse for stopping analysis at a misconstruction of raw scores.

Despite hesitation by some to use a fundamental measurement model to transform raw scores into measures so that subsequent statistical analysis can become fruitful, there have been many successful applications (Fisher & Wright 1994) and convenient software to accomplish these applications is readily available (Wright & Linacre 1997, Linacre & Wright 1997).

Today, it is easy for any reasonably knowledgeable scientist to use these programs to traverse the decisive step from their unavoidably ambiguous concrete raw observations to well-defined abstract linear measures with realistic precision and validity estimates.
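The requirement that item calibrations be sample free can be made concrete with a small Python sketch. It assumes the standard dichotomous Rasch model, in which the log-odds of success is the person measure minus the item calibration. The function names and the variable names `ability` and `difficulty` are illustrative choices, not notation from this section. The sketch shows parameter separability at work: when a person succeeds on exactly one of two items, the probability that it was the first item depends only on the two item calibrations, whatever the person's measure.

```python
from math import exp, log


def rasch_probability(ability, difficulty):
    """P(success) under the dichotomous Rasch model, both values in logits."""
    return 1.0 / (1.0 + exp(-(ability - difficulty)))


def log_odds(p):
    """Convert a probability into logits."""
    return log(p / (1.0 - p))


def conditional_item_comparison(ability, difficulty_i, difficulty_j):
    """P(item i right and item j wrong | exactly one of the two is right)."""
    p_i = rasch_probability(ability, difficulty_i)
    p_j = rasch_probability(ability, difficulty_j)
    right_wrong = p_i * (1.0 - p_j)
    wrong_right = (1.0 - p_i) * p_j
    return right_wrong / (right_wrong + wrong_right)


if __name__ == "__main__":
    # Additivity: the log-odds of success equals ability minus difficulty.
    print(log_odds(rasch_probability(1.2, 0.4)))

    # Sample freedom: the comparison of two items does not change with
    # the ability of the person who happened to respond.
    for ability in (-2.0, 0.0, 3.0):
        print(ability, conditional_item_comparison(ability, -0.5, 1.0))
```

The person measure cancels out of the conditional comparison, which is why item calibrations estimated this way need not depend on whose responses were used. The symmetric argument, conditioning on items instead of persons, gives test free person measures.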
Today, there is no methodological reason why outcome measurement cannot become as stable, as reproducible and hence as useful as physics.

The mathematical knowledge needed to construct objective, fundamental measures from raw scores has been with us for more than 40 years. Easy to use computer programs which do the number work have been available for 30 years. What could possibly justify continuing to misuse raw scores as though they were measures when we know that they are not?

MESA Memorandum 66, 1997
Benjamin D. Wright
MESA Psychometric Laboratory

Published as Wright BD (1997) Fundamental measurement for outcome evaluation. Physical Medicine and Rehabilitation: State of the Art Reviews. 11(2): 261-288.

REFERENCES

Andersen, EB (1977). Sufficient statistics and latent trait models. Psychometrika, (42), 69-81.

Bookstein, A. (1992). Informetric Distributions, Parts I and II. Journal of the American Society for Information Science, 41(5): 368-88.

Bookstein, A. (1996). Informetric Distributions. III. Ambiguity and Randomness. Journal of the American Society for Information Science, 48(1): 2-10.

Brogden, HE (1977). The Rasch model, the law of comparative judgement and additive conjoint measurement. Psychometrika, (42), 631-634.

Campbell, NR (1920). Physics: The elements. London: Cambridge University Press.

de Finetti, B. (1931). Funzione caratteristica di un fenomeno aleatorio. Atti dell R. Academia Nazionale dei Lincei, Serie 6. Memorie, Classe di Scienze Fisiche, Mathematice e Naturale, 4, 251-99. [added 2005, courtesy of George Karabatsos]

Engelhard, G. (1984). Thorndike, Thurstone and Rasch: A comparison of their methods of scaling psychological tests. Applied Psychological Measurement, (8), 21-38.

Engelhard, G. (1991). Thorndike, Thurstone and Rasch: A comparison of their approaches to item-invariant measurement. Journal of Research and Development in Education, (24-2), 45-60.

Engelhard, G. (1994). Historical views of the concept of invariance in measurement theory. In Wilson, M. (Ed), Objective Measurement: Theory into Practice. Norwood, NJ: Ablex, 73-99.

Fechner, GT (1860). Elemente der Psychophysik. Leipzig: Breitkopf & Hartel. [Translation: Adler, HE (1966). Elements of Psychophysics. New York: Holt, Rinehart & Winston.]

Fisher, RA (1920). A mathematical examination of the methods of determining the accuracy of an observation by the mean error and by the mean square error. Monthly Notices of the Royal Astronomical Society, (53), 758-770.

Fisher, WP & Wright, BD (1994). Applications of Probabilistic Conjoint Measurement. Special Issue. International Journal Educational Research, (21), 557-664.

Guttman, L. (1944). A basis for scaling quantitative data. American Sociological Review, (9), 139-150.

Guttman, L. (1950). The basis for scalogram analysis. In Stouffer et al. Measurement and Prediction, Volume 4. Princeton NJ: Princeton University Press, 60-90.

Kolmogorov, AN (1950). Foundations of the Theory of Probability. New York: Chelsea Publishing.

Levy, P. (1937). Theorie de l'addition des variables aleatoires. Paris.

Linacre, JM & Wright, BD (1997). FACETS: Many-Faceted Rasch Analysis. Chicago: MESA Press.

Luce, RD & Tukey, JW (1964). Simultaneous conjoint measurement. Journal of Mathematical Psychology, (1), 1-27.

Masters, GN & Wright, BD (1984). The essential process in a family of measurement models. Psychometrika, (49), 529-544.

Perline, R., Wright, BD & Wainer, H. (1979). The Rasch model as additive conjoint measurement. Applied Psychological Measurement, (3), 237-255.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. [Danish Institute of Educational Research 1960, University of Chicago Press 1980, MESA Press 1993] Chicago: MESA Press.

Sears, SD (1997). A Monetary History of Iraq and Iran. Ph.D. Dissertation. Chicago: University of Chicago.

Smith, RM (1985). Validation of individual test response patterns. International Encyclopedia of Education, Oxford: Pergamon Press, 5410-5413.

Smith, RM (1986). Person fit in the Rasch Model. Educational and Psychological Measurement, (46), 359-372.

Smith, RM (1988). The distributional properties of Rasch standardized residuals. Educational and Psychological Measurement, (48), 657-667.

Smith, RM (1991). The distributional properties of Rasch item fit statistics. Educational and Psychological Measurement, (51), 541-565.

Smith, RM (1994). A comparison of the power of Rasch total and between item fit statistics to detect measurement disturbances. Educational and Psychological Measurement, (54), 42-55.

Stigler, SM (1986). The History of Statistics. Cambridge: Harvard University Press.

Thorndike, EL (1904). An introduction to the theory of mental and social measurements. New York: Teacher's College.

Thurstone, LL (1926). The scoring of individual performance. Journal of Educational Psychology, (17), 446-457.

Thurstone, LL (1927). A law of comparative judgement. Psychological Review, (34), 273-286.

Thurstone, LL (1928). Attitudes can be measured. American Journal of Sociology, (23), 529-554.

Thurstone, LL & Chave, EJ (1929). The measurement of attitude. Chicago: University of Chicago Press.

Thurstone, LL (1931). Measurement of social attitudes. Journal of Abnormal and Social Psychology, (26), 249-269.

Wright, BD (1968). Sample-free test calibration and person measurement. Proceedings 1967 Invitational Conference on Testing. Princeton: Educational Testing Service, 85-101.

Wright, BD (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, (14), 97-116.

Wright, BD (1984). Despair and hope for educational measurement. Contemporary Education Review, (1), 281-288.

Wright, BD & Linacre, JM (1989). Observations are always ordinal: measures, however, must be interval. Archives of Physical Medicine and Rehabilitation, (70), 857-860.

Wright, BD & Linacre, JM (1997). BIGSTEPS: Rasch Computer Program for All Two Facet Problems. Chicago: MESA Press.

Wright, BD & Masters, GN (1982). Rating Scale Analysis: Rasch Measurement. Chicago: MESA Press.

Wright, BD & Stone, MH (1979). Best Test Design: Rasch Measurement. Chicago: MESA Press.
