Symposium i Anvendt Statistik 2018. Institut for Fødevare- og Ressourceøkonomi, Københavns Universitet; Det Nationale Forskningscenter for Arbejdsmiljø

SYMPOSIUM IN APPLIED STATISTICS

22-24 January 2018

Edited by Peter Linde on behalf of the organising committee

Supported by SAS Institute Inc.

Institut for Fødevare- og Ressourceøkonomi, Københavns Universitet, and Det Nationale Forskningscenter for Arbejdsmiljø

Foreword

The purpose of the symposium is to promote information about applied statistics as well as statistical data processing. The symposium is interdisciplinary, with particular emphasis on the methodology, communication and interpretation of statistical analyses. This year Institut for Fødevare- og Ressourceøkonomi (IFRO), Københavns Universitet, hosts the symposium, for which we are grateful. The symposium is organised by Det Nationale Forskningscenter for Arbejdsmiljø and Institut for Fødevare- og Ressourceøkonomi (IFRO), Københavns Universitet. The professional association Symposium i Anvendt Statistik is responsible for the scientific programme and the finances.

This publication contains the talks from the 40th Symposium i Anvendt Statistik. This year's contributions come from many different fields and emphasise different methods and problems. As is usual for scientific contributions, the contributors are responsible for the content of their papers, and questions about them can be addressed directly to the authors.

The symposium aims to create a forum for interdisciplinary inspiration and criticism, among other things in order to strengthen communication between people who work with related methods in different fields.

Peter Linde, the organising committee

ISBN 978-87-7904-333-6

Printed by PRinfoTrekroner in 175 copies

Organising committee for Symposium i Anvendt Statistik 2018

Lisbeth la Cour, Økonomisk Institut, Copenhagen Business School, Porcelænshaven 16A, 2000 Frederiksberg, [email protected]

Anders Milhøj, Økonomisk Institut, Københavns Universitet, Studiestræde 6, 1455 København K, [email protected]

Gorm Gabrielsen, Institut for Finansiering, Copenhagen Business School, Solbjerg Plads 3, 2000 Frederiksberg, [email protected]

Helle M. Sommer, SEGES, Landbrug & Fødevarer, Axeltorv, 1609 København V, [email protected]

Mogens Dilling-Hansen, Institut for Økonomi, Århus Universitet, 8000 Århus C, [email protected]

Jørgen Lauridsen, Økonomisk Institut, Syddansk Universitet, Campusvej 55, 5230 Odense M, [email protected]

Birthe Lykke Thomsen, S-cubed, Lille Strandstræde 20C 5., 1254 København K, [email protected]

Peter Linde, Det Nationale Forskningscenter for Arbejdsmiljø, Lersø Parkalle 105, 2100 København Ø, [email protected]

Esben Høg, Matematiske Fag, Aalborg Universitet, Fredrik Bajers Vej 7, 9220 Aalborg Ø, [email protected]

Anders Holm, Sociologisk Institut, Københavns Universitet, Øster Farimagsgade 5, 1014 København K, [email protected]

Niels Kærgaard, Fødevare- og Ressourceøkonomi, Københavns Universitet, Rolighedsvej 25, 1958 Frederiksberg, [email protected]

Klaus Rostgaard, Statens Serum Institut, Artillerivej 5, 2100 København Ø, [email protected]

Kristina Birch, SAS Institute, Købmagergade 7-9, 1050 København K, [email protected]

Contents

Education

The Impact of Teacher Effectiveness on Student Learning in Africa
Julie Buhl-Wiggers, KU, Jason T. Kerwin, University of Minnesota, Jeffrey A. Smith, University of Michigan, and Rebecca Thornton, University of Illinois

Dropout from the Economics programme at Københavns Universitet
Anders Milhøj, Københavns Universitet

Changes in intergenerational educational mobility - does relative or absolute parental education matter - a Danish register study
Martin D. Munk, Aalborg University, David J. Harding, University of California, Anders Holm, Western University, and Keunbok Lee, University of California

Bias when surveys are weighted up using self-reported education against registers: how badly can it go wrong?
Peter Linde, Det Nationale Forskningscenter for Arbejdsmiljø

Analysis of international surveys and Big Data Regional Statistics

Clustering of district heating substations for Affald Varme Aarhus
Alexander Martin Tureczek, DTU Management Engineering

Intelligent, flexible production in an energy system dominated by renewables
Erik Lindström, Centre for Mathematical Sciences, Lund University

Predicting AIRBNB sales with Google searches in a customer journey context
Mads Zacho Krarup, Niels Buus Lassen and René Madsen, Dept. of Digitalization, CBS

The classmate effect in PISA
Hans Bay, Det Nationale Forskningscenter for Arbejdsmiljø

Data and analyses

The potential of the collection of register data
Lea Sztuk Haahr, Steen Andersen and Bodil Stenvig, Rigsarkivet

The Danish National Biobank and the Danish Biobank Register
Lasse Boding, Danish National Biobank, Statens Serum Institut

New IDA database based on the labour market accounts
Pernille Stender and Søren Leth-Sørensen, Danmarks Statistik

Time series

Intervention Models in Time Series
Anders Milhøj, University of Copenhagen

Population Alcohol Consumption as a Predictor of Alcohol-Specific Deaths in Finland
Timo Alanko, University of Helsinki

Predicting TV Viewing with weather data
Matilde Røndbjerg and Niels Buus Lassen, Dept. of Digitalization, CBS

Predicting the daily sales of Mikkeller bars using Facebook data
Lisbeth la Cour, CBS, Anders Milhøj, KU, Ravi Vatrap, CBS, and Niels Buus Lassen, CBS

Demography and society

Can We Learn to Live Longer? - A Spatial Learning Model of Life Expectancy
Axel Börsch-Supan, Munich Center for the Economics of Aging, and Jørgen T. Lauridsen, Centre of Health Economics Research, SDU

Women's shelters
Anne Vibeke Jacobsen, Danmarks Statistik

Empathy variation in general practice care: A survey among Danish General Practitioners
Justin A. Charles, Peder Ahnfeldt-Mollerup, Jens Søndergaard and Troels Kristensen, Center for Medical Humanities, Stony Brook University, and Dept. of Public Health, SDU

Statistical analyses and SAS

Did more data information make us any wiser? And if so, about what?
Helle M. Sommer, Julie Krogsdahl and Mai Britt F. Nielsen, SEGES

How to predict the outcome of a football match
Sara Armandi, SAS

In-vitro fertilization and in-vitro embryo culture in mouse as a reprotoxicity model for xenobiotics: some considerations on how to analyse data and present results
Leslie Foldager, Ying Liu, Hanne Skovsgaard Pedersen, Knud Larsen, Kaja Kjær Kristensen, Henrik Callesen and Martin Tang Sørensen, Department of Animal Science, Bioinformatics Research Centre and Department of Molecular Biology and Genetics, Aarhus University

44 years of follow-up from the Copenhagen Male Study (CMS): who survives to age 85?
Hans Bay, Det Nationale Forskningscenter for Arbejdsmiljø

News in SAS Analytics 14.3
Anders Milhøj, Department of Economics, University of Copenhagen

Economics

The Ministry of Finance's fiscal space: castle in the air or reality?
Jesper Jespersen, Roskilde Universitet

Tesla, taxes, and government takings
Marcus Asplund, David Jinkins, Chandler Lutz and Gyorgy Paizs, CBS

Contracting Out Welfare-to-Work Services
Lars Skipper and Kenneth Lykke Sørensen, Aarhus University, CAFÉ

Regional and spatial statistics

The Geography of Foreign Direct Investment's spillover effects - Evidence from Denmark
Ditte Håkonsson Lyngemark, Kraks Fond, Ismir Mulalic, Kraks Fond and DTU, and Cecilie Dohlmann Weatherall, Kraks Fond

Using surveys for experimental designs
Mogens Dilling-Hansen, Department of Economics and Business Economics, Aarhus University

Educational choice and inter-regional migration: The causal effect of secondary education on migration out of less-urban areas
Elise Stenholt Sørensen, Kraks Fond, and Anders Holm, University of Western Ontario

Do your neighbours matter? Evidence on peer effects from quasi-experimental data
Georges Poquillon, University of Essex, and Bence Boje-Kovacs, Kraks Fond

Statistical methods

The end of the Rasch model?
Karl Bang Christensen, Department of Biostatistics, University of Copenhagen

Assessing observable donor and product characteristics as risk factors for adverse health outcomes in blood transfusion recipients
Klaus Rostgaard, Department of Epidemiology Research, Statens Serum Institut

Methodological considerations regarding the Charlson comorbidity index
Sören Möller, OPEN - Odense Patient data Explorative Network, Odense

Computational Parametric Mapping
Line Rosendahl Meldgaard Pedersen, Det Nationale Forskningscenter for Arbejdsmiljø

The Impact of Teacher Effectiveness on Student Learning in Africa

Julie Buhl-Wiggers, Jason T. Kerwin, Jeffrey A. Smith and Rebecca Thornton 1

Abstract

Teacher effectiveness is known to be critical for students' education and life prospects in several developed countries. However, little is known about how teacher effectiveness affects student learning in Africa. This paper presents the first estimates of teacher effectiveness from an African country, using data from a school-based RCT in northern Uganda. Exploiting the random assignment of students to classrooms within schools, we estimate a lower bound on the variation in teacher effectiveness. A 1-SD increase in teacher effectiveness leads to at least a 0.14 SD improvement in student performance on a reading test at the end of the year. We find no detectable correlation between teacher effectiveness and teacher characteristics, but we do find that more effective teachers have more structured lessons and more active students. In addition, we find that providing teacher training and support increases the variation in teacher effectiveness, by making the most-effective teachers relatively better than the least-effective teachers.

1 Buhl-Wiggers: Department of Food and Resource Economics, University of Copenhagen ([email protected]); Kerwin: Department of Applied Economics, University of Minnesota ([email protected]); Thornton: Department of Economics, University of Illinois ([email protected]); Smith: Department of Economics, University of Michigan ([email protected]).

Dropout from the Economics programme at Københavns Universitet

Anders Milhøj 1

Københavns Universitet

Summary

This note presents a range of figures on study activity for the 273 students who began the bachelor's part of Economics in the summer of 2010 and were still enrolled on 1 October 2010. This date is essential: students who leave the programme before 1 October are regarded as never having started, and they are therefore not counted in the dropout rate. All figures in this note are slightly imprecise, because there are registration errors, especially for students who do not complete their studies, and because information on the qualifying exam is missing for some students who were probably admitted through Kvote II (quota 2), and for a few foreign students.

Of these 273 students, 188 had completed the bachelor's degree by mid-June 2016. The vast majority of these 188 students with a bachelor's degree in Economics have started the master's programme, and 46 had also completed it by June. A few of the 188 graduates have started other master's programmes, and still others have started, and three have even completed, a further bachelor's programme at KU.

Of the 273 students, 10 are still enrolled as bachelor's students. Some of them lack only a single course, while the rest may reflect various individual circumstances, or a de facto withdrawal that has not yet been registered.

Of the 273 students, 75 have dropped out of the bachelor's programme in Economics. This dropout rate of 27.5% (75 out of 273) is higher than the university sector's target, so it is a problem for the programme.

Of these 75, 28 did not pass a single course in Economics, and 22 never even turned up for a single exam. It must therefore be concluded that these 22 never really gave the programme a chance. If these 22 students are left out of the dropout calculation, the rate falls to 21%, which is still high. The remaining 47 dropouts together passed 2255 ECTS (equivalent to 37.6 STÅ, student full-time equivalents) in Economics, of which only 152.5 have been transferred as credit to other programmes at KU. These students' study activities during their time enrolled in Economics have thus yielded a considerable financial gain, both for the Economics programme and for KU as a whole.
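
The headline figures in this summary can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: it assumes that one STÅ corresponds to 60 ECTS and that the adjusted rate removes the 22 students from both the numerator and the denominator, which is the reading that matches the quoted 21%. All variable names are made up for the example.

```python
# Figures quoted in the summary (2010 intake, status mid-June 2016).
ENROLLED_1_OCT = 273      # still enrolled on 1 October 2010
DROPOUTS = 75             # left without a bachelor's degree
NEVER_SAT_EXAM = 22       # dropouts who never turned up for an exam
ECTS_PASSED = 2255        # ECTS passed by the dropouts before leaving
ECTS_PER_STAA = 60        # assumption: 1 STÅ = 60 ECTS (one full-time year)


def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


official_rate = rate(DROPOUTS, ENROLLED_1_OCT)
# Assumption: the 22 students are removed from both dropouts and intake.
adjusted_rate = rate(DROPOUTS - NEVER_SAT_EXAM, ENROLLED_1_OCT - NEVER_SAT_EXAM)
staa = ECTS_PASSED / ECTS_PER_STAA

print(f"Official dropout rate: {official_rate:.1f}%")
print(f"Excluding students who never sat an exam: {adjusted_rate:.1f}%")
print(f"ECTS passed by dropouts, in STÅ: {staa:.1f}")
```

Under these assumptions the three printed values round to the 27.5%, 21% and 37.6 STÅ given in the text.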

Of the 75 dropouts, 23 have started other programmes at KU, and 11 have completed a bachelor's degree. Together they have earned 2425 ECTS on other programmes, not counting credit transferred from Economics.

1 Thanks to Sara Armandi for help in obtaining data; in particular, it was only through her connections that the participants in the 2015 freshers' trip (rustur) could be identified. Thanks also to Bjørn Bjørnsson Meyer for letting me use his STADS data.

Among the 273 students enrolled on 1 October 2010, 18 had previously been admitted to other programmes at KU. Two had completed a bachelor's programme before the start of studies on 1 September 2010. Together they had passed 1157.5 ECTS on other programmes, of which only 190 ECTS have been transferred as credit to Economics. For this group of dropouts from programmes other than Economics, too, there is a gain for the earlier programmes and for KU as a whole.

The second part of the paper examines early dropout for the autumn 2015 intake in more detail.

Dropout for the autumn 2010 intake

Overview of the data

This note summarises a range of data on the students admitted to the bachelor's part of Economics at Københavns Universitet in the summer of 2010, that is, with a start date of 1 September 2010. All data come from STADS; they are a single extract covering all students, made in mid-June 2016.

The Economics programme requires Mathematics from the qualifying education to have been taken at A level (the highest). In 2010 there was also a specific admission requirement that the average of the mathematics grade and the overall grade average from the qualifying education be at least 6. Apart from these specific requirements, admission was open in 2010. In later years the admission threshold for Kvote I (quota 1) has risen, and it stood at 8.0 in the summer of 2017. There is also admission through Kvote II (quota 2), but unfortunately there is no information on whether students were admitted through Kvote I or Kvote II. The specific admission requirement has since been changed to a mathematics grade of at least 6 from upper secondary school.

In total, 273 students were still enrolled as active students on 1 October 2010.

By the summer of 2016, 188 of these had completed the programme with a bachelor's degree, while 75 had left before completing it. This corresponds to a dropout rate of 27%.

There are also 10 students who, after almost six years of study, are still enrolled on the three-year bachelor's programme in Economics. Two of them have had a leave of absence lasting half a year. These 10 students have passed a total of 1225 ECTS in Economics:

ECTS                  Frequency   Percent
Between 60 and 120            4     40.00
Over 120                      6     60.00

Four of these unfinished students had in fact passed the bachelor's project by mid-June 2016, so they are close to completing the bachelor's part. Two appear, judging from a transcript of their exam activity, to be close to forced deregistration. The remaining four have passed exams with certain gaps, probably due to various personal circumstances.

Time of dropout

The data contain an end date, which is the administrative date on which the withdrawal was registered. This date can often be much later than the actual withdrawal. That may be because the student does not give notice when the decision to leave is made, or because the administration does not register the dropout as soon as the student gives notice. A few have been forcibly deregistered for lack of study activity, a process that can take a very long time.

The first was deregistered on 11 November 2010, only a month and a half after the cut-off date of 1 October 2010. The last dropout was deregistered as late as 15 February 2016. The figure shows the cumulative dropout.

[Figure: cumulative number of dropouts (vertical axis, 0 to 80) by registered end date, SLUTDATO (horizontal axis, July 2010 to July 2016).]

Qualifying exam

For the vast majority there is information on the grade average at the qualifying exam and on the year in which the exam was taken. This section uses the admission quotient, which is increased by 3% for extra subjects at A level and by 8% for qualifying exams that are at most two years old. Of the 273 students, this information is available for 256.

The average for the qualifying exam among all 273 who began the bachelor's programme in Economics was 8.32; among those who had completed the bachelor's programme by June 2016 it was 8.62, and for the dropouts it was 7.71. These differences are significant.

A rough tabulation, see the table, also shows that the dropouts have a somewhat lower admission quotient than those who have completed the whole bachelor's programme. The differences are significant.

It is striking that half of those admitted with an average below 6 from the qualifying education have completed the bachelor's programme. This can be explained by the specific admission requirement, which allows a low overall average to be compensated for if the mathematics grades are high enough. For some it may also be explained by the specific qualifications that can lead to admission through Kvote II. It can also be seen that dropout occurs among students with high averages from the qualifying education too; but they of course find it easier to switch to other programmes with admission requirements. Nine out of 53, i.e. 17%, of those with an average over 10 have dropped out. All those still working on the bachelor's part in June 2016 have an average below 8 from the qualifying exam.

Admission quotient (ADGEX KVOTIENT) by status in June 2016

Quotient                   Dropped out   Graduated   Total
Under 6.0                           10          10      20
Over 6.0 and under 8.0              49         125     174
Over 10.0                            9          44      53
Total                               68         179     247
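
The text does not say which test lies behind the statement that the differences in the table are significant. As an illustration only, the sketch below assumes a Pearson chi-square test of independence on the table's counts and compares the statistic with the 5% critical value for 2 degrees of freedom (5.991). The band labels are copied from the table, and the function and variable names are made up for the example.

```python
# Counts copied from the table: (dropped out, graduated) per quotient band.
observed = {
    "Under 6.0": (10, 10),
    "Over 6.0 and under 8.0": (49, 125),
    "Over 10.0": (9, 44),
}


def chi_square(table):
    """Pearson chi-square statistic for a list of equal-length rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for count, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


dropout_share = {band: d / (d + g) for band, (d, g) in observed.items()}
chi2 = chi_square(list(observed.values()))

# 5% critical value of the chi-square distribution with
# (3 - 1) * (2 - 1) = 2 degrees of freedom.
CRITICAL_5PCT_DF2 = 5.991

for band, share in dropout_share.items():
    print(f"{band}: {100 * share:.1f}% dropped out")
print(f"chi-square = {chi2:.2f}, above the 5% critical value: {chi2 > CRITICAL_5PCT_DF2}")
```

Under this assumed test the statistic exceeds the critical value, which is consistent with the claim of significance, although the author may have used a different procedure.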

The subsequent master's programme in Economics

Of the 188 who have completed the bachelor's programme, 183 have started the master's programme in Economics; of these, 46 had also completed it by mid-June 2016, while 128 have started it and are still enrolled. A total of 9 started the master's programme in Economics but have since left it.

A few (at least two) have not yet started a new programme at KU at either bachelor's or master's level after completing the bachelor's part of Economics (but may have started studies at other universities).

A total of 6 graduates of the bachelor's part of Economics have started other bachelor's programmes at KU, in some cases after having been on, but dropped out of, the master's programme in Economics. Two are still studying for a bachelor's degree, in medicine and computer science respectively, while 3 have actually completed the bachelor's part of other programmes: two in Philosophy and one in mathematics. One started, and later left, the bachelor's programme in Geography and Geoinformatics after a bachelor's degree in Economics, but has since started the master's programme in Economics.

A total of 4 with a completed bachelor's degree in Economics have started other master's programmes at KU (a few of them after having started, but dropped out of, the master's programme in Economics): 2 on the master's programme in "Global Development", which is partly run by Økonomisk Institut, and 1 on each of the programmes "Sikkerhed og Risikoledelse" (security and risk management) and Political Science.

ECTS/STÅ earned in Economics before dropping out

The 75 students who dropped out of the bachelor's programme in Economics without a completed bachelor's degree passed a total of 2255 ECTS in Economics.

ECTS                  Frequency   Percent
None                         28     37.33
Less than 30                 13     17.33
Between 30 and 60            15     20.00
Between 60 and 120           19     25.33

These figures do not include credit brought into Economics from other programmes, e.g. from the Economics programmes in Århus and Odense, or by students who have, for example, passed the basic mathematics courses required in Economics on other programmes.

Only six of the 28 who left Economics without passing any exam at all turned up in person for an exam and thus received a grade of either 00 or -3. As an overriding rule, the grade -3 is given only for a blank paper with no real attempt. Only four received 00 in a course, two of them in as many as two courses, without passing anything at all before leaving Economics.

It can thus be concluded that the remaining 22 of the 28 students who passed no courses in Economics made no exam effort whatsoever. From the STADS data it cannot be determined whether these students themselves wished to maintain their enrolment in Economics, but they did not actively withdraw from the programme, e.g. in order to start a new one. Some years ago Københavns Universitet did not focus on forced deregistration as it does now, so students could be inactive for years. It is too uncertain to conclude anything about their study activity in the first months before the January exams, as information on, for example, individual submitted assignments is not recorded consistently in STADS.

ECTS/STÅ earned on other programmes at Københavns Universitet after dropping out

There are 23 students who, after leaving Economics without a bachelor's degree, have started another programme at Københavns Universitet. As many as 11 of them have completed a full bachelor's programme, in some cases with credit transferred from their activities in Economics. These 11 graduates are spread over 9 very different programmes: two in Law and two in Agricultural Economics, and one from each of Anthropology, Mathematics, Philosophy, Natural Resources, Philosophy, Public Health Science and Medicine.

The 23 students who started another programme at Københavns Universitet after leaving Economics without a bachelor's degree have earned a total of 2425 ECTS on their subsequent programmes, corresponding to about 40 STÅ. This figure does not include courses transferred as credit from Economics. They are distributed as follows:

ECTS                  Frequency   Percent
Less than 30                  1      5.88
Between 60 and 120            5     29.41
Between 120 and 180           4     23.53
180 or more                   7     41.18

Six people have had credit from Economics transferred into other programmes at KU. This amounts to 152.5 ECTS in total, i.e. less than 3 STÅ. This figure also includes credit that graduates with a completed bachelor's degree in Economics have obtained on other programmes at Københavns Universitet.

Study activities at KU before enrolment in Economics

Among the 273 admitted, 18 had been enrolled on other programmes at KU before starting the bachelor's part of Economics.

One had a bachelor's degree in Sport Science and one had a bachelor's degree in Political Science; both had completed the bachelor's part of Economics by June 2016, and one of them has even completed the master's part as well. As many as 8 of the 16 had previously studied Political Science, and 5 had studied mathematical or computer science subjects. The remaining 4 comprise 2 from the humanities, one from Sport Science and one from Law.

In their earlier studies at KU the 18 earned a total of 1157.5 ECTS. Three earned no ECTS before Economics, while the other 16 are distributed as follows:

ECTS                  Frequency   Percent
Less than 30                  2     12.50
Between 30 and 60             3     18.75
Between 60 and 120            5     31.25
Between 120 and 180           5     31.25

Only 6 of these students have received credit for courses previously passed at KU. A total of 190 ECTS has been brought into Economics: 4 students each brought 40 ECTS, while 2 each brought 15 ECTS.

Of the 18, 13 have completed the bachelor's part of Economics and 4 the master's part as well. Most of the dropouts in this group of former students of other KU programmes left without passing any exams in Economics. Only two passed courses in Economics, in fact many courses, with 97.5 and 135 ECTS respectively before they left Economics (too).

Dropout for the autumn 2015 intake

It is of course of interest to examine the dropout pattern for a more recent intake than 2010, since all the public debate and regulation on study progress, and the politicians' wish to minimise dropout, have undoubtedly changed students' behaviour. The drawback of examining more recent data is that it is then only possible to look at dropout at the beginning of the programme.

Here we look at dropout during the first year of study for the autumn 2015 intake, up to the summer of 2016. The review is organised chronologically, so that a decision to leave is related to what the student knows about the programme at that particular time.

Start of studies in September 2015

At the start of studies in September 2015, 330 were registered for the programme; 10 of them withdrew before 1 October. Dropout before 1 October does not count in the official dropout rate. This group is characterised by a somewhat lower admission quotient: the average is 8.29 against 9.44. The difference is significant, p = 3.0% in a test against a one-sided alternative. If the admission quotient is split at 8.156, there is 7.0% dropout in September among the lowest averages and 1.3% among the highest.

At the start of studies, older students organise freshers' trips (rusture) and similar events. Of the 10 who dropped out in September, 2 (i.e. 20%) went on a freshers' trip, whereas the corresponding share for all the others is 83%. This difference is highly significant, but probably just shows that the decision to drop out quickly had already been made before the trip at the start of studies. In total 62 did not take part in the freshers' trip. Of these, 8, i.e. 12.9%, dropped out as early as September.
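
The counts in this paragraph are enough to rebuild the underlying 2x2 table (trip participation by September dropout). The sketch below does so and, purely as an assumption about how such a difference could be tested, computes a one-sided Fisher exact p-value from the hypergeometric distribution; the text does not name the test actually used. All names in the code are illustrative.

```python
from math import comb

# Counts implied by the text for the 2015 intake.
REGISTERED = 330             # registered at the start of studies
SEPTEMBER_DROPOUTS = 10      # withdrew before 1 October
NON_PARTICIPANTS = 62        # did not go on the freshers' trip
DROPOUTS_ON_TRIP = 2         # September dropouts who did go on the trip

participants = REGISTERED - NON_PARTICIPANTS            # 268
stayers = REGISTERED - SEPTEMBER_DROPOUTS               # 320
stayers_on_trip = participants - DROPOUTS_ON_TRIP       # 266


def hypergeom_pmf(k: int, population: int, successes: int, draws: int) -> float:
    """P(exactly k successes) when drawing without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


# One-sided Fisher exact test (an assumed choice of test): probability of
# seeing 2 or fewer trip participants among the 10 September dropouts if
# trip participation were unrelated to dropping out.
p_value = sum(
    hypergeom_pmf(k, REGISTERED, participants, SEPTEMBER_DROPOUTS)
    for k in range(DROPOUTS_ON_TRIP + 1)
)

print(f"Trip participation among the others: {100 * stayers_on_trip / stayers:.0f}%")
print(f"One-sided Fisher exact p-value: {p_value:.2e}")
```

The reconstructed participation share among the remaining students matches the 83% quoted in the text, and the p-value is far below conventional thresholds, in line with the difference being described as highly significant.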

Combining the two pieces of information, machine learning shows that in the group of 20 who did not take part in the freshers' trip and had an average below 8.0, 5, i.e. 25%, dropped out as early as September.

Dropout between October and the January 2016 exams

On 1 October there were 320 enrolled. Of these, 7 dropped out before January 2016, the first as early as 2 October, but that person nevertheless counts in the official dropout statistics. Again the admission quotient for these 6 (it is missing for one of them) is lower than for the others, 8.2 against 9.5, but this difference is not significant. Of these 7, 5 had been on a freshers' trip, so dropout in this period is again somewhat higher among non-participants, 3.7% against 1.9%.
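
The two percentages at the end of this paragraph can be checked against the earlier counts. The sketch below assumes that the groups still enrolled on 1 October follow from the September figures (330 registered, 62 non-participants, and 10 September dropouts of whom 2 had been on the trip); the variable names are illustrative.

```python
# Students still enrolled on 1 October 2015, split by trip participation.
# Assumption: derived from 330 registered, 62 non-participants and the
# 10 September dropouts (2 participants, 8 non-participants).
participants_enrolled = (330 - 62) - 2       # 266
non_participants_enrolled = 62 - 8           # 54
enrolled_1_oct = participants_enrolled + non_participants_enrolled

# Dropouts between 1 October and the January exams, from the text.
dropouts_participants = 5
dropouts_non_participants = 7 - 5            # 2

rate_participants = 100 * dropouts_participants / participants_enrolled
rate_non_participants = 100 * dropouts_non_participants / non_participants_enrolled

print(f"Enrolled on 1 October: {enrolled_1_oct}")
print(f"Dropout among trip participants: {rate_participants:.1f}%")
print(f"Dropout among non-participants: {rate_non_participants:.1f}%")
```

Both rates round to the figures in the text, and the implied number enrolled on 1 October is 330 minus the 10 September dropouts.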

Exam results at the winter exams

The winter exams began with the course Økonomiske Principper A (Principles of Economics A) in mid-December 2015, followed by exams in Samfundsbeskrivelse A (Descriptive Economics A) and Matematik A (Mathematics A) in January. All these courses had a re-exam in February, which nowadays may cover illness, failing the first exam, or simply a deliberate bet on a longer period of exam preparation. There is therefore no reason to distinguish between passing at the ordinary exam and at the re-exam.

During the exam period 12 left the programme. Of these 12, 4 passed Økonomiske Principper A, and of these 4, 1 passed Matematik A and another passed Samfundsbeskrivelse A. It is entirely conceivable that the exam experience contributed to their leaving.

Two students received the grade 00 in all courses, and the rest stayed away from the exams. This suggests that the decision to leave had been made before the exam period.

Dropout from March to June

In the period from March until the observation period ends in June 2016, a further 14 dropped out. Of these 14, 3 did not take any of the winter exams, so they must be regarded as simply having withdrawn late relative to their decision to leave. Of the remaining 11, 3 failed Matematik A but passed the two other courses. These exam experiences undoubtedly contributed to the decision to leave.

As many as 7 passed all four courses, i.e. on paper they completed the first half-year without problems. So they probably belong to the group whose dropout is due not to a lack of ability to complete the programme but to a wish to change programme. The average from the qualifying exam for these 14 was 8.9, against 9.5 for the rest, who had not dropped out before the summer of 2016 at the earliest. The difference is not significant.
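
The chronological review quotes a dropout count for each period but no overall total. The sketch below simply adds up the quoted counts; it assumes the four periods are disjoint and together cover the first year, and the resulting total and rate are derived here for illustration rather than taken from the paper.

```python
# Dropouts per stage for the autumn 2015 intake, as quoted in the text.
REGISTERED = 330
stages = [
    ("September (before 1 October)", 10),
    ("October to the January exams", 7),
    ("Winter exam period", 12),
    ("March to June 2016", 14),
]

enrolled_1_oct = REGISTERED - stages[0][1]

remaining = REGISTERED
for label, dropped in stages:
    remaining -= dropped
    print(f"{label}: {dropped} left, {remaining} remaining")

# Only dropouts after 1 October count in the official rate.
official_dropouts = sum(dropped for _, dropped in stages[1:])
official_rate = 100 * official_dropouts / enrolled_1_oct
print(f"First-year dropout after 1 October: {official_dropouts} of "
      f"{enrolled_1_oct} ({official_rate:.1f}%)")
```

Keeping the September group separate mirrors the rule stated earlier that withdrawals before 1 October do not enter the official dropout rate.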

Overall dropout

As indicated in the chronological review, three different types of dropout can be proposed, purely hypothetically.

1) Never really starting the programme

2) Giving up early

3) Wanting to change programme

Type 1) is partly a bureaucratic problem, because it takes the university some time to process a request to withdraw, so a withdrawal registered in October instead of September, which is to the university's disadvantage, cannot be attributed solely to the student. It is quite conceivable, however, that the student deliberately delays withdrawing for the sake of, e.g., a room in a student residence and SU (the state education grant). Social authorities also prefer SU to social assistance. Parents may also put some pressure on their children to do something sensible.

The figures show that this group has somewhat lower grades from the qualifying exam. They are also largely absent from introductory activities, so they could be spotted with a view to a preventive effort, either help with studying or encouragement to withdraw before the cut-off date of 1 October.

Type 2) can be described as students with a low average from the qualifying exam, although the average is not low relative to the whole population that passes a qualifying exam; it is closer to average. Whether the dropout is due to lack of ability or lack of effort is hard to say. The many compulsory assignments are already a stumbling block for some, as these assignments are at a higher level than in upper secondary school. Nowadays, however, the first half-year of most programmes is fairly gentle in order to ensure a soft start, so real academic problems only arise in the second or third year.

It cannot be ruled out that the formal mathematics in Matematik A can seem overwhelming to some. In addition, the project assignments in Samfundsbeskrivelse are far more extensive than in upper secondary school, and the thought of a 3-hour exam without aids in Økonomiske Principper feels like a trapeze act without a safety net. To the extent that these students turn up for the January exams, things at any rate go badly. This group should perhaps be split in two: those who never really get going but do at least start, and those who give it all a chance and only give up after the first exams.


This group can of course be spotted through missing assignments and poor examination results. But it is impossible for the university to identify them before 1 October. For themselves and their future it may be an advantage to settle on a change of programme as early as possible, which many indeed do. But not everyone with demonstrably poor results at the examination after the first half year drops out during the second half year.

For Type 3) neither academic problems nor study morale is the issue. It is a wish to spend one's life on something other than economics. Many in this group have a high average from the qualifying examination, so they can get into many other programmes, cf. the review of the 2010 intake in the first half of this paper. This group cannot be identified through objective observations; but that they are so numerous may be due to poor guidance on choice of programme. That they constitute a problem for the university is a political choice. One could just as well argue that the university earns money on them. Moreover, society benefits from certain doctors, lawyers, humanists and computer scientists, among others, having a basic knowledge of economics and of economists' way of thinking from one or two years of full-time study!


Changes in intergenerational educational mobility - does relative or absolute parental education matter - a Danish register study

Martin D. Munk*, David J. Harding**, Anders Holm***, Keunbok Lee****

*Department of Political Science, Aalborg University, Denmark, email: [email protected]

**Department of Sociology, University of California, Berkeley, USA, email: [email protected]

***Department of Sociology and Department of Economics, Western University, Canada, email: [email protected]

****Department of Sociology, University of California, Berkeley, USA, email: [email protected]

In this paper, we explore possible changes in intergenerational educational mobility in Denmark. Previous studies have indicated that access to upper secondary youth education and university college has been increasing (Esping-Andersen 2004; Munk 2014; Thomsen 2015), so what trend appears if we look at the entire spectrum of enrollment and completion of education? We ask whether changes are due to relative parental education or specific parental education. Using rank regressions and linear probability models, we find mixed results, but it seems that decreasing educational mobility is driven by fathers with a university degree. We explore various trends using both the father's and the mother's educational rank position and specific categorical education.


Bias when surveys are weighted using self-reported education against registers - how wrong does it go when the weighting follows two different approaches? Answer: very wrong indeed.

Peter Linde, Det Nationale Forskningscenter for Arbejdsmiljø

Introduction

To start with the conclusion: this article shows that weighting a survey by respondents' self-reported education against the register information is a methodologically wrong solution to skewed nonresponse and introduces a new, large bias. No data collection is better than its weakest link, and quality is an overall assessment of all important components. There are three main methodological and quantitative sources of error that one should always look for: first, whether the sample is representative; second, whether the data collection achieves a high response rate and high quality; and third, whether the difference between sample and population is documented and correctly weighted for. This paper focuses on the last, weighting for skewness in nonresponse, but also touches on the other two sources of methodological error in a data collection. Besides these three more quantitative sources of quality, or lack of it, there is a softer one: how the questions are asked and the answer options presented, so that the respondent is able to answer adequately. This fourth source of error is often just as important for the quality of the data collection as the three methodological ones.

Given this approach, some will ask for the number of responses the statistics are based on. That matters too, but it only says something about precision, not about the representativeness and quality of the survey. A data collection where the sample is, for example, drawn from a self-recruited panel and relies on e-mails, or where the question is wrongly posed, never becomes representative, or more representative, because many people are asked. The only thing one can be sure of when asking more people in a non-representative survey is that the survey more certainly becomes wrong, because the bias and the lack of representativeness then dominate alone, and they always go in a systematic direction, e.g. overestimating income (whereas the random error due to sample size can go in either direction). The random error becomes smaller the more people are asked, but only in proportion to the square root of the sample size. So a sample of 100 can have a random error of up to 10%, a sample of 400 a random error of 5% and a sample of 1,600 a random error of 2.5%. That is, the random error is reduced by a factor of 2 when the sample is increased by a factor of 4. Behind these frequently used key figures lie mathematical formulas that assume the sample is drawn by simple random sampling from the whole population. That is not the case in, for example, a self-recruited web panel, so there these formulas do not hold, even though they are often assumed to be universally valid. A sample does not always and automatically meet the scientific requirements for variance calculations in a simple random sample of the whole population.
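The square-root rule quoted above can be checked against the usual 95% margin of error for a proportion under simple random sampling. The following is a minimal sketch, assuming the worst case p = 0.5 and z = 1.96; the function name `margin_of_error` is illustrative and does not come from the paper.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation interval for a proportion.

    Assumptions (not from the paper): simple random sampling,
    worst-case proportion p = 0.5 and z = 1.96 for 95% coverage.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 400, 1600):
    # Quadrupling n halves the margin of error.
    print(f"n = {n:5d}: +/- {100 * margin_of_error(n):.2f} percentage points")
```

The three sample sizes give margins close to the 10%, 5% and 2.5% quoted in the text. The formula only applies when the sample really is a simple random sample, which is exactly the point made about self-recruited panels.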


Universal representativeness

Some samples that claim to describe the whole population match, after weighting, the distribution of age, sex and geography published by Danmarks Statistik. That is of course fine, but far from sufficient to ensure that a survey is representative. Income, education and socio-economic status are three important variables that must also be correctly distributed, and ethnic background, family type and housing type must be checked as well. Beyond these variables the list can be continued indefinitely, and then there are the variables tied to the specific survey. It is impossible to measure awareness of health campaigns correctly if all those asked come from a web panel whose members spend a lot of time on the internet, where these campaigns are increasingly run. Not to mention commercial surveys on the extent of, and preferences for, online shopping. Or a transport survey that has not included enough of those who use public transport the most. A final example is health surveys on smoking, obesity or alcohol consumption, where a low response rate often means that the extent is underestimated, because one does not reach far enough out to everyone in the sample when contact is attempted only a few times. In fact, a falling response rate will here make the state of health in Denmark look better than it is. So the requirements for a professionally sound survey affect whether the target that 5% of a new cohort smokes appears to be met.

Of course one can narrow the requirement for representativeness to sex, age and geography matching the official figures from Danmarks Statistik. That is also easy to do, since the sampled persons can without difficulty report their sex, age and where they live in the same way Danmarks Statistik measures it. It becomes somewhat harder for ethnic background, family type or housing type. And it easily becomes impossible for income, education or socio-economic status. Fortunately there is an alternative to basing the requirement for representativeness on the sampled persons' own reports. Everyone, whether universities, research institutions or private firms, can have a simple random sample drawn from the CPR register for statistical surveys under Persondataloven and CPR-loven. Commercial surveys are also statistical surveys. So this applies to all types of statistical surveys, and the authorities impose only ordinary and reasonable requirements for data confidentiality and security when one receives a sample from CPR. These are requirements that all private and public organisations can easily meet, e.g. a password on the computer, logging of data use, and access only for specially authorised persons.

In the good old days, when face-to-face interviews were the norm, samples were drawn representatively from the street directory of households. Later, when more than 90% of households had a landline telephone, random selection of telephone numbers became the frame for samples that partly met the requirements for representativeness. With the many work mobile phones and prepaid mobile phones, this is no longer possible. But the CPR register gives all collectors of statistical surveys the means to ensure representativeness, and is today in practice the only way to obtain a representative sample of persons. A simple random sample of persons from CPR will at the same time be universally representative, i.e. for all variables it will lie close to the population within the usual sampling uncertainty, which depends on the sample size.

An example of a universally representative survey is Danmarks Statistik's citizen surveys (borgerundersøgelser), where about 1,500 persons aged 15-74 are selected each month by simple random sampling from the CPR register. 43,500 were selected in the period from 2015 to mid-2017, and 25,025 responded, corresponding to 58% of those selected. This is a universally representative sample of 43,500, which can be documented in a simple way. Whichever variable is considered, the sample of 43,500 should deviate by at most 0.6% from the population of all adults aged 15-74. This holds both for categorical variables such as education and for continuous ones such as income.

Table 1. Universal representativeness of sex, age and geography

                  Population   Sampled   Difference
Men                    50.2%     50.2%         0.0%
Women                  49.8%     49.8%         0.0%

15-19 years             6.6%      6.5%         0.1%
20-29 years            17.6%     17.4%         0.2%
30-39 years            15.9%     16.0%         0.1%
40-49 years            18.7%     18.8%         0.1%
50-59 years            18.1%     18.4%         0.3%
60-74 years            23.1%     22.9%         0.2%

Nordjylland            10.3%     10.5%         0.2%
Midtjylland            22.6%     22.7%         0.1%
Syddanmark             21.0%     21.0%         0.0%
Hovedstaden            31.7%     31.5%         0.2%
Sjælland               14.4%     14.3%         0.1%

Total                   100%      100%

Source: Presented at Statistisk Forening, 2017

As Table 1 shows, the deviation between the population and the sample is a few tenths of a percentage point and clearly within the up to 0.6% that follows from the classical 95% statistical confidence interval. Since this is a simple random sample from CPR, where sex, age and geography are known for everyone selected, one could with a proportionally stratified simple random sample have ensured that the deviation between sample and population was 0.0%. A correct proportional distribution is of course a fine quality of a sample, but representativeness means that the deviation should in general be at most 0.6% for all variables with a sample of 43,500, so let us see whether this holds universally.

Table 2. Universal representativeness of income, education and socio-economic status

                                      Population   Sampled   Difference
Up to 50,000 kr.                           20.7%     20.7%         0.0%
50,000-100,000 kr.                         18.4%     18.1%         0.3%
100,000-200,000 kr.                        20.1%     20.1%         0.0%
200,000-300,000 kr.                        23.3%     23.4%         0.1%
Over 300,000 kr.                           17.5%     17.7%         0.2%

Primary school                             33.4%     33.5%         0.1%
Upper secondary, vocational and KVU        43.8%     43.7%         0.1%
MVU                                        15.2%     15.2%         0.0%
LVU                                         7.6%      7.5%         0.1%

Self-employed                               3.7%      3.7%         0.0%
Employee                                   53.8%     53.7%         0.1%
Unemployed                                  1.6%      1.6%         0.0%
In education                               12.2%     12.3%         0.1%
Pensioners                                 17.8%     17.7%         0.1%
Others outside the labour force            11.0%     10.9%         0.1%

Total                                       100%      100%

Not surprisingly, all differences are below 0.6%, as they should be with a universally representative sample of 43,500 persons. As mentioned, the universally representative sample can be improved further by drawing it by simple random sampling within strata whose shares are made to match the population exactly. This does not secure the universal representativeness, which comes from sampling at random from the whole population, but it can improve the sample and reduce the deviation between population and sample in general. That is, not only for the factors used for proportional stratification but also for others that are "pulled along". Here sex, age and geography are not the best variables as "draught animals". Those are income, age, education, ethnicity and family type, and possibly sex. Table 3 below shows that ethnicity and family type also meet the requirements for universal representativeness, as will any other variable one can think of. This is due to the universally representative properties that a random sample from the whole population has, unlike e.g. self-recruited web panels.

Table 3. Universal representativeness of family type and ethnicity

                               Population   Sampled   Difference
Single without children             27.8%     27.4%         0.3%
Single with children                 6.6%      6.6%         0.0%
Couple with children                31.4%     31.7%         0.3%
Couple without children             34.2%     34.2%         0.0%

Danes                               86.9%     86.9%         0.0%
Immigrants and descendants          13.1%     13.1%         0.0%

Total                                100%      100%

'Model-assisted' weighting - post-stratification

Post-stratification is a repair of the sample after it has been drawn, where weights are used to bring the sample into line with the population. The weights have the form N/n, where N is the number in the population in a sub-stratum and n is the number in the sample in the same sub-stratum. It is a standardisation that ensures that the weighted sample matches the population. The same type of weights is formed when weighting for nonresponse. In that case, however, the adjustment is not, as in post-stratification of universally representative samples, merely a correction for a random deviation; it also repairs the bias that nonresponse in the data collection introduces. This type of weighting is called 'model-assisted' rather than 'model-dependent', because it only corrects for something that should be approximately satisfied if the sample were representative, and it is only a standardisation with no further assumptions.
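The N/n weights described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two strata, their counts and the stratum mean incomes are hypothetical numbers chosen for the example, and the function name `poststratification_weights` is not from the paper.

```python
def poststratification_weights(population_counts, sample_counts):
    """Return the weight N/n for every stratum."""
    return {s: population_counts[s] / sample_counts[s] for s in population_counts}


# Hypothetical counts and stratum means, for illustration only.
population = {"young": 6000, "old": 4000}          # N per stratum
respondents = {"young": 40, "old": 60}             # n per stratum
mean_income = {"young": 150_000, "old": 250_000}   # mean among respondents

weights = poststratification_weights(population, respondents)

# Unweighted mean: the over-represented "old" stratum pulls the mean up.
unweighted = (sum(respondents[s] * mean_income[s] for s in respondents)
              / sum(respondents.values()))

# Weighted mean: each respondent counts for N/n persons in the population.
weighted = (sum(weights[s] * respondents[s] * mean_income[s] for s in respondents)
            / sum(weights[s] * respondents[s] for s in respondents))

print(f"unweighted mean: {unweighted:,.0f}")
print(f"weighted mean:   {weighted:,.0f}")
```

In this toy case the weighted mean moves below the unweighted one, because the weights restore the population shares of the two strata. The correction only works as intended when N and n are counted on the same definition of the stratum.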

Nonresponse and weighting

Tables 1 to 3 document the universally representative properties of a simple random sample from the whole population in the CPR register. Even though the response rate in Danmarks Statistik's citizen survey is as high as 58%, there are still imbalances, and they would have been larger had fewer than 58% responded. As Table 4 shows, nonresponse is highest among the young, the lowest incomes, those with primary school only, those outside the labour market, non-Danes and singles. In general this means that nonresponse is socially skewed even in a survey with a 58% response rate, and must therefore be corrected for so as not to paint too rosy a picture.


Table 4. Distribution of respondents on the variables in Tables 1-3

                                      Population   Sampled   Respondents
Men                                        50.2%     50.2%         48.7%
Women                                      49.8%     49.8%         51.3%

15-19 years                                 6.6%      6.5%          6.0%
20-29 years                                17.6%     17.4%         12.5%
30-39 years                                15.9%     16.0%         13.8%
40-49 years                                18.7%     18.8%         18.7%
50-59 years                                18.1%     18.4%         20.5%
60-74 years                                23.1%     22.9%         28.4%

Nordjylland                                10.3%     10.5%         10.9%
Midtjylland                                22.6%     22.7%         23.8%
Syddanmark                                 21.0%     21.0%         21.8%
Hovedstaden                                31.7%     31.5%         29.1%
Sjælland                                   14.4%     14.3%         14.4%

Up to 50,000 kr.                           20.7%     20.7%         15.6%
50,000-100,000 kr.                         18.4%     18.1%         14.9%
100,000-200,000 kr.                        20.1%     20.1%         21.1%
200,000-300,000 kr.                        23.3%     23.4%         27.1%
Over 300,000 kr.                           17.5%     17.7%         21.3%

Primary school                             33.4%     33.5%         26.7%
Upper secondary, vocational and KVU        43.8%     43.7%         45.8%
MVU                                        15.2%     15.2%         18.4%
LVU                                         7.6%      7.5%          9.1%

Self-employed                               3.7%      3.7%          4.0%
Employee                                   53.8%     53.7%         57.7%
Unemployed                                  1.6%      1.6%          1.3%
In education                               12.2%     12.3%         10.9%
Pensioners                                 17.8%     17.7%         19.4%
Others outside the labour force            11.0%     10.9%          6.7%

Single without children                    27.8%     27.4%         21.5%
Single with children                        6.6%      6.6%          5.7%
Couple with children                       31.4%     31.7%         37.2%
Couple without children                    34.2%     34.2%         35.6%

Danes                                      86.9%     86.9%         91.6%
Immigrants and descendants                 13.1%     13.1%          8.4%


The following example illustrates the effect of nonresponse and of a methodologically correct weighting of disposable income, which is available in the register for everyone in the population:

Income in the population (kr.):                               210,235
Income in the universally representative sample of 43,500:    210,735   (0.2%)
Income among the 25,025 respondents (after nonresponse):      228,641   (8.8%)

The universally representative sample lies close to the population, as it should. The deviation is 0.2% and clearly within the confidence interval for 43,500 sampled units. Nonresponse leads to an overestimate of income of 8.8%. This is a typical result, showing that the better-off part of the population responds more often, and that one gets a bias many times larger than the statistical uncertainty if it is not corrected for. Nonresponse bias has been rising slightly in this millennium. 25 years ago the nonresponse bias for income was about 7%.

If nonresponse is weighted for by standardising the sample according to the register information known for the sampled persons, the following results are obtained:

The 25,025 responses standardised by:

Sex:                               229,592   (9.2%)
Sex and age:                       220,624   (4.9%)
Sex, age and region:               221,167   (5.2%)
Sex, age, region and education:    215,150   (2.3%)

The bias generally becomes smaller, and the estimate approaches the true population mean of 210,235. Standardising by sex does not move much, but since men respond slightly less often than women and generally earn more, the standardised income rises slightly. When age is also included in the standardisation, income comes closer to the correct value and lies 4.9% above the actual income in the whole population. This is due to higher nonresponse among younger persons and their generally somewhat lower income. Region changes nothing once sex and age have been corrected for. This too is a general result: standardising by sex, age and geography is far from sufficient to ensure results close to the conditions in the whole population. Variables that capture social conditions must be included. Including education, for example, brings one considerably closer to the correct figure; the bias is more than halved. If all the variables in Tables 1-3 are included, except income (the target variable in this example), the bias falls below ½ percent, i.e. to the level of the random sampling error.

Weighting by education etc. requires access to more variables than are found in the CPR register. Weighting by education and other social variables is possible under Forskerordningen at Danmarks Statistik, where weights and weighting tables can be produced for use in the quality assurance of analyses and results, so that one does not paint too rosy a picture of the real world.
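The percentages in the income example are simply the relative deviations from the register mean of 210,235 kr. A small sketch using only the figures reported above (the variable and function names are illustrative, not from the paper):

```python
POPULATION_MEAN = 210_235  # register income in the whole population, kr.

# Mean income estimates reported in the text, kr.
estimates = {
    "Full sample of 43,500": 210_735,
    "Respondents, unweighted": 228_641,
    "Standardised by sex": 229_592,
    "Standardised by sex and age": 220_624,
    "Standardised by sex, age and region": 221_167,
    "Standardised by sex, age, region and education": 215_150,
}


def relative_bias(estimate, truth=POPULATION_MEAN):
    """Relative deviation from the population mean, in percent."""
    return 100 * (estimate - truth) / truth


for label, value in estimates.items():
    print(f"{label}: {relative_bias(value):+.1f}%")
```

Rounded to one decimal, these reproduce the reported 0.2%, 8.8%, 9.2%, 4.9%, 5.2% and 2.3%.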

One might consider weighting for education by setting respondents' self-reported education against the education recorded in the register. The decisive problem here is not whether the register information is correct, since, as noted above, this is a 'model-assisted' weighting that introduces no bias if done correctly. The problem is that the method presupposes that the same information is standardised on when the weights N/n are formed, where N is the number in the population in the given sub-stratum and n is the number in the sample. Here 'N' and 'n' are counted and defined differently. So let us look at this difference.

Self-reported education and register education

The 25,025 respondents were asked to state their highest education. 76.2% report the education they are also recorded with in the register. 17.6% report a higher education than they are registered with, and 6.1% a lower one. That there is a difference is not particularly surprising, but its size may be. Part of the difference may be due to respondents' knowledge of the various formal definitions, but that is not the whole explanation. One may be employed under a collective agreement or in a position that presupposes competences corresponding to a certain level of education. This applies in both the private and the public sector, e.g. an IT or finance position. The competences may have been acquired through external or internal further training, self-study and on-the-job training. Such persons may have worked for many years as experts in such a position and will with good reason be regarded as educated, by themselves, their colleagues and their employer, even though the register records them with, e.g., only basic schooling. Conversely, some are recorded in the register with an education they do not use, e.g. a vocational qualification. Overall it is not surprising that self-reported education and the formal education in the register differ. Used and interpreted correctly, this causes no problems in the analysis. But used wrongly in the weighting, it can increase the bias. Let us first look at the actual difference between self-reported education and the education recorded in the register.


Table 5. Self-reported education and the education recorded in the register. Percent.

                                  Register:
Self-reported:                    Basic   Upper sec. and KVU    MVU    LVU   Total
Basic schooling                    16.4                  3.9    0.3    0.1    20.8
Upper secondary and KVU             8.3                 36.3    1.5    0.1    46.1
MVU                                 1.2                  4.8   14.8    0.2    21.1
LVU                                 0.8                  0.8    1.7    8.7    12.0
Total                              26.7                 45.8   18.4    9.1     100

Note: In the original table, red marks cells where the self-reported level is lower than the register level (above the diagonal), blue marks cells where it is higher (below the diagonal), and black marks cells where the two agree (the diagonal).
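The agreement shares quoted in the text can be recovered directly from the cells of Table 5. A minimal sketch, with the cell values copied from the table (rows are self-reported levels, columns are register levels); the variable names are illustrative only:

```python
# Table 5 cells in percent of respondents.
# Rows: self-reported level; columns: register level.
# Order of levels: basic, upper secondary/KVU, MVU, LVU.
table = [
    [16.4, 3.9, 0.3, 0.1],
    [8.3, 36.3, 1.5, 0.1],
    [1.2, 4.8, 14.8, 0.2],
    [0.8, 0.8, 1.7, 8.7],
]
k = len(table)

same = sum(table[i][i] for i in range(k))                                # diagonal
higher = sum(table[i][j] for i in range(k) for j in range(k) if i > j)   # self-report above register
lower = sum(table[i][j] for i in range(k) for j in range(k) if i < j)    # self-report below register

# Column sums give the register distribution among respondents.
register_margin = [sum(table[i][j] for i in range(k)) for j in range(k)]

print(f"same: {same:.1f}%  higher: {higher:.1f}%  lower: {lower:.1f}%")
print("register margin:", [round(x, 1) for x in register_margin])
```

The three shares match the 76.2%, 17.6% and 6.1% given in the text. The column sums agree with the respondent distribution in Table 4 up to rounding of the individual cells.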

Weighting with self-reported education against registers - and conclusion

Table 5 shows the differences. The question is what bias arises if the denominator of the weight is determined by the self-report and the numerator by the registration in the population.

Income in the population:                                     210,235
Income in the universally representative sample:              210,735   (+0.2%)
Income among the 25,025 respondents (after nonresponse):      228,641   (+8.8%)
Sex, age, region and education (register-defined):            215,150   (+2.3%)
Sex, age, region and education (register/self-reported):      200,047   (-4.8%)

In general, weighting with all the social variables (except income) brings the bias below ½ percent. Sex, age, region and education weighted correctly give a bias of 2.3%. In general one comes closer and closer to the correct register income, from above, as more social variables are included, just as the share of explained variation grows as more significant variables are included. When education is weighted wrongly (the numerator of the weight defined as the number in the population with the given register education level, and the denominator as the number of respondents with that self-reported level), one overshoots the correct value and doubles the overall bias.
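Why the mismatch overshoots can be seen by comparing the two sets of education weights. The sketch below is a deliberate simplification: it uses only the education margin (the paper's weighting also crosses sex, age and region), and it works with shares rather than counts, so each weight is a ratio of shares. The population and register shares are taken from Table 4 and the self-reported shares from the row totals of Table 5; the variable names are illustrative.

```python
levels = ["Basic", "Upper sec./KVU", "MVU", "LVU"]

population_register = [33.4, 43.8, 15.2, 7.6]    # population, register education (Table 4)
respondents_register = [26.7, 45.8, 18.4, 9.1]   # respondents, register education (Table 4)
respondents_self = [20.8, 46.1, 21.1, 12.0]      # respondents, self-reported (Table 5 row totals)

# Consistent weights: numerator and denominator use the same definition.
consistent = [N / n for N, n in zip(population_register, respondents_register)]

# Mismatched weights: register numerator, self-reported denominator.
mismatched = [N / n for N, n in zip(population_register, respondents_self)]

for level, c, m in zip(levels, consistent, mismatched):
    print(f"{level:15s} consistent {c:.2f}   mismatched {m:.2f}")
```

With the mismatched definition, the lowest education level receives a clearly larger weight and the higher levels clearly smaller weights than under the consistent definition. Since income rises with education, this pushes the weighted income down past the true value, in line with the negative bias reported above.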

Overall, the conclusion is that weighting wrongly by education gives a larger bias. Under Forskerordningen at Danmarks Statistik it is possible to produce tables that can be used externally, outside Danmarks Statistik. The distribution of self-reported education in the whole population can be estimated using correct weighting methods that include the register information on education and other social factors. Such tables will have no methodological bias, and analyses can then optimise their robustness with respect to quality assurance of the data collection.


Clustering of district heating exchanger stations for Affald Varme Aarhus

Alexander Martin Tureczek

Systems Analysis, DTU Management Engineering

Smart district heating meters for consumption metering have been rolled out in Denmark over the last decade. In contrast to electricity meter data, very few studies have analysed district heating consumption data. This study transfers lessons from the analysis of smart electricity meter data to the analysis of district heating data. Based on data from heat exchanger stations in the Affald Varme Aarhus supply network, the study has two aims: to examine whether district heating data can be analysed with the same methods that are used for electricity meter data, and to examine the possibilities for improving the prevailing clustering method, K-means.


Intelligent, flexible production in an energy system dominated by renewables

Erik Lindström

Centre for Mathematical Sciences, Lund University, Sweden

Abstract

It is well known that the electricity price varies between days but also within the day, sometimes even resulting in negative prices. Flexible production units could take advantage of those variations by adapting the production accordingly. However, few studies exist that compare the cost effectiveness of different types of production units while taking these within-day price variations into account.

This paper studies intelligent, flexible production units in comparison to statically scheduled units in the context of the Scandinavian power system. Specifically, we study production of ammonia, a chemical process that only uses air, water and electricity. The ammonia can be used as fuel or to produce nitrogen fertilizers. We derive optimal production strategies for such a unit using Model Predictive Control.

The strategies were evaluated on historical data from Nord Pool. We found that the Nord Pool region in general and DK1 in particular is well suited for production of ammonia, due to the relatively low cost for electricity. Furthermore, the results improve substantially when considering flexible strategies that can benefit from the variability in the electricity price.

1 Introduction

The agricultural sector has undergone an almost unparalleled transformation during the last century. A substantial part of the workforce was working in the sector at the beginning of the 20th century, but the introduction of intensive agriculture with mechanization, pesticides and fertilizers, popularized during the green revolution in the middle of the 20th century, has completely transformed the sector. Only a small fraction of the workforce is occupied in the sector today, but it produces much more than ever before, and the increased production is crucial as the world's population keeps growing. It is estimated that about one third of the protein in humanity's diet depends on mineral nitrogen fertilizers Smil [2004].

Today, about 1.2 % of global energy demand is used for the production of nitrogen fertilizers IFA [2014]. The production of nitrogen fertilizers often starts with production of ammonia, which itself is often used as a fertilizer in the US corn belt. The production is so important that ammonia is traded as a commodity in financial markets, see Figure 1. However, ammonia is hazardous in its concentrated form, and is subject to strict reporting requirements by facilities which produce, store, or use it in significant quantities. Ammonia is therefore often used as a precursor to produce nitrogenous compounds that are easier and safer to handle and distribute for the end user, which is how nitrogen fertilizers typically are used in the EU.

Figure 1: Global production of ammonia.

Almost all production of nitrogen fertilizers is based on fossil feedstock such as coal and natural gas. China predominantly uses coal for the production, with a low energy efficiency of 59 GJ/metric tonne of ammonia, while the rest of the world often uses natural gas with an energy efficiency ranging from 27 to 58 GJ/metric tonne, with an average of 37 GJ/metric tonne Smil [2004]. Producing nitrogen-based fertilizers from fossil fuel leads to substantial greenhouse gas (GHG) emissions and other waste products, while electrolysis-based production is substantially cleaner. Tallaksen et al. [2015] studied ammonia production, partly powered by wind power and partly by the surrounding energy system. Their life cycle analysis, comparing production in a fossil fuel heavy energy system in Minnesota, US to an energy system dominated by renewables and nuclear power in Sweden, showed that substantial reductions in GHG emissions are possible, about 80 % in the US and up to 93-96 % in Sweden.

Large-scale industrial production of ammonia via electrolysis was performed at the Vemork hydroelectric plant in Norway between 1911 and 1971, but was eventually closed down due to lacking cost effectiveness. The economic conditions for producing ammonia using biogas, biomass and electrolyzers were compared in Tunå et al. [2014], with only the biomass process being cost efficient compared to market prices. That study was based on conditions prevailing in the US and, more importantly, does not take into account the price structure found in the Scandinavian power market, Nord Pool.

Specifically, the price of electricity is, on average, lower in Scandinavia (say 80 USD/MWh in the US vs 35 EUR/MWh at Nord Pool). That difference would clearly shift the conclusions in favour of electrolysis-based production of ammonia, although it may not be enough to make it cost competitive. Furthermore, the electricity spot price at Nord Pool is highly variable Lindström and Regland [2012], Weron [2014], and is expected to become even more so when additional renewable energy is integrated into the energy system Lindström et al. [2015b]. A related study, Morgan et al. [2014], considered wind-powered ammonia fuel production for remote islands, and found that it can be profitable under certain conditions. Similarly, Du et al. [2015] showed that solar power may provide an attractive alternative for ammonia production, based on a case study in Indiana, US, by converting a small fraction of the area used for corn production into ammonia production.

Using the technical and economic specifications considered in Tunå et al. [2014], we study how an electrolysis production unit located in Scandinavia, with smart control utilizing the variations in the spot price, would compare to the market price of ammonia.

2 Production of ammonia

Early production of ammonia used sodium nitrate as the main natural resource. This was often referred to as Chile saltpeter as the majority was mined in South America. This changed when Fritz Haber and Carl Bosch invented the Haber-Bosch process in 1909, which they also patented in 1910. That process is still used for industrial scale production of ammonia, with a yearly production of 176 million tonnes in 2014.

2.1 Haber-Bosch process

The Haber-Bosch process, see Dybkjaer [1995], is a process for converting atmospheric nitrogen N2 by a reaction with hydrogen H2 to ammonia NH3 using a metal catalyst under high pressures and temperatures. The chemical process is given by

$$\mathrm{N_2} + 3\,\mathrm{H_2} \rightarrow 2\,\mathrm{NH_3}$$

This is implemented in several steps, the first being to use an electrolyzer to extract H2 from water. This can be done using alkaline electrolysis or proton exchange membrane electrolysis, with the former being commercially more mature. The parallel step is to extract N2 from air using a membrane. The hydrogen and nitrogen gases are then compressed and heated as syngas and combined in the ammonia converter. This process also generates excess heat that can be recycled, but we have omitted that possibility in our analysis.

2.2 Cost analysis

The investment costs for an electrolysis ammonia production unit are, according to the economic analysis in Tunå et al. [2014], dominated by electrolysers and syngas compressors. They consider two cases, a 3 MW electrolysis unit and a 10 MW electrolysis unit, where the investment costs were estimated at 10 182 kUSD and 29 034 kUSD respectively. The expected life span for both units is 15 years, and they use a discount rate of 8 %.

Furthermore, Tunå et al. [2014] assume that their production units are running 8000 hours per year, leading to a production of 2030 and 6760 metric tonnes of ammonia each year, respectively. They assume that the electricity price is modelled by iid draws from a beta-PERT distribution, with a minimum price of 40.1 USD/MWh, a most likely price of 80.1 USD/MWh and a maximum price of 120.2 USD/MWh. Thus, their total electricity cost over the life span is 28 836 kUSD and 96 120 kUSD respectively.
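The quoted electricity totals can be checked with a few lines of arithmetic. The sketch below assumes that the totals are undiscounted and computed at the most likely price of 80.1 USD/MWh, which is an inference from the numbers rather than something stated in the text; the function names are hypothetical. It also evaluates the mean of a standard beta-PERT distribution (shape parameter 4), which is close to the most likely price here.

```python
# Figures quoted from Tunå et al. [2014] in the text above.
HOURS_PER_YEAR = 8000
LIFE_SPAN_YEARS = 15
PRICE_MIN, PRICE_MODE, PRICE_MAX = 40.1, 80.1, 120.2  # USD/MWh


def pert_mean(low, mode, high):
    """Mean of a standard beta-PERT distribution (shape parameter 4)."""
    return (low + 4 * mode + high) / 6


def lifetime_electricity_cost_kusd(capacity_mw, price_usd_per_mwh):
    """Undiscounted electricity cost over the life span, in kUSD (assumption)."""
    energy_mwh = capacity_mw * HOURS_PER_YEAR * LIFE_SPAN_YEARS
    return energy_mwh * price_usd_per_mwh / 1000


cost_3mw = lifetime_electricity_cost_kusd(3, PRICE_MODE)
cost_10mw = lifetime_electricity_cost_kusd(10, PRICE_MODE)
mean_price = pert_mean(PRICE_MIN, PRICE_MODE, PRICE_MAX)
```

Under these assumptions the two costs reproduce the 28 836 kUSD and 96 120 kUSD figures, which suggests that the price variability of the beta-PERT model plays no role in the totals themselves.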

There is little empirical support for using an interest rate of 8 % these days, cf. Lindström et al. [2015a], as most government bonds have negative rates. In fact, recent research papers in portfolio management often use a rate of 0 %, see e.g. Nystrup et al. [2017].

We also find their iid assumption regarding the electricity spot price questionable, knowing that the price has seasonal patterns on a yearly, weekly and daily scale, and is heteroscedastic and heavy tailed; see Weron [2014] for an overview of stylized facts.

3 Ammonia production in a modern Scandinavian context

The Danish power system used to rely on fossil fuel (coal, oil and natural gas), but is rapidly integrating more renewables such as wind and solar power. 42 % of all electricity was produced using wind power in 2015, and this number is expected to rise substantially in the near future, with a political goal of reaching 100 % by 2050. Furthermore, the production in Fyn and Jutland (regional area DK1, where most of the renewable production is located) was enough to cover the needs of all of Denmark during 1460 of 8760 hours in 2015. Integrating additional renewables will reduce the GHG emissions further, but will also lead to even larger variations (including at times negative prices) in the electricity price, Lindström et al. [2015b].

The backbone of the Swedish power system is hydropower (53 %) and nuclear power (40 %), resulting in a predictable power supply. The current trend is a reduction of the share of nuclear power and an increased share of renewables such as wind and solar power.

3.1 Nord Pool market structure

The majority of all produced electricity in Scandinavia is traded on Nord Pool. The market has grown from only including Norway to now encompass all Scandinavian countries, the Baltic countries and also some operations in Germany and the UK. It is currently the largest electricity market in the world, with a traded volume of 505 TWh in 2016, NordPool [2016].

The market requires all market participants to submit bids for the upcoming day before noon, see NordPool [2009]. All bids are compiled by Nord Pool, which sets the price by balancing demand and supply. Nord Pool then publishes the price for the upcoming day at 13:00. That means that the price is completely known at 13:00 for the rest of that day and the next day.

3.2 Static production scenario

In this scenario, we mimic the production strategy in Tunå et al. [2014]. They run their plants at full capacity for 8000 out of 8760 hours each year. There is no reason to consider time varying operations as the price is assumed to be iid throughout the year. We implement their strategy, which we refer to as 'Static', by running the production unit 8760 hours per year, but at (8000/8760) of the maximum capacity, making our results comparable to Tunå et al. [2014]. We also run the production at maximum capacity all year round, a strategy we refer to as 'Static max'.
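As a minimal sketch of the two static strategies, the snippet below computes the constant input power used by 'Static' for the 10 MW unit and the profit of any constant-power plan. The function names and the `value_per_mwh` argument (standing in for the product of the conversion factor and the ammonia price) are hypothetical and only serve as illustration.

```python
HOURS_IN_YEAR = 8760


def static_level(max_capacity_mw, run_hours=8000):
    """Constant input power that uses the same yearly energy as running
    at full capacity for `run_hours` hours."""
    return max_capacity_mw * run_hours / HOURS_IN_YEAR


def constant_power_profit(spot_prices, level_mw, value_per_mwh, fixed_cost=0.0):
    """Profit of a constant-power plan: sum of level * (value - spot) minus fixed cost.

    `value_per_mwh` is a hypothetical input: the value of the ammonia
    produced from one MWh of electricity.
    """
    return sum(level_mw * (value_per_mwh - s) for s in spot_prices) - fixed_cost


static_mw = static_level(10.0)  # 'Static' level for the 10 MW unit
static_max_mw = 10.0            # 'Static max' level
```

The 'Static' level comes out at roughly 9.13 MW, so both formulations consume 80 000 MWh per year.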

3.3 Flexible production scenario

It was argued in Morgan et al. [2014] that production can be varied with the input power, as long as the electrolyzers are hot. That means that it is possible to take advantage of the varying electricity price, given some constraints that enforce the physical limitations of the production units.

We are interested in maximizing the profit for every day, indexed by n, by using Model Predictive Control (MPC), see Garcia et al. [1989] and Nystrup et al. [2017]. MPC approximates an intractable optimal control problem with a problem that can be solved using convex optimization. The solution is implemented and run until new information becomes available, after which a new optimization problem using that additional information is solved.

Let $x_t$ be the input power in MW, $s_t$ the spot price for electricity measured in EUR/MWh, $p$ the price of ammonia in EUR/metric tonne, $c$ the conversion from electricity to ammonia and finally $f$ the fixed investment cost for the production unit translated to a daily scale. Our MPC problem, solved at noon when the spot price is known for the rest of the day and tomorrow, is then given by

$$x_n = \arg\max_{x \in X} \sum_{t=(n-1)\cdot 24+13}^{(n+1)\cdot 24} x_t\,(c \cdot p - s_t) - f \tag{1}$$

This optimization problem is solved subject to the constraints

$$x_{\mathrm{Min}} \le x_t \le x_{\mathrm{Max}} \quad \text{(keep the unit operational)} \tag{2}$$

$$|x_t - x_{t-1}| \le x_{\mathrm{Ramp}} \quad \text{(limit the change per hour)} \tag{3}$$

The ramping constraints apply both within a day and between days, meaning that the first values of the new solution to the optimization problem must be reasonably similar to the final values of the previous optimal solution. The optimization problem is a convex linear program, ensuring that it can be solved arbitrarily accurately using standard numerical methods. We refer to this strategy as 'MPC'.
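The planning step described above can be illustrated without an LP solver. The sketch below is not the authors' implementation (the solver they used is not stated); it swaps the linear program for a small dynamic programme over a grid of power levels. With bounds of 2 and 10 MW and a ramp limit of 2 MW, all multiples of 2 MW, the grid search is expected to agree with the linear program when the previous level is also on the grid; for other parameter values it is only an approximation. The function name, the grid and the argument names are assumptions made for illustration.

```python
def plan_production(spot, ammonia_price, conversion, x_prev,
                    levels=(2, 4, 6, 8, 10), ramp=2):
    """Choose hourly power levels maximizing sum_t x_t * (c*p - s_t)
    subject to |x_t - x_{t-1}| <= ramp, starting from level x_prev.

    Dynamic programme over a grid of levels (a stand-in for the LP).
    """
    margins = [conversion * ammonia_price - s for s in spot]
    value = {x_prev: 0.0}  # best profit so far for plans ending at each level
    back = []              # back[t][x] = level chosen in the hour before t
    for m in margins:
        new_value, pointers = {}, {}
        for x in levels:
            reachable = [(v, xp) for xp, v in value.items() if abs(x - xp) <= ramp]
            if not reachable:
                continue
            best_v, best_xp = max(reachable)
            new_value[x] = best_v + x * m
            pointers[x] = best_xp
        value = new_value
        back.append(pointers)
    # Trace the best plan backwards from the best final level.
    x = max(value, key=value.get)
    total = value[x]
    plan = []
    for pointers in reversed(back):
        plan.append(x)
        x = pointers[x]
    plan.reverse()
    return plan, total
```

In a receding-horizon loop, this function would be called each day at noon with the known prices for the remainder of the day and the next day; only the first 24 values of the plan would be applied, and the last applied level would be passed as `x_prev` to the next call. The fixed cost is a constant and is therefore left out of the planning step.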

Finally, the realized profit is computed by only using the first 24 hours of that solution, as the rest is revised when new information arrives at noon.

$$\mathrm{Profit}_n = \sum_{t=(n-1)\cdot 24+13}^{n\cdot 24+12} x_t\,(c \cdot p - s_t) - f \tag{4}$$

4 Empirical results

We start by recalling that Tunå et al. [2014] concluded that production of ammonia through electrolysis was economically infeasible. They report that the average price of ammonia in January 2013 was 976 USD per metric tonne, while their reported cost for producing ammonia via electrolysis is around 1700 USD per metric tonne.

The ammonia price declined throughout 2013, suggesting that a conservative price around 900 USD per metric tonne would be a reasonable average price for all of 2013. The exchange rate in 2013 was approximately 0.75 EUR/USD, leading to a price of 675 Euros per metric tonne.
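The price conversion, and what it implies for the objective, can be made concrete. In the sketch below the conversion factor is derived from the production figures quoted in Section 2.2 (6760 tonnes from a 10 MW unit running 8000 hours); this derivation is an assumption, since the value of the conversion factor actually used in the study is not stated. Each additional MWh adds value whenever the spot price is below the product of the conversion factor and the ammonia price.

```python
AMMONIA_PRICE_USD = 900.0  # USD per metric tonne, conservative 2013 average
EUR_PER_USD = 0.75         # approximate 2013 exchange rate

ammonia_price_eur = AMMONIA_PRICE_USD * EUR_PER_USD

# Assumed conversion factor (tonnes of ammonia per MWh of electricity),
# derived from 6760 tonnes per year for a 10 MW unit running 8000 hours.
conversion = 6760 / (10 * 8000)

# Spot price below which an extra MWh of input power adds to the profit.
break_even_spot_eur = conversion * ammonia_price_eur
```

With these inputs the ammonia price is 675 EUR per tonne and the break-even spot price is roughly 57 EUR/MWh, which can be compared with the average Nord Pool level of about 35 EUR/MWh mentioned in the introduction.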

This study considers the 10 MW production unit, leading to $x_{\mathrm{Max}} = 10$. It is also assumed that $x_{\mathrm{Min}} = 2$, which is needed to keep the unit hot enough for production. Finally, we implement a ramping constraint of 2 MW per hour. The results are robust across a wide range of ramping constraints in our simulation study.

All strategies are evaluated on data from DK1 (western Denmark, where most of the renewable production is located), DK2 (eastern Denmark, including the capital, Copenhagen) and SE4 (southern Sweden, with a mix of consumption and production). The results when evaluating the strategies using Nord Pool market data from 2013 are presented in Figure 2.

Figure 2: Cumulative profit during 2013 for the MPC (solid line), Static max (dashed) and Static (dash-dotted) strategies in the DK1, DK2 and SE4 markets.

There are several conclusions that can be drawn from those graphs. It is hardly surprising that the MPC strategy consistently outperforms the Static strategy, but we note that it also performs better than the Static max strategy, in spite of producing less ammonia with identical fixed investment costs. The actual production of the flexible strategy is slightly larger than that of the static strategy, but definitely less than that of the static max strategy.

However, the most important conclusion is that electrolysis may be profitable in Scandinavia, even when performing the calculations with a lower price of ammonia than used by Tunå et al. [2014]. The result is particularly strong in the DK1 area, where the energy system is dominated by wind power and the spot price varies significantly, but even DK2 offers positive returns. The electricity spot price and corresponding production for DK1 are presented in Figure 3, where it can be seen that the production is essentially capped for high prices, even though the ramping constraint and the min and max constraints limit the flexibility. It is also clear that the unit can take full advantage of negative prices.

Figure 3: Electricity spot price (top) and allocated production (bottom) for the DK1 area during 2013.

The robustness of the results can be evaluated by bootstrapping the yearly profit. Here we restrict ourselves to comparing the MPC and the static max strategies, as they performed better than the static strategy. The bootstrapped profit is presented in Figure 4. The MPC strategy performed better for all markets, both in terms of expected profit and in terms of having less risk, cf. Nystrup et al. [2017].
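The bootstrap scheme is not specified in the text. As one possible reading, the sketch below resamples daily profits with replacement and sums them to a yearly total; treating days as exchangeable is an assumption, and a block bootstrap may be more appropriate given the seasonality of the spot price. The function name and arguments are hypothetical.

```python
import random


def bootstrap_yearly_profit(daily_profits, n_boot=1000, seed=0):
    """Resample daily profits with replacement and return the yearly totals.

    Assumes days can be resampled independently (iid bootstrap).
    """
    rng = random.Random(seed)
    n = len(daily_profits)
    return [sum(rng.choices(daily_profits, k=n)) for _ in range(n_boot)]
```

The mean of the returned totals estimates the expected yearly profit, and their spread gives an indication of risk, which is how the two strategies could be compared.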

5 Conclusions

We considered a flexible strategy for production of ammonia in an energy system with a high penetration of renewable energy. The optimal strategy was derived using Model Predictive Control, and performed substantially better than static strategies commonly considered in the literature.

It was found that electrolysis based production of ammonia can be cost efficient, in particular in the DK1 area where it can profit from rapidly varying prices. This is interesting as the price dynamics in which the Vemork production unit was operating were much calmer, a fact that may have contributed to the lacking cost effectiveness for that unit.

Figure 4: Bootstrap cumulative profit for the MPC and Static max strategies for all the areas (panels: DK1, DK2 and SE4).

We would like to point out that these results were obtained using a conservative estimate of the ammonia price and without recycling of the excess heat generated during the electrolysis in the Haber-Bosch process, suggesting that the results exhibit some robustness. However, we are also well aware that a large drop in the ammonia price (which in itself is closely linked to the price of natural gas) would probably make the electrolysis process unprofitable yet again.

Acknowledgement

The work was supported by the EU Interreg Smart Cities Accelerator project.

References

Z Du, D Denkenberger, and Joshua M Pearce. Solar photovoltaic powered on-site ammonia production for nitrogen fertilization. Solar Energy, 122:562-568, 2015.

Ib Dybkjaer. Ammonia production processes. Springer, 1995.

Carlos E Garcia, David M Prett, and Manfred Morari. Model predictive control: theory and practice - a survey. Automatica, 25(3):335-348, 1989.

IFA. Ammonia production: Moving towards maximum efficiency and lower GHG, 2014. URL http://www.fertilizer.org.

Erik Lindström and Fredrik Regland. Modeling extreme dependence between European electricity markets. Energy Economics, 34(4):899-904, 2012.

Erik Lindström, Henrik Madsen, and Jan Nygaard Nielsen. Statistics for Finance. Chapman and Hall/CRC, 2015a.

Erik Lindström, Vicke Noren, and Henrik Madsen. Consumption management in the Nord Pool region: A stability analysis. Applied Energy, 146:239-246, 2015b.

Eric Morgan, James Manwell, and Jon McGowan. Wind-powered ammonia fuel production for remote islands: A case study. Renewable Energy, 72:51-61, 2014.

NordPool. The Nordic electricity exchange and the Nordic model for a liberalized electricity market. Nord Pool Spot, Norway, 2009.

NordPool. About us, 2016. URL https://www.nordpoolgroup.com/About-us/.

Peter Nystrup, Henrik Madsen, and Erik Lindström. Dynamic portfolio optimization across hidden market regimes. Quantitative Finance, pages 1-13, 2017.

Vaclav Smil. Enriching the earth: Fritz Haber, Carl Bosch, and the transformation of world food production. MIT Press, 2004.

Joel Tallaksen, Fredric Bauer, Christian Hulteberg, Michael Reese, and Serina Ahlgren. Nitrogen fertilizers manufactured using wind power: greenhouse gas and energy balance of community-scale ammonia production. Journal of Cleaner Production, 107:626-635, 2015.

Per Tunå, Christian Hulteberg, and Serina Ahlgren. Techno-economic assessment of nonfossil ammonia production. Environmental Progress & Sustainable Energy, 33(4):1290-1297, 2014.

Rafal Weron. Electricity price forecasting: A review of the state-of-the-art with a look into the future. International Journal of Forecasting, 30(4):1030-1081, 2014.


Predicting AIRBNB sales with Google searches in a customer journey context

Mads Zacho Krarup, Dept. of Digitalization, CBS

Niels Buus Lassen, Dept. of Digitalization, CBS

Rene Madsen, Dept. of Digitalization, CBS

Abstract

This paper presents a predictive model of Airbnb sales in Copenhagen, built on a dataset covering 33 months of Airbnb bookings in Copenhagen and related searches on Google. Moreover, geospatial patterns of the AIRBNB listings are detected by constructing 100 m x 100 m grid cells in UTM 32 coordinates of the city of Copenhagen. The predictive models built are a stepwise regression model and a neural network model, which are compared; both perform well on training data as well as on k-fold cross-validation data, with R2 values of 86%-96%.

1. Introduction

AIRBNB is a peer-to-peer online marketplace and homestay platform where travelers around the world can book short-term accommodation in residential properties. AIRBNB is currently represented in 34,000 cities with more than 2 million listings worldwide [1]. In June 2015 the equity funding had a total value of 2.3 billion USD; however, AIRBNB has so far kept information about its total revenue and number of reservations secret [2]. In Denmark, the government has taken a liberal approach to AIRBNB by allowing Danish residents to sublet their property without taxation up to an amount of 24,000 DKK per year [3].

Research Questions:

1: To what extent can search engine data describe the customer journey process in terms of AIRBNB bookings in Copenhagen?

2: What kind of geospatial patterns can be seen when computing AIRBNB listings in 100 m x 100 m grid cells of Copenhagen?

2. Briefly on the existing literature

Forecasting in tourism enables businesses and destinations to allocate resources that meet the demand in a given period [4]. Tourism products are described as perishable, and demand has traditionally been predicted using time series or econometric analysis. However, research has concluded that no single method is superior to the others; which models perform better depends on the evaluation criteria and the data set employed [5]. Some researchers have successfully used the Holt-Winters method to forecast daily hotel room demand [6], while others have used dynamic linear models [7] or ARMA models with high predictive accuracy [8]. With the increase in internet searching, researchers have also successfully incorporated Google Trends data, finding a high correlation between the number of influenza patients and certain search terms on Google Trends in order to generate an accurate forecasting model [9].

In terms of peer-to-peer research concerning AIRBNB, relatively few papers have appeared despite the rise of the sharing economy. An analysis from Texas City concludes that low-end hotels that are not marketed to business customers are vulnerable due to the shift in the patterns of accommodation booking [10].

3. The data and methodology

3.1 AIRBNB data

The Airbnb dataset provided by AIRDNA [11] contains 20.5 million lines of listings covering the period from September 2014 to May 2017. Each line in the dataset represents a listing that has been marked with one of three statuses on a given day: (R) reserved, (B) blocked or (A) available.

The primary focus in this paper is on the reserved listings, which aggregate to 2.6 million reservations in total, in other words the number of reservation days in the dataset. 11,391 unique property IDs are detected among the reserved listings. The dataset contains a variety of descriptive parameters, 51 in total, assigned to each reservation.

The dataset offers endless opportunities for further research; however, to narrow this research the main focus will be on the following parameters: sum of sales, which is 288 million USD for the whole period, and geospatial coordinates in terms of longitude/latitude.

Figure 1: Tableau output, AIRBNB revenue Copenhagen, 288 million USD, from 26.10.2014 to 25.06.2017 [11].

3.2 Google search engine data

Google Trends is a public tool that provides an index of search terms relative to the total number of Google searches over time. The index can include multiple queries to obtain a relative comparison among them [7].

Google search queries are used to describe the customer journey leading to the reservation of AIRBNB bookings. The idea is to use travel-related queries to build a predictive model relevant to the information-searching process, more specifically in terms of transportation options, accommodation alternatives and general city information.

Travel related category: Search queries

• Accommodation: "Hostel Copenhagen", "Hotel Copenhagen", "AIRBNB Copenhagen"

• Transportation: "Flight Copenhagen", "Train Copenhagen", "Metro Copenhagen"

• City information: "Travel Copenhagen", "Sightseeing Copenhagen"

Figure 2: Search queries from Google Trends [12].

The queries have moreover been translated into 13 languages representing the countries which had the highest number of overnight stays in Denmark in 2016 [13]. A preliminary analysis indicated that the highest explanatory power in terms of building a predictive model is obtained by using English search queries worldwide.

As Google Trends only allows 5 queries in one search, the query with the highest index value ("Hotel Copenhagen") is used as a baseline, meaning that this query is always included when applying new search queries on Google Trends [12].

Setting: Value

• Language: English

• Region: Worldwide

• Search time: 01.09.2014 to 30.06.2017

• Category: All

• Dates when data gathered: Weekly average

Figure 3: Search settings from Google Trends [12].

The Google Trends index covers the period from 01.09.2014 to 30.06.2017 in order to allow for time lags. This is one of the advantages of using search engine data, as they represent real-time consumer behavior [14].

Figure 4: Tableau output, preliminary subplot of Google Trends search queries from 20.10.2014 to 19.06.2017 [12]. Series shown (weekly index, 0-100): flight copenhagen, hostel copenhagen, hotel copenhagen, metro copenhagen, train copenhagen, travel copenhagen.

A report from eMarketer states that 48.4% of consumers in Canada and the USA use search engines when they begin to research upcoming trips, and 20.4% use property websites [15]. According to Net Market Share, Google holds 75.94% of the current search engine market [16]. It would be preferable to include all search engines; however, Google is the only company that allows the public to export data from its database. Furthermore, it would be ideal if a higher percentage of consumers used search engines when researching a trip. However, it is believed that Google Trends, with its limitations, can provide a significant contribution to the purpose of this paper, despite the fact that Google Trends may apply approximation methods to the sample data, which leads to inaccuracies [5].

3.3 Data preprocessing and transformation

In our data preprocessing we have used Alteryx Designer [17], IPython Jupyter notebook [18] and libraries from SciTools [19], OSGeo [20], Shapely [21] and Matplotlib [22] to compute and visualize UTM 32 coordinates on a map of Copenhagen. Tableau [23] has been used to visualize and explore the dataset, and SAS/JMP [24] has been used for statistical computing.

Both trend and seasonality are visible in the preliminary revenue plot [Figure 1]. To dampen the seasonality and stabilise the trend, a logarithmic transformation of the AIRBNB revenue has been performed. Moreover, the data has been aggregated to weekly level in order to compare the values with the Google search queries. The Google search queries have also been time-lagged up to 7 weeks back in time. This makes it possible to describe at what time travelers respond to certain explanatory variables.
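The two transformations described above, the logarithm of the weekly revenue and the time-lagged copies of each search index, can be sketched as follows. The function names are hypothetical, and the original work was carried out in Alteryx and SAS/JMP rather than in Python.

```python
import math


def log_transform(values):
    """Natural logarithm of each (strictly positive) weekly revenue value."""
    return [math.log(v) for v in values]


def lagged_features(series, max_lag=7):
    """Rows [x_t, x_{t-1}, ..., x_{t-max_lag}] for every week t with a full history.

    The first `max_lag` weeks are dropped because their lags are unavailable.
    """
    return [[series[t - k] for k in range(max_lag + 1)]
            for t in range(max_lag, len(series))]
```

With lags 0 to 7 each search query contributes 8 candidate input variables, which matches the 10 searches and 80 candidates mentioned in the model comparison.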

4. Models for AirBnB sales

4.1 Stepwise regression model

Stepwise regression is a method for developing a regression model where the selection of input variables is done by an automatic procedure. The method has received a lot of criticism for overfitting, for not adjusting for degrees of freedom and, in general, for generating too many input variables.

We have dealt with overfitting by using 5-fold cross-validation with 20% hold-out data. This automatic cross-validation function in SAS JMP was in fact our main argument for choosing stepwise regression, as it allowed us to use exactly the same 20% hold-out data in 5-fold cross-validation for both stepwise regression and neural networks. The stepwise regression in SAS JMP results in 21 input variables, which can be considered too many. We defend our 21 input variables as being only 7 Google searches in several time-lagged versions. This brings some intercorrelation into our model, but it also allows us to start developing a preliminary Customer Journey model in this article, linking Google searches on a timeline to different phases in the Customer Journey model. We see the stepwise regression model as a starting point for comparing multiple regression with neural networks. As future work we would develop a classic multiple regression model with far fewer input variables and compare that model to neural networks on the same 20% hold-out data in 5-fold cross-validation; SAS JMP did not allow us to run cross-validation automatically on a classic multiple regression model.

4.2 Neural Network

Applying an artificial neural network to sales modelling could be seen as shooting birds with a cannon, but artificial neural networks have performed very well on many datasets within sales modelling. We therefore feel it is natural and logical to compare regression models with neural networks within sales modelling. We also chose the neural network approach because we have an Airbnb dataset with 20.5 million lines of data, and we see good potential in general for the neural network model on such a large and complex dataset in our future work on this dataset.

The neural network used is a standard feed-forward network with 1 hidden layer, trained using back propagation.

4.3 Model evaluation, validation and comparison

The above listed models are evaluated by RMSE, where the smaller value of the two models is preferred, and by R^2, where the higher value is preferred. K-fold cross-validation is performed as well. We made one model for the weekly data with stepwise multiple regression, using 5-fold cross-validation with 20% of the dataset held out. We also applied neural network models for comparison, with exactly the same 5-fold cross-validation on 20% of the dataset.
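The evaluation set-up can be sketched in a few lines: a 5-fold split in which each fold holds out 20% of the weeks, together with the two metrics. This is an illustration with hypothetical function names, not the SAS JMP procedure; in particular, the RMSE below divides by the number of observations, which may differ from the convention used by a given software package.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and split them into k hold-out folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(actual, predicted):
    """Root mean squared error, dividing by the number of observations."""
    sq = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sq / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

With 140 weekly observations each fold contains 28 weeks, leaving 112 for training, which corresponds to the sample sizes reported in the model outputs. Using the same seed for both model types keeps the hold-out data identical across the comparison.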

5. Models for AIRBNB sales

5.1 Model comparison

The multiple regression model obtained an R-square of 0.87 on the validation data, identifying 21 of 80 Google searches using the validation R-square optimum method. The 80 Google searches were 10 searches positively linked to Airbnb sales, each time-lagged from 0 to 7 weeks back in time.

The neural network obtained an R-square of 0.96 on the validation data with exactly the same set of 21 Google searches as input variables. The neural network thus performs significantly better than multiple regression on both R-square and RMSE; refer to the outputs from SAS JMP below.

            Neural Network             Stepwise regression
            Full model   Hold out      Full model   Hold out
Rsquared    0.9465       0.9568        0.9190       0.8721
N           112          28            112          28
RMSE        15.5221%     15.1117%      28.8389%     N/A

Table 5: Summarized output from SAS/JMP for Stepwise regression and Neural Network.

Other combinations of input variables for the Neural Network were tested, but none of them performed better on validation data, than the Neural Network with 21 Google searches as input variables.

5.2 Model results: Stepwise regression

SSE: 9.314887
DFE: 112
RMSE: 0.2883897
RSquare: 0.9190
RSquare Adj: 0.8995
Cp: 2.2394695
p: 28
AICc: 91.71697
BIC: 161.2064
RSquare K-Fold: 0.8721

Figure 6: SAS JMP output, Multiple Regression on Log of Sales, with 21 Google searches as input variables.

Figure 7: SAS JMP output, Actual by Predicted plot for Multiple Regression on Log of Sales, with 21 Google searches as input variables (Log sales Predicted: RMSE=0.2832, RSq=0.92, PValue<.0001).

Figure 8: SAS JMP output, Residual by Predicted plot for Multiple Regression on Log of Sales, with 21 Google searches as input variables.

5.3 Model results: Neural Network

Measures          Training      Validation
RSquare           0.9464885     0.9567919
RMSE              0.2115284     0.1814072
Mean Abs Dev      0.1552189     0.1511751
-LogLikelihood    -15.05924     -8.066025
SSE               5.0113573     0.9214404
Sum Freq          112           28

Figure 9: SAS JMP output, Neural Network on Log of Sales, with 21 Google searches as input variables.

Figure 10: SAS JMP output, diagram for the Neural Network of Log Sales with 21 Google searches as input variables.

Figure 11: SAS JMP output, Residual by Predicted plot of Neural Network on Log of Sales (Training and Validation panels), with 21 Google searches as input variables.

Figure 12: SAS JMP output, Actual by Predicted plot of Neural Network on Log of Sales (Training and Validation panels), with 21 Google searches as input variables.

5.4 Model interpretation: Customer Journey

The 21 Google searches identified by the stepwise regression model are visualized in the preliminary Customer Journey model below.

T-minus 7 weeks: A cluster of customers starts their customer journey 7 weeks before arriving in Copenhagen, looking at both flights and other ways to travel to Copenhagen.

T-minus 6 weeks: A cluster of customers starts their customer journey 6 weeks before arriving in Copenhagen. Here we are closer to purchase, as the customer already starts looking at what can be explored in Copenhagen.

T-minus 4 weeks: A cluster of customers starts their customer journey 4 weeks before arriving in Copenhagen.

T-minus 3 weeks, T-minus 1 week and T, travel week (search terms shown include "Metro Copenhagen" and "Flight Copenhagen"):

• After looking at ways to travel to Copenhagen, many have already bought train/flight tickets here, and then customers start exploring Airbnb possibilities.

• This exploring of travel possibilities takes place both before and after customers have bought Airbnb, hostel or hotel.

• This exploring of travel possibilities takes place both before and after customers have bought Airbnb, hostel or hotel.

• A cluster of customers starts their customer journey 2 weeks before arriving in Copenhagen.

• After looking at ways to travel to Copenhagen, many have already bought train/flight tickets here, and then customers start exploring Airbnb possibilities.

• After looking at ways to travel to Copenhagen, many have already bought train/flight tickets here, and then customers start exploring hostel possibilities.

• Here we are closer to purchase, as the customer already starts looking at what can be explored in Copenhagen.

• After looking at ways to travel to Copenhagen, many have already bought train/flight tickets here, and then customers start exploring Airbnb possibilities.

• Potential hotel customers find a hotel much later in the customer journey, compared to hostel and Airbnb customers, who find it in better time.

• Customers who explore and maybe buy travel with a short timespan.

• Customers who explore and maybe buy Airbnb with a short timespan.

• Customers who explore and maybe buy hostel with a short timespan.

• Potential hotel customers find a hotel much later in the customer journey, compared to average hostel and Airbnb customers, who find it in better time.

• Customers start looking at what can be explored in Copenhagen just before or after they have arrived in Copenhagen. Customers start looking at how to use the Metro in Copenhagen just before or after they have arrived in Copenhagen.

• Customers who explore and maybe buy a flight with a short timespan.

Figure 13: Customer journey process. Summarized findings from significant Google searches in stepwise regression from JMP/SAS.

All Google searches belong to one of the phases in a Customer Journey model, and this explains why Airbnb-relevant Google searches have predictive power for Airbnb sales. The pattern in the timeline is:

1. explore and maybe purchase train or flight

2. explore and maybe purchase hostel, Airbnb or hotel

3. explore possibilities with Metro and sightseeing

6. Geospatial analysis

To analyse the dataset for geospatial patterns, the WGS84 latitude and longitude coordinates of the bookings are converted into the local UTM32 coordinate system and then aggregated to 100 x 100 meter grid cells in UTM32 for a normalized equal-area presentation.

The color maps are produced as 5 equal quantiles of 20% each to get the best visual representation of the geospatial distribution within the map.

The maps are produced in Jupyter from 2.6 million booking data rows in a SQL database.
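The aggregation and colouring steps can be sketched as below. The sketch assumes that coordinates have already been projected from WGS84 to UTM32 eastings and northings in meters (the projection step, done with a geospatial library in the original work, is omitted), and all function names and example coordinates are hypothetical.

```python
import math
from collections import defaultdict

CELL_SIZE_M = 100


def grid_cell(easting, northing, cell=CELL_SIZE_M):
    """Lower-left corner of the grid cell containing a UTM32 point."""
    return (math.floor(easting / cell) * cell,
            math.floor(northing / cell) * cell)


def aggregate_by_cell(points):
    """Sum a value (for example revenue) per grid cell.

    `points` is an iterable of (easting, northing, value) tuples.
    """
    totals = defaultdict(float)
    for easting, northing, value in points:
        totals[grid_cell(easting, northing)] += value
    return dict(totals)


def quintile_classes(cell_totals):
    """Rank cells by value and assign classes 0..4, each holding about 20% of the cells."""
    ranked = sorted(cell_totals, key=cell_totals.get)
    n = len(ranked)
    return {cell: min(4, i * 5 // n) for i, cell in enumerate(ranked)}
```

Replacing the sum with a count or a mean per cell would give the other map types shown in the figures, such as the number of properties or the average revenue per grid cell.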

Press service: Access to the geospatial maps in color and high resolution is available upon request. Please contact Niels Buus Lassen [email protected]


Figure 14: Output from Jupyter notebook. Avg. Revenue for 100 m x 100 m grid cells.

In figure 14, it can be seen that the highest revenues are in the center of Copenhagen.


Figure 15: Output from Jupyter Notebook. Reservation days per 100 m x 100 m grid cell.

In figure 15, it can be seen that the highest levels of reservation days are in the center, Vesterbro, Nørrebro, Østerport and the parts of Amager close to the center.


Figure 16: Output from Jupyter Notebook. Price per guest on 100 m x 100 m grid.

In figure 16, it can be seen that the highest price per guest is found in a slightly broader central area than in figure 14, and also in Islands Brygge.


Figure 17: Output from Jupyter Notebook. Number of properties per 100 m x 100 m grid cell.

In figure 17, it can be seen that the highest numbers of properties per grid cell are in the center, Vesterbro, Nørrebro, Østerport and the parts of Amager close to the center. The pattern is quite similar to figure 15, which shows the level of reservation days. A figure showing the number of guests per grid cell is also very similar to figure 17.


Figure 18: Output from Jupyter Notebook. Sum revenue per 100 m x 100 m grid.

In figure 18, it can be seen that the highest levels of revenue per grid cell are in the center, Vesterbro, Nørrebro, Østerport and the parts of Amager close to the center. The pattern is quite similar to figure 15, which shows the level of reservation days, and to figure 17, which shows properties per grid cell.


7. Conclusion

We have shown that relevant Google searches have significant predictive power for Airbnb sales in a Customer Journey context. The predictive power of the Google searches was shown in both a stepwise regression and a neural network model, and the two models were compared with respect to modelling and predicting Airbnb sales in Copenhagen from relevant Google searches. Stepwise regression had an R-square of 0.87 and the neural network 0.96 on 20% hold-out data in 5-fold cross-validation.
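To make the evaluation scheme concrete, the sketch below computes the hold-out R-square in a 5-fold cross-validation, where each fold holds out 20% of the data. It is an illustration only: it fits a single-predictor least-squares line on synthetic data rather than the stepwise regression or neural network from JMP/SAS, and all names and numbers are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(ys, preds):
    """1 - SS_res / SS_tot, evaluated on the data passed in."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def kfold_holdout_r2(xs, ys, k=5, seed=0):
    """Return the hold-out R-square of each of the k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in held_out]
        scores.append(r_squared([ys[i] for i in held_out], preds))
    return scores


# Synthetic, noise-free data: a hypothetical search index that determines sales exactly.
search_index = [float(i) for i in range(50)]
sales = [10.0 + 3.0 * x for x in search_index]
fold_scores = kfold_holdout_r2(search_index, sales)
```

The model is always fitted on the four training folds and scored on the fifth, so the reported R-square measures predictive rather than in-sample fit.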

By aggregating 2.6 million Airbnb reservation days into 100 m x 100 m grid cells covering Copenhagen, we also gained new insight into the patterns in the Airbnb data for Copenhagen. As in many other cities, a central location is preferred, and the price and average revenue per property are highest in central Copenhagen. However, the areas just outside the center, Vesterbro, Nørrebro, Østerport and Islands Brygge (the "bro" areas), have more booked days and a higher density of bookable properties per grid cell than the center, resulting in a higher total capacity than the center. The result is that the revenue per grid cell is as high in the bro areas as in central Copenhagen.

References

[1] AIRBNB, About us. Accessed 18-12-2016. https://www.airbnb.dk/about/about-us

[2] Statista, AIRBNB, 2016. Accessed 18-12-2016. https://www.statista.com/topics/2273/airbnb/

[3] Skat.dk. Accessed 18-12-2016. https://www.skat.dk/SKAT.aspx?oId=1615284

[4] Frechtling, D 1996, Practical tourism forecasting, Elsevier. -- 2001, Forecasting tourism demand: methods and strategies, Butterworth-Heinemann.

[5] Song, H, Witt, SF & Li, G 2009, Travelers' Use of internet, 2009 Edition, Travel Industry Association of America, Washington D.C.

[6] Mihir Rajopadhye et al. Forecasting uncertain hotel room demand, Information Sciences, volume 132, issues 1-4, pages 1-11 (2001). Accessed 18-12-2016. http://www.sciencedirect.com/science/article/pii/S0020025500000827

[7] Roberto Rivera. A dynamic linear model to forecast hotel registrations in Puerto Rico using Google Trends data (2016). Accessed 18-12-2016. https://arxiv.org/pdf/1512.08097.pdf

[8] Pan et al.: Bing Pan, Chenguang Doris, Haiyan Song. Forecasting Hotel Room Demand Using Search Engine Data. 2012. Accessed 18-12-2016. https://pdfs.semanticscholar.org/3dce/ecd911bb8abb2464a70b8db2f0f609c283e4.pdf

[9] Ginsberg et al.: Ginsberg, J, Mohebbi, MH, Patel, RS, Brammer, L, Smolinski, MS & Brilliant, L 2009, 'Detecting influenza epidemics using search engine query data', Nature, vol. 457, no. 7232, pp. 1012-4.

[10] G. Zervas, D. Proserpio, J. W. Byers, The Rise of the Sharing Economy: Estimating the Impact of Airbnb on the Hotel Industry (2016). http://people.bu.edu/zg/publications/airbnb.pdf

[11] AIRDNA: Airbnb dataset from Copenhagen. https://www.airdna.com/

[12] GOOGLE TREND SEARCH: https://trends.google.dk/trends/explore?date=2014-09-01%202017-06-30&q=airbnb%20copenhagen,Hotel%20copenhagen,hostel%20copenhagen

[13] Visit Denmark, Status på turisternes overnatninger i Danmark 2016, 2016. Accessed 18-12-2016. http://www.visitdenmark.dk/da/analyse/turisternes-overnatninger-i-danmark-0

[14] Torsten Schmidt & Simeon Vosen, Forecasting Private Consumption: Survey-based Indicators vs. Google Trends, Ruhr Economic Papers #155, 2009. Accessed 18-12-2016. https://core.ac.uk/download/pdf/6326969.pdf

[15] eMarketer, Most Travelers Use Search Engines When Planning a Trip, 2016. Accessed 18-12-2016. https://www.emarketer.com/Article/Most-Travelers-Use-Search-Engines-Planning-Trip/1013745

[16] Net Market Share, Desktop Search Engine Market Share, 2016. Accessed 18-12-2016. https://www.netmarketshare.com/search-engine-market-share.aspx?qprid=4&qpcustomd=0

[17] Software; Alteryx Designer, https://www.alteryx.com/

[18] Software; Jupyter, http://jupyter.org/

[19] Python Library; Scitools, https://scitools.com/

[20] Python Library; Osgeo, http://www.osgeo.org/

[21] Python Library; Shapely, https://pypi.python.org/pypi/Shapely

[22] Python Library; Matplotlib, https://matplotlib.org/

[23] Software; Tableau, https://www.tableau.com/

[24] Software; SAS/JMP, https://www.jmp.com/en_dk/home.html


The classmate effect in PISA

Cand. stat. Hans Bay e-mail: [email protected]

Abstract: It is well known that students' academic performance depends on their socio-economic background. But the socio-economic background of a student's classmates, calculated here as an average, also helps to explain the individual student's academic performance. It will furthermore be demonstrated that if the classmate effect is left out of models involving the schools' teaching effect, the estimate of the teaching effect becomes biased.

The effectiveness of the country's primary and lower secondary schools has been a politically hot topic in recent years. This has led to discussions among the country's research institutions and ministries about how the so-called teaching effect should be measured. The disagreement concerns whether the other students' socio-economic background, the so-called classmate effect, should be included. In what follows it is argued that the classmate effect should be included, because it has a large influence on the estimation of the teaching effect. The argument takes the latest PISA results as its starting point; the much-debated PISA surveys shed light on social inheritance in Denmark and in all participating countries to an unprecedented extent.

The article sets up models that calculate the teaching effect from the latest PISA results. In one model the classmate effect is omitted. This shows that rankings based on a model that does not account for the classmate effect are biased: schools with a preponderance of students with high socio-economic background variables are assigned too "large" a teaching effect, and vice versa. Schools are thus ranked by the composition of their student body and not by what the school manages to teach its students.

Data

In spring 2015 the sixth international PISA [i] survey was carried out. In Denmark 7,161 students took part. All of these students were given science questions, and the science score (score_n) has therefore been chosen as the variable to be analysed. The explanatory variables are the student's sex, the student's ethnicity, the student's socio-economic index, and the classmate effect, which is constructed as the school's socio-economic average. The PISA consortium's weights (including replication weights) are also used.


Table 1. Overview of notation etc.

$K$: The number of schools (333 schools took part in the Danish part of the survey in 2015).

$n_j$: The number of students at school no. $j$.

$\sum_{j=1}^{K} n_j$: The number of students who took part in the survey. Denmark had 7,161 students in the survey in 2015.

$\mathrm{score}_{ij}$: PISA score for the $i$-th student at school no. $j$. Back in 2000 the scores were constructed so that, for the OECD as a whole, they were normally distributed with mean 500 and standard deviation 100. The science score for Denmark in 2015 was 501.9.

$\mathrm{ESCS}_{ij}$: ESCS for the $i$-th student at school no. $j$. ESCS = index of Economic, Social and Cultural Status. It denotes the student's socio-economic background. For the OECD area ESCS follows the standard normal distribution.

$\mathrm{S\_score}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} \mathrm{score}_{ij}$: The average PISA score at school no. $j$.

$\mathrm{S\_ESCS}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} \mathrm{ESCS}_{ij}$: The average social capital at school no. $j$, also called the classmate effect.

$\mathrm{sex}_{ij}$: Dummy variable: girl and boy.

$\mathrm{etnicitet}_{ij}$: Dummy variable: Danish origin and other ethnic origin (the IMMIG variable from PISA is used).

$T_j$: Assumed to be $N(0, \rho^2)$. Represents the teaching effect for school no. $j$.

$E_{ij}$: Assumed to be $N(0, \sigma^2)$. Represents the usual residual term.

$\rho^2$: The variance between the schools.

$\sigma^2$: The variance of the residual term.

$ICC = \frac{\rho^2}{\rho^2 + \sigma^2}$: Intra-class coefficient. A measure of how large the teaching effect is.

$r_{ij}$: Residual. The deviation between what the student achieved in the test and the expected score based on the model used.

$\bar{r}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} r_{ij}$: Another measure of the teaching effect, here called the school measure (skolemål). The average deviation for school no. $j$. If $\bar{r}_j$ is positive, school no. $j$ achieves better scores than expected, and vice versa.
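Two of the quantities in the notation table, the school average of a student-level variable and the intra-class coefficient, can be illustrated with a short sketch. The helper names and the sample ESCS values are hypothetical; the two variance components are the ones reported for the full model in the paper's table of estimates.

```python
from collections import defaultdict


def school_means(records):
    """Average a student-level value per school.

    `records` is an iterable of (school_id, value) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for school_id, value in records:
        totals[school_id][0] += value
        totals[school_id][1] += 1
    return {school: total / n for school, (total, n) in totals.items()}


def icc(between_school_variance, residual_variance):
    """Intra-class coefficient rho^2 / (rho^2 + sigma^2)."""
    return between_school_variance / (between_school_variance + residual_variance)


# Hypothetical ESCS values: three students at school 1 and two at school 2.
escs_records = [(1, 0.2), (1, 0.6), (1, 1.0), (2, -0.4), (2, 0.0)]
s_escs = school_means(escs_records)  # classmate effect per school

# Variance components reported for the full model: rho^2 = 544.7, sigma^2 = 5664.9.
icc_full = icc(544.7, 5664.9)
```

Applied to the reported variance components, the ratio reproduces the ICC of about 0.088 given for the full model.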


A very important background variable in the PISA context is ESCS (index of Economic, Social and Cultural Status). The index reflects a number of aspects of the individual student's family and home background, combining information on the parents' level of education, occupational status and various types of possessions in the home (this is described in all PISA reports since 2000). The index is constructed so that the average for all OECD countries equals zero, with a standard deviation of 1. The index can furthermore be described by a normal distribution. (This is because the principal component method was used to construct the index.)

The average science score for Denmark was 501.9, which is slightly above the OECD average, originally set to 500 (in 2000). The science score (like the scores of the other domains) is constructed so that the average for all OECD countries is of the order of 500, with a standard deviation of 100. The figure below shows the distribution for the 7,161 Danish students.

Fig. 1. Distribution of the science score for 7,161 students in Denmark, 2015.

[Histogram: Distribution of score_N. Horizontal axis: science score (naturfagsscore), 190 to 790. Fitted curve: Normal(Mu=486.44, Sigma=91.096).]

Note: The mean is 486 because no weights were used when drawing this figure.


The ESCS distribution for Denmark is shown below.

Fig. 2. ESCS distribution in Denmark, based on the responses of 6,985 students.

[Histogram: Distribution of ESCS. Horizontal axis: Index of economic, social and cultural status (WLE), -6.45 to 3.15. Fitted curve: Normal(Mu=0.4469, Sigma=0.9495).]

Note: The mean for the whole OECD is 0, so Denmark lies above the average. Again no weights were used, which is why the mean above is too small (see later). It is somewhat surprising how skewed the distribution is.

Table 2. Science score and ESCS by ethnicity and sex.

                        number   score_n    ESCS
Danish origin            5,411     509.5   0.654
Other ethnic origin      1,750     439.6   0.048
Girls                    3,602     498.9   0.577
Boys                     3,559     504.6   0.598
Total                    7,161     501.9   0.587

Note: Imputed data are used, since not all students completed the background questionnaire. Further details are given in the appendix.

Ever since the first PISA report, from the survey in 2000, PISA has put a marked focus on how strongly socio-economic background (ESCS [ii]) influences students' grades in primary and lower secondary school. Socio-economic background is an unavoidable factor in analyses of students' grades. The figure below shows the relationship between the Danish students' science score and ESCS.

Fig. 3. Relationship between science score and ESCS for 6,985 students.

[Scatter plot with fitted line: Fit Plot for score_N. Vertical axis: score_N, 200 to 800. Horizontal axis: Index of economic, social and cultural status (WLE), -6 to 2. Observations 6985, Parameters 2, Error DF 6983, MSE 7113.5, R-Square 0.1378, Adj R-Square 0.1377.]

Note: For graphical reasons, classical regression analysis without weights is used. With weights the coefficient of determination would be 11.32% [iii].

The classmate effect (S_ESCS)

The classmate effect is calculated here as the average ESCS of the school's students [iv]. There has been some debate about whether the classmate effect should be included when measuring or assessing the schools' teaching effect. Back in 2011 there was a discussion between the Ministry of Education (Undervisningsministeriet) and the then KREVI (cf. KREVI2011). The two institutions produced different rankings of the schools' teaching effects. However, they used different models, and one important difference was that the Ministry, unlike KREVI, had not included the classmate effect. Allerup2016 [v] states that "The class's average socio-economic status has a significant association with the class's result deviation. The higher the class's socio-economic status, the larger the class's result deviation in the positive direction (page 7)".

Fig. 4. Distribution of the classmate effect (the average of the students' ESCS) for the 333 schools.

[Histogram: Distribution of S_ESCS. Horizontal axis: classmate effect (klassekammerateffekt), -0.75 to 1.5. Fitted curve: Normal(Mu=0.4266, Sigma=0.4645).]

Teaching effect

In a number of articles the teaching effect for school no. j is calculated as $\bar{r}_j$ (see Table 1). Here $\bar{r}_j$ expresses the average of the students' deviations from the model used. So if $\bar{r}_j$ is positive, the students on average score better than expected, and vice versa. This measure of the teaching effect has been used by KREVI2011 [vi], Bay2013 [vii] and CEPOS2015 [viii]. The Ministry of Education, through Styrelsen for Undervisning og Kvalitet, also prepares socio-economic references for primary and lower secondary school grades [ix]. VIVE2017 [x] has further developed the model that was used by KREVI and now uses a model which assumes that the school effect is normally distributed around 0. VIVE's model is thus analogous to the model below.


Model used in this note:

$$\mathrm{score}_{ij} = \alpha + \beta_1\,\mathrm{sex}_{ij} + \beta_2\,\mathrm{etnicitet}_{ij} + \beta_3\,\mathrm{ESCS}_{ij} + \beta_4\,\mathrm{S\_ESCS}_j + \beta_5\,(\mathrm{etnicitet} \cdot \mathrm{ESCS})_{ij} + T_j + E_{ij}$$

$i = 1, 2, 3, \ldots, n_j$ (the number of students per school)

$j = 1, 2, 3, \ldots, K$; here $K = 333$, which is the number of schools.

$T_j \sim N(0, \rho^2)$ represents the teaching effect of the 333 schools.

$E_{ij} \sim N(0, \sigma^2)$ represents the residual term.

etnicitet (ethnicity) and sex are dummy variables. See also Table 1.

The model above is also applied without the classmate effect (S_ESCS).

Table 3. Estimates (standard errors in parentheses)

Estimates                                   Full model       Classmate effect omitted
Intercept                                   428.9 (3.6)      443.8 (3.3)
Girl                                        -5.5 (1.8)       -5.3 (1.8)
Boy                                         0                0
Danish ethnic origin                        44.6 (3.1)       47.58 (3.1)
Other ethnic origin                         0                0
ESCS                                        11.6 (2.7)       14.4 (2.7)
Classmate effect (S_ESCS)                   35.3 (4.2)       -
ESCS*Danish                                 14.2 (2.9)       13.3 (2.9)
ESCS*other                                  0                0
ρ²                                          544.7            688.4
σ²                                          5664.9           5679.4
ICC = ρ²/(ρ² + σ²)                          0.088            0.108
Correlation between the classmate effect
(S_ESCS) and the teaching effect            0                0.36 (P<0.0001)
Correlation between the classmate effect
(S_ESCS) and the school measure             0.05 (P=0.33)    0.48 (P<0.0001)


The teaching effect calculated as the ICC, which here is of the order of 0.09, is small by international standards [xi]. A modest teaching effect among the country's schools should prompt reflection on how meaningful it is to draw up rankings of the schools. In the full model, both measures of the teaching effect are uncorrelated with the classmate effect, and in that sense unbiased. In the model where the classmate effect is not included, both measures of the teaching effect are correlated with the classmate effect, and both measures are therefore biased.
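The mechanism can be illustrated with a small simulation. This is a sketch under loudly stated assumptions, not a reproduction of the paper's analysis: it uses ordinary least squares without weights instead of the mixed model, it looks only at the school measure (the school mean of the residuals), and all parameter values are invented for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def corr(x, y):
    return cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5


def simulate(n_schools=200, n_students=20, seed=1):
    """Simulate students in schools; all parameter values are illustrative."""
    rng = random.Random(seed)
    school, escs, s_escs, score = [], [], [], []
    for j in range(n_schools):
        level = rng.gauss(0.4, 0.45)      # socio-economic level of the school
        teaching = rng.gauss(0.0, 23.0)   # true teaching effect T_j
        students = [level + rng.gauss(0.0, 0.85) for _ in range(n_students)]
        peer = mean(students)             # classmate effect S_ESCS_j
        for e in students:
            school.append(j)
            escs.append(e)
            s_escs.append(peer)
            score.append(430 + 26 * e + 35 * peer + teaching + rng.gauss(0.0, 75.0))
    return school, escs, s_escs, score


def school_means(values, school, n_schools):
    sums, counts = [0.0] * n_schools, [0] * n_schools
    for v, j in zip(values, school):
        sums[j] += v
        counts[j] += 1
    return [s / c for s, c in zip(sums, counts)]


school, escs, s_escs, score = simulate()
n_schools = max(school) + 1
peer = school_means(s_escs, school, n_schools)

# Restricted model: score ~ ESCS (classmate effect omitted).
b = cov(escs, score) / cov(escs, escs)
a = mean(score) - b * mean(escs)
res_restricted = [y - (a + b * e) for y, e in zip(score, escs)]

# Full model: score ~ ESCS + S_ESCS. The within-school deviation d = ESCS - S_ESCS
# is orthogonal to S_ESCS and to the constant, so the least-squares fit splits
# into two simple regressions.
d = [e - s for e, s in zip(escs, s_escs)]
b_d = cov(d, score) / cov(d, d)
b_s = cov(s_escs, score) / cov(s_escs, s_escs)
a_full = mean(score) - b_s * mean(s_escs)
res_full = [y - (a_full + b_d * di + b_s * s) for y, di, s in zip(score, d, s_escs)]

corr_restricted = corr(school_means(res_restricted, school, n_schools), peer)
corr_full = corr(school_means(res_full, school, n_schools), peer)
```

In this setup `corr_full` is zero up to rounding error, because least-squares residuals are orthogonal to every regressor in the model, whereas `corr_restricted` is clearly positive: the omitted classmate effect ends up in the school measure.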

The schools' estimated teaching effects (using $T_j$) are plotted in Fig. 5.

Fig. 5. Estimates of the teaching effect of the 333 schools.

[Histogram: Distribution of teaching effect (undervisningseffekt). Horizontal axis: teaching effect, -100 to 80. Fitted curve: Normal(Mu=0, Sigma=20.02).]

The estimated teaching effects follow a reasonably neat normal distribution, which was also an assumption of the model used.

Below, the estimated teaching effects are plotted against the classmate effect.


Fig. 6. Teaching effect against the classmate effect.

[Scatter plot with fitted line, 95% confidence limits and 95% prediction limits: Fit Plot for undervisningseffekt. Vertical axis: teaching effect, -100 to 50. Horizontal axis: classmate effect (klassekammerateffekt), -1 to 1.5. Observations 333, Parameters 2, Error DF 331, MSE 318.45, R-Square 397E-7, Adj R-Square -0.003.]

So here the teaching effect is estimated uncorrelated with the classmate effect, and the measure of the teaching effect is therefore unbiased with respect to the classmate effect.


Fig. 7. Teaching effect against the classmate effect for the model in which the classmate effect is not included.

[Scatter plot with fitted line, 95% confidence limits and 95% prediction limits: Fit Plot for undervisningseffekt. Vertical axis: teaching effect, -100 to 100. Horizontal axis: classmate effect (klassekammerateffekt), -1 to 1.5. Observations 333, Parameters 2, Error DF 331, MSE 375.14, R-Square 0.1356, Adj R-Square 0.133.]

This shows that the measure of the teaching effect is correlated with the classmate effect and is therefore a biased estimate.

Conclusion

If the classmate effect is omitted, the teaching effect becomes correlated with the classmate effect. Schools with a preponderance of students with high socio-economic background values are thereby characterised as schools with a high teaching effect.


Appendix

Imputation:

All 7,161 students answered questions on science, so all students have a science score. Not all students completed the background questionnaire handed out at the test. As a result, 60 students lack only the ESCS information, 78 lack only the information on ethnicity, and 116 students lack both ESCS and ethnicity. In total 254 students have missing values. The Danish PISA consortium has recommended the use of imputation [xii], so the table below shows both the original values and the imputed values.
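The counts in this paragraph can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those stated in the text.

```python
TOTAL_STUDENTS = 7161
missing_escs_only = 60
missing_ethnicity_only = 78
missing_both = 116

# Students with at least one missing background value.
missing_any = missing_escs_only + missing_ethnicity_only + missing_both

# Students with complete background information.
complete_cases = TOTAL_STUDENTS - missing_any

# Students for whom ESCS, respectively ethnicity, is observed.
with_escs = TOTAL_STUDENTS - missing_escs_only - missing_both
with_ethnicity = TOTAL_STUDENTS - missing_ethnicity_only - missing_both
```

The results (254, 6,907, 6,985 and 6,967) match the sample sizes that appear in the imputation overview and in Fig. 2 and Fig. 3.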

Table 2. Overview of imputation for ESCS and ethnicity.

                       Before imputation    Before imputation    After imputation
                       number   score_n     number    ESCS       number   score_n    ESCS
Danish origin           5,281     510.1      5,264   0.657        5,411     509.5   0.654
Other ethnic origin     1,686     441.6      1,643   0.049        1,750     439.6   0.048
Combined                6,967     502.8      6,907   0.593        7,161     501.9   0.587
Girls                   3,602     498.9      3,530   0.581        3,602     498.9   0.577
Boys                    3,559     504.6      3,455   0.604        3,559     504.6   0.598
Total                   7,161     501.9      6,985   0.593        7,161     501.9   0.587

Note: Own calculations on the Danish PISA dataset. The MI procedure (from SAS) was used with only one imputed value, which thus replaces the missing value.

It can be seen that ESCS differs greatly by ethnicity. Students of Danish origin have an average ESCS above 0.60, whereas students of other ethnic origin have an ESCS of the order of 0.05. And although the changes are relatively modest, one should be aware that the average ESCS falls when imputation is used. This is because it is students with modest ESCS values who have not completed the background questionnaire used to determine socio-economic background. It is somewhat surprising that there are more girls than boys in the sample. This could indicate that boys find it easier than girls to avoid taking part in the test.

Assessing the effect of imputation

The model is applied to the imputed dataset (Model A), to the original dataset (N=6,907) (Model B), and finally to the imputed dataset but excluding the classmate effect (S_ESCS) (Model C). The estimates are given in the table below.

Table 3. Estimates (standard errors in parentheses)

                                        Model A          Model B          Model C
                                        N=7,161          N=6,907          N=7,161
Estimates                               Imputed data     Original data    Imputed data
Intercept                               428.9 (3.6)      433.7 (3.7)      443.8 (3.3)
Girl                                    -5.5 (1.8)       -6.9 (1.8)       -5.3 (1.8)
Boy                                     0                0                0
Danish ethnic origin                    44.6 (3.1)       43.7 (3.3)       47.58 (3.1)
Other ethnic origin                     0                0                0
ESCS                                    11.6 (2.7)       11.4 (2.7)       14.4 (2.7)
Classmate effect (S_ESCS)               35.3 (4.2)       33.0 (4.2)       -
ESCS*Danish                             14.2 (2.9)       14.4 (3.0)       13.3 (2.9)
ESCS*other                              0                0                0
ρ²                                      544.7            562.8            688.4
σ²                                      5664.9           5678.2           5679.4
ICC = ρ²/(ρ² + σ²)                      0.088            0.090            0.108
Correlation between the classmate
effect (S_ESCS) and the teaching
effect                                  0                -                0.36 (P<0.0001)
Correlation between the classmate
effect (S_ESCS) and the school
measure                                 0.05 (P=0.33)    -                0.48 (P<0.0001)

There is no great difference between applying the model to the original dataset (and thereby having some missing values) and applying it to the imputed dataset.


The model specified by sex and ethnicity:

Danish, boy:

Score = 428.9 + 44.6 + (11.6 + 14.2)*ESCS + 35.3*S_ESCS = 473.5 + 25.8*ESCS + 35.3*S_ESCS

Danish, girl:

Score = 428.9 + 44.6 - 5.5 + (11.6 + 14.2)*ESCS + 35.3*S_ESCS = 468.0 + 25.8*ESCS + 35.3*S_ESCS

Other ethnic origin, boy:

Score = 428.9 + 11.6*ESCS + 35.3*S_ESCS

Other ethnic origin, girl:

Score = 428.9 - 5.5 + 11.6*ESCS + 35.3*S_ESCS = 423.4 + 11.6*ESCS + 35.3*S_ESCS
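The four group-specific equations above can be collected in one function. This is a sketch of the fixed part of the full model only (the school effect and the residual are left out); the function name and argument names are illustrative, and the coefficients are the estimates reported for the full model.

```python
def expected_score(escs, s_escs, danish, girl):
    """Expected science score from the fixed part of the full model."""
    score = 428.9 + 11.6 * escs + 35.3 * s_escs   # reference group: other origin, boy
    if danish:
        score += 44.6 + 14.2 * escs               # main effect plus ESCS interaction
    if girl:
        score -= 5.5
    return score


# Intercepts of the four groups (ESCS = 0 and S_ESCS = 0).
danish_boy = expected_score(0.0, 0.0, danish=True, girl=False)
danish_girl = expected_score(0.0, 0.0, danish=True, girl=True)
other_boy = expected_score(0.0, 0.0, danish=False, girl=False)
other_girl = expected_score(0.0, 0.0, danish=False, girl=True)
```

Increasing ESCS by one unit raises the expected score by 25.8 points for a student of Danish origin and by 11.6 points for a student of other ethnic origin, with the classmate effect held fixed.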

One consequence of the model is that ESCS matters a great deal for Danish students and somewhat less for students of other ethnic origin. This is also discussed in Henningsen2017 [xiii]: "... which can be translated as the negative social inheritance being weaker for students with a non-ethnic-Danish background than for ethnically Danish students (page 74)".

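Analyses based on the schools

Model (1)

$$\mathrm{score}_{ij} = \alpha + \beta\,\mathrm{ESCS}_{ij} + \gamma\,\mathrm{S\_ESCS}_j$$

$i = 1, 2, \ldots, n_j$, where $n_j$ is the number of students at school no. $j$

$j = 1, 2, 3, \ldots, K$

Summing Model (1) over the students at school no. $j$ and dividing by $n_j$ gives

$$\mathrm{S\_score}_j = \alpha + \beta\,\mathrm{S\_ESCS}_j + \gamma\,\mathrm{S\_ESCS}_j$$

Model (2)

$$\mathrm{S\_score}_j = \alpha + (\beta + \gamma)\,\mathrm{S\_ESCS}_j$$

$j = 1, 2, 3, \ldots, K$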

Table I. Estimates from Model (1)

Estimated regression coefficients. N = 7,161, R-square = 0.1380 (cf. Fig. 3)

Parameter        Estimate   Standard error   t value   Pr > |t|
Intercept          464.17             3.12    148.56     <.0001
ESCS                26.39             1.25     21.12     <.0001
S_ESCS              37.91             4.95      7.65     <.0001
ESCS + S_ESCS       64.30

Note: The SURVEYREG procedure is used, on the imputed dataset.

Table II. Estimates from Model (2)

Parameter estimates. N = 333 (the number of schools), R-square = 0.5141

Variable    Label              DF   Parameter estimate   Standard error   t value   Pr > |t|
Intercept   Intercept           1               455.53             2.51    181.27     <.0001
S_ESCS      classmate effect    1                73.53             3.93     18.71     <.0001

Note: Classical regression without weights is used.


Notes

[i] PISA stands for Programme for International Student Assessment. Data can be downloaded from www.pisa.oecd.org

[ii] The construction of ESCS must be called a great success. With a limited number of questions to the student, an index has been constructed that is significantly correlated with all scores (reading, mathematics and science) for all countries in all PISA rounds since 2000.

[iii] The report "Danske unge i en international sammenligning", Vibeke T. Christensen (ed.), KORA 2016, notes that: "In Denmark 10.4% of the variation is explained by the ESCS index" (page 117).

[iv] The PISA surveys contain no variable that tells which class the student attends. The classmate effect is therefore calculated for the whole school. Normally at most 28 students per school take part in the survey.

[v] Allerup, Torre & Hetmar 2016: Hvad betyder klassen? Danmarks institut for Pædagogik og Uddannelse.

[vi] KREVI2011: Om faglig skolekvalitet ifølge KREVI og Undervisningsministeriet. August 2011.

[vii] Bay 2013. Udvikling i PISA for de tre skandinaviske lande. Nationaløkonomisk Tidsskrift 151, 2013, 247-258.

[viii] CEPOS2015: Klassekammerateffekten og inklusion i folkeskolen. Arbejdspapir, marts 2015.

[ix] De socioøkonomiske referencer for grundskolekarakterer 2015 (originally prepared by Styrelsen for IT og Læring). The statistical appendix states: "The statistical model used is a so-called multilevel model: $Y_{ij} = X_{ij}\beta + u_j + e_{ij}$, where $Y_{ij}$ is the grade for student i at institution j, $X_{ij}$ are the student's background variables and $\beta$ the associated parameter estimates, $u_j$ is the variance component corresponding to the variation between the schools, and $e_{ij}$ is the residual corresponding to the variation between the students at the school." In another note (171207-Soc-ref-laesevejledning) the Ministry of Education writes: "The model was developed by the Ministry of Education and is based on research-based and recognised methods for statistical correction." To this one can only say that research-based and recognised methods also cover whether the estimates are biased (my comment).

[x] Karakterer- og prøveeffekter i de danske folkeskoler 2014-2016. VIVE 2017.

[xi] PISA 2015 Results, Volume I, page 227. The book can be downloaded from the OECD website.

[xii] Fakta om metoden multipel imputation. DPU, KORA (now VIVE) and Danmarks Statistik; see also https://www.kora.dk/udgivelser/udgivelse/i14766/FAKTA-om-PISA

[xiii] Henningsen & Allerup: PISA - matematik. Forlaget Matematik 2017.


The potential of the collection of register data

Lea Sztuk Haahr, Steen Andersen and Bodil Stenvig, Rigsarkivet (the Danish National Archives)

This article briefly reviews the different kinds of data that Rigsarkivet preserves and works to bring into play in present and future research.

Rigsarkivet aims to be our shared societal memory and today has close ties to the historical disciplines in particular. Rigsarkivet preserves all written material judged to be of value under section 4 of the Danish Archives Act (arkivloven), which lays down the purpose of Rigsarkivet:

1) "To ensure the preservation of records that have historical value or serve to document matters of significant administrative or legal importance to citizens and authorities (Arkivloven § 4, stk. 1).

2) To make records available to citizens and authorities, including for research purposes." (Arkivloven § 4, stk. 3).

Views on what contributes to historical value may differ by discipline. A social scientist and a health scientist do not necessarily agree on how today's society is best described 200 years from now.

For a social scientist, political analyses, voter surveys and broad descriptions of society will be crucial for describing society's political development. The Social Democratic government from 1993 held power for 8 years and was replaced in 2001 by the blue bloc, which then held power for 10 years; the Thorning government, by contrast, held power for only 4 years. What triggered the voters' shift in preferences? Is the economy a variable that can be decisive for voters when they cast their vote in parliamentary elections? Or are they more value-oriented? The election surveys can give a picture of this and help to explain the political shifts over the years.

Health science can describe the state of society in terms of, for example, which disease-related problems affected society in a given period.

The increased research into fertility from the 1980s onwards was due to the Danes' declining fertility, a societal problem that grew so large that researchers received increased grants to focus on its causes. Fertility problems would not necessarily have been described from a social science perspective, but they were, and still are, a societal problem.

Some of the first large research studies that Rigsarkivet received from health research were related to the working environment. The occasion was an increased focus on Danes' safety at the workplace, prompted by, among other things, asbestos cases and chemical cases. The focus adopted in health science research can help to describe problems in the population, and it is therefore just as important in the description of society.

The full picture of Denmark anno 2017, seen 200 years from now, will have to be described historically not only through classic historical records such as the classic population registers (CPR, church registers, etc.), but equally through the history of science from many different scientific fields.

Survey data in Rigsarkivet

Rigsarkivet has a unique collection of survey data that goes back to 1953 and contains about 3,000 datasets, mainly from the social and health sciences. These include the election surveys, which go all the way back to 1973. The latest edition is from 2015 and has 4,147 respondents and 400 variables, covering for example the degree of political interest and attitudes to tax policy, immigration, etc. (Studiebeskrivelse for Valgundersøgelsen 2015). It may be added that identical questionnaires exist for the election surveys for both parliamentary and local elections.

Election data are generally the most requested data in Rigsarkivet's collection of survey data and were, for example, recently used in Russell J. Dalton's publication "Political Equality as the Foundation of Democracy" (Dalton, 2017: 1).

The values surveys (Værdiundersøgelserne) began in 1981 and were repeated until 2008 in European countries, and they make it possible, among other things, to carry out comparative studies over time. "The values surveys cover a broad spectrum of values but concentrate in particular on values relating to religion, work, politics, family and overall morality." (Studiebeskrivelse den danske værdiundersøgelse, 2008)

All data are documented by extensive metadata, which include study descriptions, method descriptions, the period covered and the original questions. This secures validity in the use of secondary data, since one of the usual drawbacks is missing or poor metadata, as the following quotation shows: "The danger is that collection methods, question wording and poor documentation of the secondary data prevent data from being collected across studies and over time." (Bøgh Andersen, Møller Hansen, et al. 2012: 295).

Searches are made via Rigsarkivet's website, https://www.sa.dk/da/brug-arkivet/dda/dda-soegeservice/, and data can also be ordered from there.

As a result of cooperation between Danmarks Statistik (Statistics Denmark) and Rigsarkivet, work is under way to make it possible for Rigsarkivet to transfer data over a secure connection to Danmarks Statistik's Forskerservice (Research Services). Researchers from institutions that are part of the researcher scheme (forskerordningen) can thereby access data in Forskerservice's secure environment. Access to data naturally requires that the researcher has permission from Datatilsynet (the Danish Data Protection Agency) to use data from Rigsarkivet in a research project.

Rigsarkivet also cooperates internationally in CESSDA ERIC, a consortium of research data archives in Europe that gives users access to international research data. The aim is to be able to support the next generation of researchers, wherever in Europe they are based. One of the initiatives is the European Language Social Science Thesaurus (ELSST), an online thesaurus for the social sciences available in 12 languages. ELSST can be used as a reference work, for example when searching for foreign studies through CESSDA ERIC's search portal. Searches can be made in Danish, and all international studies with the search term in question will then appear. The thesaurus itself works much like, for example, Google Translate, only with better-qualified results. It is currently found on the UK Data Service website: https://elsst.ukdataservice.ac.uk/.

Health science data [1]

Data from cohorts and clinical case-cohort studies are unique and valuable research data for health research. Several Danish research projects have established and followed cohorts over many years, and some are still being followed and used in specific projects. Among these are data from the project "Screening for kræft i tyktarm og endetarm, 1987-2002" (screening for colorectal cancer). The project has contributed to placing bowel cancer screening of persons aged 50-70 on the national screening programme in Denmark. The project is of international interest, and the effects after 30 years are currently being examined for the persons who were included in the different national cohorts (USA, England and Denmark). Data from the project were deposited with Rigsarkivet and processed in accordance with Rigsarkivet's standard for data and metadata quality in 2010. They have since been released several times for new research.

In addition, research data on the effects of breast cancer screening in Copenhagen and on Funen, among others, have been released to PhD projects at Aarhus University, and data material on "CNS-tumorer hos børn i Danmark, 1980-1996" (CNS tumours in children in Denmark) has been released for research at Rigshospitalet.

Many other valuable data materials from health research have been collected and released with Rigsarkivet as their base.

Project: health cards from school doctors, c. 1950-1980

Following a request from a researcher, Rigsarkivet has begun scanning health cards from Danish school doctors covering the period 1950-1980. Even before scanning began, interested researchers were invited to give their suggestions for possible future research based on the material. The latest initiative is a collaboration with interested researchers on structuring the metadata for the health cards, which has given the researchers a unique opportunity to shape the material during the process. The aim is that information from the health cards, combined with for example register data, can contribute new knowledge about children's development.

[1] Project manager Bodil Stenvig

Digitised and born-digital material [2]

Rigsarkivet has a very large collection of born-digital records, which is highly relevant for research. Born-digital records contain data created digitally in the IT systems of public authorities and private parties. When data from born-digital records are made available to archive users, they take the form of databases. Rigsarkivet has developed the program Sofia, which can be used to search them. Most of the data originate from IT systems that the authorities have used in their case handling. Born-digital records are submitted as archival versions (arkiveringsversioner), which contain a copy of the data but not the system in which they were created. The first were created in the early 1970s, as the authorities began to use IT systems for, for example, registering CPR numbers and collecting withholding tax. Since then the use of IT systems has spread to all areas of public administration. Rigsarkivet's collection of born-digital records reflects this development and is therefore also growing steadily.

Rigsarkivet's task is to preserve the born-digital records and ensure that they can be both read and used by posterity. For that reason Rigsarkivet has chosen to standardise the submission format. This means that the authority must extract the data from its IT system and submit them as an archival version, i.e. in a form that complies with the standards described in Executive Order (bekendtgørelse) no. 1007 (retsinformation.dk).

Examples of born-digital records are first and foremost ESDH systems (Elektroniske Sags- og Dokumenthåndteringssystemer, electronic case and document management systems). These systems have been used to register information on incoming and outgoing mail and to store the electronic documents that matter for the authorities' case handling.

A very important source for recent Danish history is Statens Centrale Regnskabssystem (the central government accounting system) from Økonomistyrelsen, which contains data on government spending broken down by account for the periods 1976-1991 and 1992-1997.

Other examples of important databases that are digital, and where earlier material has been digitised, include Klimadatabasen (the climate database), which contains the Danish Meteorological Institute's data on weather and wind collected in the period 1872-2004. Another central register is the cause-of-death registers (Dødsårsagsregistrene) from Statens Institut for Folkesundhed, which have also been digitised. They contain information on causes of death in the period 1943-1969.

[2] Senior researcher Steen Andersen, PhD

Researchers can apply for access to data that are not immediately accessible. Researchers must describe their projects, and how they intend to use the data in that connection, and submit an application to Rigsarkivet, which consults the Danish Data Protection Agency (Datatilsynet).

Register data

Rigsarkivet also stores register data. When it is assessed whether a register within, for example, health science is to be preserved under the Archives Act (arkivloven), the following criteria must be met for it to be considered: the register must contain information on causes of disease, disease occurrence and disease consequences. It also matters whether a register contains information on a total population as well as personal data, including CPR numbers, so that the registers can be linked with other registers, survey datasets, patient records or other sources.

Rigsarkivet has a formal cooperation with KOR, which is responsible for a register overview (Registeroverblik) for research use. The register overview lets researchers search for information on government registers by, among other things:

• Name
• Organisational affiliation
• Chronology
• Content

The register data have been structured and made searchable with funding from KOR and can be found here: http://www.registerforskning.dk/projekter/registeroverblik/

There are several advantages to using register data, including the possibility of combining data: "By using register data, covert observation or document analysis, one can analyse behaviour without influencing the behaviour of the persons being studied." (ibid 2014:113) and "Finally, register data often make it possible to follow a development over time, which is impossible with cross-sectional analyses carried out at a given point in time." (ibid 2012:294).

There is great potential in linking survey data to register data from, for example, Rigsarkivet. Register data can also help reduce problems with survey data: "In addition to the high availability and the many combination possibilities offered by the databases, the advantage of working with register data is that they are population data, and not just a smaller sample. This means that the statistical uncertainty associated with sample selection is gone. In addition, data will most often have been collected systematically, so that an attempt has at least been made to minimise the sources of error." (ibid 2012: 294).

Conclusion

The purpose of this article has been to give a brief introduction to the potential of Rigsarkivet's collection, especially within social science and health science research. Under the Archives Act, data collected today must be able to describe the whole of society 200 years from now, and Rigsarkivet therefore preserves relevant research.

The article has briefly reviewed the potential of survey data, where election studies and value surveys in particular are of interest to the social sciences.

Born-digital data and digitised databases offer new opportunities for research via Rigsarkivet's program Sofia, and the health area offers large datasets, including registers, which can among other things be combined with survey data.

Within health science research, register-based research is in particular demand.

With this encouragement we hope that researchers will use even more data from Rigsarkivet, since the purpose of research is to generate new knowledge for the benefit of posterity.

References:

Arkivloven §4: https://www.retsinformation.dk/Forms/R0710.aspx?id=183862

Bekendtgørelse nr. 1007: https://www.retsinformation.dk/forms/r0710.aspx?id=132898

Frederiksen, Morten; Gundelach, Peter; Skovgaard Nielsen, Rikke. Mixed Methods-forskning: Principper og praksis. 2014, Hans Reitzels Forlag.

Bøgh Andersen, Lotte; Møller Hansen, Kasper; Klemmensen, Robert. Metoder i Statskundskab. 2012, Hans Reitzels Forlag.

Dalton, Russell J. Political Equality as the Foundation of Democracy. In "The Participation Gap: Social Status and Political Inequality" (Oxford: Oxford University Press, forthcoming 2017).

Værdiundersøgelsen (study description): http://dda.dk/catalogue/31083?lang=da#documentation

Valgundersøgelsen 2015: http://dda.dk/catalogue/31083?lang=da

Ole Kronborg: Screening for kræft i tyktarm og endetarm, 1987-2002: http://dda.dk/catalogue/15096?lang=da

KOR: http://www.registerforskning.dk/projekter/registeroverblik/

ELSST: https://elsst.ukdataservice.ac.uk/

Abstract

The Danish National Biobank and the Danish Biobank Register

By Lasse Boding

Danish National Biobank, Statens Serum Institut, Copenhagen [email protected]

The Danish National Biobank and the Danish Biobank Register together make up a novel research infrastructure which, among other features, includes very fast and useful web-based tools for study feasibility assessment (qualified power calculations) ahead of eventually retrieving data for studies on health outcomes.

Background

The Danish National Biobank was established in 2012 at Statens Serum Institut in Copenhagen. It is among the most advanced in the world and currently houses 8.5 million biological samples. But the biobank is much more than just a storage space: it is a research infrastructure. This infrastructure includes the Danish Biobank Register that gives researchers an overview of biological samples stored in Danish biobanks. By linking with the Danish registers, information can be obtained about e.g. the donors' diseases. The linkage is unique because it gives researchers an opportunity to study the very first indicators of a developing disease. Many diseases develop slowly over time - some take 20 years before giving symptoms. This is why researchers are trying to discover new biomarkers to make preventive or early treatment possible in the future.

Access to samples and data

To gain access, all research projects must be approved by a research ethics committee and the Data Inspection Agency. All incoming applications are assessed by the DNB Scientific Board. An agreement of the terms for the project is made, and the samples are retrieved and handed out.

The Danish Biobank Register

The Danish Biobank Register gives researchers online access (www.biobanks.dk) to combined detailed meta-data from all the biobanks participating in the initiative. A tutorial about the register is also available online (www.nationalbiobank.dk/register.html). Through national collaboration, large biobanks based at hospitals, universities and other research institutions in Denmark regularly submit data to the Danish Biobank Register. Data from the biobanks are linked to disease codes and demographic information from national administrative registries on an individual level.

Aggregated results about the available biological material are displayed to researchers around the world through a web-based search system that so far contains information about 24.5 million biological samples from 5.7 million Danes. The following biobanks are available through the Danish Biobank Register:

• The Danish National Biobank - samples from the Danish National Birth Cohort (>600,000 samples), PKU cards blood spots from all newborn Danes since 1982 (>2 million samples) and SSI historical collections (>4 million samples)
• The DNA biobank at Rigshospitalet (50,000 samples collected annually)
• The Danish Cancer Society project biobank "Diet, Cancer and Health" (samples from 57,000 cohort participants)
• The Danish Patobank (17 million tissue samples from national hospitals)
• COPSAC - studies involving children with asthma
• DD2 - type 2 diabetes research
• The Danish Blood Donor Study - research on why blood donors are more healthy than the average population
• Danish Cancer Biobank - blood and tissue samples from Danish cancer patients

And can be linked to information from:

• The Danish Civil Registration System - individual information such as gender, date of birth, place of birth, citizenship, identity of parents, place of residence.
• The Danish National Patient Register - information on all hospital contacts, department, date and time of arrival and departure, treatment, and operation.
• The Danish Pathology Register - information on all pathological examinations carried out in Denmark, investigating pathology department or practicing pathologist, type of investigation, gross description, microscopy description, conclusion and/or diagnoses, and coded diagnoses based on the Danish SNOMED

The future

The DNB has many interesting new projects in the pipeline. The biobank will grow and include more sample types, such as tissues and microorganisms. The Danish Biobank Register will continue to add new biobanks and registers, and work is under way to integrate three large laboratory results databases. The lab results addition will enable researchers to search for samples from patients with specific biomarker characteristics, e.g. specific circulating antibodies, vitamin deficiency or chronic inflammation. Further, we are piloting integration of analysis results from samples we have handed out, initially where genetic analysis has been performed. Bringing back the results and making them available to other researchers will enable markedly faster and cheaper studies, while preserving precious biological material.

New IDA database based on the labour market account (arbejdsmarkedsregnskabet)

Pernille Stender and Søren Leth-Sørensen, Statistics Denmark

1. Status of the conversion of IDA

The IDA database contains information that has so far covered the period 1980-2013. In 2017, work has been done on converting the database to use a new data foundation (the labour market account, arbejdsmarkedsregnskabet) from 2008 onwards.

The plan was for IDA to have been converted by mid-December, so that the revised data for the period 2008-2015 would have been available through the researcher placement scheme (forskerplaceringsordningen) by now. The work has been delayed, however, which means that at present only the person part of IDA (IDAP) has been released. Work is of course under way to complete the other parts of IDA (IDAN: employments, and IDAS: workplaces) as soon as possible.

2. IDA in historical perspective

IDA was developed in the late 1980s, and the development work took three years. Since its completion, the database has been one of the most widely used research registers that Statistics Denmark makes available. The IDA database was an important innovation for several reasons, the most important of which were:

• It sheds light on job mobility between two reference points at the end of November in two consecutive years.
• It establishes identities of workplaces and firms between two reference points at the end of November in two consecutive years. Among other things, this makes it possible to see whether workplaces have been created, preserved or closed down.
• The database contains variables that summarise the population's history on the labour market, e.g. work experience and unemployment.
• IDA was originally built as a large integration register, and for many years the database contained about 300 variables from different statistical areas, including population statistics, education statistics and income statistics. The many variables ensured easy access to data at a time when linking data from many registers was time-consuming.
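The created/preserved/closed comparison mentioned in the list above can be sketched as a set comparison. This is only an illustration under a strong assumption: each workplace already carries a stable identifier at both end-of-November reference points. IDA's actual identity rules are not described in this text, and the function name `classify_workplaces` and the identifiers are hypothetical.

```python
def classify_workplaces(ids_year1, ids_year2):
    """Compare workplace identifiers at two end-of-November reference points.

    Assumes identity has already been resolved into stable identifiers
    (an assumption for illustration; IDA's own rules are not shown here).
    """
    year1, year2 = set(ids_year1), set(ids_year2)
    return {
        "created": year2 - year1,    # present only in the later year
        "preserved": year1 & year2,  # present in both years
        "closed": year1 - year2,     # present only in the earlier year
    }


# Hypothetical identifiers for two consecutive Novembers.
result = classify_workplaces(["A", "B", "C"], ["B", "C", "D"])
print(result)
```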

The primary data input to the original IDA database is the employment statistics register (beskæftigelsesstatistikregistret). From 1980 the employment statistics register was the data foundation for the register-based labour force statistics (RAS). From 1990 it was also the data foundation for the establishment-related employment statistics (erhvervsbeskæftigelsen, EBS). RAS and EBS are two important structural statistics that shed light on the population's attachment to the labour market (RAS) and the industrial structure (EBS), respectively.

The employment statistics register was (and still is) not a spell-based (longitudinal) register but a year-compressed register with information on the Danish population's labour market status at the end of November and on all employee jobs during the year. The employee jobs are year-compressed, which means that an employee can have only one job at a given workplace during the year. In a situation where an employee works at a workplace for three months at the start of the year and three months at the end of the year, only one job will therefore be formed.
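A minimal sketch of what year compression means in practice, assuming that each employment spell is a tuple of person, workplace, year and months worked. The record layout and the name `year_compress` are illustrative assumptions, not Statistics Denmark's implementation.

```python
from collections import defaultdict


def year_compress(spells):
    """Collapse spells into one job per (person, workplace, year).

    spells: iterable of (person_id, workplace_id, year, months_worked).
    Returns a dict mapping (person_id, workplace_id, year) -> total months.
    """
    jobs = defaultdict(int)
    for person_id, workplace_id, year, months in spells:
        jobs[(person_id, workplace_id, year)] += months
    return dict(jobs)


# Hypothetical example: the same employee works at workplace "W1" for
# three months at the start and three months at the end of 2007.
spells = [
    ("P1", "W1", 2007, 3),
    ("P1", "W1", 2007, 3),
    ("P1", "W2", 2007, 2),
]
jobs = year_compress(spells)
print(jobs)
```

The two separate spells at "W1" end up as a single job, which is exactly why the timing of the spells within the year is lost in a year-compressed register.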

From 2008 the employment statistics register has been formed on the basis of an underlying spell register called the labour market account (arbejdsmarkedsregnskabet, AMR). Before 2008 there was no underlying spell-based data foundation. The main reason was that data on employee jobs before 2008 did not have particularly precise period information. Before 2008 employers only had to report information on their employees once a year via the annual tax information slip (oplysningssedlen) in connection with the calculation of tax. Statistics Denmark had had a few fields added to the slip for statistical purposes. Here the employer had to state whether the employee had been employed for the whole year. If that was not the case, the employer had to state the period in which the employee had been employed. If the employee had been employed in several periods during the year, the employer only had to state whether the employee had been employed at the end of November. This had two consequences. First, the employer could be tempted to state that the employee had been employed for the whole year even when that was not the case, since this was the easiest option. This contributed to making the end-of-November count more uncertain. Second, the employment periods were not known in cases where the employee had been employed in several periods during the year. The latter meant that the employment statistics register, and with it RAS, EBS and IDA, could not be built as spell registers.

3. Spell-based labour market structural statistics

Over the last 10 years the labour market structural statistics have gradually become spell-based. The first, completed in 2007, was the statistics on persons receiving public benefits. Here the statistics that previously existed in the area, including the unemployment statistics, the maternity and sickness benefit statistics, the cash benefit statistics and the AMFORA statistics, are integrated into one system. When the statistics on publicly supported persons are produced, the data undergo a cross-cutting treatment (a so-called overlap treatment). In this overlap treatment, the volume (i.e. the state degree, tilstandsgraden) is typically adjusted downwards when a person has overlapping activities at the same time and the volume exceeds what is permitted. In practice this means that the volume is written down when a person is supported for more than 37 hours a week.
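The downward adjustment can be illustrated with a small sketch. The text does not say how the reduction is distributed across overlapping activities, so proportional scaling is used here purely as an assumption; the name `cap_state_degrees` and the activity labels are hypothetical. A state degree of 1.0 is taken to correspond to 37 hours a week.

```python
def cap_state_degrees(activities, max_total=1.0):
    """Scale simultaneous state degrees down so they sum to at most max_total.

    activities: dict mapping activity name -> state degree (1.0 = 37 h/week).
    Proportional scaling is an assumption made for illustration only.
    """
    total = sum(activities.values())
    if total <= max_total:
        return dict(activities)  # nothing to adjust
    factor = max_total / total
    return {name: degree * factor for name, degree in activities.items()}


# Hypothetical person with overlapping benefits summing to 1.5 (55.5 h/week).
adjusted = cap_state_degrees({"unemployment_benefit": 1.0, "sickness_benefit": 0.5})
print(adjusted)
```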

The second spell register, Statistics Denmark's own eIndkomst register, was completed in 2011. The eIndkomst register contains information from 2008 onwards. Statistics Denmark's eIndkomst register is formed on the basis of SKAT's eIndkomst register, but it has been processed so that it meets the needs of serving as data input to the employment statistics. In contrast to the annual information slip, the eIndkomst register contains monthly information on employee jobs with far more precise dating than before. Also unlike the information slip, the eIndkomst register has information on the number of paid hours, which can be used to calculate a work volume. Before eIndkomst, one was instead forced to use the ATP contribution paid as a rather imprecise indicator of the work volume.

With the completion of these two spell registers, two important spell-based data inputs had been developed for building a spell register that could shed light on the whole population's attachment to the labour market, a so-called labour market account (AMR) [1]. AMR was completed in 2015 and currently covers the period 2008-2015. In addition to the eIndkomst register and the statistics on publicly supported persons, the data sources for AMR are the education statistics, the business register, the population statistics, the income statistics and the maternity and sickness benefit statistics. AMR therefore contains information on employee jobs, jobs for self-employed persons and assisting spouses, information on different types of public benefits, education activities, old-age pensions and absence spells. In the production of AMR a cross-cutting overlap treatment is carried out, in which the dating of labour market spells and the information on the number of hours in the spells are adjusted in the case of obvious errors.

[1] In addition to persons residing in Denmark, AMR also contains information on persons who work in Denmark but reside abroad.

In AMR a person can have different activities at the same time. Furthermore, hours information in the form of a state degree is attached to every activity. AMR exists in two variants: an hours-normalised variant, in which each person always has a summed state degree of one at any given time (corresponding to 37 hours a week), and a non-normalised variant, in which each person can have a state degree that is more or less than one. Both variants of AMR contain a variable that can be used to determine the primary attachment to the labour market at a given time according to the internationally applicable rules.

AMR is a highly flexible statistical system which, among other things, makes it possible to:

• Measure the population's attachment to the labour market in full-time persons over a freely chosen period, optionally normalised to a 37-hour working week.
• Calculate averages (e.g. average employment) over a freely chosen period.
• Make status counts of the population's attachment to the labour market on a freely chosen day of the year, i.e. not necessarily at the end of November.
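The first and last items in the list above can be sketched with spell data. The layout below, a list of (start date, end date, state degree) tuples with inclusive dates, and the function names are assumptions made for illustration; they are not the actual AMR data model.

```python
from datetime import date


def overlap_days(start, end, period_start, period_end):
    """Days a spell [start, end] shares with [period_start, period_end]."""
    first = max(start, period_start)
    last = min(end, period_end)
    return max((last - first).days + 1, 0)


def full_time_persons(spells, period_start, period_end):
    """Average number of full-time persons over the period.

    spells: list of (start, end, state_degree), dates inclusive,
    where a state degree of 1.0 corresponds to 37 hours a week.
    """
    period_days = (period_end - period_start).days + 1
    weighted_days = sum(
        overlap_days(start, end, period_start, period_end) * degree
        for start, end, degree in spells
    )
    return weighted_days / period_days


def active_on(spells, day):
    """Status count input: the spells that are active on a chosen day."""
    return [spell for spell in spells if spell[0] <= day <= spell[1]]


# Hypothetical person: full time in the first half of 2015, half time after.
spells = [
    (date(2015, 1, 1), date(2015, 6, 30), 1.0),
    (date(2015, 7, 1), date(2015, 12, 31), 0.5),
]
print(full_time_persons(spells, date(2015, 1, 1), date(2015, 12, 31)))
print(active_on(spells, date(2015, 8, 15)))
```

Because the calculation works on dated spells, the period and the status day can be chosen freely, which is the flexibility the year-compressed register lacks.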

Furthermore, AMR offers many possibilities for analyses that exploit the spell dimension. These could be analyses of transitions from the education system to the labour market, or measurements of the population's attachment to the labour market over a longer period.

Today the information from AMR is used in the personal statistics, but also to a large extent in the business statistics and in economic statistics.

4. AMR as data foundation for RAS and EBS

Following the development of AMR, RAS and EBS have switched to using AMR as their data foundation from 2008 onwards. RAS and EBS are still compiled as status counts at the end of November, since there is a clear user need to continue the existing long time series, which goes all the way back to 1980. In the course of 2018, however, counts relating to other times of the year will also be published.

On the basis of AMR, a year-compressed employment statistics register is also still formed, which closely resembles the previous employment statistics register. This register is typically used for service tasks where RAS or EBS is the data foundation, or by researchers who prefer a more familiar data foundation or who do not need the possibilities that spell data offer.

5. Consequences of the new data foundation for IDA

In 2015 the previous IDA was updated with data for 2013, but since then the database has not been updated. In 2017 responsibility for the database passed from the research service office to the labour market office. Since then, work has been under way to convert IDA to use data from AMR.

The conversion means that parts of IDA will remain unchanged, while other parts will change. Among the unchanged features, IDA will still be based on the year-compressed employment statistics register, and the starting point will therefore remain movements between the reference points at the end of November in two consecutive years. Furthermore, the rules used to determine job mobility and the identities of workplaces and enterprises over time are unchanged.

In a number of areas there are changes, however. The following can be highlighted in particular:

• The measurement of persons' work experience has been improved. The original variable was based on the ATP contributions during the year. This meant, first, that the variable was not very precise, since the ATP amount reflects the work volume only to a limited extent. The old variable had the further weakness that it was calculated only for persons who were in the labour force at the end of November. Persons who were outside the labour force at the end of November, but had been in the labour force during the year, were thus not credited with any work experience. One consequence was that work experience was underestimated for persons with an unstable attachment to the labour market. Finally, the variable was reset to zero if the person did not reside in Denmark.

In the new IDA a new work experience variable is therefore calculated (from 2008) on the basis of paid hours, and it is thus considerably more precise. Furthermore, the variable is not reset when the person does not reside in Denmark at the end of November, and it is compiled for the whole population (and not only the labour force).

Work experience will, however, also continue to be calculated by the original method in order to maintain a time series with a consistent definition.

• A number of new variables have been formed that show different aspects of the population's work experience. First, this includes a count of the number of years in which the person has been employed since 1980, based on the status at the end of November. In this way, work experience gained in jobs as self-employed or assisting spouse (which are not covered by the ATP scheme) also counts.

In addition, AMR's spell information has been used to compute the share of days since 2008 on which the person has been employed (regardless of whether the person resided in the country throughout the period). As a supplement, the share of the period since 2008 in which the person has had either residence or work in Denmark has been computed. These two measures can be compared if one wishes to assess the work experience a person has gained relative to what it was actually possible for that person to gain in the period.

• The hourly wage variables are now calculated from the paid hours and, respectively, the narrow and the broad wage amount in eIndkomst/AMR. The narrow wage amount corresponds to the previous wage amount in IDA. Both hourly wage variables, however, are based on far better information on work volume (paid hours) than the "old IDA hourly wage", where the work volume was estimated from the ATP amount and 'other information'. This 'other information' included the job length, which was subject to great uncertainty on the old information slip because of the uncertain period dating, cf. above.

An important point in this connection is that it is no longer possible to form an hourly wage variable by the old method. The reason is that the employer no longer reports the period of employment in the same way as before (with the errors and omissions found on the old information slip).

• Variable naming has generally been standardised in line with the naming in AMR/RAS. This means, among other things, that the IDA variable PSTILL is now called PSOC_STATUS_KODE.
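The comparison of the two shares described in the list above (days employed since 2008 versus days with residence or work in Denmark) can be sketched as follows. The function name, the argument names and the idea of expressing the comparison as a ratio are illustrative assumptions; the text only says that the two measures can be compared.

```python
def experience_shares(days_employed, days_present, days_in_period):
    """Return (share employed, share present in Denmark, employed relative to present).

    days_present: days with either residence or work in Denmark.
    The relative measure is None when the person was never present.
    """
    if days_in_period <= 0:
        raise ValueError("days_in_period must be positive")
    share_employed = days_employed / days_in_period
    share_present = days_present / days_in_period
    relative = days_employed / days_present if days_present > 0 else None
    return share_employed, share_present, relative


# Hypothetical person: 1000 days in the period, present in Denmark for 500
# of them and employed on 400 of them.
print(experience_shares(400, 500, 1000))
```

In this made-up case the person was employed on 40 per cent of all days, but on 80 per cent of the days on which employment in Denmark was possible at all.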

6. Interplay between IDA and AMR

In practice, AMR, RAS, EBS and IDA are four statistics that are closely connected. Broadly speaking, RAS and EBS are statistics derived directly from AMR, taking the situation at the end of November as their starting point. In contrast, the formation of IDA involves a further enrichment of AMR's data with respect to determining job mobility and the identities of workplaces and enterprises over time (taking the situation at the end of November as the starting point).

AMR naturally makes it possible to measure job mobility etc. during the year according to the same rules that are used in IDA. Such development work would, however, require considerable resources. Instead, users can use AMR as a supplement to IDA in other ways, for example as follows:

• To obtain information on the extent of attachment to the labour market, broken down by socio-economic group, over a period, e.g. a count of the number of days on which a person has been employed, unemployed or in different socio-economic groups outside the labour force.
• To measure flows between socio-economic groups in freely chosen periods over time.
• To gain knowledge of concurrent activities for a population over time, e.g. how many have only one job (and possibly whether the job is subsidised employment or whether the person is temporarily absent from the job), and how many have several simultaneous jobs or other activities alongside the job (e.g. education, early retirement pay (efterløn) or disability pension).
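The flows between socio-economic groups mentioned in the list above can be illustrated with a small transition count. The status labels, the dict-based layout and the name `count_flows` are assumptions made for this sketch; the actual AMR classification is not reproduced here.

```python
from collections import Counter


def count_flows(status_start, status_end):
    """Count transitions between groups for persons observed at both dates.

    status_start, status_end: dicts mapping person_id -> group label.
    Returns a Counter keyed by (origin_group, destination_group).
    """
    flows = Counter()
    for person_id, origin in status_start.items():
        if person_id in status_end:
            flows[(origin, status_end[person_id])] += 1
    return flows


# Hypothetical statuses on two freely chosen dates.
start = {"P1": "employed", "P2": "unemployed", "P3": "student", "P4": "employed"}
end = {"P1": "employed", "P2": "employed", "P3": "employed"}
print(count_flows(start, end))
```

Person "P4" is only observed at the first date and is therefore left out of the flow table in this sketch; how such cases should be treated is a separate design decision.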

As mentioned, responsibility for AMR, RAS, EBS and IDA has been gathered in the labour market office. Should users wish for new summary variables in IDA based on AMR, they can contact the labour market office, Pernille Stender, [email protected].

Intervention Models in Time Series

Anders Milhøj

University of Copenhagen

Abstract

All time series models have a stochastic component that is intended to describe the effect of stochastic errors affecting the series. Often, however, a time series is also influenced by well-defined, specific, rare events with large impacts. Such exogenous, deterministic effects from outside should be modeled separately by deterministic components and not as part of the stochastic model. Of special interest in the present paper are models for exponentially decreasing impacts of an intervention. In this paper the model is time-reverted so that it also fits situations where impacts increase up to the actual date of the intervention.

Two examples are taken from series for the number of internet searches as published by Google® Trends. Searches related to earthquakes and to the release date of a video game are used to illustrate various types of interventions. A third example is the number of new cars in Denmark.

Introduction

All time series models have a stochastic component to fit the influence from various sources outside the model. In a statistical model these impacts are seen as outcomes of stochastic variables. They are often denoted "noise", an intuitive name for the fact that they blur the true signal, the correct data series. Often, however, the series is also influenced by events with large impacts on the time series. Such events are explicitly known to be extraordinary.

We reserve the concept "interventions" for reactions to events of this non-stochastic nature. The existence of an exogenous intervention is known to the analyst, and its impact should be modeled separately from the stochastic model of the rest of the data series. Otherwise the estimated residual variance becomes unrealistically large. The reaction of the time series to the intervention has some kind of deterministic explanation, so the behavior of the series close to the time of the intervention has to be modeled apart from the stochastic mechanism that otherwise generates the observations.


An example of such an event is the intake of medicine, where biological measurements indicate a reaction to the intake. It is known that the medicine is given, so any change in the physiological behavior of the patient is assumed to be due to the medicine. Another example is the reaction of consumers to an increase in taxes, where consumption is believed to decrease.

Intervention models are often formulated by dummy variables taking values like zero and one, and hence they are by no means observed values. An intervention could model the level shift of a time series from summer to winter, but actual measurements of temperature, humidity or hours of sunshine are in no way included in the model. For the two intuitive examples mentioned above this means that the actual dose of the intake and the actual level of the tax increase are not included in the model. More refined models for interventions are, however, very close to models relying on actual observed values, so the distinction is by no means strict.

A general framework for analyzing time series was presented in the book by Box and Jenkins (1976), and since then seasonal ARIMA models have been standard when modeling univariate time series. In their book Box and Jenkins (1976) also considered intervention models of the kind considered in this paper.

The first example, on Google searches for the word "earthquake", is a straightforward application following the outline of Box and Jenkins (1976). In the second example, searches for the phrase "FIFA release date", a steady growth in the series is seen up to the actual release date. This behavior is impossible to model by the traditional Box and Jenkins (1976) models. In this paper this problem is solved by simply reverting the time order of the series. The third example combines the approaches.

Types of Interventions

Interventions may take various forms, for instance a single outlier, a level shift or a shift in the observed trend of the time series. More complicated interventions exist, for instance a shift in the level which is not abrupt but occurs gradually.

An intervention model for an event that affects the series for just one observation, at time $t_0$, has the form

$X_t = \omega I_t + Y_t$

where the intervention component $I_t$ is zero for all $t$ except at the time of the event. More precisely, $I_t = 0$ for $t \neq t_0$ and $I_t = 1$ for $t = t_0$. The remainder term $Y_t$ could be modeled in many ways according to the situation, e.g. as a regression model or as a time series model such as a multiplicative, seasonal ARIMA model, see Box and Jenkins (1976).


For the intervention above we assumed that the effect was immediate and confined to a single observation, but often the effect lasts for some time. This could be modeled by introducing lags in the intervention model

$X_t = \omega_0 I_t - \omega_1 I_{t-1} - \dots - \omega_r I_{t-r} + Y_t$

Here the effect of the intervention is given as steps in the following way

Time $t_0$: The effect is $\omega_0$

Time $t_0+1$: The effect is $-\omega_1$

Time $t_0+r$: The effect is $-\omega_r$

The choice of sign for the parameters $\omega_i$ in this parameterization follows the notation of Box and Jenkins (1976) and the notation in the SAS® PROC ARIMA.

One problem with using $r > 0$ in this parameterization is that the number of $\omega$-parameters is large, and as they tell the same story they are subject to multicollinearity. This means that the estimates of the $\omega_i$ parameters have large standard deviations. If $r > 1$ the more parsimonious parameterization stated below is often more efficient.

In order to reduce the number of parameters it is often better to adopt another parameterization using the notation of lag polynomials. The idea is that the effect decays slowly by a factor $\delta$ per time period as

(1) $X_t = \omega I_t + \omega\delta I_{t-1} + \dots + \omega\delta^r I_{t-r} + \dots + Y_t = \frac{\omega}{1-\delta L} I_t + Y_t$

Here the effect of the intervention is given as steps in the following way

Time $t_0$: The effect is $\omega$

Time $t_0+1$: The effect is $\omega\delta$

Time $t_0+r$: The effect is $\omega\delta^r$

... etc.

The value $\delta = 0$ gives a simple intervention. A positive parameter $\delta < 1$ gives a gradual decrease towards the previous level of the series, which is never reached exactly, but in practice the first steps are the largest. An example is presented in Figure 1 for $\delta = 0.8$. For $|\delta| \geq 1$ the series diverges and a new level is undefined. The parameter $\delta$ is therefore restricted to the interval $]-1, 1[$. Even if the value $\delta = 1$ is not included in the parameter space, the effect for values of $\delta$ close to $+1$ takes the form of a step function. These kinds of models are discussed in detail by Box and Jenkins (1976).
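As a minimal sketch of the component in (1), the following Python fragment generates the deterministic effect $\omega\delta^{t-t_0}$ through the recursion implied by the lag polynomial. The function name and the values of `n`, `t0` and `omega` are illustrative assumptions; only $\delta = 0.8$ is taken from the example in Figure 1.

```python
def intervention_component(n, t0, omega, delta):
    """Return omega * delta**(t - t0) for t >= t0 and 0 before t0.

    The values are built with the recursion c[t] = delta * c[t-1] + omega * I[t],
    where I[t] is the pulse dummy (1 at t0, 0 elsewhere).
    """
    if not -1.0 < delta < 1.0:
        raise ValueError("delta must lie in the open interval (-1, 1)")
    component = []
    previous = 0.0
    for t in range(n):
        pulse = 1.0 if t == t0 else 0.0
        previous = delta * previous + omega * pulse
        component.append(previous)
    return component


# Hypothetical setting: 10 observations, intervention at index 3, unit impact.
effect = intervention_component(n=10, t0=3, omega=1.0, delta=0.8)
```

Setting `delta=0.0` in the same call reduces the component to the single-observation pulse of the simple intervention.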


Figure 1. A Gradual Intervention Component for $\delta = 0.8$

The Daily series of internet searches for earthquakes

In this section the activity of the search word "earthquake" in the internet search engine Google is considered. Such time series are available from the website

https://www.google.com/trends/explore

The actual numbers of searches are indexed by Google and these index values are then log-transformed, so the numbers on the vertical axis have no direct meaning. For daily observations the interest in internet searches usually lasts for some days. An example for 90 days, July to September 2016, is presented in Figure 5. In this period two earthquakes which led to more internet searches were observed:

August 24th:

An earthquake in Central Italy on August 24th, measured as 6.2 on the Richter scale. A total of 298 people were killed. The dummy variable I_Aug takes the value 1 for this particular day and zero for all other days.

September 3rd:

An earthquake on September 3rd in Oklahoma, the strongest ever within the state of Oklahoma. The dummy variable I_Sep takes the value 1 for this particular day and zero for all other days.

Here it is clear from the graph that the interest in Google searching the word "earthquake" is large for some days after each earthquake but it gradually dies out; perhaps exponentially.

The estimated parameters and their standard errors are presented in the Table. The $\delta$-parameter in the denominator, $\delta = 0.76$, is large for the Italian earthquake, saying that the interest decreased rather slowly, but after a week the effect is just $0.76^7 \approx 0.15$. This means that such an effect is usually not important when, say, monthly data is used. The $\delta$-parameter is significant but much smaller for the Oklahoma earthquake, probably because this earthquake did not lead to any fatalities nor to severe damage.
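The size of the remaining effect can be checked directly from the estimated denominator parameters in the Table (0.76110 for I_Aug and 0.39705 for I_Sep). The sketch below is only an illustration; the function names and the 5% threshold are assumptions and not part of the original analysis.

```python
import math


def remaining_share(delta, steps):
    """Share of the initial impact omega left after `steps` time periods."""
    return delta ** steps


def steps_until_below(delta, threshold):
    """Smallest whole number of periods k with delta**k < threshold (0 < delta < 1)."""
    return math.floor(math.log(threshold) / math.log(delta)) + 1


# Estimates read from the table of maximum likelihood estimates.
delta_italy = 0.76110
delta_oklahoma = 0.39705

share_after_week = remaining_share(delta_italy, 7)
days_italy = steps_until_below(delta_italy, 0.05)        # assumed 5% threshold
days_oklahoma = steps_until_below(delta_oklahoma, 0.05)  # assumed 5% threshold
```

With these estimates roughly 15% of the initial Italian effect remains after seven days, and the Oklahoma effect falls below the assumed threshold within a few days.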

[Figure: Daily searches for the word "earthquake"; horizontal axis: date.]

Maximum Likelihood Estimation

Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|  Lag  Variable   Shift
MU         1.41655   0.08281         17.10    <.0001           0    log_index  0
AR1,1      0.65018   0.08437          7.71    <.0001           1    log_index  0
NUM1       3.04786   0.26317         11.58    <.0001           0    I_Aug      0
DEN1,1     0.76110   0.04858         15.67    <.0001           1    I_Aug      0
NUM2       1.92144   0.26669          7.20    <.0001           0    I_Sep      0
DEN1,1     0.39705   0.14220          2.79    0.0052           1    I_Sep      0


FIFA - a video game

The video game FIFA is very popular among everyone with an interest in soccer (European football). In the game you are able to choose among all major clubs in all major tournaments in the world. You can also choose among the famous players, or you can create yourself as a player on some team, like playing side by side with Ronaldo on the team "Real Madrid".

This game is updated every year as the season begins, usually in September. The game is available on many platforms like PC, Nintendo and Xbox, and it can be bought in physical form like a compact disc; in recent years the game has also become available as a download. The exact date of the release of the yearly update of course varies a bit. The Google search phrase "FIFA release date" becomes more relevant as the expected date of release approaches. But after the release everyone all of a sudden stops searching, as they become aware of the release almost instantaneously.

The overall structure of weekly data for five years of the series of Google searches for the phrase "FIFA release date" is regular. Every year the interest in the release date rises exponentially until some date in September, but after the release date the interest suddenly drops to a very low level. The Figure presents this weekly series, plotted, however, against the reverted date variable. The tick marks on the horizontal axis are completely meaningless, as they represent dates before January 1st 1960. This plot looks exactly like the plot of the daily number of searches for earthquakes, Figure 5. The plot clearly shows that on the reverted time scale the number of internet searches decays exponentially.

[Figure: Searches for the phrase "FIFA release date" plotted against a reverted time axis; horizontal axis: revert_date.]


The estimated $\omega$- and $\delta$-parameters for the yearly releases 2012-2016 are given in the Table. Here a second order autoregression is used, even if the second order parameter is close to insignificance. The estimated $\delta$'s in the Table are calculated as coefficients for exponential decay in the time reverted series, but they represent an exponential build-up in the weeks before the event when considered in real time.
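The effect of reverting the time order can be illustrated with a small artificial series. In the sketch below the build-up factor 0.9, the number of weeks and the function name are assumptions chosen for illustration; no Google Trends data are used.

```python
def revert_time(values):
    """Return the series in reverted time order (last observation first)."""
    return list(reversed(values))


# Artificial series: exponential build-up to a peak of 1.0 in the release week,
# followed by an abrupt drop to zero in the weeks after the release.
delta = 0.9
weeks_before = 5
build_up = [delta ** k for k in range(weeks_before, -1, -1)]  # 0.9**5, ..., 0.9**0
series = build_up + [0.0, 0.0, 0.0]

reverted = revert_time(series)

# In reverted time the peak comes first and each later value is delta times
# the previous one, which is the decay pattern of the form (1).
peak = reverted.index(max(reverted))
ratios = [reverted[i + 1] / reverted[i] for i in range(peak, len(reverted) - 1)]
```

In real time the same series grows by the factor $1/\delta$ per week up to the release, which is why a decay parameter estimated on the reverted series is read as a build-up parameter.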

For the years 2012 to 2015 the spikes at the week of the release are increasing (the numerator parameters NUM1 - NUM5 in the output, the $\omega$-parameters), which is easily interpreted as the interest in the video game increasing worldwide. The peak at the 2016 release is in fact the last observation in the dataset and thus the first observation in the reverted dataset. The $\omega$-parameter, denoted NUM5 in the output, $\omega = 0.46$, is because of this a very imprecise estimate with a standard deviation as large as 0.22 while the standard deviations for the other $\omega$-parameters are around 0.07. The denominator parameters, the $\delta$-parameters denoted DEN1,1 in the output, are all very close to the value $\delta = 0.9$.

Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|  Lag  Variable
MU         0.94328   0.29675          3.18    0.0015           0    LOG_INDEX
AR1,1      0.73246   0.06729         10.88    <.0001           1    LOG_INDEX
AR1,2      0.15646   0.07075          2.21    0.0270           2    LOG_INDEX
NUM1       1.70462   0.34423          4.95    <.0001           0    RELEASE_12
DEN1,1     0.92249   0.08065         11.44    <.0001           1    RELEASE_12
NUM2       1.98176   0.33554          5.91    <.0001           0    RELEASE_13
DEN1,1     0.89422   0.08321         10.75    <.0001           1    RELEASE_13
NUM3       2.12850   0.33578          6.34    <.0001           0    RELEASE_14
DEN1,1     0.90654   0.07051         12.86    <.0001           1    RELEASE_14
NUM4       2.76813   0.33119          8.36    <.0001           0    RELEASE_15
DEN1,1     0.88774   0.05786         15.34    <.0001           1    RELEASE_15
NUM5       0.45828   0.22012          2.08    0.0373           0    RELEASE_16
DEN1,1     0.87454   0.05221         16.75    <.0001           1    RELEASE_16


Monthly sales of cars

In this section a time series defined by the number of new cars (ordinary cars for private use) is analyzed. The series is considered for a period long ago in which many exogenous interventions occurred. The example consists of 288 observations from January 1955 to December 1978. The numbers are log-transformed.

As a benchmark the so-called airline model is used. This model is often applied as a default model for seasonal time series, most famously in the example of the number of airline passengers by Box and Jenkins (1976). The model is also applied as default in the X-11-ARIMA algorithm for seasonal adjustment. For this dataset the model with parameters estimated by maximum likelihood is

$(1 - B)(1 - B^{12})X_t = (1 - 0.48B)(1 - 0.74B^{12})\varepsilon_t, \quad \mathrm{var}(\varepsilon_t) = 0.0680$

The variance 0.0680 is heavily influenced by extreme observations, as it is estimated mainly using the sum of squared residuals. The series of residuals is plotted in the Figure, where the horizontal lines are at two and four times the standard deviation $\sqrt{0.0680} = 0.2608$. It is clear that many residuals are numerically very large; two residuals exceed the limit of four times the standard deviation, January 1975 and October 1977. The critical value for a 5% test for outliers among 288 observations is close to four times the standard deviation.
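The paper does not state how the critical value for the outlier test is obtained. One common choice, assumed here purely for illustration, is a two-sided Bonferroni bound for the largest of $n$ approximately normal standardized residuals. The function name is hypothetical; the inputs 288 and 0.0680 are taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def bonferroni_critical_value(n_obs, level=0.05):
    """Two-sided Bonferroni critical value for the largest of n_obs
    standardized residuals, assuming normally distributed residuals."""
    tail = level / (2 * n_obs)
    return NormalDist().inv_cdf(1.0 - tail)


sigma = sqrt(0.0680)                         # residual standard deviation
critical = bonferroni_critical_value(288)    # in units of standard deviations
limit = critical * sigma                     # limit on the log scale of the series
```

Under this assumption the critical value lies between 3.7 and 3.8 standard deviations, which agrees with the statement that it is close to four.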

[Figure: residuals from the airline model plotted against Date, 1955 to 1980.]

If these two residuals are removed using dummy variables, the residual variance is reduced to 0.0578. Of course more outliers could be compensated for by dummy variables in order to reduce the variance even further, but the number of parameters then increases, as each outlier leads to an extra parameter.

This model is fitted in order to take care of the autocorrelation structure in the time series. The estimated ARMA parameters and the residual variance are the same when they are estimated from the reverted time series. However, the innovation process, $\varepsilon_t$, and hence also the numerical residuals are different, as displayed in the next Figure. The residuals are now prediction errors when backcasting one time period using only future values of the time series. The residuals larger than four times the standard deviation are now July 1962 and December 1974.

[Figure: residuals from the model fitted to the time-reverted series, plotted against Date, 1955 to 1980.]

A numerically large residual could be considered the simplest form of an intervention, one that is easy to see as part of an ordinary time series analysis. Interventions followed by an exponential decay (1) are more advanced and not so simple to identify. But by macro programming it is possible to fit an intervention of the form (1) for each month in the dataset. However, the denominator parameter $\delta$ is not well defined if the intervention is applied to the very last part of the series.

The next Table gives the estimated values of $\omega$ and $\delta$ for the months where $\omega$ numerically exceeds four times its standard deviation. If the estimated $\delta$ is close to zero, it means that the intervention in this particular month is probably a simple outlier. In some situations the estimated $\delta$ is close to one, which means the intervention is probably a step function. Several of the entries in the table show values of $\delta$ not significantly different from 0.7, which are interventions of the form (1).

date estimate_w estimate_o stderr_o tvalue_o probt_o

01/08/1962 -0.9223112 0.99999983 0.02917628 34.27 <.0001

01/04/1968 -0.8849199 0.5763446 0.19937316 2.89 0.0038

01/07/1970 -0.8729466 0.73588597 0.15881163 4.63 <.0001

01/06/1974 -0.8901127 0.93368101 0.0595717 15.67 <.0001

01/07/1974 -0.8307896 0.90274912 0.07510949 12.02 <.0001

01/12/1974 -0.9538513 -0.3761707 0.15292763 -2.46 0.0139

01/01/1975 1.25359976 0.61672696 0.13826998 4.46 <.0001

01/10/1977 -1.2118672 0.60343619 0.14154894 4.26 <.0001

A similar exercise can be applied to the time series with reverted time. In this way we find interventions which build up exponentially in the months before the intervention rather than decaying after it. The following table gives the estimated values of $\omega$ and $\delta$ for the months where $\omega$ numerically exceeds four times its standard deviation. For two of these interventions the estimated $\delta$ is close to zero. This means that the intervention in this particular month is probably an outlier. For one month the estimated $\delta$ is close to one. This means the intervention is probably a step function. Several of the remaining entries in the Table show values of $\delta$ not significantly different from, say, 0.7; that is, they are of the form (1).

date estimate_w estimate_o stderr_o tvalue_o probt_o

01/08/1962 1.035228 0.75595124 0.12020634 6.29 <.0001

01/07/1970 0.89187073 0.59941956 0.19336559 3.10 0.0019

01/01/1975 -1.5019414 0.82758359 0.06029859 13.72 <.0001

01/02/1975 0.95027123 -0.3361518 0.1595968 -2.11 0.0352

01/10/1977 0.95339143 0.99999948 0.04096177 24.41 <.0001

01/11/1977 -0.9719802 -0.0605509 0.21504244 -0.28 0.7783


The results in the preceding two Tables are based on an estimation of the denominator parameter $\delta$ for each month in the dataset. In order to ease the estimation burden a fixed value of $\delta$, say $\delta = 0.7$, could be applied so that only the $\omega$ parameter has to be estimated.
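The screening with a fixed $\delta$ can be sketched as follows. This is a deliberately simplified illustration and not the SAS macro used for the Tables: it estimates $\omega$ by ordinary least squares on a series that is assumed to be free of autocorrelation (for instance a residual series), and it ignores the ARIMA noise model and the standard errors. All function names and the artificial data are assumptions.

```python
def decay_regressor(n, t0, delta):
    """Regressor delta**(t - t0) for t >= t0 and 0 before t0."""
    return [delta ** (t - t0) if t >= t0 else 0.0 for t in range(n)]


def estimate_omega(series, t0, delta=0.7):
    """Least-squares estimate of omega in series[t] = omega * delta**(t - t0) + noise."""
    x = decay_regressor(len(series), t0, delta)
    sxx = sum(v * v for v in x)          # never zero, since x[t0] == 1
    sxy = sum(v * y for v, y in zip(x, series))
    return sxy / sxx


def screen(series, delta=0.7):
    """Return (t0, omega_hat) for every candidate intervention date."""
    return [(t0, estimate_omega(series, t0, delta)) for t0 in range(len(series))]


# Artificial, noise-free series with an intervention of size -1.2 at index 4.
n = 12
series = [-1.2 * 0.7 ** (t - 4) if t >= 4 else 0.0 for t in range(n)]
best_t0, best_omega = max(screen(series), key=lambda pair: abs(pair[1]))
```

Because $\delta$ is fixed, each candidate date requires only one linear estimate, which is what makes a month-by-month screening cheap.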

In the analyses presented for this series up to this point, interventions are generated automatically by a screening process, which of course could lead to spurious results. A more honest analysis first identifies a reason for an intervention, with an idea of the form of the intervention, and only after that estimates the parameters of the intervention component.

For this series it is of course possible to look in the historical annals for events that could affect car sales in Denmark. Many such events exist, for instance due to changes in the Danish taxation of new imported cars. If the analysis is mainly focused on the most recent part of the time series, a screening like the one presented above is, however, a practical approach. In some sense the dates for interventions found by screening help the analyst to remember long forgotten events.

For this series the interventions are found in the period following the first oil crisis in the autumn-winter 1973/74. In the following years the trade balance was a huge problem for Denmark, so as a quick fix an extraordinary increase in the taxation was announced from May 1974 to December 31, 1974. A natural model for such an intervention is a temporary step function having the value one only for the months of the intervention and zero for all other months. However, it was announced that the extraordinary taxation should end by New Year's Eve 1974/75, and hence the sales of new cars in Denmark were close to zero in the months before. This is exactly a situation that could be modeled by the intervention function formed as an exponential increase up to the last month of the extraordinary tax level. The outlier in October 1977 is due to a sudden, permanent increase in the taxation on new cars.

References

Box, G.E.P. and Jenkins, G.M. (1976), Time Series Analysis: Forecasting and Control, Revised Edition, San Francisco: Holden-Day.


Population Alcohol Consumption as a Predictor of Alcohol-Specific Deaths in Finland

To be presented at the 40th Symposium i Anvendt Statistik, Frederiksberg, Denmark, January 22. - 24. 2018

Timo Alanko Helsinki, Finland, [email protected]

Acknowledgement: This paper is based on recent joint work with Kari Poikolainen (Poikolainen and Alanko, 2017).

ABSTRACT

Aims: The study examines whether the number of alcohol-specific deaths can be predicted by population total and/or beverage-specific alcohol consumption and, if so, how precisely. The data are annual series of spirits, wine, beer and total consumption and alcohol-specific deaths in Finland in the years 1969-2015.

Methods: We specify an ARDL (AutoRegressive Distributed Lags) model with cointegrated variables, to be used in prediction. In our model the number of alcohol-specific deaths is the response variable, and the log of spirits consumption and the log of non-spirits consumption are the explanatory variables. The response variable has one added annual lag and the explanatory variables both have four added annual lags in the model.

Results: In our data alcohol-specific deaths, log of spirits and log of non-spirits consumption are significantly cointegrated. The precision of the estimated model is good. The prediction results include prediction of the 2008 downturn in alcohol deaths, and forecasts of future (2017-2020) alcohol deaths from 2016 on. Forecasted effects of proposed Finnish alcohol policy changes, potentially leading to six percent or 0.5 percent total consumption increases, are estimated.

Conclusions: The number of alcohol-specific deaths can be predicted with an appropriate time-series regression model on the basis of population consumption. It is important to consider also beverage type because of the improved predictive power. The model is useful in an evaluation of proposed alcohol policy changes.

Keywords: Alcohol-specific deaths, Alcohol consumption, Cointegration, ARDL (AutoRegressive Distributed Lags), Beverage type


INTRODUCTION

It is commonly thought that total consumption of alcohol in a country determines how much harm is caused by alcoholic beverages. For example, per capita alcohol consumption has been shown to be significantly related to male all-cause mortality in eight out of 14 European countries in aggregate time series covering the years 1950-95 or shorter periods (Norstrom, 2001).

In addition to total alcohol consumption, type of beverage may also play a role. For example, in a pooled cross-sectional time-series analysis, spirits consumption was found to associate as strongly as total alcohol consumption with cirrhosis mortality in five countries, Australia, Canada, New Zealand, the United Kingdom and the United States in 1953-1993, while wine and beer were not significant (Kerr et al., 2000). In another cross-sectional time-series analysis, higher spirits (hard liquor) consumption was found to associate more strongly than other beverages with higher cirrhosis, head and neck cancer and ischemic heart disease (IHD) mortality in 48 states of the USA in 1957-2002, while higher beer and wine consumption were found to associate with lower ischemic heart disease mortality (Kerr and Ye, 2011). Alcohol-related disease mortality declined by 7.0% after a 1990 tax increase for spirits and beer. Thus, beverage type may influence the number of alcohol-related deaths.

The aim of the present paper is to examine the prediction of alcohol-specific mortality on the basis of total and beverage-specific aggregate alcohol consumption. The aim may at first sight seem trivial: 'Alcohol is the cause of alcohol deaths, so what is the problem?'. This is certainly true at the individual level. However, the aim is highly pertinent to alcohol policy. The theoretical basis of alcohol policy in many countries, especially in Scandinavia, is a quasi-statistical model of a 'collective drinking culture' (Skog, 1985). This model, often called 'the total consumption model', is based on a close association between total (or mean) population consumption and the proportion of heavy drinkers in the population. The model assumes that the correct way to reduce alcohol harm is to control the total consumption. An underlying causal or quasi-causal effect of total consumption on harm is obviously presumed. A strong association between total consumption and harm is thus a necessary condition for the model to be effective.

We show that aggregate consumption is a good predictor of alcohol-specific mortality in Finland, thus showing that the necessary condition sometimes prevails. Yet, we do not take a stand on the question of a causal relationship between total consumption and alcohol mortality. The causal direction of the total consumption model has been criticized and alternative ways of explaining the association have been suggested, see e.g. Duffy (1986).


Data

The alcohol-specific deaths, defined as cause-of-death group 41, were extracted from the registers of Statistics Finland (https://www.stat.fi/til/ksyyt/index_en.html). The series consisted of 47 consecutive years, 1969-2015; the ICD codes are given in Poikolainen and Alanko (2017). These are underlying causes of death, that is, the disease or injury that initiated the train of morbid events. Contributory causes of death are not included because their causal role is unclear. Consumption figures in absolute (100%) alcohol, consumed as various beverages, were obtained from the register of the National Institute for Health and Welfare (https://www.thl.fi/en/web/thlfi-en) in Finland. The following variables were formed (variable names in italics): number of alcohol-specific deaths (alcdeath) per 100 000 person-years in the population aged 15 years or older, total annual alcohol consumption per capita in litres of absolute alcohol (totalcons), distilled spirits, beer and wine consumption per capita in litres of absolute alcohol in the population aged 15 years or older (spirits, beers, wines), and beers and wines together (nonspirits). All these variables were also examined in natural logs. Figure 1 gives an overview of the annual development of the main variables over the period 1969-2015.

[Figure: time series plotted against year, 1970 to 2020. Series: total per capita, spirits per capita, non-spirits per capita, alcohol deaths per 100,000.]

Figure 1. Time series of the main variables 1969-2015. All figures are annual and relate to the population over 15 years of age in Finland. The alcohol consumption figures (the left axis) are in litres of absolute alcohol per capita (15 years +). The alcohol deaths (the right axis) are per 100,000 of population (15 years +).


Methods

A visual examination shows that none of the series are stationary. We assumed stochastic trends as an explanation of the non-stationarity. Fixed time trends were deemed neither satisfactory nor justifiable on substantive grounds.

All the series (also in logs) were tested for unit roots, i.e. the hypothesis that the trends are due to random walks. We first applied augmented Dickey-Fuller unit root tests to the series in levels (also in logs). The tests showed that unit root hypotheses could not be rejected for any series, either with or without the assumption of a linear trend component in the data. Further unit root tests showed that the unit root hypothesis could be rejected for the once differenced versions of the series. With further examination of the autocorrelation structures, we inferred that the series could be assumed to be stationary, I(0), in first-differenced form and thus I(1) in levels.

We also tested for possible cointegration relationships among alcdeath and the consumption variables using the Bounds test (Pesaran et al., 2001) and the Johansen trace test (Johansen, 1995).

The three-variable vector of variables (alcdeath, log(spirits), log(non-spirits)) was cointegrated. Indeed, the null hypothesis of no long-run relationship was clearly rejected, with a high significance level, see Table 1.

Several other combinations of variables were tested. The vector (alcdeath, log(spirits), log(totalcons)) was cointegrated. Neither the pair (alcdeath, log(spirits)) nor (alcdeath, log(totalcons)) nor any other pair formed with alcdeath or log(alcdeath) was cointegrated.

It is obvious, by substantive reasoning, that the predictive direction is from consumption variables to alcohol deaths, i.e. that there is no feedback from alcohol deaths to consumption. We furthermore tested the direction by applying Wald tests of Granger causality to the key variables and came to the same conclusion.

The tests and all estimations in the paper were performed either with EViews (9.5) or STATA (10.0) software.

The ARDL model

It is well known that there is a considerable time lag between heavy consumption and death from alcohol at the individual level. We thus wanted to consider lagged variables. We selected the ARDL (AutoRegressive Distributed Lags) model for time series modeling. There are several advantages to the ARDL approach. Unlike VAR (Vector AutoRegression) models, where all the variables are endogenous, ARDL treats one of the variables (the dependent variable alcdeath in our application) as conditional on the other variables. An immediate advantage is the need to estimate a smaller number of parameters for our short time series than in VAR. Also, in ARDL with cointegrated variables we can estimate an error correcting form for the short run (or year-to-year) changes and a long-run or equilibrium form of the relationship between the variables.

The basic underlying form of an ARDL regression model with a response variable $y$ and just one explanatory variable $x$ is, in levels,

$y_t = \varphi_0 + \sum_{i=1}^{p} \psi_i y_{t-i} + \sum_{j=0}^{q} \beta_j x_{t-j} + \varepsilon_t$

with $t = 1 \dots T$ and $\varphi_0, \psi_i, \beta_j, \varepsilon_t$ the regression coefficients and the error term, respectively. The model above is denoted by ARDL(p,q), with $p$ the number of lags in the response variable and $q$ the number of lags in the explanatory variable. An ARDL model with two or more explanatory variables is analogous.

Specification of an ARDL model requires many diagnostics, decisions and judgments, which are explained below. We arrived at an ARDL(1,4,4) model specified in the basic form as:

$y_t = \varphi_0 + \psi_1 y_{t-1} + \beta_0 x_{1,t} + \beta_1 x_{1,t-1} + \beta_2 x_{1,t-2} + \beta_3 x_{1,t-3} + \beta_4 x_{1,t-4} + \gamma_0 x_{2,t}$

$\qquad + \gamma_1 x_{2,t-1} + \gamma_2 x_{2,t-2} + \gamma_3 x_{2,t-3} + \gamma_4 x_{2,t-4} + \varepsilon_t \qquad (1)$

where $t = 1 \dots T$, $y$ = alcdeath with an added lag, $x_1$ = log(spirits) with lags 0 - 4, and $x_2$ = log(nonspirits) with lags 0 - 4, and $\varphi_0, \psi_1, \beta_j, \gamma_j, \varepsilon_t$ designating the regression coefficients and the error term, respectively.
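To make the lag structure of (1) concrete, the sketch below builds the regressor rows of an ARDL(1,4,4) model in levels. It only arranges the data; estimation itself was done in EViews and STATA and is not reproduced here. The function name and the artificial input series are assumptions for illustration.

```python
def ardl_rows(y, x1, x2, p=1, q1=4, q2=4):
    """Build (response, regressors) pairs for an ARDL(p, q1, q2) regression in levels.

    Each regressor row is [1, y[t-1..t-p], x1[t..t-q1], x2[t..t-q2]].
    The first max(p, q1, q2) observations are lost to lagging.
    """
    if not (len(y) == len(x1) == len(x2)):
        raise ValueError("all series must have the same length")
    start = max(p, q1, q2)
    rows = []
    for t in range(start, len(y)):
        regressors = [1.0]                                        # intercept
        regressors += [y[t - i] for i in range(1, p + 1)]         # lagged response
        regressors += [x1[t - j] for j in range(0, q1 + 1)]       # x1, lags 0..q1
        regressors += [x2[t - j] for j in range(0, q2 + 1)]       # x2, lags 0..q2
        rows.append((y[t], regressors))
    return rows


# Artificial series of the same length as the data (47 annual observations).
y = [float(t) for t in range(47)]
x1 = [100.0 + t for t in range(47)]
x2 = [200.0 + t for t in range(47)]
rows = ardl_rows(y, x1, x2)
```

With 47 annual observations this leaves 43 usable rows for the 12 coefficients of (1), which illustrates why parsimony matters for a series of this length.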

The specification decisions were as follows. First, in choosing explanatory variables, total consumption led to interpretational difficulties once any of its subcomponents (e.g. spirits) were also in the model. Total consumption alone was an inefficient predictor. We decided to exclude total consumption, given that spirits and non-spirits together cover the information in total consumption. Similarly, separate beer and wine consumption variables were excluded from the model.


Second, we decided to use a model where the response variable was expressed in lev­els and the explanatory variables in logs (the level-log specification). Taking loga­rithms ofthe response variable caused heteroskedasticity in the residuals and the levels form performed anyway equally well. For the explanatory variables logging was pref­erable for at least two reasons: easy interpretation ofthe coefficients (percent change) and cointegration of spirits and non-spirits with alcohol deaths when in log form.

Third, the basic difficulty is in the specification of lag lengths. We used an exhaustive search over estimated models of form (1), with the Schwarz (Bayes) information criterion, as available in the Eviews 9.5 software. We ended up with the ARDL(1,4,4) model (1) above.
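The exhaustive search over lag orders can be sketched as follows. This is a minimal illustration, not the Eviews implementation: the callback `fit` is a hypothetical function that estimates one candidate model and returns its residual sum of squares, sample size and parameter count, and the criterion uses the common Gaussian form n·ln(RSS/n) + k·ln(n), which ranks models like the Schwarz criterion up to scaling when all candidates are estimated on the same sample.

```python
import itertools
import math


def bic(rss, n_obs, n_params):
    """Schwarz (Bayes) criterion for a Gaussian linear model, up to a constant."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def select_lags(fit, max_p=4, max_q=4):
    """Try every ARDL(p, q1, q2) order and return (best_order, best_bic).

    `fit(p, q1, q2)` is a hypothetical callback returning
    (rss, n_obs, n_params) for the model with those lag orders.
    """
    best_order, best_bic = None, math.inf
    orders = itertools.product(range(1, max_p + 1),
                               range(0, max_q + 1),
                               range(0, max_q + 1))
    for p, q1, q2 in orders:
        rss, n_obs, n_params = fit(p, q1, q2)
        score = bic(rss, n_obs, n_params)
        if score < best_bic:  # smaller criterion value is better
            best_order, best_bic = (p, q1, q2), score
    return best_order, best_bic
```

For a fair comparison, every candidate should be estimated on the same observations, that is, the sample that remains after dropping the maximum number of lags.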

Below we review the basic steps we took in estimation and prediction from the ARDL model, using the ARDL(1,4,4) model as an example rather than the general notation. The presentation is non-rigorous, and in particular the coefficient notation across the equations is ad hoc. For a recent general and rigorous, yet concise, review see Eviews (2017). See also the seminal paper by Pesaran et al. (2001).

Two additional forms admitted by form (1) are presented below. Using the identity $z_t = z_{t-1} + \Delta z_t$, valid for any series, where $\Delta z_t = z_t - z_{t-1}$ is the first difference, together with the Beveridge-Nelson decomposition, one arrives at the second form, the 'conditional error correcting form' or CEC of Pesaran et al. (2001):

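$$\Delta y_t = \beta_0 + \delta_0 \Delta x_{1t} + \delta_1 \Delta x_{1,t-1} + \delta_2 \Delta x_{1,t-2} + \delta_3 \Delta x_{1,t-3} + \eta_0 \Delta x_{2t} + \eta_1 \Delta x_{2,t-1} + \eta_2 \Delta x_{2,t-2} + \eta_3 \Delta x_{2,t-3} + \theta_0 y_{t-1} + \theta_1 x_{1,t-1} + \theta_2 x_{2,t-1} + \varepsilon_t \qquad (2)$$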

with obvious notation. Denoting the long-run equilibrium relationship by

$$y_t = \alpha_0 + \alpha_1 x_{1t} + \alpha_2 x_{2t} \qquad (3)$$

we note that the parameters of the long-run relationship can be extracted from (2) (by assuming all the $\Delta$ terms ≈ 0) as $\alpha_0 = -\beta_0/\theta_0$, $\alpha_1 = -\theta_1/\theta_0$, $\alpha_2 = -\theta_2/\theta_0$, where $\theta_0, \theta_1, \theta_2$ are estimated coefficients from (2). Standard errors for the estimators of $\alpha_0, \alpha_1, \alpha_2$ can be obtained with the usual numerical methods.
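The extraction of the long-run parameters can be written out in one step. Writing the intercept of (2) as $\beta_0$ and its level coefficients as $\theta_0, \theta_1, \theta_2$, and assuming that in equilibrium all differences and the error term vanish, a sketch of the derivation is:

```latex
\begin{align*}
0 &= \beta_0 + \theta_0\, y + \theta_1\, x_1 + \theta_2\, x_2 \\
y &= -\frac{\beta_0}{\theta_0} - \frac{\theta_1}{\theta_0}\, x_1 - \frac{\theta_2}{\theta_0}\, x_2
   \;=\; \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2
\end{align*}
```

The division requires $\theta_0 \neq 0$, which is the same coefficient whose significance the cointegration test examines.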

Cointegration can be tested using (2) by the Bounds test. It is important that $\varepsilon_t$ is well-behaved in (2). The null hypothesis of no long-run relationship is:

against the alternative that $H_0$ is not true. The test statistic is that of the simple F-test, but its distribution is not F. The Bounds test critical values, derived from a large simulation study, are tabulated in Pesaran et al. (2001). The test has significance bounds for the cases where all variables are I(1) and where all variables are I(0). We further assume Case 2 of Pesaran et al. (2001), where the intercept is restricted to the error correction term. A Bounds test result is given in Table 1.

F-Bounds Test. Null hypothesis: no levels relationship

Test Statistic   Value      Signif.   I(0)   I(1)
F-statistic      9.196925   10%       2.63   3.35
k                2          5%        3.1    3.87
                            2.5%      3.55   4.38
                            1%        4.13   5

Table 1. The Bounds test of cointegration for the ARDL(1,4,4) model (4). The F-statistic (9.197) is well above the I(1) 1% bound, indicating a rejection of the null hypothesis of no cointegration at the 0.01 significance level.
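The decision rule behind Table 1 can be sketched as below. The critical values are those printed in Table 1 (k = 2); the function name and the wording of the returned labels are illustrative assumptions.

```python
# Critical values (I(0) bound, I(1) bound) copied from Table 1, k = 2.
CRITICAL = {
    "10%": (2.63, 3.35),
    "5%": (3.1, 3.87),
    "2.5%": (3.55, 4.38),
    "1%": (4.13, 5.0),
}


def bounds_decision(f_stat, lower_i0, upper_i1):
    """Classify an F-statistic against the I(0) and I(1) bounds."""
    if f_stat > upper_i1:
        return "reject"          # evidence of a levels relationship
    if f_stat < lower_i0:
        return "do not reject"   # no evidence of a levels relationship
    return "inconclusive"        # between the bounds: depends on integration orders


for level, (low, high) in CRITICAL.items():
    print(level, bounds_decision(9.196925, low, high))
```

For the reported F-statistic the rule returns a rejection at every tabulated level, in line with the caption of Table 1.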

Similarly, with further manipulation, one arrives at the third form, the error correcting form of the model. In the standard ECM form the ARDL(1,4,4) model is

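$$\Delta y_t = \beta_0 \Delta x_{1t} + \beta_1 \Delta x_{1,t-1} + \beta_2 \Delta x_{1,t-2} + \beta_3 \Delta x_{1,t-3} + \gamma_0 \Delta x_{2t} + \gamma_1 \Delta x_{2,t-1} + \gamma_2 \Delta x_{2,t-2} + \gamma_3 \Delta x_{2,t-3} + \alpha\, EC_{t-1} + \varepsilon_t \qquad (4)$$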

with $y$, $x_1$ and $x_2$ as defined in (1), $\beta_i, \gamma_j$ being regression coefficients, $EC_t$ the error correction term, $\alpha$ the 'speed' coefficient of the error correction term and $\varepsilon_t$ the usual error term. Note the missing constant, which is restricted to the error correction term.

The error correction term $EC_{t-1}$ is the lagged residual of the estimated long-run equation (3):

$$EC_t = \text{alcdeath}_t - [\alpha_0 + \alpha_1 \log(\text{spirits})_t + \alpha_2 \log(\text{nonspirits})_t].$$

Estimation of forms (1), (2) and (4) is straightforward by OLS. Form (4) is used for final estimation and for all the predictions given in the sequel.

RESULTS

Variable                  Coefficient   Std. Error   t-Statistic   Prob.
Δlog(spirits)_{t-1}           15.63         4.80          3.26      0.003
Δlog(spirits)_{t-2}           -5.10         5.24         -0.97      0.337
Δlog(spirits)_{t-3}           10.29         4.54          2.26      0.030
Δlog(nonspirits)_{t}           0.79         7.76          0.10      0.919
Δlog(nonspirits)_{t-1}       -22.72         9.06         -2.51      0.017
Δlog(nonspirits)_{t-2}       -12.14         9.42         -1.29      0.206
Δlog(nonspirits)_{t-3}       -25.86         8.95         -2.89      0.007
EC_{t-1}                      -0.63         0.10         -6.23      0.000

$$EC_t = \text{alcdeath}_t - [17.44 \cdot \log(\text{spirits})_t + 67.74 \cdot \log(\text{nonspirits})_t - 104.35]$$

Table 2. The estimated error correcting form and the long-run form of the ARDL(1,4,4) model given in equation (4). The dependent variable is $\Delta\text{alcdeath}_t$. Shown are the estimated coefficients of the lagged Δlog(spirits) and Δlog(nonspirits) variables and of the lagged error correction variable $EC_{t-1}$, with their standard errors and t-tests for significance. Note that the constant is restricted to $EC_t$.

Variable          Coefficient   Std. Error   t-Statistic   Prob.
log(spirits)          17.44         3.65          4.77      0.000
log(nonspirits)       67.74         3.15         21.49      0.000
constant            -104.35         8.70        -11.99      0.000

Table 3. The coefficients of the long-run or equilibrium equation, with their standard errors and t-tests for significance.

The basic estimates for the model are given in Tables 2 and 3. It is noteworthy that the speed parameter of the error correction term $EC_{t-1}$ is negative, fairly large in absolute value (-0.63) and statistically highly significant. $EC_{t-1}$ works as a negative feedback, adjusting shocks in alcohol deaths rapidly (-0.63 meaning in about a year and a half) back towards the equilibrium given by the estimated equation

alcdeath = -104.35 + 17.44 * log(spirits) + 67.74 * log(nonspirits). (5)

Equation (5), which describes the long-run relationship between the response and the explanatory variables, shows that, ceteris paribus, a one percent increase in per capita spirits consumption tends in the long-run equilibrium towards an approximate increase in alcohol deaths of 0.174 deaths per 100 000. Similarly, a one percent increase in non-spirits consumption tends to increase alcohol deaths by 0.67 deaths per 100 000.
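The two readings above, the level-log interpretation of the long-run coefficients and the adjustment speed, can be checked with simple arithmetic. The sketch below uses only the estimates from Tables 2 and 3; the function names are illustrative, and the adjustment calculation is a simplification that ignores the short-run difference terms of model (4). Reading -0.63 as a mean adjustment lag of 1/0.63 years is one possible interpretation of "about a year and a half", stated here as an assumption.

```python
import math

SPEED = -0.63         # coefficient of EC(t-1), Table 2
B_SPIRITS = 17.44     # long-run coefficient of log(spirits), Table 3
B_NONSPIRITS = 67.74  # long-run coefficient of log(nonspirits), Table 3


def level_log_effect(coef, pct_change):
    """Long-run change in deaths per 100 000 for a given percent change in x."""
    return coef * math.log(1.0 + pct_change / 100.0)


def remaining_gap(speed, years):
    """Share of an initial disequilibrium left after `years` (short-run terms ignored)."""
    return (1.0 + speed) ** years


print(round(level_log_effect(B_SPIRITS, 1.0), 3))     # effect of +1% spirits
print(round(level_log_effect(B_NONSPIRITS, 1.0), 3))  # effect of +1% non-spirits
print(round(remaining_gap(SPEED, 1), 2))              # gap left after one year
print(round(-1.0 / SPEED, 2))                         # mean adjustment lag in years
```

Under these simplifying assumptions a little more than a third of a shock remains after one year, which is consistent with the description of a rapid return to equilibrium.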

Figures 2 and 3 show the fit (observed and predicted values) of the ARDL(1,4,4) model for the levels (Figure 2) and annual changes (Figure 3). The fit is good: approximate $R^2$ = 0.75 and root MSE = 1.35 for model (4).

[Figure 2: time-series plot; x-axis: year (1970-2020); series: alcohol deaths, predicted alcohol deaths, upper and lower .95 prediction limits]

Figure 2. Observed and predicted values in levels with .95 prediction limits from the estimated ARDL(1,4,4) model. The model coefficients are given in Table 2.

[Figure 3: time-series plot; x-axis: year (1970-2020); series: change in alcohol deaths, predicted change]

Figure 3. Observed and predicted values in first differences from the estimated ARDL(1,4,4) model. The model coefficients are given in Table 2.

Forecasts from the ARDL model

An out-of-sample forecast from 2005 on

Apart from the ex post predictions (fits) in Figures 2 and 3, we demonstrate out-of-sample alcohol death forecasts. First, we remind the reader that alcohol deaths increased until about 2007 and have decreased ever since (Figure 1). To test the out-of-sample forecasting capability of the model, we re-estimated the ARDL(1,4,4) model using only the years 1969-2004 and forecast alcohol deaths for the years 2005-2015, using only the 1969-2004 model and the observed alcohol consumption for 2005-2015. The result is presented graphically in Figure 4. The model forecasts the peak and the decrease following 2007 well.
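The holdout design described above can be sketched as a split by year followed by an accuracy measure on the held-out years. The paper reports the result graphically only; the helper names and the use of RMSE as a summary are assumptions made for illustration.

```python
import math


def split_by_year(series, last_estimation_year):
    """Split a {year: value} mapping into estimation and holdout parts."""
    estimation = {y: v for y, v in series.items() if y <= last_estimation_year}
    holdout = {y: v for y, v in series.items() if y > last_estimation_year}
    return estimation, holdout


def rmse(observed, forecast):
    """Root mean squared error over the years present in both mappings."""
    years = sorted(set(observed) & set(forecast))
    if not years:
        raise ValueError("no overlapping years to compare")
    squared = [(observed[y] - forecast[y]) ** 2 for y in years]
    return math.sqrt(sum(squared) / len(squared))
```

In the setting of the paper, `split_by_year(deaths, 2004)` would give the 1969-2004 estimation sample and the 2005-2015 evaluation years, where `deaths` is a hypothetical mapping from year to alcohol deaths per 100 000.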

[Figure 4: time-series plot; x-axis: year (1995-2020); series: alcohol deaths, forecast alcohol deaths, upper and lower .95 prediction limits]

Figure 4. Observed alcohol deaths and forecast values from an ARDL(1,4,4) model estimated on the 1969-2004 data. The out-of-sample forecasts with prediction limits are for the years 2005-2015.

A policy change forecast for 2016-2020

At the time of writing, the 2016 consumption figures are already available but the number of deaths is yet to be released. During a recent policy discussion it was estimated that a future change in alcohol policy in Finland would increase total consumption by up to 6% (7.56% in non-spirits consumption, 0% in spirits) through increased non-spirits consumption (Mäkelä and Österberg (THL), 2016). Another calculation (Taloustutkimus (TT), 2017) suggested that total consumption would increase by 0.5% (with a 2.9% increase in non-spirits and a 5.5% decrease in spirits).

To show the predicted effects of these suggested changes we assume three counterfactual scenarios:
1) Consumption remains frozen at the 2016 level (the NORMAL scenario).
2) Consumption increases by 6% in 2016 and remains at that level (the THL scenario).
3) Consumption increases by 0.5% and remains at that level (the TT scenario).

We then use our ARDL model to predict the number of alcohol deaths in 2016-2020 under the three scenarios. The results are shown graphically in Figure 5. The .95 prediction limits for all predictions are approximately the predicted value ± 3.6 to 3.7.

We add that the predicted year 2020 values in Figure 5 (NORMAL=34.31, THL=38.37, TT=34.72) are already very close to the long-run values obtained from equation (3): NORMAL=34.14, THL=39.13, TT=35.10. For a similar counterfactual forecast, where consumption variables are not frozen but first forecasted, see Poikolainen and Alanko (2017).
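The long-run effect of each scenario can be approximated directly from the long-run coefficients of Table 3 and the percentage changes quoted for the scenarios. This is a sketch under the assumption that the scenario changes enter the long-run equation as shifts in the logged consumption variables; the dictionary layout and function name are illustrative.

```python
import math

# Long-run coefficients from Table 3.
LONG_RUN = {"spirits": 17.44, "nonspirits": 67.74}

# Percent changes in consumption by beverage group, as quoted for each scenario.
SCENARIOS = {
    "NORMAL": {"spirits": 0.0, "nonspirits": 0.0},
    "THL": {"spirits": 0.0, "nonspirits": 7.56},
    "TT": {"spirits": -5.5, "nonspirits": 2.9},
}


def long_run_shift(pct_changes):
    """Change in equilibrium deaths per 100 000 relative to frozen consumption."""
    return sum(LONG_RUN[name] * math.log(1.0 + pct / 100.0)
               for name, pct in pct_changes.items())


for name, changes in SCENARIOS.items():
    print(name, round(long_run_shift(changes), 2))
```

The resulting shifts are close to the differences between the long-run values quoted above (THL minus NORMAL = 4.99, TT minus NORMAL = 0.96); the small gaps presumably reflect rounding of the published coefficients.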

[Figure 5: time-series plot titled "Frozen consumption 2016 - 2020"; x-axis: year (2012-2020); series: predicted deaths (THL), predicted deaths (TT), predicted deaths (NORMAL), alcohol deaths (observed)]

Figure 5. Predicted alcohol deaths under three scenarios from 2016: THL (6% increase in total consumption), TT (0.5% increase in total consumption) and NORMAL (no change). All predictions are made under the assumption of frozen (relative to the 2016 level) consumption in the years 2016-2020. The 95% prediction limits (not shown) for all predicted values are about ± 3.6 to 3.7.

DISCUSSION

Earlier studies of aggregate alcohol consumption have applied mere differencing and/or Box-Jenkins modeling. We showed that error correcting ARDL models, stemming from econometrics, are also useful in alcohol studies and lead to improved precision. In particular, a good fit to the Finnish data was obtained by the ARDL(1,4,4) model. The existence of a clear and significant cointegrating relationship is important in itself, and enhances the credibility of a stable association between consumption, beverage type and alcohol deaths.

In forecasting, we showed that the model is able to predict out-of-sample the downward turn in alcohol deaths in 2007. A counterfactual experiment with a permanent 6% increase in total consumption (7.5% in non-spirits consumption) predicted a clear, slow increase in alcohol deaths, whereas the 0.5% increase did not predict a significant increase.

One should note that our database for time-series analysis consists of annual aggregate data, and that the number of observations available (N=47) is therefore limited. On the one hand, much of the inference would improve with longer series than ours, although ours is as long as those in the earlier studies mentioned in the introduction; on the other hand, 47 years can be a (too) long period to be stable from an epidemiological or societal point of view. We did not include unrecorded consumption in our data for several reasons; see Poikolainen and Alanko (2017).

Alcohol-specific deaths are a category containing etiologic diagnoses, that is, alcohol is mentioned in the disease name. Alcohol is therefore a necessary cause. It is not a sufficient cause, since death is caused by many factors. The accuracy in ascertaining these deaths depends on the judgement made in the cause-of-death determination, and this accuracy remains unknown. Attributions to alcohol may be under- or overestimated. Each revision of the ICD has provided more diagnoses with alcohol etiology, which may have increased the number of alcohol-specific deaths.

It is unknown how far these results can be generalized. The possible effects are likely to be time and country specific rather than universal, but we do not know that.

REFERENCES

Delcher C, Maldonado-Molina MM, Wagenaar AC (2012) Effects of alcohol taxes on alcohol-related disease mortality in New York State from 1969 to 2006. Addict Behav 37:783-789.

Duffy J. (1986) The Distribution of Alcohol Consumption - 30 years on. British Journal of Addiction 81(6):735-48.

Eviews (2017) Eviews 10 User's Guide: Advanced Single Equation Analysis: Autoregressive Distributed Lag (ARDL) Models: Background. http://www.eviews.com/help/helpintro.html#page/content%2Fardl-Background.html%23. Accessed December 12, 2017.

Johansen S. (1995) Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.

Kerr WC, Fillmore KM, Marvy P. (2000) Beverage-specific alcohol consumption and cirrhosis mortality in a group of English-speaking beer-drinking countries. Addiction 95:339-346.

Kerr WC, Ye Y. (2011) Beverage-specific mortality relationships in US population data. Contemp Drug Probl 38:561-578.

Mäkelä P, Österberg E. (2016) Miten hallituspuolueiden tekemät alkoholilain uudistuksen linjaukset vaikuttavat alkoholin kulutukseen ja kansanterveyteen? THL. Available at: https://www.thl.fi/fi/web/alkoholi-tupakka-ja-riippuvuudet/alkoholi/usein-kysytyt-kysymykset/politiikka/miten-hallituspuolueiden-tekemat-alkoholilain-uudistuksen-linjaukset-vaikuttavat-alkoholin-kulutukseen-ja-kansanterveyteen. Accessed October 27, 2016.

Norström T. (2001) Per capita alcohol consumption and all-cause mortality in 14 European countries. Addiction 96: Suppl 1:S113-128.

Poikolainen K, Alanko T. (2017) Population Alcohol Consumption as a Predictor of Alcohol-Specific Deaths: A Time-Series Analysis of Aggregate Data. Alcohol and Alcoholism 52:685-91.

Pesaran MH, Shin Y, Smith RJ. (2001) Bounds testing approaches to the analysis of level relationships. J Appl Econom 16:289-326.

Skog O-J. (1985) The collectivity of drinking cultures: a theory of the distribution of alcohol consumption. British Journal of Addiction 80:83-99.

Taloustutkimus (2017) Alkoholilaki; selvitys vaikutuksista. http://www.ptv.fi/fileadmin/user_upload/tiedostot/Tiedotteet/Tiedotteiden_liitteet_2017/Alkoholilaki_selvitys_vaikutuksista_Taloustutkimus_2017.pdf. Accessed December 1, 2017.

Predicting TV viewing with weather data

Matilde Biil Røndbjerg, Dept. of Digitalization, CBS

Niels Buus Lassen, Dept. of Digitalization, CBS

1. Introduction

Television is a major media platform in Denmark. This paper analyses the effect of the weekly weather on weekly television use. In bad weather, many outdoor activities become impossible or uncomfortable, which might affect indoor activities such as television use. The central research question is whether the Danish weather affects the amount of time Danes spend watching television. It is answered by analysing the TV data itself, by examining relations and correlations between weather conditions and viewing time, and by building a model to predict how long Danes watch television, combining visual analytics and predictive analytics. The data contain 7 years of weekly TV and weather data (N=354). Even without variables for sunshine and cloud cover, the multiple regression model found in this paper has an R-square of 0.88. Temperature, wind and the number of days with rain, thunder and snow affect TV viewing the most. Surprisingly, the amount of rain and fog do not affect the weekly television viewing time.

In 2015, 92% of Danish households owned at least one television, and an average person over the age of 3 in Denmark watched TV approximately 172 minutes a day.¹

In 2016 TV viewing for the same group was down by 8% to 158 minutes a day (https://www.statista.com/statistics/438269/average-daily-tv-viewing-time-in-denmark/). TV viewing is expected to drop again in 2017, based on the declining trend shown in this paper.

The classical TV platform is continuously losing market share to streaming services such as Netflix, HBO, Hulu and Amazon. The critical factor is flexible access and selection, and classical TV does not have the winning formula here. Classical TV is nevertheless still one of the most preferred media platforms in Denmark. Used in the right manner, the platform offers many viewers to reach and a lot of money to collect.² Because television is such a big platform, it is a good opportunity for firms to reach a big and broad audience. Firms can use the platform to promote their products, and the TV channels can earn profit from both the viewers and the promoters.³ To reach the right or the biggest audience, profiling the viewers can give a sense of what to broadcast and when, whether a specific commercial or a program. Since 2010 the weekly viewing of traditional television has declined. As this article will show later, there is not only a declining trend but also a very strong seasonality in the viewing data. The three largest TV channels in Denmark are DR1, TV2 and TV3.⁴ Profiling viewing behaviour can give channels an advantage over their competitors and help advertisers get the highest possible outcome. Since 2010, where this paper starts its research, a lot has happened to the media on every platform. The number of TV channels is increasing⁵ and much of the content is widely distributed on platforms other than classical TV.⁶ It is much easier to get access to programs days after the TV broadcast, and even before the broadcast, on the TV or on the phone, tablet etc.⁷

1 (Kulturstyrelsen, 2016) 2 (Stein, 2006)

According to Gallup, streaming is increasing.⁸ All of this makes it easier for viewers to enjoy a hot day out without giving up their favorite TV show, but also to enjoy a rainy day inside with many of their favorite shows.

This leads to the problem formulation for this paper: How and to what degree does the weather affect the average amount of time Danes spend watching television? To answer this, the following research questions are examined: Does the television data itself show signs of weather dependency? How do different weather phenomena correlate with television viewing time? Is it possible to predict television viewing time with weather data as the explanatory variables?

3 (Stein, 2006) 4 (Kulturstyrelsen, 2016) 5 (Kulturstyrelsen, 2016, p. 14) 6 (Kulturstyrelsen, 2016, p. 25) 7 Ibid. 8 Ibid.

2. Briefly on the existing literature

Many academic articles these days describe the large changes in weather that result from global warming. The latest article in the DMI database describes how precipitation in Greenland is increasing due to global warming.⁹ But, like this paper, there are also articles about how the weather changes other variables. As early as the 19th century the media speculated about the weather's influence on elections. There is still no clear answer: Knack (1994), Gomez et al. (2007), Keele and Morgan (2013) and many others show mixed results. Knack (1994) examines three elections and finds that only people who feel a low civic duty seem to be affected by rain. Gomez, Hansford, & Krause (2007) examined 14 presidential elections with data from 22,000 weather stations and election data for over 3,000 counties. One of their results is that 1 inch of rain decreases participation by approximately 1% and 1 inch of snow decreases turnout by close to 5%. According to the article, the weather may have contributed to the outcomes of the 1960 and 2000 elections. They found that Democratic voters tend to be more affected by the weather. Both elections were very close, and the weather might therefore have had an impact on the outcome. It is very difficult to find academic articles that examine exactly weather conditions and television viewing in the 21st century, as this paper does. Eisinga, Franses, & Vergeer (2010) examine the effect of daily weather conditions on daily television use in the Netherlands from 1996 to 2005. Unlike this paper, they differentiate between types of TV programs. One of their conclusions is that in uncomfortable weather people spend more time watching entertainment programs, whereas the same weather conditions have the opposite effect on information programs. Eisinga et al. (2010) assume that whether people choose to watch sedentary television is a function of both alternative activities (dominated by weather) and TV content. This contrasts with models dominated by activity considerations, like this paper and Barnett et al. (1991), where we assume that people first choose whether or not to watch television according to other activities and then choose which program to watch.
In comparing that article with this paper, the periods studied also have to be taken into consideration. As noted, media are changing very fast, and it is therefore difficult to compare ten-year-old results with results from today. Barnett et al. (1991) concluded that there is a clear seasonality in television viewing and found a model with a goodness of fit of $R^2$ = 0.9988. To explain this seasonality they used weather data containing measures of daylight, temperature and precipitation, and found all three significant.

9 (Mernild, 2014)

3. The data and methodology

Two kinds of data were used for this project: weather data and viewing time. For television, this paper uses data from TNS Gallup's TV-meter. The frequency of the data reflects the limited access to TV data; the dataset therefore contains weekly data. The numbers show how much an average Dane over the age of 3 watched television in a given week. The media data used for this project are the only publicly available data, reported and shared by Gallup on a weekly basis. The reports are produced by registering 1,018 households with approximately 2,129 people over the age of 3. The sample is constructed to represent the Danish population. The weather data are copied from Wunderground.com. They are based not only on the data the site's meteorologists provide but also on real-time reports from users of the site. By combining these sources Wunderground.com produces its predictions and its collections of historical data. The data were collected for Copenhagen, covering the period from 1 January 2010 until 30 July 2017 and 10 weather phenomena (temperature, rain, wind, dew, pressure, humidity, visibility, snow, fog and thunder), at daily frequency with average, highest and minimum values. The amount of rain is not recorded from 2010 to March 2012, which means that all relations involving the amount of rain are based on the period from April 2012. The occurrence of rain, however, is recorded for every week in every year.

3.1 Pre-processing methodology

After the media data were copied, they had to be converted from text to numerical data. Because the weather and TV data had different frequencies, the weather data needed to be transformed to a weekly frequency. Both datasets were searched for missing values and for values that do not seem to fit the nearby values. Additive decomposition was used to find the season and trend for the Y-variable. This method was chosen because, as noted, there is a clear negative trend in the TV-viewing data. The method follows the form Y = T + S + C + I, where T is the trend component, S the seasonal component, C the cyclic component and I an irregular component. The components are found one by one, starting with a centered moving average, here a CMA 52 because the data are reported weekly. Because of the additive form, the trend component was subtracted from the values when estimating the different models and added back to the predicted values to compare them with the real values. Furthermore, the rain variable was converted into a dummy, making it possible to show on how many days rain had occurred in a given week. Like the number of rainy days, the number of days with rain, thunder, snow or fog was calculated as one variable using dummies (0,1): when one of the phenomena occurred, a 1 was registered, and the sum of the dummies then showed on how many days one of the phenomena appeared.
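The aggregation from daily to weekly frequency and the dummy-based day counts can be sketched as follows. The original work was done in Excel; the record layout (date, mean temperature, set of observed events), the event labels and the use of ISO calendar weeks are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import date

# Event labels are hypothetical; they stand for the four counted phenomena.
EVENTS = {"rain", "thunder", "snow", "fog"}


def weekly_summary(daily):
    """Aggregate daily records to ISO weeks.

    `daily` is a list of (date, mean_temp, events) tuples, where `events`
    is a set of strings. Returns {(iso_year, iso_week): summary dict}.
    """
    buckets = defaultdict(list)
    for day, temp, events in daily:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append((temp, events))
    summary = {}
    for week, rows in sorted(buckets.items()):
        summary[week] = {
            "mean_temp": sum(t for t, _ in rows) / len(rows),
            # dummy = 1 on days with rain; the sum counts rainy days
            "rain_days": sum(1 for _, ev in rows if "rain" in ev),
            # dummy = 1 on days with any of the four phenomena
            "event_days": sum(1 for _, ev in rows if ev & EVENTS),
        }
    return summary


sample = [
    (date(2010, 1, 4), 2.0, {"rain"}),
    (date(2010, 1, 5), 4.0, set()),
    (date(2010, 1, 6), 0.0, {"snow", "fog"}),
    (date(2010, 1, 11), 1.0, {"rain", "thunder"}),
]
print(weekly_summary(sample))
```

A day with several phenomena still counts once in `event_days`, which matches the description of registering a 1 when one of the phenomena occurred.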

3.2 Methodology

[Figure 1: process flow diagram with the steps: Business understanding and research → Problem formulation and research questions → Search for databases → Collect data → Copy and paste into Excel → Transform data in Excel (aggregate daily weather data to weekly data; convert media data from text to numbers) → Deseasonalize and detrend data → Integrate weather and media data into a single table → Visualization]

First, a visualization of the data was conducted. The relations found were carried forward into building a prediction model, which was then tested and used. Throughout the process the three steps were interdependent: the results from each step fed into the others, with the purpose of finding the weather phenomena that have the biggest explanatory effect on television use. The relations between time and the different phenomena were pictured in different visual diagrams and plots. To investigate the relations further, both time and the weather variables were cleared for trend and season, using both an additive decomposition and a multiplicative method. The relations and correlations were registered, and the variables with no visible relation and correlations < 0.2 were excluded from the data set, leaving 30 weather variables. The remaining data were transferred to the SAS JMP program for further modeling. In the Excel diagrams the relations found all seemed linear, and a linear equation gave the highest $R^2$, so it was decided to fit a multiple linear regression in JMP. Many different models were fitted: some with both the y and x variables detrended and deseasonalized, some where only part of the data were detrended and deseasonalized, and some without any transformed data. All 30 variables and the time lags were included in JMP's Fit Model function. One by one, the variable with the most insignificant coefficient (p-value > 0.05) was excluded. Each exclusion was compared with the variable's correlations and visual diagrams to explain why it should be excluded. Models with only significant variables were also checked for multicollinearity (VIF > 10). The models that passed were compared on $R^2$, adjusted $R^2$ and their predictive performance for 2017.
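The correlation screening step described above can be sketched as follows. The paper does not state whether the 0.2 cut-off was applied to the absolute value of the correlation; the sketch assumes it was, and the function and variable names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / math.sqrt(var_x * var_y)


def screen(candidates, target, threshold=0.2):
    """Keep candidate variables whose |correlation| with the target reaches the threshold."""
    kept = {}
    for name, values in candidates.items():
        r = pearson(values, target)
        if abs(r) >= threshold:
            kept[name] = r
    return kept
```

The subsequent backward elimination on p-values and the VIF check were done in JMP and are not reproduced here.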

3.3 The regression model

The models we focus on belong to the class of multiple linear regression models. The general model takes the form $y_t = \beta_0 + \sum_i \beta_i W_{it} + \sum_i \gamma_i\, \text{lagsY}_{it} + \text{trend} + \text{error}_t$, where $y$ is the TV viewing time. For the prediction model the lagged values of $y$ are used, represented here by the term $\sum_i \gamma_i\, \text{lagsY}_{it}$. Because the dataset contains data for seven years, it was possible to extend the model with more than one lag of each $y$ value. The $W$ terms are the different weather measures. Furthermore, a trend component and an error component are included in the model.

3.4 Delimitations

This big data project assumes that people choose between watching television and engaging in another activity. This means that this paper does not work with TV content as an explanatory variable, which will be explained and discussed later in this paper. The results of this project are based on average weekly television use. This report therefore does not examine the number of viewers or the time of day at which they watch television, which is important to keep in mind. The findings might not be optimal for knowing when to reach as many viewers as possible, but rather for knowing when there is the biggest chance of being seen on screen. Furthermore, this research does not separate the viewers into different clusters. People's geographical and demographic differences do not play a substantial role in this project. It is recognized that TV behavior differs according to many factors; age in particular has a big impact on how much television people watch.¹⁰

3.5 Conceptual Framework

Forecasting the weather is a complicated process. To get the most accurate prediction, one should use data from all over the world.¹¹ This is because wind, temperature etc. can, after some time, influence the weather in other places around the world. The most accurate forecasts can be made one week ahead.¹² Rain and cloudbursts are especially difficult for DMI to predict. In Denmark, Danmarks Meteorologiske Institut (DMI) is the institution assigned by the ministry to forecast the weather. DMI gets its data from different instruments, among them satellites, radars, webcams and thermometers around the world.¹³ Humans cannot control the weather; we can only measure it and try to build models to describe and forecast it. Some experts and journalists see this as a reason why people are so interested in the weather. DMI is one of the most visited websites in Denmark, and the weather forecast on TV is getting longer and is broadcast more frequently.¹⁴ Many experts, commentators and journalists have written about how and why the interest in weather is so great. The basic assumption of this paper is that when the weather is bad, people tend to stay inside more. Television viewing is an indoor activity whose possibilities do not depend on the weather outside; bad weather is therefore expected to increase the weekly average viewing time. This is already well known: people tend to watch less television when good weather permits them to be outside, because they will do outdoor activities instead.¹⁵ Many researchers have also tried to explain this by mood behavior theory. The weather has an impact on people's mood, which creates the need for different activities.¹⁶ For example, mood affects the individual decision to turn out on election day, and in bad weather people tend to vote for the (in their opinion) less risky candidate.¹⁷ Mood also affects the types of television program we choose. According to Zillmann's mood theory and Jantzen & Vetner (2008), the use of media and entertainment is driven by the individual's desire to be in a suitable mood. This means that in bad weather people tend to watch entertainment programs to relieve a bad mood. So both alternative activities and weather-related mood seem to have an impact on how much television we watch.

10 (Kulturstyrelsen, 2016) 11 (Houghton, 2004, p. 79) 12 Ibid. 13 (Hansen & Bech, 2016)

3.6 Concepts that inform the data analytics methods and techniques

To examine this topic, this paper combines visual analytics and predictive analytics. The two types of analytics can work together to answer the problem formulation and research questions. To make the data ready for analysis and to build the best model, different types of techniques and methods are used. The research question involves both supervised and unsupervised learning. The profiling of how people watch television contains unsupervised elements because there is no specific target, but the main aim of this report is also to predict the exact TV viewing time (the target). Since profiling is generally unsupervised and regression problems are solved with supervised methods,¹⁸ both are included in this project. For the visual analytics, Excel and SAS JMP were used. The right choice of visual channel depends on the type of data. This analysis contains only quantitative data, for which the best visual channels are position, length, angle, slope and area.¹⁹ Therefore, different types of graphs and time-series plots were used.²⁰

The model made is a multiple regression prediction model. A multiple regression model provides an estimated linear equation to predict the dependent variable Y (TV

14 (Remar, 2013) 15 (Roe & Vandebosch, 1996) 16 (Denissen, Penke, Butalid, & Aken, 2008) and (Howarth & Hoffman, 1984) 17 (Bassi, 2013) 18 (Provost & Fawcett, 2013) 19 (Munzner, 2009) 20 (Heer & Bostock, 2010)


viewing time) as a function of multiple independent variables $X_i$. The general model takes the form $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$. The predicted value depends on the effect of the variables individually and in combination. In this project the model was also used to see the marginal change in the dependent variable related to changes in the independent variables. The estimated coefficients ($\beta$) show this change in $Y$ if the associated variable ($X$) changes by one unit, all else equal21.

3.7 Decomposition Additive decomposition was used to find the season and trend for the Y-variable. This method should be used when the series has constant fluctuations over time and a linear trend, which we will see later on in the results (Time Series). The method follows the form $Y = T + S + C + I$, where $T$ is the trend component, $S$ the seasonal component, $C$ the cyclic component and $I$ an irregular component. The components are found one by one, starting with a centered moving average, here a CMA 52 because the data is reported weekly. Because of the additive form, the trend component was subtracted from the observed values and added again to the predicted values. The alternative to additive decomposition is multiplicative decomposition, which should be used if the fluctuations get bigger over time. The structure of this method is $Y = T \cdot S \cdot C \cdot I$. The process is similar to the one for additive decomposition, but instead of addition and subtraction the components are found with multiplication and division. This method was also tried in the analysis but did not result in a better model than additive decomposition.
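As a rough illustration of the first decomposition step described above, the sketch below computes a centered moving average of period 52 and subtracts it from the series. It is a minimal Python sketch and not the SAS JMP procedure used in the project; the function names and the synthetic series (level 25 and a yearly wave of amplitude 3) are assumptions made for the example. Only the slope of 0,0163 hours per week is taken from the paper.

```python
import math


def centered_moving_average(series, period=52):
    """Centered moving average for an even period (a 2 x period MA).

    Returns a list of the same length; positions without a full
    window (the first and last period/2 points) are None.
    """
    half = period // 2
    cma = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # The two end points get half weight so the weights sum to `period`.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        cma[t] = total / period
    return cma


def detrend_additive(series, trend):
    """Additive form: subtract the trend wherever it is defined."""
    return [None if tr is None else y - tr for y, tr in zip(series, trend)]


# Synthetic weekly series (assumed data): linear decline plus a yearly wave.
weeks = range(4 * 52)
true_trend = [25.0 - 0.0163 * w for w in weeks]
series = [tr + 3.0 * math.cos(2 * math.pi * w / 52)
          for w, tr in zip(weeks, true_trend)]

trend = centered_moving_average(series, period=52)
detrended = detrend_additive(series, trend)
```

Note that a CMA 52 leaves the first and last 26 weeks without a trend value, which is one reason the detrended series is shorter than the raw one.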

4. Models for the TV viewing

Summary of Fit

RSquare                       0,887217
RSquare Adj                   0,883867
Root Mean Square Error        0,768473
Mean of Response              0,089177
Observations (or Sum Wgts)    209

Analysis of Variance

Source      DF     Sum of Squares    Mean Square    F Ratio     Prob > F
Model       6      938,4164          156,403        264,8423    <,0001*
Error       202    119,2912          0,591
C. Total    208    1057,7076

Parameter Estimates

Term          Estimate      Std Error    t Ratio    Prob>|t|    VIF
Intercept     -4,609258     0,800367     -5,76      <,0001*
Tempavg       -0,124516     0,014524     -8,57      <,0001*     2,8840225
Windavg       0,0714982     0,014047     5,09       <,0001*     1,4555664
Incident 1    0,1333367     0,039114     3,41       0,0008*     1,5592769
Rainydays     0,0818174     0,033038     2,48       0,0141*     1,6650026
Lag1          0,171393      0,034891     4,91       <,0001*     2,9292682
Lag 52        0,3464839     0,039191     8,84       <,0001*     2,996997

Figure 2: SAS JMP output for the multiple regression model


As mentioned, SAS JMP was used to analyze over 1000 households' television viewing time per week. It all ended with a multiple regression prediction model. 209 data points for every variable were analyzed. The statistical output from SAS JMP is shown in figure 2. The dependent variable used is the detrended version of the television viewing time. This means that to get the actual predicted value, the trend must be added. Because of the hold-out data and the additive decomposition method used to find the trend, the detrended y-variable only


has 209 data points to analyze. To get a simpler model, the number of incidents (events) and the number of rainy days were treated as numerical continuous values. This gave a simpler model because of fewer parameters and only decreased the R2 and adjusted R2 by 0,002. Statistically it would be more correct to treat the values as ordinal, but since the model did not perform much better, simplicity was preferred. There is a strong and documented relation between the amount of time Danish people watch TV, the weather and earlier viewing time, with an Rsquare of 0,8872 and all VIF values under 10. Figure 3 shows a part of the actual time an average Dane watches television in one week and the model's predicted values. All the parameter estimates fit the relations found in the 5 meaningful facts, with positive estimates indicating a positive correlation and negative estimates indicating a negative correlation. The number of training data points fell to 40 because of the trend method, 16% of the overall number of data points.
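The fit statistics quoted above follow directly from the sums of squares in the Analysis of Variance table. The short Python check below recomputes them; the variable names are chosen for this example, and the only inputs are the numbers printed in the SAS JMP output.

```python
import math

# Values copied from the Analysis of Variance table in the SAS JMP output.
ss_model, ss_error, ss_total = 938.4164, 119.2912, 1057.7076
df_model, df_error = 6, 202
n_obs = df_model + df_error + 1  # 209 observations

r_square = 1 - ss_error / ss_total
r_square_adj = 1 - (1 - r_square) * (n_obs - 1) / df_error
mean_square_error = ss_error / df_error
rmse = math.sqrt(mean_square_error)
f_ratio = (ss_model / df_model) / mean_square_error

print(round(r_square, 6), round(r_square_adj, 6),
      round(rmse, 6), round(f_ratio, 2))
```

The recomputed values agree with the reported RSquare, RSquare Adj, Root Mean Square Error and F Ratio to the printed precision, which is a useful sanity check when a table has been transcribed by hand.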

Figure 3: Actual viewing time and predicted time, 2010-2016

4.1 Actionable Insights The results from this report are, as mentioned earlier, not a recipe for finding the optimal time to reach out to viewers on TV. The results showed that when it is cold or windy, or when it rains, snows or there is thunder, people watch television for a longer time. The chance of being watched is bigger because people watch more television, but these do not have to be the times when most people are watching TV. If a company that wants to reach out through television can choose between different days to broadcast, it should choose days in the winter time when it is expected to be very cold and windy. It should not base its choice on the amount of rain, but on whether it will rain and for how long. By supporting this report with an analysis of how many people are watching TV, a company could get a good tool for strategically choosing when to use TV as a way to reach out.


5. Visual analytics In this section the visual analysis that lies behind the modelling is presented. The visualizations were used to explain and evaluate the findings of the later generated models.

5.1 Temperature In Figure 4 we see the relation between the average weekly viewing time and the average temperature. The output shows a clear negative correlation between the average temperature and the amount of time we watch classic TV. With a calculated correlation of -0,76, it is clear that a higher temperature goes together with lower viewing time. Because R is close to -1, the relation is strong. This is not a big surprise and has been shown in earlier research from other countries. In the time series (Figure 5) of the two variables the negative correlation is also visible. There is a clear pattern showing that low/high television values appear together with high/low temperatures. The fact that temperature seems to have a great impact is taken into account in the further modelling of the predictive model.
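For readers who want to reproduce a correlation of this kind, a minimal Python sketch of the sample Pearson coefficient is given below. The six weekly values are invented for illustration and are not the Gallup or weather data used in the paper.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly values, invented for illustration only.
temp_avg = [-2.0, 1.5, 4.0, 9.0, 14.5, 18.0]     # degrees Celsius
tv_hours = [26.0, 25.0, 24.5, 21.0, 19.5, 18.0]  # hours per week

r = pearson_r(temp_avg, tv_hours)  # strongly negative for these values
```

A value near -1 means that warm weeks are almost always low-viewing weeks; it does not by itself separate the effect of temperature from the shared seasonal pattern.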

Figure 4: Average temperature and TV-viewing time (scatter plot; x-axis: Tempavg)

Figure 5: Average temperature and TV time 2010-2017 (time series; legend: Seertid omregnet [converted viewing time], Temp avg)


5.2 Wind Wind is another weather phenomenon that has been the subject of earlier research. Here the values are given in km/hour. The correlation between the average wind and television viewing time for 2010-2017 is 0.450. This is lower than the correlation for rain, but here too there is a relation. In the last five years the correlation has been close to or above 0.600. As shown in figure 6, the data points are more scattered. The round area captures 90% of the points and shows a positive relation between the variables, which the correlation also showed. It shows that higher wind speed is related to a higher TV time. This relation, as shown, has points which could be categorized as outliers because they are located far away from the main portion of the points.

Figure 6: Average wind and TV time (scatter plot "TV viewing time vs. Wind avg"; x-axis: Wind avg)

It is not surprising that there is a relation between wind and television viewing time, since earlier research has already stated that. But it is surprising to see that the two curves (figure 7) seem to be more alike at present than they were five years ago. The wind has an almost stationary trend while the average viewing time is decreasing. The wind generally has bigger fluctuations, but the two curves follow almost the same pattern, with their highest and lowest values in the same weeks. Without the trend in viewing time and the big fluctuations, the courses of the curves seem more alike. The fact that the curves seem to fit better when the viewing time is detrended was, along with the relationship itself, taken into consideration when building the predictive model.


Figure 7: Average wind and TV time 2010-2017 (time series; legend: TV-viewing time (hr), Wind avg (km/hr))


5.3 Precipitation

The next weather phenomenon that ended up being significant in the predictive model is the number of rainy days, not the amount of rain. This is surprising because earlier research came to another conclusion. Both Barnett et al. (1991) and Eisinga et al. (2010) use the amount of precipitation in their articles. From an activity point of view, the existence of rain makes several outdoor activities impossible or uncomfortable and has a negative impact on our mood, and therefore makes us watch more television. But this research found no statistical evidence to support that hypothesis. In figure 8 the TV time is plotted against the amount of precipitation. No clear pattern seems to exist. There is no sign of correlation or common variation. The time and geographical limitation might be the reason for this. But the amount of time it is raining could also be an explanation. A lot of rain can fall fast, while a small amount can fall over a longer period. For the number of rainy days, however, the hypothesis is supported. Since the independent variable can only take on eight values, the average viewing time for each value was calculated. Figure 9 shows the relation between the number of rainy days and average TV time.

Figure 8: Precipitation and TV time (scatter plot "TV viewing time vs. Precipitation"; x-axis: Precipitation)

Figure 9: Number of rainy days and TV time 2010-2017


5.4 Events The number of days with rain, thunder, snow and fog was, like the number of rainy days, calculated into one variable by using dummies (0, 1). The sum of the dummies then showed on how many days one of the phenomena appeared. In general, weeks with many days on which one of the phenomena appears also have a high average watching time. A surprising discovery was a decreasing value at 2 days with a phenomenon. This was the result of the impact of fog. If fog has appeared on two days in one week, people watch less television than in weeks with only one foggy day, and fog is most frequently represented with two days' appearance. Since there seems to be no pattern in the fog data, it would only act as noise and was removed. Days with thunder were also removed. Weeks with several days of thunder tend to have a lower average TV-viewing time. By including this weather phenomenon, the effect of the remaining events would be decreased. What is left is the two kinds of precipitation, rain and snow. The more days in a week with these weather events, the more people tend to watch TV.

Figure 10: Weather events (x-axis: Number of days with phenomena; legend: Rain, Snow, Thunder)

Figure 11: Weather events combined (legend: Events without fog, Events without fog and thunder)
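The dummy construction described in this section can be sketched as follows. This is a hypothetical Python illustration: the data layout (one dictionary of 0/1 flags per day) and the function name are assumptions, not the structure of the original weather files.

```python
def weekly_event_days(weeks, events=("rain", "snow")):
    """Count, per week, the days on which at least one chosen event occurred.

    weeks: list of weeks; each week is a list of 7 dicts that map an
    event name to a 0/1 dummy (missing keys count as 0).
    """
    counts = []
    for week in weeks:
        counts.append(sum(1 for day in week
                          if any(day.get(e, 0) for e in events)))
    return counts


# One invented week: rain on three days, fog and thunder on one day each.
example_week = [
    {"rain": 1}, {"rain": 0, "fog": 1}, {"snow": 1, "rain": 1}, {},
    {"thunder": 1}, {"rain": 1}, {},
]

all_events = weekly_event_days([example_week],
                               events=("rain", "snow", "thunder", "fog"))
rain_and_snow = weekly_event_days([example_week])
```

Dropping fog and thunder from the `events` tuple corresponds to the reduced variable that was kept in the final model; a day with both rain and snow still counts only once.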

5.5 Time lags The time series of the television data showed clear autocorrelation in the ACF plot. Based on the lag plots for the linear lag relations, columns with lags of 1, 2, 4 and 52 weeks were made for both the original TV-viewing time and the de-trended version. The analysis showed that the viewing time lagged 1 week and the de-trended time lagged 52 weeks are significant. By including the lagged values, the R2 rose by 0,08, from 0,808 to 0,887.
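Creating lag columns of this kind is mechanical, and a small Python sketch may make the bookkeeping clearer. The helper names are assumptions for this example, and the toy series simply counts weeks so that the shifted values are easy to read.

```python
def lagged(series, lag):
    """Shift a series by `lag` steps; the first `lag` entries become None."""
    return ([None] * lag + list(series))[:len(series)]


def complete_rows(target, *predictors):
    """Keep only the time points where every column has a value."""
    return [row for row in zip(target, *predictors)
            if all(v is not None for v in row)]


# Toy weekly series (assumed data): the value is just the week index.
viewing = list(range(60))
lag_1 = lagged(viewing, 1)
lag_52 = lagged(viewing, 52)

rows = complete_rows(viewing, lag_1, lag_52)
```

Each row is (current value, value 1 week earlier, value 52 weeks earlier). A lag of 52 weeks costs the first 52 observations, which, together with the centered moving average, explains why far fewer data points enter the regression than were collected.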


Figure 12: Lag plot for lag 52 and lag (axes: TV viewing time against Lag(TV viewing time))


6. Neural Networks We made two Neural Networks in SAS JMP, one with the original 6 input variables from the multiple regression, and one with 5 additional input variables, totaling 11 input variables. The 5 additional input variables were Rain per day, Dewpoint, Visibility, Windchill and Humidity.

Neural, Validation: Random KFold, Model NTanH(3), response: TV viewing time

                  6 input variables           11 input variables
Measures          Training     Validation     Training     Validation
RSquare           0,8843896    0,9005959      0,8958839    0,9252757
RMSE              1,0024015    0,9039935      0,9530925    0,7773239
Mean Abs Dev      0,7640602    0,7404695      0,7354588    0,5943017
-LogLikelihood    314,11552    72,490297      302,96784    64,187221
SSE               222,06275    44,946231      200,75314    33,232784
Sum Freq          221          55             221          55

Figure 13: SAS JMP output, 6 input variables on the left, and 11 input variables on the right.

Judging by both Rsquare and RMSE, the 11 input variables give a slightly better model in the Neural Networks. We have seen this pattern in Neural Networks for several other large datasets. Neural Networks often perform better with additional input variables that were insignificant in the multiple regression model. The Neural Network also performs better than the Multiple Regression model on the same 6 input variables. Rsquare is 0,88 on the training data with 6 input variables for both Multiple Regression and Neural Network, and Multiple Regression has a slightly better RMSE, 0,77. But on the same 20% 5-fold cross-validation data, the Neural Network performs significantly better. Rsquare for the Neural Networks is 0,90 and 0,93 with 6 and 11 input variables, respectively. The Multiple Regression model obtains an Rsquare of 0,84 on the same 20% 5-fold cross-validation data with both 6 and 11 input variables.
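To make the 80%/20% split behind these numbers concrete, the sketch below partitions 276 observations (221 + 55, the two Sum Freq values in the output) into five random folds. It is a generic Python illustration of k-fold splitting under an assumed seed, and it makes no claim to reproduce how SAS JMP assigns its folds.

```python
import random


def kfold_indices(n_obs, k=5, seed=0):
    """Randomly partition range(n_obs) into k folds of near-equal size."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_validation_split(folds, holdout):
    """Use fold `holdout` for validation and the other folds for training."""
    validation = folds[holdout]
    training = [i for j, fold in enumerate(folds) if j != holdout
                for i in fold]
    return training, validation


folds = kfold_indices(276, k=5)
training, validation = train_validation_split(folds, holdout=4)
```

Because every observation lands in exactly one fold, a model fitted on `training` never sees the rows in `validation`, which is what makes the validation Rsquare a fairer basis for comparing the two model types.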

Figure 14: SAS JMP output (diagrams) for the 2 Neural Networks, 6 input variables on the left, and 11 input variables on the right.


7. Summary and conclusion To summarize, over seven years of weekly television use was analyzed together with daily weather data for several different weather phenomena. Each variable contained 354 values from 2010 to week 41 of 2016. The weather data was aggregated into weekly data, and the different phenomena were visualized graphically against the TV data and used in SAS JMP to build a multiple regression model predicting the average weekly viewing time based on these weather events. It was possible to answer the research question based on a combination of visual analytics and predictive analytics: The TV time series showed a clear declining trend that none of the independent variables had. This was also shown by the fact that the best regression model was built upon the detrended dependent variable. Using additive decomposition, the trend found had an average decline of 0,0163 hours per week. The television data, like all data, suffered from seasonal variations. The variations followed a pattern very similar to the variations in the Danish weather events, which can be a sign of weather dependency. The weather phenomena with the highest correlations to the TV data were the average temperature and wind. These two phenomena therefore seem to affect TV use in Denmark. Rain and dew point had the lowest correlations with the dependent variable, indicating that these phenomena do not affect the average time Danes watch television. Surprisingly, even though the amount of rain does not seem to be of importance, the number of rainy days does. There is a clear positive trend indicating that more rainy days in one week result in more time spent watching TV. It was possible to predict the amount of time an average Dane watches television in a specific week when the weather conditions are known. The final model was:

$y = -4,6093 - 0,1245\,T + 0,0715\,W + 0,1333\,I + 0,0818\,R + 0,1714\,Y_{w-1} + 0,3465\,Y_{w-52}$

Where:
T: temperature in Celsius
W: wind in km/hour
I: number of days with incidents
R: number of days with rain
$Y_{w-1}$: value 1 week earlier
$Y_{w-52}$: value 1 year earlier
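The final model can be turned into a small prediction function. The Python sketch below uses the rounded coefficients from the equation above; the function and argument names, the example week and the trend value of 20 hours are assumptions made for illustration. The model output is the detrended viewing time, so the trend component has to be added back.

```python
# Rounded coefficients of the final model as reported in the text.
COEFFICIENTS = {
    "intercept": -4.6093,
    "temp_avg": -0.1245,
    "wind_avg": 0.0715,
    "incident_days": 0.1333,
    "rainy_days": 0.0818,
    "lag_1": 0.1714,
    "lag_52": 0.3465,
}


def predict_detrended(temp_avg, wind_avg, incident_days, rainy_days,
                      lag_1, lag_52):
    """Detrended weekly viewing time implied by the reported coefficients."""
    c = COEFFICIENTS
    return (c["intercept"]
            + c["temp_avg"] * temp_avg
            + c["wind_avg"] * wind_avg
            + c["incident_days"] * incident_days
            + c["rainy_days"] * rainy_days
            + c["lag_1"] * lag_1
            + c["lag_52"] * lag_52)


def predict_viewing_time(trend_value, **inputs):
    """Add the trend component back to obtain the actual prediction."""
    return trend_value + predict_detrended(**inputs)


# Hypothetical cold and windy week; all input values are invented.
week = dict(temp_avg=0.0, wind_avg=25.0, incident_days=3, rainy_days=4,
            lag_1=24.0, lag_52=1.5)
detrended = predict_detrended(**week)
prediction = predict_viewing_time(trend_value=20.0, **week)
```

The coefficients can be read as marginal effects: raising the average temperature by one degree, with everything else unchanged, lowers the prediction by 0,1245.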

To this the trend component must be added to get the actual prediction. Even though the data did not contain cloud cover or sunshine hours, which earlier research has found important, the model has an Rsquare and adjusted Rsquare of 0,88. So for the third research question the answer is considered to be yes: weather data can be used to predict television viewing of this kind. For all kinds of companies, appearing on TV can be a good way to reach many, and many different, people at once. 91,9% of Danish households owned at least one television in 2015. But it can be expensive, and to get the right outcome it is important for the company to plan its TV project. The results from this research can


help the company further on in its decisions on when to appear on TV. The results are not a complete recipe but can bring knowledge that, together with further research, can help firms make media appearance decisions. It is clear from this analysis that the best chance of being seen is in the winter time, when people watch TV the most. With weather forecasts, the company can use the prediction model to predict the weeks in which people are expected to watch TV the most and then broadcast programs or commercials in these weeks. But it is important to keep in mind that media platforms are undergoing big changes, and this could change television behavior quickly.

7.1 Future work The results in this paper could open up for more research in this area. Media is changing fast, and a lot of the research in this area is several decades old. To see if the changes have a significant impact on the results in this paper, a comparison of decades or even years could be a theme for further research. The results from this paper can be seen as an indicator of a relation between weather and television use. Future work researching this on a more detailed and differentiated level would be desirable. The project is based on weekly data because of the limited amount of data available and the limited time there was for this research. But for future work, an understanding of the relationship based on daily or hourly data would be much more detailed and clear. Approximately 70% of the weekly television use happens between five pm and midnight22. For a company wanting to reach the largest number of viewers, research on exactly this period would be desirable. This research does not differentiate between geographical areas or demographic features. Earlier studies have pointed out that especially age, gender and education have an impact on the use of television23. This could indicate that the results from this paper would be different for different clusters of people. For a firm specialized in products or programs that cater to a specific audience, the results from this project might not be a good fit. With geographical differentiation, the weather data would also be more precise. As discussed in the methodology, this could prevent cases where some of the rain data did not fit with the amount of rain reported. To support the findings in this paper, two areas with different weather profiles should have different television use. This also leads to a new theory that would be interesting to research further. As the results showed, there seems to be some degree of autocorrelation in the television use. It might be that people who generally watch a lot of television are not affected as much by the weather as people who do not. This project only used some of the best-known weather phenomena. It might be that other phenomena correlate with television use. For further research, several other variables could be tested. This paper did not try to find a correlation with hours of sunshine, cloud cover or the amount of snow, which are assumed to have a large explanatory effect. This paper found a correlation over 0,77 between wind and television use. As seen in the results, there is a clear relation and the two values have

22 Calculated with data from Gallup (2010-2016) 23 (Kulturstyrelsen, 2016)


almost the same seasonal variation. The impact of wind has been seen before in significant coefficient values in earlier research24, but a further study of this phenomenon, based on the high correlation, would be valuable.

7.2 Conclusion We live in a society where the media is everywhere. In 2015, 91,9% of Danish households owned at least one television, and an average person over the age of 3 in Denmark watched TV for approximately 172 minutes a day25. In 2016 the TV viewing for the same group was down by 8% to 158 minutes a day. TV viewing is expected to drop again in 2017, based on the declining trend shown in this paper. Television is one of the most preferred media platforms in Denmark and a way for companies to reach a lot of different people. But the media platforms are constantly changing, so firms that want to appear in the media must be conscious of how people use them. From the analysis of seven years of weekly weather conditions and TV use, it was concluded that weather affects to a high degree how much TV Danes watch. The weather data from this analysis could explain approximately 88% of the weekly TV use. Four weather conditions in particular affect TV use: temperature, wind, rainy days, and snow and thunder. Lower temperatures, more wind, rainy days, snow and thunder make people watch more television.

References

Barnett, G., Fink, E. L., Chang, H.-J., & Richards, W. D. (1991, December). Seasonality in Television Viewing: A Mathematical Model of Cultural Processes. Communication Research, pp. 755-772.

Bassi, A. (2013, May 20). Weather, mood, and voting. Retrieved from http://www.unc.edu/~abassi/Research/weather-mood-voting.pdf

Bowerman, B. L., O'Connell, R. T., & Koehler, A. B. (2003). Forecasting, Time Series, and Regression. Cengage Learning, Inc.

Denissen, J. J., Penke, L., Butalid, L., & Aken, M. A. (2008). The Effects of Weather on Daily Mood: A Multilevel Approach. pp. 662-667. Retrieved from https://www.psychologie.hu-berlin.de/de/prof/perdev/pdf/2008/Denissen_Weather_Mood_2008.pdf

Eisinga, R., Franses, P. H., & Vergeer, M. (2010, October 27). Weather conditions and daily television use in the Netherlands, 1996-2005. Journal of Biometeorology, pp. 555-564.

Gallup. (2010-2016). Kantar Gallup TV-Meter. Retrieved from http://tvm.gallup.dk/tvm/pm/

Gomez, B. T., Hansford, T. G., & Krause, G. A. (2007, June 28). The Republicans Should Pray for Rain: Weather, Turnout, and Voting in U.S. Presidential Elections. The Journal of Politics.

24 (Eisinga, Franses, & Vergeer, 2010) 25 (Kulturstyrelsen, 2016)


Hansen, N., & Bech, L. (2016, November 3). DMI opgraderer med tre nye radarer fra Finland. Retrieved November 10, 2016, from DMI.dk: http://www.dmi.dk/nyheder/arkiv/nyheder-2016/november/dmi-opgraderer-med-tre-nye-radarer-fra-finland/

Heer, J., & Bostock, M. (2010). A tour through the visualization zoo. Commun ACM.

Helles, R., & Hjarvard, S. (2014). Seertal og webtrafik. Samfundslitteratur.

Houghton, J. (2004). Global Warming - The Complete Briefing. Cambridge Univ. Press.

Howarth, E., & Hoffman, M. S. (1984). A multidimensional approach to the relationship between mood and weather. British Journal of Psychology, pp. 15-23.

Jantzen, C., & Vetner, M. (2008, December 2). Underholdning, emotioner og personlighed: Et mediepsykologisk perspektiv på underholdningspræferencer. Mediekultur, pp. 3-22.

Knack, S. (1994). Does Rain Help the Republicans? Theory and Evidence on Turnout. Public Choice, pp. 187-209.

Kulturstyrelsen. (2016). Rapportering om mediernes udvikling. Slots- og Kulturstyrelsen. Retrieved November 7, 2016, from http://slks.dk/mediernes-udvikling-2016/

Mernild, S. H. (2014). Greenland precipitation trends in a long-term instrumental climate context (1890-2012): evaluation of coastal and ice core records. Int. Journal of Climatology. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/joc.3986/abstract

Munzner, T. (2009). Visualization. AK Peters.

Provost, F., & Fawcett, T. (2013). Data Science for Business. O'Reilly Media, Inc.

Remar, D. (2013, March 15). Derfor taler danskerne så meget om vejret. Retrieved from http://www.kristeligt-dagblad.dk/liv-sj%C3%A6l/derfor-taler-danskerne-s%C3%A5-meget-om-vejret

Roe, K., & Vandebosch, H. (1996). Weather to view or not: That is the question. European Journal of Communication, pp. 201-216.

Stein, M. (2006, September). Hvornår og hvordan kan man bruge tv til markedsføring? Retrieved from ivaerksaetteren.dk: http://www.ivaerksaetteren.dk/flx/artikler/8/hvornaar-og-hvordan-kan-man-bruge-tv-til-markedsfoering-453/

The multiple regression model. (2013). In P. Newbold, W. L. Carlson, & B. M. Thorne, Statistics for Business and Economics (pp. 274-487). Pearson.

Tremblay, M. (2003, May 31). Wind Chill and Humidex. Retrieved from Natural Science and Mathematics: https://ptaff.ca/humidex/?lang=en_CA

Wunderground. (n.d.). Retrieved from Danmark: https://www.wunderground.com/q/zmw:00000.1.06186?sp=IFREDER143

Zillmann, D. (1988, January). Mood Management Through Communication Choices. pp. 327-340.


Predicting the daily sales of Mikkeller bars using Facebook data

Lisbeth la Cour, Dep. of Economics, CBS
Anders Milhøj, Dep. of Economics, KU
Ravi Vatrapu, Dep. of IT Management, CBS
Niels Buus Lassen, Dep. of IT Management, CBS

1. Introduction.

The present study is a continuation of the analysis presented in Buus Lassen et al. (2017) in that it still focuses on how to model and predict series of interest to the management of a private firm using social media data. In the present study we focus on only one such data source: Facebook (FB). As mentioned in the paper above: "The main advantage of using social media data as predictors lies in the speed with which such data can be extracted and employed in the forecasting process. Once a firm has learned how to collect and pre-process their social media data, the information is available almost in real time and this implies that such data in combination with a good predictive model will provide a very useful tool for the management of the firm."

The advantage of this year's study is that we now have access to daily observations of the sales in a range of Mikkeller bars, of which we have chosen to focus on the bar in Viktoriagade. Hence the Mikkeller microbrewery is still our case company. Compared to the monthly data of the paper mentioned above, we have an increased number of observations, and we also have the possibility to work in more detail on the lag structure of our models. We still have a high focus on the data preparatory work, and we also keep in mind that simple benchmark models that use cheap information are very relevant as competing model specifications.

2. Briefly on the existing literature.

The idea of using social media data as predictors for e.g. company sales is not new. When it comes to model building, various experiments have been conducted, and a summary of around 40 articles covering the time period 2005 - 2015 can be found in Buus Lassen et al (2017). For the present purpose the most interesting observations from these studies are that 1) almost 50% of the studies use some kind of regression model as their predictive model, and 2) the range of social data types studied seems to cover Facebook, Twitter, Google Trends, Instagram, Tumblr, blogs and Youtube.

Theoretically, the argument for considering social data activity as predictors for sales obtains support from e.g. the AIDA model mentioned in Buus Lassen et al (2014). AIDA means Awareness, Interest, Desire and Action and refers to stages in a sales


process. If social media data help increase the attention, or can be considered a proxy for attention, towards a product, then it may also affect the final decision about buying. It is the general perception that more attention will increase sales even if the attention is negative.

When it comes to the specification of a set of predictive models we follow the literature and limit ourselves to the class of dynamic regression models. In these models we will have sales as our dependent variable and the FB data as suggested regressors. Facebook data are polished, because people tend to display success and not failures on this social medium. This may imply that FB data has a disadvantage as regressors compared to other social data. Still, FB Likes and FB Posts may provide information that links to consumers' awareness and in the end their buying of the product, and they therefore deserve to be considered as predictors in models of company sales.

3. The data and methodology.

In order to build a predictive model for Mikkeller's sales we use data from Mikkeller's accounting system combined with Facebook data. In this analysis we have obtained daily sales data from a number of Mikkeller bars in the Copenhagen area: Viktoriagade, Stefansgade and Torvehallerne (the latter is also a Bottle Shop). The data from the bars are quite ideal for our purpose as they relate directly to consumption of the product and therefore simplify the way that we think about the lag patterns in the data. The time span of the study has been limited by our access to historical sales data and covers 2 January 2015 - 30 September 2017. In total we have 1003 observations. In order to perform an out-of-sample forecasting exercise we have held back 3 months of sales data as a test sample, while we select and estimate our model based on the remaining roughly 900 observations.

Prior to analysis we index the sales data such that the mean is set to 1234 and the standard deviation to 12. Such transformations do not affect the significance of our results later in the modelling process. The Facebook data comes from the overall HQ Mikkeller FB page

https://www.facebook.com/mikkeller/ https://www.facebook.com/events/

and from the FB pages of the chosen bars

https://www.facebook.com/mikkellerbarvik/, https://www.facebook.com/MikkellerandFriendsBottleShop/, https://www.facebook.com/mikkellerandfriends/.
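The indexing of the sales data described above is a linear rescaling, which can be sketched in a few lines of Python. The function name and the seven daily sales figures are hypothetical; only the target mean of 1234 and standard deviation of 12 come from the text.

```python
import math


def rescale(values, target_mean=1234.0, target_sd=12.0):
    """Linearly rescale values to a chosen mean and sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [target_mean + target_sd * (v - mean) / sd for v in values]


# Hypothetical daily sales figures, invented for illustration only.
sales = [120.0, 95.0, 310.0, 40.0, 280.0, 150.0, 205.0]
indexed = rescale(sales)
```

Because the transformation is linear, it changes the units of the intercept and slopes of a regression but leaves t ratios and p-values of the slopes unchanged, which is why the significance of the results is unaffected.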


Using the Sodato software developed by Ravi Vatrapu and his group, see Hussain & Vatrapu (2014), we collect information from the selected FB pages and we create variables for e.g. total likes of the posts on a specific date. As the data is very rich, for a shorter sample period we are also able to construct explanatory factors based on selected FB reactions, which are constructed to match major human emotions and in this way seem ideal when sales are in focus.

3.1 Pre-processing methodology

Our first considerations when it comes to data preparatory work concems whether to use simple transformations ofthe series or just the raw series themselves. As the values of sales are quite low on certain dates it does seem like a disadvantage rather than an advantage to use a log-transformation. Also no clear pattem of an increase in volatility over time is revealed from e.g. Figure JA and we decided to model the un­transformed series directly.

With respect to the sales data we check the stationarity properties of the time series by means of several graphs: sales against time and the ACF. We also perform ADF tests of the null of non-stationarity. Stationarity is preferable for a regression model, although it may be of minor importance when the purpose of the model is forecasting.

The social data may consist of different components that we would expect to have different predictive value. Prior to including our social data time series as explanatory factors in our regression models we have the possibility to split them into a trend component, a seasonal component and an irregular component using classical time series techniques for unobserved components models (ucm). For comparison we also estimate models that use the social data in their 'raw' form without the ucm pre-processing.
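The split into trend, seasonal and irregular components can be illustrated with a classical additive moving-average decomposition. This is a simplified stand-in and not the ucm estimated in the paper; the function name `decompose`, the weekly period of 7 and the toy series are assumptions made for the sketch.

```python
def decompose(series, period=7):
    """Additive classical decomposition: trend + seasonal + irregular."""
    n, half = len(series), period // 2
    # Centered moving average (period assumed odd); undefined at the edges.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period
    # Average detrended value for each position in the weekly cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    centre = sum(raw) / period
    # Seasonal effects are centered so that they sum to zero over a cycle.
    seasonal = [raw[t % period] - centre for t in range(n)]
    irregular = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
                 for t in range(n)]
    return trend, seasonal, irregular

# Hypothetical daily likes: constant level 100 plus a fixed weekly pattern.
pattern = [-3, -2, -1, 0, 1, 2, 3]
likes = [100 + pattern[t % 7] for t in range(35)]
trend, seasonal, irregular = decompose(likes)
```

Each of the three returned series could then be lagged and used as a separate regressor, which is the idea behind the pre-processing step.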

3.2 Unobserved Component Models

We use the same modeling strategy as in Buus Lassen et al (2017) and therefore start out by employing an unobserved components model (UCM). A UCM decomposes the observed series $y_t$ into a sum of several components, for instance

$y_t = \mu_t + \varepsilon_t$

Here the series $\mu_t$ is understood as the level of the series, but this level is unobserved. Only the series $y_t$, which is affected by some noise or irregularities, is observed. In technical applications this noise series, $\varepsilon_t$, could be measurement error.

This basic formulation could be extended by trends and seasonality, and various ways of introducing autocorrelation in the model formulation also exist. A trend component is insignificant for the sales series. A seasonal component for the day-of-the-week effect is defined so that it does not affect the level component:

$s_t = -(s_{t-1} + \dots + s_{t-6}) + \varsigma_t$

In total these ideas lead to the model:

$y_t = \mu_t + s_t + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \eta_t$

where first-lag and lag-7 autoregressive terms are also included, as this is a series of daily observations with a significant weekly pattern.

All remainder terms, $\varepsilon_t$, $\eta_t$ and $\varsigma_t$, are assumed to be mutually independent white noise series. Their variances could be estimated; the larger the variance of a component's remainder term, the more volatile the component. It is also possible to fix this variance at zero, which gives a constant component; e.g. a model with fixed seasonal dummies is obtained if $\mathrm{var}(\varsigma_t) = 0$.

The parameters of these models, the variances and the autoregressive parameters, could be estimated by the Kalman filter together with the component values. This gives an algorithm for successive calculation of the unobserved components at time $t$ conditioned on the observations $y_{t-i}$, $i = 0, \dots, t-1$. The Kalman filter is useful if prediction is the purpose of the analysis, as the algorithm does not include future observations $y_{t+i}$. A further smoothing estimation, in which all available information is used when estimating the unobserved components at any time $t$, also exists. In this paper this method will be used.
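The filtering recursion can be sketched for the simplest case, a level that follows a random walk observed with noise. This is a minimal illustration, not the estimation routine used in the paper: the variances are taken as known rather than estimated, seasonal and autoregressive terms are left out, only filtering (not smoothing) is shown, and the function name `local_level_filter` and all numbers are hypothetical.

```python
def local_level_filter(y, var_eps, var_eta, level0=0.0, p0=1e7):
    """Kalman filter for y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t.

    Returns the one-step-ahead predictions of y_t and the filtered levels.
    Missing observations (None) are skipped in the update step.
    """
    level, p = level0, p0          # state estimate and its variance
    predictions, filtered = [], []
    for obs in y:
        # Prediction step: the level is a random walk, so only p grows.
        p = p + var_eta
        predictions.append(level)
        if obs is not None:
            # Update step: weigh the prediction error by the Kalman gain.
            gain = p / (p + var_eps)
            level = level + gain * (obs - level)
            p = (1.0 - gain) * p
        filtered.append(level)
    return predictions, filtered

# Hypothetical indexed sales with one date set to missing.
sales = [1230.0, 1238.0, None, 1241.0, 1229.0, 1236.0]
pred, filt = local_level_filter(sales, var_eps=31.0, var_eta=0.27)
```

A large initial variance `p0` acts as an approximately diffuse prior, so the first observation largely determines the starting level. Treating an observation as missing, as in the third entry, simply carries the previous level forward.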

3.3 The regression models

In this study we use dynamic regression models. With daily data we have a rich seasonal structure, and even though our sample period covers less than three years we have enough observations to model the seasonality either by ucm (mentioned earlier) or by including deterministic dummies in the regression equations. Using lags of both the dependent variable and the independent variables is also possible, and we will do both.

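The primary model equations we use are of the type:

(1) $y_t = \beta_0 + \sum_{i} \gamma_i y_{t-i} + \sum_{j} \sum_{i} \delta_{j,i} x_{j,t-i} + \text{deterministic terms} + \varepsilon_t$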

where y is sales, the x's are FB measures and the subscripts, t - i, indicate that only lagged values of sales and FB data are used as predictors. This makes the model suitable for at least 1-step-ahead predictions out-of-sample. In practice we use both short lags and lags up to 8 to cover a same-day-of-the-week effect and also an interaction of short-run and day-of-week effects. The error term, $\varepsilon_t$, is assumed to fulfill the standard assumptions for OLS estimation.
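Building a data set in which only lagged values appear as predictors can be sketched as below. The helper `lagged_rows`, the variable names and the toy data are hypothetical; the sketch only shows how each target observation is paired with lagged sales and lagged FB measures, not how the models in the paper were estimated.

```python
def lagged_rows(y, x, lags):
    """Build (target, predictors) pairs using only lagged values.

    y    : list of sales values
    x    : dict mapping a regressor name to its list of values
    lags : lags to include, e.g. [1, 7, 8]
    """
    max_lag = max(lags)
    rows = []
    # The first max_lag observations are lost because their lags are unavailable.
    for t in range(max_lag, len(y)):
        predictors = {f"y_lag{i}": y[t - i] for i in lags}
        for name, values in x.items():
            for i in lags:
                predictors[f"{name}_lag{i}"] = values[t - i]
        rows.append((y[t], predictors))
    return rows

# Hypothetical series, for illustration only.
sales = [float(1230 + (t % 7)) for t in range(20)]
likes_hq = [float(150 + 3 * t) for t in range(20)]
rows = lagged_rows(sales, {"likes_hq": likes_hq}, lags=[1, 7, 8])
```

Since no contemporaneous regressor enters a row, every predictor is known the day before the target date, which is what makes 1-step-ahead forecasting feasible.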


It is difficult to judge the predictive performance of a specific forecasting model unless we have some benchmark to compare to. For sales of individual companies there is no general guideline in the literature on how to choose such a model, so we will argue for our choice in the following way: we want a benchmark model that is simple, that seems to capture some of the apparent time series properties in our data and that does not contain FB explanatory factors. We choose two benchmark models. The first includes only deterministic terms and a trend:

(2) $y_t = \beta_0 + \beta_1 D0101_t + \beta_2 D2412_t + \beta_3 D2512_t + \beta_4 D2612_t + \text{day-of-week dummies} + \text{monthly dummies} + \text{CBC dummies} + \text{trend} + \varepsilon_t, \quad t = 1, \dots, T$

The second includes, in addition to all the deterministic dummies and the trend, up to 8 lagged values of sales:

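(3) $y_t = \beta_0 + \gamma_1 y_{t-1} + \dots + \gamma_8 y_{t-8} + \beta_1 D0101_t + \beta_2 D2412_t + \beta_3 D2512_t + \beta_4 D2612_t + \text{day-of-week dummies} + \text{monthly dummies} + \text{CBC dummies} + \text{trend} + \varepsilon_t, \quad t = 1, \dots, T$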

Finally, as our model is a forecasting model, we need to split the sample into an estimation part and a part used to evaluate the out-of-sample forecasting properties of the model. For further discussion, see e.g. Hyndman & Athanasopoulos (2014). We retain the last 3 months of the sample for the test part, i.e. July - September 2017 (92 observations), and we provide 1-step-ahead predictions for this period. Hence we estimate the models using data from 1 January 2015 until 30 June 2017 (around 904 observations). When we use the FB reactions we stick to the same evaluation sample, but we have a shorter estimation sample as the FB reactions were introduced in the beginning of 2016. Evaluations will be based on graphs comparing actual sales to predicted sales as well as on numerical measures like RMSE and MAE.
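The sample split and the two accuracy measures can be written out in a few lines. The functions below use the standard definitions of MAE and RMSE; the helper `split_sample` and the numbers are hypothetical and serve only as an illustration.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def split_sample(series, n_test):
    """Keep the last n_test observations for out-of-sample evaluation."""
    return series[:-n_test], series[-n_test:]

# Hypothetical test-period sales and 1-step-ahead forecasts.
actual = [1236.0, 1241.0, 1229.0, 1250.0]
predicted = [1234.0, 1238.0, 1233.0, 1244.0]
estimation, test = split_sample(list(range(10)), 3)
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives some indication of whether a model's errors are dominated by a few large misses.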

4. Descriptive statistics.

We start by showing some graphs and descriptive statistics for the sales data. In Figure 1A we show the development over time in the standardized sales at Viktoriagade Bar over the sample period¹. The immediate impression is a series that does not show trending behavior. There are three cases of very large sales on certain spring weekends coinciding with the Copenhagen Beer Celebration. Some seasonal variation can also be seen. To illustrate the over-the-week pattern in the series we have constructed the special graph shown in Figure 1B. We selected (randomly) 6 consecutive weeks during the summer of 2015 (15 June - 26 July). Each curve in Figure 1B shows the sales for a week during this period. The figure displays a pattern of larger sales on Fridays and Saturdays and also, to some extent, that the level of the sales may depend on the week (maybe the weather, maybe vacation weeks). Taking this intra-week pattern into account will also be important for our modeling.

1 On 1 January each year the bar is closed and numbers for sales are missing. Instead of filling in zeros at this stage we simply do not show these dates in the graphs. When modeling we add dummies to capture these dates without any sales.

When taking a closer look at the time series properties of the series it seems that treating this series as stationary would be a good starting point. The ACF graph of the sales corrected for the missing sales of 1 January seems to support this conclusion, as the 1st order autocorrelation coefficient is 0.552. See also Figure 2A.

Figure 2A: ACF for Sales (1 January dummy). Figure 2B: ACF for Sales (more dummies). [Both panels: autocorrelation, -1.0 to 1.0, against lag, 0 to 70.]

In this graph we also see clear indications of an over-the-week pattern. In Figure 2B we have, in addition to correcting for 1 January, also regressed sales on dummies for each day of the week, each month and for dates around Xmas and the Copenhagen Beer Celebration event (the three large spikes in Figure 1A). This implies that the memory of the seasonal pattern becomes less pronounced. Inspired by the PACF of the extended model (available from the authors upon request), the model can be extended by 8 lagged values of sales, and after such an extension almost no autocorrelation is left.
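The sample autocorrelation function used in these graphs can be computed as sketched below. The function name `acf` and the toy series are hypothetical; the formula is the usual one, with the lag-k autocovariance divided by the lag-0 autocovariance.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(series)
    m = sum(series) / n
    dev = [v - m for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

# Hypothetical series with a fixed weekly pattern (peak late in the week).
weekly = [0, 0, 0, 0, 1, 3, 2] * 20
r = acf(weekly, 14)
```

For a series with a pronounced weekly pattern the autocorrelations at lags 7 and 14 stand out, which is the kind of over-the-week signature visible in the ACF before the day-of-week dummies are included.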

2 Also an ADF test of non-stationarity of the series supports a conclusion of stationarity. With an intercept, but without a trend, in the test equation we reject at the 1% level the null of a unit root, with p-values smaller than 0.0001 for both zero and 7 lagged differences in the equation.


To get a first impression of some of the data from Facebook, Figures 3A - 3D show the number of likes and the number of posts by the administrator for the HQ page and for the Viktoriagade page. While none of the series seem to follow the pattern of the sales series very closely, they seem to correlate pairwise (Viktoriagade - Viktoriagade and HQ - HQ). Also, the number of likes for HQ is, not surprisingly, in general larger than for Viktoriagade. Notice that the activity for Viktoriagade shows a decline in 2017 compared to the other years (Mikkeller has no specific explanation for that).

Figure 3A: FB Likes, Viktoriagade. Figure 3B: FB Likes, HQ. Figure 3C: Posts by Admin (VIK). Figure 3D: Posts by Admin, HQ.

In Figure 4 we look for correlations between sales and the selected FB variables. Correlations amongst the FB variables are also shown. It is not easy to get a clear idea of the relationships. There may be indications of weak positive correlations in most of the cases, but as we may want to use the FB variables from a range of previous days as regressors, the scatterplot matrix is not the best graph for that purpose. In Figure 5 we show selected cross-correlation graphs to get a better idea of a potential lag pattern.

Table 1 shows simple descriptive statistics for the variables we have been investigating so far. The mean and standard deviation for sales reflect our standardization. The three missing values are 1 January in each of the three years. Not surprisingly, we see both more posts and more reactions to the HQ FB activity.


Table 1: Descriptive summary statistics.

Variable                N      Mean      Std Dev   Minimum   Maximum   N Miss
Sales                   1001   1234.02   17.01     1209.1    1359.42   3
Posts by Admin (VIK)    1004   0.57      0.95      0         7         0
Posts by Admin (HQ)     1004   1.74      1.48      0         10        0
Likes by Viktoriagade   1004   6.71      15.70     0         137       0
Likes by HQ             1004   168.55    230.94    0         2626      0

Figure 4: Scatterplot Matrix for Viktoriagade Sales. [Variables: sales (Viktoriagade), Totallikes_vik, Totallikes_HQ, PostsbyA_vik, PostsbyA_HQ; points marked by year: 2015, 2016, 2017.]

Figure 5 presents the cross correlations between the indexed sales variable and each of the four social activity variables. The cross correlations are constructed in such a way that a positive number, s, on the horizontal axis implies a correlation between sales at time t and the social variable s periods prior to time t. Even though we did not pre-whiten the series before the cross correlations were calculated, we get an initial impression that lagged values of both likes and posts may contain explanatory power for the sales.
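The sign convention for the cross correlations can be made concrete with a small sketch. The function `cross_correlation` and the toy data are hypothetical; a positive `s` pairs sales at time t with the social variable at time t - s, as described above, and no pre-whitening is applied.

```python
from math import sqrt

def cross_correlation(sales, social, s):
    """Correlation between sales at time t and the social variable at t - s."""
    n = len(sales)
    ms, mx = sum(sales) / n, sum(social) / n
    ds = [v - ms for v in sales]
    dx = [v - mx for v in social]
    # Keep only the index pairs (t, t - s) that fall inside the sample.
    pairs = [(t, t - s) for t in range(n) if 0 <= t - s < n]
    cov = sum(ds[t] * dx[u] for t, u in pairs)
    return cov / sqrt(sum(d * d for d in ds) * sum(d * d for d in dx))

# Hypothetical data: sales repeat the social series with a delay of 2 days.
social = [(7 * t) % 11 for t in range(40)]
sales = [5, 5] + social[:-2]
cc = {s: cross_correlation(sales, social, s) for s in range(-3, 4)}
```

In this constructed example the largest cross correlation appears at s = 2, i.e. at the delay built into the data, which is the kind of lag pattern one would look for in Figure 5.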

Figure 5: Cross Correlation Analysis for residual_sales_vik with Two Standard Error Limits. [Panels by cross variable: Totallikes_vik, Totallikes_HQ, PostsbyA_vik, PostsbyA_HQ; CCF, -1.0 to 1.0, against lag, -20 to 20.]

5. Unobserved components models for the sales series

For the sales series the resulting model is:

The variance in the seasonal component is fixed to zero, meaning that the dummy variables are constant. It turns out to be inconvenient to model annual variation in the sales by cyclic components. Instead the level component is modeled with a positive component variance, $\mathrm{var}(\eta_t) > 0$. Figure 6 shows the resulting estimated level component.

The final version of this model is estimated without the dates of the Copenhagen Beer Celebration, some days around Christmas and 1 January, where sales for well-known reasons are extraordinary. In the set-up for UCM models this is done by simply setting the observations to "missing" instead of using dummy variables. Some further dates still give clear outliers in the fitted models, and for this reason five more observations were also set to missing. It was checked that the sales on these dates could not be explained by our Facebook data, so the outliers must be due to something else, perhaps some extraordinary event in the bar.

The precise choice of the number of outliers to leave out is of course subjective, but it has to be stressed that the validity of an out-of-sample forecasting exercise is independent of the number of observations left out from the estimation period.

Figure 6: Smoothed Level Component for sales_vik_f_m, with 95% confidence limits and the start of multi-step forecasts marked. [Vertical axis: smoothed level component, 1225 to 1245; horizontal axis: date, January 2015 to October 2017.]

The level varies around the average value 1234; remember that the data is standardized to this mean. The estimated component variance is $\mathrm{var}(\eta_t) = 0.28$. This level variance gives clearly visible changes in the level, but only within an interval of ± 10; remember that the series is standardized to standard deviation 12.


The level of the series is highest in the summer period, but the annual variation is far from regular, so this more flexible model for the annual variation could be superior to monthly dummy variables or cyclic components.

The final model also includes dummy variables for the weekly effect. We tried to model the seasonal component with a time-varying weekly pattern, but the hypothesis that the component variance is zero, $\mathrm{var}(\varsigma_t) = 0$, was accepted, although only borderline (p = 6.9%). The main feature of a potential time-varying weekly pattern is that the Friday effect is reduced from 24 to 19 in the scale used.

5.1.2 Out-of-sample predictive power?

This model without any exogenous variables could be applied for forecasting. Figure 7 shows the results. For the last quarter of the estimation period, April 1st 2017 to June 30th 2017, the plot gives one-step-ahead predictions with forecast limits alongside the actual observations. For the last quarter, July 1st 2017 to September 30th 2017, the predictions are made using only data and estimation results from before the end of the estimation period, June 30th, as indicated by the vertical reference line. For the 92 observations in this ex-ante forecasting period, July 1st 2017 to September 30th 2017, we find RMSE = 6.36 and MAE = 4.83.

Figure 7: Series, forecast, and lower and upper forecast confidence limits (band). [Horizontal axis: date, 1 April 2017 to 1 October 2017; vertical axis range 1220 to 1260.]

6. Results of predictive modeling at the daily frequency

We now consider various specifications for models that contain FB data and/or their lags as explanatory factors, as suggested by the main equation (1). As our main purpose is to determine a model that can produce out-of-sample 1-step-ahead forecasts, we do not use contemporaneous regressors in the models.

6.1.1 Estimation results, regression models

Estimation results for a selected range of regression models are shown in Table 2. To save space we only comment on the results for most of the dummies without showing the coefficients. For all Xmas dummies the coefficients are negative. For the day-of-week dummies the coefficients are always positive, indicating that Mondays in general have the lowest sales (the base category), while sales are highest on Fridays and Saturdays (not surprisingly). For the monthly dummies the base category is January, and the sales in all other months are significantly higher than in January, most so for May until September. During the Copenhagen Beer Celebration the sales are in general higher, and very much so closer to the weekend.

The basic message from Table 2 is that in-sample it is very hard to beat a model with just deterministic terms as explanatory factors, such as our Benchmark 1. Only in one version do we find significance of any of the FB variables, and those results are shown in the last column of Table 2. Here the likes of HQ at lag 1 and at lag 7 are significant, although with coefficients of a sign opposite to the expected one. To move from the full model to the model with just 2 likes variables included, we carried out a range of F tests for exclusion of likes and posts variables.

6.1.2 Out-of-sample predictive power?

We predict the standardized sales for the time period July 2017 to September 2017. First we show graphs, Figure 8, that compare these predictions to the actual values. We show graphs for benchmark model 1 and for the model in the last column of Table 2.

From these graphs it is evident that most of the movements in sales are captured by the benchmark model. The confidence bands for the predictions are quite wide, however, indicating fairly high uncertainty for the forecasts. Most of the actual values are inside the bounds, except for 2 incidents in mid-July and mid-September. The picture for the model with lag 1 and lag 7 of likes of HQ is very similar.


Table 2: Regression results for Log Sales - no ucm.

Columns: (1) Benchmark1, det. terms; (2) Benchmark2, AR(8) and det.; (3) Only lagged Sales, AR(8); (4) Full Model, Equation (1); (5) Model with all det. and sign. likes.

Variables               (1)           (2)             (3)             (4)             (5)
Intercept               Significant   Significant     Significant     Significant     Significant
Xmas D's                Significant   Significant     -               Significant     Significant
Week day D              Significant   Significant     -               Significant     Significant
Monthly D's             Significant   Significant     -               Significant     Significant
CBC D's                 Sig. (a)      Sig. (a)        -               Sig. (a)        Sig. (a)
Trend                   -0.004***     -0.004***       -               -0.005***       -0.003***
                        (0.001)       (0.001)                         (0.001)         (0.001)
Sales, lag 1            -             -0.001          0.041***        -0.000          -
                                      (0.004)         (0.009)         (0.004)
Sales, lag 6            -             0.006           0.027***        0.007*          -
                                      (0.004)         (0.009)         (0.004)
Sales, lag 7            -             0.005           0.047***        0.007*          -
                                      (0.004)         (0.009)         (0.004)
Sales, lag 8            -             0.004           0.014*          0.005           -
                                      (0.003)         (0.008)         (0.003)
Sales, lags 2-5         -             Insignificant   Insignificant   Insignificant   -
Lags 1-8, VIK Likes     -             -               -               Insignificant   -
Lags 1-8, VIK Posts     -             -               -               Insignificant   -
Lag 1, HQ Likes         -             -               -               -0.003**        -0.003***
                                                                      (0.001)         (0.001)
Lag 7, HQ Likes         -             -               -               -0.003**        -0.003***
                                                                      (0.001)         (0.001)
Lags 2-6, 8, HQ Likes   -             -               -               Insignificant   -
Lags 1-8, HQ Posts      -             -               -               Insignificant   -
Adj. R square           0.987         0.922           0.987           0.988           0.988
# observations          904           904             904             904             904

(a) Significant except for Monday and Tuesday in 16 and 17.

Note: the estimation sample has been restricted such that it is the same for all specifications even though models with fewer lags could have used more observations.
Note2: Standard errors in parentheses. Significance at 10%: *, 5%: **, 1%: ***.
Note3: Dummy for 1 January included and significant in all models.

Figure 8: Predicted sales with lower and upper bounds of the 95% C.I. (individual prediction); two panels. [Horizontal axis: date, June to September 2017.]

In Table 3 we show some numerical measures of the forecasting performance of the models from Table 2. We have chosen to focus on just a few of the more commonly used measures: MAE (mean absolute error) and RMSE (root mean squared error)³.

Table 3: Summary measures on predictive power.

Summary measure   Benchmark1     Benchmark2       Only lagged   Full Model   Model with all det.
                  (det. terms)   AR(8) and det.   Sales AR(8)   (1)          and sign. likes
MAE               5.06           5.08             8.57          5.22         5.13
RMSE              6.73           6.74             10.34         6.85         6.79

Note: In all cases the numbers have been calculated based on the 3 months of July, August and September 2017.

The numbers in Table 3 also indicate that benchmark model 1 performs best, both when evaluated on MAE and on RMSE. However, only the model with just the lagged sales variables (the middle column) shows markedly higher statistics; the other numbers are very close. We have not performed a formal test of equality.

Notice that our forecasting period does not contain the week of CBC. As it stands, our models are not well suited to forecasting a time period that contains this week, as we would then have to come up with predictions for the excess sales of that week (in e.g. 2018). In future work we could restrict the coefficients of the CBC dummies to always be the same and in this way handle that problem.

3 For formulas on how to calculate these measures, please consult e.g. Hyndman & Athanasopoulos (2014)


6.2 Facebook data used in the Unobserved Components Model

6.2.1 Estimation results using regression methods

The approach this year differs from the approach of Buus Lassen et al (2017), where the input variables were applied to the irregular series as extracted by the UCM.

First, only lagged values of the FB data were used to predict sales. All four series of FB observations, Posts by Admin (Viktoriagade), Posts by Admin (HQ), Likes by Viktoriagade and Likes by HQ, were used with lags from 1 to 8. This total of 32 exogenous variables was used as ordinary input variables to the UCM found in Section 5. All regression coefficients and all parameters and component values in the full model were estimated simultaneously.

As in the OLS models in section 6.1, most of the input variables are insignificant. Table 4 gives two significant regression coefficients along with the parameters of the UCM. The first significant coefficient is for lag 1 of the number of total likes for HQ; the coefficient, however, has a negative sign, which is contrary to our intuition. The second coefficient is for lag 1 of the number of posts from the specific bar in question, Viktoriagade. The coefficient says that for each post from the Viktoriagade bar, the sales the next day at the Viktoriagade bar increase by 0.47 in our scaled sales. This is a result with potential for active marketing. The number 0.47 is, however, small when compared to the average daily sales, which was set to 1234.

Table 4: Final Estimates of the Free Parameters

Component        Parameter        Estimate   Approx Std Error   t Value   Approx Pr > |t|
Irregular        Error Variance   31.19381   1.57766            19.77     <.0001
Irregular        AR_1             0.28156    0.03712            7.59      <.0001
Irregular        SAR_1            0.13395    0.03679            3.64      0.0003
Level            Error Variance   0.26927    0.10766            2.50      0.0124
ltotallikes_hq   Coefficient      -0.00189   0.0008336          -2.27     0.0235
lpostsbya_vik    Coefficient      0.47150    0.20144            2.34      0.0192

This model could be applied in the out-of-sample forecasting exercise. This gives MAE = 4.84 and RMSE = 6.34. These values are very close to the values obtained without using the FB data.

When unlagged observations of the four input series are also used, one more input variable shows significance; see Table 5. The unlagged total likes for the Viktoriagade bar has the coefficient 0.041, meaning that each like corresponds to an increase in sales of 0.041 beers in our scale. However, this is probably a reverse causal effect, as high sales one evening probably lead to more instantaneous likes.

Table 5: Final Estimates of the Free Parameters

Component        Parameter        Estimate   Approx Std Error   t Value   Approx Pr > |t|
Irregular        Error Variance   30.85263   1.56438            19.72     <.0001
Irregular        AR_1             0.29117    0.03736            7.79      <.0001
Irregular        SAR_1            0.13139    0.03693            3.56      0.0004
Level            Error Variance   0.28051    0.11220            2.50      0.0124
TotalLikes_vik   Coefficient      0.04135    0.01337            3.09      0.0020
ltotallikes_hq   Coefficient      -0.00177   0.0008287          -2.13     0.0328
lpostsbya_vik    Coefficient      0.60043    0.20406            2.94      0.0033

Figure 9: Series, forecast, and lower and upper forecast confidence limits (band). [Horizontal axis: date, 1 April 2017 to 1 October 2017; vertical axis range 1200 to 1260.]

When this model is used in the out-of-sample exercise we find MAE = 4.83 and RMSE = 6.32. Again these values are very close to the values obtained without using the FB data and with the model using only lagged input variables.

7. Summary and conclusion

In this paper we have pursued our idea of applying a preparatory ucm model to both regressors and regressand to determine a forecasting model for the daily sales of the Danish microbrewery Mikkeller. We also tried a more traditional strategy for building a predictive model, with lagged sales to model the autocorrelation in the sales series and a suite of dummy variables for deterministic outside factors, the effect of Xmas being one example.

Our modeling attempts were mainly unsuccessful, as neither of the two approaches led to any significant regression model when Facebook activity was included as input variables.

8. References

Buus Lassen, N., la Cour, L., Milhøj, A., Vatrapu, R. (2017), 'Social media data as predictors of Mikkeller sales?', in P. Linde (Ed.) Symposium i Anvendt Statistik, pp. 71-86.

Buus Lassen, N., la Cour, L., Vatrapu, R. (2017), 'Predictive Analytics with Social Media data', in Sloan & Quan-Haase (eds.) The SAGE Handbook of Social Media Research Methods, Chapter 20, pp. 328-341.

Buus Lassen, N., Madsen, R. and Vatrapu, R. (2014), 'Predicting iPhone Sales from iPhone Tweets', Conference Paper, 2014 IEEE International Enterprise Distributed Object Computing Conference.

Buus Lassen, N., Vatrapu, R., la Cour, L., Madsen, R. and Hussain, A. (2016), 'Towards a Theory of Social Data: Predictive Analytics in the Era of Big Social Data', in P. Linde (Ed.) Symposium i Anvendt Statistik, pp. 241-256.

Doornik & Hendry (2014), 'Statistical Model Selection with 'Big Data'', Department of Economics Discussion Paper Series, University of Oxford, #735.

Hussain, A., Vatrapu, R. (2014), 'Social Data Analytics Tool (SODATO)', in Tremblay M.C., VanderMeer D., Rothenberger M., Gupta A., Yoon V. (eds.) Advancing the Impact of Design Science: Moving from Theory to Practice. DESRIST 2014. Lecture Notes in Computer Science, vol 8463. Springer, Cham.

Hyndman, R. J., & Athanasopoulos, G. (2014), Forecasting: principles and practice. OTexts: https://www.otexts.org/fpp/


Can We Learn to Live Longer? - A Spatial Learning Model of Life Expectancy.

Axel Börsch-Supan1 & Jørgen T. Lauridsen2

1 MEA - Munich Center for the Economics of Aging

2 COHERE - Centre of Health Economics Research, Department of Business and Economics, University of Southern Denmark, E-mail [email protected]

Abstract

Learning in health is of interest for medical disciplines, healthcare providers, and policy makers. The purpose of the present study is to examine whether a longer lifetime is something that can be learned. Operationally, we specify a spatial survival model with life expectancy (observed for 94 Danish municipalities) as dependent variable and the spatial lag of life expectancy as an explanatory variable representing the source of learning. Selected determinants known to affect life expectancy, including socioeconomic status, health behavior, health conditions and healthcare utilization, are applied. We hypothesize that there is a positive endogenous spatial spillover among the municipalities, in the sense that life expectancy in a municipality is partly determined by life expectancy in the neighboring municipalities. It is found that some of the determinants affect life expectancy, and that a positive endogenous spatial spillover is present, thus representing a potential spatial learning effect.

JEL Classifications: I12, I14, I18, J11, C13, C23.

Key Words: Spatial learning; endogenous spatial spillover; life expectancy; duration model.

1. Introduction

Given that European countries are ageing societies, the driving forces behind ageing and extended life expectancy have been extensively investigated. The primary focus of the present study is whether a longer lifetime is something that can be learned. Learning in health has been considered in empirical and economic research including economies of scale, human capital depreciation, reverse causality, level of specialization, and social learning or "learning by watching" (Gaynor et al. 2005; Ho 2002; Hockenberry and Helmchen 2014; Huesch 2009; Lee et al. 2015; Mesman et al. 2015). A recent study distinguished between three types of learning: economies of scale, learning from cumulative experience, and human capital depreciation (Van Gestel et al. 2017).

Several studies have investigated the effect of socioeconomic status, such as income, education and labor market status, on health outcomes (Cantarero and Pascual 2005; Bayati, Akbarian, and Kavosi 2013; Varvarigos 2013; Sede and Ohemeng 2015), and a recent study focused specifically on the effect of income on life expectancy (Blazquez-Fernandez et al. 2017).


Also, lifestyle health behaviors matter for health outcomes (Li et al. 2014; Anstey et al. 2014). Such lifestyle factors include food habits (Manuel et al. 2016), smoking (Preston et al. 2012), alcohol consumption (Westman et al. 2015), physical exercise (O'Keefe et al. 2012), and obesity (Preston and Stokes 2011).

As an in-between of socioeconomic status and lifestyle, loneliness is a factor known to affect health and life expectancy. In particular, losing a partner due to widowhood or divorce is known to be serious (Reynolds et al. 2008).

Further, health conditions are important (Lubitz et al. 2003; McGinnis 2016).

Finally, utilization of healthcare impacts on life expectancy (Blakemore 2015; Peters et al. 2015; Blazquez-Fernandez et al. 2017).

2. Methods

The model to be used in the present study is a parametric duration model reading as

$\log(y) = X\beta + \lambda W y + \sigma\varepsilon$

where y is an n by 1 vector containing life expectancy for n = 94 Danish municipalities, X is an n by K + 1 matrix containing K covariates and a constant term for the n municipalities, and W is an n by n contiguity matrix in which the element in row i, column j is set to 1 if the two municipalities are neighbors (i.e. have a physical borderline in common) and 0 otherwise, and is subsequently divided by the row sum. Thus, Wy is an n by 1 vector holding the spatial lag of y, i.e. the average life expectancy in the neighboring municipalities. This implies that the parameter $\lambda$ measures the impact of life expectancy in the neighbor municipalities on the life expectancy in the municipality considered. Finally, $\beta$ is a vector of K + 1 regression coefficients, including the constant term, while $\sigma$ is a scale parameter and $\varepsilon$ an n by 1 vector of stochastic residuals.
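The construction of the row-standardized contiguity matrix and the spatial lag can be sketched as follows. The map of four municipalities lying on a line, the life expectancy values and the function names `row_standardise` and `spatial_lag` are hypothetical and serve only to illustrate the definition of W and Wy given above.

```python
def row_standardise(neighbours, n):
    """Build an n x n contiguity matrix W whose rows sum to one."""
    w = [[0.0] * n for _ in range(n)]
    for i, js in neighbours.items():
        for j in js:
            # 1 for each neighbor, divided by the number of neighbors.
            w[i][j] = 1.0 / len(js)
    return w

def spatial_lag(w, y):
    """Return Wy: the average of y over each unit's neighbors."""
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]

# Hypothetical map: four municipalities on a line, 0 - 1 - 2 - 3.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
life_exp = [79.0, 80.0, 81.0, 78.0]   # hypothetical values
w = row_standardise(neighbours, 4)
wy = spatial_lag(w, life_exp)
```

For municipality 1, for example, the spatial lag is the average of its two neighbors, (79.0 + 81.0) / 2 = 80.0, and its own value does not enter.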

All calculations are performed with PROC LIFEREG in SAS version 9.4, relying on the standard definitions, i.e. the scale parameter $\sigma$ is set to 1 and the stochastic residual $\varepsilon$ is assumed to follow a Weibull distribution. For details and alternative choices of scale and distribution, refer to the SAS 9.4 documentation or to introductory texts on duration analysis such as Kalbfleisch and Prentice (2002) or Lancaster (1990). See also the discussion section below.

Intuitively, one would just estimate the duration model using $X$ and $Wy$ as explanatory variables. However, because $Wy$ is endogenous, this strategy would lead to endogeneity bias. Instead, a two-stage instrumenting procedure is applied. In step 1, the spatial lag $Wy$ is regressed on $X$ and its spatial lags $WX$ using the duration model, and the predicted spatial lag $\widehat{Wy}$ is obtained. In step 2, the duration model is estimated by regressing $y$ on $\widehat{Wy}$ and $X$. To our knowledge, this quite simple approach to modeling spatial learning in a duration model has not been considered in the literature.
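The two steps can be written out as follows. This is a sketch under assumptions: the text does not give the exact functional form of step 1, so the logarithmic form and the symbols $\gamma$, $\delta$, $\sigma_1$ and $\varepsilon_1$ are introduced here for illustration only.

```latex
% Step 1 (assumed form): instrument the spatial lag with X and WX
\log(Wy) = X\gamma + WX\delta + \sigma_1\varepsilon_1
\quad\Longrightarrow\quad \widehat{Wy}

% Step 2: replace the endogenous spatial lag by its prediction
\log(y) = X\beta + \lambda\,\widehat{Wy} + \sigma\varepsilon
```

The point of the construction is that $\widehat{Wy}$ depends only on $X$ and $WX$, so it is not mechanically correlated with the residual of the second-step equation.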

3. Data

Data were collected from different sources with the aim of covering the determinants specified above. These sources included The Statistical Bank (Statistics Denmark, 2017), The Key Figure Database (Ministry of the Interior, 2017), and The National Health Profile (2017). Life expectancy was available as an average for 2010-14, and all other data were for 2010, which represents the last wave of The National Health Profile. All data were collected for 94 of the 98 Danish municipalities, as life expectancy was not provided for four small island municipalities (Ærø, Fanø, Samsø and Læsø). Table 1 provides an overview of the data.

Table 1: Variables included in the study

| Variable | Definition | Source | Mean | Std. |
|---|---|---|---|---|
| Life expectancy | Life expectancy (2010-14) | STB | 79.93 | 0.99 |
| SES factors: | | | | |
| Income | Tax deductible income per inhabitant (100,000 DKK) | KFD | 1.57 | 0.28 |
| Education | % with higher education | KFD | 23.17 | 5.20 |
| No education | % without vocational education | KFD | 23.45 | 8.40 |
| Unemployed | % full-time unemployed | KFD | 4.64 | 0.94 |
| Social Benefit | % receiving social benefit | KFD | 3.50 | 1.05 |
| Health Retired | % health retired | KFD | 7.17 | 2.17 |
| Health behavior: | | | | |
| Unhealthy Food | % with self-assessed unhealthy food habits | NHP | 4.85 | 1.09 |
| Smoking | % heavy smokers (> 15 cigarettes per day) | NHP | 11.24 | 2.23 |
| Alcohol | % drinking too much (21 glasses for males, 14 for females) | NHP | 10.19 | 2.02 |
| Physical Inactive | % reporting being physically inactive in leisure time | NHP | 27.01 | 3.00 |
| Obese | % obese (BMI ≥ 30) | NHP | 14.51 | 2.81 |
| Lack of relations: | | | | |
| Unwanted Alone | % frequently being unwanted alone | NHP | 5.35 | 1.00 |
| Health conditions: | | | | |
| SAH | % reporting good or very good health | NHP | 84.59 | 2.54 |
| Longstanding Illness | % having longstanding illness | NHP | 34.25 | 2.40 |
| Healthcare utilization: | | | | |
| Visits to GP | % having seen GP within last three months | NHP | 78.05 | 2.37 |

Note. NHP: The National Health Profile; STB: The Statistical Bank; KFD: The Key Figure Database

Figure 1 (see Appendix) shows life expectancy by municipality. There is a tendency toward spatial clustering of similar levels, which supports the hypothesis of a spatial learning effect.

4. Results

Table 2 shows the non-spatial and spatial models for life expectancy. Given the logarithmic form of the duration model, the coefficients should be interpreted as log-level effects. Thus, the statistically significant spatial spillover of 0.0031 implies that a life expectancy increase of 1 year in the neighboring municipalities is associated with a life expectancy increase of 0.31 percent in the municipality considered. Thus, spatial learning seems to be present in life expectancy.

Turning to the coefficients of the explanatory variables, not much support is given to the hypothesis of a connection between life expectancy and socioeconomic status as measured by income, education and unemployment. However, the proportion of social benefit receivers matters significantly; a one-unit increase in the percentage of receivers is associated with a reduction of close to 0.5 percent in life expectancy. Also, the percentage of health retired matters, as a one-unit increase in this percentage is associated with a 0.1 percent reduction in life expectancy.
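The percentage reading of the coefficients can be checked with a short calculation. The sketch below converts a coefficient from a model with log(y) on the left-hand side into an exact percentage change; the function name is hypothetical, and the two coefficients are taken from the spatial model in Table 2.

```python
import math


def percent_effect(coefficient, delta=1.0):
    """Exact percent change in y when a covariate rises by `delta`
    in a model of the form log(y) = ... + coefficient * x."""
    return (math.exp(coefficient * delta) - 1.0) * 100.0


spillover = 0.0031        # W*life exp. (pred), spatial model, Table 2
social_benefit = -0.0043  # Social Benefit, spatial model, Table 2

print(round(percent_effect(spillover), 3))
print(round(percent_effect(social_benefit), 3))
```

For coefficients this small the exact value and the usual approximation of 100 times the coefficient agree to about two decimals, which is why the text can read 0.0031 directly as 0.31 percent.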

Table 2. Duration models of life expectancy

| Variable | Non-spatial: Coefficient | Std. Error | Signif. | Spatial: Coefficient | Std. Error | Signif. |
|---|---|---|---|---|---|---|
| Intercept | 4.3921 | 0.0667 | *** | 4.1613 | 0.0971 | *** |
| W*life exp. (pred) | - | - | | 0.0031 | 0.0009 | *** |
| Income | -0.0055 | 0.0047 | | -0.0027 | 0.0048 | |
| Education | -0.0001 | 0.0005 | | -0.0004 | 0.0005 | |
| No education | 0.0005 | 0.0003 | * | 0.0003 | 0.0003 | |
| Unemployed | -0.0006 | 0.0009 | | 0.0000 | 0.0009 | |
| Social Benefit | -0.0046 | 0.0010 | *** | -0.0043 | 0.0009 | *** |
| Health Retired | -0.0014 | 0.0007 | ** | -0.0012 | 0.0006 | * |
| Unhealthy Food | -0.0014 | 0.0006 | ** | -0.0017 | 0.0006 | *** |
| Smoking | -0.0001 | 0.0005 | | -0.0003 | 0.0005 | |
| Alcohol | -0.0011 | 0.0004 | *** | -0.0008 | 0.0004 | * |
| Physical Inactive | -0.0010 | 0.0004 | *** | -0.0010 | 0.0004 | *** |
| Obese | -0.0009 | 0.0004 | ** | -0.0006 | 0.0004 | |
| Unwanted Alone | 0.0001 | 0.0007 | | 0.0006 | 0.0007 | |
| SAH | 0.0008 | 0.0005 | * | 0.0006 | 0.0005 | |
| Longst. Illness | 0.0008 | 0.0003 | ** | 0.0005 | 0.0003 | |
| Visits to GP | -0.0002 | 0.0003 | | -0.0001 | 0.0003 | |

Note. Significance indicated by *** (1%), ** (5%) and * (10%)

Considering lifestyle, unhealthy food habits, alcohol consumption and physical inactivity are negatively related to life expectancy. Obesity seems to be negatively related also, although only significantly so in the non-spatial model. Surprisingly, however, the percentage of heavy smokers is not significantly related to life expectancy.

Next, the effect of loneliness turns out to be insignificant for the spatial as well as the non-spatial specification.

Regarding health, there seems to be a positive connection between the proportion in good health and life expectancy, while a somewhat unexpected positive relation is found between the proportion with longstanding illness and life expectancy.

Finally, healthcare utilization as measured by GP visits seems unrelated to life expectancy.

5. Discussion

Regarding methodology, the study applied a parametric duration model. While this choice facilitates the inclusion of covariates, including the spatial learning effect, the parametric specification relies heavily on assumptions regarding the statistical distribution of durations. Alternatively, one may turn to semi-parametric or nonparametric specifications, which do not rely on such distributional assumptions (Kalbfleisch and Prentice 2002; Lancaster 1990).

Also, within the chosen framework, several choices can be discussed, including the choice of a unit scale parameter and a Weibull distribution of the durations. Different scales and distributions may be considered (Kalbfleisch and Prentice 2002; Lancaster 1990). Finally, one may also consider specifications of hazard functions, where the risk of dying after reaching a certain age is modeled rather than the duration itself (Kalbfleisch and Prentice 2002; Lancaster 1990).

The way of modeling spatial effects can also be discussed. For the present study, learning effects were considered, thus motivating the inclusion of an endogenous spatial spillover. It may also be possible to include a spatial spillover process in the residuals. For an overview, see Cizek et al. (2017), who also discuss how to handle censoring (which is not relevant, however, for the present work) and ways of formulating spatial effects in a hazard function. Finally, one may also consider the inclusion of spatially lagged explanatory variables. However, none of these alternatives has a readily intuitive interpretation for the present study.

Turning to data and results, some limitations should be pointed out. The data used are of an aggregate nature, thus replacing individuals with municipality averages. This may give rise to causality problems, as individual effects are not necessarily the same as aggregate effects. Thus, the reported positive effect of the proportion with longstanding illness on life expectancy may merely reflect reverse causality, namely that a high life expectancy increases the chance of living long enough to develop a longstanding illness.

Substituting individuals with aggregates may also cause loss of statistical variation. This may potentially explain why no relationship was found between percentage of heavy smokers and life expectancy.

Also, restrictions on the selected data may be important. Thus, utilization of healthcare was only measured by GP visits, which turned out to be insignificant. A more detailed specification of healthcare utilization may potentially lead to significant results.

6. Conclusion

The present study adds to existing knowledge by estimating a positive spatial learning effect in life expectancy. Thus, if life expectancy is high in the neighboring municipalities, it also tends to be high in the municipality considered.

Furthermore, selected explanatory variables were found to affect life expectancy. These included some socioeconomic measures (percentages of social benefit receivers and health retired), while others were not significant (income, education and unemployment). Regarding lifestyle, no relationship was found between life expectancy and smoking behavior, while significant negative relationships were found with unhealthy food habits, physical inactivity, alcohol consumption and obesity. Being unwanted alone was not related to life expectancy. Self-assessed good health and longstanding illness were positively related to life expectancy, although significantly so only in the non-spatial specification. Finally, healthcare utilization as measured by GP visits was not significantly related to life expectancy.

Future work may elaborate on the present study in different directions. One is the inclusion of more detailed data, in particular on healthcare utilization. Another is of a methodological nature and involves considering spatial hazard functions, residual spillover processes, semi- and nonparametric approaches, and different parametric specifications.

References

Anstey KJ, Kingston A, Kiely KM, Luszcz MA, Mitchell P, Jagger C. 2014. The influence of smoking, sedentary lifestyle and obesity on cognitive impairment-free life expectancy. International Journal of Epidemiology 43: 1874-1883.

Bayati M, Akbarian R, Kavosi Z. 2013. Determinants of Life Expectancy in Eastern Mediterranean Region: A Health Production Function. International Journal of Health Policy and Management 1: 57-61.

Blakemore S. 2015. Healthcare strategy tackles clients' lower life expectancy. Learning Disability Practice 16: 8-9.

Blazquez-Fernandez C, Cantarero-Prieto D, Pascual-Saez M. 2017. Health expenditure and socio-economic determinants of life expectancy in the OECD Asia/Pacific area countries. Applied Economics Letters 24: 167-169.

Cantarero D, Pascual M. 2005. Socio-Economic Status and Health: Evidence from the ECHP. Economics Bulletin 9: 1-17.

Cizek P, Lei J, Ligthart JE. 2017. Do Neighbours Influence Value-Added-Tax Introduction? A Spatial Duration Analysis. Oxford Bulletin of Economics and Statistics 79: 25-54.

Gaynor M, Seider H, Vogt WB. 2005. The volume-outcome effect, scale economies, and learning-by-doing. American Economic Review 95: 243-247.

Ho V. 2002. Learning and the evolution of medical technologies: The diffusion of coronary angioplasty. Journal of Health Economics 21: 873-885.

Hockenberry JM, Helmchen LA. 2014. The nature of surgeon human capital depreciation. Journal of Health Economics 37: 70-80.

Huesch MD. 2009. Learning by doing, scale effects, or neither? Cardiac surgeons after residency. Health Services Research 44: 1960-1982.

Kalbfleisch JD, Prentice RL. 2002. The statistical analysis of failure time data (Vol. 2.). Hoboken, NJ: Wiley-Interscience.

Lancaster T. 1990. The Econometric Analysis of Transition Data. New York, NY: Cambridge University Press.

Lee KCL, Sethuraman K, Yong J. 2015. On the hospital volume and outcome relationship: Does specialization matter more than volume? Health Services Research 50: 2019-2036.

Li K, Hüsing A, Kaaks R. 2014. Lifestyle risk factors and residual life expectancy at age 40: a German cohort study. BMC Medicine 12: 59 (10 pages).

Lubitz J, Cai L, Kramarow E, Lentzner H. 2003. Health, Life Expectancy, and Health Care Spending among the Elderly. The New England Journal of Medicine 349: 1048-1055.

Manuel DG, Perez R, Sanmartin C, Taljaard M, Hennessy D, Wilson K, Tanuseputro P, Manson H, Bennett C, Tuna M, Fisher S, Rosella LC. 2016. Measuring Burden of Unhealthy Behaviours Using a Multivariable Predictive Approach: Life Expectancy Lost in Canada Attributable to Smoking, Alcohol, Physical Inactivity, and Diet. PLoS Medicine 13 (27 pages).

McGinnis JM. 2016. Income, Life Expectancy, and Community Health. Underscoring the Opportunity. Journal of the American Medical Association 315: 1709-1710.

Mesman R, Westert GP, Berden BJMM, Faber MJ. 2015. Why do high-volume hospitals achieve better outcomes? A systematic review about intermediate factors in volume-outcome relationships. Health Policy 119: 1055-1067.

Ministry of the Interior. 2017. The Key Figure Database, http://www.noegletal.dk.

National Health Profile. 2017. Danskernes Sundhed 2013 (Health of the Danes 2013), http://danskernessundhed.dk.

O'Keefe JH, Patil HR, Lavie CJ. 2012. Exercise and life expectancy. The Lancet 379: 799.

Peters F, Nusselder WJ, Reibling N, Wegner-Siegmundt C, Mackenbach JP. 2015. Quantifying the contribution of changes in healthcare expenditures and smoking to the reversal of the trend in life expectancy in the Netherlands. BMC Public Health 15: 1024 (9 pages).

Preston SH, Stokes A. 2011. Contribution of Obesity to International Differences in Life Expectancy. American Journal of Public Health 101: 2137-2143.

Preston SH, Stokes A, Mehta NK, Cao B. 2012. Projecting the Effect of Changes in Smoking and Obesity on Future Life Expectancy in the United States. NBER working paper series no. w18407, Cambridge, Mass. National Bureau of Economic Research.

Reynolds SL, Haley W, Kozlenko N. 2008. The Impact of Depressive Symptoms and Chronic Diseases on Active Life Expectancy in Older Americans. The American Journal of Geriatric Psychiatry 16: 425-432.

Sede PI, Ohemeng W. 2015. Socio-Economic Determinants of Life Expectancy in Nigeria (1980-2011). Health Economics Review 5: 1-11.

Statistics Denmark. 2017. The Statistical Bank, www.statistikbanken.dk.

Van Gestel R, Müller T, Bosmans J. 2017. Does my high blood pressure improve your survival? Overall and subgroup learning curves in health. Health Economics 26: 1094-1109.

Varvarigos D. 2013. Environmental Dynamics and the Links between Growth, Volatility and Mortality. Bulletin of Economic Research 65: 314-331.

Westman J, Wahlbeck K, Laursen TM, Gissler M, Nordentoft M, Hallgren J, Arffman M, Ösby U. 2015. Mortality and life expectancy of people with alcohol use disorder in Denmark, Finland and Sweden. Acta Psychiatrica Scandinavica 131: 297-306.

Appendix.

Figure 1. Life expectancy 2010-14

Women's crisis centres (Kvindekrisecentre)

Anne Vibeke Jacobsen, Statistics Denmark

Introduction

There are approximately 50 women's crisis centres in Denmark.

Women admitted to the crisis centres are covered by section 109 of the Danish Social Services Act (Serviceloven § 109). The act states that the municipal council must offer temporary accommodation to women who have been exposed to violence or threats of violence. The women may bring their children, and they receive care and support during the stay.

Earlier figures from the National Board of Social Services (Socialstyrelsen) show that roughly 1,800 women and roughly 1,800 children stay at a women's crisis centre each year. The number of stays is currently about 2,000 per year.

In 2016 the Ministry for Children and Social Affairs (Børne- og Socialministeriet) commissioned Statistics Denmark (Danmarks Statistik) to collect information from all women's crisis centres in the country for a new set of statistics. The aim is to gain more knowledge about the extent of violence in close relationships, so that interventions for women exposed to violence can be planned more easily.

Statistics Denmark therefore began collecting data from the women's crisis centres on 1 January 2017. The data will be published for the first time in spring 2018, covering the reference year 2017. We can therefore only present results from the survey at next year's symposium.

The data collected are the CPR number (civil registration number) of the woman and of any accompanying children, together with the admission and discharge dates of each stay.

This paper describes the method used to collect the data and some of the options we have for enriching the collected information with register data from Statistics Denmark.

The data are collected under section 6 of the Act on Statistics Denmark (Lov om Danmarks Statistik § 6).

Data collection

For each woman and any accompanying children, the individual crisis centre must report the CPR number and the admission and discharge dates of the stay to Statistics Denmark.

If the woman and any accompanying children are staying at the crisis centre anonymously, the centre may report anonymised information only,

although it is encouraged to state dates of birth where possible. Experience from the National Board of Social Services shows that about 6 per cent of the women stay anonymously.

Since Statistics Denmark wishes to minimise the centres' response burden, the collected CPR data are enriched using our various registers. Because of the option of anonymous stays, not all of the collected information can be enriched with register data.

To collect the data, all women's crisis centres have received an Excel sheet.

Figure 1: Excel sheet for reporting

[Screenshot of the reporting form (INDBERETNINGSSKEMA til Kvindekrisecentre). The sheet shows the upload address www.dst.dk/kkc and a preprinted journal number (210109 in the example). It has one row per stay with the columns Bruger: CPR-nr., Indskrivningsdato, Udskrivningsdato, Antal børn (0-10), Barn1: CPR-nr. and Barn2: CPR-nr. The example rows use placeholder CPR numbers such as 123456-9998.]

Each crisis centre has been sent a spreadsheet with a preprinted journal number. The number was devised by Statistics Denmark and is the key for linking information about the centre, such as contact person, address, and so on.

Each centre saves the spreadsheet on a PC and fills it in continuously. Every quarter the spreadsheet is uploaded at virk.dk, which can be accessed either directly or via the Statistics Denmark website.

Submitting the sheet requires a digital employee signature (digital medarbejdersignatur). The sheet is sent to a secure mailbox at Statistics Denmark. The responsible subject-matter office then archives the data on a secure drive, after which the data are used for statistical purposes.

The process is as follows. At the start of the period the crisis centre is notified that it may enter information in the spreadsheet on an ongoing basis. Before the reporting deadline the centre is reminded that it is time to report. After the deadline a reminder procedure starts: the centre receives three written reminders, after which we follow up by telephone.

To manage notifications, requests and reminders to the crisis centres, we use IBS (IndBeretningsSystem, the reporting system).

Figure 2: Overview screen in IBS.

For each respondent, the system is also used to store information that is relevant to the reporting.

Figure 3: Screen for the individual respondent.

From the outset, the centres were contacted by the Ministry for Children and Social Affairs about the new data collection. At the same time an information page was set up at Statistics Denmark, where the centres can find guidance on how to fill in the Excel sheet. In addition, Statistics Denmark took part in the annual meeting of LOKK (Landsorganisation af Kvindekrisecentre, the national organisation of women's crisis centres), where we presented the new data collection.

Figure 4: Reporting page

[Screenshot of the Statistics Denmark reporting page for women's crisis centres and outpatient services (Kvindekrisecentre og ambulante tilbud), which directs respondents to report at virk.dk using a spreadsheet and a digital employee signature. The page lists the reporting deadlines (1st quarter 2017: 28 April; 2nd quarter 2017: 23 July; followed by 27 October and 26 January 2018), links to the guidance documents (subject-matter guidance for women's crisis centres, reporting of anonymous stays, subject-matter guidance for outpatient services) and contact details for support. Under "Purpose and use" it states that the purpose of the statistics is to shed light on persons who have been exposed to violence in close relationships, and that the results are in demand from, among others, ministries, politicians, trade organisations, researchers and the press.]

The guidance documents are available only on the website, are typically one page long, and briefly explain how the data are to be reported.

Keeping the guidance on the website only is a deliberate choice, as it ensures that the version available is always the current one.

Figure 5: Guidance for reporting

[Excerpt from the guidance document "Vejledning til indberetning, Kvindekrisecentre", November 2017, translated:]

This guidance describes how to report data on women and children at women's crisis centres under section 109 (temporary accommodation for women exposed to violence or similar) to Statistics Denmark pursuant to section 6 of the Act on Statistics Denmark.

Excel spreadsheet: You must report using the Excel spreadsheet that you have received in your digital mailbox.

Anonymous stay under section 109(2): If the woman is staying anonymously, this is indicated in the column Bruger: CPR-nr. by entering the date of birth <ddmmåååå> followed by 9998, e.g. 010101-9998. If you do not know the date of birth, write 123456-9998. If there are accompanying children on an anonymous stay, enter the date of birth followed by 9998 if it is a girl, e.g. 010101-9998, or 010101-9999 if it is a boy. [The excerpt is cut off here.]

Moving data from mail to drive

The spreadsheets are received as attachments in the mailbox. A Visual Basic macro has been developed in Excel that retrieves the attached files and places them on a secure path. It is then checked that the correct attachments have been transferred, and the submitted e-mails are deleted.

Error checking

This is the first time the crisis centres have had to report, and it was therefore decided to make the reporting format as flexible as possible. There are very few restrictions on what can be entered in the spreadsheet.

This places greater demands on the subsequent error checking.

The data are checked for a number of conditions: the woman must be 18 years or older; the woman must not be staying at several centres in the same period; there must be no duplicates; the discharge date must be later than the admission date; and the CPR number must be valid. Information may also be missing or inconsistent, for example a missing date or a number of children that does not match the CPR numbers provided. Nothing other than the information to be reported may be written in the fields.

In addition to these errors, there have been some teething problems.

Some centres reported completed stays only. However, all admissions must be reported, including those where the woman has not yet been discharged.

Some centres reported all women anonymously. However, the centre may omit the CPR number only for women who actually have an anonymous stay (section 109(2)).
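Some of the row-level checks listed above can be sketched in a few lines of Python. This is an illustration only and not the procedure used by Statistics Denmark: the function name, the argument names and the error codes are hypothetical, and a discharge on the same day as the admission is treated as valid here, which is an assumption.

```python
from datetime import date


def check_stay(birth_date, admitted, discharged, n_children, child_ids):
    """Return a list of (hypothetical) error codes for one reported stay."""
    errors = []
    # Age in whole years on the admission date.
    had_birthday = (admitted.month, admitted.day) >= (birth_date.month, birth_date.day)
    age = admitted.year - birth_date.year - (0 if had_birthday else 1)
    if age < 18:
        errors.append("under_18")
    # An ongoing stay has no discharge date yet; that is allowed.
    if discharged is not None and discharged < admitted:
        errors.append("discharge_before_admission")
    # The stated number of children must match the child CPR numbers given.
    if n_children != len(child_ids):
        errors.append("child_count_mismatch")
    return errors


# Fictitious example: discharge precedes admission, one child number missing.
print(check_stay(date(1990, 5, 1), date(2017, 5, 23), date(2017, 1, 1),
                 2, ["010101-9998"]))
```

Checks that involve several rows at once, such as duplicates and simultaneous stays at two centres, need the stays of each person side by side and are therefore handled separately.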

Because the safety of these women is a high priority, the centres were very concerned about who would gain access to the data.

The greatest challenges in the error checking have been finding duplicate stays and finding invalid CPR numbers. The technical solution to each is shown below.

Duplicate stays mean that the same woman or children have stays at two places at the same time.

Duplicate stays were handled with the following DATA step in SAS:

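Figure 6: DATA step for handling duplicates

    /* Pair every stay with the following row so overlaps can be flagged. */
    data tael3;
      obs1 = 1;
      do while (obs1 <= nobs);
        /* current row, read sequentially */
        set tael2 (drop=indskriv2) nobs=nobs;
        obs2 = obs1 + 1;
        /* next row, read by direct access; guard against reading past the last row */
        if obs2 <= nobs then
          set tael2 (drop=indskriv2
                     rename=(person_id=pid2 ind=ind2 ud=ud2
                             indskriv=indskriv2 udskriv=udskriv2)) point=obs2;
        else call missing(pid2, ind2, ud2, indskriv2, udskriv2);
        /* same person and discharge later than the next admission: overlap */
        if pid2 = person_id and udskriv > indskriv2 then fejl_dato = 1;
        else fejl_dato = 0;
        output;
        obs1 + 1;
      end;
      *drop fejl_dato obs1 pid2 ind ud ind2 ud2 indskriv2 udskriv2;
    run;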

The input looks, for example, as follows (the data are fictitious):

Figure 7: Input of stay data

| PERSON_ID | indskriv | udskriv |
|---|---|---|
| 2222750 | 17/01/2017 | 30/04/2017 |
| 2222750 | 22/04/2017 | |
| 2429352 | 22/05/2017 | 24/05/2017 |

The output shows the current stay alongside the next one.

Figure 8: Output of stay data

| PERSON_ID | indskriv | udskriv | pid2 | indskriv2 | udskriv2 |
|---|---|---|---|---|---|
| 2222750 | 17/01/2017 | 30/04/2017 | 2222750 | 22/04/2017 | |
| 2222750 | 22/04/2017 | | 2429352 | 22/05/2017 | 24/05/2017 |
| 2429352 | 22/05/2017 | 24/05/2017 | 2429352 | 24/05/2017 | 30/05/2017 |

By comparing 'udskriv' with 'indskriv2', one can quickly see in SAS where stays overlap.

If the overlap is only a single day, the discharge date is adjusted to match the next admission in the published data set.

The reported CPR numbers are matched against the Population Register (Befolkningsregistret). For numbers that cannot be found, and that clearly do not belong to an anonymous stay but contain a typing error, we try to find the correct number among CPR numbers already reported.

The SAS function SPEDIS is used for this. The position of the incorrectly entered digits within the CPR number can vary considerably.
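The comparison of each discharge date with the following admission date can be illustrated outside SAS as well. The Python sketch below is not the code used in production; it assumes the stays are already sorted by person and admission date, and the function name is hypothetical. The example rows are the fictitious ones from Figure 7.

```python
from datetime import date


def flag_overlaps(stays):
    """stays: list of (person_id, admitted, discharged), sorted by person
    and admission date. Returns one flag per stay: 1 if the next stay of
    the same person starts before this stay's discharge date, else 0."""
    flags = []
    for i, (pid, admitted, discharged) in enumerate(stays):
        flag = 0
        if i + 1 < len(stays):
            next_pid, next_admitted, _ = stays[i + 1]
            # An ongoing stay (no discharge date) is never flagged here.
            if (next_pid == pid and discharged is not None
                    and discharged > next_admitted):
                flag = 1
        flags.append(flag)
    return flags


stays = [
    (2222750, date(2017, 1, 17), date(2017, 4, 30)),
    (2222750, date(2017, 4, 22), None),
    (2429352, date(2017, 5, 22), date(2017, 5, 24)),
]
print(flag_overlaps(stays))  # prints [1, 0, 0]
```

Only the first row is flagged: the same person is admitted again on 22 April although the first stay runs until 30 April.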

Figure 9: SPEDIS

    data opr_pnr_&i;
      set pnr;
      opr_pnr = &opr_pnr;
      distance = spedis(pnr, opr_pnr);
      if distance <= 40 then output;
    run;

Figure 10: Example of output with SPEDIS (the data are fictitious)

| pnr | opr_pnr | distance |
|---|---|---|
| 1008099250 | 2008099250 | 20 |
| 1801016802 | 0203165005 | 71 |
| 1508045799 | 1508048799 | 10 |
| 2202097211 | 2102097211 | 7 |

A macro compares each incorrect CPR number against all CPR numbers in the register.

SPEDIS computes the distance between the original CPR number and the candidate CPR number found. An upper limit can be set for how large the distance may be: the greater the difference between the two numbers, the higher the distance. By trial and error, one can choose a SPEDIS threshold at which one can be confident of hitting the correct CPR number.

This method was chosen initially because of time pressure. The intention is to contact the individual crisis centre about the CPR numbers we cannot find in the Population Register.
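The idea of matching a mistyped number against a register with a distance threshold can be sketched in Python. Note that this sketch uses the plain Levenshtein edit distance as a stand-in; it is not the SPEDIS algorithm, whose costs depend on the type and position of each edit, so the distances and thresholds are not comparable with those in Figure 10. The function names are hypothetical and the numbers are the fictitious ones from Figure 10.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def candidates(reported, register, max_distance=1):
    """Register numbers within max_distance edits of the reported number."""
    return [pnr for pnr in register if levenshtein(reported, pnr) <= max_distance]


register = ["1008099250", "1508045799", "2202097211"]  # fictitious
print(candidates("1508048799", register))  # prints ['1508045799']
```

A low threshold keeps false matches rare at the cost of leaving more numbers unmatched, which is the same trade-off as choosing the SPEDIS cut-off by trial and error.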

The plan is that the spreadsheet submitted by the crisis centres each quarter will contain data for the current year and the two preceding years. The general revision policy at Statistics Denmark is that data can be corrected up to three years back. After three years, Statistics Denmark will remove the completed stays for the first year and return the spreadsheet to the centre.

Publication of data

All data will be available under the researcher and ministry schemes (Forsker- og Ministerieordningen). There will be two data sets: one with information about the women at the centres and one with information about the accompanying children.

Figure 11: Module data on women (the data are fictitious).

Figure 12: Module data on accompanying children (the data are fictitious).

Data will also be available in our StatBank (Statistikbanken). The plan is to publish data by age, ancestry and municipality. Because of disclosure control (diskretionering), the data will most likely be split into several tables.

Figure 13: StatBank table KRISE2: Stays and residents at women's crisis centres by duration, ancestry, resident status and age

Enrichment of data

From the few items of information collected from the crisis centres, Statistics Denmark can analyse how the women fare in general after a stay with respect to employment, housing and education. For example, Statistics Denmark's register of residential moves can be used to examine whether a woman changes address after a stay at a crisis centre. The Population Register also shows the woman's ancestry and when she moved to Denmark.

Regarding health, there are registers of GP visits and hospital admissions, which it may be interesting to compare with the population in general.

For the accompanying children, the data can be linked to registers on vulnerable children and young people and on notifications of concern (underretninger).

Further work

A feedback report is to be developed for each centre, in which the centre will receive aggregated data, for example the number of admissions, the number of stays and the number of children, broken down by quarter. It is also being considered whether each centre should receive a summary table showing data for the whole country.

Empathy variation in general practice: A survey among Danish General Practitioners

Justin A. Charles 1,2, Peder Ahnfeldt-Mollerup 2, Jens Søndergaard 2, Troels Kristensen 2

1 Center for Medical Humanities, Compassionate Care, and Bioethics, Stony Brook University, Stony Brook, New York USA.

2 Research Unit of General Practice, Department of Public Health, University of Southern Denmark, J.B. Winsløws Vej 9, DK-5000 Odense C, Denmark.

Work in progress, do not refer to or cite without permission from the authors. This version: December 2017

Abstract

Background: High levels of physician empathy have been correlated with improved patient health outcomes and physician job satisfaction. Knowledge about variation in empathy and related general practitioner (GP) characteristics may allow for amore informed approach to improve empathy among physicians as a modality for healthcare quality improvement.

Objectives: Our objectives are to measure and analyze vanat1on in physician empathy and its association with GP demographics, professional characteristics, and job satisfaction.

Methods: 1,196 Danish GPs responded to a survey containing the Danish version of the Jefferson Scale of Empathy for Health Professionals (JSE-HP) and questions related to their demographics, professional characteristics and job satisfaction. Random effect logistic regression analysis was performed to explore the association between empathy levels and the included GP characteristics.

Results: Empathy scores were negatively skewed with a mean score of 117.9 (SD±10.1). Increased odds for high empathy scores were associated with GP employment outside of their practice and with stating that the physician-patient relationship, intellectual stimulation, and interaction with colleagues contribute strongly to job satisfaction. GPs aged 45-54 were less likely to have high empathy scores. Neither gender, length of time since specialization, length of time in current practice, practice type, practice location, nor job satisfaction was associated with physician empathy.

Conclusion: There is variation in physician empathy levels among this population of Danish GPs. This variation is positively associated with outside employment and strong values of interpersonal relationships, and negatively associated with middle age. There is room to increase physician empathy with previously tested interventions, which can possibly lead to improved healthcare outcomes.


Introduction

Physician empathy is defined as "a cognitive attribute that involves an ability to understand the patient's inner experiences and perspective and a capability to communicate this understanding".1 Some experts argue that empathy is primarily determined by heritability and early life experiences and therefore cannot be changed.2

Others believe that a person's environment and relationships throughout life can alter a person's empathic capabilities under the right circumstances.3,4

Higher physician empathy levels are associated with improved health outcomes, increased patient and physician satisfaction, and decreased physician burnout.5-7 Empathy plays an important role in the relationship between patient and GP, as it facilitates the trust and understanding that allows for effective communication of medical information and reduced emotional burden in both parties.8,9 There is a gap between the desired and actual levels of empathy in this relationship due in part to organizational factors, time pressure, and individual variation in the empathic capabilities of GPs.10

The degree and causes of physician empathy variation in the GP population are not well understood, but are believed to depend partially on an individual's characteristics, including age, gender, professional characteristics, such as practice type and clinical experience, and job satisfaction.11,12 Analyzing the extent of this variation and its association with GP characteristics may inform more targeted educational and organizational interventions to improve empathy in primary care.

Such interventions are needed now more than ever due to the progressive decline in empathy that has been observed among younger people in recent years.13 Proposed reasons for the decline include the increased pervasiveness of technology, which has reduced face-to-face communication, and an increase in narcissistic personality traits.13,14

Coupled with the documented decrease in student empathy that often occurs throughout medical school, improving empathy has become an important issue in the field of medical education.15 To address this issue, medical schools, such as the University of Southern Denmark, have started to introduce narrative courses in attempts to improve empathy skills among future physicians. Courses in communication skills, literature, and art have been incorporated to improve empathy in both practicing physicians and medical students with promising initial results.16,17 To the best of our knowledge, there is no systematic, formal empathy training among practicing physicians.

In this study, we aim to measure and analyze variation in physician empathy among Danish GPs, and to explore associations between selected GP characteristics and physician empathy. We hypothesize that Danish GPs will have similar empathy levels to other studied GP populations and higher empathy levels than more "technology-oriented" specialists.1,18 We also hypothesize that there will be variation in empathy levels among our study population due to differences in individual characteristics. Finally, we postulate that part of the variation in physician empathy can be explained by differences in demographics, professional characteristics, and job satisfaction.

Methods:

Survey

A web-based survey was sent to GPs currently practicing in Denmark. The survey included the Jefferson Scale of Empathy for Health Professionals (JSE-HP) to quantify physician empathy.

The JSE-HP is a 20-question survey that measures self-reported physician empathy using questions on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree). Scores range from 20 to 140, where a higher score indicates a more empathic behavioral orientation. Originally created in English, it has been translated into 55 languages, including Danish. Evidence of its validity and reliability is well established in its use among health professionals in international settings, including Denmark.1,19,21 Minor changes were made to the Danish version of the scale to better reflect the meaning of the English version. These were tested in a pilot study with a small sample of Danish GPs before being implemented in the final version of the survey.
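The scoring range described above follows directly from summing 20 items rated 1 to 7. A minimal sketch in Python, assuming that any reverse-keyed items have already been recoded (the text does not describe the scoring key, and the function name is hypothetical):

```python
N_ITEMS = 20                 # number of questions stated in the text
SCALE_MIN, SCALE_MAX = 1, 7  # 7-point Likert scale stated in the text


def jse_total_score(item_responses):
    """Sum 20 Likert responses into a total score between 20 and 140.

    Assumption: responses are already recoded so that a higher value
    always means a more empathic orientation.
    """
    if len(item_responses) != N_ITEMS:
        raise ValueError("expected exactly 20 item responses")
    for response in item_responses:
        if not SCALE_MIN <= response <= SCALE_MAX:
            raise ValueError("each response must lie between 1 and 7")
    return sum(item_responses)
```

Answering every item with 1 gives the minimum of 20, and answering every item with 7 gives the maximum of 140.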

In addition, an addendum to the JSE-HP was created to capture information about GP characteristics and determine their potential association with physician empathy. The following characteristics were included: a) demographics (gender, age), b) professional experience (time since specialization, time spent practicing in current clinic, practice type, practice location, employment outside of GP practice) and c) various dimensions of job satisfaction.

Gender was included as female physicians often have higher self-reported empathy scores than do males.22-24 Age was included to explore a potential link with empathy among GPs because evidence indicates that empathy may change as a function of age.25,26 Empathy levels have been shown to increase with clinical experience (independent of age) and with longitudinal patient relationships.27,28 Therefore, time since GP specialization and time spent practicing in current practice were included as measures of clinical experience and continuity of care respectively.

Practice type was functionally split in our study between "partnership practices", which share patients, and "non-partnership practices", which do not. The increased autonomy of the latter may affect empathy through its relation to better maintenance of long-term relationships.29 Practice location (urban, rural, or mixed) was included because there may be empathy differences among GPs who practice in different locations, as healthcare needs may vary between urban and rural regions.30

A GP's employment outside of their clinic has the potential to increase physician empathy and may reflect a capacity to engage in activities related to their outside interests or self-reflection.31,32 Job satisfaction was measured by a statement with five categories from very unsatisfied to very satisfied. Previous studies demonstrated a bidirectional relationship between job satisfaction and empathy.11,33 GPs were also asked how much the following factors contributed to their job satisfaction: physician-patient relationship, prestige, intellectual stimulation, interaction with colleagues, and economic profit. These factors were rated on a 1-7 Likert scale, with 7 representing the strongest contribution to job satisfaction, and were chosen because they are commonly implicated as important in choosing a career as a physician.34

Sample selection

The survey contained 20 covariates. Based on this number, the preferred sample size was about 400 GPs. The survey was forwarded to around 1200 GPs with the expectation of, at most, a 50% response rate.35

The sample was selected from a list of 2,926 email addresses of GPs from the General Practitioners Organization (PLO) from 2013. GPs who have stopped working since then were excluded. Empirical evidence indicates that GPs may have different characteristics across practice types and locations.6,36 To address this, a "stratified proportional allocation" was applied to a random sample of GPs from different subpopulations. Six strata were created based upon combinations of practice type (partnership, non-partnership) and practice location (urban, rural, mixed urban-rural) and are shown below. Categorization of the municipality type and practice type were based on a 2011 report and PLO registry data respectively.37 71.05% of the 2,926 GPs were from partnership practices and 28.95% were from non-partnership practices. The distribution of age intervals was as follows: 10.8% (35-44); 33.3% (45-54); 42.7% (55-64) and 13.3% (65+). 48.4% were females.

To create a random stratum distribution, we first determined the proportion of the 2,926 GPs that fit each of the 6 strata. Then, we multiplied this proportion by our desired number of survey recipients (n=1200), rounded down to the nearest person. Random numbers in the interval [0;1] were assigned to the GPs in each stratum. These were then ordered and GPs with a number below the calculated relative proportion from the strata were included in the sample. This produced a random strata distribution of 1,195 GPs as follows: a) Non-partnership practice AND rural municipality: 5.91%, 70 GPs; b) Non-partnership practice AND heterogeneous municipality: 7.83%, 93 GPs; c) Non-partnership practice AND urban municipality: 15.21%, 182 GPs; d) Partnership practice AND rural municipality: 15.65%, 187 GPs; e) Partnership practice AND heterogeneous municipality: 32.91%, 394 GPs; f) Partnership practice AND urban municipality: 22.49%, 269 GPs.
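The allocation arithmetic can be retraced with a short Python sketch. The stratum shares are the rounded percentages reported in the text; the stratum labels, function names, and the simplified draw (sort by a uniform random number and keep the first n) are illustrative assumptions rather than the authors' actual procedure.

```python
import math
import random

# Stratum shares of the 2,926 GPs as reported in the text (rounded values).
STRATUM_SHARES = {
    "non-partnership / rural": 0.0591,
    "non-partnership / heterogeneous": 0.0783,
    "non-partnership / urban": 0.1521,
    "partnership / rural": 0.1565,
    "partnership / heterogeneous": 0.3291,
    "partnership / urban": 0.2249,
}


def proportional_allocation(shares, target_n):
    """Recipients per stratum: share times target, rounded down."""
    return {name: math.floor(share * target_n) for name, share in shares.items()}


def draw_stratum_sample(gp_ids, n_to_draw, seed=0):
    """Hypothetical helper: attach a uniform random number to each GP,
    sort by that number, and keep the first n_to_draw GPs."""
    rng = random.Random(seed)
    keyed = sorted((rng.random(), gp_id) for gp_id in gp_ids)
    return [gp_id for _, gp_id in keyed[:n_to_draw]]


allocation = proportional_allocation(STRATUM_SHARES, 1200)
total_recipients = sum(allocation.values())
```

Because each stratum count is rounded down, the total falls slightly short of 1,200, which is consistent with the 1,195 recipients reported.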

Our survey was emailed to these 1,195 Danish GPs using SurveyXact software, accompanied by a cover letter describing the study. Participants were offered the equivalent of $20 for their time, as per the protocol of the Danish Multipractice Committee, which approved our study. Two reminder emails were sent to non-respondents 2 and 4 weeks after the initial email.

Statistical Analysis:

A random-effects logistic regression analysis was applied to examine the contribution of the included GP characteristics to being a "high-scorer" on the JSE-HP (score ≥120). This choice reflects our decision to split the empathy score (ES) into "high-scorers" and "low-scorers" (score ≤119), as in a previous study,1 as well as the clustering of respondents within GP clinics, which allows for correlated observations among GPs within the same practice. The model takes the following form:

$$ES_{ij} = \beta_{00} + \beta' x_{ij} + u_j + \varepsilon_{ij} \tag{1}$$

Where the dependent dummy variable $ES_{ij}$ was defined by

$$ES_{ij} = \begin{cases} 1 & \text{if } ES \geq 120 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

The parameter $x_{ij}$ in (1) is a row vector of explanatory variables containing characteristics of respondent $i = 1, \ldots, n$ in clinic $j$. The term $u_j$ is the random effect of being in group $j$, where $u_j \sim N(0, \sigma_u^2)$. $\beta$ represents within-group change. This model allows the probability to vary from clinic to clinic, and $\varepsilon_{ij}$ is the residual at respondent level.
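To make the model concrete, the following Python sketch shows the dichotomization in (2) and how a random-intercept logistic model turns the linear predictor in (1) into a probability of being a high-scorer. The logistic link is assumed here because the text names the model a logistic regression; the function names and any coefficient values passed in are purely illustrative.

```python
import math

HIGH_SCORE_CUTOFF = 120  # cut-off stated in the text


def is_high_scorer(empathy_score, cutoff=HIGH_SCORE_CUTOFF):
    """Dummy outcome from equation (2): 1 if the score is at least 120."""
    return 1 if empathy_score >= cutoff else 0


def high_score_probability(beta_00, beta, x_ij, u_j):
    """Probability of a high score for respondent i in clinic j.

    The linear predictor is beta_00 + beta'x_ij + u_j, passed through
    the logistic function (assumed link, not spelled out in the text).
    """
    linear_predictor = beta_00 + sum(b * x for b, x in zip(beta, x_ij)) + u_j
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```

A positive clinic effect u_j shifts the probability upward for every GP in that clinic, which is how the model lets the probability vary from clinic to clinic.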

Responses to questions regarding gender, practice location, practice type, job satisfaction and employment outside of clinic were coded with dummy variables. A Wald test was used to test the overall significance of the model. The intra-class correlation coefficient was used to estimate the proportion of overall residual variability linked to the clinic level. The variance inflation factor (VIF) was used to measure the extent of multi-collinearity among covariates, signified by a VIF>10. Since there is potential for correlated observations among providers within the same practice, a random-effects specification was used. All analyses were performed using Stata Version 14 (Stata/IC, College Station, TX, USA).
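The two diagnostics named above have simple closed forms, sketched here in Python. The intra-class correlation uses the latent-variable definition commonly applied to random-intercept logistic models, where the respondent-level variance is fixed at π²/3; whether this exact definition was used in the analysis is an assumption. The VIF of a covariate is computed from the R² obtained when that covariate is regressed on the remaining covariates.

```python
import math


def latent_icc(sigma_u_squared):
    """Share of latent residual variance at clinic level.

    Assumes the standard logistic level-1 variance of pi^2 / 3.
    """
    level1_variance = math.pi ** 2 / 3
    return sigma_u_squared / (sigma_u_squared + level1_variance)


def variance_inflation_factor(r_squared):
    """VIF from the R^2 of regressing one covariate on the others."""
    return 1.0 / (1.0 - r_squared)
```

Under these definitions a clinic-level variance of zero gives an ICC of zero, and the VIF threshold of 10 corresponds to an auxiliary R² of 0.9.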


To analyze over- and under-representation of the respondents with respect to the stratification criteria, the ratio between the proportion of respondents who belong to each stratum in the sample and the corresponding proportion in the population was calculated. A value over 1.00 reflects overrepresentation and a value under 1.00 reflects underrepresentation. The representativeness of gender and age groups was also assessed via these ratios. Self-reported physician age and gender were compared to registry data to determine the validity of the survey responses and correct discrepancies.
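As an illustration of this ratio, the Python sketch below divides the respondent share of each stratum by its population share. The percentages are the stratum figures reported elsewhere in the paper (population shares from the sampling description, respondent shares from the Results); the labels and names are hypothetical.

```python
# Shares in percent, taken from the figures reported in the paper.
POPULATION_SHARE = {
    "non-partnership / rural": 5.91,
    "non-partnership / mixed": 7.83,
    "non-partnership / urban": 15.21,
    "partnership / rural": 15.65,
    "partnership / mixed": 32.91,
    "partnership / urban": 22.49,
}
RESPONDENT_SHARE = {
    "non-partnership / rural": 3.4,
    "non-partnership / mixed": 8.0,
    "non-partnership / urban": 17.0,
    "partnership / rural": 13.8,
    "partnership / mixed": 25.4,
    "partnership / urban": 32.3,
}


def representativeness_ratio(sample_share, population_share):
    """Above 1.00: overrepresented; below 1.00: underrepresented."""
    return sample_share / population_share


ratios = {
    stratum: representativeness_ratio(RESPONDENT_SHARE[stratum], POPULATION_SHARE[stratum])
    for stratum in POPULATION_SHARE
}
```

Rounded to two decimals, the ratios computed this way agree with the stratum ratios reported in the Results.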

Results:

Of the 1,195 GPs who were sent a copy of the survey, a total of 464 GPs from 406 practices completed the entire questionnaire (response rate of 38.8%). There was a minimum of 1 and a maximum of 3 GPs from any one practice, with an average of 1.1 GPs per practice.

The descriptive statistics for the 464 GP respondents' demographic and clinic characteristics are shown in Table 1. The average GP respondent was 54.9 years old and 53.4% were males. Most of the respondents worked in partnership practices (72.0%) and 49.3% worked in urban locations. The average GP had been specialized for 19.08 years and had been in their current practice for 17.19 years.

Table 1: Empathy scores, demographic and professional characteristics among Danish GPs

Characteristic                    Mean (SD) / Percentage   CV     p5   median   p95
Empathy Score                     117.85 (10.09)           0.09   99   118      135
Demographic Characteristics
  Physician Age                   54.91 (7.86)             0.14   42   55       66
  Gender
    Male                          53.4%
    Female                        46.6%
Professional Characteristics
  Practice Location
    Urban practice                49.3%
    Rural practice                17.2%
    Mixed practice                33.4%
  Practice Type
    Partnership practice          72.0%
    Non-partnership               28.0%
  Employment Outside of Clinic
    Yes                           54.5%
    No                            45.5%
  Years since GP specialization   19.08 (8.27)             0.43   7    19       32
  Years in present practice       17.19 (16.74)            0.97   4    15       31


Table 1 also displays descriptive statistics of the GPs' scores from the Jefferson Scale. The scores varied from 80 to 140; no GPs scored below 80. The mean and median scores were 117.8 and 118 respectively. The extent of variability of the score is shown via the standard deviation (SD), coefficient of variation (CV) and the 5th and 95th percentiles (p5 and p95). Over half (54.5%) of GPs had additional employment outside of their main clinic, such as research, teaching, and political activity.
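The coefficient of variation is the standard deviation divided by the mean. A minimal Python sketch, using the mean and SD values reported in Table 1 (the function name is illustrative):

```python
def coefficient_of_variation(mean, sd):
    """Relative spread: standard deviation as a fraction of the mean."""
    return sd / mean


cv_empathy = coefficient_of_variation(117.85, 10.09)
cv_age = coefficient_of_variation(54.91, 7.86)
cv_years_specialized = coefficient_of_variation(19.08, 8.27)
```

Rounded to two decimals these reproduce the CV column of Table 1, and they show that empathy scores vary less, relative to their mean, than age or years since specialization.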

71.5% of the respondents were from partnership practices (13.8% rural, 25.4% mixed, 32.3% urban municipalities) and 28.4% were from non-partnership practices (3.4% rural, 8% mixed, 17% urban). The representativeness ratios for the strata were: single-handed/non-partnership a) rural 0.575, b) mixed 1.02, c) urban 1.12; partnership d) rural 0.88, e) mixed 0.77, f) urban 1.44. This shows an overrepresentation of respondents in partnership practices in urban locations and an underrepresentation of respondents from mixed municipality locations. The distribution of age intervals was as follows: 11.4% (35-44); 36.0% (45-54); 41.8% (55-64); 10.78% (65+). The representativeness ratios for age groups are as follows: 1.06 (35-44); 1.09 (45-54); 0.98 (55-64); 1.23 (65+). Therefore, there was no significant difference between the ages of those who responded to the survey and those who did not.

Figure 1 shows the variation in empathy score across the ordered values of GP respondents. For instance, a score of 108 or below is achieved by the initial 0.20 fraction of respondents. In contrast, the fraction of respondents above 0.80 have a score of 127 or higher. A histogram of empathy scores also demonstrates a negative skew of this data (not included). Thus, the results demonstrate variation in empathy scores among different subsets of respondents.

Figure 1: Ordered values of the empathy score versus fractions of respondents (horizontal axis: fraction of respondents)


The results regarding job satisfaction are shown in Table 2. Most respondents (79.7%) were at least somewhat satisfied with their jobs as GPs, while only 9.9% were somewhat or very unsatisfied. GPs also reported that the physician-patient relationship contributed the most to their job satisfaction (6.17/7) and that prestige contributed the least (3.71/7).

Table 2: Job satisfaction among a sample of Danish GPs

Characteristic                     Mean (SD) / Percentage   CV     p5            median   p95
Job Satisfaction
  Somewhat or very satisfied       79.7%
  Neutral                          10.3%
  Somewhat or very unsatisfied     9.9%
Contribution of Medical Practice Factors to Job Satisfaction*
  Physician-patient relationship   6.17 (0.81)              0.13   5             6        7
  Intellectual stimulation         5.65 (1.06)              0.19   4             6        7
  Interaction with colleagues      5.41 (1.40)              0.26   3             6        7
  Economic profit                  4.92 (1.24)              0.25   2             5        7
  Prestige                         3.71 (1.5)               0.41   [illegible]   4        6

* These items were rated on a 1-7 Likert scale, with 7 representing strongest contribution to job satisfaction

Table 3 summarizes results of the logistic regression in terms of odds ratios (OR). Overall, the model shows that the empathy score was associated with some of the GP characteristics among respondents in the sample. Male GPs did not have higher odds of a high empathy score than female GPs. GPs in the age range of 45-54 had a 56% decrease in the odds of a high empathy score compared to the youngest age group, indicated by an OR of 0.44 (p=0.036). A greater number of years since GP specialization or years in present practice did not influence the odds of higher empathy. This lack of an association also held for practice location and practice type, which did not affect the odds of a high empathy score. In contrast, GPs who were not employed outside of the clinic had a 41% decrease in the odds of high empathy compared to the reference group with employment outside of the clinic (OR = 0.59; p=0.016). Physicians who believed that the physician-patient relationship (OR = 4.30; p<0.0001) and interaction with colleagues (OR = 1.90; p=0.006) were of high importance to their job satisfaction had significantly higher odds of a high empathy score. Those who viewed intellectual stimulation as having high importance to job satisfaction had an increased OR that narrowly missed statistical significance (OR = 1.53; p=0.053). The estimated intra-class correlation coefficient revealed that the overall residual variability was not significantly associated with the clinic level. The variance inflation factor (VIF) did not reveal multi-collinearity.
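The percentage interpretations above follow from the odds ratios by simple arithmetic, and an odds ratio can be re-expressed against the opposite reference group by taking reciprocals. A minimal Python sketch (function names are illustrative, and the flipped interval is only approximate because it starts from the rounded values in Table 3):

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in odds relative to the reference group."""
    return (odds_ratio - 1.0) * 100.0


def flip_reference_group(odds_ratio, ci_low, ci_high):
    """Re-express an OR and its confidence interval with the comparison
    and reference groups swapped (reciprocals, bounds change places)."""
    return 1.0 / odds_ratio, 1.0 / ci_high, 1.0 / ci_low


# No outside employment vs. outside employment: OR 0.59 [0.38, 0.91]
flipped_or, flipped_low, flipped_high = flip_reference_group(0.59, 0.38, 0.91)
```

Read this way, GPs with outside employment have roughly 1.7 times the odds of a high empathy score compared with GPs without it, which is the same finding as the 41% decrease stated from the other direction.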


Table 3: Logistic regression: Odds ratios for high/low empathy scores for Danish GPs

Characteristic                      Odds Ratio [95% CI]   p-value   Reference Group

Demographic Characteristics
Gender
  Male                              0.91 [0.59, 1.41]     NS        Female
Age
  45-54                             0.44 [0.21, 0.95]     0.036     35-44
  55-64                             0.65 [0.26, 1.64]     NS        35-44
  65+                               0.74 [0.23, 2.35]     NS        35-44

Professional Characteristics
Practice location
  Urban                             1.01 [0.65, 1.57]     NS        Heterogeneous
  Rural                             0.93 [0.52, 1.68]     NS        Heterogeneous
Practice type
  Non-partnership                   1.51 [0.93, 2.46]     NS        Partnership
Employment Outside Clinic
  No                                0.59 [0.38, 0.91]     0.016     Yes
Years since GP specialization
  14-22                             1.19 [0.60, 2.38]     NS        0-13
  23+                               1.15 [0.47, 2.84]     NS        0-13
Years in present practice
  11-19                             0.92 [0.49, 1.74]     NS        0-10
  20+                               0.68 [0.31, 1.48]     NS        0-10

Job Satisfaction Characteristics
Job Satisfaction
  Somewhat or Very Satisfied        0.95 [0.57, 1.69]     NS        Neutral satisfaction or less
Contribution of Medical Practice Factors to Job Satisfaction
  Physician-patient relationship
    High (6-7)                      4.30 [2.14, 8.64]     <0.0001   Low and Medium (1-5)
  Prestige
    High (6-7)                      0.91 [0.48, 1.74]     NS        Low and Medium (1-5)
  Intellectual stimulation
    High (6-7)                      1.53 [0.99, 2.37]     0.053     Low and Medium (1-5)
  Interaction with colleagues
    High (6-7)                      1.90 [1.20, 3.01]     0.006     Low and Medium (1-5)
  Economic profit
    High (6-7)                      1.09 [0.72, 1.67]     NS        Low and Medium (1-5)

Number of respondents (N)           464
Number of groups                    406
Wald Chi2                           42.03                 0.0002
Variance inflation factor (mean)    2.24
Intraclass correlation coefficient  <0.0001               0.497

NS = Not Significant

Discussion:

As predicted, empathy scores in our study population (mean = 117.8) were like those of Danish GPs (mean: 117.2),21 American primary care doctors (mean: 116.6),38 and French GPs (mean: 111.8),31 and greater than those of American diagnostic radiologists (mean: 110.7),24 American urologists (mean: 113.8)24 and Korean technology-oriented specialists (mean: 106.9).23 A possible explanation for this phenomenon is that medical students with higher levels of empathy choose specialties that involve a greater deal of interpersonal interaction.39

There was variation in physician empathy in our study population, indicated by a wide range (80-140) of scores that were negatively skewed, with a near even split between high and low-scorers. This signifies that, as hypothesized, there is a subset of Danish GPs in which empathy levels can ideally be increased. Several studies demonstrate that the use of mindfulness interventions, communication skills training and standardized patient encounters can increase empathy in medical students and physicians.38,40,41 These findings can aid in the development of formal, organized empathy training to be used both in undergraduate and continuing medical education.

One of the goals of this study was to determine what factors are associated with GP empathy variation, which could inform future attempts to develop empathy interventions targeted for GPs. Our results indicated a negative association between physician age of 45-54 and being a high empathy scorer, yet no association between empathy and length of time since specialization, as a measure of clinical experience. This may be related to increasing levels of burnout in the medical field.42 Another explanation is that this age group has increased stressors outside of work, which decrease their capacity for empathy.10 Future studies involving longitudinal analyses of empathy levels in physicians over their careers may help better explain these results.

There was no association between gender and physician empathy in our study, which was consistent with several other studies.43,44 However, much of the literature describes that female physicians have higher empathy scores than do male physicians.22-24 This relationship has been attributed to biological, as well as learned social and cultural factors.45 The lack of female bias in our and other analyses may indicate that the influence of these factors varies in different populations.46 Further empirical data may be needed to determine the involvement of various facets of gender in GP empathy, ideally among societies with varying degrees of gender equality and gender roles among GPs.

There was no relationship between level of job satisfaction and empathy score in this survey, despite previously documented associations in the literature.11,33 This could be accounted for, in part, by the clear majority of GPs in this population that were at least somewhat satisfied with their jobs. Without a more uniform distribution of job satisfaction in survey participants, it is difficult to effectively evaluate this relationship.


GPs who stated a strong importance of the physician-patient relationship to job satisfaction had more than four-fold higher odds of being a high empathy scorer. GPs often report that their most gratifying moments involve relationships with their patients, especially if they demonstrate empathy.8,47 As expected, importance of interaction with colleagues was also associated with high empathy scores, likely because interpersonal interaction and communication with others is closely tied to empathy. Importance of intellectual stimulation to job satisfaction also tended to go with higher empathy scores in GPs, although this association narrowly missed statistical significance (p=0.053). The JSE-HP measures physician empathy as a cognitive attribute that requires understanding patient perspective, a task that may improve clinical competence through its intellectual stimulation.20 Therefore, being a strong empath may provide an additional layer of complexity that is rewarding to those who gain pleasure in intellectual pursuits.

Strengths and limitations:

A strength of this paper is that it explores empathy specifically in a population of GPs, who, in their role as gatekeepers in the healthcare system, have a significant impact on patient care.48

Much of the current work studying physician empathy is done in populations of medical students, who have less involvement and responsibility in the longitudinal care of patients. As empathy is related to career experience, burnout, and job satisfaction, it is difficult to extrapolate those results to apply to well-established physicians.11,27

Another strength of this study is that the sample comprised as much as 15.8% of the population of GPs from the entirety of Denmark, while previous studies analyzing empathy in Danish physicians focused on one region.21,49 It is also a strength that this study used stratified, rather than simple, random sampling with respect to urban/rural status and practice type. Still, it is a limitation that non-response (61.2%) reduced the balanced proportional representativeness of the subsamples. This type of selective non-response may impede valid inference. In addition, sampling stratification was limited only to practice type and location, while other GP characteristics that were not included may have created bias.

Using the JSE-HP to measure physician empathy offers both strengths and limitations to this study. Preliminary validation of the Danish-translated version of the JSE-HP in the Danish context allows for its use in Denmark. However, it has not been as extensively evaluated as the original English version.21 Further work should be done in examining the psychometrics of our and other samples to further validate the JSE-HP for use in Denmark.


The JSE-HP functions as a self-report measure, which creates some limitations to its use. It does not directly measure physician behaviors or patient-perception of clinician empathy, which may not correlate with self-report metrics.50 However, a correlation between GP self-reported empathy and patient-perceived GP empathy has been documented.51 In addition, individuals taking a self-report survey may also be dishonest to give a good impression, but this has not been shown to occur to a large extent with the JSE-HP.52 Future studies that examine the link between physician self-reported empathy, observed physician behaviors and patient perception of empathy can help elucidate their association.

The JSE-HP asks only about cognitive and not affective components of empathy, which may limit measurement of all aspects of physician empathy.53 However, it effectively differentiates empathy from sympathy, which involves directly feeling and experiencing a patient's suffering, and therefore may result in physician burnout and compassion fatigue if used excessively.54 Analysis of physician empathy with different psychometric measures that involve different definitions of empathy, and with both qualitative and quantitative components can add strength to the empathy literature.

One limitation of this study is that the cross-sectional, multivariate analysis cannot ascribe causation between empathy and the examined characteristics. Further studies should include a longitudinal study of physicians and their empathy levels as well as interventions that aim to improve physician empathy among GPs. This model divided respondents by empathy score into "high-scorers" and "low-scorers". Because there were no pre-established cut-offs for the JSE-HP, this distinction was determined by the mean score of a pre-existing study.1 However, results using this model were consistent with a univariate, continuous analysis of empathy scores that was not included in this study, indicating that this selected cutoff may have been appropriate. Further studies should be done that establish specific cutoffs among different populations of physicians to better allow determination of high and low empathy scores.

Concluding Remarks:

When analysis is adjusted for gender and other GP characteristics, GPs who have outside employment and value interpersonal relationships tend to have higher empathy scores, while GPs in the middle age category tend to have lower empathy scores. The groundwork laid by quantifying and qualifying variation in physician empathy can help in the development of targeted interventions that may improve empathy in subsections of the GP population.


References

1. Hojat M, Gonnella JS, Nasca TJ, Mangione S, Vergare M, Magee M. Physician empathy: Definition, measurement, and relationship to gender and specialty. The American journal of psychiatry 2002;159.
2. Caspi A, Silva PA. Temperamental Qualities at Age Three Predict Personality Traits in Young Adulthood: Longitudinal Evidence from a Birth Cohort. Child Development 1995;66:486-98.
3. McDonald NM, Messinger DS. The development of empathy: how, when, and why. In: Acerbi A Ll, Sanguinetti JJ, ed. Free will, Emotions, and Moral Actions: Philosophy and Neuroscience in Dialogue. Vatican City: IF-Press; 2011.
4. Roberts BW, DelVecchio WF. The rank-order consistency of personality traits from childhood to old age: a quantitative review of longitudinal studies. Psychol Bull 2000;126:3-25.
5. Derksen F, Bensing J, Lagro-Janssen A. Effectiveness of empathy in general practice: a systematic review. The British journal of general practice : the journal of the Royal College of General Practitioners 2013;63:e76-84.
6. Mercer SW, Higgins M, Bikker AM, et al. General Practitioners' Empathy and Health Outcomes: A Prospective Observational Study of Consultations in Areas of High and Low Deprivation. Annals of family medicine 2016;14:117-24.
7. Thirioux B, Birault F, Jaafari N. Empathy Is a Protective Factor of Burnout in Physicians: New Neuro-Phenomenological Hypotheses Regarding Empathy and Sympathy in Care Relationship. Frontiers in Psychology 2016;7:763.
8. Derksen F, Bensing J, Kuiper S, van Meerendonk M, Lagro-Janssen A. Empathy: what does it mean for GPs? A qualitative study. Family practice 2015;32:94-100.
9. Derksen F, Olde Hartman TC, van Dijk A, Plouvier A, Bensing J, Lagro-Janssen A. Consequences of the presence and absence of empathy during consultations in primary care: A focus group study with patients. Patient education and counseling 2017;100:987-93.
10. Derksen F, Olde Hartman T, Bensing J, Lagro-Janssen A. Empathy in general practice-the gap between wishes and reality: comparing the views of patients and physicians. Family practice 2017.
11. Gleichgerrcht E, Decety J. Empathy in Clinical Practice: How Individual Dispositions, Gender, and Experience Moderate Empathic Concern, Burnout, and Emotional Distress in Physicians. PLoS ONE 2013;8:e61526.
12. Heje HN, Olesen F, Vedsted P. [Patients' assessment of general practitioners. Association with type of practice]. Ugeskrift for laeger 2010;172:1119-26.
13. Sara HK, Edward HOB, Courtney H. Changes in Dispositional Empathy in American College Students Over Time: A Meta-Analysis. Personality and Social Psychology Review 2010;15:180-98.
14. Twenge JM, Campbell SM. Generational differences in psychological traits and their impact on the workplace. Journal of Managerial Psychology 2008;23:862-77.
15. Hojat M, Vergare MJ, Maxwell K, et al. The devil is in the third year: a longitudinal study of erosion of empathy in medical school. Academic Medicine 2009;84.
16. Kelm Z, Womer J, Walter JK, Feudtner C. Interventions to cultivate physician empathy: a systematic review. BMC medical education 2014;14:219.
17. Hojat M, Axelrod D, Spandorfer J, Mangione S. Enhancing and sustaining empathy in medical students. Medical teacher 2013;35:996-1001.
18. Kataoka HU, Koide N, Hojat M, Gonnella JS. Measurement and correlates of empathy among female Japanese physicians. BMC medical education 2012;12:48.
19. Hojat M, Mangione S, Nasca TJ, et al. The Jefferson scale of physician empathy: Development and Preliminary psychometric data. Educ Psychol Meas 2001;61.
20. Hojat M, Gonnella JS, Mangione S, et al. Empathy in medical students as related to academic performance, clinical competence and gender. Medical education 2002;36.
21. Andersen CM. The association between attachment and delay in the diagnosis of cancer in primary care. 2015.
22. Di Lillo M, Cicchetti A, Lo Scalzo A, Taroni F, Hojat M. The Jefferson Scale of Physician Empathy: preliminary psychometrics and group comparisons in Italian physicians. Academic medicine : journal of the Association of American Medical Colleges 2009;84:1198-202.
23. Suh DH, Hong JS, Lee DH, Gonnella JS, Hojat M. The Jefferson Scale of Physician Empathy: A preliminary psychometric study and group comparisons in Korean physicians. Medical teacher 2012;34:e464-e8.
24. Chaitoff A, Sun B, Windover A, et al. Associations Between Physician Empathy, Physician Characteristics, and Standardized Measures of Patient Experience. Academic medicine : journal of the Association of American Medical Colleges 2017;92:1464-71.
25. Borracci RA, Doval HC, Nunez C, Samarelli M, Tamini S, Tanus E. Measurement of empathy among Argentine cardiologists: Psychometrics and differences by age, gender, and subspecialty. Cardiol J 2015;22:52-6.
26. Hall JA, Dornan MC. Meta-analysis of satisfaction with medical care: description of research domain and analysis of overall satisfaction levels. Social science & medicine (1982) 1988;27:637-44.
27. Shariat SV, Eshtad E, Ansari S. Empathy and its correlates in Iranian physicians: A preliminary psychometric study of the Jefferson Scale of Physician Empathy. Medical teacher 2010;32:e417-e21.
28. Noyes R, Kukoyi OA, Longley SL, Langbehn DR, Stuart SP. Effects of continuity of care and patient dispositional factors on the physician-patient relationship. Annals of clinical psychiatry : official journal of the American Academy of Clinical Psychiatrists 2011;23:180-5.

173

Page 183: Symposium i anvendt statistik 2018

29. Gray DP, Evans P, Sweeney K, et al. Towards a theory of continuity of care. Journal ofthe Royal Society of Medicine 2003;96:160-6. 30. Yuguero 0, Ramon Marsal J, Esquerda M, Vivanco L, Soler-Gonzalez J. Association between low cmpathy and high burnout among primary care physicians and nurses in Lleida, Spain. The European journal of general practice 2017;23 :4-10. 31. Lclorain S, Sultan S, Zenasni F, et al. Empathic concern and professional characteristics associated with clinical empathy in French general practitioners. European Journal of General Practice 20!3 ;19:23-8. 32. Soncini F, Silvestrini G, Poscia A, et al. Public Health Physicians and Empathy. Are we really empathic? The Jefferson Scale applied to Italian resident doctors in Public HealthFrancesco Soncin i. European Journal of Public Health 2013;23 :cktl 24.068-cktl 24.068. 33. Fortney L, Luchterhand C, Zakletskaia L, Zgierska A, Rakel D. Abbreviated mindfulncss intervention for job satisfaction, quality of life, and compassion in primary care cl in icians: a pilot study. Annals of family medicine 2013; 11 :412-20. 34. Walker R, Norbeck T, Price G, Libby R, Jones P. 2016 Survey of Amcrica's Physicians: Practice Patterns & Perspectives: The Physicians Foundation; 2016. 35. Cummings SM, Savitz LA, Konrad TR. Reported response rates to mailed physician questionnaires. Health services research 2001;35:1347-55. 36. Lin H-C, Xirasagar S, LaditkaJN. Patient perceptions of service quality in group versus solo practice clinics. International Journal for Quality in Health Care 2004;16:43745 . 37. Operationalisering af landdistriktsbegrebet. 2011. at http://static.sdu.dk/mediafilesÆ/B/Fl"/o7BEBF22FI 3-I 7ED41D9-AF4 D-1 OOBB39 A9DCE% 7DC LFReoort9landdistriktsbegrebet.odf.) 38. Krasner MS, Epstein RM, Beckman H, et al. Association ofan educational program in mindful communication with bumout, empathy, and attitudes arnong primary care physicians. Jama 2009;302:1284-93. 39. Ster MP, Selic P. 
Intended Career Choice in Family Medicine in Slovenia: An Jssue ofGender, Family Background or Empathic Attitudes in Final Year Medical Students? Materia socio-medica 2017;29:143-8. 40. Wilndrich M, Schwartz C, Feige B, Lemper D, Nissen C, Voderholzer U. Empathytrain ing in medical students - a randomized controlled trial. Medical teacher 2017;39:1096-8. 41. Asuero AM, Queralto JM, Pujol-Ribera E, Berenguera A, Rodriguez-Blanco T, Epstein RM. Effectiveness ofa mindfulness education program in primary health care professionals: a pragmatic controlled trial. The Journal of continuing education in the health professions 2014;34:4-12. 42. Shanafelt TD, Hasan 0, Dyrbye LN, et al. Changes in Burnout and Satisfaction With Work-Life Balance in Physicians and the General US Working Population Betwccn 2011 and 2014. Mayo Clinic proceedings 2015;90:1600-13. 43. Mercer SW, Fung CSC, Chan FWK, Wong FYY, Wong SYS, Mmphy D. The Chinese-version ofthe CARE Measure reliably differentiales between doctors in primary care: a cross-sectional study in Hong Kong. BMC Family Practice 20 11 ;12:43-. 44. Cataldo KP, Peeden K, Geesey ME, Dickerson L. Association between Balint training and physician empathy and work satisfaction. Fam Med 2005;37:328-31. 45. Wood W, Eagly AH. A cross-cultural analysis ofthe behavior of women and men: implications for the origins of sex differences. Psychol Bull 2002;128:699-727. 46. Zeldow PB, Daugherty SR. The stability and attitudinal correlates ofwarmth and caring in medical students. Medical education 1987;21:353-7. 47. Kje ldmand D, Holmstram I, Rosenqvist U. Balint training makes GPs thrive betler in their job. Patient education and counseling 2004;55 :230-5. 48. Pedersen KM, Andersen JS, Søndergaard J. General Practice and Primary Health Care in Denmark The Journal ofthe American Board of Farnily Medicine 2012;25 :S34-S8. 49. Pedersen AF, Carlsen AH, Vedsted P. 
Association of GPs' risk attitudes, level of empathy, and bumout status with PSA te sting in primary care. The British journal of general practice : the journal of the Royal College of General Practitioners 2015;65:e845-51. 50. Chen DCR, Pahilan ME, Orlander JD. Comparing a Self-Administcred Measure ofEmpathy with Observed Behavior Among Medical Students. Journal af General lnternal Medicine 2010;25 :200-2. 51. Glaser KM, Markham FW, Adler HM, McManus PR, Hojat M. Relationships between scores on the Jefferson Scale of physician empathy, patient perceptions ofphysician empathy, and humanistic approaches to patient care: a validity study. Medical science monitor: international medical journal of experirnental and clinical research 2007;13 :Cr2914. 52. Hojat M, Zuckerman M, Magee M, et al. Empathy in medical students as related to specialty interes~ personality, and perceptions ofmother and father. Personality and lndividual Differences 2005;39:1205-15. 53. Pedersen R. Empirical research on empathy in medicine-A critical review. Patient education and counseling 2009;76:307-22. 54. Hojat M, Spandorfer J, Louis DZ, Gonnella JS. Empathic and Syrnpathetic Orientations Toward Patient Care: Conceptualization, Measuremen~ and Psychometrics. Academic Medicine 2011;86:989-95.

174

Page 184: Symposium i anvendt statistik 2018

Did more data make us wiser? And if so, about what?

Helle M Sommer, Julie Krogsdahl and Mai Britt F Nielsen

1. Introduction

For about 40 years, SEGES Svineproduktion has carried out a large number of trials concerning the live pig and disseminated the resulting knowledge to help farmers use the best and most future-proof methods. SEGES Svineproduktion works with everything that concerns the live pig: from sperm cells, pen design and disease, through feed, to the emission of environmentally harmful substances.

A classic feeding trial with finishing pigs is carried out by allocating the pigs to pens at random. Each treatment group comprises several pens, and the treatment groups are started either at the same time or staggered by one week. The dates of entry and exit are recorded, as are the kilograms of feed allocated per pen and the pigs' weight gain in the pens. Weight gain is obtained by weighing all animals in a pen together on one scale, both at the start and at the end. In addition, it is recorded when pigs fall ill and are removed from the pen.

The data used for the present statistical analyses did not come from a classic feeding trial, in the sense that the recordings were made through a feeding station that registered each individual pig's weight and feed intake rather than recording at pen level.

The aim of the analyses was to find out whether we learn more about the production parameter 'feed conversion ratio' by having recordings at individual-animal level rather than at pen level, and if so, what the extra knowledge can be used for. The feed conversion ratio is calculated as the amount of feed consumed relative to the weight gain of the same pigs.
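The definition above can be sketched in a few lines of Python. This is only an illustration: the function name and the example pig (30 kg at entry, 110 kg at exit, 200 kg of feed) are assumptions and are not taken from the trial data.

```python
def feed_conversion_ratio(feed_kg: float, start_weight_kg: float, end_weight_kg: float) -> float:
    """Return kg of feed consumed per kg of weight gain."""
    gain_kg = end_weight_kg - start_weight_kg
    if gain_kg <= 0:
        # The ratio is undefined (or meaningless) without a positive gain.
        raise ValueError("weight gain must be positive to form the ratio")
    return feed_kg / gain_kg


# Hypothetical pig: 200 kg of feed for 80 kg of gain.
example_fcr = feed_conversion_ratio(200.0, 30.0, 110.0)
print(example_fcr)
```

A lower value means that less feed was needed per kg of gain.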

2. Data and materials

The data used in these analyses were collected at the Grønhøj research station, which is owned by SEGES Svineproduktion. The data were part of a larger trial on alternative phase-feeding methods for finishing pigs. In phase feeding, several feed mixtures are used to accommodate the finishing pigs' differing requirements for, e.g., protein and amino acids in different weight intervals. The large trial was dimensioned to have 85 replicates per treatment. A smaller part of the trial was at individual-animal level, and it is this dataset that is used in the present analyses.

The data comprised 168 pigs distributed over 12 pens and 2 rounds, with 14 pigs in each pen. The trial started when the pigs were individually weighed into their pen, at a weight in the range 28-39 kg. The pigs in the first round were all weighed into the pens on 5 December 2016, and the pigs in the second round were weighed into the pens on 20 March 2017.

Each round comprised 6 pens: two were control pens without phase feeding (Treatment 1), two pens received feeding with three phase-feeding periods (Treatment 2), and the last two pens received feeding with five phase-feeding periods (Treatment 3).

Table 1 shows the number of pigs in each pen. Each round is divided into two groups by sex, and pigs within a round share the same history. As Table 1 shows, there were three pens of female pigs and three pens of barrows¹ in the first round, and the same applied in the second round. Unfortunately, circumstances meant that pen 6 was only used for Treatment 1.

Table 1: Number of pigs by treatment, round (R), sex and pen

Number of pigs placed in the pens, by sex (Female, Barrow) and pen (1 2 3 4 5 6)

Treatment 1: 14 (R2)  14 (R1)  14 (R1), 14 (R2)

Treatment 2: 14 (R2)  14 (R1)  14 (R1)  14 (R2)

Treatment 3: 14 (R1)  14 (R2)  14 (R2)  14 (R1)

Figure 1. Electronic feeding station that measures both the weight of the pig and the feed eaten.

1 Barrows are male pigs that have been castrated.

An electronic feeding station (NEDAP, Pig Performance Testing) without a rear gate was installed in each pen, see Figure 1. All pigs had been fitted with an ear tag bearing an individual number, so that the feeding station could identify each pig that entered it. Every time a pig entered the feeding station, the following were recorded: the animal's individual ear-tag number, the time of entry, the length of the stay in the station, the pig's weight and the weight of the feed eaten.

In addition to the pigs' weight and feed intake, it was also recorded when a pig was removed from the pen: which pig (ear-tag number) was removed, what it weighed, on which day it was removed and the reason for removal.

2.1. Description of the datasets used in the analyses

Because a few animals were removed from the trial along the way due to illness, these animals had no final weight but a weight at removal. The final dataset thus contained a start weight, a final weight or removal weight, and the feed intake for every pig, including the sick pigs. This dataset was therefore at individual-animal level and is referred to as the individual dataset with all animals.

Because the individual dataset tells us how much each pig has eaten, the sick pigs could be excluded from the analyses. In traditional trials, feed intake is recorded at pen level, so it is not possible to construct a dataset without the sick animals. This is possible when the data are at individual-animal level. Two further individual datasets were therefore constructed: one consisting solely of the healthy animals, all of which reached slaughter, and one consisting solely of the sick animals.

All three individual datasets (all animals, healthy animals and sick animals) were then aggregated to pen level. In practice, this meant computing a total start weight, a total final weight and a total feed intake for each pen. These datasets were thus at pen level and are referred to as the pen datasets.
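The aggregation step can be sketched as follows. The record layout, the pen labels and all numbers are hypothetical and serve only to show how the three pen-level datasets (all, healthy, sick) could be derived from individual records; the actual processing in the study was done in SAS and is not shown in the paper.

```python
from collections import defaultdict

# Hypothetical individual records:
# (pen, healthy?, start weight kg, final or removal weight kg, feed intake kg)
pigs = [
    ("pen1", True, 30.0, 110.0, 200.0),
    ("pen1", True, 32.0, 108.0, 195.0),
    ("pen1", False, 29.0, 35.0, 40.0),   # sick pig, weighed out early
    ("pen2", True, 31.0, 111.0, 210.0),
    ("pen2", True, 33.0, 109.0, 190.0),
]


def aggregate_to_pen(records):
    """Sum start weight, final weight and feed intake per pen."""
    totals = defaultdict(lambda: {"start": 0.0, "end": 0.0, "feed": 0.0})
    for pen, _healthy, start, end, feed in records:
        totals[pen]["start"] += start
        totals[pen]["end"] += end
        totals[pen]["feed"] += feed
    return dict(totals)


# One pen-level dataset per individual dataset.
all_pens = aggregate_to_pen(pigs)
healthy_pens = aggregate_to_pen([r for r in pigs if r[1]])
sick_pens = aggregate_to_pen([r for r in pigs if not r[1]])
```

Filtering before summing is exactly what is impossible when feed is only recorded per pen, which is the point made in the text.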

Normally, the feed intake of the healthy pigs cannot be separated from that of the sick pigs, because only the total feed intake of the whole pen is known. The analysis can then only be made for all pigs in the pen, i.e. with the total feed intake and the total weight gain, including the gain of the sick pigs up to the point where they are weighed out of the pen. In this case, where the pen data are generated from individual data, we have for the sake of the investigation created pen datasets for the sick pigs only and for the healthy pigs only.

3. Models

The data at pen level and at individual level were modelled with linear mixed models, in which Pen was nested within Sex and entered as a random effect, and Round entered as a fixed effect because it had only 2 levels.

In addition, Treatment, Sex, the pigs' Start weight and various two-way interactions were included to the extent that the models could accommodate them.

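For the pen datasets, the following model was analysed:

$$Y_{\text{pen}} = \alpha_{\text{pen}} + \text{Treatment} + \text{StartWeight}_{\text{pen}} + \text{Sex} + \text{Round} + \varepsilon_{\text{pen}}$$

where $\alpha$ is the intercept, $Y$ is the feed conversion ratio, and the residual term $\varepsilon$ is assumed to be normally distributed.

For the individual datasets, the following model was analysed:

$$Y_{\text{pig}} = \alpha_{\text{pig}} + \text{Treatment} + \text{StartWeight}_{\text{pig}} + \text{Sex} + \text{Round} + \text{Pen}(\text{Sex}) + \text{Treatment} \times \text{Sex} + \text{Sex} \times \text{StartWeight}_{\text{pig}} + \text{Treatment} \times \text{StartWeight}_{\text{pig}} + \varepsilon_{\text{pig}}$$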

The analyses were carried out in SAS Enterprise Guide using PROC MIXED, with the degrees of freedom adjusted via ddfm=Satterthwaite because the model contains several variance components.

For both the pen and the individual datasets, 3 analyses were carried out: 1) with only the healthy pigs in the dataset, 2) with only the sick pigs, and 3) with all pigs.

4. Results

The results are presented separately for the pen datasets and the individual datasets in the next two sections.

4.1. The pen dataset

Because there are only 12 observations at pen level, it makes no sense to estimate two-way interactions, as the model would then be overparameterised. Round and Treatment were not significant (p=0.50 and p=0.77, respectively) and were removed from the model. Pen start weight and Sex were retained in the model (Table 2) because they were significant in some of the subsequent models. An outlier analysis showed clearly that one pen in particular had a relatively large 'Restricted Likelihood Distance' and 'Cook's Distance'. This pen, which incidentally was also the pen with the highest number of sick pigs, was removed from the dataset.

Table 2: Tests of fixed effects. One 'outlier pen' has been removed from the dataset.

Effect            Num DF  Den DF  F Value  Pr > F
Pen start weight  1       4       0.00     0.9704
Sex               1       4       8.06     0.0469

Table 3: Estimates of fixed effects

Effect        Sex     Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept             2.6493    1.0413          4   2.54     0.0637
Start weight          0.000087  0.002195        4   0.04     0.9704
Sex           Female  -0.1148   0.04043         4   -2.84    0.0469
Sex           Barrow  0

Table 3 shows that the barrows had a higher feed conversion ratio than the females. In practice, a high feed conversion ratio means poor feed efficiency, because more feed is used per kg of weight gain.

Because we later wish to compare the feed conversion ratios based on the different datasets (healthy, sick, all, with or without outliers), and because the start weights differ between the datasets, estimates of the feed conversion ratio were computed for a female and a barrow, respectively, with a start weight of 30 kg, see Table 4. The corresponding estimates for the pen dataset from which the outlier pen has not been excluded are given in Table 5.

Table 4: Estimates of the feed conversion ratio. Pen dataset excluding one 'outlier pen'.

Label                                 Estimate  Standard Error  DF  t Value  Pr > |t|
Feed conversion ratio, 30 kg          2.6283    0.1100          4   23.89    <.0001
Feed conversion ratio, 30 kg, female  2.5709    0.1017          4   25.29    <.0001
Feed conversion ratio, 30 kg, barrow  2.6857    0.1212          4   22.16    <.0001

Table 5: Estimates of the feed conversion ratio. Pen dataset including one 'outlier pen'.

Label                                 Estimate  Standard Error  DF  t Value  Pr > |t|
Feed conversion ratio, 30 kg          2.4618    0.1193          5   20.63    <.0001
Feed conversion ratio, 30 kg, female  2.4326    0.1149          4   21.17    <.0001
Feed conversion ratio, 30 kg, barrow  2.4910    0.1277          4   19.51    <.0001

4.2. The individual dataset

Healthy pigs

The random effect Pen(Sex) was not significant (p=0.48), but because removing it changes the p-values of the remaining variables in the model only marginally, we left it in the model. Table 6 shows the results of the first run of the model with all the fixed effects.

Table 6: Tests of all fixed effects and interactions

Effect                  Num DF  Den DF  F Value  Pr > F
Treatment               2       138     1.31     0.2723
Start weight            1       144     16.23    <.0001
Round                   1       142     0.00     0.9672
Sex                     1       144     0.36     0.5482
Start weight*Sex        1       144     0.20     0.6526
Start weight*Treatment  2       140     1.45     0.2385
Sex*Treatment           2       12.7    0.21     0.8127

The non-significant variables were then removed from the model one at a time, with the variable having the largest p-value removed first unless it was part of an interaction. In the final model only Start weight and Sex were significant, see Table 7, and their estimated values are given in Table 8. Estimates of the feed conversion ratio are given in Table 9. Table 8 shows that the higher the Start weight, the higher the feed conversion ratio. The barrows had a higher feed conversion ratio than the females.
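The successive reduction described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the authors' SAS workflow: `backward_eliminate`, `can_remove` and `toy_fit` are hypothetical names, and the p-values returned by `toy_fit` are invented so that the loop has something to work on (a real analysis would refit the mixed model at every step).

```python
from typing import Callable, Dict, List


def can_remove(term: str, terms: List[str]) -> bool:
    """A term must stay while an interaction that contains it is still in the model."""
    parts = set(term.split("*"))
    return not any(t != term and parts < set(t.split("*")) for t in terms)


def backward_eliminate(
    terms: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Drop the removable term with the largest p-value until none exceeds alpha."""
    current = list(terms)
    while True:
        pvals = fit_pvalues(current)  # refit with the current set of terms
        candidates = [t for t in current if pvals[t] > alpha and can_remove(t, current)]
        if not candidates:
            return current
        current.remove(max(candidates, key=lambda t: pvals[t]))


def toy_fit(terms: List[str]) -> Dict[str, float]:
    """Stand-in for a model fit; all p-values here are invented for illustration."""
    invented = {
        "Treatment": 0.30, "StartWeight": 0.001, "Round": 0.90, "Sex": 0.60,
        "StartWeight*Sex": 0.70, "StartWeight*Treatment": 0.25, "Sex*Treatment": 0.80,
    }
    pvals = {t: invented[t] for t in terms}
    # Pretend Sex becomes significant once its interactions have been dropped.
    if "Sex" in pvals and can_remove("Sex", terms):
        pvals["Sex"] = 0.02
    return pvals


full_model = ["Treatment", "StartWeight", "Round", "Sex",
              "StartWeight*Sex", "StartWeight*Treatment", "Sex*Treatment"]
final_model = backward_eliminate(full_model, toy_fit)
print(final_model)
```

The `can_remove` check encodes the rule that a main effect is not dropped while it still takes part in an interaction.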

Table 7: Significant fixed effects in the final model

Effect        Num DF  Den DF  F Value  Pr > F
Start weight  1       152     18.19    <.0001
Sex           1       152     5.33     0.0223

Table 8: Estimates of fixed effects

Effect        Sex     Estimate  Standard Error  DF   t Value  Pr > |t|
Intercept             1.7116    0.2303          152  7.43     <.0001
Start weight          0.02887   0.006768        152  4.26     <.0001
Sex           Female  -0.07371  0.03193         152  -2.31    0.0223
Sex           Barrow  0

Table 9: Estimates of the feed conversion ratio for a pig with a start weight of 30 kg, and for females and barrows (dataset: healthy pigs)

Label                                 Estimate  Standard Error  DF   t Value  Pr > |t|
Feed conversion ratio, 30 kg          2.5407    0.02952         152  86.06    <.0001
Feed conversion ratio, 30 kg, female  2.5038    0.03282         152  76.28    <.0001
Feed conversion ratio, 30 kg, barrow  2.5775    0.03429         152  75.18    <.0001
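The estimates in Table 9 appear to follow directly from the coefficients in Table 8. The sketch below recomputes them; the coefficients are copied from Table 8, while the function name and the assumption that the overall estimate weights the two sexes equally are ours. Small differences in the fourth decimal are expected because the published coefficients are rounded.

```python
# Fixed-effect estimates as printed in Table 8 (healthy pigs, individual level).
INTERCEPT = 1.7116
SLOPE_START_WEIGHT = 0.02887  # change in feed conversion ratio per kg of start weight
SEX_EFFECT = {"female": -0.07371, "barrow": 0.0}  # barrow is the reference level


def predicted_fcr(start_weight_kg: float, sex: str) -> float:
    """Linear predictor of the final model: intercept + slope * start weight + sex effect."""
    return INTERCEPT + SLOPE_START_WEIGHT * start_weight_kg + SEX_EFFECT[sex]


barrow_30 = predicted_fcr(30.0, "barrow")
female_30 = predicted_fcr(30.0, "female")
# Assumption: the overall estimate gives equal weight to the two sexes.
overall_30 = (barrow_30 + female_30) / 2

print(round(barrow_30, 3), round(female_30, 3), round(overall_30, 3))
```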

Sick pigs

One observation stood out and had a very large influence on the overall analysis (the 'Restricted Likelihood Distance' was very large, 270,000). This pig had a weight gain of only 0.1 kg and had eaten approx. 30 kg of feed, giving a feed conversion ratio of 300. The pig was sick and was removed from the trial.

Because there were only 12 pigs in the dataset of sick pigs, only Treatment, Start weight and Sex were included in the model. None of the 3 variables was significant (the p-values for Treatment, Start weight and Sex were 0.96, 0.54 and 0.42, respectively), but Start weight and Sex were retained in the model for the sake of comparing feed conversion ratios.

Table 10: Estimates of the feed conversion ratio for a pig with a start weight of 30 kg, and for females and barrows (dataset: sick pigs excluding one outlier)

Label                                 Estimate  Standard Error  DF  t Value  Pr > |t|
Feed conversion ratio, 30 kg          2.7722    1.0386          9   2.67     0.0257
Feed conversion ratio, 30 kg, female  2.2373    1.0546          9   2.12     0.0629
Feed conversion ratio, 30 kg, barrow  3.3072    1.3648          9   2.42     0.0384

Table 10 shows that these estimates vary considerably, but the finding that females have a lower feed conversion ratio (based on the data from the healthy pigs) also appears to hold here for the sick pigs. If the outlier were retained, the variables would still not be significant, but the estimates would become considerably larger, both the parameter estimates and the standard errors (Table 11). The feed conversion ratio for females would become larger than for barrows. The difference is not significant, however (p=0.19), and is due solely to the outlier, which was a female with a very high feed conversion ratio (approx. 300).

Table 11: Estimates of the feed conversion ratio for a pig with a start weight of 30 kg, and for females and barrows (dataset: sick pigs including one outlier)

Label                                 Estimate  Standard Error  DF  t Value  Pr > |t|
Feed conversion ratio, 30 kg          22.7077   43.5199         10  0.52     0.6132
Feed conversion ratio, 30 kg, female  38.2697   43.1717         10  0.89     0.3962
Feed conversion ratio, 30 kg, barrow  7.1457    57.7705         10  0.12     0.9040

All pigs

The outlier with a feed conversion ratio of 300 (identified earlier) is removed from the dataset. As for the dataset with healthy pigs only, Pen(Sex) is not significant but is not removed from the model. Successive reduction of the model yields the final model, see Table 12, in which Sex is not significant but is retained for the sake of comparison with the other analyses. If Sex is nevertheless removed from the model, the p-value for Start weight becomes 0.026.

Table 12: Fixed effects in the final model

Effect        Num DF  Den DF  F Value  Pr > F
Start weight  1       164     4.56     0.0343
Sex           1       164     1.74     0.1888

Table 13: Estimates of the feed conversion ratio for a pig with a start weight of 30 kg, and for females and barrows (dataset: all pigs excluding one outlier).

Label                                 Estimate  Standard Error  DF   t Value  Pr > |t|
Feed conversion ratio, 30 kg          2.5374    0.07962         164  31.87    <.0001
Feed conversion ratio, 30 kg, female  2.4802    0.08803         164  28.17    <.0001
Feed conversion ratio, 30 kg, barrow  2.5947    0.09324         164  27.83    <.0001

Table 13 shows that in the dataset with all pigs the estimated feed conversion ratio is likewise lower for females than for barrows, as was also the case in the analyses of the healthy pigs and the sick pigs.

If the outlier were retained, none of the variables would be significant (Start weight p=0.88 and Sex p=0.35), and the estimates would become considerably larger: the parameter estimates, the standard errors and the estimated feed conversion ratios (Table 14).

Table 14: Estimates of the feed conversion ratio for a pig with a start weight of 30 kg, and for females and barrows (dataset: all pigs including one outlier).

Label                                 Estimate  Standard Error  DF   t Value  Pr > |t|
Feed conversion ratio, 30 kg          4.8222    3.2509          165  1.48     0.1399
Feed conversion ratio, 30 kg, female  6.4922    3.5863          165  1.81     0.0721
Feed conversion ratio, 30 kg, barrow  3.1523    3.8124          165  0.83     0.4095

5. Conclusion and discussion

The analyses of the dataset of healthy pigs at individual level gave the best results compared with the analyses at pen level, even after removal of the 'outlier pen'. The analysis yielded the most significant variables (Start weight and Sex), and Treatment had the lowest p-value (0.41) compared with the other analyses. Moreover, the standard errors were smallest, both for the parameter estimates and for the estimated feed conversion ratios. Analysis of the individual dataset was, however, very sensitive to outliers, which is why it is extremely important to check for them, and ideally the sick pigs should be excluded from the dataset.

Analyses of the pen dataset are, however, not affected to the same degree by whether outliers are excluded. This can be seen by comparing the estimates in Table 5 and Table 14. The explanation lies in how the response variable is calculated. At pen level, all the feed is summed and divided by the total difference between final weight and start weight, whereas at individual level the calculation uses only the pig's own feed intake divided by its own weight gain.

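$$\text{feed conversion ratio}_{\text{pen}} = \frac{f_1 + f_2 + f_3 + f_4 + \cdots}{tv_1 + tv_2 + tv_3 + tv_4 + \cdots}$$

where $f_i$ is the feed intake and $tv_i$ the weight gain of pig $i$ in the pen.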
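The difference in outlier sensitivity can be illustrated with a small calculation. In the sketch below, the first three pigs are hypothetical, while the fourth mirrors the sick pig described earlier (about 30 kg of feed for 0.1 kg of gain); the function names are ours. Taking the plain mean of the individual ratios is a simplification of the mixed-model analysis, used here only to show the mechanism.

```python
from statistics import mean

# (feed intake kg, weight gain kg) per pig. The first three pigs are hypothetical;
# the last one mirrors the sick outlier pig (about 30 kg feed, 0.1 kg gain).
pigs = [(200.0, 80.0), (195.0, 78.0), (210.0, 84.0), (30.0, 0.1)]
healthy = pigs[:3]


def pen_level_fcr(records):
    """Pen-level ratio: total feed divided by total gain."""
    return sum(feed for feed, _ in records) / sum(gain for _, gain in records)


def mean_individual_fcr(records):
    """Individual-level view: average of each pig's own feed/gain ratio."""
    return mean(feed / gain for feed, gain in records)


print("pen level, healthy only :", round(pen_level_fcr(healthy), 3))
print("pen level, with outlier :", round(pen_level_fcr(pigs), 3))
print("individual, healthy only:", round(mean_individual_fcr(healthy), 3))
print("individual, with outlier:", round(mean_individual_fcr(pigs), 3))
```

At pen level the outlier adds little feed and almost no gain to large totals, so the ratio moves only slightly; at individual level its own ratio of about 300 dominates the average.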

If one is only after the feed conversion ratio for a given pen, it may not be worth the effort to collect data at individual level. A Danish pig producer will primarily be interested in how many kg of feed it takes to produce weight gain. If a few animals do not grow optimally, that is simply a cost that comes with production. A dataset consisting of recordings at pen level will therefore make sense in most cases when the calculation is in kroner.

Data at individual-animal level, on the other hand, are of interest when one wishes to carry out studies that can improve the feed conversion ratio, because it is then crucial to be able to identify significant differences between treatment groups. If animals fall ill along the way and are removed from the pen, and the illness cannot be related to the treatment itself, it is important to be able to exclude these animals in order to obtain better and more precise results.

Did we become wiser?

Yes, we have gained insight into what happens with the sick animals and how they affect the estimates. We have also learned that having data at individual level, excluding the sick animals, makes the p-values smaller and the estimates more precise. Data at individual level can in future be used when dimensioning trials, which can then be carried out over a shorter time horizon because not quite as many replicates are needed.

We have also gained insight into the relatively large variation between animals. The individual data we have are in fact daily measurements, which give us even more insight (Figure 3).

[Plot: left axis, weight in kg (30 to 110); right axis, feed in kg (1.0 to 5.0); horizontal axis, day after placement in the pen (0 to 80).]

Figure 3: A single pig's weight (circles) and feed intake (crosses), day by day

How to predict the outcome of a football match

Sara Armandi

1 Introduction

The aim of this article is to put together a model that can predict the outcome of a football match. This is done using the new Rapid Predictive Modeler task, which utilizes the features of SAS Enterprise Miner but runs through familiar environments.

2 Rapid Predictive Modeler

The Rapid Predictive Modeler is an interface for carrying out predictive modeling tasks in a familiar environment. It is a tool for creating data mining models using either SAS Enterprise Guide, SAS Studio or the add-in in Microsoft Excel. Behind the scenes, a job is created in SAS Enterprise Miner. Hence, this product needs to be available. All models are made using the Rapid Predictive Modeler task, which can be accessed through the menu panes. Figure 1 shows how to access the Rapid Predictive Modeler in SAS Enterprise Guide.

Figure 1: Access the Rapid Predictive Modeler in SAS Enterprise Guide

[Screenshot: the Tasks menu of SAS Enterprise Guide expanded above the match dataset, whose columns include Hjemmehold (home team), Resultat (result) and Udehold (away team), along with attendance and referee.]

Source: Screenshot from Enterprise Guide session.

3 Data

The analysis is based on data from the Danish Superliga, the top football league in Denmark. The winner of this league is declared the best football team in Denmark.

On the homepage www.danskfodbold.com, statistics on every match played in the Superliga are available from the 1991/1992 season until today. Information about the tournament from the 2011/2012 season to the 2016/2017 season was scraped and forms the foundation of this analysis.

More detailed information about each game can be found on the homepage www.superliga.dk. However, these data are only available from the 2015/2016 season onwards. They include detailed information about each player as well as shooting statistics.

Based on the data, a win-lose chart can be created, which visualizes which team performs best in each season. Figure 2 shows a win-lose chart for the 2016/2017 season.
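The data behind a win-lose chart can be derived from match results along the following lines. This is a sketch in Python rather than the SAS code used for the article; the row layout (home team, result as "home-away" goals, away team) and the three example matches are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (home team, result as "home goals-away goals", away team),
# listed in chronological order.
matches = [
    ("FC København", "3-0", "Randers FC"),
    ("Randers FC", "1-1", "OB"),
    ("OB", "0-2", "FC København"),
]


def win_lose_sequences(rows):
    """Return, per team, the chronological sequence of W (win), D (draw), L (loss)."""
    sequences = defaultdict(list)
    for home, result, away in rows:
        home_goals, away_goals = (int(part) for part in result.split("-"))
        if home_goals > away_goals:
            home_mark, away_mark = "W", "L"
        elif home_goals < away_goals:
            home_mark, away_mark = "L", "W"
        else:
            home_mark, away_mark = "D", "D"
        sequences[home].append(home_mark)
        sequences[away].append(away_mark)
    return dict(sequences)


sequences = win_lose_sequences(matches)
# Number of wins per team, which is what the chart uses for ranking.
wins = {team: marks.count("W") for team, marks in sequences.items()}
print(sequences)
print(wins)
```

Each team's sequence corresponds to one row of bars in the chart, and the win counts give the ordering of the rows.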


Figure 2: Win-Lose Chart of the Danish Superliga 2016/2017

[Chart: one row per team, showing the team's wins and losses match by match along a monthly time axis. The ranking and number of wins shown in the chart are listed below.]

Rank  Team              Wins
1     FC København      25
2     Brøndby IF        18
3     Lyngby BK         17
4     FC Midtjylland    15
5     Randers FC        14
6     FC Nordsjælland   13
7     SønderjyskE       12
8     AGF               12
9     AC Horsens        11
10    OB                11
11    Silkeborg IF      10
12    AaB               10
13    Viborg FF         9
14    Esbjerg fB        6

Notes: Inspiration on how to create this chart is based on the example made by Robert Allison (http://robslink.com/SAS/democd32/soccer_england_info.htm)

Source: https://www.superliga.dk

To enable further analyses, data about the Danish weather and data on different betting odds are added to the final dataset. The weather data are fetched from the homepage www.wunderground.com, and the odds from www.football-data.co.uk/denmark.php.

In-vitro fertilization and in-vitro embryo culture in mouse as a reprotoxicity model for xenobiotics: some considerations on how to analyse data and present results

Leslie Foldager1,2*, Ying Liu1, Hanne Skovsgaard Pedersen1, Knud Larsen3, Kaja Kjær Kristensen3, Henrik Callesen1, Martin Tang Sørensen1

1 Department of Animal Science, Aarhus University, DK-8830 Tjele, Denmark
2 Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
3 Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark

* Email: [email protected]

Abstract

Natural mating in rodents with litter size as end-point is often used for assessment of pesticide reprotoxicity prior to approval. However, a test system including in-vitro fertilization (IVF), using a reduced sperm concentration to achieve less than maximum fertilization, followed by in-vitro embryo culture (IVC) reveals the details of fertilization and early embryonic development. Such a system could reduce the risk of false negative results, i.e. approval of xenobiotics with reprotoxic side effects. Thus, we aimed to establish a mouse IVF/IVC system to evaluate reprotoxicity of xenobiotics, illustrated by two pesticides: vinclozolin and chlormequat. During the statistical analysis of this experiment, we have had many discussions on how to analyse data and present results, not least for outcomes presented as proportions. These considerations will be the focus of the symposium talk, whereas this paper presents the study and results briefly.

Introduction

Xenobiotics, such as chemicals and pesticides, may have adverse effects on reproduction in humans and animals. Thus, assessments of potential toxic effects on e.g. human health, including toxicity to reproduction and development, are required. Typically, these use animal test models with suitable species like rat, mouse and rabbit. Nevertheless, it is important to understand that mouse litters of normal size may be obtained even if sperm count is reduced by half (Al-Hamdani & Yajurvedi, 2010). This may potentially invalidate litter size from natural mating as an endpoint for approval of xenobiotics, as reprotoxic effects might be blurred. In addition, humans have a much lower sperm production (Amann, 1986), which implies that reduced semen quality and production would be much more serious in humans than in animals.

Therefore, application of sensitive assessment tests is recommended to avoid approval of xenobiotics with reprotoxic side effects. Details of fertilization and early embryonic development are key for this and may be obtained by use of a complete in-vitro embryo production (IVP) system, including in-vitro maturation (IVM), fertilization (IVF) and embryo culture (IVC). Since overload of sperm may mask some defects, using a lower sperm concentration may improve the sensitivity for detection of reproductive toxicity when applying IVF (Fielden et al., 2002). As a precursor in the present study, we therefore performed a systematic calibration of sperm concentration in an IVF/IVC system, using an outbred mouse strain. Using a breakpoint analysis, we determined the sperm concentration that furnishes a sensitive assessment of sperm fertilizing capacity in relation to reprotoxic evaluations.

The main aim of our study was to compare and evaluate two end-points in mouse (litter size and IVF/IVC) for assessment of reprotoxicity of xenobiotics. To illustrate this, we considered two pesticides: vinclozolin as a positive control, and chlormequat as a xenobiotic under suspicion of reprotoxicity.

Vinclozolin is a dicarboximide fungicide that has been banned for pesticide use in the European Union (EU) and USA because of documented toxic effects on reproduction. Studies using animal models have indicated that exposure to vinclozolin may result in testis dysfunction or ovarian malfunction because of alterations of DNA methylation (Anway et al., 2005; Hou et al., 2012). Moreover, other studies also showed compromised fertility after vinclozolin exposure (Guerrero-Bosagna et al., 2012; Eustache et al., 2009). Nevertheless, another study found no effect of maternal vinclozolin exposure on spermatogenesis, DNA methylation or fertility in first generation males (Inawaka et al., 2009). We therefore used vinclozolin as positive control.

Chlormequat (chlorocholine chloride, CCC) is a growth regulator used for plant production, and exposure to high levels may affect reproduction in mammals (Sorensen & Danielsen, 2006). Moreover, studies have shown reduced fertilization and cleavage rates of oocytes incubated with spermatozoa from mice fed with diets mixed with CCC (Tomer et al., 1999), whereas no effect was observed in treated females (Langhammer et al., 1999). Using CCC intake below levels considered safe for humans, no effect on pig semen quality was found in a study using IVF and artificial insemination (Sorensen et al. 2009a). In another study, however, a decreased fraction of live sperm was found with increasing chlormequat residue in CCC-treated wheat (Sorensen et al. 2009b). Therefore, chlormequat remains under suspicion for detrimental effects on semen quality.

Manuscripts presenting the study have been submitted or are currently in preparation for submission, and we therefore exclude many details on lab methods and some results from the present paper. During the statistical analysis of this experiment, we have had many discussions on how to analyse data and present results - not least for outcomes presented as proportions. These considerations will be the focus of the symposium talk.


Materials and Lab Methods

All chemicals were purchased from Sigma-Aldrich Co. (St. Louis, MO, USA), unless otherwise stated. Before experimental start, a license was obtained from the Danish Animal Experiments Inspectorate (license no. 2014-15-0201-00420).

Mice

Naval Medical Research Institute (NMRI) male mice were used for both parts of the study. NMRI is an outbred mouse strain and is therefore expected to show larger between-individual variation than inbred strains. Though outbred strains are rather uniform compared to the variability within species, this may be more representative of the diversity of the human population. NMRI mice have been widely used in toxicology studies, for product safety testing and for reprotoxicity studies (Kallio et al. 1986, Nishino et al. 1991, Tomer et al. 1999, Hafizi et al. 2014). NMRI females were also used for the natural mating parts of the reprotoxicity study.

For sperm calibration and for the IVF/IVC part of the reprotoxicity study we used females from the inbred C57BL/6J mouse strain, also known as black 6 as it has a dark brown, almost black coat. In our lab, females from this strain have been found to have a high response to superovulation and a high rate of morphologically good oocytes (unpublished data), and in addition this strain is known to have a high reproductive performance compared to other inbred strains (Silver, 1995). Moreover, we chose an inbred strain to minimise variation from the female side since the main interest was on male fertility.

NMRI males (8-10 weeks old), NMRI females (6-8 weeks old) and C57BL/6J females (3 weeks old) were purchased from Taconic Biosciences (Silkeborg, Denmark). All mice were housed in our standard laboratory animal facility under controlled light conditions (12 h light: 12 h dark). For the sperm calibration, mice were housed for 1-5 weeks prior to being sacrificed for the experiment. For the reprotoxicity study, mice were housed for up to 29 weeks before being sacrificed.

Sperm Calibration Study

C57BL/6J donor females (3.5-4 weeks old) were superovulated and approximately 63 h later sacrificed by cervical dislocation. Thereafter, oviducts were removed to collect cumulus-oocyte complexes (COCs). Donor males (9-15 weeks old) were sacrificed within 1 h before the females, and sperm from caudae epididymides were collected. Nine different sperm concentrations were used for IVF (1e4, 2.5e4, 4e4, 5e4, 1e5, 5e5, 1e6, 1.5e6 and 2e6 sperm/ml), with 3-4 of these used for each male.

Sperm was added to fertilization drops immediately after collection of COCs. COCs were kept separately from each female donor, and 1-2 females were used per sperm concentration per donor male. After a further 4-5 h of lab procedures, oocytes were evaluated and only morphologically good oocytes were selected for further culture. The morphologically bad oocytes were counted and discarded. During the next 96 h of lab work, pronuclei were evaluated after 5 h, two-cells after 24 h, and blastocysts after 96 h. This period is referred to as IVC start.

The number of morphologically good oocytes with visible pronuclei was measured to determine the proportion of morphologically good oocytes having been penetrated by sperm, while oocytes developed to the two-cell stage were measured to indicate the proportion of zygotes capable of cleaving. The count of two-cell oocytes was used in the statistical analysis determining the sperm fertilizing capacity. Finally, blastocysts were counted as the proportion developing from the two-cell stage to indicate whether the IVC system works well. The IVF/IVC protocol was modified from a previous report (Byers et al. 2006). In the present paper, we only show results using two-cell counts in morphologically good oocytes.

Reprotoxicity Study

The two pesticides were mixed into diets to obtain concentrations of 40 ppm (VLow) and 300 ppm (VHigh) for vinclozolin, and 900 ppm (CLow) and 2700 ppm (CHigh) for CCC. These concentrations were equivalent to the no or lowest observed adverse effect level (NOAEL or LOAEL) after natural mating in rats. In addition, a control feed free of pesticides was mixed (Ctrl). Daily feed intake was not measured but the mice had free access to feed. Figure 1 gives an outline of the experimental design and mating strategy.

[Figure 1: diagram of the feeding and mating scheme, with natural mating (litter size and birth weight of pups) and in vitro fertilization and in vitro embryo culture (sperm recovered from experimental males, mature oocytes recovered from superovulated C57BL/6J females).]

Figure 1. Experimental design of the reprotoxicity study.

We produced experimental males (F1) by naturally mating 7-11 weeks old NMRI females (F0) with 8-20 weeks old NMRI males. During the mating period (1-5 days), F0 females (and mating males) were fed control feed. When pregnancy was established (observation of vaginal plug), the mated female was fed one of the five diets throughout pregnancy and until weaning of the offspring. This way 1-4 F1 males were obtained from each litter that had been exposed to low dose (VLow or CLow), high dose (VHigh or CHigh), or no pesticide (Ctrl) from conception until weaning. After weaning, the experimental males continued with the same pesticide or control feed.

Each experimental F1 male (8-10 weeks old) was naturally mated with 1-2 unexposed NMRI females. During the mating period (1-5 days), the mice were fed control feed. After the mating period, F1 males returned to experimental diets for at least 35 days prior to IVF. Naturally mated females were fed with control feed until parturition to evaluate litter size and birth weight of pups (F2).

The IVF/IVC part was carried out with C57BL/6J females (3.5-4 weeks old) fed with control feed and using the same procedures as described for the sperm calibration study above. The experimental F1 male mice were 16-29 weeks old when sacrificed.

After capacitation, sperm from one side of epididymis was used for IVF and for evaluation of motility and viability, while sperm from the other side was used for investigation of DNA methylation. Moreover, daily sperm production was determined after both testes from each male had been dissected and weighed. To save space, we omitted these results from the present paper.

Statistical Methods

Statistical analyses were carried out using R version 3.3.1 (R Core Team, 2016). A significance level of 5% was chosen, and 95% confidence intervals were obtained by the profile likelihood method unless otherwise stated.

Sperm Calibration Study

To determine an optimal sperm concentration to be used for the reprotoxicity study, a breakpoint analysis was carried out with segmented negative binomial regressions using the R package segmented (Muggeo, 2003; Muggeo, 2008). We used the log-link version of the negative binomial generalised linear model (referred to as NB2 by Hilbe, 2011) with two-cell counts as response, log of morphologically good oocytes as offset, and adjusted for the count of morphologically bad oocytes. Recently we discovered threshold regression models that may account for uncertainty of the threshold estimate (Fong et al., 2017), a linear mixed model version of segmented (Muggeo et al., 2014; Muggeo, 2016), and Bayesian hierarchical piecewise regression models (Buscot et al., 2017). Nevertheless, for the breakpoint analysis used in the present study we disregarded the possible correlation between observations from the same male donor: 1-2 counts (females) for each of 3-4 sperm concentrations.
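As a rough illustration of what a breakpoint analysis estimates, the following Python sketch runs a grid search for the breakpoint of a broken-stick curve using ordinary least squares. It is not the segmented negative binomial fit used in the study: the response values, the plateau after the breakpoint, the function names and the candidate grid are all assumptions made for illustration only.

```python
from math import log10

# The nine sperm concentrations (sperm/ml) used for IVF, on the log10 scale.
concentrations = [1e4, 2.5e4, 4e4, 5e4, 1e5, 5e5, 1e6, 1.5e6, 2e6]
x = [log10(c) for c in concentrations]

# Hypothetical noise-free log-proportions (NOT the study data): they rise
# until log10(conc) = 4.5 and stay flat afterwards.
TRUE_PSI = 4.5
y = [-2.0 + 0.3 * min(v, TRUE_PSI) for v in x]


def fit_line(z, y):
    """Least-squares line y = a + b*z; returns (a, b, residual sum of squares)."""
    n = len(z)
    mean_z = sum(z) / n
    mean_y = sum(y) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    b = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y)) / szz
    a = mean_y - b * mean_z
    rss = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, y))
    return a, b, rss


def grid_breakpoint(x, y, candidates):
    """Try each candidate breakpoint psi and keep the one with the smallest RSS.

    For a fixed psi the model y = a + b*min(x, psi) is linear in (a, b).
    """
    best = None
    for psi in candidates:
        z = [min(v, psi) for v in x]
        a, b, rss = fit_line(z, y)
        if best is None or rss < best[0]:
            best = (rss, psi, a, b)
    return best


candidates = [4.1 + 0.05 * k for k in range(40)]  # hypothetical grid
rss_hat, psi_hat, a_hat, b_hat = grid_breakpoint(x, y, candidates)
print(round(psi_hat, 2), round(b_hat, 2))
```

A grid search is the simplest way to see the idea; the segmented package instead estimates the breakpoint iteratively and, in the study, within a negative binomial likelihood with an offset.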

The estimated breakpoint was then used to define a piecewise log-linear negative binomial generalized linear mixed model including log10 sperm concentration and number of morphologically bad oocytes as fixed effects, log number of morphologically good oocytes as offset, and Male-id as random effect to account for repeated use of the same male mouse. The slope for log10-concentration was specifically allowed to change at the estimated breakpoint. This model was used to determine a prediction curve for the two-cell proportion of morphologically good oocytes with 95% confidence bands over the whole range of sperm concentrations. For this prediction, we used the mean number of bad oocytes from the entire study.
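The structure of such a piecewise log-linear predictor can be sketched in Python as below. Only the breakpoint (3.6e4 sperm/ml) and the 51% level are taken from the Results; the slopes `b1` and `b2`, the intercept `b0` and the function name are hypothetical values chosen for illustration, and the random effect of Male-id is left out.

```python
from math import exp, log, log10


def two_cell_proportion(log10_conc, psi, b0, b1, b2):
    """Predicted two-cell proportion from a piecewise log-linear predictor.

    The offset log(N_good) cancels when a proportion is predicted, and the
    bad-oocyte covariate is assumed fixed at its mean and absorbed into b0.
    """
    hinge = max(0.0, log10_conc - psi)  # zero below the breakpoint
    return exp(b0 + b1 * log10_conc + b2 * hinge)


psi = log10(3.6e4)         # reported breakpoint on the log10 scale
b1 = 0.8                   # hypothetical slope below the breakpoint
b2 = -0.8                  # hypothetical slope change (flat above the breakpoint)
b0 = log(0.51) - b1 * psi  # chosen so the curve reaches 51% at the breakpoint

at_break = two_cell_proportion(psi, psi, b0, b1, b2)
below = two_cell_proportion(log10(2.5e4), psi, b0, b1, b2)
above = two_cell_proportion(6.0, psi, b0, b1, b2)
print(round(below, 3), round(at_break, 3), round(above, 3))
```

The hinge term `max(0, x - psi)` is what lets the slope change at the breakpoint while keeping the curve continuous there.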

Reprotoxicity Study

For the F1 generation (experimental males), litter size was investigated by a quasi-poisson regression accounting for a clearly significant under-dispersion (dispersion estimate of 0.34) as identified by the dispersion test (z=-10.0, p<1e-15) with the dispersiontest function from the AER package (Kleiber & Zeileis, 2008). Back-transformed mean estimates and 95% confidence intervals of litter size for each feed (treatment) group were obtained by including treatment group (Ctrl, CLow, CHigh, VLow, VHigh) as a factor. The overall treatment effect (difference in litter size between treatment groups) was tested against the null model by an F-test on (4, 53) degrees of freedom. It may be noted that linear normal models also fitted reasonably well: not only were the coefficients exactly equal to the back-transformed coefficients from quasi-poisson (and poisson) regression, but the confidence intervals were also almost identical to those from quasi-poisson regression (whereas those from poisson regression were a bit wider).
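A minimal Python sketch of the quantities involved is given here, under the assumption of a Poisson model with treatment group as the only factor, for which the fitted value of each group is simply its mean. The litter sizes are hypothetical, not the study data, and the intervals are Wald intervals on the log scale rather than the profile likelihood intervals used in the paper.

```python
from math import exp, sqrt


def quasi_poisson_group_means(groups, z=1.96):
    """Group means, Pearson dispersion estimate and Wald 95% intervals.

    groups maps a group name to a list of counts. With a single factor the
    Poisson fitted value for every observation is its group mean, so no
    iterative fitting is needed.
    """
    n_obs = sum(len(ys) for ys in groups.values())
    n_par = len(groups)
    means = {name: sum(ys) / len(ys) for name, ys in groups.items()}
    pearson = sum(
        (y - means[name]) ** 2 / means[name]
        for name, ys in groups.items()
        for y in ys
    )
    phi = pearson / (n_obs - n_par)  # below 1 indicates under-dispersion
    intervals = {}
    for name, ys in groups.items():
        se_log = sqrt(phi / sum(ys))  # SE of log(mean), scaled by the dispersion
        intervals[name] = (means[name] * exp(-z * se_log),
                           means[name] * exp(z * se_log))
    return means, phi, intervals


litters = {  # hypothetical litter sizes, for illustration only
    "Ctrl": [13, 14, 13, 12, 15, 14],
    "VHigh": [13, 12, 14, 13, 15, 13],
}
means, phi, intervals = quasi_poisson_group_means(litters)
print(means, round(phi, 3), intervals)
```

Because the back-transformed Poisson coefficients are the group means, a normal linear model with the same factor gives the same point estimates, which is the equality noted in the text.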

A normal linear mixed effects model with a random effect of litter (aka Mother-id, i.e. F0 females) was fitted to examine birth weight of F1 males using the lmer function from the lme4 package (Bates et al., 2015). The Gaussian assumption for residuals and for the litter random effect was checked with normal qq-plots and found to be acceptable. Restricted maximum likelihood (REML) was used for estimation of means and confidence intervals, whereas maximum likelihood (ML) estimation was used for testing the overall treatment effect by a chi-squared (χ²) likelihood ratio test on four degrees of freedom.

For analyses of the F2 generation, mean litter size was estimated by a normal linear mixed effects model with treatment group as fixed effect and Mother-id (F0 females) as random effect. Male-id (F1 males) was not included as random effect, because this effect was tiny and induced a non-positive definite approximate variance-covariance matrix. Use of the normal distribution obviously is an approximation as litter size is an integer. As for the F1 generation, litter size under-dispersion was observed but unfortunately, quasi-poisson models including random effects seem to be unavailable. We compared results from quasi-poisson regression with normal regression without the random effect of Mother-id and found that: 1) the (back-transformed) mean estimates were identical and 2) the 95% confidence intervals had a maximum absolute difference of 0.03, corresponding to an approximately 0.25% relative difference. Moreover, control plots of residuals were acceptable. Mean birth weight was estimated by a normal linear mixed effects model including treatment group as fixed effect and Mother-id and Litter-id (aka Female-id within Male-id) nested in Mother-id as random effects. Here Female-id identifies the female mouse used for natural mating with F1 males. The random effect of Male-id within Mother-id was very small and therefore left out. Both for litter size and birth weight, REML was used for estimates and confidence intervals while ML was used for testing the overall treatment effect by a χ²-test.

The pregnancy success for females naturally mated with F1 males was examined by a binomial generalised linear model. The overall difference between treatment groups was examined with a χ²-test. Inclusion of a correlation structure due to the same mother (F0 female) for F1 males (2-4 mates) and/or the same F1 male (2 mates) was not possible - probably because of a low degree of variation.

IVF and embryo development using sperm from F1 males was studied by use of poisson (pronuclei and two-cell per morphologically good oocytes, and blastocyst per two-cells) or negative binomial (blastocyst per morphologically good oocytes) generalized linear mixed effects models with log link function and random effects of Mother-id (F0 female) and Male-id (F1 male) nested in Mother-id. To have well-defined log-offsets we excluded samples with zero morphologically good oocytes and zero two-cells, respectively. However, for blastocysts per good oocytes (but not per two-cells) the random effect of Mother-id was either set to zero, very small, or prevented convergence of the estimation procedure, and therefore only Male-id was included for that analysis. Tests of the overall difference between treatment groups were carried out using χ²-tests. Comparisons between groups were made by Wald tests and presented as rate ratios with normal-approximation 95% confidence intervals. All treatments were compared to controls and the high-dose treatment was compared to the low-dose treatment within pesticide type.
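How a rate ratio with a normal-approximation confidence interval is formed can be sketched in Python for the simplest case of two independent Poisson counts with known exposures. This ignores the random effects of Mother-id and Male-id used in the study, and the counts and the function name are hypothetical.

```python
from math import exp, log, sqrt


def rate_ratio_wald(events_a, exposure_a, events_b, exposure_b, z=1.96):
    """Rate ratio a/b with a Wald interval computed on the log scale.

    exposure plays the role of the offset, e.g. the number of
    morphologically good oocytes.
    """
    rr = (events_a / exposure_a) / (events_b / exposure_b)
    se_log_rr = sqrt(1.0 / events_a + 1.0 / events_b)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 36 two-cells per 100 good oocytes versus 47 per 100.
rr, lower, upper = rate_ratio_wald(36, 100, 47, 100)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

The interval is symmetric on the log scale and therefore asymmetric around the rate ratio itself, as in the intervals quoted in the Discussion.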

Results

Sperm Calibration Study

Sperm from 20 NMRI males and 3,416 oocytes from 195 C57BL/6J females were used for IVF/IVC. The quartiles (lower, median and upper) of two-cell proportions of morphologically good oocytes are shown for various sperm concentrations in Figure 2.

Figure 2 also shows the result of the breakpoint analysis, which revealed a maximum two-cell proportion (51%, 95% CI: 38-69%) at 3.6e4 sperm/ml (95% CI: 2.1-6.1e4). For future application of the described IVF/IVC reprotoxicity model, a sperm concentration lower than this breakpoint concentration is required in order to be within the responsive range for determination of sperm fertilizing capacity. In the reprotoxicity study, we used a concentration of 2.5e4 sperm/ml but it may be necessary to adjust this value depending on the chosen mouse strain.


[Figure 2 plot. Vertical axis: proportion (0.1 to 0.7). Horizontal axis: sperm concentration (log10(conc/10000)).]

Figure 2. Sperm calibration. Two-cell proportion of morphologically good oocytes as a function of sperm concentration. Medians, lower and upper quartiles for each applied sperm concentration are shown as dashes on the vertical bars. The curve is predicted from a piecewise log-linear negative binomial generalized linear mixed model on the continuous log10-concentration of sperm, see details in Statistical Methods. The slope changes at the estimated breakpoint (determined beforehand by the breakpoint analysis), which is indicated with its 95% confidence interval by the grey horizontal bar.

Reprotoxicity Study

The number of F1 litters (F0 females), their mean size, and the average birth weights of the pups from these litters are shown in Table 1 (left part). We found no significant differences between treatment groups. Similarly, we found no significant differences in mean litter size or mean birth weight between treatment groups in the F2 generation, Table 1 (right part). Here we also show the number of females (1-2 per F1 male), the number of F1 males and the total number of pups from their corresponding litters. Pregnancy results from natural mating of F1 males with NMRI females are shown in Table 2.

Table 3 presents back-transformed estimates and 95% confidence intervals of fertilization rate (oocytes with pronuclei and two-cells per morphologically good oocyte) and embryo development (blastocysts per good oocytes and per two-cells). Comparisons between groups measured as rate ratios are shown in Table 4. Rates for VHigh tended to be decreased compared to both Ctrl and VLow for all measured stages of development. Obviously, however, these tendencies are not significant after correction for multiple testing.


Table 1. Litter size and birth weight of F1 and F2 generation. F1 litter size was modelled by quasi-poisson regression with back-transformed mean estimates; estimates are shown as mean (lower-upper 95% confidence limit). Normal linear mixed effects models were used for birth weight in both generations and for F2 litter size, see details in Statistical Methods. The Total row is from the null model and the p-value row is for the overall treatment effect (between groups) tested by an F-test on (4, 53) degrees of freedom (F1 litter size) or χ²-tests.

| Group | F1 litters N | F1 litter size | F1 pups N | F1 birth weight | F2 litters N (N F1 males) | F2 litter size | F2 pups N | F2 birth weight |
|---|---|---|---|---|---|---|---|---|
| Ctrl | 15 | 13.5 (12.4-14.6) | 136 | 1.60 (1.52-1.69) | 48 (29) | 12.7 (12.0-13.3) | 600 | 1.62 (1.58-1.66) |
| CLow | 10 | 13.2 (11.9-14.6) | 94 | 1.66 (1.57-1.76) | 32 (18) | 13.3 (12.5-14.1) | 413 | 1.59 (1.54-1.64) |
| CHigh | 12 | 13.2 (12.0-14.5) | 107 | 1.59 (1.50-1.68) | 34 (21) | 12.6 (11.8-13.4) | 426 | 1.59 (1.55-1.64) |
| VLow | 10 | 12.4 (11.1-13.8) | 72 | 1.71 (1.61-1.82) | 34 (19) | 12.6 (11.8-13.4) | 426 | 1.59 (1.55-1.64) |
| VHigh | 11 | 13.4 (12.1-14.7) | 104 | 1.63 (1.54-1.72) | 31 (17) | 12.5 (11.6-13.3) | 387 | 1.63 (1.57-1.68) |
| p-value | | 0.81 | | 0.41 | | 0.62 | | 0.72 |
| Total | 58 | 13.2 (12.6-13.7) | 513 | 1.63 (1.59-1.68) | 179 (104) | 12.7 (12.4-13.1) | 2252 | 1.61 (1.59-1.63) |

Table 2. Natural mating with F1 males. Proportion of females pregnant when naturally mated with F1 males, shown for each pesticide feed group. A binomial generalised linear model was applied and a χ²-test of an overall difference between groups showed that the reduction to the null model (Total row) was non-significant (p=0.65).

| Group | N F1 males | N females | Pregnant | Not pregnant |
|---|---|---|---|---|
| Ctrl | 29 | 57 | 48 (84%) | 9 (16%) |
| CLow | 18 | 36 | 32 (89%) | 4 (11%) |
| CHigh | 21 | 42 | 34 (81%) | 8 (19%) |
| VLow | 19 | 38 | 34 (89%) | 4 (11%) |
| VHigh | 17 | 34 | 31 (91%) | 3 (9%) |
| Total | 104 | 207 | 179 (86%) | 28 (14%) |
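The overall test in Table 2 can be retraced from the tabulated counts. For a binomial model with group as the only factor, the likelihood ratio statistic against the null model is the G² statistic of the 5 x 2 table, which the Python sketch below computes; the function names are illustrative, and the closed-form tail probability is valid for even degrees of freedom only. With these counts the p-value comes out at about 0.65, in line with the value reported in the caption.

```python
from math import exp, factorial, log


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-squared variable, even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


# (pregnant, not pregnant) per feed group, taken from Table 2.
pregnancy = {
    "Ctrl": (48, 9),
    "CLow": (32, 4),
    "CHigh": (34, 8),
    "VLow": (34, 4),
    "VHigh": (31, 3),
}


def lr_statistic(table):
    """G^2 = 2 * sum(obs * log(obs / expected)) for a groups x 2 table."""
    rows = list(table.values())
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    n = sum(col_totals)
    g2 = 0.0
    for r in rows:
        row_total = sum(r)
        for j, obs in enumerate(r):
            expected = row_total * col_totals[j] / n
            if obs > 0:  # a zero cell contributes nothing
                g2 += 2.0 * obs * log(obs / expected)
    return g2


g2 = lr_statistic(pregnancy)
df = len(pregnancy) - 1
p_value = chi2_sf_even_df(g2, df)
print(round(g2, 2), df, round(p_value, 2))
```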

Table 3. IVF and embryo development using sperm from F1 males. Back-transformed estimates with 95% confidence intervals, shown as estimate (lower-upper limit), from poisson (pronuclei and two-cell per morphologically good oocytes, and blastocyst per two-cells) or negative binomial (blastocyst per morphologically good oocytes) generalized linear mixed effects models, see details in Statistical Methods.

| Group | N F1 males | N females | N oocytes | Pronuclei | Two-cell | Blastocyst | BL/2-cell |
|---|---|---|---|---|---|---|---|
| Ctrl | 28 | 96 | 2620 | 0.47 (0.39-0.56) | 0.47 (0.40-0.57) | 0.26 (0.19-0.36) | 0.54 (0.45-0.66) |
| CLow | 18 | 56 | 1438 | 0.46 (0.36-0.58) | 0.48 (0.38-0.60) | 0.27 (0.18-0.41) | 0.54 (0.42-0.69) |
| CHigh | 20 | 67 | 1925 | 0.44 (0.35-0.54) | 0.45 (0.36-0.56) | 0.22 (0.16-0.31) | 0.47 (0.38-0.58) |
| VLow | 18 | 67 | 2008 | 0.48 (0.38-0.59) | 0.48 (0.38-0.60) | 0.27 (0.18-0.41) | 0.51 (0.39-0.66) |
| VHigh | 17 | 60 | 1542 | 0.36 (0.28-0.45) | 0.38 (0.30-0.48) | 0.16 (0.11-0.23) | 0.41 (0.32-0.53) |
| Total | 101 | 346 | 9533 | 0.44 (0.40-0.49) | 0.45 (0.41-0.50) | 0.23 (0.20-0.28) | 0.50 (0.44-0.55) |

Table 4. Comparison of IVF and embryo development between treatment groups using sperm from F1 males. Rate ratio with 95% confidence intervals from poisson (pronuclei and two-cell per morphologically good oocytes, and blastocyst per two-cells) or negative binomial (blastocyst per morphologically good oocytes) generalized linear mixed effects models, see details in Statistical Methods. Overall differences between groups were examined by χ²-tests whereas pairwise comparisons were by Wald tests.

Comparisons (rows): CLow vs Ctrl, CHigh vs Ctrl, VLow vs Ctrl, VHigh vs Ctrl. Outcomes (columns): Pronuclei, Two-cell, Blastocyst (BL), BL/Two-cell. [Table entries not recoverable, apart from two fragments: 0.16, 0.58 0.79 1.1 and 0.22, 0.57 0.80 1.1.]

Discussion

The sperm calibration part was recently submitted for publication and is the first study that systematically calibrates sperm concentration in a mouse IVF/IVC system for the purpose of male reprotoxic assessments of xenobiotics. The general function of our IVP system was quite satisfying, with rates of fertilization and blastocysts within an acceptable range. As shown in Figure 2, instead of a plateau, there was a slight decrease in two-cell rates at higher sperm concentrations (> 1e6 sperm/ml). This may be caused by the lower number of males used for the higher sperm concentrations and thus a larger variation. We conclude that a relatively low sperm concentration (2.5e4 sperm/ml, close to the lower 95% confidence limit) is a precondition in a mouse IVF/IVC system to detect possible reprotoxic effects on sperm fertilizing capacity. Our study illustrates the systematic approach needed to introduce and validate such an in vitro system used for reprotoxicity testing. Doing this, it may be of interest to account for the repeated use of the same F1 males and F0 females. As noted in the Statistical Methods section, recent papers indicate that this may be possible. These methods should be considered further for future experiments.

Reprotoxicity Study

The results from the reprotoxicity study showed the necessity of IVF/IVC to detect an adverse effect of lifelong exposure to high-dose vinclozolin in experimental males. We observed no effect of high-dose vinclozolin by evaluation of pregnancy success of unexposed females, litter sizes, or birth weights of offspring. In addition, evaluation of sperm motility and viability and of daily sperm production showed no effect of high-dose vinclozolin (results not shown).

This indicates that a combined IVF and IVC system is more sensitive than natural mating and sperm evaluation methods, whereas a previous study had shown the sensitivity of IVF. In our model, the toxic effect of high-dose vinclozolin was seen from fertilization (pronuclei rate ratio (RR) of 0.76 for VHigh vs Ctrl, 95% CI: 0.57-1.0) and continued with blastocyst formation (RR=0.60, 95% CI: 0.37-0.98). The high-dose vinclozolin two-cell rate also tended to be lower than in controls (RR=0.80, 95% CI: 0.60-1.1), as did the blastocyst rate from two-cells (RR=0.76, 95% CI: 0.55-1.0). Our study is the first to compare sensitivity directly between an IVF/IVC system and a system based on litter size after natural breeding for toxic effect on sperm. The sensitivity of our model relied on using a sperm concentration within the responsive range as determined by the sperm calibration study.

Although the high dose of vinclozolin we used is the lowest observed adverse effect level (LOAEL) for reproductive toxicity, we only detected an adverse effect of high-dose vinclozolin in IVF/IVC but not in litter size and birth weight. The explanation could be that we used NMRI mice while others used rats as the animal model. Tomer et al. (1999) showed decreased fertilization using sperm from NMRI male mice exposed to low-dose CCC in feed and water, but we did not observe significant differences between our two doses of CCC and the group fed with control feed. A possible reason is that we used C57BL/6J females as oocyte donors for IVF to obtain a higher number of oocytes after superovulation, whereas Tomer et al. used NMRI females.

Our results indicate that a negative effect of the pesticide vinclozolin on the reproductive system of male mice can be detected in an IVF/IVC system but not by natural mating or conventional sperm assessments. We also examined DNA methylation but we refrained from treating this part in the present paper. Nevertheless, the conclusion from our study is that a more sensitive toxicity system should be considered as the basis for safety tests of xenobiotics and their potential effects on human health and the environment. We expect to submit results from the reprotoxicity study for publication within a month or so.

Acknowledgement

The authors thank Klaus Villemoes, Anette M. Pedersen and Mette Jeppesen for excellent technical assistance. The work was supported financially by a grant from The Danish Council for Independent Research/Technology and Production Sciences (grant no. 4005-00256) and from our Department of Animal Science.

Conflicts of interests

The authors declare that they have no conflicting interests.

References

Al-Hamdani NM & Yajurvedi HN (2010). Ecotoxicol Environ Saf 73(5):1092-7.
Amann RP (1986). Environ Health Perspect 70:149-58.
Anway MD et al. (2005). Science 308(5727):1466-9.
Bates D et al. (2015). Journal of Statistical Software 67(1):1-48.
Buscot MJ et al. (2017). BMC Med Res Methodol 17(1):86.
Byers SL et al. (2006). Theriogenology 65(9):1716-26.
Eustache F et al. (2009). Environ Health Perspect 117(8):1272-9.
Fielden MR et al. (2002). Endocrinology 143(8):3044-59.
Fong et al. (2017). BMC Bioinformatics 18:454.
Guerrero-Bosagna C et al. (2012). Reprod Toxicol 34(4):694-707.
Hafizi L et al. (2014). Cell J 15:310-315.
Hilbe JM (2011). Negative Binomial Regression. Cambridge University Press.
Hou L et al. (2012). Int J Epidemiol 41(1):79-105.
Inawaka K et al. (2009). Toxicol Appl Pharmacol 237(2):178-187.
Kallio S et al. (1986). Cancer Chemother Pharmacol 17:103-108.
Kleiber C & Zeileis A (2008). Applied Econometrics with R. Springer-Verlag.
Langhammer M et al. (1999). J Anim Physiol An N 81:190-202.
Muggeo VMR (2003). Statistics in Medicine 22:3055-3071.
Muggeo VMR (2008). R News 8(1):20-25.
Muggeo VMR et al. (2014). Statistical Modelling 14(4):293-313.
Muggeo VMR (2016). Working Paper available from https://www.researchgate.net.
Nishino Y et al. (1991). J Endocrinol 130(3):109-114.
R Core Team (2016). https://www.R-project.org/.
Silver L (1995). Mouse Genetics. Concepts and Applications. Oxford University Press.
Sorensen MT & Danielsen V (2006). Int J Andrology 29:129-133.
Sorensen MT et al. (2009a). Animal 3:697-702.
Sorensen MT et al. (2009b). P. 51-52 in: 3rd Int. Congress on Food and Nutrition.
Tomer H et al. (1999). Reproductive Toxicology 13:399-404.


44 years of follow-up from the Copenhagen Male Study (CMS). Who survives to age 85?

Cand. stat. Hans Bay, Det Nationale Forskningscenter for Arbejdsmiljø (NFA).

e-mail: [email protected]

Abstract: The Copenhagen Male Study (CMS) is a study of middle-aged male Danish employees from public and private workplaces, which started in 1970-71. In the following, 4,553 persons from the CMS are analysed with the aim of finding factors that lead to a long life. The preliminary analyses show that sleep may contribute to a long life. In particular, however, good physical fitness, low blood pressure and abstaining from smoking, alcohol and coffee appear to have a large influence on how long one can expect to live.

Data: The Copenhagen Male Study (CMS) is a study of middle-aged male Danish employees from public and private workplaces, which started in 1970-71. Originally, 6,125 persons who were working in Copenhagen were invited. All invitees were born in the period 1910 to 1932. Of those originally invited, 5,249 took part in the first examination, which among other things comprised measurement of physical fitness and blood pressure. In addition, questions were asked about sleep duration, smoking, alcohol, coffee drinking, exercise, etc. Subsequently, a further 696 persons were excluded for various reasons, so the analysed material comprises 4,553 persons. All 4,553 persons have had the opportunity to reach age 85 as of 14/11. The participants were between 40 and 60 years old at the first examination in 1970/71. They were given a questionnaire and were subsequently tested during 1970/71. The test comprised measurement of blood pressure and a fitness test on a bicycle. The sample of approx. 6,000 must be described as relatively large. (In October 2015 there were approx. 71,000 men aged 40-60 in Copenhagen.) The persons were employed at 14 companies comprising: "railway, public road construction, military, post, telephone, customs, national bank, and the medical industry" (Garde 2013)¹. The participants are divided into the three categories below.

Table 1. Classification of the 4,553 persons by age at death.

| Category | Description | Number |
|---|---|---|
| Cat. 1 | Died before age 65 (thus did not reach retirement) | 606 |
| Cat. 2 | Died between ages 65 and 85 | 2,610 |
| Cat. 3 | Survived age 85 | 1,337 |
| Total | | 4,553 |

At NFA there has been a particular interest in examining the importance of sleep. The questionnaire from 1970/71 contains a number of questions on smoking habits, exercise habits and current pain, as well as a single question on sleep.


Table 2. Distribution of answers to the sleep question across the three categories (CMS 1970/71).

                   Cat. 1          Cat. 2            Cat. 3
                   died before     died between      survived
Sleep per day      age 65          age 65 and 85     age 85        Total
under 6 hours          46              146               60          252
6-7 hours             466            2,009            1,057        3,532
8-9 hours              87              442              218          747
over 9 hours            7               11                2           20
Total                 606            2,608            1,337        4,551

Note: 2 missing values.

In the subsequent analyses, 'over 9 hours' is merged with '8-9 hours', because relatively few persons reported sleeping more than 9 hours a day. (This variable is called sleep in what follows.) With 3 sleep categories the chi-square statistic is 9.1, which on 4 degrees of freedom gives a p-value of 5.8%. So there is some indication that it is better to sleep longer. The proportion surviving age 85 is about 24% among those who report sleeping less than 6 hours, while it is over 28% in the other two categories.
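The test just described can be retraced from the published counts. The following Python sketch is not the code used in the paper; it assumes a plain Pearson chi-square statistic on the 3 x 3 table obtained from Table 2 after merging the two longest sleep categories. The statistic it computes is close to, but need not coincide exactly with, the reported 9.1, since the variant of the test used in the paper is not stated.

```python
import math

# Observed counts from Table 2, with '8-9 hours' and 'over 9 hours' merged.
# Rows: under 6 hours, 6-7 hours, 8 hours or more.
# Columns: died before 65, died between 65 and 85, survived 85.
observed = [
    [46, 146, 60],
    [466, 2009, 1057],
    [87 + 7, 442 + 11, 218 + 2],
]


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_upper_tail_df4(x):
    """Upper tail probability of the chi-square distribution with exactly 4 df."""
    return math.exp(-x / 2) * (1 + x / 2)


stat, df = pearson_chi_square(observed)
p_value = chi2_upper_tail_df4(stat)

# Share surviving age 85 within each sleep category.
share_surviving_85 = [row[2] / sum(row) for row in observed]

print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")
print([f"{share:.1%}" for share in share_surviving_85])
```

The closed-form tail probability is valid only for 4 degrees of freedom, which is what a 3 x 3 table gives; a general implementation would need the incomplete gamma function.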

Analysis: the influence of sleep on a long life. As a first step, the data are analysed with 3 models, which are described below. The response variable is "kat", a categorical variable indicating whether a person dies before age 65, dies between 65 and 85, or survives age 85. The explanatory variable of interest is sleep, a categorical variable indicating how much sleep a person gets per day. The variable yr, the year of birth after 1900, is also included. yr is treated as a continuous variable ranging between 10 and 29. yr can be seen as capturing a "healthy worker effect": at the time of the examination in 1970/71 the participants were between 40 and 60 years old, so the 60-year-olds had already survived 20 years more than the youngest participants, aged 40. The variable yr is intended to adjust for this.

Models (1) and (2) use traditional logistic regression (with two outcomes), whereas model (3) is an ordinal logistic regression (with three outcomes). Model (1) examines the probability of dying before age 65. Model (2) examines the probability of dying before age 85. Model (3) combines models (1) and (2); that is, it considers the probabilities of dying before age 65, dying between age 65 and 85, and surviving age 85.

Model (1)

$$\log\left[\frac{P(\text{die before age 65})}{1 - P(\text{die before age 65})}\right] = \beta_{01} + \beta_{11}\,yr + \beta_{12}\,sleep + \gamma_1\,(yr * sleep)$$


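Model (2)

$$\log\left[\frac{P(\text{die before age 85})}{1 - P(\text{die before age 85})}\right] = \beta_{02} + \beta_{21}\,yr + \beta_{22}\,sleep + \gamma_2\,(yr * sleep)$$

Model (3)

$$\log\left[\frac{P(\text{die before age } k)}{1 - P(\text{die before age } k)}\right] = \beta_{0k} + \beta_{1}\,yr + \beta_{2}\,sleep + \gamma_3\,(yr * sleep), \qquad k = 65 \text{ or } 85$$

In model (3) only the two intercepts $\beta_{01}$ and $\beta_{02}$ (that is, $\beta_{0k}$ for k = 65 and k = 85) make the probabilities differ.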

Table 3. Results for two classical logistic regressions and one ordinal logistic regression.

              Model 1                      Model 2                      Model 3
Effect        DF   Chi-square   p          DF   Chi-square   p          DF   Chi-square   p
yr             1     20.1       <0.0001     1     0.02       0.89        1     4.72       0.03
sleep          2      2.14      0.34        2     0.33       0.85        2     2.1        0.35
yr*sleep       2      2.16      0.34        2     0.34       0.84        2     2.12       0.35

In all three models the interaction is non-significant (the p-values are 34% or larger). The interaction is therefore omitted in what follows.

Analysis 2. The three analyses are repeated here, but without the interaction.

Models (4) and (5) use traditional logistic regression, whereas model (6) is an ordinal logistic regression.

Model (4)

$$\log\left[\frac{P(\text{die before age 65})}{1 - P(\text{die before age 65})}\right] = \beta_{01} + \beta_{11}\,yr + \beta_{12}\,sleep$$

Model (5)

$$\log\left[\frac{P(\text{die before age 85})}{1 - P(\text{die before age 85})}\right] = \beta_{02} + \beta_{21}\,yr + \beta_{22}\,sleep$$

Model (6)

$$\log\left[\frac{P(\text{die before age } k)}{1 - P(\text{die before age } k)}\right] = \beta_{0k} + \beta_{1}\,yr + \beta_{2}\,sleep, \qquad k = 65 \text{ or } 85$$

In model (6) only the two intercepts $\beta_{01}$ and $\beta_{02}$ (that is, $\beta_{0k}$ for k = 65 and k = 85) make the probabilities differ. Model (6) therefore rests on the assumption that sleep (and year of birth) have the same relative effect on the two probabilities.
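How two intercepts and one shared linear predictor produce three category probabilities can be illustrated with a short Python sketch. It is only an illustration of the cumulative-logit structure of model (6): the intercepts below are hypothetical values chosen so that the reference probabilities match the overall shares in Table 1, and the shift log(0.700) borrows the model (6) odds ratio for 6-7 hours versus under 6 hours from Table 4. None of these are fitted coefficients from the paper.

```python
import math


def logistic(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(intercept_65, intercept_85, linear_predictor):
    """Three category probabilities from a cumulative-logit model with two cut points.

    logit P(die before 65) = intercept_65 + linear_predictor
    logit P(die before 85) = intercept_85 + linear_predictor
    """
    p_before_65 = logistic(intercept_65 + linear_predictor)
    p_before_85 = logistic(intercept_85 + linear_predictor)
    return {
        "before 65": p_before_65,
        "65 to 85": p_before_85 - p_before_65,
        "after 85": 1.0 - p_before_85,
    }


def logit(p):
    return math.log(p / (1.0 - p))


# Hypothetical intercepts: chosen so that a linear predictor of 0 reproduces
# the overall shares in Table 1 (606 and 606 + 2,610 out of 4,553).
b0_65 = logit(606 / 4553)
b0_85 = logit((606 + 2610) / 4553)

reference = category_probabilities(b0_65, b0_85, 0.0)
# The same shift on the log-odds scale is applied to both cumulative probabilities.
shifted = category_probabilities(b0_65, b0_85, math.log(0.700))

for label in reference:
    print(f"{label:10s} reference {reference[label]:.3f}  shifted {shifted[label]:.3f}")
```

Because the same shift enters both equations, the odds of dying before 65 and the odds of dying before 85 are multiplied by the same factor, which is exactly the assumption stated for model (6).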

Table 4: Estimates from model (4), model (5) and model (6). Odds ratio estimates: point estimate with 95% Wald confidence limits.

                                  Model (4)               Model (5)               Model (6), 3 categories
                                  Die before 65 years     Die before 85 years     (die before 65 years, die between
                                  (N=4,553)               (N=4,553)               65 and 85, die after 85 years)
                                                                                  (N=4,553)
Variable                          Est.   Lower  Upper     Est.   Lower  Upper     Est.   Lower  Upper
yr (year of birth)                1.058  1.039  1.078     0.994  0.982  1.007     1.013  1.001  1.024
6-7 hours vs under 6 hours        0.654  0.467  0.915     0.735  0.545  0.991     0.700  0.544  0.901
Over 8 hours vs under 6 hours     0.603  0.409  0.889     0.782  0.563  1.088     0.713  0.535  0.944
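Wald confidence limits for an odds ratio are symmetric on the log scale, so a point estimate should sit at the geometric mean of its two limits. The Python sketch below uses that property to recover the log-odds coefficient and an approximate standard error from a published row; the function name is an illustrative choice, and the normal quantile 1.959964 assumes that ordinary 95% Wald limits were reported.

```python
import math


def log_scale_summary(odds_ratio, lower, upper, z=1.959964):
    """Recover the log-odds coefficient, its approximate standard error and
    the geometric midpoint of the limits from an odds ratio with Wald limits."""
    beta = math.log(odds_ratio)
    std_error = (math.log(upper) - math.log(lower)) / (2 * z)
    midpoint = math.sqrt(lower * upper)
    return beta, std_error, midpoint


# Row '6-7 hours vs under 6 hours' for model (4) in Table 4.
beta, std_error, midpoint = log_scale_summary(0.654, 0.467, 0.915)
print(f"beta = {beta:.3f}, SE = {std_error:.3f}, midpoint of limits = {midpoint:.3f}")
```

The midpoint agrees with the published estimate up to rounding, which is a quick consistency check when a table has been retyped or scanned.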

Model (6) assumes that only the two intercepts $\beta_{01}$ and $\beta_{02}$ make the probabilities differ. This assumption was tested with a chi-square test (the test statistic is 41.5, which with 3 degrees of freedom gives a p-value < 0.0001). Because the large number of observations gives the test very high power, it was nevertheless decided to retain model (6).

The conclusion from the unadjusted models is that those who sleep more than 6 hours a day live longer than those who sleep less than 6 hours a day, and that there is a "healthy worker effect". The estimates from model (6) are shown graphically below.

Fig. 1: Graphical representation of the odds ratio estimates from model (6).
[Plot: odds ratios with 95% Wald confidence limits for yr, sleep 2 vs 1 and sleep 3 vs 1; horizontal axis: odds ratio, from 0.6 to 1.0.]

Including other variables

A number of the many other variables collected at the first examination in 1970/71 are now included. The participants had their blood pressure, weight and height measured and took part in a fitness test. A series of questions asked about the respondents' smoking habits, alcohol habits, etc. From the questionnaire answers, the amount of tobacco smoked per day was calculated for each person, and likewise for alcohol and coffee consumption. A similar conversion formula was used by Jensen (2017), reference 2.


Table 5: Other variables from the 1970/71 examination. Values are shown as mean (standard deviation).

"Continuous" variables                        before 65       65-85           85+             p
                                              KAT=1           KAT=2           KAT=3
                                              N=606           N=2,610         N=1,337
kondi (physical fitness)                      32.3 (7.1)      32.4 (7.1)      33.7 (7.3)      <0.0001
map = DIASH + (1/3)*(SYSH - DIASH),
  an overall measure of blood pressure        103.2 (14.6)    100.8 (13.1)    98.4 (11.6)     <0.0001
SYSH, blood pressure in right arm             138.8 (21.1)    135.6 (19.2)    132.1 (17.4)    <0.0001
DIASH, blood pressure in right arm            85.5 (12.9)     83.4 (11.7)     81.5 (10.4)     <0.0001
Cigarettes in grams                           7.6 (9.5)       5.8 (8.4)       2.7 (5.8)       <0.0001
Cigars in grams                               1.1 (4.0)       1.3 (4.5)       0.8 (3.4)       0.0021
Cheroots in grams                             3.3 (6.9)       3.3 (6.8)       2.3 (5.2)       <0.0001
Pipe in grams                                 7.3 (5.3)       6.9 (9.4)       5.3 (8.4)       <0.0001
Tobacco in total, in grams                    19.3 (11.9)     17.3 (12.9)     11.2 (12.0)     <0.0001
Coffee (kaffe)                                5.3 (3.2)       5.0 (3.1)       4.3 (2.7)       <0.0001
Drinks (genstande)                            2.1 (2.2)       1.7 (1.7)       1.2 (1.4)       <0.0001
Weight                                        76.9 (11.5)     77.7 (10.5)     76.6 (9.2)      0.0053
Height                                        173.8 (6.6)     174.5 (6.5)     174.9 (6.4)     <0.0001
BMI                                           25.4 (3.5)      25.5 (3.1)      25.0 (2.6)      0.0024
Age in years                                  59.1 (4.9)      76.0 (5.6)      88.9 (3.0)
Year of birth (yr)                            1922.5 (4.6)    1921.0 (5.1)    1921.5 (5.0)    <0.0001
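The combined blood pressure measure map in Table 5 is a simple linear formula, and because it is linear the group mean of map can be checked against the group means of SYSH and DIASH. A small Python sketch of that check follows; the function and variable names are illustrative, and only the formula and the group means are taken from the table.

```python
def mean_arterial_pressure(systolic, diastolic):
    """map as defined in Table 5: DIASH + (1/3) * (SYSH - DIASH)."""
    return diastolic + (systolic - diastolic) / 3.0


# Group means (SYSH, DIASH) from Table 5.
group_means = {
    "before 65": (138.8, 85.5),
    "65-85": (135.6, 83.4),
    "85+": (132.1, 81.5),
}

map_from_means = {
    label: mean_arterial_pressure(sysh, diash)
    for label, (sysh, diash) in group_means.items()
}

for label, value in map_from_means.items():
    print(f"{label:10s} map = {value:.1f}")
```

Small deviations from the tabulated map values are to be expected, since the published means are rounded to one decimal.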

Note that the differences are significant for all variables, not least because of the large sample. The variables above are either measured as continuous variables (height, weight, etc.) or converted into a form of continuous variable. The next table shows a number of variables from the questionnaire that have not been converted. These are referred to as "discrete" variables. In addition, the respondents are divided into three social classes (see Garde2013). The questions in Table 6 mainly concern exercise (at work, in leisure time or on the way to work). A question on the use of tranquilizers is also included.

Table 6: "Discrete" variables from the 1970/71 examination (shares in per cent).

Discrete/classification variable        before 65    65-85        85+          p          DF
                                        KAT=1        KAT=2        KAT=3
                                        N=606        N=2,610      N=1,337
share in social class 3                 63.5         57.0         46.8         <0.0001    4
frequent exercise + sweating            30.5         30.5         33.3         0.19       2
much exercise                           11.1         11.2         10.2         0.39       4
takes part in sport *)                  15.3         19.6         24.9         <0.0001    2
by car to work                          42.6         45.0         47.3         0.13       2
over 8 hours of sleep                   15.5         17.4         16.5         0.06       4
never smoked                            3.6          7.1          15.0         <0.0001    2
tranquilizers, regularly                6.0          6.0          4.6          0.46       4

Note: *) Do you take part in any form of sport or athletics in your leisure time? (yes/no)

Analysis 3

From here on, only an ordinal logistic regression analysis is used.

Model (7)

$$\log\left[\frac{P(\text{die before age } k)}{1 - P(\text{die before age } k)}\right] = \beta_{0k} + \beta_{1}\,yr + \beta_{2}\,sleep + \beta_{3}X_{3} + \beta_{4}X_{4} + \cdots, \qquad k = 65 \text{ or } 85$$

The $X$'s denote the variables from Table 5 and Table 6.


Table 7: Estimates for model (7). Odds ratio estimates, 3 categories: die before 65 years, die between 65 and 85, die after 85 years (N=4,481).

Effect                               Point estimate    95% Wald confidence limits
yr                                   1.029             1.017    1.042
kondi                                0.986             0.977    0.995
map                                  1.021             1.016    1.026
ryg_g                                1.030             1.024    1.035
kaffe                                1.046             1.025    1.067
genstande                            1.123             1.085    1.161
ryg_n, ever smoked vs never          1.498             1.201    1.868
No sport vs sport                    0.828             0.713    0.960
soc 2 vs 1                           1.166             0.973    1.397
soc 3 vs 1                           1.534             1.299    1.811
6-7 hours vs under 6 hours           0.835             0.645    1.082
Over 8 hours vs under 6 hours        0.822             0.617    1.096

Note that the sleep variable (sleep) now becomes non-significant. Model (7) classifies 58.6% of the respondents correctly.

A graphical representation of the estimates is shown below.

Fig. 2. Graphical representation of the odds ratios from model (7).
[Plot: odds ratios with 95% Wald confidence limits for yr, kondi, map, ryg_g, genstande, kaffe, socgrp 2 vs 1, socgrp 3 vs 1, ryg_n 0 vs 1, sport 1 vs 2, sleep 2 vs 1 and sleep 3 vs 1; horizontal axis: odds ratio, from 0.75 to 1.75.]

Conclusion: In models with a high degree of adjustment there is no longer an association between short sleep and a short lifetime. Instead we find that participants with good physical fitness and low blood pressure, who do not smoke and who exercise in their leisure time, live longer. The model also shows that those who drink less alcohol and coffee live longer. Several of these factors are, however, correlated with sleep, which may help explain why sleep is no longer significant.

References

1 Garde, Hansen, Holtermann, Gyntelberg, Suadicani: Sleep duration and ischemic heart disease and all-cause mortality: Prospective cohort study on effects of tranquilizers/hypnotics and perceived stress. Scand J Work Environ Health. 2013;39(6):550-558.

2 Jensen, Holtermann, Bay, Gyntelberg: Cardiorespiratory fitness and death from cancer: a 42-year follow-up from the Copenhagen Male Study. Br J Sports Med. 2017, 1364-1369.


What's new in SAS Analytics 14.3

Anders Milhøj

Department of Economics, University of Copenhagen

Øster Farimagsgade 5, DK-1353 København K

[email protected]

In the autumn of 2017 the Analytical Products were released in the updated version 14.3. As so often before, this update contains interesting enhancements to the analytical packages within statistics, econometrics, operations research, etc. These updates are decoupled from updates of SAS as a whole, so it is still Base SAS, version 9.4, that is used.

SAS's recent analytical releases

Version 9.4 of Base SAS with Analytical Updates 14.1, which appeared in the summer of 2015, was updated for the second time in mid-November 2017, now to version 14.3 of the Analytical Products. This paper focuses on version 14.3, which as usual contains considerably more news than a change in the first decimal would suggest. Earlier updates are described in my many previous symposium papers.

The source for these overviews is the SAS help, which can be accessed by anyone, also without a SAS installation, via http://support.sas.com/, since the manuals for the SAS packages STAT, ETS, OR and QC are publicly available, e.g. through a Google search.

Increased access to SAS

Two options for running SAS free of charge and with little effort have now been released.

SAS University Edition is made available free of charge to "everyone" who claims to be studying at a university. It is marketed as SAS-U. This edition can readily be used on a Mac! This option has long been requested in the university world, where the Mac's market share can exceed 50%. SAS-U can be downloaded from

http://www.sas.com/da_dk/software/university-edition.html

A virtualization software package is required, see

http://www.sas.com/da_dk/software/university-edition.html#m=system-requirements


One drawback is that only Base SAS, the STAT package and IML (matrix computations) are fully included in SAS-U, though not the High Performance procedures with the prefix HP. In addition, parts of the ETS package (econometrics and time series analysis) have been included since the summer of 2014. These are the parts of ETS that can be described as "data scientist" procedures, as opposed to scientific, professional procedures. This means that PROC ESM for forecasting and PROC UCM, which is part of our symposium paper this year on sales of specialty beer forecast from social media data, are included in SAS-U. In contrast, PROC VARMAX and PROC X13, for example, are not included.

SAS on Demand for Academics is a server solution in which the user runs the SAS session online on a remote server, just as in the old days when mainframes were the only option (although, sadly, punch cards can no longer be used!). A teacher can create courses and thereby make data available to the students who enrol in the course. But there is also an option for "individual learners", so anyone can in fact run SAS sessions this way. Not every corner of SAS is included, but fortunately both the entire STAT package and the entire ETS package are fully included, so SAS on Demand covers the needs of all my SAS courses.

For further information on SAS on Demand, see

https://www.sas.com/en_us/software/on-demand-for-academics.htm

SAS and Big Data

The list of HP (High Performance) procedures, which mirror the standard SAS procedures, keeps growing. They compute quickly and make full use of the machine configuration. For example, they can run multithreaded if the machine has several processors, or the computations can be distributed over geographically separate CPUs. They are available to ordinary SAS users in simple PC installations. This lets everyone benefit from certain extra refinements, for example that PROC HPREG, unlike PROC REG, provides a CLASS statement. Special licences are required to use the facilities for distributed execution of SAS programs.

SAS Viya

SAS Viya is an extension to the SAS platform that supports high-performance analytical data preparation, variable transformations, exploratory analysis, analytical modeling, integrated model comparison, and scoring. The following table describes the main components of SAS Viya. You can write SAS programs using the syntax available for any product that you have licensed and installed.

SAS Viya: The third generation of high-performance, in-memory analytics that includes SAS Cloud Analytic Services and SAS Visual Analytics. Additional software can be licensed separately. It is an open, high-performance, resilient, cloud-ready extension to the SAS platform.

SAS Cloud Analytic Services (CAS): The analytic engine for SAS Viya. CAS uses high-performance, multithreaded analytic code to rapidly process requests against data of any size.

SAS Visual Analytics and SAS Visual Statistics

SAS Visual Data Mining and Machine Learning:

SAS Econometrics: A set of functionality that provides techniques to model complex business and economic scenarios and to analyze the dynamic impact that specific events might have over time.

SAS Visual Forecasting: Provides automatic variable, event, and model selection. It then automatically generates your forecasts.

SAS Optimization: A set of procedures for exploring models of distribution networks, production systems, resource allocation problems, and scheduling problems using the tools of operations research.

These packages can be studied in the SAS help, and those interested can follow developments at sas.com/viya.

What's new in STAT

There is a new procedure, PROC CAUSALMED, which examines mediator effects in causal models. It is a further development of the two procedures PROC CAUSALTRT and PROC PSMATCH, which were discussed at the January 2017 symposium. PROC CAUSALMED is used in an example later in this paper.

PROC CAUSALTRT estimates the effect of a treatment variable on a continuous or discrete outcome variable. PROC PSMATCH contains a range of tools for matching with propensity scores.

Other important changes in STAT

The GAMPL procedure now supports the Tweedie distribution.

In PROC FREQ, the COMMONRISKDIFF option in the TABLES statement provides estimates, confidence limits, and tests for the overall risk (proportion) difference for multiway tables.


The IRT procedure now supports the nominal response model, which enables you to do item analysis of nominal responses.

The NLMIXED and MCMC procedures add a CMPTMODEL statement that fits compartment models in pharmacokinetic analysis.

The PHREG procedure provides the cause-specific proportional hazards analysis for competing-risks data.

The QUANTREG and QUANTSELECT procedures provide fast quantile process regression.

In addition, bootstrap methods for computing confidence intervals have been added to the SURVEY suite of procedures for the analysis of sample survey data, as well as to PROC TTEST. These options are demonstrated later in this paper.

What's new in ETS

The TMODEL procedure is a new, experimental version of the MODEL procedure. The code that you use to perform nearly all analyses in PROC MODEL can be used without changes in PROC TMODEL. However, PROC TMODEL incorporates high-performance computational techniques and offers new features that enhance the functionality of PROC MODEL.

The following features are available in PROC TMODEL:

estimation and simulation of models that use panel data when you specify cross-sectional variables in the CROSSSECTION statement

estimation of models that contain nonlinear random-effects parameters when you identify cross-sectional variables in the input data

use of analytic expressions for Hessian matrices in the optimization process for most estimation methods by default

use of the nonlinear programming (NLP) solver available in SAS/OR software for performing the optimizations during estimation tasks

PROC TMODEL can execute many analyses faster than PROC MODEL through the use of multiple concurrent calculation threads.

New features have been added to the following SAS/ETS components:

PANEL procedure

QLIM procedure


SSM procedure

UCM procedure (multiple cycle components)

VARMAX procedure (more on fractionally integrated models)

What's new in IML

SAS/IML 14.3 supports several new features:

The SAS/IML language supports new syntax for defining and manipulating lists.

You can transfer data between SAS/IML tables and R data frames by using the ExportTableToR subroutine and the ImportTableFromR function.

You can analyze complex-valued time series data by using new functions for time-frequency analysis.

In addition, there are new functions, new graphics, and enhancements to existing functions:

The DISTANCE function supports an optional argument to compute distances between observations in two different sets.

The TABLECREATE function supports creating a table from multiple matrices.

In this release, the traditional graphics functions and statements in SAS/IML are deprecated and have been removed from the documentation.

What's new in QC

In SAS/QC 14.3, the RAREEVENTS procedure can produce rare events charts that have distinct sets of probability limits for different phases of observations. The phases are defined by the values of the character variable _PHASE_. The CHART statement supports several options related to phases.

What's new in OR

Several optimization solvers have been updated in SAS/OR 14.3 and improve their performance. The LP, MILP, QP, and NLP solver algorithms all reduce the overall solution time.

Version 14.2 introduced a new OPTGRAPH procedure, which provides algorithms within graph theory, combinatorial optimization and network analysis. Unfortunately the licence is not included in the OR licence for the universities, so I have not yet tried out its capabilities.


Bootstrap in PROC TTEST

In version 14.3, PROC TTEST can compute confidence intervals for a mean, and for the difference between the means of two variables, by means of the bootstrap.

At the Symposium i Anvendt Statistik in January 1991 I gave a talk on samples from a small right-skewed population. The data set used was the premium income of 97 Danish mutual non-life insurance companies, probably in 1989. This data set was dominated by two very large companies, Tryg and Almindelig Brand, with premium incomes at the time of about DKK 1.5 billion, whereas most of the other companies were tiny, e.g. "Arbejdernes Billedrør" with a premium income below DKK 100,000 and "Fur Brand", the very smallest, with a premium income of only DKK 30,053.

A histogram of this heavily right-skewed distribution cannot really be drawn. The mean is 58,457,371 while the median is only 1,951,049. An ordinary confidence interval for the mean computed with PROC TTEST is

Mean          95% CL Mean
58,457,371    12,801,426    104,110,000

This interval is symmetric, formed as the mean plus/minus about DKK 46 million. The simplest bootstrap methods merely replace the standard deviation estimated in the usual way, as the square root of s², by the standard deviation computed by bootstrap. In this way the length of the confidence interval reflects the distribution of the original observations better than the implicit normality assumption behind the traditional confidence intervals does. But the interval obtained in this way is still symmetric and is therefore not well suited to this example.

One option is to base the confidence interval on the ordinary t statistic computed with s², but to use quantiles of the bootstrap distribution of the t statistic instead of quantiles of the t distribution. There are mathematical proofs that this method has superior properties. By default 10,000 bootstrap replications are used, so a simple program is

    proc ttest data=wrk.forsik_bad;
      var praem;
      bootstrap / bootci=boott;
    run;

Parameter   Bias        95% CL
Mean        -124,680    24,574,324    201,230,000

Since the mean was about DKK 58 million, the confidence interval is seen to be strongly asymmetric.
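The bootstrap-t construction described above can be sketched in a few lines of Python. This is an illustration of the general technique, not a reimplementation of PROC TTEST: the data are synthetic (95 small log-normal values plus two values of 1.5 billion, chosen only to mimic a small right-skewed population), and the function name, seed and number of replications are assumptions made for the example.

```python
import random
import statistics


def bootstrap_t_interval(data, n_boot=2000, alpha=0.05, seed=1):
    """Bootstrap-t confidence interval for the mean.

    The usual t statistic is recomputed in every resample, and quantiles of
    that bootstrap distribution replace the quantiles of the t distribution.
    """
    rng = random.Random(seed)
    n = len(data)
    mean = statistics.fmean(data)
    std_error = statistics.stdev(data) / n ** 0.5
    t_stars = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)
        sd_star = statistics.stdev(resample)
        if sd_star == 0:
            continue  # degenerate resample: the t statistic is undefined
        se_star = sd_star / n ** 0.5
        t_stars.append((statistics.fmean(resample) - mean) / se_star)
    t_stars.sort()
    t_low = t_stars[int((alpha / 2) * len(t_stars))]
    t_high = t_stars[int((1 - alpha / 2) * len(t_stars)) - 1]
    # The quantiles swap roles: the upper t quantile gives the lower limit.
    return mean - t_high * std_error, mean - t_low * std_error


# Synthetic stand-in for the premium data (NOT the real 97 companies).
data_rng = random.Random(0)
premiums = [data_rng.lognormvariate(14, 1.5) for _ in range(95)] + [1.5e9, 1.5e9]

sample_mean = statistics.fmean(premiums)
lower, upper = bootstrap_t_interval(premiums)
print(f"mean {sample_mean:,.0f}  bootstrap-t 95% CL {lower:,.0f} to {upper:,.0f}")
```

With right-skewed data the bootstrap distribution of the t statistic has a long left tail, which is what pushes the upper limit far above the mean, as in the PROC TTEST output.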


Judged from the very idea behind the bootstrap, the percentile interval is the most intuitive. Here the endpoints of the confidence interval are simply the empirical quantiles of the distribution of the bootstrap estimates of the mean. This approach is intuitively sensible, and the resulting interval is not necessarily symmetric.

    proc ttest data=wrk.forsik_bad;
      var praem;
      bootstrap / bootci=percentile;
    run;

Parameter   Bias       95% CL
Mean        461,188    20,454,482    111,340,000

Since the mean was about DKK 58 million, the confidence interval is seen to be asymmetric.
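The percentile interval is simple enough to write out directly. The Python sketch below shows the idea on synthetic data; the data, seed, function name and number of replications are assumptions for the illustration and do not reproduce the SAS output above.

```python
import random
import statistics


def percentile_interval(data, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval: empirical quantiles of the bootstrap means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic stand-in for the premium data (NOT the real 97 companies).
data_rng = random.Random(0)
premiums = [data_rng.lognormvariate(14, 1.5) for _ in range(95)] + [1.5e9, 1.5e9]

sample_mean = statistics.fmean(premiums)
lower, upper = percentile_interval(premiums)
print(f"mean {sample_mean:,.0f}  percentile 95% CL {lower:,.0f} to {upper:,.0f}")
```

The limits come straight from the sorted bootstrap means, so nothing forces them to lie at equal distances from the sample mean.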

The intuitive approach can be extended with various corrections for bias, of which the "Bootstrap Bias-Corrected Percentile Interval" is the default in PROC TTEST.

    proc ttest data=wrk.forsik_bad;
      var praem;
      ods select bootstrap;
      bootstrap / bootci=BC;
    run;

Parameter   Bias       95% CL
Mean        368,178    22,723,815    115,390,000

Since the mean was about DKK 58 million, the confidence interval is seen to be asymmetric.
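A bias-corrected percentile interval adjusts which quantiles of the bootstrap distribution are read off. The Python sketch below implements the textbook bias-corrected (BC) interval without an acceleration constant; whether PROC TTEST uses exactly this variant is an assumption, and the data, seed and function name are again made up for the illustration.

```python
import random
import statistics
from statistics import NormalDist


def bias_corrected_interval(data, n_boot=2000, alpha=0.05, seed=1):
    """Bias-corrected (BC) percentile bootstrap interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    estimate = statistics.fmean(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    normal = NormalDist()
    # Bias-correction constant: normal quantile of the share of bootstrap
    # means that fall below the original estimate.
    share_below = sum(m < estimate for m in means) / n_boot
    share_below = min(max(share_below, 1 / n_boot), 1 - 1 / n_boot)
    z0 = normal.inv_cdf(share_below)
    # Adjusted quantile levels replace alpha/2 and 1 - alpha/2.
    level_low = normal.cdf(2 * z0 + normal.inv_cdf(alpha / 2))
    level_high = normal.cdf(2 * z0 + normal.inv_cdf(1 - alpha / 2))
    index_low = min(max(int(level_low * n_boot), 0), n_boot - 1)
    index_high = min(max(int(level_high * n_boot) - 1, 0), n_boot - 1)
    return means[index_low], means[index_high]


# Synthetic stand-in for the premium data (NOT the real 97 companies).
data_rng = random.Random(0)
premiums = [data_rng.lognormvariate(14, 1.5) for _ in range(95)] + [1.5e9, 1.5e9]

sample_mean = statistics.fmean(premiums)
lower, upper = bias_corrected_interval(premiums)
print(f"mean {sample_mean:,.0f}  BC 95% CL {lower:,.0f} to {upper:,.0f}")
```

When exactly half of the bootstrap means fall below the estimate, z0 is zero and the BC interval coincides with the plain percentile interval.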

Samples from the small right-skewed population

As in the 1991 symposium paper, samples of size n = 10 are now drawn from this population of N = 97 mutual non-life insurance companies by simple random sampling. Broadly there are three possibilities, depending on whether none, only one or both of the two very large companies end up in the sample.

In version 14.3, PROC SURVEYMEANS can compute bootstrap confidence intervals for means. This is requested with the option varmethod=bootstrap in the procedure call itself. The method simply computes the sampling variance by bootstrap instead of by the usual s² estimator. This method of variance estimation also works in the other procedures of the SURVEY suite, e.g. PROC SURVEYREG.


The bootstrapped standard deviation is then used in a symmetric confidence interval, which does not solve the problem of the strongly right-skewed distribution in this example. Even so, the changed standard deviations have a considerable impact.

If neither of the two large companies is in the sample, one gets:

                                    Mean           Std Error of Mean    95% CL for Mean
Usual s²                            12,303,210     3,799,708            3,707,674         20,898,745
Bootstrapped standard deviation     12,303,210     3,997,555            3,260,112         21,346,308

If only one of the two large companies is in the sample, one gets:

                                    Mean           Std Error of Mean    95% CL for Mean
Usual s²                            168,689,722    143,356,086          -155,604,275      492,983,718
Bootstrapped standard deviation     168,689,722    150,651,485          -172,107,614      509,487,057

If both of the two large companies are in the sample, one gets:

                                    Mean           Std Error of Mean    95% CL for Mean
Usual s²                            307,113,736    187,053,054          -116,029,671      730,257,142
Bootstrapped standard deviation     307,113,736    196,187,491          -136,696,049      750,923,520
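The symmetric intervals in these tables are consistent with the usual construction, mean plus/minus a t quantile times the standard error, with n - 1 = 9 degrees of freedom. The Python sketch below checks this for the case where neither large company is sampled; the t quantile is hard-coded, and the assumption that SURVEYMEANS used 9 degrees of freedom is inferred from the numbers rather than stated in the paper.

```python
T_975_DF9 = 2.262157  # 97.5% quantile of the t distribution with 9 degrees of freedom


def symmetric_interval(mean, std_error, t_quantile=T_975_DF9):
    """Symmetric confidence interval: mean plus/minus t quantile times standard error."""
    half_width = t_quantile * std_error
    return mean - half_width, mean + half_width


# Neither large company in the sample: usual s2 versus bootstrapped standard error.
usual = symmetric_interval(12_303_210, 3_799_708)
bootstrapped = symmetric_interval(12_303_210, 3_997_555)

print(f"usual s2:     {usual[0]:,.0f} to {usual[1]:,.0f}")
print(f"bootstrapped: {bootstrapped[0]:,.0f} to {bootstrapped[1]:,.0f}")
```

Only the standard error changes between the two rows, which is why the bootstrap variance estimate widens the interval but cannot make it asymmetric.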

In more complicated sampling designs with cluster sampling, this bootstrap variance estimation is carried out only for the variance at the top PSU (Primary Sampling Unit) level, not within the clusters.


An example with PROC CAUSALMED

Version 14.2, which was discussed in the symposium paper of January 2017, introduced two new procedures, PROC CAUSALTRT and PROC PSMATCH, each of which can in its own way be used to support possible causality. The new PROC CAUSALMED adds the mediator aspect to analyses of this type. The idea of a mediator is to split the effect of one variable on another into a direct effect and an indirect effect that passes through a third variable.

[Figure: path diagram in which Treatment (T) affects Outcome (Y) both directly and through Mediator (M). Source: the SAS documentation.]

In PROC CAUSALMED both continuous and binary response and explanatory variables can be used.

As an example, the same problem is used here as in the oral presentation in January 2017; the example was, however, not included in the written contribution. The response variable is the score in "Science" in the Danish part of the PISA data set. Exercise (motion) is used as an explanatory variable, since exercise is claimed to have an effect on pupils' academic achievement. Motion is defined as a binary yes/no text variable according to whether the pupil has exercised to the point of sweating for more than half an hour during a week. In addition, a continuous variable, cultposs, is included, which expresses on a scale how many cultural items are found in the home according to the child's own information. In an ordinary regression there is severe multicollinearity between exercise and culture; the correlation between the regression coefficients is -0.94.

    proc causalmed data=asdf2 all;
      class motion;
      model science=Motion cultposs;
      mediator motion=cultposs;
    run;

Summary of Effects

                                   Estimate    Standard    Wald 95%                   Z        Pr > |Z|
                                               Error       Confidence Limits
Total Effect                       27.7280     1.0943      25.5832     29.8729        25.34    <.0001
Controlled Direct Effect (CDE)     27.4588     1.0953      25.3121     29.6055        25.07    <.0001
Natural Direct Effect (NDE)        27.4588     1.0953      25.3121     29.6055        25.07    <.0001
Natural Indirect Effect (NIE)      0.2692      0.0903      0.09225     0.4462         2.98     0.0029
Percentage Mediated                0.9709      0.3267      0.3306      1.6112         2.97     0.0030
Percentage Eliminated              0.9709      0.3267      0.3306      1.6112         2.97     0.0030

The table shows that one step up the scale for culture in the home leads to an increase of 27.7280 in the PISA score for Science. Of this, only 0.2692 is due to the highly significant finding that more cultural items in the home lead to an increased urge to exercise.
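The rows of the summary table are linked by simple arithmetic: the total effect is the sum of the natural direct and natural indirect effects, and the percentage mediated is the indirect effect as a share of the total. A minimal Python check of the published figures follows; the function name is an illustrative choice.

```python
def mediation_summary(natural_direct, natural_indirect):
    """Return the total effect and the percentage mediated."""
    total = natural_direct + natural_indirect
    percentage_mediated = 100.0 * natural_indirect / total
    return total, percentage_mediated


# Model without confounders: NDE = 27.4588, NIE = 0.2692.
total_effect, pct_mediated = mediation_summary(27.4588, 0.2692)
print(f"total effect = {total_effect:.4f}, percentage mediated = {pct_mediated:.4f}")
```

Under 1% of the effect of culture on the Science score is thus mediated through exercise, even though the indirect effect itself is statistically significant.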

Confounders, i.e. additional explanatory variables, can also be used in the models. Here gender is used, which in the PISA data set is very intuitively named ST004D01T, together with the parents' highest education, hisei, and immigrant status, immig, with three levels.

    proc causalmed data=asdf2 all;
      class motion ST004D01T immig MISCED;
      model science=Motion cultposs;
      mediator motion=cultposs;
      covar ST004D01T hisei immig;
    run;

In this case exercise is not significant for the PISA score in Science in an ordinary regression model. The results from PROC CAUSALMED show that the direct effect of culture is 17.3424 when the other explanatory variables are introduced, while only 0.0458 of the total effect of 17.3882 can be explained by more culture in the home leading to an increased urge to exercise among the children.

Summary of Effects

                                   Estimate    Standard    Wald 95%                   Z        Pr > |Z|
                                               Error       Confidence Limits
Total Effect                       17.3882     1.1372      15.1593     19.6170        15.29    <.0001
Controlled Direct Effect (CDE)     17.3424     1.1374      15.1132     19.5716        15.25    <.0001
Natural Direct Effect (NDE)        17.3424     1.1374      15.1132     19.5716        15.25    <.0001
Natural Indirect Effect (NIE)      0.0458      0.0389      -0.03051    0.1221         1.18     0.2395
Percentage Mediated                0.2634      0.2243      -0.1764     0.7031         1.17     0.2404
Percentage Eliminated              0.2634      0.2243      -0.1764     0.7031         1.17     0.2404

References

Anders Milhøj (1991). Stikprøveudvælgelse fra en lille højreskæv population, Symposium i Anvendt Statistik 1991, ed. Karsen Vest Nielsen, UNI·C.

Various pages at support.sas.com. Passages that appear in English in the original Danish paper are direct quotations.


The Ministry of Finance's fiscal space (økonomisk råderum): castle in the air or reality? (footnotes 1 and 2)

Professor Jesper Jespersen, Roskilde Universitet, e-mail: [email protected]

Clarification of concepts

The concept of 'fiscal space' (økonomisk råderum) has become an established part of the professional economic and political discourse.

Fiscal space is closely tied to the balance on the public sector's structural current and capital budget.

Fiscal space is calculated, in brief, as the surplus on the structural balance under unchanged economic policy, with full phasing-in of adopted structural (labour market) reforms, and under the assumption of an underlying 'normal' growth in GDP productivity of 1½ per cent p.a. (footnote 3)

The calculation is closely related to the macroeconomic model and structure used by, among others, the Ministry of Finance (FM).

The calculations are always forward-looking and are based on a projection of the economic structural/equilibrium model: in 2017 a 2025 plan was presented in which the fiscal space for fiscal policy initiatives was determined on the basis of these concepts and calculation principles.

The calculation of fiscal space thus rests on a number of assumptions:

1. that the model used gives an accurate description of the central relationships in the Danish economy,

2. that full (structural) employment is achieved,

3. that there is equilibrium between saving and investment in the private sector,

4. that it is a meaningful goal for fiscal policy to aim at structural balance on the public sector's current and capital budget.

Fiscal space is defined as the balance on the public sector's structural budget.

This paper will shed light on the conceptual and computational pitfalls and the macroeconomic misunderstandings associated with the Ministry of Finance's use of the concept of fiscal space.

1 Work in progress; comments are very welcome.
2 A similar issue attaches to the concept of fiscal sustainability, where the model-calculated structural budget balance is simply extended to 2060, i.e. about 45 years! That calculation, too, is used in planning next year's fiscal policy!
3 The assumption about productivity growth is exogenous and can therefore be varied from calculation to calculation. The effect is limited, however, since most public expenditures and revenues vary (proportionally) with average productivity.

The Ministry of Finance's calculation of the 'economic room for manoeuvre'

Figure 1. Structural balance on the public sector's current and capital budget (per cent of GDP; horizontal axis 2010-2055). [Figure not reproduced.]

Source: Mads Kieler, presentation, 6 December 2017.

As a consequence of the Budget Act, adopted in 2012 with effect from 2014, the public sector budget may show a deficit of at most ½ per cent of GDP (with few exceptions). The Ministry of Finance has therefore established a modelling practice under which such a calculation is carried out, among other occasions, when next year's Finance Bill is presented.

Three factors in particular affect the structural balance:

1. active fiscal policy,

2. the labour force participation rate (especially among persons entitled to labour-market-related social benefits), and

3. demographic shifts.

Three quite different time horizons are involved in assessing these three effects. Discretionary/active fiscal policy can in principle be changed from year to year. If the figure is taken as the basis for assessing the fiscal policy pursued in 2010-2015, that policy was expansionary and exceeded the later-introduced ceiling of ½ per cent of GDP. Assessed against the cyclical situation, i.e. the gap between actual and structural unemployment (see Figure 2), an expansionary fiscal policy helped to reduce this gap and thereby to lessen the imbalance in the Danish economy.

Figure 2. Actual and structural unemployment (per cent of labour force, 1991-2019). [Figure not reproduced.]

Source: Mads Kieler, presentation, 6 December 2017.

The marked improvement in the structural balance towards 2025 can primarily be attributed to the assumed increase in labour supply resulting from already adopted 'labour-market reforms', a mix of cuts in social benefits and a higher retirement age. The Ministry of Finance's model has the built-in property that an increased labour supply translates into a corresponding rise in employment after 3-5 years (incl. fiscal reaction). In round numbers, this calculation principle implies that an increase in labour supply of 10,000 persons improves the structural balance by 2-3 billion kr. An improvement in the balance of approx. 1 per cent of GDP (approx. 20 billion kr.), as seen in Figure 1, can thus be produced, in the model, by an increase in labour supply of approx. 60-80,000 persons.
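The rule of thumb in the preceding paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch that assumes only the round numbers quoted in the text (2-3 billion kr. per 10,000 persons and a target of about 20 billion kr.); the function name `required_labour_supply` is hypothetical and is not part of the Ministry of Finance's model.

```python
def required_labour_supply(target_bn_kr: float, effect_bn_kr_per_10k: float) -> float:
    """Persons needed for the modelled structural balance to improve by target_bn_kr.

    effect_bn_kr_per_10k is the assumed improvement (billion kr.) per 10,000 extra persons.
    """
    return target_bn_kr / effect_bn_kr_per_10k * 10_000


target = 20.0  # approx. 1 per cent of GDP in billion kr. (round figure from the text)

# Round figures from the text: 2-3 billion kr. per 10,000 persons.
for effect in (2.0, 2.5, 3.0):
    persons = required_labour_supply(target, effect)
    print(f"{effect:.1f} bn kr. per 10,000 persons -> about {persons:,.0f} persons")
```

With these round inputs the implied increase runs from roughly 67,000 persons (at 3 billion kr.) to 100,000 persons (at 2 billion kr.). The range of 60-80,000 quoted in the text corresponds to an effect of about 2.5-3.3 billion kr. per 10,000 persons.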

Why does it go wrong after 2025? The often-mentioned 'hammock' in the structural balance arises from demographic shifts (large cohorts continue to retire while comparatively small cohorts enter the labour market). As the figure shows, this is not a permanent state: after 2040 the demographics turn (all else being equal, of which only one thing can be said, namely that all else will not be equal!), the budget begins to move towards balance, and after 2050 a growing surplus appears. If the calculation is extended to 2060, the public budget will, under the assumptions used here, prove to be sustainable, i.e. public debt relative to GDP will not have grown compared with today.

The paradoxical result that emerges from the above review of the fiscal rule, under which the deficit on the public structural balance may be at most ½ per cent of GDP, is that fiscal policy should have been tightened in the period when unemployment was at its highest since the mid-1990s, whereas today, when the labour market has tightened, the stage is set for a considerable easing of fiscal policy, since the room for manoeuvre (see below) is estimated at approx. 1½ per cent of GDP, which is at the disposal of politicians.⁴

4 On Tuesday (May 2017) the government presents a new economic 2025 plan, which will maintain the target that reforms are to raise employment by 55,000-60,000 persons more than is already in prospect.

According to the FM model, the 2,500 jobs that Løkke has 'created' so far stem from a cap on cash benefits (kontanthjælpsloft), an integration benefit and a reduction of the car registration tax (May 2017).

The calculation of the economic room for manoeuvre is highly uncertain

In the 2025 plan from May 2017, the fiscal room for manoeuvre is, as mentioned above, calculated at 37 billion kr.

According to the plan, the room for manoeuvre can be increased by a further just under 15 billion kr. to a total of 50.6 billion kr. (note the precision with which the figure is published). The increase is stated to result from several of the so-called labour-market reforms (not specified further), all aimed at increasing the supply of labour.

These calculations are highly hypothetical and based overwhelmingly on the Ministry of Finance's calculation model, a model best known for its optimistic view that 'the upswing will come next year' and that 'an increased supply of labour leads to a corresponding increase in employment' within a few years. These assumptions have not so far matched reality.

The calculation of the room for manoeuvre is based on the model's assumption of stable future annual GDP growth of just under 2 per cent, while unemployment disappears by itself (even though labour supply is assumed to increase quite considerably). Employment is thus expected to rise 'by itself' up to 2025. On top of this, the government wishes, through still more reforms, to increase the supply of labour by a further 60,000 persons, who in the Ministry of Finance's model likewise find employment 'by themselves'. Hurrah, how well the Danish economy is doing (according to the model).

If such optimistic assumptions are used as the basis for calculating public revenues and expenditures, it is almost like drawing 'money from a cash machine'. The tax base grows steadily by 2 per cent a year, and expenditure on cash benefits and unemployment benefits falls as unemployment falls and employment rises. Public consumption is more or less frozen, since the modest real growth of 0.3 per cent a year arises primarily because public employees are assumed to run faster.

The room for manoeuvre thus consists overwhelmingly of 'Monopoly money' (matador-penge) drawn out of the Ministry of Finance's calculation model and based on a series of modelling assumptions that reflect only to a modest extent the economic reality that has characterised the Danish economy over the past 10-20 years.

But what is the real room for manoeuvre in the Danish economy?

The real room for manoeuvre in the Danish economy ought to be measured with reality as the starting point. Here there has been a considerable unemployment problem over the past ten years. A very considerable production potential has been lost because fiscal policy was planned on the notion that the upswing would come by itself next year, which left the Danish economy trailing along in the slow lane.

There has been a wholly unnecessary focus on public finances, which are not in themselves a real-economy constraint, at least not for countries with their own currency. This is a circumstance of which, among others, the United Kingdom, Sweden and Poland have made very active use. Denmark differs from these three countries, however, in having tied the Danish krone to a fixed exchange rate against the euro. This imposes a particular requirement that the current account of the balance of payments is in surplus, so that no acute need to borrow abroad arises.

This requirement of a current-account surplus has, however, been more than amply met over the past 10-20 years, so it places no separate constraint on the economic policy that could be pursued. On the contrary, holding foreign-exchange reserves of more than 500 billion kr. borders on macroeconomic waste.

What do the imbalances look like in reality? Let us take them one by one.

It can be said with reasonable certainty that there is still considerable unemployment. The surveys based on interviews, which are therefore internationally comparable, indicate that approx. 170,000 persons are without work but ready to take a job if one could be found. That is around twice the official unemployment figure, which counts only persons receiving unemployment benefits or cash benefits. The difference between the figures partly reflects how many have dropped out of the unemployment benefit system. The figure does, however, also include a considerable number of students looking for work. But there is an unnecessarily high level of unemployment that clearly has not disappeared, and will not disappear, by itself, provided, that is, that the unemployed are asked at all and included in the statistics.

Figure: Gross unemployment, net unemployment and LFS (AKU) unemployment, seasonally adjusted (thousands, monthly, 2015-2017). LFS (AKU) unemployment stands at 169,000. [Figure not reproduced.]

Source: Statistics Denmark (Danmarks Statistik), Nov. 2017.

The second important element in the real-world macroeconomic room for manoeuvre is the balance-of-payments surplus. Never has the Danish economy had a surplus as large as today's. It amounts to no less than 170 billion kr. Measured relative to GDP it is as large as Germany's! Never has the Danish economy been in a stronger international position. Figuratively speaking, money is pouring into the country, as the bulging foreign-exchange reserves in the vaults of Danmarks Nationalbank also testify. A great deal of money is earned on Danish exports. That is a fine result of the responsible collective agreements concluded by the social partners. But the balance-of-payments surplus is also an expression of the fact that more is saved in Denmark than is invested. That is a macroeconomic problem that ought to be corrected.

The balance-of-payments surplus provides precisely the room for manoeuvre needed to increase investment in Denmark. The future of the Danish economy cannot be built on the euro claims (and government bonds) piling up as barren foreign-exchange reserves (approx. 500 billion kr.) in Danmarks Nationalbank or in the Danish pension funds.

The large balance-of-payments surplus is an expression of the fact that the level of investment in Denmark is too low. This applies to both private and public investment (in a broad sense). After all, it is investment in new technology, environmental protection, broadband networks, high-speed trains, health and, not least, research and education that is to ensure that the Danish economy can also create enough well-paid jobs in the future. It should be remembered here that, with the legislation on the labour market and pensions already adopted, labour supply will rise by up to 200,000 persons over the next ten years, so it is hardly labour as such that will be in short supply but, if anything, labour with the right qualifications.

Conclusion on the macroeconomic room for manoeuvre

The macroeconomic room for manoeuvre in the Danish economy is very considerable. It can be gauged with reasonable precision by looking at, respectively, current unemployment measured by international standards and the balance on the current account of the balance of payments. As described above, both indicators show that there is a considerable unused real-economy room for manoeuvre. As the figure below shows, it manifests itself as a historically large savings surplus in the private sector, a surplus that can primarily be explained by a low level of domestic investment, particularly in the private sector but also in the public sector. The latter, not least, would be relatively easy to remedy through a targeted fiscal strategy aimed at establishing sustainable development. Unfortunately, both the Budget Act and the EU's Stability Pact stand in the way of such a strategy.

Figure: Private saving and real investment (vertical axis from 0.10 to 0.28, 1975-2015). [Figure not reproduced.]

Source: OECD, Economic Outlook, 2016.

Taxes, Tesla, and Government Takings

Asplund, Marcus, CBS, Jinkins, David, CBS, Lutz, Chandler, CBS, and Paizs, Gyorgy

Extended Abstract, January 4, 2018

In December of 2015, the Tesla Model S, an expensive, fully electric luxury sedan, was the best-selling personal vehicle in Denmark. It is unusual for a luxury car to be a best seller. It is even more unusual that it was an electric car. Indeed, this was the first time an electric car had topped sales charts in any major market (Electrek 2016).¹ Immediately following the sales surge was a dramatic sales ebb. Very few new Teslas were sold in the first six months of 2016. The reason for this dramatic variation in the Tesla market was a change in tax law passed by the Danish parliament in October 2015. The law ended a sales tax exemption for alternative-fuel vehicles. Buyers rushed to purchase and register their Teslas before the new tax regime took effect on the 1st of January 2016. In this paper we document the market's reaction to the tax policy change, in particular the surge in sales of new Teslas in late 2015 and the reaction of the used car market in 2016. Using data we scraped from the most popular online used car marketplace in Denmark, we document how prices and characteristics of listed used Teslas changed from 2015 to 2016. We use these data to look for evidence of speculation, that is, registering Teslas before the tax change and selling them afterwards for a profit. Naturally, some of the Tesla purchases were tax arbitrage by consumers who wanted to avoid taxes themselves.

The Danish government chose to announce the tax change well before it was implemented. A natural question is which parties benefited from the way the tax change was rolled out. We argue that there is little evidence of speculation on the used car market. Controlling for many observable characteristics, the price of used Teslas increased in this period. Instead, our evidence is consistent with the idea that consumers who were planning to upgrade to a newer Tesla model in 2016 instead upgraded in 2015, listing their older model on the used car marketplace. It is therefore likely that final consumers of new Teslas gained from the tax rollout: they were able to avoid the new tax by purchasing in 2015. The big loser in our calculations is the government.

By announcing the tax change in advance, the government lost revenues from sales. We perform a back-of-the-envelope calculation and find that the government's losses were on the order of hundreds of millions of Danish kroner (tens of millions of euros). Our baseline estimate is that the government lost more than 350 million Danish kroner. Since those who would have paid the taxes are purchasers of luxury vehicles, the rollout of the tax change might be thought of as a subsidy to the wealthiest Danes.

1 Tesla was producing only 1,000 Model S cars a week in 2015: https://transportevolved.com/2015/05/06/tesla-motors-posts-q1-2015-losses-due-to-strong-dollar-high-capital-expenditures-hits-1000-carweek-model-s-production/

Contracting Out Welfare-to-Work Services

Lars Skipper and Kenneth Lykke Sørensen1

Aarhus University, CAFE

December 2017

Extended Abstract

Using private providers to improve public-sector efficiency by spurring competition and innovation is common in many countries. The relative performance of providers vis-à-vis Public Employment Services (PES) in the welfare-to-work industry has therefore been actively discussed for years, e.g. in the UK (Burgess and Ratto (2003)), the US (Finn (2007)), Sweden (Bennmarker, Grönqvist, and Öckert (2013)), the Netherlands (Koning and Heinrich (2013)), Australia (Struyven and Steurs (2005)), and France (Behaghel, Crépon, and Gurgand (2014)). By providers, the literature understands private or semi-private firms supplying public services under a contract with the government.

There is, however, limited knowledge about the selection mechanisms behind the PES's decisions to contract out welfare services. Yet shedding light on this process is critical for interpreting existing estimates of the relative performance of providers. If the PES refer when capacity is reaching its constraint, all current studies are irrelevant from a policy perspective, as differences between mean benefits and mean costs can be dwarfed by marginal cost. There is also the question of learning: do the PES appear to re-optimize over contract periods and settle for specific providers? If they re-mix, what is driving this decision? It likely matters, for example, whether the PES only assign certain types of unemployed individuals, whether they only assign under special economic circumstances, and which providers they choose. The assignment choice typically involves several interdependent steps with multiple decision makers. First, PES management must decide whether or not to contract out part of the stock of unemployed workers. Next, either management or the caseworker needs to decide which workers to refer. Lastly, the PES needs to decide which particular provider to choose for each unemployed individual being referred. This process is potentially very complex and depends on the state of the economy as well as the providers available to the PES. Given the complexity of the delivery of welfare services, the choice of provider should probably be based on criteria other than merely the contract price, such as the labor market prospects of the individual unemployed worker, the success rates of the providers, etc. (see e.g. Carpineti, Piga, and Zanza (2006) for an overview of European and US public procurement practices).

1 Corresponding author: Kenneth Lykke Sørensen, Department of Economics and Business Economics, Aarhus University, Fuglesangs Allé 4, DK-8210 Aarhus V, Denmark, [email protected]. We thank participants at the annual meeting of the Danish Econometric Society, the fourth annual conference of the International Association for Applied Econometrics, Sapporo, Japan, and the 29th EALE conference 2017, St. Gallen, Switzerland, for valuable comments, as well as Rigsrevisionen, the Danish national audit office, for providing the data. The views expressed herein are those of the authors and do not necessarily reflect the views of the Danish national audit office. Finally, Sørensen gratefully acknowledges financial support from the Danish Council for Independent Research | Social Sciences (grant no. FSE 4182-00281).

We model selection strategies of the PES when given the option of referring unemployed workers to different providers under a common agreement contract with provider-specific prices. We explicitly model the selection made by the PES in the presence of a Danish national tender, in which providers placed bids in a sealed-bid auction. The tender contained two target groups of workers, and all providers had to apply with a specific price for one or both target groups in one or more employment regions.

We shed light on three fundamental questions regarding the outsourcing of welfare-to-work services by the PES: (i) Why and under what circumstances does management at a PES choose to contract out the core delivery of services? (ii) Do the PES choose randomly which unemployed workers to contract out, or do they refer workers according to employability or other observable characteristics? (iii) After choosing which individuals to refer, do they allocate them randomly to available providers, or do they select providers based on identifiable criteria such as price and past experience with the providers?

We find that the PES systematically use the option of contracting out welfare-to-work services as a means of leverage when they face capacity constraints. We also find that the choice of which individuals to refer depends on employability, which is a function of schooling, expenses already incurred on the unemployed in terms of labor market programs, experience, age, gender, and other observable individual characteristics. Finally, we find that the contract price is particularly important for the choice to contract out older workers.

References

Behaghel, Luc, Bruno Crépon, and Marc Gurgand (2014), Private and Public Provision of Counseling to Job Seekers: Evidence from a Large Controlled Experiment, American Economic Journal: Applied Economics 6, no. 4:142-174.

Bennmarker, Helge, Erik Grönqvist, and Björn Öckert (2013), Effects of Contracting Out Employment Services: Evidence From a Randomized Experiment, Journal of Public Economics 98:68-84.

Burgess, Simon and Marisa Ratto (2003), The Role of Incentives in the Public Sector: Issues and Evidence, Oxford Review of Economic Policy 19:285-300.

Carpineti, Laura, Gustavo Piga, and Matteo Zanza (2006), The Variety of Procurement Practice: Evidence from Public Procurement, in Nicola Dimitri, Gustavo Piga, and Giancarlo Spagnolo (eds.), Handbook of Procurement, Cambridge University Press, 14-46.

Finn, Dan (2007), Contracting out welfare to work in the USA: delivery lessons, Tech. rep., Department for Work and Pensions, Research Report No 466.

Koning, Pierre and Carolyn J. Heinrich (2013), Cream Skimming, Parking, and Other Intended and Unintended Effects of High-Powered, Performance-Based Contracts, Journal of Policy Analysis and Management 32, no. 3:461-483.

Struyven, Ludo and Geert Steurs (2005), Design and redesign of a quasi-market for the reintegration of jobseekers: empirical evidence from Australia and the Netherlands, Journal of European Social Policy 15, no. 3:211-229.

The Geography of Foreign Direct Investment's spillover effects

Evidence from Denmark

Ditte Håkonsson Lyngemark1,3, Ismir Mulalic1,2 and Cecilie Dohlmann Weatherall1

1 Kraks Fond - Institute for Urban Economic Research; 2 DTU Management Engineering, Technical University of Denmark; 3 Department of Geosciences and Natural Resource Management, University of Copenhagen.

Keywords: firms, productivity, foreign direct investment.

JEL codes: D22, D24, R11, R12.

Extended abstract

FDIs are expected to be an important driver of economic growth and development. Policy makers in many countries therefore focus on policies and strategies that will attract FDIs (OECD 2003; Shatz and Venables 2003). Denmark is no exception, and institutions at all levels stress the importance of attracting FDIs (e.g. Copenhagen Capacity and Invest in Denmark). Emphasis is often placed on the fact that FDIs contribute to increased productivity and competitiveness of domestic firms and that knowledge spillovers are spatially bounded (Beugelsdijk and Mudambi 2013) (see Figure 1). The technology and knowledge that the FDI holds are expected to spill over and benefit domestic firms (Javorcik 2004). However, there is also a concern that FDIs can have such a competitive advantage that domestic firms are harmed (Görg and Greenaway 2004; Lipsey and Sjöholm 2004). In addition, Audretsch (2003) states that there is no evidence that technological advantages of the foreign firm influence the technological advantages of domestic firms. Previous studies have looked into vertical and horizontal spillover effects related to sector space (Javorcik 2004; Keller 2009); we look into the geographic dimension and investigate how proximity to FDIs influences the spillover effects on domestic firms.

This paper uses a unique dataset on FDIs (including Greenfield Investments (GIs) and Mergers and Acquisitions (MAs)) in Denmark and a full population of geo-coded micro-level panel data for Danish firms. Furthermore, we have information about the regional variation in FDI offices in Denmark. The data enable us to estimate the effects of GIs or MAs on firm-level productivity. We also investigate whether geographical or sectoral distance influences the effects of FDIs.

We estimate a production function with composite errors to determine productivity, applying the identification strategy of Levinsohn and Petrin (2003) to estimate Total Factor Productivity (TFP). We then relate the estimated TFP to a measure of GIs at the municipality level in order to investigate the extent to which changes in GIs affect the productivity of Danish firms. To identify the causal effect of GIs, we apply different fixed-effects specifications, instrumental variables and controls for geography. Since GIs might also attract economic activity, increasing local density, which in turn could affect productivity through agglomeration benefits, we also estimate models that control for local employment densities. We provide estimates of these effects both at the aggregate level and at the level of individual sectors.
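The role of the fixed-effects step can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not the authors' specification: log TFP is simulated directly rather than estimated with the Levinsohn and Petrin (2003) procedure, GI exposure varies by firm and year rather than by municipality, and all names (`within_estimator`, `gi`, `log_tfp`) and parameter values are assumptions made for illustration.

```python
import numpy as np


def within_estimator(y, x, groups):
    """Slope of y on x after subtracting group (here: firm) means from both."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    _, idx = np.unique(np.asarray(groups), return_inverse=True)
    counts = np.bincount(idx)
    y_dm = y - (np.bincount(idx, weights=y) / counts)[idx]
    x_dm = x - (np.bincount(idx, weights=x) / counts)[idx]
    return float(x_dm @ y_dm / (x_dm @ x_dm))


# Synthetic panel (an assumption for illustration, not the authors' data).
rng = np.random.default_rng(0)
n_firms, n_years = 200, 10
firm = np.repeat(np.arange(n_firms), n_years)
firm_effect = rng.normal(0.0, 1.0, n_firms)[firm]          # unobserved, time-invariant
gi = 0.5 * firm_effect + rng.normal(0.0, 1.0, firm.size)   # GI exposure, correlated with it
true_beta = 0.05
log_tfp = true_beta * gi + firm_effect + rng.normal(0.0, 0.1, firm.size)

# Pooled slope ignores the firm effects; the within estimator removes them.
beta_pooled = float(np.cov(gi, log_tfp)[0, 1] / np.var(gi, ddof=1))
beta_fe = within_estimator(log_tfp, gi, firm)

print(f"pooled OLS slope:    {beta_pooled:.3f}")
print(f"fixed-effects slope: {beta_fe:.3f} (true value {true_beta})")
```

In this simulated setting the pooled slope is biased upwards because GI exposure is correlated with the unobserved firm effect, while the within estimator recovers a value close to the true coefficient. Fixed effects do not address time-varying confounders, which is one reason a study of this kind may also use instrumental variables.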

The preliminary results show a positive and significant impact of GIs on the productivity of Danish firms in the service sector. In contrast, the effect on the productivity of Danish firms in manufacturing is negative or insignificant. These preliminary results suggest that firms in the service sector benefit from increasing levels of GIs, while firms in the manufacturing sector are unaffected or negatively affected by GIs due to increased competition.

There are several contributions to the literature. First, the data do not suffer from attrition bias because they are administrative. Second, we look specifically at GI and MA spillover effects on domestic firm-level productivity, because we can distinguish between GIs and MAs within FDIs. Furthermore, by using the variation in regional FDI offices as an instrument we are able to estimate the causal effect of GIs and MAs on domestic firms' productivity and employment. Finally, we show how geography influences the FDI spillover effect.

Figure 1. Yearly average number of full-time jobs due to GIs in the municipality, 2004-2013. Map legend, divided by quartiles: None; 0.01-3.10; 3.10-7.89; 7.89-22.53; 22.53-211.60. [Figure not reproduced.]

References

Aitken BJ, Harrison AE (1999) Do Domestic Firms Benefit from Direct Foreign Investment? Evidence from Venezuela. American Economic Review 89:605-618. doi: 10.1257/aer.89.3.605

Audretsch DB (2003) Corporate Form and Spatial Form. In: The Oxford Handbook of Economic Geography. Oxford University Press, Oxford, New York, pp 333-347

Barrios S, Görg H, Strobl E (2011) Spillovers through backward linkages from multinationals: Measurement matters! European Economic Review 55:862-875. doi: 10.1016/j.euroecorev.2010.10.002

Bennedsen M, Kongsted HC, Nielsen KM (2008) The causal effect of board size in the performance of small and medium-sized firms. Journal of Banking & Finance 32:1098-1109. doi: 10.1016/j.jbankfin.2007.09.016

Beugelsdijk S, Mudambi R (2013) MNEs as border-crossing multi-location enterprises: The role of discontinuities in geographic space. Journal of International Business Studies 44:413-426.

Combes P-P, Duranton G, Gobillon L (2008) Spatial wage disparities: Sorting matters! Journal of Urban Economics 63:723-742. doi: 10.1016/j.jue.2007.04.004

Combes P-P, Gobillon L (2015) The Empirics of Agglomeration Economies. In: Handbook of Regional and Urban Economics, Volume 5. North Holland, pp 247-348

Florida RL (2012) The Rise of the Creative Class: Revisited. Basic Books

Fons-Rosen C, Kalemli-Ozcan S, Sorensen BE, et al (2017) Foreign Investment and Domestic Productivity: Identifying Knowledge Spillovers and Competition Effects. National Bureau of Economic Research

Glaeser EL, Maré DC (2001) Cities and Skills. Journal of Labor Economics 19:316-342.

Görg H, Greenaway D (2004) Much Ado about Nothing? Do Domestic Firms Really Benefit from Foreign Direct Investment? The World Bank Research Observer 19:171-197.

Görg H, Greenaway D (2001) Foreign direct investment and intra-industry spillovers: A review of the literature. Leverhulme Centre for Research on Globalisation and Economic Policy, Nottingham (United Kingdom) 38.

Haskel JE, Pereira SC, Slaughter MJ (2007) Does Inward Foreign Direct Investment Boost the Productivity of Domestic Firms? The Review of Economics and Statistics 89:482-496.

Jaffe AB, Trajtenberg M, Henderson JV (1993) Geographic Localization of Knowledge Spillovers as Evidenced by Patent. Q J Econ 108:577-598. doi: 10.2307/2118401

Javorcik BS (2004) Does Foreign Direct Investment Increase the Productivity of Domestic Firms? In Search of Spillovers Through Backward Linkages. American Economic Review 94:605-627. doi: 10.1257/0002828041464605

Keller W (2009) International Trade, Foreign Direct Investment, and Technology Spillovers.

Keller W, Yeaple SR (2009) Multinational Enterprises, International Trade, and Productivity Growth: Firm-Level Evidence from the United States. The Review of Economics and Statistics 91:821-831.

Konings J (2001) The effects of foreign direct investment on domestic firms. Economics of Transition 9:619-633. doi: 10.1111/1468-0351.00091

Levinsohn J, Petrin A (2003) Estimating Production Functions Using Inputs to Control for Unobservables. Review of Economic Studies 70:317-341. doi: 10.1111/1467-937X.00246

Lipsey R, Sjöholm F (2004) Host Country Impacts of Inward FDI: Why Such Different Answers?

Nielsen BB, Asmussen CG, Weatherall CD (2017) The location choice of foreign direct investments: Empirical evidence and methodological challenges. Journal of World Business 52:62-82. doi: 10.1016/j.jwb.2016.10.006

OECD (2003) OECD Checklist for Foreign Direct Investment Incentive Policies.

Rosenthal SS, Strange WC, Liu CH (2016) The Vertical City: Rent Gradients, Spatial Structure, and Agglomeration Economics.

Shatz HJ, Venables AJ (2003) Geography of International Investment. In: The Oxford Handbook of Economic Geography. Oxford University Press, Oxford, New York, pp 125-145

Using surveys for experimental designs

By

Mogens Dilling-Hansen

Department of Economics and Business Economics

Aarhus University

Mail: [email protected]

Abstract

"On the one hand, experimental designs, and in particular the use of experimental laboratories, are central to obtaining data for scientific articles. On the other hand, judging by their prevalence, it is not obvious that traditional surveys are on the way out. This article shows how a randomised block design can be a useful tool for checking sample validity. The empirical analysis and illustration is one of the organisational classics: person types, here defined as Promotion/Prevention types, can influence how criticism works ... although it can generally be difficult to receive criticism, the unreasonable kind can sometimes actually turn out to increase productivity!"

1 Introduction

Why measure something that is unmeasurable? And how are true causal relationships found? These are relevant questions to ask in social science research. Analyses are often based on personal opinions, assessments, attitudes and the like, which are hard to delimit and even harder to express. At the same time, the statistical relationships found may well be significant, but an account of the underlying causal relationship is typically absent.

Given the problems above, which are typically created by (poor) questionnaire-based studies, it is not surprising that data obtained from experiments and experimental designs are a must when publishing scientific articles.

Nevertheless, there is still a need for questionnaire-based analyses. This conclusion rests partly on a resource argument, since true experiments are usually very costly, and partly on the fact that there are examples of analyses in which data from experimental designs are no better than traditional survey data.

This article presents the methodological arguments for choosing questionnaire data over experimental data in an analysis of the effects of different forms of feedback. The starting point is a classic article by Higgins et al. (2001), which demonstrates systematic differences in the reaction to positive/negative criticism depending on the person type of the individual.

The article is organised as follows. Section 2 presents experimental designs with a focus on data validity, and Section 3 presents the analytical problem posed in Higgins et al. (2001). Section 4 presents the questionnaire-based design as a workable alternative to an experimental design, and Section 5 then presents some results that support the validity of the chosen design. Section 6 concludes and discusses the scope of the results.

2 Experiments and experimental designs

The use of natural experiments is desirable for all analyses that seek to demonstrate causal relationships. The reason is the obvious one that causality can only be demonstrated if there is a clear temporal effect AND there are no other "extraneous" variables that muddy the picture (concomitant variation). These experiments have two obvious drawbacks: it is often impossible to construct "natural settings", and experiments take time and are therefore costly compared with the immediate alternative, questionnaire surveys.

There are many types of experiments that seek intermediate solutions in which the experiment is still preserved, see e.g. Engineering Statistics Handbook (2017); all types of data collection methods can have problems with data validity if the method is applied incorrectly or if specific problems in the data collection are not dealt with.

Malhotra et al. (2012) is one among many that demonstrate these structural problems in collecting data. Whether internal or external validity is concerned, all measurements are exposed to various forms of 'contamination' that create bias in the data. Any measured value can be represented by the following accounting equation:

X = 'true value of X' + H + MA + MT + IT + I + SR + MO

where X is the value sought. H is 'history', MA is 'maturation', testing effects in general are split into MT ('main testing effect') and IT ('interactive testing effect'), I is 'instrumentation', SR is 'statistical regression', and finally a respondent may for some reason not answer (MO for 'mortality').

Malhotra et al. (2012) present this relationship as a defence of experimental data, because the other effects are judged to be smaller than in a corresponding collection of data by means of questionnaires.

3 Regulatory focus and feedback effect

Behind the somewhat trivial observation that we differ as person types lies an implicit assumption that these differences also affect individual behaviour. In other words, behaviour can be expected to be explained in part by differences in personality types.

The theory of 'regulatory focus' has been put forward in several contexts by E. T. Higgins, see e.g. Higgins et al. (1997) and Higgins et al. (2001). The basic idea of the theory is that individuals' outlook on life, and on working life in particular, is governed by two motivational systems, the promotion and prevention systems.

If an individual is governed by the promotion system, it is important to be proactive, to make sure to do something actively in order to achieve something (or avoid something unwanted). For this person type it is important to accomplish something ... also in working life, and this person type will therefore typically display behaviour characterised by accomplishing something positive, something that can be attributed to the individual's own behaviour. The joy of achieving something positive is dominant for a promotion type.

For an individual governed by the prevention system, behaviour is fundamentally driven by a desire for security, and the individual will therefore be characterised by the wish to avoid something. A typical way of avoiding something unwanted is to be dutiful, to do what is expected of the person in a given (job) situation. The joy of avoiding something negative is dominant for a prevention type.

There is a good deal of common sense in the account of the two psychological forces that govern individual behaviour, and one of the particularly attractive properties of the theory and its person types is that it carries over directly to individual behaviour in the labour market. Some person types are markedly proactive, and it is important to them that their effort can be seen and measured: promotion types want to accomplish something positive, something that can be seen and attributed to the individual. Other person types are very dutiful in the job situation, and immediate satisfaction is obtained by performing the job without questioning the reasonableness of the job content: prevention types seek to avoid conflicts by dutifully doing the job without necessarily creating anything new.

Three important challenges for this theory can be identified. First, it is difficult to measure the degree of what is generally called regulatory focus, i.e. to measure the strength/degree of prevention and promotion types. How is the degree of regulatory focus measured? Second, the literature is remarkably silent on possible relationships between the two dimensions: is an individual dominated by either one type or the other (a one-dimensional scale), or are there two independent systems in which an individual can score high on both scales? Third, the interest in this description of person types depends fundamentally on behaviour being determined by person type. Does behaviour in a given work situation depend on person type?


Table 1. Event reaction questionnaire

This set of questions asks you about specific events in your life. Please indicate your answer to each question by circling the appropriate number below it.

1. Compared to most people, are you typically unable to get what you want out of life? [-0.65]
   1 2 3 4 5 (never or seldom / sometimes / very often)

2. Growing up, would you ever "cross the line" by doing things that your parents would not tolerate? [-0.80]
   1 2 3 4 5 (never or seldom / sometimes / very often)

3. How often have you accomplished things that got you "psyched" to work even harder? [0.37]
   1 2 3 4 5 (never or seldom / a few times / many times)

4. Did you get on your parents' nerves often when you were growing up? [-0.65]
   1 2 3 4 5 (never or seldom / sometimes / very often)

5. How often did you obey rules and regulations that were established by your parents? [0.56]
   1 2 3 4 5 (never or seldom / sometimes / always)

6. Growing up, did you ever act in ways that your parents thought were objectionable? [-0.84]
   1 2 3 4 5 (never or seldom / sometimes / very often)

7. Do you often do well at different things that you try? [0.54]
   1 2 3 4 5 (never or seldom / sometimes / very often)

8. Not being careful enough has gotten me into trouble at times. [-0.55]
   1 2 3 4 5 (never or seldom / sometimes / very often)

9. When it comes to achieving things that are important to me, I find that I don't perform as well as I ideally would like to do. [-0.51]
   1 2 3 4 5 (never true / sometimes true / very often true)

10. I feel like I have made progress toward being successful in my life. [0.81]
   1 2 3 4 5 (certainly false / certainly true)

11. I have found very few hobbies or activities in my life that capture my interest or motivate me to put effort into them. [-0.53]
   1 2 3 4 5 (certainly false / certainly true)

Source: Higgins et al. (2001). Numbers in square brackets are factor loadings based on oblimin (oblique) rotation. Questions/items 1, 3, 7, 9, 10 and 11 form one factor, items 2, 4, 5, 6 and 8 the other factor.

Table 1 above is taken from Higgins et al. (2001), and the answers to the first two questions raised above can be found in the table. The table shows the 11 generic questions which together create two measurement scales for the degree of promotion/prevention type. The questions are collectively called the 'scale construct for RFQ' (Regulatory Focus Questionnaire), and with reference to an unpublished manuscript it is emphasised that this construct forms a measurement scale for the degree of promotion type and prevention type respectively. Notwithstanding that the items are measured on five different ordinal scales from 1 to 5, the two dimensions are determined by an oblique factor analysis (oblimin rotation), and it is concluded that

Promotion scale items are variables 1, 3, 7, 9, 10 and 11

Prevention scale items are variables 2, 4, 5, 6 and 8¹

1 Factor loadings are shown in the square brackets. The direction of the measurement scale has been reversed for some of the questions, so the signs in Table 1 do not directly express the association with the two factors. For example, question/item 5 is reversed relative to question 6.
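As a minimal sketch of how two scale scores could be computed from the 11 items, the Python snippet below reverse-scores the items that carry a negative loading in Table 1 and averages within each scale. The reverse-scoring rule (6 minus the answer on a 1-5 scale) and the use of simple means are assumptions made for illustration; the article itself determines the dimensions by factor analysis. The function name and the example respondent are hypothetical.

```python
# Items belonging to each scale, as listed in the text.
PROMOTION_ITEMS = (1, 3, 7, 9, 10, 11)
PREVENTION_ITEMS = (2, 4, 5, 6, 8)

# Assumption: items with a negative loading in Table 1 are reverse-scored.
REVERSED_ITEMS = frozenset({1, 2, 4, 6, 8, 9, 11})


def rfq_scores(answers):
    """Return (promotion, prevention) mean scores from answers on a 1-5 scale.

    `answers` maps item number (1-11) to the circled value (1-5).
    """
    def scored(item):
        value = answers[item]
        if not 1 <= value <= 5:
            raise ValueError(f"item {item}: answer {value} is outside 1-5")
        # Reverse-scoring flips the scale: 1 <-> 5, 2 <-> 4, 3 stays 3.
        return 6 - value if item in REVERSED_ITEMS else value

    promotion = sum(scored(i) for i in PROMOTION_ITEMS) / len(PROMOTION_ITEMS)
    prevention = sum(scored(i) for i in PREVENTION_ITEMS) / len(PREVENTION_ITEMS)
    return promotion, prevention


# Hypothetical respondent who circles 3 on every item.
neutral = {item: 3 for item in range(1, 12)}
print(rfq_scores(neutral))
```

Averaging rather than summing keeps the two scores on the same 1-5 range even though the scales have six and five items respectively.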


The two dimensions are identified by two factors (based on the eigenvalue criterion), and 'good internal reliability' has subsequently been found by a confirmatory factor analysis (α = 0.73 for the estimated promotion scale and α = 0.80 for the estimated prevention scale). The presentation of these relatively robust results in Higgins et al. (2001) is followed up in Van Dijk et al. (2011), where regulatory focus is defined as a trade-off between security and growth, between habits and ideals, and between gain/non-gain and loss/non-loss; a trade-off that is particularly important for the individual's reaction to an external influence ... one example of an external influence is how feedback on a work situation is delivered. The individual reaction is measured partly by soft measures such as motivation, but Van Dijk et al. (2011) also set up hypotheses for the expected change in performance. The hypotheses are all of the type

Performance = f(regulatory focus, task type, control)

where performance is measured partly as the change in efficiency and partly as the change in creativity, regulatory focus is measured on the two-dimensional promotion/prevention type scale, and task type is the dosage of the extraneous variable in the form of feedback.

Feedback effects

Probably the most important reason for using experiments for analyses of the kind above is that it is possible to estimate causal relationships with full control over 'extraneous variables'. In the example above from Van Dijk et al. (2011), the hypotheses are based on an expected change in efficiency, both in terms of general productivity and in terms of creativity, and this change depends on the person type (regulatory focus) and on the feedback (task type) that is used. The most important of these effects is that positive feedback is not always the best form of feedback if efficiency is to be increased, because people who are predominantly governed by a prevention focus will actually INCREASE their work effort after negative feedback ... in order to increase job security and to avoid conflicts, negative feedback can promote work effort.

The question is simply how the study should be designed in order to measure these effects. In an ideal world without budget restrictions, an experimental design would be the obvious choice:

i) In the first round, the individual's regulatory focus is determined. This part can, if desired, be determined before the experiment, e.g. by means of a traditional questionnaire survey.

ii) The experiment is a classic pre-test post-test experimental design, in which the effect is measured as the change in the performance variable.
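The pre-test post-test logic of step ii) can be sketched in a few lines of Python: each individual contributes one change score (post minus pre), and the change scores are averaged per feedback type. This is only an illustration under stated assumptions; the function name is hypothetical and the scores are made up, not data from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_feedback(records):
    """Average post-minus-pre change in performance per feedback type.

    `records` is an iterable of (feedback, pre_score, post_score) tuples.
    """
    changes = defaultdict(list)
    for feedback, pre, post in records:
        changes[feedback].append(post - pre)
    return {feedback: mean(values) for feedback, values in changes.items()}


# Made-up scores (correct answers out of ten), for illustration only.
records = [
    ("negative", 7, 8),
    ("negative", 8, 8),
    ("neutral", 8, 8),
    ("neutral", 6, 7),
    ("positive", 9, 8),
    ("positive", 7, 8),
]
print(mean_change_by_feedback(records))
```

Working with change scores removes stable differences in skill between individuals, so the comparison across feedback types concerns the change itself.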

There are still two fundamental problems, which can be attributed to both MT and IT (testing effects) and MA (maturation effect) in the classification taken from Malhotra et al. (2012). Only one experiment can be carried out per individual, since the effect of subsequent doses cannot be assumed to be independent of the first dose (negative feedback in the first round must be expected to influence the effect of any subsequent positive feedback). In addition, an important drawback of experiments is that they take time, and this is not only a cost problem: it must also be expected that individuals who take part later in the experiment will be influenced by knowing in advance that the task type (the type of feedback) has not been chosen on objective grounds, but rather for experimental reasons.

The following section shows how a specific study design based on questionnaire data from Danish individuals measures the effect of feedback and its dependence on regulatory focus; the choice is motivated both by resource considerations and by the wish to minimise any MT/IT and MA effects.

4 Experimental design for the Danish study

The purpose of the Danish study is to examine whether it is still (2017) reasonable to characterise Danes solely by their regulatory focus, and whether there are relationships between the degree of promotion/prevention type and actual behaviour when the individual is exposed to a very common event in the labour market: feedback. Do praise (positive feedback), criticism (negative feedback) or neutral comments (neutral feedback) affect the individual's ability to work?

The hypotheses follow the theoretical framework of Higgins et al.

H1: Is it reasonable to assume the existence of a promotion and a prevention dimension?

H2: Is efficiency affected by feedback?

H3: Is creativity affected by feedback?

H4: Does the effect of positive and negative feedback differ?


The problem is causal in the sense that performance is explained partly by the individual's relation to the surroundings (regulatory focus), the type of influence (task type) and "other" factors (extraneous variables), and the obvious choice of research design is therefore an experimental one. A "true experimental design" would be very difficult to carry out, and a statistical laboratory experiment, see Engineering Statistics Handbook (2017), would first and foremost be resource-demanding; but a further problem with the method is that the total measured effect of the 'treatment', cf. Sections 2 and 3, includes a number of other factors that are affected by the chosen design:

'treatment' = X + H + MA + MT + IT + I + SR + MO

The problem with possible effects of MA (maturation), MT (main testing effect) and IT (interactive testing effect) is that they must all be expected to be larger than with a traditional questionnaire method: given that they are asked only once, respondents cannot be influenced by experience from similar questions (MA and MT), and interaction effects (IT) can be avoided if a factorial design is used, based on simple random assignment of respondents to the categories (created by the different treatments).

Against this background, the relationships are analysed using an online questionnaire survey with randomised control of the treatment:

a. The population is students in the first two years of the business economics programmes at Aarhus University in spring 2017. Homogeneity with respect to age is obtained, while the choice of business economics students is purely one of convenience. The total sampling frame is thus approx. 1,500, and responses were received from 783 students in the collection period 25 March to 8 April 2017; this gives a response rate of approx. 52%. Prize draws (10 cinema tickets) were used as a tool to raise the response rate.

b. The questionnaire is in English.

c. Performance is measured twice (before and after treatment) and in two ways:

o Efficiency is measured by testing algebraic skills: ten questions (of increasing complexity) must be solved in limited time (3 minutes, no check of time) before treatment, and ten new questions after treatment. A typical example is 6/2+4/2+1-(-2)=4, true/false? The distribution of correct answers is slightly left-skewed in this study (the average number of correct answers is approx. 8)


o Creativity is measured by assessing possible uses of a familiar object. Typical example: state possible uses of a paper clip (2 minutes, no check of time). The number of answers is again left-skewed, but with approx. the same mean.

d. The treatment (task type) is three different forms of feedback between the first and second test:

• Negative feedback: regardless of the number of correct answers (efficiency) or uses (creativity), negative feedback is given ("Not so good ...")

• Neutral feedback: regardless of the number of correct answers (efficiency) or uses (creativity), neutral feedback is given ("Your result is registered. Please proceed further")

• Positive feedback: regardless of the number of correct answers (efficiency) or uses (creativity), positive feedback is given ("Well done! ...")

e. Randomisation is by birthday: there are 6 different combinations of performance (efficiency and creativity) and task type (3 types of feedback), and respondents activate the link that corresponds to their birthday (link 1: birthday on the 1st-5th of a month, link 2 ...). No check of birthday.
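The birthday-based assignment in item e can be sketched as a small Python function. Only the first bin (days 1-5 for link 1) is stated in the text; the remaining bins used below (five-day steps, with link 6 covering days 26-31) and the ordering of the groups are assumptions made for illustration, and the function name is hypothetical.

```python
# Assumed order of the six groups, following the row order of Table 2.
GROUPS = (
    "Efficiency - Negative",
    "Efficiency - Neutral",
    "Efficiency - Positive",
    "Creativity - Negative",
    "Creativity - Neutral",
    "Creativity - Positive",
)


def link_for_birthday(day_of_month):
    """Map the day of the month of a birthday (1-31) to a link number 1-6."""
    if not 1 <= day_of_month <= 31:
        raise ValueError("day_of_month must be between 1 and 31")
    # Days 1-5 -> link 1 (from the text); 6-10 -> link 2, ..., 26-31 -> link 6
    # (assumed continuation of the five-day pattern).
    return min((day_of_month - 1) // 5 + 1, 6)


for day in (3, 14, 31):
    link = link_for_birthday(day)
    print(day, link, GROUPS[link - 1])
```

Because the day of the month is unrelated to the questionnaire content, this kind of self-assignment behaves like a random allocation as long as respondents follow the instruction.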

Table 2 shows the number of responses received during the collection period; 27 March is the first day with registered responses, and it can be seen that the choice of birthday as the basis for randomisation works as intended, even though the validity of the birthday is not checked. The number of responses that can be used in the final analyses of the effect of feedback is 509 observations.

Table 2 Randomised design: number of responses and data quality

                         Responses, 27.03.2017    Responses, 03.04.2017    Responses, 08.04.2017
Questionnaire            OK    Not OK   Total     OK    Not OK   Total     OK    Not OK   Total
Efficiency - Negative    43    33       76        68    39       107       83    57       140
Efficiency - Neutral     36    29       65        56    34       90        64    42       106
Efficiency - Positive    44    32       76        69    32       101       81    39       120
Creativity - Negative    40    40       80        68    44       112       81    57       138
Creativity - Neutral     55    26       81        84    30       114       98    36       134
Creativity - Positive    53    32       85        83    32       115       102   43       145
Total:                   271   192      463       428   211      639       509   274      783
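The totals quoted in the text can be checked from the final counts (08.04.2017) in Table 2. The Python sketch below recomputes the response rate against the sampling frame of approx. 1,500 and the share of usable answers. It also computes a Pearson chi-square statistic against an equal split over the six groups; that check is an addition made here for illustration and is not part of the article's analysis.

```python
# Final counts (08.04.2017) from Table 2: group -> (OK, Not OK).
final_counts = {
    "Efficiency - Negative": (83, 57),
    "Efficiency - Neutral": (64, 42),
    "Efficiency - Positive": (81, 39),
    "Creativity - Negative": (81, 57),
    "Creativity - Neutral": (98, 36),
    "Creativity - Positive": (102, 43),
}
SAMPLING_FRAME = 1500  # "approx. 1,500" in the text

group_totals = [ok + not_ok for ok, not_ok in final_counts.values()]
total = sum(group_totals)
usable = sum(ok for ok, _ in final_counts.values())

response_rate = total / SAMPLING_FRAME
usable_share = usable / total

# Pearson chi-square statistic against an equal split over the six groups.
expected = total / len(group_totals)
chi2 = sum((observed - expected) ** 2 / expected for observed in group_totals)

print(f"responses: {total}, usable: {usable}")
print(f"response rate: {response_rate:.1%}, usable share: {usable_share:.1%}")
# With 5 degrees of freedom the 5% critical value is 11.07.
print(f"chi-square against equal group sizes: {chi2:.2f}")
```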


Date of birth was chosen in order to obtain an equal number of respondents in each category. Other criteria for distributing respondents across the six groups in Table 2 were judged less suitable²; but the result is in any case somewhat peculiar: the last two links listed in the contact e-mail are also the last two groups in Table 2, and throughout the period there has been a slight over-representation of these two groups.

5 Some results

The analyses of the effects of feedback are based on a two-step procedure: (i) computation of regulatory focus and (ii) estimation of the effect of feedback. Table 3 shows the factor analysis that underlies the estimation of the individual's focus with respect to prevention and promotion (regulatory focus). The factor analysis is significant overall, and based on the communalities it can be seen that all 11 generic questions appear to work well. The only exception is question 7, which may have to be omitted from the subsequent analyses (the question concerns the respondent's assessment of own effort).

Table 3 Estimation of "Regulatory focus" based on all observations

Final Communality Estimates
s_1     0.48561
s_2     0.72646
s_3     0.64786
s_4     0.64484
s_5     0.42519
s_6     0.40191
s_7     0.32714
s_8     0.69924
s_9     0.65456
s_10    0.46936
s_11    0.46615

Rotated Factor Loading
        Factor 1     Factor 2
s_2     0.851029     0.015270
s_8     0.836595     -0.005315
s_3     0.807057     -0.044558
s_4     0.804563     -0.025761
s_6     0.628543     0.047790
s_9     -0.035825    0.810984
s_1     -0.102005    0.697144
s_10    -0.017774    0.686220
s_11    -0.077030    0.684270
s_5     0.087619     0.639533
s_7     0.142048     0.543354

Variance Explained by Each Factor
Factor      Variance   Percent   Cum Percent
Factor 1    3.1722     28.838    28.838
Factor 2    2.8029     25.481    54.319

Note. Variables S_3 and S_5 are swapped relative to the variable names in Table 1
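The percentages in the "Variance Explained by Each Factor" panel can be reproduced by dividing each factor's variance by the number of items. The Python sketch below assumes that the factor analysis is based on standardised items, so that the total variance equals the number of items (11); that assumption is not stated in the article but is consistent with the reported figures.

```python
N_ITEMS = 11  # assumed total variance: one unit per standardised item

# Variance explained by each factor, as reported in Table 3.
variance_explained = {"Factor 1": 3.1722, "Factor 2": 2.8029}

cumulative = 0.0
percentages = {}
for factor, variance in variance_explained.items():
    percent = 100 * variance / N_ITEMS
    cumulative += percent
    percentages[factor] = (percent, cumulative)
    print(f"{factor}: {percent:.3f} % (cumulative {cumulative:.3f} %)")
```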

2 Local football experts have thus warned against using month of birth, because in their analyses there is a clear over-representation of particularly talented persons born in the first months of the year.


Table 4 shows the same estimations as presented in Table 3, but based only on the last group of respondents (creativity test combined with positive feedback). The sample is smaller than the full data set in Table 3, but the results are essentially identical to those from the full sample. The unexplained over-representation of groups 5 and 6 is judged not to affect the subsequent analyses.

Table 4 Estimation of "Regulatory focus". Subsample: Creativity & positive feedback

Final Communality Estimates
s_1     0.32784
s_2     0.77386
s_3     0.63459
s_4     0.68741
s_5     0.33736
s_6     0.43247
s_7     0.35683
s_8     0.65174
s_9     0.66146
s_10    0.56284
s_11    0.56360

Rotated Factor Loading
        Factor 1     Factor 2
s_2     0.881104     -0.037291
s_4     0.821097     0.076283
s_8     0.808641     -0.036886
s_3     0.797489     -0.075605
s_6     0.650480     0.065633
s_9     -0.198336    0.800534
s_11    -0.088772    0.750720
s_10    0.032583     0.747599
s_5     0.047595     0.576069
s_1     -0.003203    0.572751
s_7     0.176415     0.560396

Variance Explained by Each Factor
Factor      Variance   Percent   Cum Percent
Factor 1    3.2437     29.488    29.488
Factor 2    2.7560     25.055    54.543

Note. Variables S_3 and S_5 are swapped relative to the variable names in Table 1

Analyses of the effect of feedback on performance are not presented here. In future analyses the data will be extended with more measurements, and relative to Higgins et al. (2001) and Van Dijk et al. (2011) the question of types of performance has here been extended to cover both general skills (arithmetic) and creative skills (associations relating to a product); the type of feedback has also been extended with the neutral category, so that there are three types of feedback. The latter extension is introduced to emphasise that neutral feedback may perhaps also create changes in performance; if so, this result is merely an observation that Hawthorne effects still exist³.

3 Hawthorne effects cover a range of effects found in several organisational studies: the mere fact that an individual knows it is being observed can lead to changed behaviour.


Until these analyses are ready, we must content ourselves with some clear relationships that this analysis confirms:

• Individuals can be described by their regulatory focus on a prevention scale and a promotion scale; people who score high on one scale typically score low on the other.

• Individuals who score high on a prevention scale seek to avoid conflicts through their work effort.

• Individuals who score high on a promotion scale are very focused on showing what they themselves have achieved and do.

• Negative feedback (read: criticism) is in no way welcome among individuals with a strong promotion focus, and, in line with the classic studies, a negative feedback effect is found. Positive feedback, conversely, appears to increase performance for this group of people, but this effect is less pronounced.

• Negative feedback is not welcome among individuals who score high on a prevention scale either; but here the effect of negative feedback on performance is positive: conflicts that create instability are avoided by increasing performance. Positive criticism has even less effect for this group than for the group of 'promotion types'.

6 Concluding remarks

The analyses presented take as their starting point that experiments and the associated experimental designs are very widely used when data are collected for scientific articles. The specific problem calls into question the traditional view that experiments remove virtually all other effects from causal relationships. In this case the effects to be tested are not entirely trivial: receiving feedback (read: criticism) is not always easy, and behaviour will therefore be affected regardless of the choice of method.

The neat thing about the results based on Higgins's classification of individuals is that there is a clear intuitive understanding of the results ... when looking at feedback on the individual's effort with the aim of increasing efficiency, it is not always clear whether it is the stick or the carrot that should be chosen!


References

Engineering Statistics Handbook (2017), What is experimental design?, NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, cpt 3.

Higgins, E. T., J. Shah & R. Friedman (1997), Emotional responses to goal attainment: Strength of regulatory focus as moderator, Journal of Personality and Social Psychology, 72, pp. 515-525.

Higgins, E. T., R. S. Friedman, R. E. Harlow, L. C. Idson, O. N. Ayduk & A. Taylor (2001), Achievement orientations from subjective histories of success: Promotion Pride versus Prevention Pride, European Journal of Social Psychology, 31, pp. 3-23.

Malhotra, N. K., D. F. Birks & P. Wills (2012), Marketing Research. An applied approach, Pearson Education Limited, Essex.

Van Dijk, D. & A. N. Kluger (2011), Task type as a moderator of positive/negative feedback effects on motivation and performance: A regulatory focus perspective, Journal of Organizational Behavior, 32, pp. 1084-1105.


Educational choice and inter-regional migration

The causal effect of secondary education on migration out of less-urban areas

Elise Stenholt Sørensen¹ and Anders Holm²,³

Symposium i anvendt statistik, 22-24 January 2018

The aim of this study is to identify how educational choice influences migration out of rural areas among youth. Net migration from rural to large urban areas has increased significantly during the last decades (United Nations, 2015; Det Økonomiske Råd, 2015; Van Der Gaag & Van Wissen, 2008). Understanding the reasons behind this may be of interest for determining the welfare of both those leaving and those staying behind. Looking at the Danish case, we see that the main driver behind out-migration from rural areas is an increasing share of young adults migrating towards the large urban areas. At the same time, the share of young adults completing an upper secondary education has increased significantly. Figure 1 below illustrates how the growth in youth migration towards urban areas largely follows the same trend as the growth in educational attainment from 1988 to 2013.

Figure 1: Growth in migration and educational attainment among youth, 1988 - 2013

[Figure: dashed line, share of 18-24 years old moving to a large city (left axis, 8% to 20%); solid line, share of 24 years old with an upper sec. education (right axis, 25% to 55%).]

The dashed line in Figure 1 shows the share of young adults, 18-24 years old, who move to a large city, out of all young adults living in a less urban area. A less urban area is defined as cities with fewer than 20,000 inhabitants and rural areas. We find that youth out-migration from less urban areas has increased from 11% in 1988 to 18% in 2013⁴. During the same period, youth migration towards less urban areas has been constant at a rate of about 3-4%.

1 Kraks Fond - Institute for Urban Economic Research.
2 Department of Sociology, University of Western Ontario.
3 Department of Economics, University of Western Ontario.

Simultaneously, the share of young adults, aged 24 years, completing upper secondary education has increased from 32% in 1988 to 50% in 2013 (the green solid line in Figure 1). It is therefore conceivable that growth in educational attainment among young adults is one of the main drivers behind the increasing youth migration towards the larger cities. From a regional policy perspective, it is highly relevant to establish whether the correlation between education and migration is in fact a causal relationship. If so, a perhaps unintended consequence of investing in increased access to upper secondary education in rural areas could be that young adults move away from such areas after graduation.

The empirical evidence on the relationship between education and inter-regional migration is limited. Previous research on inter-regional migration has primarily focused on the adult population in the labour force, trying to describe which local factors attract and retain highly skilled workers in a region (see among others Detang-Dessendre, Goffette-Nagot, & Piguet, 2008; Mellander, Florida, & Stolarick, 2011).

Several empirical studies describe inter-regional migration patterns in the transition between education and work. More specifically, the studies examine the location choices of recent university graduates and find that the majority of the university graduates stay in the city where they completed their education.

Very few empirical studies examine youth migration, which is surprising considering that initial moving decisions might be of greater importance than college graduation for people's settlement choices later in life. One explanation of this gap in the literature could be the lack of available data. Youth migration has not received much attention in micro-level research, and the empirical studies of migration decisions are descriptive studies. No previous study has, to our knowledge, described the causal effect of upper secondary education on migration among young adults. Therefore, policy makers still lack evidence about how local education policies affect migration and the regional population distribution. The aim of our paper is to contribute toward filling this gap.

This study estimates the causal effect of secondary education on moving out of non-urban areas, using administrative data from Statistics Denmark in combination with detailed measures of distance to upper secondary educational institutions. The data has rich information about five birth cohorts of Danish 15-year-olds who grew up in a less urban area. The sample encompasses five different birth cohorts: all individuals born from 1983 - 1987 who at age 15 were resident in a less-urban area in Denmark. We define a less-urban area as a city with fewer than 20,000 inhabitants in 2016 or a rural area. This implies that the 33 largest cities in Denmark are defined as "urban areas". Our sample represents a significant share of Danish youth, as 60% of the 15-year-olds resided in a less urban area during the time period we analyse.

It is natural to assume that the decision to move and the decision to complete upper secondary education (US) share the same unobserved characteristics and therefore that, from the perspective of the econometrician, the decision to complete upper secondary education is an endogenous variable. We therefore resort to instrumental variable methods to generate exogenous variation in whether the student is observed to have completed upper secondary education. More formally, we estimate the following system of equations:

$$
\begin{aligned}
\mathrm{US} &= a + bz + cx + u \\
\mathrm{Move} &= \alpha + \beta\,\mathrm{US} + \delta x + e
\end{aligned}
\qquad (1)
$$

where $z$ is an instrumental variable (here distance to upper secondary education) and $x$ is a vector of explanatory variables. The variables $u$ and $e$ are error terms that may be correlated, and $a$, $b$, $c$, $\alpha$, $\beta$, $\delta$ are regression parameters to be estimated together with the error variances. As both US and Move are binary variables, both equations are linear probability models.
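To make the logic of system (1) concrete, the following is a minimal simulation sketch, not the estimation used in the paper. It assumes a single instrument and omits the covariates x, in which case the instrumental variable estimate of the effect of US on Move reduces to cov(z, Move) / cov(z, US). The function names, the unobserved trait `w` and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n, beta, seed=0):
    """Simulate (z, US, Move) with an unobserved trait w that raises both
    the probability of completing US and the probability of moving."""
    rng = random.Random(seed)
    z, us, move = [], [], []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0         # unobserved confounder (hypothetical)
        z_i = rng.random()                         # distance, rescaled to [0, 1]
        p_us = 0.3 + 0.3 * w + 0.3 * (1.0 - z_i)   # shorter distance -> more US
        us_i = 1 if rng.random() < p_us else 0
        p_move = 0.1 + beta * us_i + 0.4 * w       # linear probability model
        move_i = 1 if rng.random() < p_move else 0
        z.append(z_i)
        us.append(us_i)
        move.append(move_i)
    return z, us, move


def ols_slope(x, y):
    """Slope from regressing y on x, ignoring the endogeneity of x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV estimate of the effect of x on y with instrument z."""
    return cov(z, y) / cov(z, x)


z, us, move = simulate(50_000, beta=0.2, seed=1)
print("naive OLS:", round(ols_slope(us, move), 3))
print("IV       :", round(iv_slope(z, us, move), 3))
```

In this toy set-up the naive slope is pushed upward by the shared unobserved trait, while the IV estimate is centred on the true effect but is noisier, which is the usual price of instrumenting.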

The estimation results show that young individuals who graduate from upper secondary education have a significantly higher probability of moving to a large city than those who drop out or complete a vocational education. When we instrument completion of upper secondary education, its positive effect on the probability of moving becomes smaller but remains statistically significant.

Our instrumental variable is distance to US. In studies of the return to college education, many researchers have used a similar instrumental variable, distance to college (e.g. Cameron & Taber, 2004; Card, 1995; Carneiro, Heckman, & Vytlacil, 2011; Currie & Moretti, 2003). However, our identification strategy relies on the assumption that, conditional on the variables in x, z is independent of u and e. If some parental and student characteristics are unobserved but relevant in both equations of system (1), then z being independent of u and e amounts to assuming that these unobserved parental and student characteristics are independent of distance to US. This may not be a tenable assumption.

To address this problem, we carry out a robustness check that exploits the fact that some of the students in our data are twins. We estimate same-sex twin fixed effect (FE) models of the treatment effect of US on Move. The same-sex twin FE estimate purges all family fixed effects and, to the extent that same-sex twins share genetic make-up, also (some of the) genetic effects. We find a significant fixed effect estimate for the same-sex twins. Hence, same-sex twins who differ in whether they have completed an upper secondary education, but who have equal distance to school, also differ in their propensity to move out of a rural area and into a larger city. This suggests that completing an upper secondary education significantly increases young individuals' probability of moving to an urban area. Such knowledge about the mobility of young people can help to inform decisions about where to locate new educational institutions.
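The twin fixed effect idea can be sketched as a within-pair comparison. With two observations per family and no further covariates, the fixed effect estimate equals the sum of within-pair products of differences divided by the sum of squared differences in US. The snippet below is an illustrative sketch under those assumptions only; the function name and the toy data are hypothetical and do not come from the study.

```python
def twin_fe_estimate(pairs):
    """Within-pair (fixed effect) estimate of the effect of US on Move.

    `pairs` is a list of ((us_1, move_1), (us_2, move_2)) tuples, one per
    twin pair. Differencing within a pair removes anything the twins share.
    """
    num = 0.0
    den = 0.0
    for (us_1, move_1), (us_2, move_2) in pairs:
        d_us = us_1 - us_2
        d_move = move_1 - move_2
        num += d_us * d_move
        den += d_us * d_us
    if den == 0:
        raise ValueError("no twin pair is discordant on US")
    return num / den


# Hypothetical toy data: three discordant pairs and one concordant pair.
toy_pairs = [
    ((1, 1), (0, 0)),
    ((1, 1), (0, 1)),
    ((0, 0), (1, 1)),
    ((1, 1), (1, 1)),   # concordant on US: contributes nothing
]
print(twin_fe_estimate(toy_pairs))
```

Only pairs in which the twins differ in US contribute, which is why the effective sample in such a check is much smaller than the number of twins.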

References

Cameron, S. V., & Taber, C. (2004). Estimation of Educational Borrowing Constraints Using Returns to Schooling. Journal of Political Economy, 112(1), 132-182.

Card, D. (1995). Using Geographic Variation in College Proximity to Estimate the Return to Schooling. Aspects of Labour Market Behavior: Essays in Honour of John Vanderkamp, 201-222.

Carneiro, P., Heckman, J. J., & Vytlacil, E. J. (2011). Estimating Marginal Returns to Education. The American Economic Review, 101(6), 2754-2781.

Currie, J., & Moretti, E. (2003). Mother's Education and the Intergenerational Transmission of Human Capital: Evidence from College Openings. The Quarterly Journal of Economics, 118(4), 1495-1532.

Det Økonomiske Råd. (2015). Kapitel IV Yderområder i Danmark. In Dansk Økonomi, forår 2015 (pp. 231-328). Det Økonomiske Råd.

Detang-Dessendre, C., Goffette-Nagot, F., & Piguet, V. (2008). Life Cycle and Migration to Urban and Rural Areas: Estimation of a Mixed Logit Model on French Data. Journal of Regional Science, 48(4), 789-824.

Mellander, C., Florida, R., & Stolarick, K. (2011). Here to Stay - The Effects of Community Satisfaction on the Decision to Stay. Spatial Economic Analysis, 6(1), 5-24.

United Nations. (2015). World Urbanization Prospects: The 2014 Revision. New York: United Nations, Department of Economic and Social Affairs, Population Division.

Van Der Gaag, N., & Van Wissen, L. (2008). Economic Determinants of Internal Migration Rates: A Comparison Across Five European Countries. Tijdschrift Voor Economische En Sociale Geografie, 99(2), 209-222.


Do your neighbours matter? Evidence on peer effects from quasi-experimental data

Georges Poquillon^a, Bence Boje-Kovacs^b

^a University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK

^b Kraks Fond, Institute for Urban Economic Research, Frederiksholms Kanal 30, 1220 Copenhagen K, Denmark

How much do our neighbours affect us? The influence of peers on individuals' behaviour has been widely studied in the economic and sociological literature. Evidence of peer effects has been observed in crime (Damm and Dustmann, 2014), labour market outcomes (Damm, 2014), and health (Eisenberg et al., 2013), among others. Understanding how interactions with people living in the same area affect one's behaviour has become a key issue for policymakers, as the clustering of deprived households in specific neighbourhoods has been identified as perpetuating poverty cycles (Weatherall et al., 2016). Hence, a growing literature on neighbourhood effects has emerged over the last decades. However, the lack of reliable data, as well as unresolved endogeneity issues due to individuals' self-selection into neighbourhoods, has often been a major hindrance to clearly separating neighbourhood effects from individual effects (Cheshire et al., 2016). Most of the studies that managed to propose a proper identification strategy relied on natural experiments, usually involving very specific populations such as refugees in Denmark (Damm, 2009; Damm, 2014; Damm and Dustmann, 2014). Yet, to our knowledge, no such research has been carried out on a large scale.

Identification strategy: This paper therefore investigates how individuals in Denmark are influenced by their neighbours' characteristics. In particular, we look at how exposure to peers affects one's socio-economic outcomes. The main challenge in identifying peer effects is to deal with the reflection problem and correlated effects (Manski, 1993; Moffitt, 2001). To cope with these obstacles, we rely on forced moves of households caused by the demolition of the building they live in. The advantages of this method are twofold. First, the demolition decision is orthogonal to residents' characteristics, which allows us to get rid of the correlated effects, as the length of individuals' exposure to their peers is exogenous. Second, it enables us to create partially overlapping groups, as in De Giorgi et al. (2010), which solves the reflection problem. Indeed, except when two individuals move into a building on the same day, exposure to peers varies at the household level, because two households that moved in at different times will not have been exposed to the same peers for the same length of time. This problem can be described in the linear model that follows:

$y_i$ is the outcome of interest of individual $i$. $\frac{1}{N_i}\sum_h l_{ih} y_h$ is the weighted average of $y$ among individual $i$'s peers, where $l_{ih}$ is the time individuals $i$ and $h$ spent together in the same building and $N_i$ is the total number of people who lived there during individual $i$'s stay. The same pattern applies to $\sum_h l_{ih} x_h$, where $x$ is a set of covariates.
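The exposure-weighted peer term defined above can be illustrated with a small sketch. It assumes that each resident's stay is recorded as a (move-in, move-out) pair of day numbers, that the overlap of two stays gives the time spent together, and that the number of peers is the number of other residents with a positive overlap. These data structures and names are hypothetical; the actual register variables are not described in the text.

```python
def overlap(stay_a, stay_b):
    """Length of time two (move_in, move_out) stays overlap."""
    start = max(stay_a[0], stay_b[0])
    end = min(stay_a[1], stay_b[1])
    return max(0, end - start)


def peer_average(i, stays, y):
    """Compute (1 / N_i) * sum_h l_ih * y_h for resident i.

    stays: dict person -> (move_in, move_out), e.g. as day numbers
    y:     dict person -> outcome
    N_i is taken to be the number of other residents with l_ih > 0.
    Returns None when i never overlapped with anyone.
    """
    weights = {h: overlap(stays[i], stays[h]) for h in stays if h != i}
    peers = {h: l for h, l in weights.items() if l > 0}
    if not peers:
        return None
    return sum(l * y[h] for h, l in peers.items()) / len(peers)


# Hypothetical residents of one building (day numbers are made up).
stays = {"A": (0, 10), "B": (5, 15), "C": (8, 20), "D": (30, 40)}
y = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 10.0}
print(peer_average("A", stays, y))
print(peer_average("B", stays, y))
```

Residents A and B share the building for part of their stays but end up with different peer terms, which is the partial overlap that the identification strategy relies on.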

Data: In our analysis, we use different sets of Danish administrative register data. First, we identify demolished buildings by following each building's unique identifier between 2004 and 2012. When a building's unique identifier disappears from the register data from one year to the next, it indicates that the building has been demolished. This method yields 9331 individuals in Denmark between 2004 and 2012. Second, we identify the neighbourhoods in which the demolished buildings are located by exploiting administrative register data on the full Danish population. Each demolished building is linked to a neighbourhood, and we define as peers all the individuals in the neighbourhood who do not live in the demolished building. Third, we merge individuals with their socio-economic characteristics using administrative register data. The Danish statistics registers enable us to trace people's information from 1986 up to 2014. Thus, we are able to calculate the length of exposure of residents of demolished buildings to their neighbours before the demolition and to measure socio-economic outcomes after it.
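The first data step, flagging a building as demolished when its identifier disappears between two consecutive yearly extracts, can be sketched as a set difference. The structure below (a dictionary from year to a set of identifiers) and the identifiers themselves are hypothetical stand-ins for the register data.

```python
def find_demolished(ids_by_year):
    """Return {year: building IDs present in `year` but absent in `year + 1`}.

    ids_by_year: dict mapping a year to the set of building IDs in that
    year's register extract. Only consecutive years are compared.
    """
    demolished = {}
    for year in sorted(ids_by_year):
        if year + 1 in ids_by_year:
            demolished[year] = ids_by_year[year] - ids_by_year[year + 1]
    return demolished


# Hypothetical register extracts with made-up identifiers.
registers = {
    2004: {"b1", "b2", "b3"},
    2005: {"b1", "b3"},
    2006: {"b1", "b3", "b4"},
    2007: {"b4"},
}
print(find_demolished(registers))
```

A real application would also need to rule out identifiers that vanish for administrative reasons, such as renumbering, which this sketch does not attempt.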


The end of the Rasch model?

Karl Bang Christensen

Department of Biostatistics

University of Copenhagen

The Rasch model (Rasch, 1960; Fischer, Molenaar, 1995; Christensen, Kreiner, Mesbah, 2013) is a statistical model for ordinal categorical data that was developed in the context of educational testing in the late 1950s by the Danish mathematician Georg Rasch. It has been used extensively in educational and health research, and it has a number of desirable properties that make it a candidate for a gold standard of validity. A reading test, or any similar measurement instrument, should be shown to measure the concept it is intended to measure. This is done when creating a new instrument, translating an instrument into another language, or using an instrument in a new patient population. A key feature of validation is evaluating the fit of observed data to a latent variable model, typically an item response theory (van der Linden & Hambleton, 1997) model or a confirmatory factor analysis (CFA; Jöreskog, 1969) model.

In the original book (Rasch, 1960), the first seven chapters described measurement principles with very little reference to mathematics, while three chapters described the mathematics. So from the very start the model did not focus on mathematics. The Rasch model was accepted and widely used in educational (and later health) measurement largely due to contributions in the psychometric literature by Erling Andersen and others. The mathematical theory underlying the Rasch model makes it a special case of an item response theory model, but there are important differences in the interpretation of the model parameters and in its philosophical implications that separate proponents of the Rasch model sharply from the item response theory modeling tradition. A central aspect of this separation relates to the requirement of specific objectivity, a defining property of the Rasch model and, according to Georg Rasch, a requirement for successful measurement.

Stand-alone statistical software programs (Andrich, 2015; Linacre, 2011) have made Rasch analysis feasible for non-statisticians and are the basis of several teaching efforts. The use of software to create graphs and tables, and the interpretation of these, play a fundamental role, while the more technical mathematical aspects of the Rasch model play a smaller role. Parallel to the practice and teaching of the Rasch model based on stand-alone software packages, Rasch methodology has slowly but steadily been made available in the mainstream software packages R (Mair, Hatzinger, 2007; Kiefer, Robitzsch, Wu, 2016), SAS (Christensen, 2013), and Stata (Hardouin, 2007). The implementations in R are by far the most widely used and are for the most part methodologically sound. Until now these implementations have found less use in health research than they deserve.

A drawback of the widespread use of stand-alone software packages is that new methodological developments have found their way into practice very slowly, if at all. Furthermore, the stand-alone programs are developed by small groups of people, and it is uncertain whether they will continue to be developed.

Thus, the current state of the Rasch model is one of discrepancy between practice and state-of-the-art implementations. On the one hand, established methodology, existing courses, and tradition are based on archaic stand-alone software packages that are likely to disappear within the next 5-10 years. On the other hand, modern implementations of Rasch methodology in R are available, but these are used rather little by the traditional users of Rasch measurement validation. We offer a number of examples of how sub-optimal statistical methodology persists, and illustrate how the arguments for using the Rasch model stray further and further away from statistics, employing philosophy of science, citing Kuhn's (1970) concept of "paradigm shifts" and discussing "fundamental measurement" like that found in the physical sciences.

We discuss the need to create new educational efforts that combine the knowledge accrued by the existing Rasch measurement tradition with up-to-date software implementations that will not become obsolete.


References

van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of Modern Item Response Theory. (W. J. van der Linden & R. K. Hambleton, Eds.). New York, NY: Springer New York. https://doi.org/10.1007/978-1-4757-2691-6

Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34(2), 183-202.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish National Institute for Educational Research.

Fischer, G. H., & Molenaar, I. W. (1995). Rasch Models. (G. H. Fischer & I. W. Molenaar, Eds.). New York, NY: Springer New York. http://doi.org/10.1007/978-1-4612-4230-7

Christensen, K. B., Kreiner, S., & Mesbah, M. (2013). Rasch Models in Health. (K. B. Christensen, S. Kreiner, & M. Mesbah, Eds.). Hoboken, NJ, USA: John Wiley & Sons, Inc. http://doi.org/10.1002/9781118574454

Andrich, D., Sheridan, B. S., & Luo, G. (2015). RUMM2030: Rasch uni-dimensional models for measurement. Perth, Australia: RUMM Laboratory. www.rummlab.com

Linacre, J. M. (2011). Winsteps® (Version 3.71.0) [Computer Software]. Beaverton, Oregon: Winsteps.com. Retrieved January 1, 2011. Available from http://www.winsteps.com

Christensen, K. B. (2013). Conditional Maximum Likelihood Estimation in Polytomous Rasch Models Using SAS. ISRN Computational Mathematics, 2013(i), 1-8. http://doi.org/10.1155/2013/617475

Hardouin, J.-B. (2007). Rasch analysis: Estimation and tests with raschtest. Stata Journal, 7(1), 22-44.

Mair, P., & Hatzinger, R. (2007). CML based estimation of extended Rasch models with the eRm package in R. Psychology Science, 49(1), 26-43.

Kiefer, T., Robitzsch, A., & Wu, M. (2016). TAM: Test Analysis Modules (Version 1.995-0) [Computer software]. Retrieved from http://CRAN.R-project.org/package=TAM

Kuhn, T. (1970). The structure of scientific revolutions (2nd ed.). Chicago: University of Chicago Press.


Assessing observable donor and product characteristics as risk factors for adverse health outcomes in blood transfusion recipients

By Klaus Rostgaard

Department of Epidemiology Research, Statens Serum Institut, Copenhagen. <[email protected]>

ABSTRACT

Studying observable donor and product characteristics (e.g. donor sex and age, age of blood product) as risk factors for adverse health outcomes in blood transfusion recipients is methodologically challenging because the studied characteristics are easily and strongly confounded. For example, the sicker the blood recipient, the more transfusions the recipient will receive, and the more likely it therefore is that the recipient receives products with rare characteristics, e.g. blood from a donor who is very old or very young. Simulation studies easily demonstrate that rare donor or product characteristics will appear detrimental to recipient health if we do not adjust the analyses very carefully. Here we present a completely different approach for studying the effect of donor and product characteristics that are not part of the criterion for using a given product, i.e. where the exposure (donor and product characteristic) is essentially randomly distributed within a class of functionally equivalent products. We then study product survival (time from transfusion to adverse event in the recipient) as a function of exposure, conditional on the class of products only.

Keywords: blood transfusion medicine, confounding, randomization, transmissibility

INTRODUCTION

By far the largest resource for research into hazards associated with blood transfusion is SCANDAT, a Swedish-Danish database comprising all electronic records of blood donations, blood products and blood transfusions in the two countries, essentially tallying who donated what to whom and when [1]. One line of research in SCANDAT investigates putative associations between observable blood donor/blood product characteristics and outcome in the blood recipient [2-8]. The blood donor/blood product characteristic of interest may be frequent, e.g. female donor, large product age (shelf life: time from donation to transfusion) or old donor, or infrequent, e.g. that the donor developed disease X at some time after donation. In this paper we shall for simplicity concern ourselves only with non-recurrent (chronic) survival outcomes, e.g. time to death or time to first occurrence of disease X.


The largest methodological challenge in these studies is a multitude of biases and confounding factors [9]. For example, the sicker the blood recipient, the more transfusions the recipient will receive, and the more likely it therefore is that the recipient receives products with rare characteristics, e.g. blood from a donor who is very old or very young. Hence these rare characteristics will often appear to be dangerous exposures if we do not adjust properly for, e.g., the number of transfusions. The problem is magnified by the sheer size of SCANDAT. It contains data on 20+ million blood products from 1.3 million donors transfused into 2.1 million blood recipients [1]. Hence in this context the slightest uncontrolled confounding will create statistically significant findings. The researchers working with SCANDAT have often taken on the responsibility of questioning and criticising more "exciting" findings from other studies claiming that one or another blood product or donor characteristic is detrimental to blood recipient health. The latest study to this effect is a critique by Edgren et al. of a Canadian study by Chasse et al., which found blood from females and from young donors to be associated with increased risks of death compared with blood products with the opposite characteristics [10,11]. Simulations of the Canadian approach, and of how we would do it ourselves, demonstrated that in all likelihood the Canadian findings were solely due to incomplete adjustment for the number of transfusions administered to a given recipient; a simple log-linear term is insufficient here [11].
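The mechanism behind this confounding can be made concrete with a one-line calculation. Assuming, purely for illustration, that each transfused unit independently carries a rare characteristic with the same probability, the chance that a recipient is exposed at least once grows quickly with the number of transfusions. The function name and the 1% share are hypothetical.

```python
def prob_at_least_one_rare(n_transfusions, rare_share):
    """P(at least one unit with the rare characteristic) when each of the
    n_transfusions units independently has it with probability rare_share."""
    return 1.0 - (1.0 - rare_share) ** n_transfusions


for k in (1, 5, 20, 50):
    print(k, round(prob_at_least_one_rare(k, 0.01), 3))
```

Because the number of transfusions is itself a marker of how sick the recipient is, "ever exposed" to a rare characteristic ends up strongly associated with poor prognosis even when the characteristic is harmless.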

A SUGGESTION FOR NEW METHODOLOGY

The challenges of very accurate confounder adjustment make it highly relevant to examine a completely different approach for assessing donor and product characteristic effects, which by design should be immune to many of the biases that have to be addressed in the traditional approach. The blood product survival approach assesses the survival of a blood product (time from transfusion to occurrence of the event in the host, i.e. the blood recipient) as a function of the characteristics of the blood product. The idea is that if a certain product characteristic affects recipient survival (time to event), then this will be reflected in the product survival as well. If we can then devise strata of functionally equivalent products, so that the use of one product as opposed to another product from the same stratum is completely random, it becomes possible to assess the effect of blood product characteristics on blood product health in an unconfounded way, conditional on the product class (stratum).

This poses at least three problems to be examined:

1) How do we ensure that our practical definition of strata of "functionally equivalent blood products" actually creates the desired objects, where the drawing of one product as opposed to another product from the same stratum is equiprobable and uncorrelated with the traditional confounders?

2) What is the optimal way of measuring effects?

3) How should we estimate or correct the standard error of whatever estimator we choose in 2)? In most conceivable approaches there are correlations between survival times for different products ending up in the same recipient, so the sampled product survival times will not be independent of each other given the parameters, and the traditional analysis based on independence will yield an inflated estimate of precision.

In our experiments so far, strata of "functionally equivalent blood products" have been defined by unique combinations of xprodcode, original product code, ABO donor blood type, blood bank (county/region), and year of donation. Xprodcode is a SCANDAT-specific multi-axis variable that records the type of product (whole blood, plasma, erythrocytes, platelets, ...) and certain characteristics (washed, irradiated, filtered, pooled, ...). The product is the unit of blood donated by a known donor, equipped with the characteristics of the later manufacture of the blood unit, the transfusion date, the recipient who received it, and a link to the donation, so that donor and temporal characteristics can be attributed to the product. Thus a transfused pooled product is split into its original component parts in this set-up. This set-up yields thousands of strata, many of which in typical applications would not contain both variation in outcome and variation in exposure, and as such would add no information in e.g. a stratified Cox regression. However, most blood products would still be in an informative stratum. Sparse data bias is an often overlooked bias away from the null that may pop up in this context [12]. The potential for this should be investigated in simulations. A further limitation of the approach is that we cannot assess the effects of characteristics that are part of the definition of the equivalent blood product strata.
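The notion of an informative stratum can be sketched in a few lines. The snippet groups product records by the combination of stratum-defining fields and keeps only strata in which both the exposure and the outcome vary. The field names (`orig_prodcode`, `abo`, `bloodbank`, `donation_year`, `exposed`, `event`) and the toy records are hypothetical; only xprodcode is a name taken from the text, and the real SCANDAT layout is not assumed here.

```python
from collections import defaultdict

# Hypothetical field names for the stratum-defining variables.
STRATUM_KEYS = ("xprodcode", "orig_prodcode", "abo", "bloodbank", "donation_year")


def informative_strata(products, exposure="exposed", outcome="event"):
    """Group products into strata and keep the strata with variation in
    both the exposure and the outcome."""
    strata = defaultdict(list)
    for p in products:
        strata[tuple(p[k] for k in STRATUM_KEYS)].append(p)
    return {
        key: members
        for key, members in strata.items()
        if len({m[exposure] for m in members}) > 1
        and len({m[outcome] for m in members}) > 1
    }


def make(xprodcode, year, exposed, event):
    """Build one toy product record (all values are made up)."""
    return {"xprodcode": xprodcode, "orig_prodcode": "E1", "abo": "A",
            "bloodbank": "R1", "donation_year": year,
            "exposed": exposed, "event": event}


products = [
    make("ery", 2010, 0, 0), make("ery", 2010, 1, 1),   # informative
    make("ery", 2011, 0, 0), make("ery", 2011, 0, 1),   # no exposure variation
    make("pla", 2010, 1, 0),                            # singleton stratum
]
kept = informative_strata(products)
print(sorted(kept))
```

The finer the stratification, the more strata fail this check, which is the trade-off between exchangeability within strata and the amount of usable data.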

Stratified Cox regression, where each stratum has its own baseline hazard function, seems to be the obvious choice for question 2, since these are time-to-event data and Cox regression produces output (hazard ratios) in a form very familiar to epidemiologists and medical researchers. It also fits into our daily framework of risk factor epidemiology. But one could envisage other, mainly non-parametric, alternatives.

We have experimented with different methods for dealing with question 3 on actual SCANDAT data (donor-product-recipient links, dates and strata characteristics). An obvious solution is to avoid the correlations altogether by randomly picking just one blood product administered to each blood recipient. This works fine for frequent exposures, but is very wasteful for rare exposures and should therefore not be the general methodology. We have not figured out a general unbiased methodology that avoids the correlations and at the same time preferentially samples interesting/rare exposures. Assessing survival of all products (from informative strata) combined with some type of robust variance estimation (e.g. the sandwich estimator) is our best suggestion at the moment, but it is computationally unpleasant due to the sheer size of the data.
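The simplest of the options mentioned, keeping one randomly chosen product per recipient so that no recipient contributes correlated survival times, can be sketched as follows. The input format (pairs of recipient and product identifiers) and all identifiers are hypothetical.

```python
import random
from collections import defaultdict


def one_product_per_recipient(transfusions, seed=0):
    """Pick one product uniformly at random for every recipient.

    transfusions: iterable of (recipient_id, product_id) pairs.
    Returns dict recipient_id -> sampled product_id.
    """
    rng = random.Random(seed)
    by_recipient = defaultdict(list)
    for recipient_id, product_id in transfusions:
        by_recipient[recipient_id].append(product_id)
    return {r: rng.choice(units) for r, units in by_recipient.items()}


# Hypothetical links: recipient r1 received three units, r2 one, r3 two.
links = [("r1", "p1"), ("r1", "p2"), ("r1", "p3"),
         ("r2", "p4"), ("r3", "p5"), ("r3", "p6")]
sample = one_product_per_recipient(links, seed=42)
print(sample)
print("share of products kept:", len(sample) / len(links))
```

The share of products kept shrinks as recipients receive more units, which illustrates why this approach is wasteful when the exposure of interest is rare.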


SIMULATION RESULTS AND CONCLUSION

In our few experiments to date we have only used stratified Cox regression for the analysis of product survival. In these experiments we have seen divergent results from using different methods to address question 3, and these could not be replicated in simulations. This suggests that some overlooked bias is lurking in there, and for the moment prohibits any further advances towards the deployment of this product survival methodology.

REFERENCES

1. Edgren, G. et al. The new Scandinavian Donations and Transfusions database (SCANDAT2): a blood safety resource with added versatility. Transfusion 55, 1600-6 (2015).

2. Edgren, G. et al. Duration of red blood cell storage and survival of transfused patients. Transfusion 50, 1185-1195 (2010).

3. Edgren, G. et al. Risk of cancer after blood transfusion from donors with subclinical cancer: a retrospective cohort study. Lancet 369, 1724-1730 (2007).

4. Halmin, M. et al. Length of storage of red blood cells and patient survival after blood transfusion: A binational cohort study. Ann. Intern. Med. 166, (2017).

5. Shanwell, A. et al. Post-transfusion mortality among recipients of ABO-compatible but non-identical plasma. Vox Sang. 96, 316-23 (2009).

6. Edgren, G. et al. Transmission of Neurodegenerative Disorders Through Blood Transfusion: A Cohort Study. Ann. Intern. Med. 165, 316-24 (2016).

7. Vasan, S. K. et al. Lack of association between blood donor age and survival of transfused patients. Blood 127, 658-61 (2016).

8. Hjalgrim, H. et al. No evidence of transmission of chronic lymphocytic leukemia through blood transfusion. Blood 126, 2059-61 (2015).

9. Edgren, G., Rostgaard, K. & Hjalgrim, H. Methodological challenges in observational transfusion research: lessons learned from the Scandinavian Donations and Transfusions (SCANDAT) database. ISBT Sci. Ser. 12, 191-195 (2017).

10. Chasse, M. et al. Association of Blood Donor Age and Sex With Recipient Survival After Red Blood Cell Transfusion. JAMA Intern. Med. 176, 1307-14 (2016).

11. Edgren, G. et al. Association of Donor Age and Sex With Survival of Patients Receiving Transfusions. JAMA Intern. Med. 177, 854-860 (2017).

12. Greenland, S., Mansournia, M. A. & Altman, D. G. Sparse data bias: a problem hiding in plain sight. BMJ 352, i1981 (2016).


METHODOLOGICAL CONSIDERATIONS REGARDING THE CHARLSON COMORBIDITY INDEX

SOREN MOLLER

OPEN - Odense Patient data Explorative Network, Odense Universitetshospital and Klinisk Institut, Syddansk Universitet, Odense

INTRODUCTION

The Charlson comorbidity index (CCI) was developed by Mary E. Charlson and colleagues in 1987 [1] as a prognostic index that, from a number of common comorbidities, generates a score associated with survival. The index is constructed by determining the univariate HR for 1-year mortality between patients who have and patients who do not have each individual (binary) comorbidity, and then translating this HR into a score between 1 and 6 for each comorbidity. The CCI is then the sum of these scores over the originally 19 included comorbidities, resulting in a value between 0 and 37. Because the prognosis for some of the included diseases has improved, Hude Quan and colleagues developed an updated version of the CCI in 2011 [4], which includes only 12 comorbidities and yields a total score between 0 and 24. The updated CCI has subsequently been validated in large patient cohorts, and in this article we consider only this updated version, whose underlying prevalences and HRs (from [4]) and the scores derived from them are presented in the table below:

Comorbidity                                      Prevalence (%)  HR    Score
Congestive heart failure (CHD)                   5.0             1.91  2
Dementia                                         3.2             2.39  2
Chronic pulmonary disease                        7.1             1.28  1
Rheumatologic disease                            1.2             1.30  1
Mild liver disease                               1.0             1.94  2
Diabetes with chronic complications              1.9             1.22  1
Hemiplegia or paraplegia                         1.4             2.26  2
Renal disease                                    3.6             1.43  1
Any malignancy, including leukemia and lymphoma  5.0             2.28  2
Moderate or severe liver disease (SLD)           0.5             3.83  4
Metastatic solid tumor                           3.2             6.01  6
AIDS/HIV                                         0.06            3.69  4
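As an illustration of how the updated index is computed from binary comorbidity indicators, the following sketch sums the per-comorbidity scores of the updated index [4]. Two things are assumptions made here and not stated in the text: the identifier names, and the rule that a milder condition is not counted when its more severe counterpart is present (mild versus moderate or severe liver disease, any malignancy versus metastatic solid tumor). That rule is included because it reproduces the stated maximum of 24.

```python
# Scores of the updated index; the keys are hypothetical identifier names.
UPDATED_CCI_WEIGHTS = {
    "congestive_heart_failure": 2,
    "dementia": 2,
    "chronic_pulmonary_disease": 1,
    "rheumatologic_disease": 1,
    "mild_liver_disease": 2,
    "diabetes_with_chronic_complications": 1,
    "hemiplegia_or_paraplegia": 2,
    "renal_disease": 1,
    "any_malignancy": 2,
    "moderate_or_severe_liver_disease": 4,
    "metastatic_solid_tumor": 6,
    "aids_hiv": 4,
}

# Assumption: the milder condition is skipped when the severe one is present.
SUPERSEDED_BY = {
    "mild_liver_disease": "moderate_or_severe_liver_disease",
    "any_malignancy": "metastatic_solid_tumor",
}


def updated_cci(conditions):
    """Sum the scores of the comorbidities present for one patient."""
    present = set(conditions)
    unknown = present - set(UPDATED_CCI_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown comorbidities: {sorted(unknown)}")
    score = 0
    for name in present:
        if SUPERSEDED_BY.get(name) in present:
            continue  # counted through the more severe condition instead
        score += UPDATED_CCI_WEIGHTS[name]
    return score


print(updated_cci(["congestive_heart_failure", "moderate_or_severe_liver_disease"]))
print(updated_cci(UPDATED_CCI_WEIGHTS))  # every comorbidity present
```

Keeping the weights in a single dictionary makes it straightforward to swap in another version of the index without touching the scoring logic.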

While the CCI was originally developed as a predictor of survival after hospital admission, it has since increasingly been used as an adjustment variable in health science studies of other exposures, with the aim of controlling for potential confounding by comorbidities. This approach is particularly widespread in Danish and Scandinavian health research, because a patient's CCI can often be determined automatically from register data with acceptable validity [5], which makes adjusting for the CCI an efficient way of trying to account for potential confounding.

In practice, this adjustment can be operationalised in regression models in different ways: either by including the CCI as a numerical covariate, which is questionable since a linear relationship is unlikely (see [2]), or by dividing it into a number of categories, typically 0/1/2/3+, 0/1/2+ or 0/1-2/3+, but potentially also into 25 categories, one for each possible value. Alternatively, in some situations it will be possible to adjust for the 12 comorbidities as 12 separate binary covariates.

RESEARCH QUESTIONS

In this simulation study we wish to investigate:

• To what extent does the way in which CCI comorbidities are adjusted for affect the size of the bias caused by confounding between the exposure of interest and the comorbidities?

• How large is this bias compared with the conservative bias, known from the literature [3], that is caused by failing to adjust for independent risk predictors in Cox regression?

• How does the above depend on the effect size of the primary exposure?

SIMULATION

We investigate the effect of the CCI adjustments by simulating survival data in two scenarios:

• Time to death (Weibull distributed, shape = 1, scale = 15 if the exposure and all comorbidities = 0)
• CCI comorbidities (independent and binary, with prevalence and HR as in [4])
• Scenario A: a binary exposure A
  - Associated with earlier death, HR = 1, 1.5, 2 or 4
  - Prevalence 0.2
  - Independent of the CCI comorbidities
• Scenario B: a binary exposure B
  - Associated with earlier death, HR = 1, 1.5, 2 or 4
  - Prevalence 0.2 for persons without CHD and SLD, 0.4 for those with one of them and 0.8 for those with both
• Censoring time independent of the above (Weibull distributed, shape = 1, scale = 5)

From this, the HR for the exposure (A or B) was estimated by Cox regression: unadjusted, adjusted for the CCI in different ways, and adjusted for the individual comorbidities.
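The data-generating step of the simulation can be sketched with the standard library alone. The sketch relies on two assumptions that the text does not spell out: that "scale" is the mean of the exponential distribution obtained at shape 1, and that the hazard ratios of the exposure and of the comorbidities present multiply the baseline hazard. All names are hypothetical, and the Cox regressions themselves are not included.

```python
import random

# (prevalence, hazard ratio) per comorbidity, as listed in the table from [4].
COMORBIDITIES = {
    "CHD": (0.050, 1.91),
    "dementia": (0.032, 2.39),
    "pulmonary": (0.071, 1.28),
    "rheumatologic": (0.012, 1.30),
    "mild_liver": (0.010, 1.94),
    "diabetes_compl": (0.019, 1.22),
    "plegia": (0.014, 2.26),
    "renal": (0.036, 1.43),
    "malignancy": (0.050, 2.28),
    "SLD": (0.005, 3.83),
    "metastatic": (0.032, 6.01),
    "aids_hiv": (0.0006, 3.69),
}


def simulate_person(rng, scenario, exposure_hr):
    """Draw one person under scenario "A" or "B"."""
    com = {name: rng.random() < prev for name, (prev, _) in COMORBIDITIES.items()}
    if scenario == "A":
        p_exposed = 0.2                      # independent of the comorbidities
    else:
        p_exposed = {0: 0.2, 1: 0.4, 2: 0.8}[com["CHD"] + com["SLD"]]
    exposed = rng.random() < p_exposed
    hazard_multiplier = exposure_hr if exposed else 1.0
    for name, present in com.items():
        if present:
            hazard_multiplier *= COMORBIDITIES[name][1]
    # Weibull with shape 1 is exponential; scale 15 is read as rate 1/15.
    death_time = rng.expovariate(hazard_multiplier / 15.0)
    censor_time = rng.expovariate(1.0 / 5.0)
    return {
        "exposed": int(exposed),
        "comorbidities": com,
        "time": min(death_time, censor_time),
        "event": int(death_time <= censor_time),
    }


def simulate_dataset(n, scenario, exposure_hr, seed=0):
    """Simulate n persons with a reproducible random stream."""
    rng = random.Random(seed)
    return [simulate_person(rng, scenario, exposure_hr) for _ in range(n)]


data = simulate_dataset(2000, "B", exposure_hr=2.0, seed=1)
print(sum(p["event"] for p in data), "deaths observed among", len(data))
```

Each simulated data set would then be passed to a Cox regression routine once per adjustment strategy, and the estimated exposure coefficient stored for later summary.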

RESULTS

We observe the following HRs from 10000 simulations with 2000 observations each, for each of the two scenarios and four true values of the exposure HR. First as the geometric mean (corresponding to the exponential of the mean of the coefficients underlying the HR):

Geometric mean of HR                  A       B       A       B       A       B       A       B
True HR                               1       1       1.5     1.5     2       2       4       4
No adjustment                         0.9998  1.0520  1.4734  1.5464  1.9399  2.0301  3.7552  3.9083
Adjustment for CHD & SLD              0.9997  0.9989  1.4803  1.4800  1.9559  1.9538  3.8191  3.8113
Adjustment for all comorbidities      1.0000  0.9989  1.4999  1.4992  2.0009  1.9984  4.0001  3.9920
Adjustment for CCI, numerical 0-24    1.0001  1.0274  1.4890  1.5277  1.9757  2.0231  3.8991  3.9800
Adjustment for CCI, categorical 0-24  1.0001  1.0111  1.4838  1.4969  1.9628  1.9745  3.8539  3.8493
Adjustment for CCI 0/1/2/3+           1.0001  1.0169  1.4910  1.5148  1.9806  2.0089  3.9196  3.9663
Adjustment for CCI 0/1/2+             0.9999  1.0143  1.4879  1.5080  1.9737  1.9970  3.8936  3.9302
Adjustment for CCI 0/1-2/3+           1.0001  1.0210  1.4906  1.5204  1.9796  2.0157  3.9155  3.9765
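The geometric mean reported above is the exponential of the average log hazard ratio, i.e. of the average Cox coefficient across simulations. A minimal sketch, with a hypothetical function name and made-up input values:

```python
import math


def geometric_mean_hr(hazard_ratios):
    """Geometric mean = exp of the arithmetic mean of the log hazard ratios."""
    log_hrs = [math.log(hr) for hr in hazard_ratios]
    return math.exp(sum(log_hrs) / len(log_hrs))


# Made-up estimates: one halving and one doubling of the hazard.
estimates = [0.5, 2.0]
print("geometric mean :", geometric_mean_hr(estimates))
print("arithmetic mean:", sum(estimates) / len(estimates))
```

Averaging on the log scale treats a halving and a doubling of the hazard symmetrically, which the arithmetic mean of the hazard ratios does not.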


Correspondingly, we can compare the observed median HR between the two scenarios for different true HRs:

Median of HR                          A       B       A       B       A       B       A       B
True HR                               1       1       1.5     1.5     2       2       4       4
No adjustment                         1.0004  1.0530  1.4730  1.5458  1.9377  2.0305  3.7523  3.9062
Adjustment for CHD & SLD              0.9999  1.0001  1.4797  1.4807  1.9543  1.9552  3.8147  3.8106
Adjustment for all comorbidities      1.0003  0.9993  1.5001  1.4992  2.0010  1.9997  3.9993  3.9919
Adjustment for CCI, numerical 0-24    0.9997  1.0287  1.4900  1.5274  1.9743  2.0260  3.8990  3.9810
Adjustment for CCI, categorical 0-24  1.0000  1.0107  1.4922  1.5080  1.9827  2.0045  3.9290  3.9617
Adjustment for CCI 0/1/2/3+           0.9999  1.0174  1.4913  1.5151  1.9800  2.0118  3.9169  3.9677
Adjustment for CCI 0/1/2+             0.9998  1.0149  1.4873  1.5094  1.9741  1.9998  3.8926  3.9300
Adjustment for CCI 0/1-2/3+           1.0004  1.0221  1.4908  1.5206  1.9796  2.0180  3.9132  3.9774

When the geometric means of the observed HR in scenario A (without confounding) and scenario B (with confounding) are set in relation to each other and plotted against the true HR for the eight models, the following graph is obtained:

[Figure: ratio between the geometric-mean HRs of the two scenarios (vertical axis) against the true HR 1, 1.5, 2 and 4 (horizontal axis, "True HR"), with one curve per model: no adjustment; adjustment for CHD & SLD; adjustment for all comorbidities; adjustment for CCI numeric; adjustment for CCI categorical; adjustment for CCI 0/1/2/3+; adjustment for CCI 0/1/2+; adjustment for CCI 0/1-2/3+.]


DISCUSSION AND CONCLUSION

As expected, we generally observe a higher HR for scenario B than for scenario A in all models that do not account for CHD and SLD (or, respectively, all 12 comorbidities), because the exposure in scenario B is positively associated with these comorbidities. This results in a progressive (upward) bias due to confounding, including in the situation with a true HR = 1, where scenario A gives HR estimates close to 1 while scenario B overestimates the HR by 1-2%.

In contrast, this progressive bias in scenario B decreases as the true HR increases, so that for a true HR of 4 (and to some extent 2) both scenarios show a conservative bias, with the HR underestimated by up to 6% in scenario A and up to 5% in scenario B in all models except the one fully adjusted for the 12 comorbidities as separate binary covariates. In particular, we note that the model adjusted for CHD and SLD yields the same estimated HR in scenarios A and B, but with a conservative bias of about 5%.
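The percentages quoted here can be traced back to the geometric-mean table in the results. The following sketch computes the relative bias for a true HR of 4; the four estimates are copied from that table, while the function name is introduced here for illustration only.

```python
def relative_bias_percent(estimated_hr, true_hr):
    """Relative deviation of an estimated HR from the true HR, in percent."""
    return 100.0 * (estimated_hr / true_hr - 1.0)

TRUE_HR = 4.0
# Geometric-mean HRs at true HR = 4 (scenario A, scenario B).
estimates = {
    "no adjustment": (3.7552, 3.9083),
    "CHD & SLD": (3.8191, 3.8113),
}

for model, (hr_a, hr_b) in estimates.items():
    bias_a = relative_bias_percent(hr_a, TRUE_HR)
    bias_b = relative_bias_percent(hr_b, TRUE_HR)
    print(f"{model}: A {bias_a:.1f}%, B {bias_b:.1f}%")
```

A negative value corresponds to the conservative bias discussed above, i.e. an estimate pulled towards HR = 1.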

Comparing the different CCI adjustments, the categorical groupings of CCI generally give less confounding bias than numeric CCI. The best adjustment is obtained by adjusting for CCI as a categorical variable over all values 0-24, but this will usually not be a realistic approach, since health science studies typically include only very few persons with the highest CCI values. In this simulation study the grouping 0/1/2+ gives the smallest bias, but this may be because the two confounding comorbidities give 2 and 4 CCI points, respectively, and the picture could look different if the confounding had involved one of the comorbidities that give only one point.

In summary, we conclude that the type of adjustment for CCI comorbidities matters for confounding bias in Cox regression, but that this bias is typically, especially for large effect sizes of the exposure, smaller than the conservative bias caused by failing to adjust for independent predictors, regardless of whether they confound the exposure of interest. For this reason, if the data set is large enough and the information is available, one may consider adjusting for the CCI comorbidities as separate binary covariates instead of aggregating them into a comorbidity index.

REFERENCES

[1] M. E. Charlson, P. Pompei, K. L. Ales, and R. MacKenzie, A new method of classifying prognostic comorbidity in longitudinal studies: development and validation, J Chronic Dis 40 (1987), no. 5, 373-383.

[2] W. D'Hoore, A. Bouckaert, and C. Tilquin, Practical Considerations on the Use of the Charlson Comorbidity Index with Administrative Data Bases, J Clin Epidemiol 49 (1996), no. 12, 1429-1433.

[3] M. H. Gail, S. Wieand, and S. Piantadosi, Biased Estimates of Treatment Effect in Randomized Experiments with Nonlinear Regressions and Omitted Covariates, Biometrika 71 (1984), no. 3, 431-444.

[4] H. Quan, B. Li, C. M. Couris, K. Fushimi, P. Graham, P. Hider, J.-M. Januel, and V. Sundararajan, Updating and Validating the Charlson Comorbidity Index and Score for Risk Adjustment in Hospital Discharge Abstracts Using Data From 6 Countries, Am J Epidemiol 173 (2011), no. 6, 676-682.

[5] S. K. Thygesen, C. F. Christiansen, S. Christensen, T. L. Lash, and H. T. Sørensen, The predictive value of ICD-10 diagnostic coding used to assess Charlson comorbidity index conditions in the population-based Danish National Registry of Patients, BMC Medical Research Methodology 11 (2011), no. 83.


Computational Parametric Mapping

Line Rosendahl Meldgaard Pedersen, NFA

1 Introduction

Functional magnetic resonance imaging (fMRI) is a non-invasive technique for studying neural activity through changes in the blood flow of the brain, measured from an acquired series of three-dimensional images. The MR scanner changes the spin of the protons in the body and makes it possible to acquire an image, or volume, of the human brain consisting of approximately 100,000 discrete volume elements, also called voxels. Through tuning parameters, the acquisition can be optimized either spatially, to obtain a detailed replica of the brain anatomy, or temporally, with the purpose of detecting changes in brain activity over time.

Due to its non-invasive nature and technological advances, fMRI has become the subject of increasing interest, especially within neuropsychology and the study of cognitive processes such as decision-making.

Generally, fMRI data is analyzed using a mass-univariate Gaussian generalized linear model (GLM) applied voxel-wise to the acquired volumes, where the regressors of the model are constructed from the variables assumed to generate the observed signal, i.e. the (experimental) factors that make the neurons spike. Model-based fMRI is a method for analyzing fMRI data from a trial-by-trial experiment, where one or more of the regressors are constructed from a computational model, which is a mathematically specified mapping between the induced stimuli and the behavioral responses, and which depends on some unknown parameters. In model-based fMRI these parameters are estimated from the observed behavior, and the predicted variables are then regressed onto the fMRI data in order to identify correlating regions within the brain.

The aim of my thesis is to develop a brain-centric alternative approach to model-based fMRI, where the unknown parameters are estimated from the fMRI data rather than the behavioral data. Within a Bayesian framework we will develop a method, Computational Parametric Mapping (CPM), for computing the posterior distribution of the unknown parameters of the computational model. We will adapt the scheme presented by Hansen, Larsen and Nielsen (Hansen et al., 2002) for computing the posterior distribution, and furthermore use this to formalize a voxel selection criterion.

CPM will be applied to data from a reversal learning paradigm, where the computational model is specified by the Rescorla-Wagner learning rule with the unknown, so-called learning rate parameter $\alpha$.

2 Background

Neural activity increases the oxygen consumption of the neurons, which results in an increase in blood supply, as oxygen is bound to hemoglobin within the blood. Hemoglobin saturated with oxygen is denoted oxygenated hemoglobin, while depleted hemoglobin is referred to as deoxygenated hemoglobin. As the supply exceeds the need, the ratio of oxygenated to deoxygenated hemoglobin increases and is observed as a change in the fMRI signal, i.e. a change in the image intensities. This is known as the blood-oxygenation-level-dependent (BOLD) contrast, and can be seen as an indirect measure of brain activity (Huettel et al., 2009).


The increase in blood flow upon neural activity, together with the physiological changes that follow from it, is called the hemodynamic response, and its properties and behavior are well examined. It can be described through the hemodynamic response function (HRF), which characterizes the typical behavior of the BOLD signal over time. The BOLD response has three main features that should be captured by the HRF: the initial dip, the peak and the poststimulus undershoot, lasting for a total of approximately 30 seconds. The canonical HRF from the MATLAB toolbox Statistical Parametric Mapping (SPM) by Karl Friston and colleagues (Friston et al., 2007), depicted in Figure 1, is constructed from a sum of two Gamma functions, one modeling the peak and one modeling the undershoot.

[Plot: "The Hemodynamic Response Function (HRF)"; signal against time (seconds) from 0 to 35, with the initial dip and the poststimulus undershoot annotated.]

Figure 1: The canonical shape of the hemodynamic response function (HRF) is characterized by three main features: the initial dip (here: invisible), the peak and the poststimulus undershoot. The initial dip occurs 1-2 seconds after stimulation (time 0), when the amount of deoxygenated hemoglobin increases just before the supply of oxygenated hemoglobin. The supply of oxygenated hemoglobin is seen as an increase in the signal peaking around 4-6 seconds after stimulation, which is followed by the poststimulus undershoot that goes below baseline before stabilizing at around 30 seconds after stimulation (Poldrack and Nichols, 2011; Huettel et al., 2009).
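As a minimal sketch of such a double-gamma HRF, the following evaluates a peak component minus a scaled undershoot component. The shape parameters 6 and 16, the unit scale and the undershoot ratio 1/6 are assumptions chosen to resemble commonly quoted canonical settings; they are not stated in the text, and the amplitude is not normalized to match Figure 1. The initial dip is not modeled.

```python
import math

def gamma_pdf(t, shape):
    """Density of a Gamma(shape, scale=1) distribution at t (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

def canonical_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1.0 / 6.0):
    """Double-gamma HRF: a peak component minus a scaled undershoot component."""
    return gamma_pdf(t, peak_shape) - undershoot_ratio * gamma_pdf(t, undershoot_shape)

# Evaluate on a grid from 0 to 35 seconds in steps of 0.1 s.
times = [0.1 * k for k in range(351)]
values = [canonical_hrf(t) for t in times]

peak_time = times[values.index(max(values))]      # time of the peak
trough_time = times[values.index(min(values))]    # time of the deepest undershoot
print(f"peak at {peak_time:.1f} s, undershoot minimum at {trough_time:.1f} s")
```

With these assumed parameters the peak falls within the 4-6 second window described in the caption and the curve dips below baseline afterwards.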

2.1 fMRI Data and the Preprocessing

Data consists of a series of three-dimensional volumes, thus we have a four-dimensional data matrix, where the fourth dimension is the time (in the unit of scans). Each element in the data matrix corresponds to the gray scale intensity (stored as integers) for the corresponding voxel in said scan. Metadata such as voxel position is stored in a header file.

The raw data from fMRI experiments contains a lot of statistical noise due to several factors such as subject movement, scanner artifacts and eye movement. Hence an important aspect of handling fMRI data is to preprocess the images before the statistical analysis, in order to remove as much noise as possible before the modeling.


This preprocessing consists of a series of steps illustrated in Figure 2, and handles both detecting and restoring images from artifacts as well as preparing the data for analysis (Poldrack and Nichols, 2011).

Motion correction handles noise induced by movement of the subject, and distortion correction deals with the artifacts associated with the scanning procedure, such as effects from the inhomogeneity of the magnetic field. Slice timing correction is needed because a volume is obtained as a series of slices, typically between 5 and 40, acquired over a time interval, the repetition time (TR), of typically 1 to 5 seconds. It is therefore necessary to take this time delay into account and ensure that the measured signals are corrected for it (Poldrack and Nichols, 2011).

[Flow diagram: Acquiring data, then Preprocessing (motion correction, distortion correction, slice timing correction, normalization, spatial smoothing), then Statistical analysis.]

Figure 2: An important aspect of the analysis of fMRI data is the preprocessing of the raw data to reduce the statistical noise from the acquisition of the images. The preprocessing is performed in a series of steps with the purpose of detecting and cleaning the data for artifacts from different sources, and preparing the data for the statistical analysis.

The overall anatomical structure of the brain and its functional areas are approximately the same across humans, but on a micro-structural level the difference between brains makes it impossible to make a direct comparison between images at the level of individual voxels. Therefore all images are registered, or normalized, to a common space or template, which is an anatomical atlas of the brain. This template constitutes a "mean" brain image, typically averaged across hundreds of individuals. By normalizing to the template, the time series of each voxel becomes comparable between subjects.

In the step of spatial smoothing, the images are smoothed to remove high-frequency information, thus reducing noise at the voxel level of the data while increasing the signal-to-noise ratio (SNR) for signals and features across regions including many voxels. Furthermore, smoothing corrects spatial variability not captured by the spatial alignment during motion correction (Poldrack and Nichols, 2011; Ridgway, 2011).¹

¹ Another step, which is usually not characterized as preprocessing but as a part of the statistical analysis, is temporal filtering. Just as in the step of spatial filtering, a Gaussian kernel is applied to the data with the purpose of reducing the noise induced from drift during the scans, from both the subject and the scanner. The temporal filter is a high-pass filter, as drift is a low-frequency noise component, and can be performed through matrix multiplication before the statistical analysis (also known as pre-whitening) or be incorporated in the modeling of the data. Hence the discrepancy in the characterization of this step (Poldrack and Nichols, 2011; Phillips, 2011).


2.2 General Linear Models of fMRI Data

The standard approach for analyzing fMRI data is within a mass-univariate Gaussian generalized linear model for each voxel, where the spatial dependencies between voxels are initially ignored, but later corrected for before doing inference. Because of the lag of the BOLD response relative to the neural activity, a linear convolution model has been developed and described by Friston et al. (2007). In the following we will describe this setup.

For a four-dimensional volume, where the fourth dimension is the time in units of scans, the intensities are extracted and reshaped into a vector with an element for each voxel. Let $v \in \mathbb{N}$ denote the index of a voxel in an unzipped three-dimensional volume, and let $Y_v = (Y_v(1), Y_v(2), \ldots, Y_v(T_{scans}))^T$ denote the observed fMRI signal at each scan $t = 1, \ldots, T_{scans}$. Then for each voxel $v$ we define the model

\[ Y_v = X\beta + \varepsilon, \tag{1} \]

where we emphasize that the time dependencies of the model are embedded in the design matrix and the error distribution, whereas the regression coefficients $\beta \in \mathbb{R}^p$ are time-invariant, $p$ being the number of explanatory variables. Generally the fMRI data is assumed to follow a multivariate Gaussian distribution such that

\[ \varepsilon = (\varepsilon(1), \varepsilon(2), \ldots, \varepsilon(T_{scans}))^T \sim N(0, \sigma^2 \Sigma), \quad \sigma^2 > 0, \]

where $\Sigma$ models the noise autocorrelation. By fitting the data and estimating the autocorrelation, the data can be pre-whitened such that the i.i.d. assumption of the GLM holds. The spatial correlation between voxels is corrected for when doing inference, e.g. by Bonferroni correction of the p-values obtained from t-tests (Phillips, 2011).
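The Bonferroni correction mentioned as an example can be sketched as follows. The family-wise level of 0.05 and the four p-values are hypothetical; in a whole-brain analysis the number of tests would be the number of voxels.

```python
def bonferroni_significant(p_values, family_alpha=0.05):
    """Flag tests whose p-value is below family_alpha divided by the number of tests."""
    threshold = family_alpha / len(p_values)
    return [p < threshold for p in p_values]

# Hypothetical voxel-wise p-values from four t-tests.
voxel_p_values = [0.00001, 0.004, 0.03, 0.2]
print(bonferroni_significant(voxel_p_values))
```

Dividing the level by the number of tests controls the family-wise error rate without modeling the spatial correlation, which makes the correction simple but conservative when neighboring voxels are strongly correlated.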

The design matrix is formed by the explanatory variables assumed to generate the observed fMRI signal. The regressors $X_i$, $i = 1, \ldots, p$, represent the predicted BOLD responses one would observe upon neural activation, up to some scaling factor. This scaling factor is the regression coefficient of the GLM, which is to be estimated. The predicted BOLD response is constructed from a convolution between a stimulus function and the canonical HRF.

The stimulus function defines the neural spiking induced by some experimental event, e.g. visual or auditory stimulation, and each explanatory variable has an associated stimulus function, denoted $u_i(t)$, $i = 1, \ldots, p$, also called a stick function. It is defined as a sum of Dirac $\delta$-functions, namely

\[ u_i(t) = \sum_{j=1}^{J} \lambda_{i,j}\, \delta(t - o_{i,j}), \tag{2} \]

where $o_{i,j}$ are the onsets of a sequence of $J$ events for variable $i$, and $\lambda_{i,j}$ defines the magnitude of the induced response. Depending on (the interpretation of) the underlying event, the parameter $\lambda_{i,j}$ can vary across events or be constant for all $j = 1, \ldots, J$. When $\lambda_{i,j}$ varies with $j$ it is referred to as a parametric modulation, while a fixed $\lambda_{i,j} = 1$ is usually referred to as a main effect.
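The construction of a regressor from a stick function can be sketched in discrete time. The example below is an illustration under stated assumptions: the HRF is a double-gamma curve with assumed shapes 6 and 16 and ratio 1/6, the sampling step `dt` is one second, and the onsets and magnitudes are hypothetical.

```python
import math

def hrf(t):
    """Double-gamma HRF with assumed shapes 6 and 16 and undershoot ratio 1/6."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def stick_function(onsets, magnitudes, n_samples, dt):
    """Discretize u(t) = sum_j lambda_j * delta(t - o_j) on a grid with step dt."""
    u = [0.0] * n_samples
    for onset, magnitude in zip(onsets, magnitudes):
        index = int(round(onset / dt))
        if 0 <= index < n_samples:
            u[index] += magnitude
    return u

def predicted_bold(u, dt, hrf_length=32.0):
    """Convolve the stick function with the HRF, truncated to the length of u."""
    kernel = [hrf(k * dt) for k in range(int(hrf_length / dt) + 1)]
    out = [0.0] * len(u)
    for n, weight in enumerate(u):
        if weight == 0.0:
            continue
        for k, h in enumerate(kernel):
            if n + k < len(u):
                out[n + k] += weight * h
    return out

dt = 1.0
onsets = [10.0, 40.0]                      # hypothetical event onsets in seconds
main_effect = predicted_bold(stick_function(onsets, [1.0, 1.0], 80, dt), dt)
modulated = predicted_bold(stick_function(onsets, [0.5, 2.0], 80, dt), dt)
```

The main-effect regressor places an identical response after each onset, while the parametric modulation scales each response by its event-specific magnitude.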


Not all regressors in the design matrix are of interest, as the design matrix should include all variables believed to generate the signal. This includes nuisance regressors that control for variation in the signal not induced by the experiment, such as cardiac activity or drift. The variables of interest are generally the ones manipulated or measured by the experimenter. As an example, consider the reversal learning paradigm illustrated in Figures 3 and 4, conducted by Behrens et al. (2007).

2.3 Computational Models

A computational model typically formulates a mapping between a set of stimuli and the behavior it induces. It thus specifies a quantitative hypothesis about the computations in the brain, which can be tested and used to generate predictions about the neural signal. The use of computational models within neuroimaging facilitates investigation of neural computations that were otherwise intractable since they are latent (Daw, 2011; Cohen et al., 2017; O'Doherty et al., 2007).

Let $i = 1, \ldots, n_{trials}$ index the trials of a given experiment, where a subject is presented with two stimuli A and B, each associated with a reward (which can be fixed or change over time). In order to optimize the total reward gained over time, at each trial the subject aims to learn the reward distribution of each stimulus from the choices in previous trials. The computational model given by the Rescorla-Wagner learning rule states that the predicted probability of reward for one of the stimuli, say A, at trial $i$ is given by

\[ M(\alpha): \quad P_i(A) = P_{i-1}(A) + \alpha\, \delta_{i-1}(A), \quad i = 1, \ldots, n_{trials}, \tag{3} \]

where $\delta_{i-1}(A) = R_{i-1} - P_{i-1}(A)$ is the prediction error of the previous trial, with $R_{i-1} = 1$ if A was rewarded and 0 otherwise. The unknown parameter $\alpha \in (0, 1)$, which is to be estimated, is called the learning rate and determines the weight of the outcome of previous trials. Note that since the two stimuli are mutually exclusive, we have $P_i(B) = 1 - P_i(A)$.
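The recursion in (3) can be sketched in a few lines. The starting value $P_0(A) = 0.5$ is an assumption, since the text does not state how the recursion is initialized, and the reward sequence is hypothetical.

```python
def rescorla_wagner(rewards, alpha, p_initial=0.5):
    """Return [P_0, P_1, ..., P_n] with P_i = P_{i-1} + alpha * (R_{i-1} - P_{i-1})."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    probabilities = [p_initial]
    for reward in rewards:
        prediction_error = reward - probabilities[-1]   # delta_{i-1}(A)
        probabilities.append(probabilities[-1] + alpha * prediction_error)
    return probabilities

# Hypothetical outcomes: A rewarded on the first two trials, not on the third.
outcomes_a = [1, 1, 0]
p_a = rescorla_wagner(outcomes_a, alpha=0.5)
p_b = [1.0 - p for p in p_a]   # the two stimuli are mutually exclusive
print(p_a)
```

A learning rate close to 1 makes the prediction track the most recent outcome, while a rate close to 0 averages over a long history of trials.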

2.4 Model-based fMRI

The name 'Model-based fMRI' refers to the use of computational models within the analysis of fMRI data. As an example we consider the computational model $M(\alpha)$ defined above in (3). From the observed behavior the free parameter $\alpha$ is estimated, and the (predicted) variables $P_i$ and $\delta_{i-1}$ are then convolved with the HRF and used as regressors in the GLM of the fMRI data in (1). The inference on the computational parameters is then derived from the regression coefficients in the GLM through classical statistical methods such as t-tests.

Though model-based fMRI has proven fruitful in the investigation and identification of brain regions correlating with the observed BOLD signal, there are still some issues regarding this approach. Foremost, the computational model is fitted to the behavior, meaning that the estimation of the parameters is driven solely by the behavioral data and not by the neural activity assumed to generate it. Furthermore, fitting the latent parameters prior to the GLM fit means that the parameters are considered fixed in relation to the neural activity, thus excluding the possibility of different parameter estimates across the brain. This lack of anatomical differentiation is highly implausible given the complex structure of the brain and the biological evidence for parallel computations within the brain.


[Diagram: timeline of a single trial. Stimuli are presented; the subject may choose (reaction time); the subject waits for the outcome; feedback is given; a black screen is shown until the next trial (ITI). The phases are labeled DECISION, PREDICTION and MONITOR.]

Figure 3: Illustration of a single trial of an experiment with a reversal learning paradigm. At each trial the subject chooses between two stimuli represented by a green and a blue rectangle, each with a visible reward magnitude but an unknown reward probability. The subject is rewarded if he or she chooses the rewarded stimulus. The objective for the subject is to learn the reward probability based on previous trials in order to increase the total gain.

At the beginning of the trial two stimuli are shown for 4-8 seconds (the interval changes with every trial). When a question mark appears, the subject chooses one of the two stimuli with the purpose of gaining a reward. After the button press that marks the choice, the subject waits for 4-8 seconds before feedback is given by revealing the rewarded stimulus, so the subject learns whether the chosen stimulus was rewarded. A black screen is shown for 3-7 seconds until the next trial starts. Each trial consists of different phases (decision phase, prediction phase and monitor (feedback) phase), which are to be incorporated in the regression model used for analyzing the data.

[Diagram: the trial timeline of Figure 3 (stimuli presented, subject may choose, subject waits for outcome, feedback given, black screen until next trial; phases DECISION, PREDICTION, MONITOR and ITI) with stick functions drawn beneath it.]

Figure 4: Illustration of the reversal learning paradigm, the stick functions for the main effects of the decision phase and the monitor phase, and an example of a parametric modulation of the decision phase. The two top stick functions model the main effects of the decision phase and the monitor phase, i.e. both regressors have the same magnitude of 1, while the bottom one is a parametric modulation of the decision phase with the reward magnitude, in order to model the potential effect of the expected reward size.


3 Computational Parametric Mapping

We will adopt and expand the setup presented by Hansen, Nielsen and Larsen (Hansen et al., 2002), where the posterior distribution for the underlying parameters is derived by imposing a conjugate prior on the full likelihood. From this we can estimate the parameters of the computational model, and furthermore use this to compare posterior distributions of different parameters (or parameterizations) in terms of the Bayes factor, with the purpose of determining which voxels can be considered active relative to the computational model. We will call this Computational Parametric Mapping (CPM), as the emphasis is on the computational parameters rather than the parameters of the regression model, which are generally used for inference.

In our setup we let the signal be modeled by a number of regressors, each convolved with the canonical HRF as described in Sections ?? and 2.2, where we let the latent parameters be those of an underlying computational model $M$. Let $Y_v$ denote the observed signal in a voxel $v$, and assume the distribution of $Y_v$ can be described by the linear model in (1), where we have an underlying computational model $M(\theta)$ depending on some unknown parameters $\theta$ defined on some parameter space $\Theta \subseteq \mathbb{R}^l$.

The scheme presented by Hansen et al. (2002) is as follows:

Step 1: We can compute the full likelihood of the data given the parameters, $p(Y_v \mid \beta, \sigma^2, \theta)$, which can be reduced to $p(Y_v \mid \theta)$ by assuming a prior distribution for the parameters $(\beta, \sigma^2)$ and marginalizing them out, i.e.

\[ p(Y_v \mid \theta) = \int p(Y_v \mid \beta, \sigma^2, \theta)\, p(\beta, \sigma^2)\, d\beta\, d\sigma^2. \tag{4} \]

To make (4) tractable, we will employ a conjugate prior for $(\beta, \sigma^2)$ given by the normal-inverse-gamma distribution.

Step 2: Having the reduced likelihood $p(Y_v \mid \theta)$, the posterior probability of the parameters $\theta$ is given by Bayes' formula

\[ p(\theta \mid Y_v) = \frac{p(Y_v \mid \theta)\, p(\theta)}{p(Y_v)}. \tag{5} \]

By discretizing the parameter space $\Theta$ into equidistant points such that $\Theta \approx \{\theta_1, \theta_2, \ldots, \theta_N\}$ for a suitable $N \in \mathbb{N}$, and assuming that $p(\theta)$ is a flat prior on this set, we obtain the following approximation of a probability distribution

\[ p(\theta \mid Y_v) = \frac{p(Y_v \mid \theta)}{\sum_{k=1}^{N} p(Y_v \mid \theta_k) + p_0(Y_v)}, \tag{6} \]

where $p_0(Y_v)$ denotes the null-model, which in this case corresponds to the reduced model without the variable(s) related to the computational model and the parameter $\theta$. The null-model thus represents the case where the parameters $\theta$ are not present at all, i.e. the sub-model where the corresponding regressor has been excluded.
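Equation (6) can be evaluated numerically as in the following sketch. It assumes that the reduced likelihoods are available on the log scale, which is a practical choice introduced here because marginal likelihoods over many scans tend to underflow; the numerical values are hypothetical.

```python
import math

def discretized_posterior(log_likelihoods, log_null_likelihood):
    """Evaluate equation (6) from log marginal likelihoods via log-sum-exp."""
    all_terms = list(log_likelihoods) + [log_null_likelihood]
    shift = max(all_terms)                       # subtract the maximum for stability
    normalizer = sum(math.exp(v - shift) for v in all_terms)
    posterior = [math.exp(v - shift) / normalizer for v in log_likelihoods]
    null_probability = math.exp(log_null_likelihood - shift) / normalizer
    return posterior, null_probability

# Hypothetical reduced likelihoods on a three-point grid, plus the null-model.
log_liks = [math.log(1.0), math.log(2.0), math.log(1.0)]
posterior, null_probability = discretized_posterior(log_liks, math.log(1.0))
print(posterior, null_probability)
```

Because the null-model enters the denominator, the point probabilities over the grid sum to less than one, and the remainder is the probability assigned to the null-model.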

We assume the data $Y_v = (Y_v(1), Y_v(2), \ldots, Y_v(T_{scans}))^T$ are normally distributed and independent, depending on the unknown parameters $(\beta, \sigma^2, \theta) \in \mathbb{R}^p \times [0, \infty) \times \Theta$. The full likelihood with respect to the Lebesgue measure on $\mathbb{R}^{T_{scans}}$ is then

\[ p(Y_v \mid \beta, \theta, \sigma^2, X) = \left( \frac{1}{2\pi\sigma^2} \right)^{T_{scans}/2} \exp\left( -\frac{\lVert Y_v - X\beta \rVert^2}{2\sigma^2} \right), \tag{7} \]

where we emphasize that the parameters $\theta$ are latent within $X$ in (7) above.

Let $(\beta, \sigma^2) \sim P_\psi(\beta, \sigma^2)$ for a probability distribution $P_\psi$ depending on some hyperparameters $\psi \in \Psi \subseteq \mathbb{R}^k$, and let $p(\beta, \sigma^2 \mid \psi)$ denote the corresponding density w.r.t. the Lebesgue measure on $\mathbb{R}^p \times [0, \infty)$. Then we let $P_\psi(\beta, \sigma^2) = N\text{-}\Gamma^{-1}(a, b, \mu, V)$, i.e. $\psi = (a, b, \mu, V)$, which is the normal-inverse-gamma distribution with parameters $a, b > 0$, $\mu \in \mathbb{R}^p$, and $V$, which is a $p \times p$ covariance matrix. Thus the integral in (4) becomes tractable, as the integrand too will be a normal-inverse-gamma distribution.
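For reference, the closed form behind this tractability can be written out. The following is the standard conjugate result for Bayesian linear regression, stated under two assumptions that the text does not spell out: the noise is white after pre-whitening ($\Sigma = I$), and the prior is parameterized as $\beta \mid \sigma^2 \sim N(\mu, \sigma^2 V)$ with $\sigma^2 \sim \Gamma^{-1}(a, b)$. The symbols $V_n$, $\mu_n$, $a_n$ and $b_n$ are introduced here for illustration, with $n = T_{scans}$ and $X = X(\theta)$.

```latex
\begin{align*}
V_n &= \left( V^{-1} + X^T X \right)^{-1}, &
\mu_n &= V_n \left( V^{-1}\mu + X^T Y_v \right), \\
a_n &= a + \tfrac{n}{2}, &
b_n &= b + \tfrac{1}{2} \left( Y_v^T Y_v + \mu^T V^{-1} \mu - \mu_n^T V_n^{-1} \mu_n \right), \\
p(Y_v \mid \theta) &= (2\pi)^{-n/2} \sqrt{\frac{\det V_n}{\det V}} \;
  \frac{b^{a}}{b_n^{a_n}} \; \frac{\Gamma(a_n)}{\Gamma(a)}.
\end{align*}
```

Under these assumptions the reduced likelihood depends on $\theta$ only through the design matrix, so it can be evaluated once per candidate value of $\theta$ without numerical integration.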

By discretizing the parameter space such that $\Theta \approx \bigcup_{i=1}^{N} \{\theta_i\}$ for sufficiently large $N \in \mathbb{N}$, and applying Bayes' formula, we obtain a posterior distribution as a continuous approximation from a sequence of point probabilities for each $\theta_i$ as given in (6). Due to the discretization we assume that the prior distribution of the latent parameters, $p(\theta)$, is a flat prior, as we want to emphasize our (lack of) knowledge about the true value. Besides the maximum a posteriori estimate, we compute the Bayes factor of this against the null-model, which we will use for selecting voxels of interest with respect to the proposed computational model and parameter estimate.
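A minimal sketch of the estimate and the selection step follows. It assumes that the Bayes factor is the ratio of the reduced likelihood at the maximum a posteriori grid point to the null-model likelihood; the threshold of 3 and all numerical values are hypothetical and are not taken from the text.

```python
import math

def map_and_bayes_factor(theta_grid, log_likelihoods, log_null_likelihood):
    """Return the MAP grid point (flat prior) and its Bayes factor against the null-model."""
    best = max(range(len(theta_grid)), key=lambda k: log_likelihoods[k])
    bayes_factor = math.exp(log_likelihoods[best] - log_null_likelihood)
    return theta_grid[best], bayes_factor

def is_selected(bayes_factor, threshold=3.0):
    """Voxel selection rule with a hypothetical threshold on the Bayes factor."""
    return bayes_factor > threshold

theta_grid = [0.25, 0.5, 0.75]
log_liks = [math.log(2.0), math.log(8.0), math.log(4.0)]   # hypothetical values
theta_map, bf = map_and_bayes_factor(theta_grid, log_liks, math.log(2.0))
print(theta_map, bf, is_selected(bf))
```

With a flat prior on the grid, the maximum a posteriori estimate coincides with the grid point that maximizes the reduced likelihood, which is why no prior term appears in the code.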

4 Analysis and Results

We consider a data set of 17 subjects from the behavioral experiment on reinforcement learning described in Section 2.2. We will apply CPM at the individual level, where the data from each subject is analyzed in the same way before the results are compared. Initially the CPM analysis is performed on the entire brain, but the main focus will be on the pre-defined region of interest (ROI), the anterior cingulate cortex (ACC), which is known to be associated with decision-making and reward anticipation (Behrens et al., 2007; Kennerley et al., 2006).

4.1 The Model for Analysis and CPM

The model we propose has 21 regressors, including a regressor constructed from the Rescorla-Wagner learning rule. The null-model, which we will use for voxel selection, is the model without any regressors related to the computational model, and thus contains 20 regressors in the design matrix. We expect the neural activity from the decision-making to be associated with the probabilities of the chosen stimulus. Hence, instead of modeling this activity from $P_i(A)$ or $P_i(B)$ in (3) alone, we construct the probability for the chosen stimulus at each trial, $P_i(C)$, as

\[ P_i(C) = \begin{cases} P_i(G) & \text{if the green stimulus is chosen at trial } i, \\ P_i(B) & \text{if the blue stimulus is chosen at trial } i. \end{cases} \tag{8} \]

We compute this for 99 equidistant values of $\alpha \in [0.01, 0.99]$, resulting in 99 design matrices for which we compute the posterior distribution of $\alpha$. The null-model used for voxel selection is represented by the design matrix of all but the regressor computed from $P_i(C)$.
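The grid of learning rates and the chosen-stimulus probability in (8) can be sketched as follows. The trial data are hypothetical, the starting probability of 0.5 is an assumption, and the value for each trial is taken before that trial's outcome is used in the update, which is one possible reading of the indexing in (3).

```python
def chosen_probabilities(green_rewarded, chose_green, alpha, p_initial=0.5):
    """P_i(C): Rescorla-Wagner reward probability of the stimulus chosen on each trial."""
    p_green = p_initial
    result = []
    for rewarded, chosen in zip(green_rewarded, chose_green):
        # Probability of the chosen stimulus, using P(blue) = 1 - P(green).
        result.append(p_green if chosen else 1.0 - p_green)
        # Update the prediction for green from this trial's outcome.
        p_green += alpha * (rewarded - p_green)
    return result

# 99 equidistant learning rates: 0.01, 0.02, ..., 0.99.
alpha_grid = [k / 100 for k in range(1, 100)]

# Hypothetical trial data: whether green was rewarded and whether green was chosen.
green_rewarded = [1, 0, 1, 1]
chose_green = [True, True, False, True]

# One trial-wise sequence per learning rate.
regressors = {alpha: chosen_probabilities(green_rewarded, chose_green, alpha)
              for alpha in alpha_grid}
print(regressors[0.5])
```

Each of the 99 sequences would then serve as the magnitudes of a parametric modulation before convolution with the HRF, giving one design matrix per candidate learning rate.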


The other 20 regressors are included to model the remaining variation in the signal. Thirteen of these are nuisance regressors. Six motion correction regressors are added to correct for physiological noise, and were computed in relation to the preprocessing of the data. Seven regressors are main effect regressors that model the expected BOLD response invoked by the experimental factors, and are standard in the analysis of data from reversal learning experiments. The remaining seven are temporal derivatives of these main regressors, added to correct for a possible lag in the BOLD response.

4.2 Results

The voxel selection criterion proved to be very conservative for the majority of the 17 subjects, where only approximately 10% of the voxels, corresponding to approximately 130 out of 1311 voxels, were considered active with respect to the computational model. Most subjects showed small, and some only minuscule, clusters of selected voxels scattered across the ACC. Generally the Bayes factors were very low even for the selected voxels. The estimated learning rate $\alpha_{MAP}$, however, differed widely between subjects, both in range across the parameter space and in the bias towards either end of it. Furthermore, we also saw different trends in the anatomical distribution across subjects.

For illustrative purposes, Figures 5a and 5b (on the final page) show the anatomical distribution of Bayes factors and $\alpha_{MAP}$ across the ACC for subject NV.

5 Discussion and Conclusion

In this thesis we have analyzed fMRI data with a novel brain-centric Bayesian approach, where we have computed the posterior distribution for a latent parameter within an assumed underlying computational model. From this we can derive a parameter estimate based on fMRI data rather than behavioral data. We have implemented a criterion for selecting voxels of interest with respect to the proposed computational model, as Bayesian model selection allows us to compare models of different complexities and thus gives a formalized methodology.

Our focus has been on computational models within reinforcement learning, but as CPM has been developed from a statistical rather than a neural or psychological perspective, the framework generalizes to all settings where an underlying computational model is assumed to generate the observed BOLD signal.

The voxel selection criterion clearly had the greatest impact on the (anatomical) distribution of $\alpha_{MAP}$, as most voxels did not have a high enough Bayes factor to be considered active. We suggest that the specification of this criterion be investigated further, since it discards the majority of the voxels. A possible modification could be to model the spatial correlation between the voxels in a pre-specified region. Furthermore, it could be investigated whether modeling the temporal dependence would improve the performance of CPM.


Figure 5: (a) The anatomical distribution of Bayes factors across the sagittal plane in subject NV. Bright yellow corresponds to Bayes factors strictly greater than 7-8, while Bayes factors lower than 2 are dark and thus invisible. Here most of the Bayes factors are 6 or higher, and this for a large part of the ACC.

(b) The anatomical distribution of $\alpha_{MAP}$-estimates in subject NV. Green corresponds to values close to zero, blue corresponds to values around 0.5, while red corresponds to values close to 1. Generally the estimated learning rates are above 0.5 for this subject, where the smallest learning rates are found in the left side and upper side of the ACC.


References

Timothy E J Behrens, Mark W Woolrich, Mark E Walton, and Matthew F S Rushworth. Learning the value of information in an uncertain world. Nature Neuroscience, 10(9):1214-21, 2007. ISSN 1097-6256. doi: 10.1038/nn1954. URL http://www.ncbi.nlm.nih.gov/pubmed/17676057.

Jonathan D Cohen, Nathaniel Daw, Barbara Engelhardt, Uri Hasson, Kai Li, Yael Niv, Kenneth A Norman, Jonathan Pillow, Peter J Ramadge, Nicholas B Turk-Browne, and Theodore L Willke. Computational approaches to fMRI analysis. 20(3), 2017. doi: 10.1038/nn.4499.

Nathaniel D. Daw. Decision Making, Affect, and Learning: Attention and Performance XXIII, chapter Trial-by-trial data analysis using computational models. University of Cambridge, Cambridge UK, 2011.

K. J. Friston, J. T. Ashburner, S. J. Kiebel, T. E. Nichols, and W. Penny. Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier Inc., 2007.

Lars Kai Hansen, F. Å. Nielsen, and Jan Larsen. Exploring fMRI data for periodic signal components. Artificial Intelligence in Medicine, 25(1):35-44, 2002. ISSN 09333657. doi: 10.1016/S0933-3657(02)00007-6.

Scott A Huettel, Allen W Song, and Gregory McCarthy. Functional magnetic resonance imaging. W. H. Freeman, New York, 2. edition, 2009.

S W Kennerley, M E Walton, T E Behrens, M J Buckley, and M F Rushworth. Optimal decision making and the anterior cingulate cortex. Nature Neuroscience, 9(7):940-947, 2006. ISSN 1097-6256. doi: nn1724 [pii]. URL http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve{&}db=PubMed{&}dopt=Citation{&}list{_}uids=16783368.

John P. O'Doherty, Alan Hampton, and Hackjin Kim. Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences, 1104:35-53, 2007. ISSN 00778923. doi: 10.1196/annals.1390.022.

Christophe Phillips. The General Linear Model. https://mediacentral.ucl.ac.uk/Player/3835, 2011.

Russell A Poldrack and Thomas E Nichols. Handbook of functional MRI data analysis, volume 4. 2011. ISBN 9780521517669. URL http://webcat.warwick.ac.uk/record=b2542955{-}S15.

Ged Ridgway. Spatial Preprocessing. https://mediacentral.ucl.ac.uk/Player/2888, 2011.


Matching SAS customers with SAS-skilled students for collaboration on thesis projects.

For SAS-skilled students:

Do you want to work on a real-life case with REAL data?

Do you want to get a preview of working life?

Do you want to take advantage of the great analytical possibilities in SAS?

For SAS customers:

Do you need fresh eyes on a project?

Do you have ideas for an interesting project but lack the time?

Do you want to help a student – and maybe a future employee?

Can you answer YES to any of these?

Then the SAS® Thesis Program is ideal for you!

Want to know more about the SAS® Thesis Program?

Contact: Anne Olesen – [email protected]

Follow SAS for Students on Facebook: www.facebook.com/SASforStudents