Quantifying Opinion about a Logistic Regression using Interactive Graphics. Paul Garthwaite, The Open University. Joint work with Shafeeqah Al-Awadhi


1

Quantifying Opinion about a Logistic Regression using Interactive Graphics

Paul Garthwaite, The Open University

Joint work with Shafeeqah Al-Awadhi


2

Introduction/Plan

• This work arose from a practical problem in logistic regression.

• The theory extends easily to elicit opinion about the link function of any glm.

• I will outline the method for glms in general.

• The motivating problem has some additional (commonly occurring) structure that the elicitation method exploits.

• Interactive computing is used to elicit opinion.

• Prior models can be formed that aim to allow a small amount of data to correct some potential systematic biases in assessments.

• Results for the practical problem will be given.


3

Motivating Example

The task is to model the habitat distribution of fauna (bats, birds, mammals, etc.) in south-east Queensland.

Available information:

• Environmental attributes on a GIS database.

• Sample information on presence/absence at 300-400 sites.

• Background knowledge of ecologists.

The ecologists have seen the bat (say) in various locations, but this information is difficult to use in a traditional statistical analysis because it has not been obtained from any sampling scheme.

Prob(presence) = f (environmental attributes)


4

Continuous variables: elevation; quarterly rainfall and temperatures; canopy cover; slope; aspect.

Factors: land type; vegetation; forest structure; logging; grazing; etc.

A workshop with 15 ecologists indicated:

• unimodal or monotonic relationships;

• independence between attributes in their effect on the probability of presence.

[Figure: example relationship between prob(presence) (vertical axis, 0 to 0.5) and an attribute (horizontal axis, 0 to 12).]


5

Generalised Linear Model (glm)

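The model has the form $Y = g[\mu(\mathbf{R})]$, where $g[\cdot]$ is the link function.

For logistic regression, $g[\mu] = \ln(\mu/(1-\mu))$ and $\mu$ is the probability of presence.

$\mathbf{R} = (R_1, \ldots, R_{m+n})'$ is the vector of predictor variables.

From the $i$th predictor variable, $R_i$, a vector of explanatory variables $\mathbf{X}_i = (X_{i,1}, \ldots, X_{i,r(i)})'$ is constructed such that we have the linear equation

$$Y = \beta_0 + \boldsymbol{\beta}_1'\mathbf{X}_1 + \cdots + \boldsymbol{\beta}_{m+n}'\mathbf{X}_{m+n}.$$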
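As a concrete illustration, here is a minimal Python sketch of the logistic link g, its inverse, and the linear predictor Y = β_0 + β_1'X_1 + ... . The function names and the numbers in the example are hypothetical, chosen only for illustration; they are not taken from the talk or its software.

```python
import math


def logit(mu):
    """Logistic link g[mu] = ln(mu / (1 - mu)), defined for 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))


def inv_logit(y):
    """Inverse link: maps a linear predictor Y back to a probability of presence."""
    return 1.0 / (1.0 + math.exp(-y))


def linear_predictor(beta0, betas, xs):
    """Y = beta_0 + beta_1'X_1 + ..., where betas[i] and xs[i] are equal-length lists."""
    return beta0 + sum(
        sum(b * x for b, x in zip(beta_i, x_i)) for beta_i, x_i in zip(betas, xs)
    )


# Hypothetical example: two predictors with 2 and 1 explanatory variables.
y = linear_predictor(-1.0, [[0.5, -0.2], [1.0]], [[2.0, 1.0], [0.0]])
print(round(y, 6), round(inv_logit(y), 4))  # prints -0.2 0.4502
```

Working on the logit scale keeps Y unbounded, so a normal prior on the coefficients is natural, while inv_logit always returns a valid probability.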


6

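Define:

$$X_{i,j} = \begin{cases} 0 & \text{if } R_i \le r_{i,j}, \\ R_i - r_{i,j} & \text{if } r_{i,j} < R_i \le r_{i,j+1}, \\ r_{i,j+1} - r_{i,j} & \text{if } R_i > r_{i,j+1}, \end{cases}$$

and $\mathbf{X}_i = (X_{i,1}, \ldots, X_{i,r(i)})'$. Then $Y$ is a linear function of $\mathbf{X}_i$, and hence a piecewise-linear function of $R_i$ with break-points at $r_{i,1} < r_{i,2} < \cdots$; its slope between $r_{i,j}$ and $r_{i,j+1}$ is $\beta_{i,j}$, the $j$th element of $\boldsymbol{\beta}_i$.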
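The piecewise definition of X_{i,j} can be sketched in Python as below. This is an illustrative reconstruction, not the authors' code: the function name, the knot values and the slopes are hypothetical, and the knots r_{i,1}, ..., r_{i,r(i)+1} are assumed to be strictly increasing.

```python
def broken_stick_basis(value, knots):
    """Return [X_{i,1}, ..., X_{i,r(i)}] for one value R_i of a continuous predictor.

    knots = [r_{i,1}, ..., r_{i,r(i)+1}] is assumed to be strictly increasing,
    so r(i) = len(knots) - 1 explanatory variables are produced.
    """
    basis = []
    for j in range(len(knots) - 1):
        lo, hi = knots[j], knots[j + 1]
        if value <= lo:
            basis.append(0.0)                # R_i has not reached segment j
        elif value <= hi:
            basis.append(float(value - lo))  # R_i lies inside segment j
        else:
            basis.append(float(hi - lo))     # segment j is fully traversed
    return basis


# Hypothetical knots for an attribute measured on a 0-12 scale.
knots = [0.0, 2.0, 5.0, 12.0]
slopes = [0.5, -0.2, -0.4]  # one slope per segment: beta_i = (beta_{i,1}, ...)'
x = broken_stick_basis(3.0, knots)
contribution = sum(b * xj for b, xj in zip(slopes, x))
print(x, round(contribution, 6))  # prints [2.0, 1.0, 0.0] 0.8
```

With positive then negative slopes, as here, the contribution rises and then falls, which is one way to represent the unimodal relationships the ecologists described.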


7

Factors: One factor level (the best one, say) is chosen as the reference level. Each other level is given a dummy 0/1 variable that equals 1 for that level and 0 for all other levels:

$$X_{i,j} = \begin{cases} 1 & \text{if } R_i = r_{i,j}, \\ 0 & \text{otherwise.} \end{cases}$$
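A minimal Python sketch of this dummy coding follows. The function name and the vegetation levels are hypothetical. At the reference level every dummy is 0, so that level contributes nothing to Y, and each coefficient β_{i,j} is the shift in Y from the reference level to level j.

```python
def dummy_basis(level, levels, reference):
    """0/1 dummies X_{i,j}, one for each non-reference level r_{i,j} of factor i."""
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    return [1 if level == lv else 0 for lv in levels if lv != reference]


# Hypothetical vegetation factor; the reference level gets no dummy variable.
levels = ["open forest", "rainforest", "heath"]
print(dummy_basis("heath", levels, reference="open forest"))        # prints [0, 1]
print(dummy_basis("open forest", levels, reference="open forest"))  # prints [0, 0]
```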


8

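The sampling model is

$$Y = \beta_0 + \boldsymbol{\beta}_1'\mathbf{X}_1 + \cdots + \boldsymbol{\beta}_{m+n}'\mathbf{X}_{m+n}.$$

Let $\boldsymbol{\beta} = (\boldsymbol{\beta}_1', \ldots, \boldsymbol{\beta}_{m+n}')'$.

For the prior distribution we put

$$\begin{pmatrix} \beta_0 \\ \boldsymbol{\beta} \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} b_0 \\ \mathbf{b} \end{pmatrix}, \begin{pmatrix} \sigma_{00} & \boldsymbol{\sigma}_{10}' \\ \boldsymbol{\sigma}_{10} & \Sigma \end{pmatrix} \right).$$

The values of the prior parameters $b_0$, $\mathbf{b}$, $\sigma_{00}$, $\boldsymbol{\sigma}_{10}$ and $\Sigma$ must be chosen by the expert to represent his or her opinions.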
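One way to see what a chosen prior implies is to simulate from it. The NumPy sketch below assumes the joint multivariate normal form for (β_0, β')' with mean (b_0, b')' and covariance blocks σ_00, σ_10 and Σ. It draws coefficients and reports prior quartiles of the probability of presence at a site. All hyperparameter values and the site's explanatory vector are hypothetical.

```python
import numpy as np


def sample_prior(b0, b, s00, s10, Sigma, size, seed=0):
    """Draw (beta_0, beta')' from MVN((b0, b')', [[s00, s10'], [s10, Sigma]])."""
    b = np.asarray(b, dtype=float)
    s10 = np.asarray(s10, dtype=float)
    mean = np.concatenate(([b0], b))
    cov = np.block([[np.array([[s00]]), s10[None, :]],
                    [s10[:, None], np.asarray(Sigma, dtype=float)]])
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=size)


def prior_probability_quartiles(draws, x):
    """Prior quartiles of the probability of presence at a site with explanatory vector x."""
    y = draws[:, 0] + draws[:, 1:] @ np.asarray(x, dtype=float)
    mu = 1.0 / (1.0 + np.exp(-y))  # inverse logit link
    return np.quantile(mu, [0.25, 0.5, 0.75])


# Hypothetical hyperparameters for a model with two explanatory variables.
draws = sample_prior(b0=-1.4, b=[0.3, -0.5], s00=1.0, s10=[0.2, 0.1],
                     Sigma=[[0.5, 0.1], [0.1, 0.4]], size=20000)
print(prior_probability_quartiles(draws, x=[1.0, 2.0]))
```

Comparing such simulated quartiles with the expert's direct assessments is a simple consistency check. The covariance matrix must be positive-definite for `multivariate_normal` to sample correctly.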


9

Assessing medians and quartiles

These are the fundamental assessment tasks that the expert performs. For example: how far is it from Aberdeen to Southampton?

[Figure: a distance scale divided into four intervals of 25% probability each, with the lower quartile at 470 miles, the median at 525 miles and the upper quartile at 600 miles.]

The median is assessed first, and then the lower and upper quartiles.

Ecologists were given practice at performing these tasks, with explanation, during preparatory training.


10

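Eliciting $b_0$ and $\sigma_{00}$

Here $E(\beta_0) = b_0$ and $\mathrm{Var}(\beta_0) = \sigma_{00}$. Also, $Y = \beta_0$ at the reference point, because all explanatory variables are zero there. The expert assesses $m_{0.50}$, the median of $\mu$ at this point. (For logistic regression $\mu$ is the probability of presence.)

We put $b_0 = g[m_{0.50}]$.

The expert also assesses the lower and upper quartiles $m_{0.25}$ and $m_{0.75}$. We put

$$\sigma_{00} = \left( \frac{g(m_{0.75}) - g(m_{0.25})}{1.348} \right)^2,$$

since the interquartile range of a normal distribution is approximately 1.35 standard deviations.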
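The two formulas above translate directly into code. Below is a minimal Python sketch, assuming the logistic link. The helper name and the assessed values (median 0.2, quartiles 0.1 and 0.3) are hypothetical.

```python
import math

IQR_IN_SDS = 1.348  # normal interquartile range in standard deviations (value used on the slide)


def logit(mu):
    return math.log(mu / (1.0 - mu))


def elicit_intercept(m25, m50, m75, g=logit):
    """Return (b_0, sigma_00) from the assessed quartiles of mu at the reference point."""
    if not 0.0 < m25 < m50 < m75 < 1.0:
        raise ValueError("need 0 < m_0.25 < m_0.50 < m_0.75 < 1")
    b0 = g(m50)                                     # b_0 = g[m_0.50]
    sigma00 = ((g(m75) - g(m25)) / IQR_IN_SDS) ** 2
    return b0, sigma00


# Hypothetical assessments: median 0.2, quartiles 0.1 and 0.3.
b0, sigma00 = elicit_intercept(0.10, 0.20, 0.30)
print(round(b0, 4), round(sigma00, 4))
```

Quartiles that are symmetric on the probability scale are not symmetric on the logit scale. The method uses only their spread on the logit scale to set σ_00. The exact normal constant is 2 × 0.6745 ≈ 1.349, available as `2 * statistics.NormalDist().inv_cdf(0.75)`.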


11

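Eliciting $\mathbf{b}$ and $\boldsymbol{\sigma}_{10}$

• $\mathbf{b}$ is determined from the unconditional assessments.

• $\boldsymbol{\sigma}_{10}$ is determined from assessments conditional on the probability of presence at the reference point equalling $m_{0.75}$.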


12

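Eliciting $\mathbf{b}$ and $\boldsymbol{\sigma}_{10}$ for factors

Put $y_{0.75} = g[m_{0.75}]$. Then, for a single factor-level coefficient (denoted here $\beta_1$, with prior mean $b_1$ and covariance $\sigma_{10}$ with $\beta_0$),

$$E[\beta_1 \mid \beta_0 = y_{0.75}] = b_1 + \frac{\sigma_{10}}{\sigma_{00}}\,(y_{0.75} - b_0),$$

enabling $\sigma_{10}$ to be estimated.

[Go to program]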
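The sketch below shows how that equation can be solved for σ_10 for one factor level. It is an assumption about how the assessments feed in, not the authors' program. The expert's unconditional median at the factor level is taken to give b_0 + b_1 on the logit scale. The conditional median, given that the reference-point probability equals m_0.75, is taken to give y_0.75 + E[β_1 | β_0 = y_0.75]. The helper name and all numbers are hypothetical.

```python
import math


def logit(mu):
    return math.log(mu / (1.0 - mu))


def elicit_factor_level(b0, sigma00, m75, m_level, m_level_given_75, g=logit):
    """Hypothetical helper returning (b_1, sigma_10) for one factor-level coefficient beta_1.

    m_level:          expert's median of mu at the factor level (unconditional).
    m_level_given_75: expert's median of mu at the factor level, given that mu at the
                      reference point equals its upper quartile m_0.75.
    """
    y75 = g(m75)
    b1 = g(m_level) - b0                      # assumed: median of beta_0 + beta_1 is g(m_level)
    cond_mean_b1 = g(m_level_given_75) - y75  # assumed: E[beta_1 | beta_0 = y_0.75]
    # Solve E[beta_1 | beta_0 = y75] = b_1 + (sigma_10 / sigma_00)(y75 - b_0) for sigma_10.
    sigma10 = sigma00 * (cond_mean_b1 - b1) / (y75 - b0)
    return b1, sigma10


b0 = logit(0.20)  # from the reference-point assessments
b1, sigma10 = elicit_factor_level(b0, sigma00=1.0, m75=0.30,
                                  m_level=0.10, m_level_given_75=0.14)
print(round(b1, 4), round(sigma10, 4))
```

The denominator y_0.75 − b_0 is positive whenever m_0.75 > m_0.50, so the division is safe for valid assessments. A separate check is still needed that the resulting covariance matrix is positive-definite.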


13

Assessments to obtain $\Sigma$

Conditional on the first three line segments being correct, the dashed lines show the quartiles of where the line might continue.


14

Conditional Assessments for Factors

• The circles indicate conditions.

• Dotted horizontal bars are previous assessments.

• Solid bars are current assessments and must lie within the dotted bars if $\Sigma$ is to be positive-definite.

[Go to program]
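One way to see why a conditional bar cannot be wider than the corresponding unconditional bar is the following. For a multivariate normal prior, Var(β_j | β_S) = σ_jj − s'S⁻¹s, where s holds the covariances of β_j with the conditioning coefficients and S is their covariance matrix. When Σ is positive-definite the subtracted term is non-negative, so conditional quartiles can be no further apart than unconditional ones. The NumPy sketch below checks this width constraint for a hypothetical 3 x 3 covariance matrix. It does not reproduce the program's own check.

```python
from statistics import NormalDist

import numpy as np


def conditional_variance(cov, j, given):
    """Var(beta_j | beta_k for k in given) under a multivariate normal with covariance cov."""
    cov = np.asarray(cov, dtype=float)
    s_jg = cov[j, given]              # covariances of beta_j with the conditioning set
    s_gg = cov[np.ix_(given, given)]  # covariance matrix of the conditioning set
    return cov[j, j] - s_jg @ np.linalg.solve(s_gg, s_jg)


z75 = NormalDist().inv_cdf(0.75)  # upper quartile of N(0, 1), about 0.674

# Hypothetical positive-definite covariance for three coefficients.
cov = [[1.0, 0.5, 0.2],
       [0.5, 2.0, 0.3],
       [0.2, 0.3, 1.5]]
marginal_half_iqr = z75 * np.sqrt(cov[0][0])
conditional_half_iqr = z75 * np.sqrt(conditional_variance(cov, 0, [1, 2]))
print(conditional_half_iqr <= marginal_half_iqr)  # prints True
```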


15

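Calculating $\Sigma$

Iterative calculations determine $\Sigma = A_p$, where $p$ is the dimension of $\Sigma$.

Start by estimating the lower-right scalar element of $\Sigma$, and call it $A_1$. Then estimate the lower-right $2 \times 2$ submatrix of $\Sigma$ and call it $A_2$, and so on.

If

$$A_i = \begin{pmatrix} a_{ii} & \mathbf{a}_i' \\ \mathbf{a}_i & A_{i-1} \end{pmatrix}$$

and $A_{i-1}$ is positive-definite, then so is $A_i$ provided $a_{ii} > \mathbf{a}_i' A_{i-1}^{-1} \mathbf{a}_i$ (that is, provided the Schur complement of $A_{i-1}$ in $A_i$ is positive).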
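The iterative construction and its positive-definiteness condition can be sketched with NumPy as follows. The function name and the assessed values are hypothetical. Each step adds one new row and column at the top-left, around the previously accepted lower-right block A_{i-1}, and rejects any a_ii that fails the bound.

```python
import numpy as np


def extend_covariance(A_prev, a_i, a_ii):
    """Form A_i = [[a_ii, a_i'], [a_i, A_{i-1}]], checking a_ii > a_i' A_{i-1}^{-1} a_i."""
    A_prev = np.atleast_2d(np.asarray(A_prev, dtype=float))
    a_i = np.asarray(a_i, dtype=float)
    bound = a_i @ np.linalg.solve(A_prev, a_i)
    if a_ii <= bound:
        raise ValueError(f"a_ii must exceed {bound:.4f} to keep A_i positive-definite")
    top = np.concatenate(([a_ii], a_i))    # first row: (a_ii, a_i')
    rest = np.column_stack((a_i, A_prev))  # remaining rows: (a_i, A_{i-1})
    return np.vstack((top, rest))


# Hypothetical assessments, built from the lower-right corner upwards.
A1 = [[2.0]]
A2 = extend_covariance(A1, [0.5], 1.0)
A3 = extend_covariance(A2, [0.3, 0.4], 1.5)
np.linalg.cholesky(A3)  # succeeds only for a positive-definite matrix
print(A3)
```

Checking the bound at each step lets an interactive program flag an inconsistent assessment immediately, rather than discovering at the end that the full matrix is not positive-definite.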


16

Alternative Prior Models

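Individuals can show systematic bias in their subjective assessments. The aim is to form prior models that allow a small amount of data to largely correct some potential biases.

Prior 2: The marginal distribution of $\beta_0$ is diffuse, rather than $N(b_0, \sigma_{00})$. The conditional distribution of $\boldsymbol{\beta}$ given $\beta_0$ is assumed to be unchanged from Prior 1:

$$\boldsymbol{\beta} \mid \beta_0 \sim \mathrm{MVN}(\tilde{\mathbf{b}}, \tilde{\Sigma}),$$

where $\tilde{\mathbf{b}} = \mathbf{b} + \boldsymbol{\sigma}_{10}(\beta_0 - b_0)/\sigma_{00}$ and $\tilde{\Sigma} = \Sigma - \boldsymbol{\sigma}_{10}\boldsymbol{\sigma}_{10}'/\sigma_{00}$ are the conditional mean and variance implied by Prior 1.

This allows for error in specifying the origin of the Y-axis.

17

Prior 3: This replaces the scale for $Y$ with some other linear scale. $\beta_0$ is again given a diffuse distribution, and the conditional distribution of $\boldsymbol{\beta}$ is taken to be

$$\boldsymbol{\beta} \mid \beta_0, \kappa \sim \mathrm{MVN}(\kappa\tilde{\mathbf{b}}, \kappa^2\tilde{\Sigma}).$$

$\kappa$ is also given a diffuse distribution.

Prior 4: This is the same as Prior 3, except that it allows for systematic bias in quartile assessments by putting

$$\boldsymbol{\beta} \mid \beta_0, \kappa, \lambda \sim \mathrm{MVN}(\kappa\tilde{\mathbf{b}}, \lambda\tilde{\Sigma}).$$

$\kappa$ and $\lambda$ are given diffuse distributions.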


18

Cross-validation and scoring

• The usefulness of a prior distribution can be objectively examined by using cross-validation and a scoring rule.

• For the cross-validation the data for a species were divided into four sets. Each set in turn was omitted and the remaining sets used to form prediction equations.

• Prediction equations were applied to the omitted set and the squared error loss was determined:

$$\text{Squared error loss} = \sum_k (p_k - w_k)^2,$$

where the summation is over all sites in the omitted (validation) set, $p_k$ is the probability of presence given by the prediction equation, and $w_k$ is a 0/1 dummy variable indicating absence/presence.

• This defines a proper scoring rule.
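The cross-validation and scoring procedure can be sketched in plain Python. Two details are assumptions. First, the four validation sets are formed here by interleaving site indices, because the talk does not say how the data were divided. Second, the "prediction equation" is a hypothetical stand-in that predicts the training prevalence at every site, ignoring covariates. Any fitting routine that maps training sites to a predictor could be passed in instead.

```python
def squared_error_loss(probs, outcomes):
    """Sum over validation sites of (p_k - w_k)^2, with w_k = 1 for presence, 0 for absence."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - w) ** 2 for p, w in zip(probs, outcomes))


def cross_validate(outcomes, fit, n_sets=4):
    """Omit each of n_sets validation sets in turn; fit on the rest and score the omitted set."""
    n = len(outcomes)
    sets = [list(range(s, n, n_sets)) for s in range(n_sets)]  # interleaved split (an assumption)
    losses = []
    for held_out in sets:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        predict = fit(train)  # prediction equation from the remaining sets
        probs = [predict(i) for i in held_out]
        losses.append(squared_error_loss(probs, [outcomes[i] for i in held_out]))
    return losses


def prevalence_fit(outcomes):
    """Stand-in prediction equation: the presence rate in the training sites, ignoring covariates."""
    def fit(train):
        rate = sum(outcomes[i] for i in train) / len(train)
        return lambda i: rate
    return fit


outcomes = [0, 1, 0, 0, 1, 0, 0, 0]  # hypothetical presence/absence data
losses = cross_validate(outcomes, prevalence_fit(outcomes))
print([round(x, 4) for x in losses], round(sum(losses), 4))  # prints [0.7222, 0.7222, 0.2222, 0.2222] 1.8889
```

Because squared error is a proper scoring rule, the forecaster minimises expected loss by reporting honest probabilities. This makes the summed losses a fair basis for comparing priors.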


19

Results for little bent-wing bat

Squared error loss for each validation set:

______________________________________________________
Method           Set 1   Set 2   Set 3   Set 4   Total
Prior 1           9.57    8.93    8.94    9.30   36.74
Prior 2           9.62    9.03    8.98    9.24   36.87
Prior 3           9.52    8.86    8.92    8.81   36.11
Prior 4           9.73    8.87    8.90    8.62   36.13
Frequentist      11.03    9.72    9.55   10.78   41.07
No data          10.83    9.81    9.92   10.56   41.12
Sample results   11/94   10/94   10/93   11/94   42 in 375
______________________________________________________


20

[Figure: four scatter plots (Prior 1, Prior 2, Prior 3, Prior 4), each plotting the posterior value using that prior (vertical axis) against the prior value (horizontal axis); both axes run from -5.0 to 2.0.]


21

Total squared error loss by species:

__________________________________________________________________________
Method           Little      Common      Frogmouth   Powerful    Greater
                 bent-wing   bent-wing               owl         glider
                 bat         bat
Prior 1          36.74       12.75       28.76       13.61       43.90
Prior 2          36.87       12.73       28.91       13.60       43.94
Prior 3          36.11       12.41       25.99       13.17       42.35
Prior 4          36.13       12.75       28.61       13.61       43.90
Frequentist      41.07       13.70       30.91       14.38       44.15
No data          41.12       13.66       29.54       15.07       48.81
Sample results   42 in 375   13 in 375   31 in 324   14 in 324   53 in 343
__________________________________________________________________________


22

Concluding Comments

• The elicitation method described here is able to handle large problems by:

(a) using interactive graphics;

(b) suggesting values to the expert that might represent his or her opinions.

• We believe that the use of graphs can improve the quality of the assessed distributions.

• Cross-validation can demonstrate clearly the gain from using prior knowledge, when there is such gain.

• Additional parameters in the prior model can allow limited data to be used more effectively.