Quantifying Opinion about a Logistic Regression using Interactive Graphics

Paul Garthwaite, The Open University

Joint work with Shafeeqah Al-Awadhi


2

Introduction/Plan

• This work arose from a practical problem in logistic regression.

• The theory extends easily to elicit opinion about the link function of any glm.

• I will outline the method for glms in general.

• The motivating problem has some additional (commonly occurring) structure that the elicitation method exploits.

• Interactive computing is used to elicit opinion.

• Prior models can be formed that aim to allow a small amount of data to correct some potential systematic biases in assessments.

• Results for the practical problem will be given.

3

Motivating Example

The task is to model the habitat distribution of fauna in south-east Queensland - bats, birds, mammals etc.

Available information:

• Environmental attributes on a GIS database.

• Sample information of presence/absence at 300-400 sites.

• Background knowledge of ecologists.

The ecologists have seen the bat (say) in various locations but this information is difficult to use in a traditional statistical analysis because it has not been obtained from any sampling scheme.

Prob(presence) = f (environmental attributes)

4

Continuous variables: elevation; quarterly rainfall and temperatures; canopy cover; slope; aspect.

Factors: land type; vegetation; forest structure; logging; grazing; etc.

A workshop with 15 ecologists indicated

• unimodal or monotonic relationships

• independence between attributes in their effect on the probability of presence.

[Figure: prob(presence), on a vertical axis from 0 to 0.5, plotted against an attribute, on a horizontal axis from 0 to 12.]

5

Generalised Linear Model (glm)

The model has the form

$$Y = g[\mu(r)],$$

where $g[\cdot]$ is the link function.

For logistic regression, $g[\mu] = \ln(\mu/(1-\mu))$ and $\mu$ is the probability of presence.

$r$ is the vector of predictor variables.

From the $i$th predictor variable, $r_i$, a vector of explanatory variables $X_i = (X_{i,1}, \ldots, X_{i,\delta(i)})'$ is constructed such that we have the linear equation

$$Y = \alpha + X_1'\beta_1 + \cdots + X_{m+n}'\beta_{m+n}.$$
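As a small illustration of the logistic link, the sketch below maps a probability of presence onto the linear-predictor scale, Y = ln(mu / (1 - mu)), and back again. The function names `logit` and `inv_logit` are illustrative choices and are not taken from the talk.

```python
import math


def logit(mu: float) -> float:
    """Link function g[mu] = ln(mu / (1 - mu)), defined for 0 < mu < 1."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    return math.log(mu / (1.0 - mu))


def inv_logit(y: float) -> float:
    """Inverse link: recover the probability mu from Y = g[mu]."""
    return 1.0 / (1.0 + math.exp(-y))


# A probability of presence of 0.2 corresponds to Y = ln(0.25) on the linear scale.
y = logit(0.2)
mu_back = inv_logit(y)
```

Working on the Y scale is what lets a normal prior be placed on the regression coefficients, even though the expert's assessments are made as probabilities.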

6

Define:

$$X_{i,j} = \begin{cases} 0 & \text{if } R_i \le r_{i,j-1} \\ R_i - r_{i,j-1} & \text{if } r_{i,j-1} < R_i \le r_{i,j} \\ r_{i,j} - r_{i,j-1} & \text{if } r_{i,j} < R_i \end{cases}$$

and then $Y$ is a linear function of $X_i = (X_{i,1}, \ldots, X_{i,\delta(i)})'$.
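The construction above can be sketched in a few lines. The code assumes that R is the observed value of one continuous predictor and that `knots` holds its increasing dividing points r_0 < r_1 < ... < r_J; the name `piecewise_basis` and the example numbers are hypothetical. Each X_j measures how far R has travelled through the j-th interval, so a linear function of the X_j is a piecewise-linear function of R.

```python
from typing import List, Sequence


def piecewise_basis(R: float, knots: Sequence[float]) -> List[float]:
    """Return [X_1, ..., X_J] for dividing points r_0 < r_1 < ... < r_J.

    X_j = 0              if R <= r_{j-1}
        = R - r_{j-1}    if r_{j-1} < R <= r_j
        = r_j - r_{j-1}  if r_j < R
    """
    if any(hi <= lo for lo, hi in zip(knots, knots[1:])):
        raise ValueError("knots must be strictly increasing")
    # Clamp (R - r_{j-1}) to the range [0, r_j - r_{j-1}] for each interval.
    return [min(max(R - lo, 0.0), hi - lo) for lo, hi in zip(knots, knots[1:])]


# Hypothetical dividing points for one attribute, and one observed value.
knots = [0.0, 2.0, 5.0, 10.0]
X = piecewise_basis(4.0, knots)   # first interval full, second partly used, third unused
```

With this coding, the coefficient attached to X_j is the slope of Y over the j-th interval.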

7

Factors: One factor level (the best one, say) is chosen as the reference level. Each other level is given a dummy 0/1 variable $X_{i,j}$ that equals 1 for that level and 0 for all other levels:

$$X_{i,j} = \begin{cases} 1 & \text{if } R_i = r_{i,j} \\ 0 & \text{otherwise.} \end{cases}$$
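A minimal sketch of this dummy coding follows. The function name `factor_dummies` and the logging levels are hypothetical and serve only to show the shape of the output; the reference level receives no dummy variable, so it is coded as all zeros.

```python
from typing import List, Sequence


def factor_dummies(level: str, levels: Sequence[str], reference: str) -> List[int]:
    """Return one 0/1 dummy per non-reference level of a factor."""
    if reference not in levels or level not in levels:
        raise ValueError("level and reference must both be in levels")
    return [1 if level == lv else 0 for lv in levels if lv != reference]


# Hypothetical factor with three levels; "unlogged" plays the reference level.
levels = ["unlogged", "lightly logged", "heavily logged"]
at_reference = factor_dummies("unlogged", levels, reference="unlogged")
at_heavy = factor_dummies("heavily logged", levels, reference="unlogged")
```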

8

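The sampling model is

$$Y = \alpha + X_1'\beta_1 + \cdots + X_{m+n}'\beta_{m+n}.$$

Let $\beta = (\beta_1', \ldots, \beta_{m+n}')'$.

For the prior distribution we put

$$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} b_0 \\ b \end{pmatrix}, \begin{pmatrix} \sigma_{00} & \sigma_1' \\ \sigma_1 & \Sigma \end{pmatrix} \right).$$

The values of the hyperparameters $b_0$, $b$, $\sigma_{00}$, $\sigma_1$ and $\Sigma$ (shown in red on the slide) must be chosen by the expert to represent his or her opinions.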

9

Assessing medians and quartiles.

These are fundamental assessment tasks the expert performs.

How far is it from Aberdeen to Southampton?

[Figure: a scale divided into four intervals of 25% probability each by the lower quartile (470 miles), the median (525 miles) and the upper quartile (600 miles).]

The median (blue) is assessed first and then the lower and upper quartiles (red).

Ecologists were given practice at performing these tasks in preparatory training and explanation.

10

Eliciting $b_0$ and $\sigma_{00}$

$b_0 = E(\alpha)$ and $\sigma_{00} = \mathrm{Var}(\alpha)$. Also, $Y = \alpha$ at the reference point. The expert assesses $m_{0.50}$, the median of $\mu$ at this point. (For logistic regression $\mu$ is the probability of presence.)

We put $b_0 = g[m_{0.50}]$.

The expert also assesses the lower and upper quartiles $m_{0.25}$ and $m_{0.75}$. We put

$$\sigma_{00} = \left( \frac{g(m_{0.75}) - g(m_{0.25})}{1.348} \right)^2.$$
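The two assignments above can be sketched as follows, assuming the logistic link. The function name `intercept_prior` and the three assessed values are hypothetical. The divisor is computed as the interquartile range of a standard normal distribution, roughly 1.349, which is the constant that appears (as 1.348) in the formula for the prior variance of the intercept.

```python
import math
from statistics import NormalDist
from typing import Tuple


def logit(mu: float) -> float:
    """Logistic link g[mu] = ln(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))


def intercept_prior(m25: float, m50: float, m75: float) -> Tuple[float, float]:
    """Return (b0, sigma00) from assessed quartiles of mu at the reference point."""
    if not 0.0 < m25 < m50 < m75 < 1.0:
        raise ValueError("need 0 < m25 < m50 < m75 < 1")
    # Interquartile range of N(0, 1): distance between its 25% and 75% points.
    std_normal = NormalDist()
    iqr = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)
    b0 = logit(m50)                                 # prior mean of the intercept
    sigma00 = ((logit(m75) - logit(m25)) / iqr) ** 2  # prior variance of the intercept
    return b0, sigma00


# Hypothetical assessments: median 0.20, quartiles 0.10 and 0.35.
b0, sigma00 = intercept_prior(0.10, 0.20, 0.35)
```

Matching the assessed quartiles on the transformed scale treats the expert's opinion about Y at the reference point as normal, which is why the normal interquartile range is the natural divisor.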

11

Eliciting $b$ and $\sigma_1$

• $b$ is determined from the unconditional assessments.

• $\sigma_1$ is determined from assessments conditional on $\mu$ equalling $m_{0.75}$.

12

Eliciting $b$ and $\sigma_1$ for factors.

Put $y_{0.75} = g[m_{0.75}]$. Then

$$E[\beta \mid \alpha = y_{0.75}] = b + \sigma_1 \sigma_{00}^{-1} (y_{0.75} - b_0),$$

enabling $\sigma_1$ to be estimated.

[Go to program]

13

Assessments to obtain $\Sigma$

Conditional on the first three line segments being correct, the dashed lines are quartiles of where the line might continue.

14

Conditional Assessments for Factors

• The circles indicate conditions.

• Dotted horizontal bars are previous assessments.

• Solid bars are current assessments and must be within the dotted bars if $\Sigma$ is positive-definite.

[Go to program]

15

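Calculating $\Sigma$

Iterative calculations determine $\Sigma$.

Start by estimating the lower-right scalar element of $\Sigma$, and call it $A_1$. Then estimate the lower-right $2 \times 2$ submatrix of $\Sigma$ and call it $A_2$, etc., up to $A_p = \Sigma$.

If

$$A_{i+1} = \begin{pmatrix} a_{ii} & a_i' \\ a_i & A_i \end{pmatrix}$$

and $A_i$ is positive-definite, then so is $A_{i+1}$ provided $a_{ii} > a_i' A_i^{-1} a_i$.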
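The bordering condition can be checked numerically. The sketch below is not the program used in the talk: it takes a symmetric matrix that is already filled in and grows its lower-right block one row and column at a time, testing at each step whether the new diagonal element exceeds a' A^{-1} a. The function names and the small pure-Python solver are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented copy of A
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - known) / M[r][r]
    return x


def schur_margin(a_ii: float, a_i: List[float], A_i: Matrix) -> float:
    """Return a_ii - a_i' A_i^{-1} a_i, which must be positive at every step."""
    z = solve(A_i, a_i)
    return a_ii - sum(u * v for u, v in zip(a_i, z))


def is_positive_definite(S: Matrix) -> bool:
    """Check a symmetric matrix by growing its lower-right block step by step."""
    p = len(S)
    if S[p - 1][p - 1] <= 0:            # A_1 must be a positive scalar
        return False
    for k in range(p - 2, -1, -1):      # k is the row/column being added
        A_i = [row[k + 1:] for row in S[k + 1:]]
        a_i = [S[r][k] for r in range(k + 1, p)]
        if schur_margin(S[k][k], a_i, A_i) <= 0:
            return False                # stop before A_i becomes unusable
    return True
```

In an interactive setting the same quantity gives the smallest admissible value for the next variance, so a bound of this kind could be shown to the expert before the assessment is made.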

16

Alternative Prior Models

Individuals can show systematic bias in their subjective assessments. The aim is to form prior models that allow a small amount of data to largely correct some potential biases.

Prior 2

The marginal distribution of $\alpha$ is diffuse, rather than $N(b_0, \sigma_{00})$. The conditional distribution of $\beta \mid \alpha$ is assumed to be unchanged:

$$\beta \mid \alpha \sim \mathrm{MVN}(b, \Sigma).$$

This allows for error in specifying the origin of the Y-axis.

17

Prior 3

Prior 3 replaces the scale for Y with some other linear scale. $\alpha$ is again given a diffuse distribution and the conditional distribution of $\beta \mid \alpha$ is taken to be

$$\beta \mid \alpha \sim \mathrm{MVN}(\lambda b, \lambda^2 \Sigma).$$

$\lambda$ is also given a diffuse distribution.

Prior 4

This is the same as Prior 3, except it allows for systematic bias in quartile assessments by putting

$$\beta \mid \alpha \sim \mathrm{MVN}(\lambda b, \nu \Sigma).$$

$\alpha$, $\lambda$ and $\nu$ are given diffuse distributions.

18

Cross-validation and scoring

• The usefulness of a prior distribution can be objectively examined by using cross-validation and a scoring rule.

• For the cross-validation the data for a species were divided into four sets. Each set in turn was omitted and the remaining sets used to form prediction equations.

• Prediction equations were applied to the omitted set and squared error loss determined:

$$\text{Squared error loss} = \sum_k (\mu_k - w_k)^2,$$

where the summation is over all sites in the omitted (validation) set, $\mu_k$ is the probability of presence given by the prediction equation, and $w_k$ is a 0/1 dummy variable indicating absence/presence.

• This defines a proper scoring rule.
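The scoring procedure can be sketched as below. Everything here is illustrative rather than the code used in the study: the data are a toy example, the split into four sets is a simple interleaving (how the sets were actually formed is not stated), and `fit_base_rate` is a stand-in that predicts the training-set proportion of presences instead of a real prediction equation.

```python
from typing import Callable, List, Sequence, Tuple

Site = Tuple[float, int]                 # (attribute value, 0/1 presence indicator)
Predictor = Callable[[float], float]     # maps an attribute value to a probability


def squared_error_loss(w: Sequence[int], mu: Sequence[float]) -> float:
    """Sum over sites of (predicted probability - observed 0/1 outcome)^2."""
    return sum((m_k - w_k) ** 2 for w_k, m_k in zip(w, mu))


def four_fold_scores(sites: List[Site],
                     fit: Callable[[List[Site]], Predictor],
                     n_sets: int = 4) -> List[float]:
    """Omit each set in turn, fit on the rest, and score the omitted set."""
    folds = [sites[i::n_sets] for i in range(n_sets)]   # interleaved split (assumption)
    scores = []
    for i, held_out in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        predict = fit(training)
        observed = [w for _, w in held_out]
        predicted = [predict(x) for x, _ in held_out]
        scores.append(squared_error_loss(observed, predicted))
    return scores


def fit_base_rate(training: List[Site]) -> Predictor:
    """Stand-in model: predict the training proportion of presences everywhere."""
    rate = sum(w for _, w in training) / len(training)
    return lambda x: rate


# Toy data: eight sites, presence recorded at sites 0 and 4 only.
toy_sites = [(float(i), 1 if i % 4 == 0 else 0) for i in range(8)]
scores = four_fold_scores(toy_sites, fit_base_rate)
total = sum(scores)
```

Lower totals are better, so competing priors or methods can be ranked by summing the scores over the four omitted sets.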

19

Results for little bent-wing bat

_____________________________________________________________
Method           Set 1    Set 2    Set 3    Set 4    Total
_____________________________________________________________
Prior 1           9.57     8.93     8.94     9.30    36.74
Prior 2           9.62     9.03     8.98     9.24    36.87
Prior 3           9.52     8.86     8.92     8.81    36.11
Prior 4           9.73     8.87     8.90     8.62    36.13
Frequent.        11.03     9.72     9.55    10.78    41.07
No data          10.83     9.81     9.92    10.56    41.12
Sample results   11/94    10/94    10/93    11/94    42 in 375
_____________________________________________________________

20

[Figure: four panels, titled Prior 1, Prior 2, Prior 3 and Prior 4, each plotting the posterior value using that prior (vertical axis) against the prior value (horizontal axis). Both axes run from -5.0 to 2.0. Each panel is annotated with the conditional MVN prior used for that case.]

21

_____________________________________________________________________________________
Method           Little           Common           Frogmouth    Powerful     Greater
                 bent-wing bat    bent-wing bat                 owl          glider
_____________________________________________________________________________________
Prior 1          36.74            12.75            28.76        13.61        43.90
Prior 2          36.87            12.73            28.91        13.60        43.94
Prior 3          36.11            12.41            25.99        13.17        42.35
Prior 4          36.13            12.75            28.61        13.61        43.90
Frequent.        41.07            13.70            30.91        14.38        44.15
No data          41.12            13.66            29.54        15.07        48.81
Sample results   42 in 375        13 in 375        31 in 324    14 in 324    53 in 343
_____________________________________________________________________________________

22

Concluding Comments

• The elicitation method described here is able to handle large problems by:

(a) using interactive graphics

(b) suggesting values to the expert that might represent his or her opinions.

• It is believed that the use of graphs can improve the quality of the assessed distributions.

• Cross-validation can demonstrate clearly the gain from using prior knowledge, when there is such gain.

• Additional parameters in the prior model can allow limited data to be used more effectively.
