FITTING THE LOGISTIC AND GOMPERTZ GROWTH CURVES TO CUMULATIVE GERMINATION DATA
Wendy Bergerud Biometrician
B.C. Ministry of Forests, Research Branch
Technical Support Group, 31 Bastion Square
Victoria, B.C. V8W 3E7
1. INTRODUCTION
This essay will describe how to fit the sigmoid-shaped logistic and gompertz curves to cumulative germination data using the SAS procedure PROC NLIN. The data are collected by determining the proportion (Y) of seeds which have germinated at any given time (t). This information is collected a prescribed number of days after initiation of germination.
The data have the following characteristics:
1) $0 \le Y(t) \le 1$, where Y(t) is the proportion of germinated seeds at time t.
2) $Y(t-1) \le Y(t)$, for each set of seeds observed through time.
3) Observation times may not be equally spaced.
4) There may be censored observations, i.e., the study is likely to end before all seeds which are viable have germinated.
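The first three characteristics can be checked mechanically before any curve is fitted. The sketch below is only an illustration: the function name and the example series are hypothetical, not part of the original analysis.

```python
def check_germination_series(times, proportions):
    """Return a list of problems found in one cumulative germination series."""
    problems = []
    if len(times) != len(proportions):
        return ["times and proportions differ in length"]
    for i, (t, y) in enumerate(zip(times, proportions)):
        # Characteristic 1: proportions lie between 0 and 1.
        if not 0.0 <= y <= 1.0:
            problems.append(f"observation {i}: Y={y} is outside [0, 1]")
        # Observation times must increase, but need not be equally spaced.
        if i > 0 and t <= times[i - 1]:
            problems.append(f"observation {i}: time {t} does not increase")
        # Characteristic 2: cumulative germination cannot decrease.
        if i > 0 and y < proportions[i - 1]:
            problems.append(f"observation {i}: Y decreases to {y}")
    return problems

# Hypothetical series with unequally spaced observation days.
days = [3, 7, 10, 14, 21, 28]
y = [0.00, 0.05, 0.20, 0.45, 0.70, 0.78]
print(check_germination_series(days, y))
```

An empty list means the series is consistent with characteristics 1 to 3. Censoring (characteristic 4) cannot be detected this way.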
The data are collected to answer the following questions:
1) What is the final germination proportion?
2) What is the maximum germination rate and when does it occur?
3) If different sets of seed have received different treatments and/or are from different sources, then which treatments and/or sources differ from which?
The analysis of this type of data could be approached along several different avenues (see Scott, Jones and Williams 1984). Probably the oldest approach would be to "linearize" the logistic or gompertz mathematical form and then fit a standard regression. This would require transforming the proportions. A slightly more sophisticated approach would be to fit the non-linear equation directly. This approach is more manageable now with the widespread availability of computers and packaged programs. Also, there has been more theoretical work done lately to study the extra problems that non-linear models pose. These two approaches assume that the "errors" about the fitted curve will be independently normally distributed with equal variances at any time or point along the curve. It may be more reasonable to assume a binomial distribution for the data, so that a generalized linear models approach would be required (see, for example, McCullagh and Nelder, 1983). The logistic link could be used for a logistic-shaped curve and the complementary log-log link for the gompertz.
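As a small illustration of the "linearize" approach, the sketch below applies the two transformations to points lying exactly on each curve, using the curve forms of Section 2. The parameter values are illustrative assumptions, and A must be known before the transformation can be applied.

```python
import math

def logistic(t, a, b, c):
    """Logistic curve Y = A / (1 + exp(C + B t))."""
    return a / (1.0 + math.exp(c + b * t))

def gompertz(t, a, b, c):
    """Gompertz curve Y = A exp(-C exp(-B t))."""
    return a * math.exp(-c * math.exp(-b * t))

def logit_transform(y, a):
    """log((A - Y) / Y); equals C + B t on a logistic curve. Needs 0 < Y < A."""
    return math.log((a - y) / y)

def loglog_transform(y, a):
    """log(log(A / Y)); equals log(C) - B t on a gompertz curve. Needs 0 < Y < A."""
    return math.log(math.log(a / y))

A = 0.90                      # assumed known asymptote
times = [10, 20, 30]
logit_values = [logit_transform(logistic(t, A, -0.27, 5.6), A) for t in times]
loglog_values = [loglog_transform(gompertz(t, A, 0.18, 45.3), A) for t in times]
print(logit_values)
print(loglog_values)
```

Both lists are straight-line functions of t, so an ordinary regression of the transformed proportions on time would recover B and C (or log C) when A is known. Observed proportions of exactly 0 or A cannot be transformed.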
The above approaches do not take into account the non-independence or "within-subject" nature of the observations, so that the statistics derived could be biased and give incorrect probability levels for the different possible conclusions. A more comprehensive approach would be to view the data as the realization of a stochastic process where the probability of germination would change with time, treatment and the previous observation. The influence of the previous observation may be very important since the number of seeds which have germinated cannot logically decrease with time (i.e., from the previous observation).
A different approach which would also take into account the dependent nature of the observations would be to use a proportional hazards model. These models were developed for analysis of survival in the health sciences and much work has been done on their use. This approach then could make use of the huge literature published on this subject within the health sciences.
This essay will only discuss the first two approaches in any depth, and emphasis will be on the non-linear fits. This is justified by the greater familiarity of these techniques and by the experience that data, so far, are fitted reasonably well by them. The last two approaches, while interesting, would require a great deal more mathematical labour and would be more difficult to understand.
2. THE MODELS
The four models which will be examined in this report are summarized in Table 1. Useful characteristics of both growth curves are summarized in Table 2. Capital letters will denote "true" values of parameters while the observed values or estimates will be denoted by small letters. Examples of the four models are plotted in Figures 1 through 4. In the non-linear models all three parameters can be estimated, but in the linear versions it is necessary to "know" A before the data can be fitted. The parameter A is the asymptotic value for Y(t), i.e., it is the maximum proportion of germinated seeds which could be observed after a very long time. The parameter C is essentially a "location" parameter shifting the curve to the left or right along the X-axis while B is a "scale" parameter. For growth models B is always positive in the gompertz form and always negative in the logistic form as written in Table 1, and larger magnitudes indicate a quicker rise from zero to A.
Table 1: Mathematical forms of the logistic and gompertz growth curves

GROWTH CURVE   NON-LINEAR                         LINEAR
Logistic       $Y = A/(1 + e^{C+Bt}) + E$         $\log((A - Y)/Y) = C + Bt + E$
Gompertz       $Y = A e^{-C e^{-Bt}} + E$         $\log(\log(A/Y)) = \log C - Bt + E$

Where Y is the observed proportion of seeds germinated at time t.
A, B and C are the "true" parameters of each model.
E is the "true" independent errors distributed as $N(0, \sigma^2)$.
The function of C and B as location and scale parameters is separate for the logistic curve but is confounded for the gompertz. An alternative interpretation for C and B in the gompertz may be that C indicates how soon the curve starts to rise while B indicates how quickly it will rise once started. Nevertheless, quite similar gompertz curves can be generated by very different pairs of B and C, so it is difficult to specify their effect on the curve.
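A quick numerical check shows how two rather different (B, C) pairs can give nearly the same gompertz curve. The second pair below is a hypothetical choice made so that both curves have almost the same time of maximum rate, log C / B. The asymptote is set to 1 for convenience.

```python
import math

def gompertz(t, a, b, c):
    """Gompertz curve Y = A exp(-C exp(-B t))."""
    return a * math.exp(-c * math.exp(-b * t))

pair_1 = (0.18, 45.3)   # (B, C)
pair_2 = (0.20, 69.0)   # hypothetical second pair; C is about 50% larger

def max_difference(p1, p2, a=1.0, days=range(0, 51)):
    """Largest vertical gap between the two curves over the given days."""
    return max(abs(gompertz(t, a, *p1) - gompertz(t, a, *p2)) for t in days)

def time_of_max_rate(pair):
    b, c = pair
    return math.log(c) / b

print(round(max_difference(pair_1, pair_2), 3))
print(round(time_of_max_rate(pair_1), 2), round(time_of_max_rate(pair_2), 2))
```

The largest gap between the curves is only a few hundredths of the asymptote, which is one reason estimates of B and C tend to be strongly correlated.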
Table 2. Some characteristics of the logistic and gompertz growth curves

Characteristic            Logistic                              Gompertz
Non-linear form           $Y = A/(1 + e^{C+Bt})$                $Y = A e^{-C e^{-Bt}}$
                          $\;\;= A/(1 + D e^{Bt})$, $D = e^{C}$
Parameters: Asymptote     A                                     A
            Location      C                                     C
            Scale         B                                     B
Inverse form              $t = (\log((A-Y)/Y) - C)/B$           $t = (\log C - \log(\log(A/Y)))/B$
Rate of change
(i.e. derivative)         $BY(Y-A)/A$                           $BY \log(A/Y)$
Maximum rate of change
  occurs at time          $t = -C/B = -\log D/B$                $t = \log C/B$
  when                    $Y = A/2$                             $Y = A/e$
  with rate of            $G = -AB/4$                           $G = AB/e$
Time when Y = 0.99A       $t = (-4.595 - C)/B$                  $t = (4.600 + \log C)/B$
              0.90A       $t = (-2.197 - C)/B$                  $t = (2.250 + \log C)/B$
              0.50A       $t = (0 - C)/B$                       $t = (0.367 + \log C)/B$
              0.37A       $t = (0.541 - C)/B$                   $t = (0.0 + \log C)/B$
              0.10A       $t = (2.197 - C)/B$                   $t = (-0.834 + \log C)/B$
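The inverse forms and the maximum-rate formulas follow directly from the two curve equations, and a short sketch makes them easy to check numerically. The function names are hypothetical. The parameter values are the starting values used later in the essay.

```python
import math

def logistic(t, a, b, c):
    return a / (1.0 + math.exp(c + b * t))

def gompertz(t, a, b, c):
    return a * math.exp(-c * math.exp(-b * t))

def logistic_time_at(fraction, b, c):
    """Time at which the logistic curve reaches fraction * A (0 < fraction < 1)."""
    return (math.log((1.0 - fraction) / fraction) - c) / b

def gompertz_time_at(fraction, b, c):
    """Time at which the gompertz curve reaches fraction * A (0 < fraction < 1)."""
    return (math.log(c) - math.log(math.log(1.0 / fraction))) / b

def logistic_max_rate(a, b, c):
    """(time, rate) of fastest germination: t = -C/B, G = -A B / 4."""
    return -c / b, -a * b / 4.0

def gompertz_max_rate(a, b, c):
    """(time, rate) of fastest germination: t = log(C)/B, G = A B / e."""
    return math.log(c) / b, a * b / math.e

for fraction in (0.50, 0.90, 0.99):
    print(fraction,
          round(logistic_time_at(fraction, -0.27, 5.6), 1),
          round(gompertz_time_at(fraction, 0.18, 45.3), 1))
print(logistic_max_rate(0.90, -0.27, 5.6))
print(gompertz_max_rate(0.90, 0.18, 45.3))
```

Substituting each returned time back into the curve reproduces the requested fraction of A, which is a convenient check on any hand calculation.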
FIGURE 1. Plots of the logistic curve with different values of B
FIGURE 2. Plots of the gompertz curve with different values of B (B = 0.05, 0.1, 0.15, 0.2)
FIGURE 1. Plots of the logistic curve with different values
FIGURE 2. Plots of the gompertz curve with different values
3. THE FITTING PROCEDURE
In this section I will describe how to use PROC NLIN to fit either the logistic or gompertz curve to data. I will not discuss how NLIN does this. An excellent, not too technical discussion of non-linear estimation can be found in Chapter 10 of Draper and Smith (1981). This is recommended reading as is, of course, the description of NLIN in the SAS User's Guide: Statistics.
NLIN has two different default estimation methods. If derivatives of the model are present the Gauss-Newton method (denoted as GAUSS) is used, while if they are not present the Secant method (denoted as DUD) is used. The Gauss-Newton method is faster and thereby cheaper than the Secant method but requires the calculation of derivatives. Determining these derivatives may be an onerous task, so that the Secant method becomes much more attractive. An example was run fitting either the logistic or gompertz model with either the GAUSS or DUD methods. The results are presented in Table 3.
Table 3. Comparison of the Secant and Gauss-Newton methods with PROC NLIN

MODEL                               FITTING METHOD:   DUD          GAUSS
Logistic    CPU time             =                    0.65 sec.    0.32 sec.
            Number of iterations =                    20           7
Gompertz    CPU time             =                    0.82 sec.    0.38 sec.
            Number of iterations =                    22           11
All four computer runs used 284K of workspace memory storage. The sums of squares for the two methods agreed to within eight decimal places while the parameter estimates agreed to at least four significant digits. It is quite clear that if computer costs are important then the Gauss-Newton method should be used. It would be of some value then to develop, in general, the derivatives for these two models.
3.1 Inclusion of Treatment Effects in the Models
Before developing the general form of the derivatives for these two models we must first expand the simple forms presented in Table 1 to allow different treatments to influence the shape of the curves. Treatment effects may be represented by different values of A, B, or C for either the logistic or the gompertz curve. As in ordinary multiple regression we can use dummy variables to represent such treatment effects. A simple example with just two treatments will be used for illustration, and extensions to more complicated situations should be straightforward.
Suppose that we define a dummy variable X such that:
X = 0 for treatment 1
  = 1 for treatment 2
Then we can rewrite the non-linear forms from Table 1 as:
LOGISTIC: $Y = (A + A_1 X) / [1 + \exp((C + C_1 X) + (B + B_1 X)t)]$ .... (1)
GOMPERTZ: $Y = (A + A_1 X) \exp[-(C + C_1 X) \exp(-(B + B_1 X)t)]$ .... (2)
Each model now has the six parameters A, A1, B, B1, C and C1, which can be estimated using PROC NLIN. For the first treatment X = 0 and the gompertz model is:
$Y = A \exp[-C \exp(-Bt)]$
The model for the second treatment is:
$Y = (A + A_1) \exp[-(C + C_1) \exp(-(B + B_1)t)]$
since X = 1. The meaning of the three new parameters is now clear. A1 is the difference in the asymptotes for the two treatments, while B1 and C1 are the differences in the B and C parameters for the two treatments. If the two treatments do not have different effects then we would expect A1, B1 and C1 to be zero. Treatment differences can be demonstrated by non-zero values of any one or combination of A1, B1 and C1. The models (1) and (2) and their derivatives can be easily extended to include covariates and/or dummy variables to represent additional treatments. Derivatives for models (1) and (2) are given in Tables 4 and 5.
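Models (1) and (2) translate directly into two small functions. The sketch below is an illustration only: the function names are hypothetical, and the parameter values are borrowed from the starting values that appear in Table 6.

```python
import math

def logistic_model(t, x, a, a1, b, b1, c, c1):
    """Model (1): logistic curve with dummy variable x (0 or 1)."""
    return (a + a1 * x) / (1.0 + math.exp((c + c1 * x) + (b + b1 * x) * t))

def gompertz_model(t, x, a, a1, b, b1, c, c1):
    """Model (2): gompertz curve with dummy variable x (0 or 1)."""
    return (a + a1 * x) * math.exp(-(c + c1 * x) * math.exp(-(b + b1 * x) * t))

# Illustrative gompertz values: base parameters plus increments for treatment 2.
params = dict(a=0.90, a1=-0.10, b=0.18, b1=-0.02, c=45.3, c1=15.0)
for day in (10, 20, 30, 40):
    y1 = gompertz_model(day, 0, **params)   # treatment 1: A, B, C only
    y2 = gompertz_model(day, 1, **params)   # treatment 2: A+A1, B+B1, C+C1
    print(day, round(y1, 3), round(y2, 3))
```

With x = 0 the increments drop out entirely, and with x = 1 the curve is an ordinary gompertz curve with parameters A + A1, B + B1 and C + C1.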
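Table 4: Derivatives for the Logistic Curve

$\partial Y/\partial A = BTA$              $\partial Y/\partial A_1 = X \cdot BTA$
$\partial Y/\partial B = t \cdot DER$      $\partial Y/\partial B_1 = X \cdot t \cdot DER$
$\partial Y/\partial C = DER$              $\partial Y/\partial C_1 = X \cdot DER$

where $BT = \exp[(C + C_1 X) + (B + B_1 X)t]$
      $BTA = 1/(1 + BT)$
and   $DER = -(A + A_1 X)\,BT/(1 + BT)^2$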
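Table 5: Derivatives for the Gompertz Curve

$\partial Y/\partial A = EE$      $\partial Y/\partial A_1 = X \cdot EE$
$\partial Y/\partial B = BE$      $\partial Y/\partial B_1 = X \cdot BE$
$\partial Y/\partial C = CE$      $\partial Y/\partial C_1 = X \cdot CE$

where $EB = \exp(-(B + B_1 X)t)$
      $EE = \exp(-(C + C_1 X)\,EB)$
      $BE = (C + C_1 X)\,t\,(A + A_1 X)\,EB \cdot EE$
and   $CE = -(A + A_1 X)\,EB \cdot EE$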
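Since an incorrect derivative can make the fit fail without a clear error message, it is worth checking derivative formulas numerically before use. The sketch below compares the analytic derivatives of models (1) and (2), written with the same intermediate quantities as the DER statements of Table 6, against central finite differences. The helper names, test point and parameter values are illustrative assumptions.

```python
import math

def logistic(t, x, p):
    a, a1, b, b1, c, c1 = p
    return (a + a1 * x) / (1.0 + math.exp((c + c1 * x) + (b + b1 * x) * t))

def logistic_derivs(t, x, p):
    """Derivatives with respect to A, A1, B, B1, C, C1 (in that order)."""
    a, a1, b, b1, c, c1 = p
    bt = math.exp((c + c1 * x) + (b + b1 * x) * t)
    bta = 1.0 / (1.0 + bt)
    der = -(a + a1 * x) * bt / (1.0 + bt) ** 2
    return [bta, x * bta, t * der, x * t * der, der, x * der]

def gompertz(t, x, p):
    a, a1, b, b1, c, c1 = p
    return (a + a1 * x) * math.exp(-(c + c1 * x) * math.exp(-(b + b1 * x) * t))

def gompertz_derivs(t, x, p):
    """Derivatives with respect to A, A1, B, B1, C, C1 (in that order)."""
    a, a1, b, b1, c, c1 = p
    aa, cc = a + a1 * x, c + c1 * x
    eb = math.exp(-(b + b1 * x) * t)
    ee = math.exp(-cc * eb)
    be = cc * t * aa * eb * ee
    ce = -aa * eb * ee
    return [ee, x * ee, be, x * be, ce, x * ce]

def numeric_derivs(f, t, x, p, h=1e-6):
    """Central finite differences, one parameter at a time."""
    out = []
    for i in range(len(p)):
        up, down = list(p), list(p)
        up[i] += h
        down[i] -= h
        out.append((f(t, x, up) - f(t, x, down)) / (2.0 * h))
    return out

def max_abs_error(f, derivs, t, x, p):
    pairs = zip(derivs(t, x, p), numeric_derivs(f, t, x, p))
    return max(abs(u - v) for u, v in pairs)

logistic_p = (0.90, -0.10, -0.27, -0.02, 5.6, 1.0)
gompertz_p = (0.90, -0.10, 0.18, -0.02, 45.3, 15.0)
print(max_abs_error(logistic, logistic_derivs, 20.0, 1, logistic_p))
print(max_abs_error(gompertz, gompertz_derivs, 20.0, 1, gompertz_p))
```

Both printed errors should be tiny. A large value would point to a mistake in the corresponding derivative.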
3.2 Suggested Fitting Procedures
The Gauss-Newton method is iterative and so requires initial estimates of the parameters, called starting values. It also requires derivatives of the function or model with respect to each of the parameters in the model. The derivatives for the incremental parameters A1, B1 and C1 would not be required if just one curve was to be fit.
Some experience with this problem has suggested that the procedure outlined below is an efficient way to proceed.
Step 1: Obtain starting values for each curve
Step 2: Fit all curves individually
Step 3: Set up dummy variables to follow possible response patterns of the data to the treatments and estimate parameters
Step 4: Try different models on all the data until satisfied that a reasonable final model has been found
I will now discuss each step in some detail.
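To show why the Gauss-Newton method needs both starting values and derivatives, here is a minimal sketch of the iteration for a single logistic curve. It is not a description of how PROC NLIN is implemented: the step-halving rule, the stopping rule, the function names and the noise-free synthetic data are all assumptions made for illustration.

```python
import math

def logistic(t, a, b, c):
    return a / (1.0 + math.exp(min(c + b * t, 700.0)))  # guard against overflow

def jacobian_row(t, a, b, c):
    """Derivatives of the curve with respect to A, B and C at time t."""
    bt = math.exp(min(c + b * t, 700.0))
    der = -a * bt / (1.0 + bt) ** 2
    return [1.0 / (1.0 + bt), t * der, der]

def rss(times, ys, p):
    return sum((y - logistic(t, *p)) ** 2 for t, y in zip(times, ys))

def solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def gauss_newton(times, ys, start, max_iter=50, tol=1e-12):
    p, current = list(start), rss(times, ys, start)
    for _ in range(max_iter):
        rows = [jacobian_row(t, *p) for t in times]
        resid = [y - logistic(t, *p) for t, y in zip(times, ys)]
        jtj = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        jtr = [sum(r[i] * e for r, e in zip(rows, resid)) for i in range(3)]
        delta = solve(jtj, jtr)          # Gauss-Newton step
        step = 1.0
        while step > 1e-8:               # halve the step until the RSS drops
            trial = [pi + step * di for pi, di in zip(p, delta)]
            trial_rss = rss(times, ys, trial)
            if trial_rss < current:
                break
            step /= 2.0
        else:
            return p, current            # no improving step was found
        improvement = current - trial_rss
        p, current = trial, trial_rss
        if improvement < tol:
            break
    return p, current

times = list(range(0, 45, 3))
ys = [logistic(t, 0.90, -0.27, 5.6) for t in times]   # synthetic, noise-free data
estimates, final_rss = gauss_newton(times, ys, start=(0.88, -0.26, 5.4))
print([round(v, 4) for v in estimates], final_rss)
```

Each pass linearizes the curve around the current estimates, solves a small least-squares problem for the correction, and accepts it only if the residual sum of squares decreases. Poor starting values can leave this kind of iteration at a local minimum or make it fail altogether.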
3.2.1 Starting Values
Starting values for one fit can be obtained by plotting the proportion of seed germinated against time observed and by plotting the differences of successive observations versus the average time of the two observations. The second plot is a rough plot of the growth rate. By eye, draw a smooth sigmoid-shaped curve through the data on the first plot and a peaked curve for the second plot. Figures 5 and 6 provide some example plots.
A good estimate of A can be obtained from the first plot by noting at what value the curve seems to be leveling out. The asymptotic value should be slightly higher. For example, in Figure 5 the highest value the curve attains is 0.88, so we might use 0.90 as an initial estimate of A, denoted by $A_0 = 0.90$. Initial estimates of B and C will be denoted by $B_0$ and $C_0$ respectively. We can obtain these from estimates of the maximum germination rate and the time of its occurrence by rearranging the appropriate formulas in Table 2. If the estimate of the maximum germination rate is $G_R$ and the time is $t_R$ then
LOGISTIC: $B_0 = -4G_R/A_0$ and $C_0 = -B_0 t_R$ .... (3)
GOMPERTZ: $B_0 = eG_R/A_0$ and $C_0 = \exp(B_0 t_R)$ .... (4)
FIGURE 5. Plot of proportion of seeds germinated versus time, with a hand-fitted curve.

        LOGISTIC    GOMPERTZ
Bo      -0.27       0.18
Co      5.6         45.3

FIGURE 6. Plots of germination rate versus time, with a hand-fitted curve.
The values of $G_R$ and $t_R$ could be obtained from the first plot by noting that the maximum germination rate occurs at $A_0/2$ for the logistic curve and $A_0/e$ for the gompertz. The time the curve reaches $A_0/2$ or $A_0/e$ is $t_R$ and the slope of the tangent at that point is $G_R$. It will be easier to get values for $G_R$ and $t_R$ from the second plot and they are likely to be more dependable than those from the first plot (if enough points are available). The time at which the peak occurs is $t_R$ and the height of the peak is $G_R$. The above equations (3) and (4) can be used to obtain $B_0$ and $C_0$. Tipton (1984) discusses a quicker method of obtaining starting values for the gompertz curve. An advantage of the graphical method is that the researcher will develop some familiarity with his/her data.
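The graphical recipe can also be mimicked numerically. The sketch below computes the rough growth-rate points (successive differences against the average time, divided here by the gap between observation days so that unequal spacing is allowed), takes the highest point as the peak instead of a hand-drawn curve, and applies equations (3) and (4). The data series and the values $G_R = 0.06$ and $t_R = 21$ are hypothetical.

```python
import math

def rate_points(times, proportions):
    """(average time, change per day) for each pair of successive observations."""
    points = []
    for i in range(1, len(times)):
        gap = times[i] - times[i - 1]
        mid = (times[i] + times[i - 1]) / 2.0
        points.append((mid, (proportions[i] - proportions[i - 1]) / gap))
    return points

def peak(points):
    """The rate point with the largest rate, returned as (time, rate)."""
    return max(points, key=lambda point: point[1])

def logistic_start(a0, g_r, t_r):
    """Equation (3): B0 = -4 GR / A0 and C0 = -B0 tR."""
    b0 = -4.0 * g_r / a0
    return b0, -b0 * t_r

def gompertz_start(a0, g_r, t_r):
    """Equation (4): B0 = e GR / A0 and C0 = exp(B0 tR)."""
    b0 = math.e * g_r / a0
    return b0, math.exp(b0 * t_r)

# Hypothetical observations on unequally spaced days.
days = [7, 14, 18, 21, 24, 28, 35, 42]
y = [0.01, 0.10, 0.30, 0.47, 0.62, 0.76, 0.86, 0.88]
t_r, g_r = peak(rate_points(days, y))
print(t_r, round(g_r, 4))

# Values read off a smoothed rate plot (assumed here) with A0 = 0.90.
print(logistic_start(0.90, 0.06, 21.0))
print(gompertz_start(0.90, 0.06, 21.0))
```

With the assumed $G_R = 0.06$ and $t_R = 21$, the results land close to the starting values listed with Figure 6. The raw peak of noisy data is only a rough guide, which is why drawing a smooth curve by eye is suggested.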
3.2.2 Fitting Curves Individually
The next step is to fit the curves for each treatment separately. The easiest way to do this is with a BY statement added to PROC NLIN so that SAS does separate analyses for each value of the variable(s) indicated in the BY statement. This may cause some problems with the starting values though, since only one PARAMETERS statement can be used. If the values obtained from the previous step are not too different for all the curves then one set of values may be used. If the values are quite different then a grid of values (see SAS manual) may be used, although this is more expensive to run. Examples of the SAS statements are presented in Table 6.
There are a few problems which may occur while running this job. Various error messages may be obtained if the derivatives are incorrect, although none of the error messages will state this. Other problems are described in the SAS manual, Draper and Smith (1981) and Bard (1974).
A major problem that we must be aware of is the possibility that the program will not converge to the set of parameter estimates which provide the minimum residual sums of squares. In some cases this can be found by looking at the plot of the fitted curve overlaid on a plot of the data and noting that the curve obviously fits worse than it could. For example, the fitted curve may be a straight line while the data form a sigmoid-shaped scatter of points.
To understand this problem and how it can arise we need at least a simple model for how PROC NLIN works. The goal of the program, and of any linear regression (such as PROC REG), is to find the parameter estimates of the curve which provide the smallest sum of the squared residuals, where each residual is calculated as the difference of the observed value from that value predicted by the fitted curve. In linear regression these estimates can be found directly in just one step. A plot of the residual sums of squares (RSS) versus one parameter might look like:
[Sketch: RSS plotted against the parameter value; the lowest residual sum of squares occurs at the "best" estimate.]
There is only one minimum. In the non-linear case this curve does not have a parabolic shape although it may be well approximated by a parabolic shape near the minimum. In fact, it is possible to have several "minimums" in the non-linear case. It might look like:
[Sketch: RSS plotted against the parameter value, with several minimums; the lowest one marks the "best" estimate.]
Obviously we would like to obtain the "global" minimum. The closer the starting values are to the "best" estimate the more likely the program will iterate to the global minimum. If the starting values are bad then we may get some other minimum instead. This problem must be kept in mind while fitting non-linear equations. See Draper and Smith (1981) for a more comprehensive discussion of this point. Another method of detecting local minimums instead of the global minimum will be described below when discussing the comparison of different models.
It is useful at this stage to plot the fitted line with the data points and to plot the residuals. Example SAS program statements to do this are in Table 6.
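One inexpensive safeguard against poor starting values is to evaluate the RSS over a coarse grid of candidate values and begin the iteration from the best one. The sketch below does this for a single gompertz curve. The grid, the function names and the synthetic data are illustrative assumptions.

```python
import itertools
import math

def gompertz(t, a, b, c):
    return a * math.exp(-c * math.exp(-b * t))

def rss(times, ys, a, b, c):
    """Residual sum of squares of the gompertz curve for one parameter set."""
    return sum((y - gompertz(t, a, b, c)) ** 2 for t, y in zip(times, ys))

def rank_starting_values(times, ys, a_grid, b_grid, c_grid):
    """All grid points as (RSS, (A, B, C)), sorted from best to worst."""
    scored = [(rss(times, ys, a, b, c), (a, b, c))
              for a, b, c in itertools.product(a_grid, b_grid, c_grid)]
    return sorted(scored)

times = list(range(0, 45, 3))
ys = [gompertz(t, 0.90, 0.18, 45.3) for t in times]   # synthetic data

ranked = rank_starting_values(times, ys,
                              a_grid=(0.8, 0.9, 1.0),
                              b_grid=(0.10, 0.15, 0.20, 0.25),
                              c_grid=(10.0, 30.0, 50.0, 70.0))
for score, point in ranked[:3]:
    print(round(score, 5), point)
```

Several quite different (B, C) combinations usually appear near the top of the list, which reflects the flat, elongated shape of the RSS surface for this curve.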
Table 6. Example SAS programs to fit the logistic and gompertz curves

TITLE EXAMPLE SAS PROGRAMS FOR USING PROC NLIN;

/* PROGRAMMING FOR FITTING THE LOGISTIC CURVE */
                                        /* BELOW ARE ADDITIONS FOR      */
PROC NLIN DATA=TREAT;                   /* FITTING TWO CURVES AT ONCE   */
  BY YR;
  PARAMETERS A=0.90 B=-0.27 C=5.6       /* A1=-0.1 B1=-0.02 C1=1        */ ;
  AAAA = A                              /* + A1*X                       */ ;
  BBBB = B                              /* + B1*X                       */ ;
  CCCC = C                              /* + C1*X                       */ ;
  BT  = EXP(CCCC + BBBB*DAY);
  BTA = 1/(1+BT);
  DER = -AAAA*BT/(1+BT)**2;
  MODEL FGC = AAAA*BTA;
  DER.A = BTA;                          /* DER.A1 = X*BTA;              */
  DER.B = DAY*DER;                      /* DER.B1 = X*DAY*DER;          */
  DER.C = DER;                          /* DER.C1 = X*DER;              */
  OUTPUT OUT=RESID PREDICTED=PFGC RESIDUAL=RFGC;
  TITLE2 -- FITTING THE LOGISTIC CURVE --;
  TITLE3 INDIVIDUAL RUNS FOR EACH YEAR;
PROC RANK DATA=RESID OUT=RESID NORMAL=BLOM;
  BY TREAT;
  VAR RFGC;
  RANKS STDNRM;
PROC PLOT DATA=RESID;
  PLOT RFGC*(STDNRM DAY)=YR / VREF=0;
  PLOT FGC*DAY=YR PFGC*DAY='*' / OVERLAY;

/* PROGRAMMING FOR FITTING THE GOMPERTZ CURVE */
                                        /* BELOW ARE ADDITIONS FOR      */
PROC NLIN DATA=TREAT;                   /* FITTING TWO CURVES AT ONCE   */
  BY YR;
  PARAMETERS A=0.90 B=0.18 C=45.3       /* A1=-0.1 B1=-0.02 C1=15       */ ;
  AAAA = A                              /* + A1*X                       */ ;
  BBBB = B                              /* + B1*X                       */ ;
  CCCC = C                              /* + C1*X                       */ ;
  EB = EXP(-DAY*(BBBB));
  EE = EXP(-EB*(CCCC));
  BE = CCCC*DAY*AAAA*EB*EE;
  CE = -AAAA*EB*EE;
  MODEL FGC = AAAA*EE;
  DER.A = EE;                           /* DER.A1 = X*EE;               */
  DER.B = BE;                           /* DER.B1 = X*BE;               */
  DER.C = CE;                           /* DER.C1 = X*CE;               */
  OUTPUT OUT=RESID PREDICTED=PFGC RESIDUAL=RFGC;
  TITLE2 -- FITTING THE GOMPERTZ CURVE --;
  TITLE3 INDIVIDUAL RUNS FOR EACH YEAR;
PROC RANK DATA=RESID OUT=RESID NORMAL=BLOM;
  BY TREAT;
  VAR RFGC;
  RANKS STDNRM;
PROC PLOT DATA=RESID;
  PLOT RFGC*(STDNRM DAY)=YR / VREF=0;
  PLOT FGC*DAY=YR PFGC*DAY='*' / OVERLAY;
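The PROC RANK step in Table 6 converts the residuals to normal scores so that they can be plotted against the residuals themselves as a rough normal probability plot. As an illustration of what that step produces, the sketch below computes Blom scores, $\Phi^{-1}((r - 3/8)/(n + 1/4))$ for rank r among n values. The function name and the residuals are hypothetical, and tied values are assumed not to occur.

```python
from statistics import NormalDist

def blom_scores(values):
    """Blom normal scores for a list of values, assuming no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])   # indices from smallest up
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf((rank - 0.375) / (n + 0.25))
    return scores

# Hypothetical residuals from a fitted curve.
residuals = [0.020, -0.010, 0.000, 0.035, -0.030]
for r, s in zip(residuals, blom_scores(residuals)):
    print(r, round(s, 3))
```

If the residuals are roughly normal, a plot of residual against score should be close to a straight line.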
3.2.3 Dummy Variables
The next step is to decide how to represent our treatments using dummy variables and to determine initial estimates of their parameters. This can be done in many ways and what is chosen will depend somewhat on the expected relationships between the different treatments. If there are only two treatments then one could choose the dummy variable to have values 0 or 1 as discussed earlier. Another possibility is to use the values -1 or 1. With more treatments there are many more choices. As an example, suppose that the treatments form a 2x3 factorial so that there are a total of six treatments. We might choose the following variables:
Treat A  Treat B   X1  X2  X3  X4  X5   Fitted value for the asymptote
   1        1       0   0   0   0   0   A
   1        2       0   0   0   0   1   A + A5
   2        1       0   0   0   1   0   A + A4
   2        2       0   0   1   0   0   A + A3
   3        1       0   1   0   0   0   A + A2
   3        2       1   0   0   0   0   A + A1
Only five dummy variables are required for six treatments. In general the number of dummy variables required is one less than the total number of treatments. Another possible arrangement is:
Treat A  Treat B   X1  X2  X3  X4  X5   Fitted value for the asymptote
   1        1       1   1   1   1   1   A + A1 + A2 + A3 + A4 + A5
   1        2      -1   1   1  -1  -1   A - A1 + A2 + A3 - A4 - A5
   2        1       1   0  -2   0  -2   A + A1 - 2A3 - 2A5
   2        2      -1   0  -2   0   2   A - A1 - 2A3 + 2A5
   3        1       1  -1   1  -1   1   A + A1 - A2 + A3 - A4 + A5
   3        2      -1  -1   1   1  -1   A - A1 - A2 + A3 + A4 - A5
This arrangement uses polynomial contrasts. Thus linear and quadratic trends in treatment A could also be detected in addition to the general factorial tests. A third arrangement which takes the factorial arrangement of treatments A and B into account is:
Treat A  Treat B   X1  X2  X3  X4  X5   Fitted value for the asymptote
The dummy variables X1, X2 and X3 represent the main effects of treatments A and B while X4 and X5 are their interaction.
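The polynomial-contrast arrangement (the second one above) can be generated from the factor levels instead of being typed by hand. In the sketch below, X1 contrasts the two levels of treatment B, X2 and X3 are the linear and quadratic contrasts for the three levels of treatment A, and X4 and X5 are their products with X1. This reading of the columns is an assumption based on the table entries, and the names are hypothetical.

```python
# Contrast codes for each factor level (assumed reading of the table above).
B_CONTRAST = {1: 1, 2: -1}
A_LINEAR = {1: 1, 2: 0, 3: -1}
A_QUADRATIC = {1: 1, 2: -2, 3: 1}

def contrast_row(level_a, level_b):
    """Dummy variables X1..X5 for one treatment combination."""
    x1 = B_CONTRAST[level_b]
    x2 = A_LINEAR[level_a]
    x3 = A_QUADRATIC[level_a]
    return [x1, x2, x3, x1 * x2, x1 * x3]

design = {(a, b): contrast_row(a, b) for a in (1, 2, 3) for b in (1, 2)}
for cell in sorted(design):
    print(cell, design[cell])

def column(j):
    """Column j (0-based) of the design, in sorted cell order."""
    return [design[cell][j] for cell in sorted(design)]

# Each pair of columns is orthogonal, and every column sums to zero.
dots = [sum(u * v for u, v in zip(column(i), column(j)))
        for i in range(5) for j in range(i + 1, 5)]
print(dots, [sum(column(j)) for j in range(5)])
```

Because the columns are mutually orthogonal and sum to zero over the six treatments, the constant term A corresponds to the average asymptote across the six treatments.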
All three sets of dummy variables can be used to allow differences in the asymptote A and/or scale parameter B and/or location parameter C. To see how this can be done, let's create three parameters which are functions of the dummy variables so that their value can be different depending on the treatment. That is:
Let AAAA = A + A1 X1 + A2 X2 + A3 X3 + A4 X4 + A5 X5
    BBBB = B + B1 X1 + B2 X2 + B3 X3 + B4 X4 + B5 X5
    CCCC = C + C1 X1 + C2 X2 + C3 X3 + C4 X4 + C5 X5
The form of AAAA as a function of individual parameters is in the last column of the above descriptions of three possible dummy variable arrangements. The specific values of A1, A2, ..., A5 will be different depending on the dummy variables chosen. The functions for BBBB and CCCC would be similar to those for AAAA. The form of the model which will include any treatment effects on the shape of the logistic or gompertz curves is:
LOGISTIC: $Y = AAAA / [1 + \exp(CCCC + BBBB\,t)]$
GOMPERTZ: $Y = AAAA \exp[-CCCC \exp(-BBBB\,t)]$
To run this "full" model we need initial estimates for A, A1, A2, ..., A5, B, ..., B5, C, ..., C5. We can obtain these by using estimates obtained from the individual fits. The values we get will depend on the dummy variable structure (also known as the design matrix) chosen. How this is done is best demonstrated by example. Suppose we use the last set of dummy variables and the asymptote values shown below:
Fitted value for the asymptote     Estimate of asymptote from individual fits (with S.E.)
This information provides us with six equations which can be solved for A, A1, ..., A5. We get the following results:
PARAMETER ESTIMATES FOR THE ASYMPTOTE
The standard errors associated with the above parameter estimates are very APPROXIMATE, being simply the sum of the standard errors of each term in the corresponding equation. We can use them as a VERY ROUGH indication of which parameters are likely to be zero or not required in a final model. Tests between appropriate models will provide us with more reliable indications of which parameters are non-zero. Similarly we can obtain initial estimates for the location and scale parameters:
PARAMETER ESTIMATES
For scale, B          For location, C
The asterisks indicate those parameters whose magnitude is greater than their APPROXIMATE standard error. At the least, these parameters should be included in our first attempt for a final model.
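Solving the six equations is a small linear-algebra problem: each individual asymptote estimate equals A plus the dummy-variable combination for its treatment. The sketch below uses the polynomial-contrast arrangement and made-up asymptote estimates, since the essay's own values are not reproduced here. All names and numbers in it are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def contrast_row(level_a, level_b):
    """Polynomial-contrast dummy variables X1..X5 (assumed coding)."""
    x1 = {1: 1, 2: -1}[level_b]
    x2 = {1: 1, 2: 0, 3: -1}[level_a]
    x3 = {1: 1, 2: -2, 3: 1}[level_a]
    return [x1, x2, x3, x1 * x2, x1 * x3]

# Hypothetical asymptote estimates from six individual fits, keyed by (Treat A, Treat B).
asymptotes = {(1, 1): 0.90, (1, 2): 0.85, (2, 1): 0.80,
              (2, 2): 0.78, (3, 1): 0.70, (3, 2): 0.60}

cells = sorted(asymptotes)
rows = [[1] + contrast_row(a, b) for a, b in cells]      # leading 1 is for A itself
start = solve(rows, [asymptotes[cell] for cell in cells])
for name, value in zip(["A", "A1", "A2", "A3", "A4", "A5"], start):
    print(name, round(value, 4))
```

The same calculation, applied to the individual estimates of B and C, gives starting values for the remaining parameters of the full model.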
3.2.4 Finding a Final Model
The next step in this process is to fit several models to the whole data set (i.e. no BY statement with PROC NLIN) until we find one which fits the data fairly well. In a linear multiple regression problem the additional sums of squares rule (see for example Dobson 1983 or Abraham and Ledolter 1983) can be used to test variables or sources for "significance". The logic is that if we can assume that the "full model" fits the data fairly well, then we examine the increase in the Residual Sums of Squares (RSS) when one variable or source of variation is removed. If the change is small then that variable or source does not contribute much to the explanation of the data. We might decide to leave it out of the final model. On the other hand, if the change is large then we must leave it in. This principle can be used with caution in the non-linear case.
The uncorrected total sums of squares (UTSS), corrected total sums of squares (CTSS) and the residual sums of squares (RSS) output by PROC NLIN for all the individual non-linear fits will add up to that obtained by fitting the full model¹ to the data. Thus we need not fit the full model to get the overall minimum RSS. This RSS is the "best" that any reduced model of all the data might achieve and can be used as a yardstick with which to compare other models. This is, of course, only true for comparing logistic models with other logistic models or gompertz models with other gompertz models.
If the treatments have a factorial structure then the first reduced model to fit is the one without the highest order interaction. If the change in RSS is small we may then delete this interaction from the model and then fit without the next level(s) of interactions (or main effects if there are no interactions left). As in ANOVA it is useful if there are some specific a priori questions or contrasts, as the dummy variables can be set up to specifically answer these questions. These specific dummy variables can be tested either singly or in logical groups.
¹ This may not occur for the RSS if the RSS of any of the individual fits and/or the full model are not those of the global minimum.
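The additional sums of squares comparison reduces to a short calculation once the RSS and residual degrees of freedom of the two models are known. In the sketch below the full-model values are the logistic sums from Table 7 (RSS 1.14353 on 94 df). The reduced-model RSS of 1.50 on 97 df is a made-up figure used only to show the arithmetic, and the function name is hypothetical.

```python
def extra_ss_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic for the increase in RSS when parameters are removed.

    Returns (F, numerator df, denominator df). The statistic is only
    approximately F-distributed for non-linear models.
    """
    num_df = df_reduced - df_full
    f_value = ((rss_reduced - rss_full) / num_df) / (rss_full / df_full)
    return f_value, num_df, df_full

# Full model: sum of the individual logistic fits for 1978 and 1979 (Table 7).
rss_full = 0.53758 + 0.60595
df_full = 47 + 47

# Reduced model: one curve for all data; RSS here is a hypothetical value.
f_value, num_df, den_df = extra_ss_f(1.50, 97, rss_full, df_full)
print(round(f_value, 2), num_df, den_df)
```

The resulting value would be compared with tabulated F values on (3, 94) degrees of freedom, keeping in mind the caution about the non-linear case noted above.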
5. Example
Some germination data collected by Carole Leadem and Sue Baker (E.P. 848.06) will be used as an example. Several replicates of Abies amabilis seeds from two seedlots collected in two consecutive years were used in the trial. The numbers of seed germinated were recorded over a 42-day period. Differences in response due to collection year will be examined.
The SAS program in Table 6 was run and the output is in the Appendix. Other models were also fit and pertinent information on the different runs is summarized in Tables 7 and 8. Both the logistic and gompertz curves were fit. The results are very similar although, of course, the parameter estimates were different. The residual sums of squares for the gompertz fits were very slightly smaller than the logistic fits for this particular set of data.
Table 7. Residual Sums of Squares and Tests of Hypotheses

    MODEL                           df    LOGISTIC    GOMPERTZ
    Individual years:   1978        47    0.53758     0.52371
                        1979        47    0.60595     0.55088
 1.                     Sum         94    1.14353     1.07459
 2. All data combined               97
 3. A different; B, C same          96
 4. A, C different; B same          95

    HYPOTHESES                      MODELS USED    df
    No overall difference           1-2            3,94
    A different; B and C same       1-3            2,94
    A, C different; B same          1-4            1,94
First, each year was fit individually, and then all the data were fit with no distinction made for year. An overall test for differences between years was constructed by the extra sums of squares principle. The results in Table 7 strongly suggest that there is a year effect. Differences between parameter estimates for each year were compared with their rough standard errors (see Table 8) and suggested that the asymptote (A) should be different but the scale (B) and location (C) parameters might be the same for each year. The roughness of the standard errors used can be seen by comparing them with the better estimates provided by a fit to the data allowing all parameters to vary (fit 2 in Table 8).
The third model fit then allowed A to vary but restricted B and C to be the same for both years. The F-value to test the reasonableness of this model strongly suggests that one or both of B and C vary. Plots of the individual fits suggest that it is more likely that C varies. Thus a fourth model was fit allowing A and C to vary and, as indicated by the F-test in Table 7, appears to provide a reasonable model for the data.
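The extra sums of squares comparison used in these steps can be sketched as follows. The full-model RSS and degrees of freedom are the logistic "Sum" values from Table 7; the reduced-model RSS shown is a hypothetical number chosen only to make the example run, and the function name `extra_ss_f` is illustrative.

```python
def extra_ss_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic for the extra sum of squares principle.

    Numerator: increase in RSS per parameter removed from the full model.
    Denominator: residual mean square of the full model.
    Returns (F, numerator df, denominator df).
    """
    df_num = df_reduced - df_full
    if df_num <= 0:
        raise ValueError("reduced model must have more residual df")
    f_value = ((rss_reduced - rss_full) / df_num) / (rss_full / df_full)
    return f_value, df_num, df_full


# Full model: sum of the individual logistic fits (Table 7).
RSS_FULL, DF_FULL = 1.14353, 94
# HYPOTHETICAL RSS for the 'all data combined' model (97 df), illustration only.
rss_combined_hypothetical = 1.50

f_value, df_num, df_den = extra_ss_f(rss_combined_hypothetical, 97,
                                     RSS_FULL, DF_FULL)
print(f"F({df_num},{df_den}) = {f_value:.2f}")
```

The resulting F is compared with tabulated F values on the stated degrees of freedom, subject to the assumptions discussed under Statistics.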
4. Statistics
The method of fitting individual curves, of setting up dummy variables to specify treatments and the sequential fitting of various models can all be accomplished quite reasonably without statistics. There are more fitting problems than for similar linear models but these have been non-statistical problems. For that matter we could avoid statistics altogether and decide on an ad hoc basis when changes in the RSS were large or small. We must make some important and possibly unlikely assumptions to develop statistical tests to help us with these decisions. Insofar as these assumptions are unrealistic we must be cautious of relying too heavily on the statistics for answers to our questions.
4.1 Assumptions
The first assumption we make is that the logistic or gompertz curve, whichever one we use, is the "right" model for the germination process. Since it is unlikely that we ever know what the "true" model is, this assumption is almost certainly incorrect. What really matters here, at least for practical purposes, is that the model is reasonable and provides a good empirical base for investigation. If there are daily replicates for each treatment it is possible to test the model for "lack of fit".
The second assumption is that the observations or residuals of the observations from the "true" curve are independent, unbiased and random. These characteristics of the residuals cannot usually be tested but must be assumed from the design and conduct of the investigation. Although the residuals from germination data are likely to be random it is much less likely that they will be independent if the same set of seeds is observed at different times throughout the study. How this affects the statistical tests proposed below is not clear at this time.
The third assumption that we make is that the residuals follow a normal distribution with mean zero and constant variance. The assumption of constant variance may not be reasonable, particularly at very low and very high germination proportions and especially for the non-linear models, since the values are constrained to be greater than zero but less than one. The linearized models would more likely have constant variance since the transformed proportions are not as restrained as are the original values. The two types of model are quite different at least in the assumed distribution of the residuals. For the non-linear models the residuals are taken from the sigmoid curve, while they are taken from the straightened curve for the linearized models. If the residuals are normally distributed about the linearized model they will not be normally distributed about the non-linear model (and vice versa). The most prominent disadvantage of the linearized models is the requirement that the asymptote A must be assumed to be KNOWN as it cannot be estimated.
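The dependence of the linearized fit on a known asymptote can be illustrated with a short sketch. It assumes the logistic form Y = A / (1 + exp(C + B*t)), so that log(A/Y - 1) = C + B*t; this parameterization and all numeric values below are assumptions made for illustration, not the definitions or data of this report.

```python
import math


def fit_linearized_logistic(times, props, a_known):
    """Least-squares line through the transformed proportions.

    Assumed curve: Y = A / (1 + exp(C + B*t)), so that
    log(A/Y - 1) = C + B*t is linear in t once A is fixed.
    Returns (b, c). Requires 0 < Y < A for every observation.
    """
    z = [math.log(a_known / y - 1.0) for y in props]
    n = len(times)
    t_bar = sum(times) / n
    z_bar = sum(z) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxz = sum((t - t_bar) * (zi - z_bar) for t, zi in zip(times, z))
    b = sxz / sxx
    c = z_bar - b * t_bar
    return b, c


# Noise-free illustrative data generated from HYPOTHETICAL parameters.
A, B, C = 0.85, -0.30, 6.0
days = [10, 14, 18, 22, 26, 30]
y = [A / (1.0 + math.exp(C + B * t)) for t in days]

print(fit_linearized_logistic(days, y, a_known=A))     # recovers B and C
print(fit_linearized_logistic(days, y, a_known=0.95))  # wrong A shifts both
```

With the correct A the straight line reproduces B and C; with a mis-specified A the transformed points are no longer linear in t and both estimates move, which is the practical cost of having to treat A as known.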
Given that the above assumptions are reasonable for the data at hand then all the usual F-tests that are used with multiple linear regression may be used here. For a discussion of these tests see for instance Wetherill (1981), Draper and Smith (1981), Dobson (1983) or Abraham and Ledolter (1983).
4.2 Confidence Limits for Non-Linear Functions of the Parameters
The salient features of the fitted germination curve are:
i) the maximum germination, A,
ii) the maximum germination rate, Gmax, and
iii) the time tmax when Gmax occurs.
These values are easily obtained from the parameter estimates on the printout using the formulae in Table 2. It is useful to have estimates of the standard errors for these values. This is easily obtained for the maximum germination A since it is directly output by PROC NLIN. The maximum germination rate, Gmax, and its time of occurrence, tmax, are non-linear functions of the parameters, hence it is much harder to determine their standard errors and approximate methods must be used. Once the standard errors are estimated the confidence limits are easy to obtain.
The approximate method used depends on two assumptions. The first is that sample sizes are large enough that we can invoke the central limit theorem. This means that we assume asymptotic multivariate normality for the parameter estimates of the logistic and gompertz equations. This is done since it is known how to obtain confidence limits for a multivariate normal distribution. The second assumption, which may be more critical, is that the function such as Gmax or tmax is well approximated by a linear function over the joint confidence intervals of the parameters. The wider the individual parameter confidence limits are, the less likely this is to be reasonable.
4.2.1 Linear Functions
In general, the variance of a linear function can be obtained very easily. Suppose that we want to calculate 3A + 6; then its variance would be 9*var(A). For the maximum germination estimate we simply use the asymptote A output by PROC NLIN and the (asymptotic) estimate of its standard error, also output by PROC NLIN. The confidence limits can be obtained by multiplying this standard error by the appropriate t-value and adding the result to, and subtracting it from, the estimate.
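The calculation for a linear function can be sketched with the 1978 asymptote and its asymptotic standard error from the example PROC NLIN output. The t-value of 2.0117 for 47 residual degrees of freedom is an assumed table value supplied by hand, and the function name is illustrative.

```python
def linear_function_ci(estimate, std_error, t_value, scale=1.0, shift=0.0):
    """Confidence limits for scale*estimate + shift.

    The variance of a linear function is scale**2 * var(estimate);
    the additive constant does not contribute to the variance.
    """
    value = scale * estimate + shift
    se = abs(scale) * std_error
    half_width = t_value * se
    return value, se, (value - half_width, value + half_width)


# Asymptote A and its asymptotic standard error for 1978 (logistic fit).
A_HAT, SE_A = 0.82846513, 0.03035873
# Two-sided 95% t-value on 47 residual df (assumed, taken from tables).
T_47 = 2.0117

_, _, limits_a = linear_function_ci(A_HAT, SE_A, T_47)
print("95% limits for A:", limits_a)

# The 3A + 6 example: variance is 9 * var(A), so the s.e. is 3 * se(A).
value, se, _ = linear_function_ci(A_HAT, SE_A, T_47, scale=3.0, shift=6.0)
print("3A + 6 =", value, "with variance", se ** 2)
```

The limits for A computed this way can be compared with the asymptotic 95% confidence interval printed by PROC NLIN for the same fit.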
4.2.2 Non-linear Functions
There are at least two different ways to obtain confidence limits for the non-linear functions. The first is straightforward mathematically and is similar to the δ-method [see Rao (1973) or Bishop, Fienberg and Holland (1975)]. The non-linear function is approximated by a linear one obtained from the first-order terms in a Taylor's series expansion about the estimate. The variance of this linear approximation is used as an estimate of the variance. This method is described in Johnson and Kotz (Vol. 2, 1984) and Beers (1957). The second method is described in Box, Hunter and Hunter (pgs. 563-570, 1978) and is likely to be more reliable than the first. It includes a check on the linearity of the approximation and requires less mathematical expertise. See Box, Hunter and Hunter for a description and/or my handout on confidence limits.
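A generic version of the first (Taylor series) method is sketched below. It replaces the analytic partial derivatives with central finite differences, which is a convenience assumed here rather than part of the method as described; the function name and step size are illustrative.

```python
def delta_method_variance(func, estimates, cov, rel_step=1e-6):
    """First-order (Taylor series) variance of func(estimates).

    estimates: list of parameter estimates.
    cov: covariance matrix of the estimates as a list of lists.
    The gradient is approximated by central finite differences, and the
    variance is the quadratic form gradient' * cov * gradient.
    """
    k = len(estimates)
    grad = []
    for i in range(k):
        h = rel_step * max(abs(estimates[i]), 1.0)
        up = list(estimates)
        down = list(estimates)
        up[i] += h
        down[i] -= h
        grad.append((func(up) - func(down)) / (2.0 * h))
    return sum(grad[i] * cov[i][j] * grad[j]
               for i in range(k) for j in range(k))


# Check against the linear case 3A + 6, whose variance is 9 * var(A).
var_linear = delta_method_variance(lambda p: 3.0 * p[0] + 6.0,
                                   [0.8], [[0.0009]])
print(var_linear)
```

For a linear function the approximation is exact apart from rounding, which makes the 3A + 6 case a convenient check before applying the function to Gmax or tmax.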
The formulae for Gmax and tmax for both the logistic and gompertz curves obtained by the first method are presented in Table 9. These variances are expressed in terms of the population or "true" values of the parameters.
Table 9. Approximate Variances

Curve      Function           Approximate Variance
Logistic   Gmax = -AB/4
Gompertz   tmax = LogC/B      [LogC/B]^2 { Var(B)/B^2 + Var(C)/(C^2 Log^2 C) - 2Cov(B,C)/(B C LogC) }
Estimates are obtained by substituting parameter estimates a, b and c from the printouts in place of A, B and C. Variance estimates for each parameter are simply obtained by squaring the estimate of its standard error from the printout. Estimates of the covariance between any pair of parameters are obtained by multiplying the asymptotic correlation of the two parameters by their respective standard error estimates.
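The substitution described above can be sketched for the logistic Gmax = -AB/4, using the 1978 estimates, standard errors and asymptotic correlation from the example PROC NLIN output. The variance expression is the first-order Taylor approximation worked out here from the partial derivatives; it is offered as an illustration and should be checked against Table 9 before use.

```python
import math

# 1978 logistic fit, from the example PROC NLIN output.
a, se_a = 0.82846513, 0.03035873
b, se_b = -0.32572265, 0.05106219
corr_ab = 0.387263

# Variances are squared standard errors;
# covariance = correlation * s.e.(A) * s.e.(B).
var_a = se_a ** 2
var_b = se_b ** 2
cov_ab = corr_ab * se_a * se_b

# Maximum germination rate for the logistic curve.
gmax = -a * b / 4.0

# First-order variance: partial derivatives are dG/dA = -B/4, dG/dB = -A/4.
d_a = -b / 4.0
d_b = -a / 4.0
var_gmax = d_a ** 2 * var_a + d_b ** 2 * var_b + 2.0 * d_a * d_b * cov_ab
se_gmax = math.sqrt(var_gmax)

print(f"Gmax = {gmax:.4f}, approximate s.e. = {se_gmax:.4f}")
```

Approximate confidence limits then follow by adding and subtracting the appropriate t-value times this standard error, keeping in mind the linearity assumption discussed in Section 4.2.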
EXAMPLE SAS OUTPUT FROM PROC NLIN -- FITTING THE LOGISTIC CURVE --

FOR 1978

NON-LINEAR LEAST SQUARES SUMMARY STATISTICS     DEPENDENT VARIABLE FGC

SOURCE               DF    SUM OF SQUARES    MEAN SQUARE
REGRESSION            3     12.98091133      4.32697044
RESIDUAL             47      0.53758494      0.01143798
UNCORRECTED TOTAL    50     13.51849627

PARAMETER    ESTIMATE       ASYMPTOTIC     ASYMPTOTIC 95% CONFIDENCE INTERVAL
                            STD. ERROR     LOWER          UPPER
A             0.82846513    0.03035873      0.76739140     0.88953885
B            -0.32572265    0.05106219     -0.42844625    -0.22299906
C             6.92307421    1.03473923      4.84145301     9.00469541

ASYMPTOTIC CORRELATION MATRIX OF THE PARAMETERS
         A            B            C
A     1.000000     0.387263    -0.307585
B     0.387263     1.000000    -0.985717
C    -0.307585    -0.985717     1.000000

FOR 1979

NON-LINEAR LEAST SQUARES SUMMARY STATISTICS     DEPENDENT VARIABLE FGC

SOURCE               DF    SUM OF SQUARES    MEAN SQUARE
REGRESSION            3     21.18397465      7.06132488
RESIDUAL             47      0.60595411      0.01289264
UNCORRECTED TOTAL    50     21.78992876

PARAMETER    ESTIMATE       ASYMPTOTIC     ASYMPTOTIC 95% CONFIDENCE INTERVAL
                            STD. ERROR     LOWER          UPPER
A             0.93489519    0.02877630      0.87700490     0.99278548
B            -0.34164833    0.04478851     -0.43175094    -0.25154571
C             5.83925868    0.72999773      4.37069664     7.30782072

ASYMPTOTIC CORRELATION MATRIX OF THE PARAMETERS
         A            B            C
A     1.000000     0.407326    -0.303033
B     0.407326     1.000000    -0.976350
C    -0.303033    -0.976350     1.000000
EXAMPLE SAS PROGRAMS FOR USING PROC NLIN -- FITTING THE LOGISTIC CURVE --
INDIVIDUAL RUNS FOR EACH YEAR

[Plot of RFGC*STDNRM; symbol is value of YR. Vertical axis: RFGC, -0.5 to 0.4. Horizontal axis: STDNRM, -3.0 to 2.5. NOTE: 36 OBS HIDDEN.]

[Plot of RFGC*DAY; symbol is value of YR. Vertical axis: RFGC. Horizontal axis: DAY, 8 to 42. NOTE: 28 OBS HIDDEN.]

[Plot of FGC*DAY; symbol is value of YR; overlaid with plot of PFEX*DAY; symbol used is *. Horizontal axis: DAY, 8 to 42. NOTE: number of hidden observations illegible.]
LITERATURE CITED
Abraham, B. and J. Ledolter. 1983. Statistical methods for forecasting. John Wiley and Sons, Toronto.
Bard, Yonathan. 1974. Nonlinear parameter estimation. Academic Press, New York.
Beers, Yardley. 1957. Introduction to the theory of error. Addison-Wesley Pub. Co., Reading, Mass.
Bishop, Y.M.N., S.E. Fienberg and P.W. Holland. 1975. Discrete multivariate analysis: theory and practice. The MIT Press, Cambridge, Mass.
Box, G.E.P., W.G. Hunter and J.S. Hunter. 1978. Statistics for experimenters: an introduction to design, data analysis and model building. John Wiley and Sons, Toronto (pp. 563-570).
Dobson, A. 1983. An introduction to statistical modelling. Chapman and Hall, London.
Draper, N. and H. Smith. 1981. Applied regression analysis. 2nd ed. John Wiley and Sons, Toronto.
Johnson and Kotz. 1984. Vol. 2
Kalbfleisch, J. 1983. Unpublished course notes, University of Waterloo, Waterloo, Ont.
McCullagh, P. and J.A. Nelder. 1983. Generalized Linear Models. Chapman and Hall, London.
Rao, C.R. 1973. Linear statistical inference and its applications, 2nd ed. John Wiley and Sons, New York.
SAS Institute Inc. 1982. SAS (Statistical Analysis System) user's guide: statistics. SAS Institute Inc., P.O. Box 800, Cary, N.C.
Wetherill, G.B. 1981. Intermediate statistical methods. Chapman and Hall, London.