ZERO-INFLATED NEGATIVE BINOMIAL MODELS IN SMALL AREA ESTIMATION
IRENE MUFLIKH NADHIROH
DEPARTMENT OF STATISTICS
FACULTY OF MATHEMATICS AND NATURAL SCIENCES
BOGOR AGRICULTURAL UNIVERSITY
2009
ABSTRACT
IRENE MUFLIKH NADHIROH. Zero-Inflated Negative Binomial Models in Small Area Estimation. Under the advisory of KHAIRIL ANWAR NOTODIPUTRO and INDAHWATI.
The problem of over-dispersion in Poisson data is usually solved by introducing prior distributions, which leads to negative binomial models. Poisson data sometimes also suffer from an excess-zero problem, a condition in which the data contain more zeros than the distribution predicts. The Zero-Inflated Negative Binomial (ZINB) method can be used to solve such problems. This paper demonstrates the adaptation of ZINB methods to Small Area Estimation with excess-zero data. It is shown that the excess-zero problem substantially influences the Empirical Bayes (EB) estimates and that adapting ZINB methods improves the precision and reliability of the estimates.
Key Words: Small Area Estimation, Zero-Inflation, Poisson-Gamma, Negative Binomial Regression, Empirical Bayes
ZERO-INFLATED NEGATIVE BINOMIAL MODELS IN SMALL AREA ESTIMATION
BY
IRENE MUFLIKH NADHIROH
G14104031
Final Project Report
As partial fulfillment of the requirements for a Bachelor Degree in Science
at the Department of Statistics, Faculty of Mathematics and Natural Sciences
Title: ZERO-INFLATED NEGATIVE BINOMIAL MODELS IN SMALL AREA ESTIMATION
Name: Irene Muflikh Nadhiroh
ID No.: G14104031
Approved by
Advisor I: Prof. Dr. Ir. Khairil Anwar Notodiputro, NIP 130891386
Advisor II: Ir. Indahwati, M.Si., NIP 131909223
Acknowledged by
Dean of Faculty of Mathematics and Natural Sciences, Bogor Agricultural University
Dr. Drh. Hasim, DEA, NIP 131578806
Passed examination date:
BIOGRAPHY
Irene Muflikh Nadhiroh was born in Padang on October 3rd, 1986, as the first daughter of Ir. Irianto Oetomo and Fine Analisa Maharani. She has two siblings.
In 1998 she graduated from SD Dukuh 09 East Jakarta, then continued her studies at SLTP Negeri 1 Bogor and graduated in 2001. She finished her studies at SMU Labschool Rawamangun Jakarta in 2004 and then enrolled in Bogor Agricultural University through USMI. In 2004 she joined the Department of Statistics, Faculty of Mathematics and Natural Sciences.
During her studies she served as a teaching assistant for the Basic Statistics class and the Experimental Design class in 2006 and 2007 respectively. She was also a member of Gamma Sigma Beta (Statistics Students Association, GSB) and served as head of the science division of GSB in 2006-2007. In February-March 2008 she completed her field practice at PT Field Dimension Indonesia.
ACKNOWLEDGEMENTS
First of all, the author modestly admits that the completion of this paper would not have been possible without invaluable help from many generous and extraordinary people. The author is deeply indebted for their help, ideas, criticism and advice during the writing process. However, they should not be held responsible for any mistakes and deficiencies in this paper, which are purely the author's. I would hereby like to express my gratitude to:
1. Allah SWT, to whom all praise and gratitude are due. Alhamdulillah hirabbil alamin. With His blessing I was able to finish this paper. Thanks to Allah for giving me a wonderful life with extraordinary people around me.
2. Prof. Dr. Khairil Anwar Notodiputro and Ir. Indahwati, M.Si., for the early motivation, discussion, advice, support and their great enthusiasm.
3. Mr. Bambang Sumantri, M.Si., as examiner, for the spirit, advice and criticism.
4. My beloved family, for the unlimited love ever after.
5. Mr. Alfian Futuhul Hadi, M.Si., for enlightening discussion when I was in trouble.
6. Mr. Bagus Sartono, M.Si., thank you very much for running my data at your lab with your wonderful computer. Sorry if it disturbed you.
7. Mr. Anang, M.Si., and Mr. Rahman, M.Si., for sharing their knowledge and technical support.
8. Mr. Dr. Ir. Hari Wijayanto, MS, and all lecturers and staff at the Statistics Department. Thanks for the knowledge of statistics and of life that you shared. It means a lot to me.
9. Rahmatullah Sigit Dodiet Sasongko, S.Si., for the spirit, love, care, time and patience. Keep it real. Still love me forever and ever.
10. Mr. Dionisius Laksmana Bisara Putra, S.Si., for editing my paper, for his criticism, and for useful discussion.
11. Maulana Chistanto, S.Si., and Yhanuar Ismail, S.Si., thanks for being my best brothers.
12. Nikhen Sevrien and the late Dini, thanks for lighting my day.
13. Rere Yusri Agus Ika Cinong Toki Cheri Fisca Wiwik Neng Mala Lilis Dika Rangga Lele Dodi Kus Inal Bebek Koler and all of Statistics '41.
14. Everyone who helped me in this study and who cannot be named individually.
This thesis is not perfect, so I welcome criticism, advice and recommendations from the people who read it. Thank you. God bless you all.
Bogor January 2009
Irene Muflikh Nadhiroh
TABLE OF CONTENTS
INTRODUCTION
  Background
  Objectives
LITERATURE REVIEW
  Direct Estimation
  Small Area Estimation
  Small Area Models
  Empirical Bayes Methods
  Poisson-Gamma Models
  Negative Binomial Regression
  Over-dispersion in Count Data
  Zero-Inflated Models
  Zero-Inflated Negative Binomial
  Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
METHODOLOGY
  Data
  Methods
RESULT AND DISCUSSION
  Estimation of Prior Parameter Based on EB Method with Negative Binomial Regression
  Estimation of Prior Parameter Based on EB Method with Zero-Inflated Negative Binomial Regression
  Comparison of EB Estimator with Negative Binomial Regression and EB Estimator with ZINB
CONCLUSION
RECOMMENDATION
REFERENCES
LIST OF TABLES
Table 1 MSE and RRMSE of EB Estimator with NBR
Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
Table 3 MSE and RRMSE of EB Estimator with ZINB
Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB
LIST OF APPENDICES
Appendix 1 Result of EB estimation with NBR
Appendix 2 Result of EB estimation with ZINB
Appendix 3 Result of EB estimation (II) with NBR
Appendix 4 Result of EB estimation (II) with ZINB
Appendix 5 Syntax program for generating data
Appendix 6 Syntax program EB with NBR
Appendix 7 Syntax program EB with ZINB
INTRODUCTION
Background
Direct estimation is usually applied in large-scale surveys, but it is sometimes difficult to use such an estimator in a smaller region, especially when the sample size is too small. In this case indirect estimation, which adds covariates to estimate the parameter, is usually used. This type of estimation is broadly known as Small Area Estimation.
Kismiantini (2007) conducted research in Small Area Estimation based on Poisson-Gamma models. Maximum Likelihood Estimation was used with Negative Binomial Regression techniques to estimate the respective prior parameters. Moreover, Negative Binomial Regression was used to resolve the over-dispersion problem in the data.
In reality, count data are characterized not only by over-dispersion but sometimes also by excess zeros. Excess-zero is a condition in which the data contain more zeros than the distribution predicts. For example, in 100 observations from a Poisson model with a response mean of 4, we would expect about 2 zeros. If the data have 30 zeros, it should be obvious that the distributional assumptions have been violated. As a result, the estimated parameters and standard errors will be biased (Hardin & Hilbe 2007). In this paper Zero-Inflated models are adapted to solve this type of problem.
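The "about 2 zeros" figure follows directly from the Poisson probability of a zero, $e^{-\mu}$. The short Python sketch below reproduces that arithmetic; the function name is illustrative only and is not taken from the thesis.

```python
import math

def expected_poisson_zeros(n_obs, mean):
    """Expected number of zeros among n_obs draws from Poisson(mean)."""
    return n_obs * math.exp(-mean)

expected = expected_poisson_zeros(100, 4.0)
print(round(expected, 2))  # about 1.83, i.e. roughly 2 zeros in 100 observations
```

Observing 30 zeros in such a sample is therefore far more than the Poisson model allows for.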
Objectives
The research objectives are:
1. To investigate the performance of Negative Binomial Regression in Small Area Estimation in the case of excess zeros.
2. To apply Zero-Inflated Count Models to Small Area Estimation in the case of excess zeros.
3. To evaluate the performance of Zero-Inflated Count Models in estimating the prior parameters for Small Area Estimation.
LITERATURE REVIEW
Direct Estimation
Direct estimates are generally "design based" in the sense that they make use of "survey weights", and the associated inferences are based on the probability distribution induced by the sample design with the population values held fixed (Rao 2003). In particular, direct estimates of a domain parameter are based only on the domain-specific sample data.
Data from sample surveys have long been used to obtain reliable estimates of parameters. Ramsini et al. (2001) mentioned that direct estimates for small areas are unbiased, although they have large variances because of the small sample sizes.
Small Area Estimation
The term small area can refer to anything, depending on the object of interest: a city, an age group, a sex group, a region or a rural district. In general, small area denotes any domain for which direct estimates with adequate precision cannot be produced (Rao 2003), because the sample size in the small area is too small. Small area estimation was therefore developed as a statistical technique for estimating small area parameters with an adequate level of precision. It works as indirect estimation that borrows strength from the values of the variable of interest in related areas through supplementary information related to that variable, such as recent census counts and current administrative records (Rao 2003). Indirect estimation is the process of estimating a domain's parameter by connecting the information in that domain with that of other domains through an appropriate model, so the estimator works by including other domains' data (Kurnia & Notodiputro 2006).
Small Area Models
There are two types of link model in indirect estimation. The first is the traditional method based on implicit models that provide a link to related small areas through supplementary data. The second is explicit small area models that make specific allowance for between-area variation (Rao 2003). This research used the second type, which can be classified into two broad types of basic model:
1. Basic area level (type A) model
Basic area level models, or aggregate models, include all models that relate small area parameters to area-specific auxiliary variables. These models are essential if unit (element) level data are not available. The parameter of interest $\theta_i$ is assumed to be related to area-specific auxiliary data or covariates $x_i = (x_{1i}, \ldots, x_{pi})^T$ by a linear model
$$\theta_i = x_i^T \beta + b_i v_i, \quad i = 1, \ldots, m,$$
where $v_i \sim N(0, \sigma_v^2)$ are area-specific random effects, $\beta = (\beta_1, \ldots, \beta_p)^T$ is the $p \times 1$ vector of regression coefficients, and $b_i$ are known positive constants. For making inferences about $\theta_i$, direct estimators $y_i$ are assumed to be available, with
$$y_i = \theta_i + e_i, \quad i = 1, \ldots, m,$$
where the sampling errors $e_i \sim N(0, \sigma_{e_i}^2)$ and $\sigma_{e_i}^2$ are known. Combining the two models gives
$$y_i = x_i^T \beta + b_i v_i + e_i, \quad i = 1, \ldots, m$$
(Rao 2003).
2. Basic unit level (type B) model
Unit level models include all models that relate unit values of the study variable to unit-specific auxiliary variables. Unit-specific auxiliary variables $x_{ij} = (x_{ij1}, \ldots, x_{ijp})^T$ are assumed to be available, with the corresponding nested regression model
$$y_{ij} = x_{ij}^T \beta + v_i + e_{ij}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n_i,$$
with $v_i \sim N(0, \sigma_v^2)$ and $e_{ij} \sim N(0, \sigma_{e_i}^2)$.
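As an illustration of the area level model, the following Python sketch draws the small area parameters and their direct estimates from the linking and sampling models above. It is a minimal sketch under stated assumptions: the function name, the argument names and the example values are hypothetical and do not come from the thesis.

```python
import random

def simulate_area_level(x, beta, sigma_v, sigma_e, b=None, seed=0):
    """Draw (theta, y) from the combined area level model.

    x       : list of covariate vectors, one per area
    beta    : regression coefficients
    sigma_v : standard deviation of the area random effects v_i
    sigma_e : list of known sampling standard deviations, one per area
    b       : known positive constants b_i (defaults to 1 for every area)
    """
    rng = random.Random(seed)
    m = len(x)
    b = b if b is not None else [1.0] * m
    theta, y = [], []
    for i in range(m):
        linear = sum(xk * bk for xk, bk in zip(x[i], beta))
        v_i = rng.gauss(0.0, sigma_v)      # area-specific random effect
        e_i = rng.gauss(0.0, sigma_e[i])   # sampling error
        theta_i = linear + b[i] * v_i      # linking model
        theta.append(theta_i)
        y.append(theta_i + e_i)            # sampling model
    return theta, y

# Hypothetical example: three areas, an intercept and one covariate.
x = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]]
theta, y = simulate_area_level(x, beta=[2.0, 1.0], sigma_v=0.3,
                               sigma_e=[0.2, 0.2, 0.2])
```

Setting both standard deviations to zero reduces every $y_i$ to the regression term $x_i^T \beta$, which is a quick way to check the structure of the model.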
Empirical Bayes Methods
The Bayesian approach is based on Bayes' Law, which was discovered by Thomas Bayes. This law was introduced by Richard Price in 1763, two years after Thomas Bayes passed away. In 1774 and 1781 Laplace gave the details and relevance for modern Bayesian statistics (Gill 2002 in Kismiantini 2007).
Novick in Good (1980) mentioned that the Bayes method is difficult to adopt and is sometimes very sensitive, because it requires prior probability information, which is usually difficult to obtain. Robbins (1955) introduced Empirical Bayes methods, which assume a particular prior distribution and estimate its parameters from the sample. Rao (2003) said that the EB (Empirical Bayes) and HB (Hierarchical Bayes) methods are suitable for binary and count data in Small Area Estimation. Therefore the EB method was used in this research.
Rao (2003) summarized EB methods in Small Area Estimation as follows:
1. Obtain the posterior probability density function of the small area parameter.
2. Estimate the model parameters from the marginal density function.
3. Use the estimated posterior density for inferences regarding the parameters of interest.
Poisson-Gamma Models
The Poisson model is the standard model for count data. Count data commonly suffer from over-dispersion, so the Poisson formulation has been extended to accommodate extra variance in the sample data. Two-stage models, known as Poisson-Gamma mixed models, have been introduced for count data. Wakefield (2006) introduced a Poisson-Gamma model that is easy to use with the SMR (Standardized Mortality Ratio) as the direct estimator. This study used the Wakefield model with an alteration in the direct estimator.
Let $y_i$ be the number of individuals in small area $i$ that have the characteristic of interest, written as
$$y_i = \sum_j y_{ij},$$
where $y_{ij}$ is the $j$-th unit in the $i$-th small area, $j = 1, \ldots, n$ and $i = 1, \ldots, m$.
First stage: $y_i \mid \theta_i \overset{ind}{\sim} \text{Poisson}(\mu_i \theta_i)$ is assumed, where $\mu_i = \mu(x_i, \beta)$ describes a regression model at area level, $x_i$ is a vector of covariates and $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of regression coefficients.
Second stage: $\theta_i \overset{iid}{\sim} \text{gamma}(\alpha, 1/\alpha)$ (shape $\alpha$, scale $1/\alpha$) is assumed as the prior distribution, with mean 1 and variance $1/\alpha$. The marginal distribution of $y_i \mid \beta, \alpha$ is then negative binomial.
Moreover, Wakefield (2006) used Bayes' Theorem and obtained the posterior distribution
$$\theta_i \mid y_i, \beta, \alpha \sim \text{gamma}\left(y_i + \alpha, \frac{1}{\mu_i + \alpha}\right)$$
and the EB estimator
$$\hat{\theta}_i^{EB} = \hat{\theta}_i^{B}(\hat{\beta}, \hat{\alpha}) = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i) \hat{\mu}_i,$$
with $\hat{\gamma}_i = \hat{\mu}_i / (\hat{\alpha} + \hat{\mu}_i)$, where $\hat{\theta}_i = y_i$ is the direct estimate of $\theta_i$ and $y_i$ is the number of observations. The EB estimate is thus a weighted average of the direct estimate and the regression prediction.
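To make the shrinkage in the EB estimator concrete, the sketch below evaluates the weighted average of the direct estimate $y_i$ and the regression prediction $\hat{\mu}_i$, with weight $\hat{\gamma}_i = \hat{\mu}_i / (\hat{\alpha} + \hat{\mu}_i)$. The function name and the numerical inputs are hypothetical; in the thesis the estimates of $\mu_i$ and $\alpha$ come from the fitted regression, which is not reproduced here.

```python
def eb_estimate(y, mu_hat, alpha_hat):
    """Poisson-Gamma EB estimate: shrink the direct estimate y toward mu_hat."""
    gamma_hat = mu_hat / (alpha_hat + mu_hat)   # weight on the direct estimate
    return gamma_hat * y + (1.0 - gamma_hat) * mu_hat

# Hypothetical inputs: regression prediction 2.0, prior parameter alpha = 1.0.
low = eb_estimate(0, mu_hat=2.0, alpha_hat=1.0)    # direct estimate 0 is pulled up
high = eb_estimate(5, mu_hat=2.0, alpha_hat=1.0)   # direct estimate 5 is pulled down
```

A small $\hat{\alpha}$ (a diffuse prior) puts most of the weight on the direct estimate, while a large $\hat{\alpha}$ pulls the estimate toward the regression prediction.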
Negative Binomial Regression
The negative binomial regression model seems to have been first discussed by Anscombe (1972). Others have pointed out its success in dealing with over-dispersed count data. Lawless (1987) elaborated the mixture model parameterization of the negative binomial, providing formulas for its log likelihood, mean, variance and moments. Later, Breslow (1990) cited Lawless' work, and from its inception to the late 1980s the negative binomial regression model was construed as a mixture model that is useful for accommodating otherwise over-dispersed Poisson data (Hardin & Hilbe 2007). The negative binomial distribution function is written as
$$g(y \mid x) = \frac{\Gamma(y + 1/k)}{\Gamma(1/k)\,\Gamma(y + 1)} \left(\frac{k\mu}{1 + k\mu}\right)^{y} \left(\frac{1}{1 + k\mu}\right)^{1/k},$$
where $y = 0, 1, 2, \ldots$, and $k$ and $\mu$ are the negative binomial parameters, with $E(y) = \mu$ and $\text{var}(y) = \mu + k\mu^2$. The parameter $k$ is called the dispersion parameter; it indicates that the data are over-dispersed.
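The negative binomial probability function with mean $\mu$ and dispersion parameter $k$ can be checked numerically. The sketch below, which uses only the Python standard library, evaluates the function on the log scale and confirms $E(y) = \mu$ and $\text{var}(y) = \mu + k\mu^2$ by direct summation. The parameter values are arbitrary illustrations.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and dispersion k (var = mu + k*mu**2)."""
    inv_k = 1.0 / k
    log_p = (math.lgamma(y + inv_k) - math.lgamma(inv_k) - math.lgamma(y + 1)
             + y * math.log(k * mu / (1.0 + k * mu))
             - inv_k * math.log(1.0 + k * mu))
    return math.exp(log_p)

mu, k = 4.0, 0.5                        # arbitrary illustrative values
probs = [nb_pmf(y, mu, k) for y in range(500)]   # tail beyond 500 is negligible
nb_mean = sum(y * p for y, p in enumerate(probs))
nb_var = sum((y - nb_mean) ** 2 * p for y, p in enumerate(probs))
# nb_mean is close to mu = 4 and nb_var is close to mu + k * mu**2 = 12
```

With these values the probability of a zero is $(1 + k\mu)^{-1/k} = 1/9$, much larger than the Poisson value $e^{-4}$ for the same mean.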
Over-dispersion in Count Data
Count data modeled by Poisson regression are said to be over-dispersed if the variance is larger than the mean, that is, if the observed variance is larger than the Poisson model predicts. This phenomenon is written as
$$\text{Var}(y_i) > E(y_i)$$
(McCullagh & Nelder 1989).
Zero-Inflated Models
Zero-Inflated models consider two distinct sources of zero outcomes. One source is individuals who do not enter into the counting process; the other is those who do enter the counting process but result in a zero outcome (Hardin & Hilbe 2007).
Lambert (1992) first described this type of mixture model in the context of process control in manufacturing. It has since been used in many applications and is now discussed in nearly every book or article dealing with count response models.
For the zero-inflated model, the probability of observing a zero outcome equals the probability that an individual is in the always-zero group plus the probability that the individual is not in that group times the probability that the counting process produces a zero. If $B(0)$ is the probability that the binary process results in a zero outcome and $\Pr(0)$ is the probability that the counting process produces a zero outcome, the probability of a zero outcome for the system is given by (Hardin & Hilbe 2007)
$$\Pr(y = 0) = B(0) + (1 - B(0)) \Pr(0).$$
The probability of a nonzero count is
$$\Pr(y = k \mid k > 0) = [1 - B(0)] \Pr(k).$$
This model produces two groups of parameters. One is the zero-inflation parameters, which show whether the covariates contribute significantly to producing a zero outcome. The other is the negative binomial parameters, which model the response with the covariates.
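Zero-Inflated Negative Binomial
There are many kinds of zero-inflated model; each has advantages and disadvantages and is used for a different type of data. The Zero-Inflated Negative Binomial is one of them. This model is used for data that are both over-dispersed and excess-zero. As a result, the parameter estimates include a parameter $k$ which indicates that over-dispersion occurs in the data, just like the dispersion parameter in negative binomial regression.
The probability distribution of this model is as follows:
$$P(Y_i = y_i \mid x_i) = \begin{cases} \varphi(x_i) + (1 - \varphi(x_i))\, g(0 \mid x_i), & y_i = 0 \\ (1 - \varphi(x_i))\, g(y_i \mid x_i), & y_i > 0 \end{cases}$$
where $\varphi$ is a function of $z_i$, $x_i$ is a vector of zero-inflation covariates and $\gamma$ is a vector of zero-inflation coefficients to be estimated. Meanwhile, $g(y_i \mid x_i)$ is the negative binomial probability distribution, written as
$$g(y_i \mid x_i) = \frac{\Gamma(y_i + 1/k)}{\Gamma(1/k)\,\Gamma(y_i + 1)} \left(\frac{k\mu_i}{1 + k\mu_i}\right)^{y_i} \left(\frac{1}{1 + k\mu_i}\right)^{1/k}.$$
Writing $\varphi_i$ for $\varphi(x_i)$, the mean and variance of ZINB are
$$E(y_i \mid x_i) = \mu_i (1 - \varphi_i),$$
$$V(y_i \mid x_i) = \mu_i (1 - \varphi_i)(1 + \mu_i(\varphi_i + k)).$$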
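The ZINB mixture can be checked the same way as the plain negative binomial. The sketch below builds the zero-inflated probability function from a negative binomial count part and a zero-inflation probability $\varphi$, then compares the mean and variance obtained by summation with the closed forms $\mu(1 - \varphi)$ and $\mu(1 - \varphi)(1 + \mu(\varphi + k))$. It assumes a constant $\varphi$ rather than one driven by covariates, and all parameter values are arbitrary.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and dispersion k."""
    inv_k = 1.0 / k
    log_p = (math.lgamma(y + inv_k) - math.lgamma(inv_k) - math.lgamma(y + 1)
             + y * math.log(k * mu / (1.0 + k * mu))
             - inv_k * math.log(1.0 + k * mu))
    return math.exp(log_p)

def zinb_pmf(y, mu, k, phi):
    """ZINB pmf: structural zero with probability phi, NB count otherwise."""
    count_part = (1.0 - phi) * nb_pmf(y, mu, k)
    return phi + count_part if y == 0 else count_part

mu, k, phi = 4.0, 0.5, 0.3              # arbitrary illustrative values
probs = [zinb_pmf(y, mu, k, phi) for y in range(600)]
zinb_mean = sum(y * p for y, p in enumerate(probs))
zinb_var = sum((y - zinb_mean) ** 2 * p for y, p in enumerate(probs))

mean_formula = mu * (1.0 - phi)
var_formula = mu * (1.0 - phi) * (1.0 + mu * (phi + k))
```

The zero probability here is $\varphi + (1 - \varphi)/9$, so the structural zeros add to the zeros already produced by the count process.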
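Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
The jackknife is one of the general methods used in surveys because of its simple concept (Jiang, Lahiri and Wan 2002). The method was described by Tukey (1958) and has been developed into a method that can correct the bias of an estimator by deleting observation $l$, for $l = 1, \ldots, m$, and re-estimating the parameters.
Following Rao (2003), the jackknife steps to estimate MSE($\hat{\theta}_i^{EB}$) are:
1. Let $\hat{\theta}_i^{EB} = k_i(y_i, \hat{\beta}, \hat{\alpha})$ and $\hat{\theta}_{i,-l}^{EB} = k_i(y_i, \hat{\beta}_{-l}, \hat{\alpha}_{-l})$, then calculate
$$\hat{M}_{2i} = \frac{m-1}{m} \sum_{l=1}^{m} \left(\hat{\theta}_{i,-l}^{EB} - \hat{\theta}_i^{EB}\right)^2.$$
2. Calculate the delete-$l$ estimators $\hat{\beta}_{-l}$ and $\hat{\alpha}_{-l}$, then calculate
$$\hat{M}_{1i} = g_{1i}(\hat{\alpha}, \hat{\beta}, y_i) - \frac{m-1}{m} \sum_{l=1}^{m} \left[g_{1i}(\hat{\alpha}_{-l}, \hat{\beta}_{-l}, y_i) - g_{1i}(\hat{\alpha}, \hat{\beta}, y_i)\right].$$
Here $g_{1i}(\hat{\sigma}_v^2)$ is the variance estimator of the posterior distribution, which measures the variability associated with $\hat{\theta}_i$. Using $g_{1i}(\hat{\sigma}_v^2)$ alone leads to severe underestimation of $MSE_J(\hat{\theta}_i^{EB})$ because it ignores the estimation of the prior parameters. Therefore the estimator $\hat{M}_{1i}$ corrects the bias of $g_{1i}(\hat{\sigma}_v^2)$.
3. Calculate the jackknife estimator of MSE($\hat{\theta}_i^{EB}$) as
$$MSE_J(\hat{\theta}_i^{EB}) = \hat{M}_{1i} + \hat{M}_{2i}.$$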
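The three jackknife steps can be sketched in Python as below. This is a sketch under stated assumptions, not the SAS program used in the thesis: to keep the example self-contained, the prior parameters are fitted by a simple method of moments with a common mean and no covariates (the thesis uses negative binomial or ZINB regression), and the posterior variance of the Poisson mean is used as $g_{1i}$. The helper names `fit_moments`, `eb_estimate` and `g1` are hypothetical.

```python
def fit_moments(y):
    """Hypothetical stand-in for the regression fit: moment estimates (mu, alpha)."""
    m = len(y)
    mu = sum(y) / m
    s2 = sum((v - mu) ** 2 for v in y) / (m - 1)
    alpha = mu ** 2 / max(s2 - mu, 1e-8)   # guard against no over-dispersion
    return mu, alpha

def eb_estimate(y_i, params):
    mu, alpha = params
    gamma = mu / (alpha + mu)
    return gamma * y_i + (1.0 - gamma) * mu

def g1(y_i, params):
    """Posterior variance of mu * theta_i given y_i (assumed form of g_1i)."""
    mu, alpha = params
    return mu ** 2 * (y_i + alpha) / (mu + alpha) ** 2

def jackknife_mse(y, i, fit, eb, g1_fn):
    m = len(y)
    full = fit(y)
    theta_full, g_full = eb(y[i], full), g1_fn(y[i], full)
    sum_sq, sum_g = 0.0, 0.0
    for l in range(m):
        params_l = fit(y[:l] + y[l + 1:])            # delete-l estimates
        sum_sq += (eb(y[i], params_l) - theta_full) ** 2
        sum_g += g1_fn(y[i], params_l) - g_full
    m2 = (m - 1) / m * sum_sq                        # step 1
    m1 = g_full - (m - 1) / m * sum_g                # step 2
    return m1, m2, m1 + m2                           # step 3

y = [0, 3, 7, 1, 12, 4, 0, 9]                        # hypothetical area counts
m1, m2, mse_j = jackknife_mse(y, 0, fit_moments, eb_estimate, g1)
```

The term $\hat{M}_{2i}$ is a sum of squares and cannot be negative, but nothing in the bias correction forces $\hat{M}_{1i}$ to be positive, so the jackknife MSE estimate itself can turn out negative.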
METHODOLOGY
Data
This research assumed that the available auxiliary data are at area level, so the basic area level model was used. The data were simulated with 30 small areas and one covariate. Each batch was generated under a different excess-zero condition, with the probability of zero in a small area ranging from 0.1 to 0.9. The relationship between the response and the covariate was assumed to be linear.
Methods
The data were generated with SAS 9.1 in the following steps:
1. Fix the value of $X_i$ for the $i$-th area.
2. Define the expected probability of zero in each small area, $P(Y_i = 0)$, then calculate $\text{Lambda}_i = -\log(P(Y_i = 0))$.
3. Generate $\theta_i \sim \text{Gamma}(1, 1)$.
4. Calculate $\log(\text{Lambda}_i / \theta_i)$.
5. Fit a linear regression of this quantity on $X_i$ to obtain $\beta_0$ and $\beta_1$.
6. Calculate $\mu_i = \exp(X_i' \beta)$.
7. Calculate $\text{parmlambda}_i = \mu_i \theta_i$.
8. Generate $y_i \sim \text{Poisson}(\text{parmlambda}_i)$.
The data were then analyzed in the following steps:
1. Fit the negative binomial regression with the GENMOD procedure in SAS 9.1 and the Zero-Inflated Negative Binomial regression with the COUNTREG procedure in SAS 9.2.
2. Estimate the prior parameters, which are $\beta$ and $\alpha$.
3. Estimate $\theta_i$ using the EB method.
4. Calculate the MSE for the indirect estimates.
5. Calculate the RRMSE (Relative Root Mean Square Error):
$$RRMSE(\hat{\theta}_i) = \frac{\sqrt{MSE(\hat{\theta}_i)}}{\hat{\theta}_i}.$$
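The eight data-generating steps listed under Methods can be sketched in Python as follows. The thesis program is written in SAS and is not reproduced here, so this is only an illustration under assumptions: step 2 is read as $-\log P(Y_i = 0)$, step 4 is read as $\log(\text{Lambda}_i / \theta_i)$, and the covariate values, the Poisson sampler and the least squares helper are hypothetical choices made for this example.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def simple_ols(x, z):
    """Least squares intercept and slope for one covariate."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    return z_bar - b1 * x_bar, b1

def generate_counts(x, p_zero, seed=0):
    rng = random.Random(seed)
    lam = [-math.log(p) for p in p_zero]                     # step 2
    theta = [rng.gammavariate(1.0, 1.0) for _ in x]          # step 3
    target = [math.log(l / t) for l, t in zip(lam, theta)]   # step 4 (assumed form)
    b0, b1 = simple_ols(x, target)                           # step 5
    mu = [math.exp(b0 + b1 * xi) for xi in x]                # step 6
    parm = [m * t for m, t in zip(mu, theta)]                # step 7
    return [poisson_draw(p, rng) for p in parm]              # step 8

x = [i / 10 for i in range(1, 31)]        # step 1: 30 areas, hypothetical X values
y = generate_counts(x, [0.6] * 30, seed=1)
```

Repeating the call with different seeds and different values of the zero probability gives batches analogous to the 0.1 to 0.9 conditions described under Data.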
RESULT AND DISCUSSION
Estimation of Prior Parameter Based on EB Method with Negative Binomial Regression
In the case of non-excess-zero data, the estimator produced small and consistent MSE values. However, when the proportion of zeros is approximately 30% or more, with an expected probability of zero of 0.6, the estimates tend to be unreliable, and the EB estimation produced negative values.
The RRMSE of the estimator increases along with the number of zeros in the data. Furthermore, if the data contain at least 30% zeros, the estimator is unreliable.
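Table 1 MSE and RRMSE of EB Estimator with NBR
| Probability of zero | Mean of MSE | Median of MSE | Mean of RRMSE | Median of RRMSE |
|---|---|---|---|---|
| 0.1 | 0.33 | 0.16 | 0.18 | 0.13 |
| 0.2 | 0.35 | 0.20 | 0.26 | 0.20 |
| 0.3 | 0.40 | 0.23 | 0.36 | 0.30 |
| 0.4 | 0.42 | 0.27 | 0.50 | 0.42 |
| 0.5 | 0.45 | 0.31 | 0.72 | 0.59 |
| 0.6 | -12875 | 0.33 | -0.38 | 0.81 |
| 0.7 | 253671 | 0.40 | -1216 | 135 |
| 0.8 | -584495 | 0.30 | 30946 | 211 |
| 0.9 | 39135606 | 0.16 | 1.16E+10 | 664 |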
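Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
| Probability of zero | Mean of MSE | Median of MSE | Mean of RRMSE | Median of RRMSE |
|---|---|---|---|---|
| 0.1 | 0.33 | 0.16 | 0.18 | 0.13 |
| 0.2 | 0.35 | 0.20 | 0.26 | 0.20 |
| 0.3 | 0.40 | 0.23 | 0.36 | 0.30 |
| 0.4 | 0.42 | 0.27 | 0.50 | 0.42 |
| 0.5 | 0.46 | 0.31 | 0.71 | 0.58 |
| 0.6 | 26197 | 0.33 | -0.35 | 0.75 |
| 0.7 | 950007 | 0.40 | -1002 | 0.99 |
| 0.8 | 1444250 | 0.30 | 22054 | 110 |
| 0.9 | 41595285 | 0.16 | 6.77E+09 | 0.56 |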
Table 1 shows that the iterative process produced unexpected negative values of MSE. The simplest way to solve this problem is to change the negative values to zero. MSE (II) and RRMSE (II) in Table 2 are the MSE and RRMSE after the negative values of MSE have been changed to zero.
When the data have an expected probability of zero of 0.6 to 0.9, the mean of MSE (II) increases drastically. Similarly, the mean of RRMSE (II) increases sharply when the data have an expected probability of zero of 0.8 to 0.9. However, when the data have an expected probability of zero of 0.6 to 0.7, the mean of RRMSE (II) is negative because of the negative values of the EB estimates.
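The effect of replacing negative MSE values by zero, and the way a negative EB estimate produces a negative RRMSE, can be seen in a small Python sketch. The numbers below are hypothetical and chosen only for illustration; they are not taken from the simulation results.

```python
import math
from statistics import mean

def truncate_mse(mse_values):
    """MSE (II): replace negative jackknife MSE estimates by zero."""
    return [max(v, 0.0) for v in mse_values]

def rrmse(mse, estimate):
    """Square root of the MSE relative to the EB estimate."""
    return math.sqrt(mse) / estimate

mse = [0.25, -1.00, 0.36, 4.00]     # hypothetical MSE estimates, one negative
eb = [2.0, 1.5, -0.5, 1.0]          # hypothetical EB estimates, one negative

mse_ii = truncate_mse(mse)
rrmse_ii = [rrmse(v, e) for v, e in zip(mse_ii, eb)]

mean_before = mean(mse)             # the negative value pulls the mean down
mean_after = mean(mse_ii)           # larger, since the negative value is now zero
```

Truncation can only raise the mean of the MSE, and the third area keeps a negative RRMSE because its EB estimate, not its MSE, is negative.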
Estimation of Prior Parameter Based on EB Method with Zero-Inflated Negative Binomial Regression
The EB estimates are similar to the estimates produced by the NBR method, although ZINB is slightly outperformed by the NBR method when the data contain only a small number of zeros. In particular, as shown in Table 3, if the data have an expected probability of zero of 0.1 to 0.5, ZINB produces a larger MSE for the EB estimator than NBR does.
If the data have an expected probability of zero of 0.6 to 0.7, on the other hand, ZINB gives better estimates. The estimates were also unbiased, as they cover the parameter values adequately. However, ZINB begins to produce inconsistent estimates if the data have an expected probability of zero of 0.8 or more, because of the enormous MSE.
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
The mean of MSE (II) with ZINB is larger than the mean of MSE with ZINB. This is because, once the negative values of MSE are changed to zero, they no longer reduce the mean.
Comparison of EB Estimator with Negative Binomial Regression and EB Estimator with ZINB
EB estimates given by the NBR and ZINB methods are similar for data with small numbers of zeros. However, the ZINB method produces a larger MSE than NBR does as long as the expected probability of zero in the data does not exceed the 0.6 threshold.
The ZINB method performs better if the data have an expected probability of zero of 0.6 to 0.7. In this case the EB estimates given by the NBR method are unstable and inconsistent because of the negative estimates and a huge MSE that can be thousands of times larger than the acceptable value. On the other hand, the EB estimator with ZINB works well: it gives unbiased estimates and its MSE values are more stable than those of the EB estimator with NBR.
Both methods perform poorly if the data have an expected probability of zero of 0.8 or more. The EB estimators from both methods were inconsistent as a result of the very large MSE values they produced.
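Table 3 MSE and RRMSE of EB Estimator with ZINB
| Probability of zero | Mean of MSE | Median of MSE | Mean of RRMSE | Median of RRMSE |
|---|---|---|---|---|
| 0.1 | 0.45 | 0.17 | 0.24 | 0.14 |
| 0.2 | 0.43 | 0.20 | 0.33 | 0.21 |
| 0.3 | 0.71 | 0.28 | 0.52 | 0.32 |
| 0.4 | 0.54 | 0.28 | 0.632 | 0.42 |
| 0.5 | 0.86 | 0.33 | 7322807 | 0.66 |
| 0.6 | 0.61 | 0.38 | 29817 | 103 |
| 0.7 | 0.58 | 0.25 | 218119 | 194 |
| 0.8 | -128 | -1.4E-07 | 162697 | 375 |
| 0.9 | 2954790 | -1E-06 | 3.5E+278 | 609508 |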
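Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB
| Probability of zero | Mean of MSE | Median of MSE | Mean of RRMSE | Median of RRMSE |
|---|---|---|---|---|
| 0.1 | 0.45 | 0.17 | 0.24 | 0.14 |
| 0.2 | 0.436 | 0.20 | 0.324 | 0.21 |
| 0.3 | 0.72 | 0.28 | 0.51 | 0.311 |
| 0.4 | 0.55 | 0.28 | 0.61 | 0.41 |
| 0.5 | 0.95 | 0.33 | 6561235 | 0.58 |
| 0.6 | 0.75 | 0.38 | 23406 | 0.70 |
| 0.7 | 150 | 0.25 | 134655 | 0.69 |
| 0.8 | 175 | 0 | 733506 | 0 |
| 0.9 | 2954908 | 0 | 1.2E+278 | 0 |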
CONCLUSION
Excess zeros in the data strongly influenced the results of EB estimation. A conventional method such as negative binomial regression in prior estimation produced biased and unreliable EB estimators for data with an expected probability of zero of 0.6 or more. This is shown by the large MSE values and the negative values of the estimator.
Meanwhile, EB estimation by the ZINB method produced more reliable estimators even when the data have an expected probability of zero of 0.6 to 0.7.
The ZINB method also provided a reliable estimator for data with less than 53.33% zeros. This means that the performance of ZINB declines when the data have an expected probability of zero of 0.8 or more, as shown by the large MSE and the inconsistent estimator.
RECOMMENDATION
This research is based on many assumptions and has several limitations. If the assumptions and limitations can be relaxed, better results can be expected. Some recommendations for further research are:
1. The generating process in this research does not reflect the real sampling process. A generating process similar to the real sampling process might give better results because it would be closer to the real application.
2. It would be interesting to run experiments with a larger number of areas, since the number of areas influences the data modeling.
3. Restricted Maximum Likelihood may be applied when estimating the prior parameters with ZINB and NBR in order to solve the problem of negative MSE values.
4. Theoretical research on ZINB and the Empirical Bayes estimator is important for understanding the behavior of ZINB parameter estimates in the Empirical Bayes setting.
REFERENCES
Erdman D, L Jackson, A Sinko. 2008. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. SAS Global Forum 2008, Paper 322-2008. http://www2.sas.com/proceedings/forum2008/322-2008.pdf [25 August 2008]
Famoye F, KP Singh. 2006. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Journal of Data Science 4: 117-130.
Hardin JW, JM Hilbe. 2007. Generalized Linear Models and Extensions. Texas: A Stata Press Publication.
Kurnia A, KA Notodiputro. 2006. Penerapan Metode Jackknife dalam Pendugaan Area Kecil. Forum Statistika dan Komputasi, April 2006, p. 12-15.
Kismiantini. 2007. Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Thesis]. Bogor: Institut Pertanian Bogor, Fakultas Matematika dan Pengetahuan Alam.
McCullagh P, JA Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.
Ramsini B, et al. 2001. Uninsured Estimates by County: A Review of Options and Issues. http://www.odh.ohio.gov/Data/OFHSurv/ofhsrfq7.pdf [24 April 2008]
Rao JNK. 2003. Small Area Estimation. New York: John Wiley & Sons.
Wakefield J. 2006. Disease mapping and spatial regression with count data. http://www.bepress.com/uwbiostat/paper286.pdf [24 April 2008]
Appendix 1 Result of EB estimation with NBR
P(Y=0) Actual r min Q1 median mean Q3 max
10 10-3333 100 EB estimator 0426011 1525665 3188832 4252666 5752756 205939
bias 0000446 05164 0878579 1315093 1721091 8704671MSE 0040547 0109118 0159448 0333613 0335256 4167064RRMSE 0041258 0100045 01356 018188 0220426 0576793
20 1333-3667 100 EB estimator 0342831 1013993 2218265 2984668 3953417 1815693bias 0000587 0413611 079407 1100373 1454889 7906915MSE 0055631 0131969 0196963 0353033 0386291 3778251RRMSE 0070449 015421 0205182 0262006 0352726 0788718
30 20-5333 100 EB estimator 0323311 0836545 1562163 2263684 2918741 1214482bias 0000151 0372382 067041 0916482 122012 5950225MSE 0074364 0163462 0231014 0400207 0432371 5250254RRMSE 0102324 0214697 0299247 0361013 0474077 1192032
40 2333-5667 100 EB estimator 024882 064963 1219656 17107 2248716 930007bias 0000564 0293602 0549809 0757937 1007851 486688MSE -100569 0194196 0271669 041875 045917 3239598RRMSE 0123605 0300339 0422426 0503566 0642418 2202294
50 2333-6333 100 EB estimator 0122548 0570083 1028619 1291758 1728067 6750472bias 000029 0250747 0453265 0622838 0803185 4009352MSE -237643 0235733 0306641 0452955 05091 3652167RRMSE 0038956 0412708 0588924 0717336 0844735 3240156
60 30-70 100 EB estimator -077338 044443 0699758 0944038 1131071 6323352bias 0000452 020433 0398131 0534095 0679938 3848209MSE -749011 0254097 0330078 -12875 0539873 2354887RRMSE -663045 051763 0813734 -038057 1287528 1767434
70 4333-7333 100 EB estimator -33274 0249515 0442513 0659375 0922519 9258959bias 0000375 0155154 0316124 0476883 0588926 8475103MSE -7513075 0235378 0402092 2536714 0876569 6051162RRMSE -10741 0704796 1355566 -121606 3040291 3332419
80 5333-90 100 EB estimator -232889 017621 0305365 0569959 0576346 6303601
bias 0000395 0116669 0254473 1091172 0497898 6297454MSE -6E+09 -016583 0301527 -584495 5718409 185E+09RRMSE -212936 0927338 2115163 3094627 1359703 4151289
90 70-100 100 EB estimator -108767 0111208 0230315 0212247 0353129 3625557bias 000016 0086 0177169 0425532 0314714 1092655MSE -38E+09 -130817 0159682 39135606 3074073 12E+11
RRMSE -909131 1647188 6639631 116E+10 1585472 706E+11
Appendix 2 Result of EB estimation with ZINB
P(Y=0) Actual r min Q1 median mean Q3 max
10 10-3333 100 EB estimator 0267752 1500256 3195861 4280907 5833922 2220705
bias 0000603 0485515 0882468 1315721 1750173 8704672MSE -053954 010933 0168797 0449506 0369775 360843RRMSE 0022947 0096443 0136424 0238099 0241955 5518468
20 1333-3667 100 EB estimator 105E-08 0914898 221594 3017228 401361 1815694bias 0000368 0383426 0780984 1105029 1496623 7906918MSE -07309 0126202 0201463 0425844 0414597 1734815RRMSE 0021807 0144983 0210692 0326097 0401786 3177943
30 20-5333 100 EB estimator 0132041 0719086 1523909 2308745 3012309 1228058bias 0000508 0332427 0680187 0928947 1254604 6314973MSE -229891 0156942 0277017 0707983 0590466 7469014RRMSE 0023998 0210095 0317195 0519524 0618802 3500387
40 2333-5667 100 EB estimator 105E-08 0574265 1209034 1742928 2368713 104953bias 0000564 0268248 0544049 0771741 1067061 4889872MSE -125713 0181557 0284338 0540615 0498521 423089RRMSE 0054916 028362 0420396 0630776 0778033 5394515
50 2333-6333 100 EB estimator 105E-08 0426701 1033816 133848 1906961 8018962bias 0000453 0224726 0454522 0661709 0900005 4414442
MSE -181856 0194818 0334706 0859252 0711939 7997074RRMSE 0026206 0387294 0662251 7322807 1312302 13388294
60 30-70 100 EB estimator 105E-08 030085 0645848 0985327 1154975 728326bias 62E-05 0190886 0406245 0567657 074167 3923952MSE -34589 0078006 0376514 060793 0804116 3426488RRMSE 000461 0502807 1033578 2981671 2012552 3308816
70 4333-7333 100 EB estimator 105E-08 105E-08 0341315 0677841 1 5005491bias 979E-05 0128017 0358257 0487174 0654423 3733981MSE -142213 -001433 0255331 0584152 1132152 264456RRMSE 0064209 0847956 1942286 2181192 4589042 7899681
80 5333-90 100 EB estimator 105E-08 105E-08 0142906 0445315 0859305 5bias 0000161 0083397 0272773 0392826 0557213 3532556MSE -10651 -56E-05 -14E-07 -127819 1452962 1132741RRMSE 0063244 1475413 3754705 162697 9221163 3786684
90 70-100 100 EB estimator 1E-277 105E-08 105E-08 0225165 0135512 3bias 0000495 0054221 0153374 027819 0350213 2736904MSE -175652 -33E-05 -1E-06 2954790 152E-06 613E+08
RRMSE 0040681 4059441 6095076 35E+278 5569021 16E+281
Appendix 3 Result of EB estimation (II) with NBR
P(Y=0) Actual r min Q1 median mean Q3 max
10 10-3333 100 EB estimator 0426011 1525665 3188832 4252666 5752756 205939
bias 0000446 05164 0878579 1315093 1721091 8704671MSE 0040547 0109118 0159448 0333613 0335256 4167064RRMSE 0041258 0100045 01356 018188 0220426 0576793
20 1333-3667 100 EB estimator 0342831 1013993 2218265 2984668 3953417 1815693bias 0000587 0413611 079407 1100373 1454889 7906915MSE 0055631 0131969 0196963 0353033 0386291 3778251RRMSE 0070449 015421 0205182 0262006 0352726 0788718
30 20-5333 100 EB estimator 0323311 0836545 1562163 2263684 2918741 1214482bias 0000151 0372382 067041 0916482 122012 5950225MSE 0074364 0163462 0231014 0400207 0432371 5250254RRMSE 0102324 0214697 0299247 0361013 0474077 1192032
40 2333-5667 100 EB estimator 024882 064963 1219656 17107 2248716 930007bias 0000564 0293602 0549809 0757937 1007851 486688MSE 0 0194196 0271669 0419181 045917 3239598RRMSE 0 0300116 0422209 0502895 0641904 2202294
50 2333-6333 100 EB estimator 0122548 0570083 1028619 1291758 1728067 6750472bias 000029 0250747 0453265 0622838 0803185 4009352MSE 0 0235733 0306641 0456258 05091 3652167RRMSE 0 0410357 0585765 0712314 0841838 3240156
60 30-70 100 EB estimator -077338 044443 0699758 0944038 1131071 6323352bias 0000452 020433 0398131 0534095 0679938 3848209MSE 0 0254097 0330078 2619677 0539873 2354887RRMSE -663045 0448118 0750369 -034911 1209918 1767434
70 4333-7333 100 EB estimator -33274 0249515 0442513 0659375 0922519 9258959bias 0000375 0155154 0316124 0476883 0588926 8475103MSE 0 0235378 0402092 9500073 0876569 6051162RRMSE -10741 0288999 0995659 -100163 2527784 3332419
80 5333-90 100 EB estimator -232889 017621 0305365 0569959 0576346 6303601bias 0000395 0116669 0254473 1091172 0497898 6297454MSE 0 0 0301527 1444250 5718409 185E+09RRMSE -212936 0 1104113 2205437 5656681 4151289
90 70-100 100 EB estimator -108767 0111208 0230315 0212247 0353129 3625557bias 000016 0086 0177169 0425532 0314714 1092655
MSE 0 0 0159682 41595285 3074073 12E+11
RRMSE -909131 0 0557622 677E+09 9311925 706E+11
10
Appendix 4 Result of EB estimation (II) with ZINB
P(Y=0)  Actual  r  Statistic  min  Q1  median  mean  Q3  max
10  10-3333  100    EB estimator  0267752  1500256  3195861  4280907  5833922  2220705
                    bias          0000603  0485515  0882468  1315721  1750173  8704672
                    MSE           0  010933  0168797  0450626  0369775  360843
                    RRMSE         0  0095932  0135647  023675  0239669  5518468
20  1333-3667  100  EB estimator  105E-08  0914898  221594  3017228  401361  1815694
                    bias          0000368  0383426  0780984  1105029  1496623  7906918
                    MSE           0  0126202  0201463  0428006  0414597  1734815
                    RRMSE         0  0142648  020709  0320663  0395479  3177943
30  20-5333  100    EB estimator  0132041  0719086  1523909  2308745  3012309  1228058
                    bias          0000508  0332427  0680187  0928947  1254604  6314973
                    MSE           0  0156942  0277017  0716543  0590466  7469014
                    RRMSE         0  0203913  0311937  0506882  0615401  3500387
40  2333-5667  100  EB estimator  105E-08  0574265  1209034  1742928  2368713  104953
                    bias          0000564  0268248  0544049  0771741  1067061  4889872
                    MSE           0  0181557  0284338  0549835  0498521  423089
                    RRMSE         0  0270309  0405926  0606317  0766631  5394515
50  2333-6333  100  EB estimator  105E-08  0426701  1033816  133848  1906961  8018962
                    bias          0000453  0224726  0454522  0661709  0900005  4414442
                    MSE           0  0194818  0334706  094973  0711939  7997074
                    RRMSE         0  0316402  0576343  6561235  1240175  13388294
60  30-70  100      EB estimator  105E-08  030085  0645848  0985327  1154975  728326
                    bias          62E-05  0190886  0406245  0567657  074167  3923952
                    MSE           0  0078006  0376514  0749436  0804116  3426488
                    RRMSE         0  0258286  0698814  2340612  1714808  3308816
70  4333-7333  100  EB estimator  105E-08  105E-08  0341315  0677841  1  5005491
                    bias          979E-05  0128017  0358257  0487174  0654423  3733981
                    MSE           0  0  0255331  1501268  1132152  264456
                    RRMSE         0  0  0688797  1346552  2500825  7899681
80  5333-90  100    EB estimator  105E-08  105E-08  0142906  0445315  0859305  5
                    bias          0000161  0083397  0272773  0392826  0557213  3532556
                    MSE           0  0  0  1755486  1452962  1132741
                    RRMSE         0  0  0  7335062  3311711  3786684
90  70-100  100     EB estimator  1E-277  105E-08  105E-08  0225165  0135512  3
                    bias          0000495  0054221  0153374  027819  0350213  2736904
                    MSE           0  0  0  2954908  152E-06  613E+08
                    RRMSE         0  0  0  12E+278  416189  16E+281
Appendix 5 Syntax program for generating data
data b; /* generate x1 (covariate) and ei */
input x1;
cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate poisson-gamma with excess zero */
do kk=1 to 30;
set b;
tetha = rangam(1,1);
lambda = -log(0.1); /* desired probability of a zero value (0.1-0.9) */
starlambda = log(lambda/tetha);
output; end;
run;

proc reg;
model starlambda = x1;
ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;
data _null_;
set work.betha_lr_t;
call symput('Intercept', col1);
call symput('x1', col2);
run;

data d;
do kk=1 to 30;
set e;
mu = exp(&Intercept + &x1*x1);
parmlambda = mu*tetha;
ypoi = rand('poisson', parmlambda);
output; end;
run;

ods trace on; /* to take percent zero on data */
proc freq data=d;
tables ypoi;
ods output OneWayFreqs=work.zero;
run;
data zero;
set zero;
keep percent;
run;
proc transpose data=zero out=zero1; run;
data _null_;
set work.zero1;
call symput('pctz', col1);
run;
data d;
set d;
pzero=&pctz;
r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
Appendix 6 Syntax program EB with NBR
%macro sae_nb;
%do x=1 %to 900;

data work.a;
set work.e;
if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
model ypoi = x1 / dist=nb link=log;
ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
set work.betha_nb_t;
call symput('Intercept', col1);
call symput('x1', col2);
call symput('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
set a;
mu_hat_b=exp(&Intercept + &x1*x1);
w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
set work.duga_nb1;
if ^(u=&x) then delete;
run;
data work.jacknb&h;
set work.d;
if u=&x;
if kk=&h then delete;
run;

proc genmod data=work.jacknb&h;
/* output p out=sas.yi_est; */
model ypoi = x1 / dist = nb link=log;
ods output parameterestimates=work.betha_est_nb&h (keep=parameter Estimate);
run;
proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;
data _null_;
set work.betha_est_nbt&h;
call symput('Intercept_', col1);
call symput('x1_', col2);
call symput('Dispersion_', col3);
run;

data work.duganb&h;
set work.d;
mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + &Dispersion_);
teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5 work.duganb6
work.duganb7 work.duganb8 work.duganb9 work.duganb10 work.duganb11 work.duganb12
work.duganb13 work.duganb14 work.duganb15 work.duganb16 work.duganb17
work.duganb18 work.duganb19 work.duganb20 work.duganb21 work.duganb22
work.duganb23 work.duganb24 work.duganb25 work.duganb26 work.duganb27
work.duganb28 work.duganb29 work.duganb30;
by kk;
run;

data work.mse_nb_j;
set work.mse_nb_j;
t_sum=0;
g_sum=0;
%do j=1 %to 30;
g_sum=g_sum+beda_g_&j;
t_sum=t_sum+delta_&j;
%end;
m2i=((30-1)/30)*t_sum;
m1i=g1-((30-1)/30)*g_sum;
mse_j_b=m1i+m2i;
rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
merge work.d work.mse_nb_j;
keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb BASE=work.hasilnb1 appendver=v6;
run;

%END;
%mend;

%sae_nb;
Appendix 7 Syntax program EB with ZINB
%macro sae_zinb;

%do x=1 %to 900;

data work.a;
set work.e;
if ^(u=&x) then delete;
run;

proc countreg data=a;
model ypoi=x1 / dist=zinb method=qn;
zeromodel ypoi ~ x1;
ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
set work.pe_t;
call symput('Intercept', col1);
call symput('x1', col2);
call symput('Inf_Intercept', col3);
call symput('Inf_x1', col4);
call symput('_Alpha', col5);
run;

data work.duga;
set a;
mu_hat_b=exp(&Intercept + &x1*x1);
w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
bias_b=abs(teta_hat_bayes-parmlamdha);
run;

proc append data=work.duga base=work.duga1;
run;

%do h=1 %to 30;

data work.d;
set work.duga1;
if ^(u=&x) then delete;
run;
data work.jackzinb&h;
set work.d;
if u=&x;
if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
model ypoi=x1 / dist=zinb method=qn;
zeromodel ypoi ~ x1;
ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
set work.betha_est_ZINBt&h;
call symput('Intercept', col1);
call symput('x1', col2);
call symput('Inf_Intercept', col3);
call symput('Inf_x1', col4);
call symput('_Alpha', col5);
run;

data work.dugaZINB&h;
set work.d;
mu_hat_b_&h=exp(&Intercept + &x1*x1);
/* disabled alternative: mu_hat_b_&h= &b_o- + &b_1- x1 */
w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + (&_Alpha));
teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
/* disabled alternative: g1_&h =((mu_hat_b_&h2&alpha_)2)(&alpha_+y_i)((mu_hat_b_&h2&alpha_)+mu_hat_b_&h)2 */
g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
/* disabled alternative: g1_&h =(A2)(&k- + y_i)( a +mu_hat_b)2 */
beda_g_&h=g1_&h-g1;
run;

data work.mse_ZINB_j;
merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5 work.dugaZINB6
work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10 work.dugaZINB11 work.dugaZINB12
work.dugaZINB13 work.dugaZINB14 work.dugaZINB15 work.dugaZINB16 work.dugaZINB17
work.dugaZINB18 work.dugaZINB19 work.dugaZINB20 work.dugaZINB21 work.dugaZINB22
work.dugaZINB23 work.dugaZINB24 work.dugaZINB25 work.dugaZINB26 work.dugaZINB27
work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
by kk;
run;

data work.mse_ZINB_j;
set work.mse_ZINB_j;
t_sum=0;
g_sum=0;
%do j=1 %to 30;
g_sum=g_sum+beda_g_&j;
t_sum=t_sum+delta_&j;
%end;
m2i=((30-1)/30)*t_sum;
m1i=g1-((30-1)/30)*g_sum;
mse_j_b=m1i+m2i;
rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
merge work.d work.mse_ZINB_j;
keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;
ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB BASE=work.hasilZINB1;
run;

%END;
%mend;

%sae_zinb;
BIOGRAPHY
Irene Muflikh Nadhiroh was born in Padang on October 3rd, 1986, the first daughter of Ir. Irianto Oetomo and Fine Analisa Maharani. She has two siblings.
In 1998 she graduated from SD Dukuh 09, East Jakarta, and continued her studies at SLTP Negeri 1 Bogor, graduating in 2001. She finished her studies at SMU Labschool Rawamangun, Jakarta, in 2004 and then enrolled in Bogor Agricultural University through USMI. In 2004 she joined the Department of Statistics, Faculty of Mathematics and Natural Sciences.
During her studies she served as a lecturer's assistant for the Basic Statistics class in 2006 and the Experimental Design class in 2007. She was also a member of Gamma Sigma Beta (GSB, the Statistics Students Association) and headed the science division of GSB in 2006-2007. From February to March 2008 she completed her field practice at PT Field Dimension Indonesia.
ACKNOWLEDGEMENTS
First of all, the author modestly admits that the completion of this paper would not have been possible without invaluable help from many generous and extraordinary people. The author is deeply indebted to them for their help, ideas, criticism and advice during the writing process. However, they should not be held responsible for any mistakes and deficiencies in this paper, which are purely the author's. I would therefore like to express my gratitude to:
1. Allah SWT, to whom all praise and gratitude belong. Alhamdulillah hirabbil alamin. With His blessing I was able to finish this paper. Thank you, Allah, for giving me a wonderful life with extraordinary people around me.
2. Prof. Dr. Khairil Anwar Notodiputro and Ir. Indahwati, M.Si., for the early motivation, discussions, advice, support and their great enthusiasm.
3. Mr. Bambang Sumantri, M.Si., as examiner, for the encouragement, advice and criticism.
4. My beloved family, for their unlimited love.
5. Mr. Alfian Futuhul Hadi, M.Si., for enlightening discussions when I was in trouble.
6. Mr. Bagus Sartono, M.Si., thank you very much for letting me run my data at your lab on your wonderful computer. Sorry if it disturbed you.
7. Mr. Anang, M.Si., and Mr. Rahman, M.Si., for sharing their knowledge and technical support.
8. Dr. Ir. Hari Wijayanto, M.S., and all lecturers and staff of the Statistics Department. Thank you for the knowledge of statistics and of life that you shared. It means a lot to me.
9. Rahmatullah Sigit Dodiet Sasongko, S.Si., for the spirit, love, care, time and patience. Keep it real. Still love me forever and ever.
10. Mr. Dionisius Laksmana Bisara Putra, S.Si., for editing my paper, for his criticism and for useful discussions.
11. Maulana Chistanto, S.Si., and Yhanuar Ismail, S.Si., thank you for being my best brothers.
12. Nikhen, Sevrien and the late Dini, thank you for lighting up my days.
13. Rere, Yusri, Agus, Ika, Cinong, Toki, Cheri, Fisca, Wiwik, Neng, Mala, Lilis, Dika, Rangga, Lele, Dodi, Kus, Inal, Bebek, Koler and all of Statistics '41.
14. Everyone who helped me in this study and who cannot be named personally.
This thesis is not perfect, so I welcome criticism, advice and recommendations from those who read it. Thank you. God bless you all.
Bogor, January 2009
Irene Muflikh Nadhiroh
TABLE OF CONTENTS
INTRODUCTION
  Background
  Objectives
LITERATURE REVIEW
  Direct Estimation
  Small Area Estimation
  Small Area Models
  Empirical Bayes Methods
  Poisson-Gamma Models
  Negative Binomial Regression
  Over-disperse at Count Data
  Zero-Inflated Models
  Zero-Inflated Negative Binomial
  Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
METHODOLOGY
  Data
  Methods
RESULT AND DISCUSSION
  Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
  Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
  Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
CONCLUSION
RECOMMENDATION
REFERENCES

LIST OF TABLES
Table 1 MSE and RRMSE of EB Estimator with NBR
Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
Table 3 MSE and RRMSE of EB Estimator with ZINB
Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB

LIST OF APPENDICES
Appendix 1 Result of EB estimation with NBR
Appendix 2 Result of EB estimation with ZINB
Appendix 3 Result of EB estimation (II) with NBR
Appendix 4 Result of EB estimation (II) with ZINB
Appendix 5 Syntax program for generating data
Appendix 6 Syntax program EB with NBR
Appendix 7 Syntax program EB with ZINB
INTRODUCTION
Background
Direct estimation is usually applied in large-scale surveys, but such estimators are difficult to use in a smaller region, especially when the sample size is too small. In this case indirect estimation, which uses covariates to estimate the parameter, is usually applied. This type of estimation is broadly known as Small Area Estimation.
Kismiantini (2007) studied Small Area Estimation based on Poisson-Gamma models. Maximum likelihood estimation with negative binomial regression was used to estimate the prior parameters. Negative binomial regression was also used to resolve the over-dispersion problem in the data.
In reality, count data are characterized not only by over-dispersion but sometimes also by excess zeros. Excess zeros occur when the data contain more zeros than the assumed distribution predicts. For example, among 100 observations from a Poisson model with a mean of 4 we would expect about 2 zeros. If the data have 30 zeros, the distributional assumptions have clearly been violated, and the estimated parameters and standard errors will be biased (Hardin & Hilbe 2007). In this paper zero-inflated models are adapted to solve this type of problem.
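The figure of about 2 expected zeros follows from the Poisson probability of a zero count. A short worked check, writing $\mu$ for the mean and $n$ for the number of observations (symbols chosen here for illustration):

```latex
P(Y = 0) = e^{-\mu} = e^{-4} \approx 0.0183,
\qquad
n \cdot P(Y = 0) = 100 \times 0.0183 \approx 1.8 \approx 2 .
```

Observing 30 zeros would therefore be roughly sixteen times the count expected under the Poisson assumption.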
Objectives
The research objectives are:
1. To investigate the performance of negative binomial regression in Small Area Estimation in the presence of excess zeros.
2. To apply zero-inflated count models in Small Area Estimation in the presence of excess zeros.
3. To evaluate the performance of zero-inflated count models in estimating the prior parameters for Small Area Estimation.
LITERATURE REVIEW
Direct Estimation
Direct estimates are generally "design based" in the sense that they make use of "survey weights", and the associated inferences are based on the probability distribution induced by the sample design with the population values held fixed (Rao 2003). In particular, direct estimates of a domain parameter are based only on the domain-specific sample data.
Data from sample surveys have long been used to obtain reliable estimates of parameters. Ramsini et al (2001) mentioned that direct estimates for small areas are unbiased, although they have large variance because of the small sample size.

Small Area Estimation
The term small area can refer to anything, depending on the object of interest: a city, an age group, a sex group, a region or a rural district. In general, small area denotes any domain for which direct estimates of adequate precision cannot be produced (Rao 2003), because the sample size in the area is too small. Small area estimation was therefore developed as a statistical technique for estimating small area parameters with an adequate level of precision. It works as indirect estimation that borrows strength from the values of the variable of interest in related areas, through supplementary information related to that variable such as recent census counts and current administrative records (Rao 2003). Indirect estimation is the process of estimating a domain's parameter by connecting the information in that domain with other domains using an appropriate model, so the estimator includes data from other domains (Kurnia & Notodiputro 2006).
Small Area Models
There are two kinds of link models in indirect estimation. The first is the traditional method based on implicit models that link related small areas through supplementary data. The second is explicit small area models that make specific allowance for between-area variation (Rao 2003). This research used the second kind, which can be classified into two broad types of basic model.
1. Basic area level (type A) model
Basic area level or aggregate models include all models that relate small area parameters to area-specific auxiliary variables. These models are essential if unit (element) level data are not available. The parameter $\theta_i$ is assumed to be related to the area-specific auxiliary data or covariates $x_i = (x_{1i}, \ldots, x_{pi})^T$ by the linear model

$\theta_i = x_i^T \beta + b_i v_i$, with $i = 1, \ldots, m$,

where $v_i \sim N(0, \sigma_v^2)$ are area-specific random effects, $\beta = (\beta_1, \ldots, \beta_p)^T$ is the $p \times 1$ vector of regression coefficients and $b_i$ are known positive constants. For making inferences about $\theta_i$, direct estimators $y_i$ are assumed to be available, with

$y_i = \theta_i + e_i$, where $i = 1, \ldots, m$,

sampling errors $e_i \sim N(0, \sigma_{ei}^2)$ and known $\sigma_{ei}^2$. Combining both models gives the new model

$y_i = x_i^T \beta + b_i v_i + e_i$, where $i = 1, \ldots, m$ (Rao 2003).

2. Basic unit level (type B) model
Unit level models include all models that relate the unit values of the study variable to unit-specific auxiliary variables. Given unit-specific auxiliary variables $x_{ij} = (x_{ij1}, \ldots, x_{ijp})^T$, the corresponding nested regression model is

$y_{ij} = x_{ij}^T \beta + v_i + e_{ij}$, where $i = 1, \ldots, m$ and $j = 1, \ldots, n_i$,

with $v_i \sim N(0, \sigma_v^2)$ and $e_{ij} \sim N(0, \sigma_{ei}^2)$.
Empirical Bayes Methods
The Bayesian approach is based on Bayes' law, which was discovered by Thomas Bayes. The law was presented by Richard Price in 1763, two years after Thomas Bayes passed away. In 1774 and 1781 Laplace gave the details and relevance for modern Bayesian statistics (Gill 2002 in Kismiantini 2007).
Novick in Good (1980) mentioned that the Bayes method is difficult to adopt and sometimes very sensitive, because it requires prior probability information which is usually difficult to obtain. Robbins (1955) introduced Empirical Bayes methods, which assume a particular prior distribution and estimate its parameters from the sample. Rao (2003) said that EB (Empirical Bayes) and HB (Hierarchical Bayes) methods are suitable for binary and count data in Small Area Estimation. Therefore the EB method was used in this research.
Rao (2003) summarized EB methods in Small Area Estimation as follows:
1. Obtain the posterior probability density function of the small area parameter.
2. Estimate the parameters from the marginal density function.
3. Use the estimated posterior density for inferences regarding the parameters of interest.
Poisson-Gamma Models
The Poisson model is the standard model for count data. Count data, however, often suffer from over-dispersion, so Poisson formulations have been developed to accommodate the extra variance in sample data. Two-stage models for count data have been introduced, known as mixed Poisson-Gamma models. Wakefield (2006) introduced a Poisson-Gamma model that is easy to use with the SMR (Standardized Mortality Ratio) as the direct estimator. This study used the Wakefield model with a different direct estimator.
Let $y_i$ be the number of individuals in small area $i$ that have the characteristic of interest, written as

$y_i = \sum_j y_{ij}$

where $y_{ij}$ is the $j$-th object in the $i$-th small area, $j = 1, \ldots, n$ and $i = 1, \ldots, m$.
First stage: $y_i \overset{ind}{\sim} \text{Poisson}(\mu_i \theta_i)$ is assumed, where $\mu_i = \mu(x_i, \beta)$ describes a regression model at area level, $x_i$ is a vector of covariates and $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of regression coefficients.
Second stage: the distribution $\theta_i \overset{iid}{\sim} \text{gamma}(\alpha, 1/\alpha)$ is assumed as the prior distribution, with mean 1 and variance $1/\alpha$.
The marginal distribution of $y_i \mid \beta, \alpha$ is then negative binomial.
Moreover, Wakefield (2006) used Bayes' theorem to obtain the posterior distribution

$\theta_i \mid y_i, \beta, \alpha \sim \text{gamma}(y_i + \alpha, 1/(\mu_i + \alpha))$

and the EB estimator

$\hat{\theta}_i^{EB} = \hat{\theta}_i^{B}(\hat{\beta}, \hat{\alpha}) = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i) \hat{\mu}_i$

with $\hat{\gamma}_i = \hat{\mu}_i / (\hat{\mu}_i + \hat{\alpha})$, where $\hat{\theta}_i$ is the direct estimate of $\theta_i$ and $y_i$ is the number of observations.
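The shrinkage form of the EB estimator can be sketched in a few lines. This is a minimal illustration rather than the thesis's implementation: it mirrors the quantities `w_bayes`, `teta_hat_bayes` and `g1` computed in the SAS appendices, takes the observed count as the direct estimate, and the function names `eb_estimate` and `posterior_variance` are hypothetical.

```python
def eb_estimate(y, mu_hat, alpha_hat):
    """Weighted average of the observed count and the regression mean."""
    weight = mu_hat / (mu_hat + alpha_hat)  # shrinkage weight (w_bayes in the appendix)
    return weight * y + (1.0 - weight) * mu_hat


def posterior_variance(y, mu_hat, alpha_hat):
    """Gamma posterior variance (y + alpha) / (mu + alpha)^2 (g1 in the appendix)."""
    return (y + alpha_hat) / (mu_hat + alpha_hat) ** 2


# Assumed illustrative values: count 5, fitted mean 2, prior parameter 1.
estimate = eb_estimate(5, 2.0, 1.0)
variance = posterior_variance(5, 2.0, 1.0)
```

The weighted average is algebraically the same as $\hat{\mu}_i (y_i + \hat{\alpha}) / (\hat{\mu}_i + \hat{\alpha})$, so a large $\hat{\alpha}$ (a tight prior) pulls the estimate toward the regression mean, while a small $\hat{\alpha}$ leaves it close to the observed count.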
Negative Binomial Regression
The negative binomial regression model seems to have been first discussed by Anscombe (1972). Others have pointed out its success in dealing with over-dispersed count data. Lawless (1987) elaborated the mixture model parameterization of the negative binomial, providing formulas for its log likelihood, mean, variance and moments. Later, Breslow (1990) cited Lawless' work, and from its inception to the late 1980s the negative binomial regression model has been construed as a mixture model that is useful for accommodating otherwise over-dispersed Poisson data (Hardin & Hilbe 2007). The negative binomial distribution function is written as

$g(y \mid x) = \frac{\Gamma(y + 1/k)}{\Gamma(1/k)\,\Gamma(y + 1)} \left( \frac{1}{1 + k\mu} \right)^{1/k} \left( \frac{k\mu}{1 + k\mu} \right)^{y}$

where $y = 0, 1, 2, \ldots$; $k$ and $\mu$ are the negative binomial parameters, with $E(y) = \mu$ and $\text{var}(y) = \mu + k\mu^2$. The parameter $k$ is called the dispersion parameter; it indicates that the data are over-dispersed.
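A small worked example shows how the dispersion parameter raises the share of zeros compared with a Poisson model of the same mean. The values $\mu = 4$ and $k = 0.5$ are assumed purely for illustration; the zero probability is the standard negative binomial result $(1 + k\mu)^{-1/k}$ for mean $\mu$ and variance $\mu + k\mu^2$:

```latex
P_{NB}(Y = 0) = (1 + k\mu)^{-1/k} = (1 + 0.5 \times 4)^{-2} = 3^{-2} \approx 0.111,
\qquad
P_{Poisson}(Y = 0) = e^{-4} \approx 0.018,
\qquad
\mathrm{var}_{NB}(Y) = \mu + k\mu^{2} = 4 + 0.5 \times 16 = 12 .
```

Even so, a negative binomial model with these values predicts only about 11 zeros per 100 observations, which is why a separate zero-inflation component can still be needed.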
Over-disperse at Count Data
Count data modeled by Poisson regression are over-dispersed if the variance is larger than the mean, that is, if the observed variance exceeds the variance expected under the Poisson model. This phenomenon is written as

$Var(y_i) > E(y_i)$ (McCullagh & Nelder 1989)
Zero-Inflated Models
Zero-inflated models consider two distinct sources of zero outcomes. One source is individuals who do not enter the counting process; the other is individuals who do enter the counting process but produce a zero outcome (Hardin & Hilbe 2007).
Lambert (1992) first described this type of mixture model in the context of process control in manufacturing. It has since been used in many applications and is now discussed in nearly every book or article dealing with count response models.
For the zero-inflated model, the probability of observing a zero outcome equals the probability that an individual is in the always-zero group plus the probability that the individual is not in that group times the probability that the counting process produces a zero. If $B(0)$ is the probability that the binary process results in a zero outcome and $\Pr(0)$ is the probability that the counting process results in a zero outcome, the probability of a zero outcome for the system is given by (Hardin & Hilbe 2007)

$\Pr(y = 0) = B(0) + (1 - B(0)) \Pr(0)$

The probability of a nonzero count is

$\Pr(y = k;\ k > 0) = [1 - B(0)] \Pr(k)$

This model produces two groups of parameters. The first is the zero-inflation parameters, which show whether the covariates contribute significantly to a zero outcome. The second is the negative binomial parameters, which model the response with the covariates.
Zero-Inflated Negative Binomial
There are many kinds of zero-inflated models. Each has advantages and disadvantages and suits a different type of data. The zero-inflated negative binomial is one of them; it is used for data with both over-dispersion and excess zeros. Consequently, the estimated parameters include a parameter $k$ that indicates over-dispersion in the data, just like the dispersion parameter in negative binomial regression.
The probability distribution of this model is as follows:

$P(Y_i = y_i \mid x_i) = \begin{cases} \omega(x_i) + (1 - \omega(x_i))\, g(0 \mid x_i), & y_i = 0 \\ (1 - \omega(x_i))\, g(y_i \mid x_i), & y_i > 0 \end{cases}$

where $\omega$ is a function of $z_i$, $x_i$ is the vector of zero-inflation covariates and $\gamma$ is the vector of zero-inflation coefficients to be estimated. Meanwhile $g(y_i \mid x_i)$ is the negative binomial probability distribution, written as

$g(y_i \mid x_i) = \frac{\Gamma(y_i + 1/k)}{\Gamma(1/k)\,\Gamma(y_i + 1)} \left( \frac{1}{1 + k\mu_i} \right)^{1/k} \left( \frac{k\mu_i}{1 + k\mu_i} \right)^{y_i}$

The mean and variance of the ZINB are

$E(y_i \mid x_i) = \mu_i (1 - \omega_i)$
$V(y_i \mid x_i) = \mu_i (1 - \omega_i)(1 + \mu_i(\omega_i + k))$
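The zero-inflated probabilities can be checked numerically. The sketch below is an illustration under assumed notation, not code from the thesis: `mu` is the negative binomial mean, `k` the dispersion parameter (variance `mu + k*mu**2`), `omega` the probability of an always-zero outcome, and the function names are hypothetical.

```python
import math


def nb_pmf(y, mu, k):
    """Negative binomial probability with mean mu and dispersion k."""
    r = 1.0 / k
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(1.0 / (1.0 + k * mu))
             + y * math.log(k * mu / (1.0 + k * mu)))
    return math.exp(log_p)


def zinb_pmf(y, mu, k, omega):
    """Zero-inflated NB: an extra point mass omega is added at zero."""
    base = (1.0 - omega) * nb_pmf(y, mu, k)
    return omega + base if y == 0 else base


def zinb_mean_var(mu, k, omega):
    """Closed-form mean and variance of the zero-inflated NB."""
    mean = (1.0 - omega) * mu
    var = (1.0 - omega) * mu * (1.0 + mu * (omega + k))
    return mean, var


# Assumed illustrative values: mean 4, dispersion 0.5, 30% structural zeros.
p_zero = zinb_pmf(0, 4.0, 0.5, 0.3)
```

Summing `y * zinb_pmf(y, ...)` over a long enough range of counts reproduces the closed-form mean, which is a convenient sanity check on the two formulas.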
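Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
The jackknife is one of the general methods used in surveys because of its simple concept (Jiang, Lahiri and Wan 2002). The method was introduced by Tukey (1958) and developed into a method that can correct the bias of an estimator by removing observation $l$, for $l = 1, \ldots, m$, and re-estimating the parameters.
According to Rao (2003), the jackknife steps to estimate MSE($\hat{\theta}_i^{EB}$) are:
1. Let $\hat{\theta}_i^{EB} = k_i(y_i, \hat{\beta}, \hat{\alpha})$ and $\hat{\theta}_{i,-l}^{EB} = k_i(y_i, \hat{\beta}_{-l}, \hat{\alpha}_{-l})$, then calculate

$\hat{M}_{2i} = \frac{m - 1}{m} \sum_{l=1}^{m} (\hat{\theta}_{i,-l}^{EB} - \hat{\theta}_i^{EB})^2$

2. Calculate the delete-$l$ estimators $\hat{\beta}_{-l}$ and $\hat{\alpha}_{-l}$, then calculate

$\hat{M}_{1i} = g_{1i}(\hat{\beta}, \hat{\alpha}, y_i) - \frac{m - 1}{m} \sum_{l=1}^{m} [g_{1i}(\hat{\beta}_{-l}, \hat{\alpha}_{-l}, y_i) - g_{1i}(\hat{\beta}, \hat{\alpha}, y_i)]$

Here $g_{1i}(\hat{\beta}, \hat{\alpha}, y_i)$ is the estimated variance of the posterior distribution, which measures the variability associated with $\hat{\theta}_i$. Using $g_{1i}(\hat{\beta}, \hat{\alpha}, y_i)$ alone leads to severe underestimation of $\text{mse}_J(\hat{\theta}_i^{EB})$ because it ignores the estimation of the prior parameters. The estimator $\hat{M}_{1i}$ therefore corrects the bias of $g_{1i}(\hat{\beta}, \hat{\alpha}, y_i)$.
3. Calculate the jackknife estimator of MSE($\hat{\theta}_i^{EB}$) as

$\text{mse}_J(\hat{\theta}_i^{EB}) = \hat{M}_{1i} + \hat{M}_{2i}$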
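The three jackknife steps can be sketched as a single function. This is an illustrative outline under stated assumptions, not the thesis's SAS macro: the full-sample and delete-one estimates are taken as already computed, the arithmetic follows the `m1i` and `m2i` statements in the SAS appendices, and the function and argument names are hypothetical.

```python
def jackknife_mse(theta_full, theta_deleted, g1_full, g1_deleted):
    """Jackknife MSE estimate for one small area.

    theta_full    : EB estimate using all m areas
    theta_deleted : m EB estimates, each refitted with one area deleted
    g1_full       : posterior variance using all m areas
    g1_deleted    : m posterior variances, each refitted with one area deleted
    """
    m = len(theta_deleted)
    factor = (m - 1) / m
    # Step 1: variability of the EB estimate due to parameter estimation.
    m2 = factor * sum((t - theta_full) ** 2 for t in theta_deleted)
    # Step 2: bias-corrected posterior variance.
    m1 = g1_full - factor * sum(g - g1_full for g in g1_deleted)
    # Step 3: the jackknife MSE is the sum of both parts.
    return m1 + m2


# Assumed toy inputs with m = 3 areas.
mse = jackknife_mse(2.0, [2.1, 1.9, 2.0], 0.5, [0.6, 0.4, 0.5])
```

Because the second term subtracts a bias correction, the result is not guaranteed to be positive; when the delete-one posterior variances are much larger than the full-sample one, the estimate can become negative.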
METHODOLOGY
Data
This research assumed that the auxiliary data are available at area level, so the basic area level model was used. The data were simulated with 30 small areas and one covariate. Each batch was generated under a different excess-zero condition, with the probability of zero in a small area ranging from 0.1 to 0.9. The relationship between the response and the covariate was assumed to be linear.
Methods
The data were generated in SAS 9.1 with the following steps:
1. Fix the value of $X_i$ for the $i$-th area.
2. Define the expected probability of zero in each small area, $P(Y_i = 0)$, then calculate $Lambda_i = -\log(P(Y_i = 0))$.
3. Generate $\theta_i \sim Gamma(1, 1)$.
4. Calculate $\lambda_i^{*} = \log(Lambda_i / \theta_i)$.
5. Fit a linear regression of $\lambda_i^{*}$ on $X_i$ to obtain $\hat{\beta}_0$ and $\hat{\beta}_1$.
6. Calculate $\mu_i = \exp(X_i^{\prime} \hat{\beta})$.
7. Calculate $parmlambda_i = \mu_i \theta_i$.
8. Generate $y_i \sim Poisson(parmlambda_i)$.
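The eight generation steps can be sketched outside SAS as follows. This is a rough illustration under several assumptions, not the thesis's program: the covariate values are made up, step 4 is read as the log of the ratio of the target rate to the gamma draw, the regression is ordinary least squares with one covariate, and the names `generate_batch` and `poisson_draw` are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def generate_batch(x, p_zero, rng):
    """Simulate one batch of counts for the areas with covariate values x."""
    lam = -math.log(p_zero)                              # step 2
    theta = [rng.gammavariate(1.0, 1.0) for _ in x]      # step 3
    star = [math.log(lam / t) for t in theta]            # step 4 (assumed ratio)
    # Step 5: least-squares fit of star on x.
    mx, ms = sum(x) / len(x), sum(star) / len(star)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (si - ms) for xi, si in zip(x, star)) / sxx
    b0 = ms - b1 * mx
    mu = [math.exp(b0 + b1 * xi) for xi in x]            # step 6
    parm = [m * t for m, t in zip(mu, theta)]            # step 7
    y = [poisson_draw(rng, p) for p in parm]             # step 8
    return y, parm


# Assumed inputs: 30 areas, hypothetical covariate values, target P(Y=0) = 0.6.
rng = random.Random(2009)
x = [0.1 * i for i in range(1, 31)]
y, parm = generate_batch(x, 0.6, rng)
share_of_zeros = sum(1 for v in y if v == 0) / len(y)
```

The realized share of zeros in a batch varies around the target probability, which matches the ranges of actual zero percentages reported in the appendix tables.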
The data were then analyzed with the following steps:
1. Fit the negative binomial regression with the GENMOD procedure in SAS 9.1 and the zero-inflated negative binomial regression with the COUNTREG procedure in SAS 9.2.
2. Estimate the prior parameters, which are $\beta$ and $\alpha$.
3. Estimate $\theta_i$ using the EB method.
4. Calculate the MSE for indirect estimation.
5. Calculate the RRMSE (Relative Root Mean Square Error)

$RRMSE(\hat{\theta}_i) = \frac{\sqrt{MSE(\hat{\theta}_i)}}{\hat{\theta}_i}$
RESULT AND DISCUSSION
Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
For data without excess zeros, the estimator produced small and consistent MSE values. However, when the share of zeros is approximately 30% or more, with an expected probability of zero of 0.6, the estimates tend to be unreliable, and the EB estimation produced negative values.
The RRMSE of the estimator increases as the number of zeros in the data increases. If the data contain at least 30% zeros, the estimator is unreliable.
Table 1 MSE and RRMSE of EB Estimator with NBR
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
01                    033           016             018             013
02                    035           020             026             020
03                    040           023             036             030
04                    042           027             050             042
05                    045           031             072             059
06                    -12875        033             -038            081
07                    253671        040             -1216           135
08                    -584495       030             30946           211
09                    39135606      016             116E+10         664

Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
01                    033           016             018             013
02                    035           020             026             020
03                    040           023             036             030
04                    042           027             050             042
05                    046           031             071             058
06                    26197         033             -035            075
07                    950007        040             -1002           099
08                    1444250       030             22054           110
09                    41595285      016             677E+09         056
Table 1 shows that the iterative process produced unexpected negative values of MSE. The simplest way to solve this problem is to change the negative values to zero. MSE (II) and RRMSE (II) in Table 2 are the MSE and RRMSE after the negative MSE values have been changed to zero.
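The truncation rule is simple enough to state as code. The sketch below uses made-up MSE values, not results from the thesis, and the helper names are hypothetical; it also shows why the mean of the truncated values can never be smaller than the mean of the raw values.

```python
def truncate_negative(values):
    """Replace negative MSE estimates by zero and keep the others unchanged."""
    return [max(v, 0.0) for v in values]


def mean(values):
    return sum(values) / len(values)


# Assumed toy MSE estimates for three areas, one of them negative.
raw_mse = [0.2, -1.5, 0.4]
mse_ii = truncate_negative(raw_mse)
mean_raw = mean(raw_mse)    # pulled down by the negative value
mean_ii = mean(mse_ii)      # the negative value now contributes zero
```

Medians are affected only when at least half of the values are negative, so truncation mainly changes the means.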
When the data have an expected probability of zero of 0.6 to 0.9, the mean of MSE (II) increases drastically. Similarly, the mean of RRMSE (II) increases sharply when the data have an expected probability of zero of 0.8 to 0.9. However, when the data have an expected probability of zero of 0.6 to 0.7, the mean of RRMSE (II) is negative because some EB estimates are negative.
Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
The EB estimates are similar to those produced by the NBR method, although they are slightly outperformed by the NBR method when the data contain only a small number of zeros. In particular, as shown in Table 3, if the data have an expected probability of zero of 0.1 to 0.5, ZINB produces a larger MSE for the EB estimator than NBR does.
If the data have an expected probability of zero of 0.6 to 0.7, on the other hand, ZINB gives better estimates. The estimates are also unbiased, as they cover the parameter values adequately. However, ZINB begins to produce inconsistent estimates if the data have an expected probability of zero of 0.8 or more, because of the enormous MSE.
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
The mean of MSE (II) with ZINB is larger than the mean of MSE with ZINB. That is because, once the negative MSE values are changed to zero, they no longer pull the mean down.
Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
The EB estimates given by the NBR and ZINB methods are similar for data with a small number of zeros. However, the ZINB method produces a larger MSE than NBR does as long as the expected probability of zero in the data does not exceed the 0.6 threshold.
The ZINB method performs better if the data have an expected probability of zero of 0.6 to 0.7. In this case the EB estimates given by the NBR method are unstable and inconsistent, because of negative estimates and huge MSE values that can be thousands of times larger than an acceptable value. The EB estimator with ZINB, on the other hand, works well: it gives unbiased estimates and its MSE values are more stable than those of the EB estimates with NBR.
Both methods perform poorly if the data have an expected probability of zero of 0.8 or more. The EB estimators of both methods are inconsistent as a result of the very large MSE values they produce.
Table 3 MSE and RRMSE of EB Estimator with ZINB
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.45          0.17            0.24            0.14
0.2                   0.43          0.20            0.33            0.21
0.3                   0.71          0.28            0.52            0.32
0.4                   0.54          0.28            0.632           0.42
0.5                   0.86          0.33            7322807         0.66
0.6                   0.61          0.38            29817           1.03
0.7                   0.58          0.25            218119          1.94
0.8                   -1.28         -1.4E-07        162697          3.75
0.9                   2954790       -1E-06          3.5E+278        609508
Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.45          0.17            0.24            0.14
0.2                   0.436         0.20            0.324           0.21
0.3                   0.72          0.28            0.51            0.311
0.4                   0.55          0.28            0.61            0.41
0.5                   0.95          0.33            6561235         0.58
0.6                   0.75          0.38            23406           0.70
0.7                   1.50          0.25            134655          0.69
0.8                   1.75          0               733506          0
0.9                   2954908       0               1.2E+278        0
CONCLUSION
Excess zeros in the data strongly influence the results of EB estimation. A conventional method such as negative binomial regression, when used for prior estimation, produces an unbiased but unreliable EB estimator for data with an expected probability of zero of 0.6. This is shown by the large MSE values and the negative values of the estimator.
Meanwhile, EB estimation with the ZINB method produces a more reliable estimator, even when the data have an expected probability of zero of 0.6 to 0.7.
ZINB also provides a reliable estimator for data with less than 53.33% zeros. This means that the performance of ZINB declines when the data have an expected probability of zero of 0.8 or more, as shown by the large MSE and the inconsistent estimator.
RECOMMENDATION
This research is based on many assumptions and is subject to several limitations. If the assumptions and restrictions can be relaxed, better results can be expected. There are some recommendations for further research:
1. The generating process in this research does not reflect the real sampling process. If the generating process were similar to the real sampling process, it might give better results, because it would be closer to real applications.
2. It would be more interesting to run an experiment that takes account of a larger number of areas, since the number of areas influences the data modeling.
3. Restricted Maximum Likelihood may be applied when estimating the prior parameter with ZINB and NBR, in order to solve the problem of negative MSE values.
4. Theoretical research on ZINB and the Empirical Bayes estimator is important for understanding the behavior of the ZINB parameter estimates in an Empirical Bayes setting.
REFERENCES
Erdman D, L Jackson, A Sinko. 2008. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. SAS Global Forum 2008: 322-2008. http://www2.sas.com/proceedings/forum2008/322-2008.pdf [25 August 2008]
Famoye F, KP Singh. 2006. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Journal of Data Science 4:117-130.
Hardin JW, JM Hilbe. 2007. Generalized Linear Models and Extensions. Texas: A Stata Press Publication.
Kurnia A, KA Notodiputro. 2006. Penerapan Metode Jackknife dalam Pendugaan Area Kecil. Forum Statistika dan Komputasi, April 2006, p. 12-15.
Kismiantini. 2007. Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Thesis]. Bogor: Institut Pertanian Bogor, Fakultas Matematika dan Pengetahuan Alam.
McCullagh P, JA Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.
Ramsini B et al. 2001. Uninsured Estimates by County: A Review of Options and Issues. http://www.odh.ohio.gov/Data/OFHSurv/ofhsrfq7.pdf [24 April 2008]
Rao JNK. 2003. Small Area Estimation. New York: John Wiley & Sons.
Wakefield J. 2006. Disease mapping and spatial regression with count data. http://www.bepress.com/uwbiostat/paper286.pdf [24 April 2008]
Appendix 1 Result of EB estimation with NBR
P(Y=0)  Actual        r    Statistic     min        Q1         median     mean       Q3         max
10      10-33.33      100  EB estimator  0.426011   1525665    3188832    4252666    5752756    205939
                           bias          0.000446   0.5164     0.878579   1315093    1721091    8704671
                           MSE           0.040547   0.109118   0.159448   0.333613   0.335256   4167064
                           RRMSE         0.041258   0.100045   0.1356     0.18188    0.220426   0.576793
20      13.33-36.67   100  EB estimator  0.342831   1013993    2218265    2984668    3953417    1815693
                           bias          0.000587   0.413611   0.79407    1100373    1454889    7906915
                           MSE           0.055631   0.131969   0.196963   0.353033   0.386291   3778251
                           RRMSE         0.070449   0.15421    0.205182   0.262006   0.352726   0.788718
30      20-53.33      100  EB estimator  0.323311   0.836545   1562163    2263684    2918741    1214482
                           bias          0.000151   0.372382   0.67041    0.916482   122012     5950225
                           MSE           0.074364   0.163462   0.231014   0.400207   0.432371   5250254
                           RRMSE         0.102324   0.214697   0.299247   0.361013   0.474077   1192032
40      23.33-56.67   100  EB estimator  0.24882    0.64963    1219656    17107      2248716    930007
                           bias          0.000564   0.293602   0.549809   0.757937   1007851    486688
                           MSE           -100569    0.194196   0.271669   0.41875    0.45917    3239598
                           RRMSE         0.123605   0.300339   0.422426   0.503566   0.642418   2202294
50      23.33-63.33   100  EB estimator  0.122548   0.570083   1028619    1291758    1728067    6750472
                           bias          0.00029    0.250747   0.453265   0.622838   0.803185   4009352
                           MSE           -237643    0.235733   0.306641   0.452955   0.5091     3652167
                           RRMSE         0.038956   0.412708   0.588924   0.717336   0.844735   3240156
60      30-70         100  EB estimator  -0.77338   0.44443    0.699758   0.944038   1131071    6323352
                           bias          0.000452   0.20433    0.398131   0.534095   0.679938   3848209
                           MSE           -749011    0.254097   0.330078   -12875     0.539873   2354887
                           RRMSE         -663045    0.51763    0.813734   -0.38057   1287528    1767434
70      43.33-73.33   100  EB estimator  -33274     0.249515   0.442513   0.659375   0.922519   9258959
                           bias          0.000375   0.155154   0.316124   0.476883   0.588926   8475103
                           MSE           -7513075   0.235378   0.402092   2536714    0.876569   6051162
                           RRMSE         -10741     0.704796   1355566    -121606    3040291    3332419
80      53.33-90      100  EB estimator  -232889    0.17621    0.305365   0.569959   0.576346   6303601
                           bias          0.000395   0.116669   0.254473   1091172    0.497898   6297454
                           MSE           -6E+09     -0.16583   0.301527   -584495    5718409    1.85E+09
                           RRMSE         -212936    0.927338   2115163    3094627    1359703    4151289
90      70-100        100  EB estimator  -108767    0.111208   0.230315   0.212247   0.353129   3625557
                           bias          0.00016    0.086      0.177169   0.425532   0.314714   1092655
                           MSE           -3.8E+09   -130817    0.159682   39135606   3074073    1.2E+11
                           RRMSE         -909131    1647188    6639631    1.16E+10   1585472    7.06E+11
Appendix 2 Result of EB estimation with ZINB
P(Y=0)  Actual        r    Statistic     min        Q1         median     mean       Q3         max
10      10-33.33      100  EB estimator  0.267752   1500256    3195861    4280907    5833922    2220705
                           bias          0.000603   0.485515   0.882468   1315721    1750173    8704672
                           MSE           -0.53954   0.10933    0.168797   0.449506   0.369775   360843
                           RRMSE         0.022947   0.096443   0.136424   0.238099   0.241955   5518468
20      13.33-36.67   100  EB estimator  1.05E-08   0.914898   221594     3017228    401361     1815694
                           bias          0.000368   0.383426   0.780984   1105029    1496623    7906918
                           MSE           -0.7309    0.126202   0.201463   0.425844   0.414597   1734815
                           RRMSE         0.021807   0.144983   0.210692   0.326097   0.401786   3177943
30      20-53.33      100  EB estimator  0.132041   0.719086   1523909    2308745    3012309    1228058
                           bias          0.000508   0.332427   0.680187   0.928947   1254604    6314973
                           MSE           -229891    0.156942   0.277017   0.707983   0.590466   7469014
                           RRMSE         0.023998   0.210095   0.317195   0.519524   0.618802   3500387
40      23.33-56.67   100  EB estimator  1.05E-08   0.574265   1209034    1742928    2368713    104953
                           bias          0.000564   0.268248   0.544049   0.771741   1067061    4889872
                           MSE           -125713    0.181557   0.284338   0.540615   0.498521   423089
                           RRMSE         0.054916   0.28362    0.420396   0.630776   0.778033   5394515
50      23.33-63.33   100  EB estimator  1.05E-08   0.426701   1033816    133848     1906961    8018962
                           bias          0.000453   0.224726   0.454522   0.661709   0.900005   4414442
                           MSE           -181856    0.194818   0.334706   0.859252   0.711939   7997074
                           RRMSE         0.026206   0.387294   0.662251   7322807    1312302    13388294
60      30-70         100  EB estimator  1.05E-08   0.30085    0.645848   0.985327   1154975    728326
                           bias          6.2E-05    0.190886   0.406245   0.567657   0.74167    3923952
                           MSE           -34589     0.078006   0.376514   0.60793    0.804116   3426488
                           RRMSE         0.00461    0.502807   1033578    2981671    2012552    3308816
70      43.33-73.33   100  EB estimator  1.05E-08   1.05E-08   0.341315   0.677841   1          5005491
                           bias          9.79E-05   0.128017   0.358257   0.487174   0.654423   3733981
                           MSE           -142213    -0.01433   0.255331   0.584152   1132152    264456
                           RRMSE         0.064209   0.847956   1942286    2181192    4589042    7899681
80      53.33-90      100  EB estimator  1.05E-08   1.05E-08   0.142906   0.445315   0.859305   5
                           bias          0.000161   0.083397   0.272773   0.392826   0.557213   3532556
                           MSE           -10651     -5.6E-05   -1.4E-07   -127819    1452962    1132741
                           RRMSE         0.063244   1475413    3754705    162697     9221163    3786684
90      70-100        100  EB estimator  1E-277     1.05E-08   1.05E-08   0.225165   0.135512   3
                           bias          0.000495   0.054221   0.153374   0.27819    0.350213   2736904
                           MSE           -175652    -3.3E-05   -1E-06     2954790    1.52E-06   6.13E+08
                           RRMSE         0.040681   4059441    6095076    3.5E+278   5569021    1.6E+281
Appendix 3 Result of EB estimation (II) with NBR
P(Y=0)  Actual        r    Statistic     min        Q1         median     mean       Q3         max
10      10-33.33      100  EB estimator  0.426011   1525665    3188832    4252666    5752756    205939
                           bias          0.000446   0.5164     0.878579   1315093    1721091    8704671
                           MSE           0.040547   0.109118   0.159448   0.333613   0.335256   4167064
                           RRMSE         0.041258   0.100045   0.1356     0.18188    0.220426   0.576793
20      13.33-36.67   100  EB estimator  0.342831   1013993    2218265    2984668    3953417    1815693
                           bias          0.000587   0.413611   0.79407    1100373    1454889    7906915
                           MSE           0.055631   0.131969   0.196963   0.353033   0.386291   3778251
                           RRMSE         0.070449   0.15421    0.205182   0.262006   0.352726   0.788718
30      20-53.33      100  EB estimator  0.323311   0.836545   1562163    2263684    2918741    1214482
                           bias          0.000151   0.372382   0.67041    0.916482   122012     5950225
                           MSE           0.074364   0.163462   0.231014   0.400207   0.432371   5250254
                           RRMSE         0.102324   0.214697   0.299247   0.361013   0.474077   1192032
40      23.33-56.67   100  EB estimator  0.24882    0.64963    1219656    17107      2248716    930007
                           bias          0.000564   0.293602   0.549809   0.757937   1007851    486688
                           MSE           0          0.194196   0.271669   0.419181   0.45917    3239598
                           RRMSE         0          0.300116   0.422209   0.502895   0.641904   2202294
50      23.33-63.33   100  EB estimator  0.122548   0.570083   1028619    1291758    1728067    6750472
                           bias          0.00029    0.250747   0.453265   0.622838   0.803185   4009352
                           MSE           0          0.235733   0.306641   0.456258   0.5091     3652167
                           RRMSE         0          0.410357   0.585765   0.712314   0.841838   3240156
60      30-70         100  EB estimator  -0.77338   0.44443    0.699758   0.944038   1131071    6323352
                           bias          0.000452   0.20433    0.398131   0.534095   0.679938   3848209
                           MSE           0          0.254097   0.330078   2619677    0.539873   2354887
                           RRMSE         -663045    0.448118   0.750369   -0.34911   1209918    1767434
70      43.33-73.33   100  EB estimator  -33274     0.249515   0.442513   0.659375   0.922519   9258959
                           bias          0.000375   0.155154   0.316124   0.476883   0.588926   8475103
                           MSE           0          0.235378   0.402092   9500073    0.876569   6051162
                           RRMSE         -10741     0.288999   0.995659   -100163    2527784    3332419
80      53.33-90      100  EB estimator  -232889    0.17621    0.305365   0.569959   0.576346   6303601
                           bias          0.000395   0.116669   0.254473   1091172    0.497898   6297454
                           MSE           0          0          0.301527   1444250    5718409    1.85E+09
                           RRMSE         -212936    0          1104113    2205437    5656681    4151289
90      70-100        100  EB estimator  -108767    0.111208   0.230315   0.212247   0.353129   3625557
                           bias          0.00016    0.086      0.177169   0.425532   0.314714   1092655
                           MSE           0          0          0.159682   41595285   3074073    1.2E+11
                           RRMSE         -909131    0          0.557622   6.77E+09   9311925    7.06E+11
Appendix 4 Result of EB estimation (II) with ZINB
P(Y=0)  Actual        r    Statistic     min        Q1         median     mean       Q3         max
10      10-33.33      100  EB estimator  0.267752   1500256    3195861    4280907    5833922    2220705
                           bias          0.000603   0.485515   0.882468   1315721    1750173    8704672
                           MSE           0          0.10933    0.168797   0.450626   0.369775   360843
                           RRMSE         0          0.095932   0.135647   0.23675    0.239669   5518468
20      13.33-36.67   100  EB estimator  1.05E-08   0.914898   221594     3017228    401361     1815694
                           bias          0.000368   0.383426   0.780984   1105029    1496623    7906918
                           MSE           0          0.126202   0.201463   0.428006   0.414597   1734815
                           RRMSE         0          0.142648   0.20709    0.320663   0.395479   3177943
30      20-53.33      100  EB estimator  0.132041   0.719086   1523909    2308745    3012309    1228058
                           bias          0.000508   0.332427   0.680187   0.928947   1254604    6314973
                           MSE           0          0.156942   0.277017   0.716543   0.590466   7469014
                           RRMSE         0          0.203913   0.311937   0.506882   0.615401   3500387
40      23.33-56.67   100  EB estimator  1.05E-08   0.574265   1209034    1742928    2368713    104953
                           bias          0.000564   0.268248   0.544049   0.771741   1067061    4889872
                           MSE           0          0.181557   0.284338   0.549835   0.498521   423089
                           RRMSE         0          0.270309   0.405926   0.606317   0.766631   5394515
50      23.33-63.33   100  EB estimator  1.05E-08   0.426701   1033816    133848     1906961    8018962
                           bias          0.000453   0.224726   0.454522   0.661709   0.900005   4414442
                           MSE           0          0.194818   0.334706   0.94973    0.711939   7997074
                           RRMSE         0          0.316402   0.576343   6561235    1240175    13388294
60      30-70         100  EB estimator  1.05E-08   0.30085    0.645848   0.985327   1154975    728326
                           bias          6.2E-05    0.190886   0.406245   0.567657   0.74167    3923952
                           MSE           0          0.078006   0.376514   0.749436   0.804116   3426488
                           RRMSE         0          0.258286   0.698814   2340612    1714808    3308816
70      43.33-73.33   100  EB estimator  1.05E-08   1.05E-08   0.341315   0.677841   1          5005491
                           bias          9.79E-05   0.128017   0.358257   0.487174   0.654423   3733981
                           MSE           0          0          0.255331   1501268    1132152    264456
                           RRMSE         0          0          0.688797   1346552    2500825    7899681
80      53.33-90      100  EB estimator  1.05E-08   1.05E-08   0.142906   0.445315   0.859305   5
                           bias          0.000161   0.083397   0.272773   0.392826   0.557213   3532556
                           MSE           0          0          0          1755486    1452962    1132741
                           RRMSE         0          0          0          7335062    3311711    3786684
90      70-100        100  EB estimator  1E-277     1.05E-08   1.05E-08   0.225165   0.135512   3
                           bias          0.000495   0.054221   0.153374   0.27819    0.350213   2736904
                           MSE           0          0          0          2954908    1.52E-06   6.13E+08
                           RRMSE         0          0          0          1.2E+278   416189     1.6E+281
Appendix 5 Syntax program for generate data
data b; /* generate x1 (covariate) and ei */
  input x1;
  cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate poisson-gamma with excess zero */
  do kk=1 to 30;
    set b;
    tetha = rangam(1,1);
    lambda = -log(0.1); /* desired probability of a zero value (0.1-0.9) */
    starlambda = log(lambda*tetha);
    output;
  end;
run;

proc reg;
  model starlambda = x1;
  ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;
data _null_;
  set work.betha_lr_t;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
run;

data d;
  do kk=1 to 30;
    set e;
    mu = exp(&Intercept + &x1*x1);
    parmlambda = mu*tetha;
    ypoi = rand('poisson', parmlambda);
    output;
  end;
run;

ods trace on; /* to take percent zero on data */
proc freq data=d;
  tables ypoi;
  ods output OneWayFreqs=work.zero;
run;
data zero;
  set zero;
  keep percent;
run;
proc transpose data=zero out=zero1;
run;
data _null_;
  set work.zero1;
  call symput ('pctz', col1);
run;
data d;
  set d;
  pzero=&pctz;
  r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
Appendix 6 Syntax program EB with NBR
%macro sae_nb;
%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
  set work.betha_nb_t;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
  call symput ('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
  bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
  set work.duga_nb1;
  if ^(u=&x) then delete;
run;
data work.jacknb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc genmod data=work.jacknb&h;
  /* output p out=sas.yi_est; */
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_est_nb&h (keep=Parameter Estimate);
run;
proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;
data _null_;
  set work.betha_est_nbt&h;
  call symput ('Intercept_', col1);
  call symput ('x1_', col2);
  call symput ('Dispersion_', col3);
run;

data work.duganb&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
  w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + &Dispersion_);
  teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
  beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
  merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5 work.duganb6
        work.duganb7 work.duganb8 work.duganb9 work.duganb10 work.duganb11 work.duganb12
        work.duganb13 work.duganb14 work.duganb15 work.duganb16 work.duganb17
        work.duganb18 work.duganb19 work.duganb20 work.duganb21 work.duganb22
        work.duganb23 work.duganb24 work.duganb25 work.duganb26 work.duganb27
        work.duganb28 work.duganb29 work.duganb30;
  by kk;
run;

data work.mse_nb_j;
  set work.mse_nb_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
  ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
  merge work.d work.mse_nb_j;
  keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb BASE=work.hasilnb1 appendver=v6;
run;

%END;
%mend;

%sae_nb;
Appendix 7 Syntax program EB with ZINB
%macro sae_zinb;

%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

proc countreg data=a;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
  set work.pe_t;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
  call symput ('Inf_Intercept', col3);
  call symput ('Inf_x1', col4);
  call symput ('_Alpha', col5);
run;

data work.duga;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
  bias_b=abs(teta_hat_bayes-parmlamdha);
run;

proc append data=work.duga base=work.duga1;
run;

%do h=1 %to 30;

data work.d;
  set work.duga1;
  if ^(u=&x) then delete;
run;
data work.jackzinb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
  set work.betha_est_ZINBt&h;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
  call symput ('Inf_Intercept', col3);
  call symput ('Inf_x1', col4);
  call symput ('_Alpha', col5);
run;

data work.dugaZINB&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept + &x1*x1);
  /* mu_hat_b_&h = &b_o- + &b_1- * x1 */
  w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + (&_Alpha));
  teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  /* g1_&h = ((mu_hat_b_&h**2*&alpha_)**2)*(&alpha_+y_i)/((mu_hat_b_&h**2*&alpha_)+mu_hat_b_&h)**2 */
  g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
  /* g1_&h = (A**2)*(&k- + y_i)/(a + mu_hat_b)**2 */
  beda_g_&h=g1_&h-g1;
run;

data work.mse_ZINB_j;
  merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5 work.dugaZINB6
        work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10 work.dugaZINB11 work.dugaZINB12
        work.dugaZINB13 work.dugaZINB14 work.dugaZINB15 work.dugaZINB16 work.dugaZINB17
        work.dugaZINB18 work.dugaZINB19 work.dugaZINB20 work.dugaZINB21 work.dugaZINB22
        work.dugaZINB23 work.dugaZINB24 work.dugaZINB25 work.dugaZINB26 work.dugaZINB27
        work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
  by kk;
run;

data work.mse_ZINB_j;
  set work.mse_ZINB_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
  merge work.d work.mse_ZINB_j;
  keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;
ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB BASE=work.hasilZINB1;
run;

%END;
%mend;

%sae_zinb;
BIOGRAPHY
Irene Muflikh Nadhiroh was born in Padang on October 3rd, 1986, as the first daughter of Ir. Irianto Oetomo and Fine Analisa Maharani. She has two siblings.
In 1998 she graduated from SD Dukuh 09, East Jakarta. She then continued her studies at SLTP Negeri 1 Bogor and graduated in 2001. She finished her studies at SMU Labschool Rawamangun, Jakarta, in 2004 and then enrolled in Bogor Agricultural University through USMI. In 2004 she joined the Department of Statistics, Faculty of Mathematics and Natural Sciences.
During her studies she served as a teaching assistant for the Basic Statistics class and the Experimental Design class in 2006 and 2007 respectively. She was also a member of Gamma Sigma Beta (Statistics Students Association, GSB) and was head of the science division of GSB in 2006-2007. In February-March 2008 she completed her field practice at PT Field Dimension Indonesia.
ACKNOWLEDGEMENTS
First of all, the author modestly admits that the completion of this paper would not have been possible without invaluable help from many generous and extraordinary people. The author is deeply indebted to them for their help, ideas, criticism and advice during the writing process. However, they should not be held responsible for the mistakes and deficiencies in this paper, which are purely the author's. I would hereby like to express my gratitude to:
1. Allah SWT, to whom all praise and gratitude belong. Alhamdulillah hirabbil alamin. With His blessing I was able to finish this paper. Thanks to Allah for giving me a wonderful life with extraordinary people around me.
2. Prof. Dr. Khairil Anwar Notodiputro and Ir. Indahwati, MSi, for the early motivation, discussion, advice, support and their great enthusiasm.
3. Mr. Bambang Sumantri, MSi, as examiner: thank you for the encouragement, advice and criticism.
4. My beloved family, for their unlimited love.
5. Mr. Alfian Futuhul Hadi, MSi, for enlightening discussions when I was in trouble.
6. Mr. Bagus Sartono, MSi: thank you very much for letting me run my data in your lab on your wonderful computer. Sorry if it disturbed you.
7. Mr. Anang, MSi, and Mr. Rahman, MSi, for sharing their knowledge and technical support.
8. Dr. Ir. Hari Wijayanto, MS, and all lecturers and staff of the Statistics Department: thank you for the knowledge of statistics and the knowledge of life that you shared. It means a lot to me.
9. Rahmatullah Sigit Dodiet Sasongko, SSi, for the encouragement, love, care, time and patience. Keep it real. Still love me forever and ever.
10. Mr. Dionisius Laksmana Bisara Putra, SSi, for editing my paper, for his criticism and for useful discussions.
11. Maulana Chistanto, SSi, and Yhanuar Ismail, SSi: thank you for being my best brothers.
12. Nikhen, Sevrien and the late Dini: thank you for lighting up my days.
13. Rere, Yusri, Agus, Ika, Cinong, Toki, Cheri, Fisca, Wiwik, Neng, Mala, Lilis, Dika, Rangga, Lele, Dodi, Kus, Inal, Bebek, Koler and all of Statistics '41.
14. Everyone who helped me in this study and cannot be named personally.
This thesis is not perfect, so I welcome criticism, advice and recommendations from the people who read it. Thank you. God bless you all.
Bogor, January 2009
Irene Muflikh Nadhiroh
TABLE OF CONTENTS
INTRODUCTION
  Background
  Objectives
LITERATURE REVIEW
  Direct Estimation
  Small Area Estimation
  Small Area Models
  Empirical Bayes Methods
  Poisson-Gamma Models
  Negative Binomial Regression
  Over-disperse at Count Data
  Zero-Inflated Models
  Zero-Inflated Negative Binomial
  Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
METHODOLOGY
  Data
  Methods
RESULT AND DISCUSSION
  Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
  Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
  Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
CONCLUSION
RECOMMENDATION
REFERENCES
LIST OF TABLES
  Table 1 MSE and RRMSE of EB Estimator with NBR
  Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
  Table 3 MSE and RRMSE of EB Estimator with ZINB
  Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB
LIST OF APPENDICES
  Appendix 1 Result of EB estimation with NBR
  Appendix 2 Result of EB estimation with ZINB
  Appendix 3 Result of EB estimation (II) with NBR
  Appendix 4 Result of EB estimation (II) with ZINB
  Appendix 5 Syntax program for generate data
  Appendix 6 Syntax program EB with NBR
  Appendix 7 Syntax program EB with ZINB
INTRODUCTION
Background
Direct estimation is usually applied in large-scale surveys, but it is sometimes difficult to use such an estimator in a smaller region, especially when the sample size is too small. In this case indirect estimation, which adds covariates to estimate the parameter, is usually used. This type of estimation is broadly known as Small Area Estimation.
Kismiantini (2007) conducted research in Small Area Estimation based on Poisson-Gamma models. Maximum Likelihood Estimation was used with Negative Binomial Regression techniques to estimate the respective prior parameter. Moreover, Negative Binomial Regression was used to resolve the over-dispersion problem in the data.
In reality, count data are characterized not only by over-dispersion but sometimes also by excess zeros. Excess-zero is a condition in which the data contain too many zeros, more than the distribution leads us to expect. For example, with 100 observations from a Poisson model with a response mean of 4, we would expect about 2 zeros. If the data have 30 zeros, it should be obvious that the distributional assumptions have been violated. As a consequence, the estimated parameters and standard errors will be biased (Hardin & Hilbe 2007). In this paper, Zero-Inflated models are adapted to solve this type of problem.
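The figure of about 2 zeros follows from the Poisson probability of a zero, $e^{-\mu}$, multiplied by the number of observations. A minimal sketch of that arithmetic (the function name `expected_zeros` is only illustrative):

```python
import math

def expected_zeros(n_obs, mean):
    """Expected number of zeros among n_obs Poisson counts with the given mean."""
    return n_obs * math.exp(-mean)

# 100 observations with mean 4: 100 * exp(-4), which is a little under 2.
print(round(expected_zeros(100, 4.0), 2))
```

Observing 30 zeros in such a sample is therefore far outside what the Poisson assumption allows.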
Objectives
The research objectives are:
1. To investigate the performance of Negative Binomial Regression in Small Area Estimation in the case of excess zeros.
2. To apply Zero-Inflated Count Models to Small Area Estimation in the case of excess zeros.
3. To evaluate the performance of Zero-Inflated Count Models in estimating the prior parameter for Small Area Estimation.
LITERATURE REVIEW
Direct Estimation
Direct estimates are generally "design based" in the sense that they make use of "survey weights", and the associated inferences are based on the probability distribution induced by the sample design, with the population values held fixed (Rao 2003). In particular, direct estimates of a domain parameter are based only on the domain-specific sample data.
Data from sample surveys have long been used to obtain reliable estimates of parameters. Ramsini et al. (2001) mentioned that direct estimates for small areas are unbiased, although they have a large variance because of the small sample size.
Small Area Estimation
The term small area can refer to anything, depending on the object of interest. It can be a city, an age group, a sex group, a region or a rural district. In general, small area is used to denote any domain for which direct estimates of adequate precision cannot be produced (Rao 2003). This happens because the sample size in the small area is too small. As a result, direct estimation based on the sampling design is not capable of producing estimates with adequate precision. Small area estimation has therefore been developed as a statistical technique for estimating the parameters of small areas with an adequate level of precision. It works as indirect estimation that borrows strength from the values of the variable of interest in related areas, through the use of supplementary information related to the variable of interest, such as recent census counts and current administrative records (Rao 2003). Indirect estimation is a process of estimating a domain's parameter by connecting the information in that domain with another domain using an appropriate model, so the estimator works by including data from other domains (Kurnia & Notodiputro 2006).
Small Area Models
There are two types of link model in indirect estimation. The first is the traditional approach based on implicit models that link related small areas through supplementary data. The second is explicit small area models that make specific allowance for between-area variation (Rao 2003). This research used the second type, which can be classified into two broad types of basic model:
1. Basic area level (type A) model
Basic area level models, or aggregate models, include all models that relate the small area parameters to area-specific auxiliary variables. These models are essential if unit (element) level data are not available. The parameter of interest $\theta_i$ is assumed to be related to the area-specific auxiliary data, or covariates, $x_i = (x_{1i}, \ldots, x_{pi})^T$ by the linear model
$$\theta_i = x_i^T \beta + b_i v_i, \quad i = 1, \ldots, m,$$
where $v_i \sim N(0, \sigma_v^2)$ are area-specific random effects, $\beta = (\beta_1, \ldots, \beta_p)^T$ is the $p \times 1$ vector of regression coefficients, and the $b_i$ are known positive constants. For making inferences about $\theta_i$, direct estimators $y_i$ are assumed to be available, with
$$y_i = \theta_i + e_i, \quad i = 1, \ldots, m,$$
where the sampling errors are $e_i \sim N(0, \sigma_{ei}^2)$ and the $\sigma_{ei}^2$ are known. Combining the two models gives the new model
$$y_i = x_i^T \beta + b_i v_i + e_i, \quad i = 1, \ldots, m$$
(Rao 2003).
2. Basic unit level (type B) model
Unit level models include all models that relate the unit values of the study variable to unit-specific auxiliary variables. Given the unit-specific auxiliary variables $x_{ij} = (x_{ij1}, \ldots, x_{ijp})^T$, the corresponding nested regression model is
$$y_{ij} = x_{ij}^T \beta + v_i + e_{ij}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n_i,$$
with $v_i \sim N(0, \sigma_v^2)$ and $e_{ij} \sim N(0, \sigma_{ei}^2)$.
Empirical Bayes Methods
The Bayesian approach is based on Bayes' law, which was discovered by Thomas Bayes. The law was introduced by Richard Price in 1763, two years after Thomas Bayes passed away. In 1774 and 1781, Laplace gave the details and relevance for modern Bayesian statistics (Gill 2002 in Kismiantini 2007).
Novick, in Good (1980), mentioned that the Bayes method is difficult to adopt and is sometimes very sensitive, because it requires prior probability information, which is usually difficult to obtain. Robbins (1955) introduced Empirical Bayes methods, which assume a particular prior distribution and estimate it from the sample. Rao (2003) said that the EB (Empirical Bayes) and HB (Hierarchical Bayes) methods are suitable for binary and count data in Small Area Estimation. Therefore, the EB method was used in this research.
Rao (2003) summarized EB methods in Small Area Estimation as follows:
1. Obtain the posterior probability density function of the small area parameter.
2. Estimate the parameters from the marginal density function.
3. Use the estimated posterior density for inferences regarding the parameters of interest.
Poisson-Gamma Models
The Poisson model is the standard model for count data. Count data often suffer from over-dispersion, so extensions of the Poisson model have been developed to accommodate the extra variance in sample data. Two-stage models for count data, known as Poisson-Gamma mixed models, have been introduced. Wakefield (2006) introduced a Poisson-Gamma model that is easy to use, with the SMR (Standardized Mortality Ratio) as the direct estimator. This study used the Wakefield model with a different direct estimator.
Let $y_i$ be the number of individuals in small area $i$ that have the specific characteristic of interest, written as
$$y_i = \sum_j y_{ij},$$
where $y_{ij}$ is the $j$-th object in the $i$-th small area, $j = 1, \ldots, n$ and $i = 1, \ldots, m$.
First stage: $y_i \overset{ind}{\sim} \text{Poisson}(\mu_i \theta_i)$ is assumed, where $\mu_i = \mu(x_i, \beta)$ describes a regression model at the area level, $x_i$ is a vector of covariates and $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of regression coefficients.
Second stage: $\theta_i \overset{iid}{\sim} \text{gamma}(\alpha, 1/\alpha)$ is assumed as the prior distribution, with mean 1 and variance $1/\alpha$. Then the marginal distribution of $y_i \mid \beta, \alpha$ is negative binomial.
Moreover, Wakefield (2006) used Bayes' theorem and obtained the posterior distribution
$$\theta_i \mid y_i, \beta, \alpha \sim \text{gamma}\left(y_i + \alpha, \frac{1}{\mu_i + \alpha}\right)$$
and the EB estimator
$$\hat{\theta}_i^{EB} = \hat{\theta}_i^{B}(\hat{\beta}, \hat{\alpha}) = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\hat{\mu}_i,$$
with $\hat{\gamma}_i = \hat{\mu}_i / (\hat{\alpha} + \hat{\mu}_i)$, where $\hat{\theta}_i$ are the direct estimates of $\theta_i$ and $y_i$ are the numbers of observations.
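The EB estimator is a weighted average of the direct estimate and the regression-based mean. The sketch below assumes the weight has the form used in the Appendix 6 program (the variables `w_bayes` and `teta_hat_bayes` there), with the observed count taken as the direct estimate; the function name and the numerical inputs are hypothetical and serve only to show the shrinkage.

```python
def eb_estimate(y, mu_hat, alpha_hat):
    """Shrink the direct estimate y toward the fitted mean mu_hat.

    weight = mu_hat / (mu_hat + alpha_hat), as in w_bayes of the Appendix 6 program.
    """
    weight = mu_hat / (mu_hat + alpha_hat)
    return weight * y + (1.0 - weight) * mu_hat

# Hypothetical area: observed count 6, fitted mean 2.0, estimated prior parameter 0.5.
# The weight is 0.8, so the estimate lies between 2.0 and 6, closer to the observed count.
print(eb_estimate(6, 2.0, 0.5))
```

A larger prior parameter relative to the fitted mean gives a smaller weight, so the estimate moves closer to the regression-based mean.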
Negative Binomial Regression
The negative binomial regression model seems to have been first discussed by Anscombe (1972). Others have pointed out its success in dealing with over-dispersed count data. Lawless (1987) elaborated the mixture model parameterization of the negative binomial, providing formulas for its log-likelihood, mean, variance and moments. Later, Breslow (1990) cited Lawless' work. From its inception to the late 1980s, the negative binomial regression model was construed as a mixture model that is useful for accommodating otherwise over-dispersed Poisson data (Hardin & Hilbe 2007). The negative binomial distribution function is written as
$$g(y \mid x) = \frac{\Gamma(y + k^{-1})}{\Gamma(y + 1)\,\Gamma(k^{-1})} \left(\frac{k\mu}{1 + k\mu}\right)^{y} \left(\frac{1}{1 + k\mu}\right)^{1/k},$$
where $y = 0, 1, 2, \ldots$, and $k$ and $\mu$ are the negative binomial parameters, with $E(y) = \mu$ and $\text{var}(y) = \mu + k\mu^2$. The parameter $k$ is called the dispersion parameter; it shows that the data are over-dispersed.
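As a numerical check, the sketch below evaluates a negative binomial probability function and confirms its mean and variance by direct summation. It assumes the common parameterization with mean $\mu$ and dispersion $k$, for which $E(y) = \mu$ and $\text{var}(y) = \mu + k\mu^2$; the function name `nb_pmf` and the parameter values are illustrative only.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial probability of count y with mean mu and dispersion k > 0."""
    inv_k = 1.0 / k
    log_p = (math.lgamma(y + inv_k) - math.lgamma(y + 1) - math.lgamma(inv_k)
             - inv_k * math.log(1.0 + k * mu)
             + y * math.log(k * mu / (1.0 + k * mu)))
    return math.exp(log_p)

mu, k = 2.0, 0.5
# Sum over enough counts that the remaining tail probability is negligible.
probs = [nb_pmf(y, mu, k) for y in range(200)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

# mean should be close to mu, and var close to mu + k * mu**2 (larger than the mean).
print(mean, var)
```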
Over-disperse at Count Data
Count data in Poisson regression are said to be over-dispersed if the variance is larger than the mean, that is, larger than the variance expected under the Poisson model. This phenomenon is written as
$$\text{Var}(y_i) > E(y_i)$$
(McCullagh & Nelder 1989).
Zero-Inflated Models
Zero-Inflated models consider two distinct sources of zero outcomes. One source is generated from individuals who do not enter into the counting process, the other from those who do enter the counting process but result in a zero outcome (Hardin & Hilbe 2007).
Lambert (1992) first described this type of mixture model in the context of process control in manufacturing. It has since been used in many applications and is now discussed in nearly every book or article dealing with count response models.
For the zero-inflated model, the probability of observing a zero outcome equals the probability that an individual is in the always-zero group, plus the probability that the individual is not in that group times the probability that the counting process produces a zero. If $B(0)$ is the probability that the binary process results in a zero outcome and $\Pr(0)$ is the probability that the counting process produces a zero outcome, the probability of a zero outcome for the system is given by (Hardin & Hilbe 2007)
$$\Pr(y = 0) = B(0) + [1 - B(0)]\Pr(0).$$
The probability of a nonzero count is
$$\Pr(y = k;\ k > 0) = [1 - B(0)]\Pr(k).$$
This model produces two groups of parameters. One is the zero-inflation parameters, which show whether the covariates contribute significantly to having a zero outcome. The other is the negative binomial parameters, which model the response with the covariates.
Zero-Inflated Negative Binomial
There are many kinds of zero-inflated models; each has advantages and disadvantages and is suited to a different type of data. The zero-inflated negative binomial (ZINB) model is one of them. It is used for over-dispersed data with excess zeros. As a result, the parameter estimates include a parameter $k$ which indicates that over-dispersion occurs in the data, just like the dispersion parameter in negative binomial regression.
The probability distribution of this model is as follows:
$$P(Y_i = y_i \mid x_i) = \begin{cases} \varphi_i + (1 - \varphi_i)\, g(0 \mid x_i), & y_i = 0 \\ (1 - \varphi_i)\, g(y_i \mid x_i), & y_i > 0 \end{cases}$$
where $\varphi_i = \varphi(x_i'\gamma)$, $x_i$ is the vector of zero-inflation covariates and $\gamma$ is the vector of zero-inflation coefficients which will be estimated. Meanwhile, $g(y_i \mid x_i)$ is the negative binomial probability distribution, written as
$$g(y_i \mid x_i) = \frac{\Gamma(y_i + k^{-1})}{\Gamma(y_i + 1)\,\Gamma(k^{-1})} \left(\frac{k\mu_i}{1 + k\mu_i}\right)^{y_i} \left(\frac{1}{1 + k\mu_i}\right)^{k^{-1}}$$
The mean and variance of ZINB are
$$E(y_i \mid x_i) = \mu_i (1 - \varphi_i)$$
$$V(y_i \mid x_i) = \mu_i (1 - \varphi_i)\,(1 + \mu_i(\varphi_i + k))$$
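The ZINB mean and variance can be verified numerically from the mixture definition. This is a minimal sketch that assumes the negative binomial component has mean `mu` and dispersion `k`, and that `phi` is the probability of the always-zero group; all names and values are illustrative.

```python
import math


def nb_pmf(y, mu, k):
    """Negative binomial probability with mean mu and dispersion k > 0."""
    inv_k = 1.0 / k
    return math.exp(
        math.lgamma(y + inv_k) - math.lgamma(y + 1) - math.lgamma(inv_k)
        + y * math.log(k * mu / (1.0 + k * mu))
        - inv_k * math.log(1.0 + k * mu)
    )


def zinb_pmf(y, mu, k, phi):
    """Point mass at zero with weight phi, mixed with a negative binomial."""
    base = (1.0 - phi) * nb_pmf(y, mu, k)
    return phi + base if y == 0 else base


mu, k, phi = 2.0, 0.5, 0.3  # illustrative values only
probs = [zinb_pmf(y, mu, k, phi) for y in range(400)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

# Closed forms for the ZINB mean and variance.
expected_mean = mu * (1.0 - phi)
expected_var = mu * (1.0 - phi) * (1.0 + mu * (phi + k))
print(round(mean, 6), round(expected_mean, 6), round(var, 6), round(expected_var, 6))
```

Setting `phi = 0` recovers the plain negative binomial mean and variance, which is a quick consistency check on the formulas.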
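Jackknife Method of Estimating MSE($\hat\theta_i^{EB}$)
The jackknife is one of the general methods used in surveys because of its simple concept (Jiang, Lahiri and Wan 2002). The method has been known since Tukey (1958) and was developed into a method capable of correcting the bias of an estimator by removing the i-th observation, for $i = 1, \ldots, m$, and re-estimating the parameters.
According to Rao (2003), the jackknife steps to estimate MSE($\hat\theta_i^{EB}$) are:
1. Assume that $\hat\theta_i^{EB} = k_i(y_i, \hat\beta, \hat\alpha)$ and $\hat\theta_{i,-l}^{EB} = k_i(y_i, \hat\beta_{-l}, \hat\alpha_{-l})$, then calculate
$$\hat M_{2i} = \frac{m-1}{m} \sum_{l=1}^{m} \left(\hat\theta_{i,-l}^{EB} - \hat\theta_i^{EB}\right)^2$$
2. Calculate the delete-l estimators $\hat\beta_{-l}$ and $\hat\alpha_{-l}$, then calculate
$$\hat M_{1i} = g_{1i}(\hat\beta, \hat\alpha, y_i) - \frac{m-1}{m} \sum_{l=1}^{m} \left[ g_{1i}(\hat\beta_{-l}, \hat\alpha_{-l}, y_i) - g_{1i}(\hat\beta, \hat\alpha, y_i) \right]$$
Here $g_{1i}(\hat\beta, \hat\alpha, y_i)$ is the estimated variance of the posterior distribution, which is used to measure the variability associated with $\hat\theta_i$. Using $g_{1i}(\hat\beta, \hat\alpha, y_i)$ alone leads to severe underestimation of $MSE_J(\hat\theta_i^{EB})$ because it ignores the estimation of the prior parameters. Therefore the estimator $\hat M_{1i}$ corrects the bias of $g_{1i}(\hat\beta, \hat\alpha, y_i)$.
3. Calculate the jackknife estimator of MSE($\hat\theta_i^{EB}$) as
$$MSE_J(\hat\theta_i^{EB}) = \hat M_{1i} + \hat M_{2i}$$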
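The three jackknife steps can be sketched in Python as follows. This is an illustrative sketch, not the program used in this research: the helper names are hypothetical, the EB estimator and posterior variance follow the expressions in the Appendix 6 program, and the delete-one estimates are passed in as plain numbers rather than refitted by regression.

```python
def eb(y, mu, alpha):
    """EB estimate: weighted average of the count y and the prediction mu."""
    w = mu / (mu + alpha)
    return w * y + (1.0 - w) * mu


def g1(y, mu, alpha):
    """Posterior variance term, as computed in the Appendix 6 program."""
    return (y + alpha) / (mu + alpha) ** 2


def jackknife_mse(y, full, deleted):
    """full: (mu_hat, alpha_hat) fitted on all m areas.
    deleted: m pairs (mu_hat, alpha_hat), each refitted with one area removed.
    Returns (MSE_J, M1, M2)."""
    m = len(deleted)
    mu, alpha = full
    scale = (m - 1) / m
    m2 = scale * sum((eb(y, mu_l, a_l) - eb(y, mu, alpha)) ** 2
                     for mu_l, a_l in deleted)
    m1 = g1(y, mu, alpha) - scale * sum(g1(y, mu_l, a_l) - g1(y, mu, alpha)
                                        for mu_l, a_l in deleted)
    return m1 + m2, m1, m2


# Delete-one fits identical to the full fit: MSE_J reduces to g1.
stable = jackknife_mse(0, (1.0, 1.0), [(1.0, 1.0)] * 3)
# Delete-one fits far from the full fit: the bias correction can overshoot.
unstable = jackknife_mse(0, (1.0, 1.0), [(0.2, 1.0)] * 3)
print(stable[0], round(unstable[0], 4))
```

The second call shows one way a negative MSE estimate can arise: when the delete-one fits differ strongly from the full fit, the correction subtracted in $\hat M_{1i}$ can exceed $g_{1i}$ itself.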
METHODOLOGY
Data
This research assumed that the available auxiliary data are at the area level, so the basic area level model was used. The data were simulated with 30 small areas and one covariate. Each batch was generated under a different excess-zero condition, with the probability of zero in a small area ranging from 0.1 to 0.9. This research assumed that the relationship between the response and the covariate was linear.
Methods
The following steps were used to generate the data with SAS 9.1:
1. Fix the value of $X_i$ for the i-th area.
2. Define the expected probability of zero in each small area, $P(Y_i = 0)$, then calculate $\Lambda_i = -\log(P(Y_i = 0))$.
3. Generate $\theta_i \sim \text{Gamma}(1, 1)$.
4. Calculate $\Lambda_i^* = \log(\Lambda_i / \theta_i)$.
5. Fit a linear regression between $\Lambda_i^*$ and $X_i$ to obtain $\beta_0$ and $\beta_1$.
6. Calculate $\mu_i = \exp(X_i'\beta)$.
7. Calculate $\text{parmlambda}_i = \mu_i \theta_i$.
8. Generate $y_i \sim \text{Poisson}(\text{parmlambda}_i)$.
Moreover, in analyzing the data the following steps were applied:
1. Fit the negative binomial regression with the GENMOD procedure in SAS 9.1 and the zero-inflated negative binomial regression with the COUNTREG procedure in SAS 9.2.
2. Estimate the prior parameters, which are $\beta$ and $\alpha$.
3. Estimate $\theta_i$ using the EB method.
4. Calculate the MSE of the indirect estimates.
5. Calculate the RRMSE (Relative Root Mean Square Error):
$$RRMSE(\hat\theta_i) = \frac{\sqrt{MSE(\hat\theta_i)}}{\hat\theta_i}$$
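The eight data-generation steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the SAS program of Appendix 5: the covariate values and the seed are hypothetical, step 4 is taken as the log of the ratio of the target rate to the gamma draw, the regression in step 5 is ordinary least squares with one covariate, and the Poisson draws use Knuth's multiplication method because the standard library has no Poisson sampler.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def generate_batch(x, p_zero, rng):
    """Return (counts, Poisson means) for one simulated set of small areas."""
    lam = -math.log(p_zero)                               # step 2
    theta = [rng.gammavariate(1.0, 1.0) for _ in x]       # step 3
    star = [math.log(lam / t) for t in theta]             # step 4
    n = len(x)                                            # step 5: least squares
    x_bar, s_bar = sum(x) / n, sum(star) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (si - s_bar) for xi, si in zip(x, star)) / sxx
    b0 = s_bar - b1 * x_bar
    mu = [math.exp(b0 + b1 * xi) for xi in x]             # step 6
    parm = [m * t for m, t in zip(mu, theta)]             # step 7
    y = [poisson(pl, rng) for pl in parm]                 # step 8
    return y, parm


rng = random.Random(2009)                                 # hypothetical seed
x_values = [0.1 * i for i in range(30)]                   # hypothetical covariate
counts, means = generate_batch(x_values, 0.5, rng)
print(sum(1 for c in counts if c == 0) / len(counts))     # realised share of zeros
```

Because the regression smooths out the area-specific gamma draws, the realised share of zeros in a batch generally differs from the target probability.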
RESULT AND DISCUSSION
Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
In the case of non-excess-zero data, the estimator produced small and consistent MSE. Meanwhile, if the proportion of zeros is approximately 30% or more, with an expected probability of zero of 0.6, the estimates tend to be unreliable; in this case the EB estimation produced negative values.
The RRMSE of the estimator increases along with the number of zeros in the data. Furthermore, if at least 30% of the data are zeros, the estimator is unreliable.
Table 1 MSE and RRMSE of EB Estimator with NBR
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.33          0.16            0.18            0.13
0.2                   0.35          0.20            0.26            0.20
0.3                   0.40          0.23            0.36            0.30
0.4                   0.42          0.27            0.50            0.42
0.5                   0.45          0.31            0.72            0.59
0.6                   -12875        0.33            -0.38           0.81
0.7                   253671        0.40            -1216           135
0.8                   -584495       0.30            30946           211
0.9                   39135606      0.16            116E+10         664

Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.33          0.16            0.18            0.13
0.2                   0.35          0.20            0.26            0.20
0.3                   0.40          0.23            0.36            0.30
0.4                   0.42          0.27            0.50            0.42
0.5                   0.46          0.31            0.71            0.58
0.6                   26197         0.33            -0.35           0.75
0.7                   950007        0.40            -1002           0.99
0.8                   1444250       0.30            22054           110
0.9                   41595285      0.16            677E+09         0.56
Table 1 shows that the iterative process produced unexpected negative values of MSE. The simplest way to solve this problem is to change the negative values to zero. MSE (II) and RRMSE (II) in Table 2 are the MSE and RRMSE after the negative values of MSE have been changed to zero.
When the data have an expected probability of zero of 0.6 to 0.9, the mean of MSE (II) increases drastically. Similarly, the mean of RRMSE (II) increases sharply when the data have an expected probability of zero of 0.8 to 0.9. However, when the expected probability of zero is 0.6 to 0.7, the mean of RRMSE (II) is negative because of the negative values of the EB estimates.
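The MSE (II) and RRMSE (II) adjustment can be sketched as below. The function names are hypothetical; the RRMSE follows the expression used in the appendix programs, the square root of the MSE divided by the EB estimate.

```python
import math


def mse_ii(mse):
    """Replace a negative jackknife MSE with zero."""
    return max(mse, 0.0)


def rrmse_ii(mse, theta_hat):
    """Root of the adjusted MSE divided by the EB estimate."""
    return math.sqrt(mse_ii(mse)) / theta_hat


print(rrmse_ii(-0.5, 1.2))   # negative MSE is treated as zero
print(rrmse_ii(0.25, 0.5))   # ordinary case
print(rrmse_ii(0.25, -0.5))  # a negative EB estimate gives a negative RRMSE
```

The last call illustrates why the mean of RRMSE (II) can still be negative after the adjustment: the sign comes from the EB estimate in the denominator, not from the MSE.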
Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
The EB estimates are similar to the estimates produced by the NBR method, although they are slightly outperformed by the NBR method when the data contain only a small number of zeros. In particular, as shown in Table 3, if the data have an expected probability of zero of 0.1 to 0.5, ZINB produces a bigger MSE for the EB estimator than NBR does.
Whereas if the data have an expected probability of zero of 0.6 to 0.7, ZINB gives better estimates. The estimates were also unbiased, as they cover the parameter values adequately. However, ZINB begins to produce inconsistent estimates if the data have an expected probability of zero of 0.8 or more, due to the enormous MSE.
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
The mean of MSE (II) with ZINB is bigger than the mean of MSE with ZINB. That is because, once the negative values of MSE are changed to zero, they no longer reduce the mean.
Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
EB estimates given by both the NBR and ZINB methods are similar for data with small numbers of zeros. However, the ZINB method produces a bigger MSE than NBR does as long as the expected probability of zero in the data does not exceed the 0.6 threshold.
But the ZINB method performs better if the data have an expected probability of zero of 0.6 to 0.7. In this case, the EB estimates given by the NBR method are unstable and inconsistent because of the estimates' negative values and a huge MSE that can be thousands of times larger than its acceptable value. On the other hand, the EB estimator with ZINB works well: it gives unbiased estimates and its MSE values are more stable than those of the EB estimates with NBR.
Both methods performed poorly when the data had an expected probability of zero of 0.8 or more. The EB estimators with both methods were inconsistent as a result of the very large MSE values they produced.
Table 3 MSE and RRMSE of EB Estimator with ZINB
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.45          0.17            0.24            0.14
0.2                   0.43          0.20            0.33            0.21
0.3                   0.71          0.28            0.52            0.32
0.4                   0.54          0.28            0.632           0.42
0.5                   0.86          0.33            7322807         0.66
0.6                   0.61          0.38            29817           103
0.7                   0.58          0.25            218119          194
0.8                   -128          -14E-07         162697          375
0.9                   2954790       -1E-06          35E+278         609508

Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.45          0.17            0.24            0.14
0.2                   0.436         0.20            0.324           0.21
0.3                   0.72          0.28            0.51            0.311
0.4                   0.55          0.28            0.61            0.41
0.5                   0.95          0.33            6561235         0.58
0.6                   0.75          0.38            23406           0.70
0.7                   150           0.25            134655          0.69
0.8                   175           0               733506          0
0.9                   2954908       0               12E+278         0
CONCLUSION
Excess zeros in the data strongly influenced the results of EB estimation. A conventional method such as negative binomial regression in prior estimation produced unbiased but unreliable EB estimators for data with an expected probability of zero of 0.6. This is shown by the large MSE and the negative values of the estimator.
Meanwhile, EB estimation by the ZINB method produced a more reliable estimator even when the data have an expected probability of zero of 0.6 to 0.7.
ZINB also provided a reliable estimator for data with less than 53.33% zeros. This means that the performance of ZINB declines when the data have an expected probability of zero of 0.8 or more, as shown by the large MSE and the inconsistent estimator.
RECOMMENDATION
This research is based on many assumptions and suffers from several limitations. If the assumptions and limitations can be relaxed, better results can be expected. There are some recommendations for further research:
1. The generating process in this research does not reflect the real sampling process. If the generating process were similar to the real sampling process, it might give better results because it would be closer to the real application.
2. It would be more interesting to run an experiment with a larger number of areas, since the number of areas influences the data modeling.
3. Restricted Maximum Likelihood may be applied when estimating the prior parameters with ZINB and NBR in order to solve the negative values of MSE.
4. Theoretical research on ZINB and the Empirical Bayes estimator is important for understanding the behavior of the ZINB parameter estimates in the Empirical Bayes setting.
REFERENCES
Erdman D, L Jackson, A Sinko. 2008. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. SAS Global Forum 2008, 322-2008. http://www2.sas.com/proceedings/forum2008/322-2008.pdf [25 August 2008]
Famoye F, KP Singh. 2006. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Journal of Data Science 4:117-130.
Hardin JW, JM Hilbe. 2007. Generalized Linear Models and Extensions. Texas: A Stata Press Publication.
Kurnia A, KA Notodiputro. 2006. Penerapan Metode Jackknife dalam pendugaan Area Kecil. Forum Statistika dan Komputasi, April 2006, p. 12-15.
Kismiantini. 2007. Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Thesis]. Bogor: Institut Pertanian Bogor, Fakultas Matematika dan Pengetahuan Alam.
McCullagh P, JA Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.
Ramsini B, et al. 2001. Uninsured Estimates by County: A Review of Options and Issues. http://www.odh.ohio.gov/Data/OFHSurv/ofhsrfq7.pdf [24 April 2008]
Rao JNK. 2003. Small Area Estimation. New York: John Wiley & Sons.
Wakefield J. 2006. Disease mapping and spatial regression with count data. http://www.bepress.com/uwbiostat/paper286.pdf [24 April 2008]
Appendix 1 Result of EB estimation with NBR
P(Y=0)  Actual       r    Statistic     min  Q1  median  mean  Q3  max
10      10-33.33     100  EB estimator  0.426011  1525665  3188832  4252666  5752756  205939
                          bias          0.000446  0.5164  0.878579  1315093  1721091  8704671
                          MSE           0.040547  0.109118  0.159448  0.333613  0.335256  4167064
                          RRMSE         0.041258  0.100045  0.1356  0.18188  0.220426  0.576793
20      13.33-36.67  100  EB estimator  0.342831  1013993  2218265  2984668  3953417  1815693
                          bias          0.000587  0.413611  0.79407  1100373  1454889  7906915
                          MSE           0.055631  0.131969  0.196963  0.353033  0.386291  3778251
                          RRMSE         0.070449  0.15421  0.205182  0.262006  0.352726  0.788718
30      20-53.33     100  EB estimator  0.323311  0.836545  1562163  2263684  2918741  1214482
                          bias          0.000151  0.372382  0.67041  0.916482  122012  5950225
                          MSE           0.074364  0.163462  0.231014  0.400207  0.432371  5250254
                          RRMSE         0.102324  0.214697  0.299247  0.361013  0.474077  1192032
40      23.33-56.67  100  EB estimator  0.24882  0.64963  1219656  17107  2248716  930007
                          bias          0.000564  0.293602  0.549809  0.757937  1007851  486688
                          MSE           -100569  0.194196  0.271669  0.41875  0.45917  3239598
                          RRMSE         0.123605  0.300339  0.422426  0.503566  0.642418  2202294
50      23.33-63.33  100  EB estimator  0.122548  0.570083  1028619  1291758  1728067  6750472
                          bias          0.00029  0.250747  0.453265  0.622838  0.803185  4009352
                          MSE           -237643  0.235733  0.306641  0.452955  0.5091  3652167
                          RRMSE         0.038956  0.412708  0.588924  0.717336  0.844735  3240156
60      30-70        100  EB estimator  -0.77338  0.44443  0.699758  0.944038  1131071  6323352
                          bias          0.000452  0.20433  0.398131  0.534095  0.679938  3848209
                          MSE           -749011  0.254097  0.330078  -12875  0.539873  2354887
                          RRMSE         -663045  0.51763  0.813734  -0.38057  1287528  1767434
70      43.33-73.33  100  EB estimator  -33274  0.249515  0.442513  0.659375  0.922519  9258959
                          bias          0.000375  0.155154  0.316124  0.476883  0.588926  8475103
                          MSE           -7513075  0.235378  0.402092  2536714  0.876569  6051162
                          RRMSE         -10741  0.704796  1355566  -121606  3040291  3332419
80      53.33-90     100  EB estimator  -232889  0.17621  0.305365  0.569959  0.576346  6303601
                          bias          0.000395  0.116669  0.254473  1091172  0.497898  6297454
                          MSE           -6E+09  -0.16583  0.301527  -584495  5718409  185E+09
                          RRMSE         -212936  0.927338  2115163  3094627  1359703  4151289
90      70-100       100  EB estimator  -108767  0.111208  0.230315  0.212247  0.353129  3625557
                          bias          0.00016  0.086  0.177169  0.425532  0.314714  1092655
                          MSE           -38E+09  -130817  0.159682  39135606  3074073  12E+11
                          RRMSE         -909131  1647188  6639631  116E+10  1585472  706E+11
Appendix 2 Result of EB estimation with ZINB
P(Y=0)  Actual       r    Statistic     min  Q1  median  mean  Q3  max
10      10-33.33     100  EB estimator  0.267752  1500256  3195861  4280907  5833922  2220705
                          bias          0.000603  0.485515  0.882468  1315721  1750173  8704672
                          MSE           -0.53954  0.10933  0.168797  0.449506  0.369775  360843
                          RRMSE         0.022947  0.096443  0.136424  0.238099  0.241955  5518468
20      13.33-36.67  100  EB estimator  105E-08  0.914898  221594  3017228  401361  1815694
                          bias          0.000368  0.383426  0.780984  1105029  1496623  7906918
                          MSE           -0.7309  0.126202  0.201463  0.425844  0.414597  1734815
                          RRMSE         0.021807  0.144983  0.210692  0.326097  0.401786  3177943
30      20-53.33     100  EB estimator  0.132041  0.719086  1523909  2308745  3012309  1228058
                          bias          0.000508  0.332427  0.680187  0.928947  1254604  6314973
                          MSE           -229891  0.156942  0.277017  0.707983  0.590466  7469014
                          RRMSE         0.023998  0.210095  0.317195  0.519524  0.618802  3500387
40      23.33-56.67  100  EB estimator  105E-08  0.574265  1209034  1742928  2368713  104953
                          bias          0.000564  0.268248  0.544049  0.771741  1067061  4889872
                          MSE           -125713  0.181557  0.284338  0.540615  0.498521  423089
                          RRMSE         0.054916  0.28362  0.420396  0.630776  0.778033  5394515
50      23.33-63.33  100  EB estimator  105E-08  0.426701  1033816  133848  1906961  8018962
                          bias          0.000453  0.224726  0.454522  0.661709  0.900005  4414442
                          MSE           -181856  0.194818  0.334706  0.859252  0.711939  7997074
                          RRMSE         0.026206  0.387294  0.662251  7322807  1312302  13388294
60      30-70        100  EB estimator  105E-08  0.30085  0.645848  0.985327  1154975  728326
                          bias          62E-05  0.190886  0.406245  0.567657  0.74167  3923952
                          MSE           -34589  0.078006  0.376514  0.60793  0.804116  3426488
                          RRMSE         0.00461  0.502807  1033578  2981671  2012552  3308816
70      43.33-73.33  100  EB estimator  105E-08  105E-08  0.341315  0.677841  1  5005491
                          bias          979E-05  0.128017  0.358257  0.487174  0.654423  3733981
                          MSE           -142213  -0.01433  0.255331  0.584152  1132152  264456
                          RRMSE         0.064209  0.847956  1942286  2181192  4589042  7899681
80      53.33-90     100  EB estimator  105E-08  105E-08  0.142906  0.445315  0.859305  5
                          bias          0.000161  0.083397  0.272773  0.392826  0.557213  3532556
                          MSE           -10651  -56E-05  -14E-07  -127819  1452962  1132741
                          RRMSE         0.063244  1475413  3754705  162697  9221163  3786684
90      70-100       100  EB estimator  1E-277  105E-08  105E-08  0.225165  0.135512  3
                          bias          0.000495  0.054221  0.153374  0.27819  0.350213  2736904
                          MSE           -175652  -33E-05  -1E-06  2954790  152E-06  613E+08
                          RRMSE         0.040681  4059441  6095076  35E+278  5569021  16E+281
Appendix 3 Result of EB estimation (II) with NBR
P(Y=0)  Actual       r    Statistic     min  Q1  median  mean  Q3  max
10      10-33.33     100  EB estimator  0.426011  1525665  3188832  4252666  5752756  205939
                          bias          0.000446  0.5164  0.878579  1315093  1721091  8704671
                          MSE           0.040547  0.109118  0.159448  0.333613  0.335256  4167064
                          RRMSE         0.041258  0.100045  0.1356  0.18188  0.220426  0.576793
20      13.33-36.67  100  EB estimator  0.342831  1013993  2218265  2984668  3953417  1815693
                          bias          0.000587  0.413611  0.79407  1100373  1454889  7906915
                          MSE           0.055631  0.131969  0.196963  0.353033  0.386291  3778251
                          RRMSE         0.070449  0.15421  0.205182  0.262006  0.352726  0.788718
30      20-53.33     100  EB estimator  0.323311  0.836545  1562163  2263684  2918741  1214482
                          bias          0.000151  0.372382  0.67041  0.916482  122012  5950225
                          MSE           0.074364  0.163462  0.231014  0.400207  0.432371  5250254
                          RRMSE         0.102324  0.214697  0.299247  0.361013  0.474077  1192032
40      23.33-56.67  100  EB estimator  0.24882  0.64963  1219656  17107  2248716  930007
                          bias          0.000564  0.293602  0.549809  0.757937  1007851  486688
                          MSE           0  0.194196  0.271669  0.419181  0.45917  3239598
                          RRMSE         0  0.300116  0.422209  0.502895  0.641904  2202294
50      23.33-63.33  100  EB estimator  0.122548  0.570083  1028619  1291758  1728067  6750472
                          bias          0.00029  0.250747  0.453265  0.622838  0.803185  4009352
                          MSE           0  0.235733  0.306641  0.456258  0.5091  3652167
                          RRMSE         0  0.410357  0.585765  0.712314  0.841838  3240156
60      30-70        100  EB estimator  -0.77338  0.44443  0.699758  0.944038  1131071  6323352
                          bias          0.000452  0.20433  0.398131  0.534095  0.679938  3848209
                          MSE           0  0.254097  0.330078  2619677  0.539873  2354887
                          RRMSE         -663045  0.448118  0.750369  -0.34911  1209918  1767434
70      43.33-73.33  100  EB estimator  -33274  0.249515  0.442513  0.659375  0.922519  9258959
                          bias          0.000375  0.155154  0.316124  0.476883  0.588926  8475103
                          MSE           0  0.235378  0.402092  9500073  0.876569  6051162
                          RRMSE         -10741  0.288999  0.995659  -100163  2527784  3332419
80      53.33-90     100  EB estimator  -232889  0.17621  0.305365  0.569959  0.576346  6303601
                          bias          0.000395  0.116669  0.254473  1091172  0.497898  6297454
                          MSE           0  0  0.301527  1444250  5718409  185E+09
                          RRMSE         -212936  0  1104113  2205437  5656681  4151289
90      70-100       100  EB estimator  -108767  0.111208  0.230315  0.212247  0.353129  3625557
                          bias          0.00016  0.086  0.177169  0.425532  0.314714  1092655
                          MSE           0  0  0.159682  41595285  3074073  12E+11
                          RRMSE         -909131  0  0.557622  677E+09  9311925  706E+11
Appendix 4 Result of EB estimation (II) with ZINB
P(Y=0)  Actual       r    Statistic     min  Q1  median  mean  Q3  max
10      10-33.33     100  EB estimator  0.267752  1500256  3195861  4280907  5833922  2220705
                          bias          0.000603  0.485515  0.882468  1315721  1750173  8704672
                          MSE           0  0.10933  0.168797  0.450626  0.369775  360843
                          RRMSE         0  0.095932  0.135647  0.23675  0.239669  5518468
20      13.33-36.67  100  EB estimator  105E-08  0.914898  221594  3017228  401361  1815694
                          bias          0.000368  0.383426  0.780984  1105029  1496623  7906918
                          MSE           0  0.126202  0.201463  0.428006  0.414597  1734815
                          RRMSE         0  0.142648  0.20709  0.320663  0.395479  3177943
30      20-53.33     100  EB estimator  0.132041  0.719086  1523909  2308745  3012309  1228058
                          bias          0.000508  0.332427  0.680187  0.928947  1254604  6314973
                          MSE           0  0.156942  0.277017  0.716543  0.590466  7469014
                          RRMSE         0  0.203913  0.311937  0.506882  0.615401  3500387
40      23.33-56.67  100  EB estimator  105E-08  0.574265  1209034  1742928  2368713  104953
                          bias          0.000564  0.268248  0.544049  0.771741  1067061  4889872
                          MSE           0  0.181557  0.284338  0.549835  0.498521  423089
                          RRMSE         0  0.270309  0.405926  0.606317  0.766631  5394515
50      23.33-63.33  100  EB estimator  105E-08  0.426701  1033816  133848  1906961  8018962
                          bias          0.000453  0.224726  0.454522  0.661709  0.900005  4414442
                          MSE           0  0.194818  0.334706  0.94973  0.711939  7997074
                          RRMSE         0  0.316402  0.576343  6561235  1240175  13388294
60      30-70        100  EB estimator  105E-08  0.30085  0.645848  0.985327  1154975  728326
                          bias          62E-05  0.190886  0.406245  0.567657  0.74167  3923952
                          MSE           0  0.078006  0.376514  0.749436  0.804116  3426488
                          RRMSE         0  0.258286  0.698814  2340612  1714808  3308816
70      43.33-73.33  100  EB estimator  105E-08  105E-08  0.341315  0.677841  1  5005491
                          bias          979E-05  0.128017  0.358257  0.487174  0.654423  3733981
                          MSE           0  0  0.255331  1501268  1132152  264456
                          RRMSE         0  0  0.688797  1346552  2500825  7899681
80      53.33-90     100  EB estimator  105E-08  105E-08  0.142906  0.445315  0.859305  5
                          bias          0.000161  0.083397  0.272773  0.392826  0.557213  3532556
                          MSE           0  0  0  1755486  1452962  1132741
                          RRMSE         0  0  0  7335062  3311711  3786684
90      70-100       100  EB estimator  1E-277  105E-08  105E-08  0.225165  0.135512  3
                          bias          0.000495  0.054221  0.153374  0.27819  0.350213  2736904
                          MSE           0  0  0  2954908  152E-06  613E+08
                          RRMSE         0  0  0  12E+278  416189  16E+281
Appendix 5 Syntax program for generating data

data b; /* generate x1 (covariate) and ei */
input x1;
cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate poisson-gamma with excess zero */
do kk=1 to 30;
set b;
tetha = rangam(1,1);
lambda = -log(0.1); /* desired probability of a zero (0.1-0.9) */
starlambda = log(lambda/tetha);
output;
end;
run;

proc reg;
model starlambda = x1;
ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;
data _null_;
set work.betha_lr_t;
call symput('Intercept', col1);
call symput('x1', col2);
run;

data d;
do kk=1 to 30;
set e;
mu = exp(&Intercept + &x1*x1);
parmlambda = mu*tetha;
ypoi = rand('poisson', parmlambda);
output;
end;
run;

ods trace on; /* to take percent zero on data */
proc freq data=d;
tables ypoi;
ods output OneWayFreqs=work.zero;
run;
data zero;
set zero;
keep percent;
run;
proc transpose data=zero out=zero1;
run;
data _null_;
set work.zero1;
call symput('pctz', col1);
run;
data d;
set d;
pzero=&pctz;
r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
Appendix 6 Syntax program EB with NBR

%macro sae_nb;
%do x=1 %to 900;

data work.a;
set work.e;
if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
model ypoi = x1 / dist=nb link=log;
ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
set work.betha_nb_t;
call symput('Intercept', col1);
call symput('x1', col2);
call symput('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
set a;
mu_hat_b=exp(&Intercept + &x1*x1);
w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
set work.duga_nb1;
if ^(u=&x) then delete;
run;
data work.jacknb&h;
set work.d;
if u=&x;
if kk=&h then delete;
run;

proc genmod data=work.jacknb&h; /* output p out=sasyi_est */
model ypoi = x1 / dist = nb link=log;
ods output parameterestimates=work.betha_est_nb&h (keep=parameter Estimate);
run;
proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;
data _null_;
set work.betha_est_nbt&h;
call symput('Intercept_', col1);
call symput('x1_', col2);
call symput('Dispersion_', col3);
run;

data work.duganb&h;
set work.d;
mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + &Dispersion_);
teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5 work.duganb6
      work.duganb7 work.duganb8 work.duganb9 work.duganb10 work.duganb11 work.duganb12
      work.duganb13 work.duganb14 work.duganb15 work.duganb16 work.duganb17
      work.duganb18 work.duganb19 work.duganb20 work.duganb21 work.duganb22
      work.duganb23 work.duganb24 work.duganb25 work.duganb26 work.duganb27
      work.duganb28 work.duganb29 work.duganb30;
by kk;
run;

data work.mse_nb_j;
set work.mse_nb_j;
t_sum=0;
g_sum=0;
%do j=1 %to 30;
g_sum=g_sum+beda_g_&j;
t_sum=t_sum+delta_&j;
%end;
m2i=((30-1)/30)*t_sum;
m1i=g1-((30-1)/30)*g_sum;
mse_j_b=m1i+m2i;
rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
merge work.d work.mse_nb_j;
keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb BASE=work.hasilnb1 appendver=v6;
run;

%END;
%mend;

%sae_nb;
Appendix 7 Syntax program EB with ZINB

%macro sae_zinb;

%do x=1 %to 900;

data work.a;
set work.e;
if ^(u=&x) then delete;
run;

proc countreg data=a;
model ypoi=x1 / dist=zinb method=qn;
zeromodel ypoi ~ x1;
ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
set work.pe_t;
call symput('Intercept', col1);
call symput('x1', col2);
call symput('Inf_Intercept', col3);
call symput('Inf_x1', col4);
call symput('_Alpha', col5);
run;

data work.duga;
set a;
mu_hat_b=exp(&Intercept + &x1*x1);
w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga base=work.duga1;
run;

%do h=1 %to 30;

data work.d;
set work.duga1;
if ^(u=&x) then delete;
run;
data work.jackzinb&h;
set work.d;
if u=&x;
if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
model ypoi=x1 / dist=zinb method=qn;
zeromodel ypoi ~ x1;
ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
set work.betha_est_ZINBt&h;
call symput('Intercept', col1);
call symput('x1', col2);
call symput('Inf_Intercept', col3);
call symput('Inf_x1', col4);
call symput('_Alpha', col5);
run;

data work.dugaZINB&h;
set work.d;
mu_hat_b_&h=exp(&Intercept + &x1*x1);
/* mu_hat_b_&h= &b_o- + &b_1- x1 */
w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + (&_Alpha));
teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
/* g1_&h =((mu_hat_b_&h2&alpha_)2)(&alpha_+y_i)((mu_hat_b_&h2&alpha_)+mu_hat_b_&h)2 */
g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
/* g1_&h =(A2)(&k- + y_i)( a +mu_hat_b)2 */
beda_g_&h=g1_&h-g1;
run;

data work.mse_ZINB_j;
merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5 work.dugaZINB6
      work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10 work.dugaZINB11 work.dugaZINB12
      work.dugaZINB13 work.dugaZINB14 work.dugaZINB15 work.dugaZINB16 work.dugaZINB17
      work.dugaZINB18 work.dugaZINB19 work.dugaZINB20 work.dugaZINB21 work.dugaZINB22
      work.dugaZINB23 work.dugaZINB24 work.dugaZINB25 work.dugaZINB26 work.dugaZINB27
      work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
by kk;
run;

data work.mse_ZINB_j;
set work.mse_ZINB_j;
t_sum=0;
g_sum=0;
%do j=1 %to 30;
g_sum=g_sum+beda_g_&j;
t_sum=t_sum+delta_&j;
%end;
m2i=((30-1)/30)*t_sum;
m1i=g1-((30-1)/30)*g_sum;
mse_j_b=m1i+m2i;
rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
merge work.d work.mse_ZINB_j;
keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;
ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB BASE=work.hasilZINB1;
run;

%END;
%mend;

%sae_zinb;
BIOGRAPHY
Irene Muflikh Nadhiroh was born in Padang on October 3rd, 1986, as the first daughter of Ir Irianto Oetomo and Fine Analisa Maharani. She has two siblings.
In 1998 she graduated from SD Dukuh 09, East Jakarta, and then continued her studies at SLTP Negeri 1 Bogor, graduating in 2001. She finished her studies at SMU Labschool Rawamangun, Jakarta, in 2004 and then enrolled in Bogor Agricultural University through USMI. In 2004 she joined the Department of Statistics, Faculty of Mathematics and Natural Sciences.
During her time of study she served as a lecturer assistant for the Basic Statistics class and the Experimental Design class in 2006 and 2007, respectively. She was also a member of Gamma Sigma Beta (Statistics Students Association, GSB) and was head of the science division of GSB in 2006-2007. In February-March 2008 she completed her field practice at PT Field Dimension Indonesia.
ACKNOWLEDGEMENTS
First of all, the author modestly admits that the completion of this paper would not have been possible without invaluable help from many generous and extraordinary people. The author is deeply indebted to them for their help, ideas, criticism and advice during the writing process. However, they should not be held responsible for any mistakes and deficiencies in this paper, which are purely the author's. So hereby I would like to express my gratitude to:
1. All praise and gratitude to Allah SWT. Alhamdulillah hirabbil alamin. With His blessing I was able to finish this paper. Thanks to Allah for giving me a wonderful life with extraordinary people around me.
2. Prof Dr Khairil Anwar Notodiputro and Ir Indahwati MSi, for the early motivation, discussion, advice, support and their great enthusiasm.
3. Mr Bambang Sumantri MSi as examiner, thanks for the spirit, advice and criticism.
4. My beloved family, for the unlimited love ever after.
5. Mr Alfian Futuhul Hadi MSi, for enlightening discussion when I was in trouble.
6. Mr Bagus Sartono MSi, thank you very much for running my data at your lab with your wonderful computer. Sorry if it disturbed you.
7. Mr Anang MSi and Mr Rahman MSi, for sharing their knowledge and technical support.
8. Mr Dr Ir Hari Wijayanto MS and all lecturers and staff at the Statistics Department. Thanks for the knowledge of statistics and the knowledge of life that you shared. It means a lot to me.
9. Rahmatullah Sigit Dodiet Sasongko SSi, for the spirit, love, care, time and patience. Keep it real. Still love me forever and ever.
10. Mr Dionisius Laksmana Bisara Putra SSi, for editing my paper, for his criticism and for the useful discussions.
11. Maulana Chistanto SSi and Yhanuar Ismail SSi, thanks for being my best brothers.
12. Nikhen, Sevrien and (the late) Dini, thanks for lighting my day.
13. Rere, Yusri, Agus, Ika, Cinong, Toki, Cheri, Fisca, Wiwik, Neng, Mala, Lilis, Dika, Rangga, Lele, Dodi, Kus, Inal, Bebek, Koler and all of Statistics' 41.
14. Everyone who helped me in this study and cannot be named personally.
This thesis is not perfect, so I welcome criticism, advice and recommendations from the people who read it. Thank you. God bless you all.
Bogor January 2009
Irene Muflikh Nadhiroh
TABLE OF CONTENTS
INTRODUCTION
  Background
  Objectives
LITERATURE REVIEW
  Direct Estimation
  Small Area Estimation
  Small Area Models
  Empirical Bayes Methods
  Poisson-Gamma Models
  Negative Binomial Regression
  Over-dispersion in Count Data
  Zero-Inflated Models
  Zero-Inflated Negative Binomial
  Jackknife Method of Estimating MSE($\hat\theta_i^{EB}$)
METHODOLOGY
  Data
  Methods
RESULT AND DISCUSSION
  Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
  Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
  Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
CONCLUSION
RECOMMENDATION
REFERENCES

LIST OF TABLES
Table 1 MSE and RRMSE of EB Estimator with NBR
Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
Table 3 MSE and RRMSE of EB Estimator with ZINB
Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB

LIST OF APPENDICES
Appendix 1 Result of EB estimation with NBR
Appendix 2 Result of EB estimation with ZINB
Appendix 3 Result of EB estimation (II) with NBR
Appendix 4 Result of EB estimation (II) with ZINB
Appendix 5 Syntax program for generating data
Appendix 6 Syntax program EB with NBR
Appendix 7 Syntax program EB with ZINB
INTRODUCTION
Background
Direct estimation is usually applied in large-scale surveys, but it is sometimes difficult to use such an estimator in a smaller region, especially when the sample size is too small. In this case, indirect estimation, which adds covariates to estimate the parameter, is usually used. This type of estimation is broadly known as Small Area Estimation.
Kismiantini (2007) conducted research on Small Area Estimation based on Poisson-Gamma models. Maximum Likelihood Estimation was used with Negative Binomial Regression techniques to estimate the respective prior parameters. Moreover, Negative Binomial Regression was used to resolve the over-dispersion problem in the data.
In reality, count data are characterized not only by over-dispersion but sometimes also by excess zeros. Excess zeros is a condition in which the data contain too many zeros, more than the distribution expects. For example, among 100 observations from a Poisson model with a response mean of 4, we would expect about 2 zeros. If the data have 30 zeros, it should be obvious that the distributional assumptions have been violated. Therefore the estimated parameters and standard errors will be biased (Hardin & Hilbe 2007). In this paper, zero-inflated models were adapted to solve this type of problem.
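The expected number of zeros quoted for the Poisson example follows from the Poisson probability of a zero, $e^{-\mu}$. A short check in Python, with variable names chosen only for illustration:

```python
import math

n_obs, mean = 100, 4.0
p_zero = math.exp(-mean)          # Poisson probability of observing a zero
expected_zeros = n_obs * p_zero   # expected count of zeros among n_obs observations
print(round(expected_zeros, 2))   # about 1.83, i.e. roughly 2 zeros
```

Observing 30 zeros in such a sample is therefore more than fifteen times what the Poisson model expects.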
Objectives
The research objectives are:
1. To investigate the performance of Negative Binomial Regression in Small Area Estimation in the case of excess zeros.
2. To apply Zero-Inflated Count Models in Small Area Estimation in the case of excess zeros.
3. To evaluate the performance of Zero-Inflated Count Models in estimating the prior parameters for Small Area Estimation.
LITERATURE REVIEW
Direct Estimation
Direct estimates are generally "design based" in the sense that they make use of "survey weights", and the associated inferences are based on the probability distribution induced by the sample design, with the population values held fixed (Rao 2003). In particular, direct estimates of a domain parameter are based only on the domain-specific sample data.
Data from sample surveys have long been used to obtain reliable estimates of parameters. Ramsini et al. (2001) noted that direct estimates for small areas are unbiased, although they have large variance because of the small sample size.
Small Area Estimation
The term small area can refer to many things, depending on the object of interest: a city, an age group, a sex group, a region, or a rural district. In general, small area denotes any domain for which direct estimates of adequate precision cannot be produced (Rao 2003), because the sample size in the area is too small. Small area estimation was therefore developed as a statistical technique for estimating small area parameters with an adequate level of precision. It works as indirect estimation that borrows strength from the values of the variable of interest in related areas through supplementary information related to that variable, such as recent census counts and current administrative records (Rao 2003). Indirect estimation estimates a domain's parameter by connecting the information in that domain with other domains through an appropriate model, so the estimator includes data from other domains (Kurnia & Notodiputro 2006).
Small Area Models
There are two kinds of link models in indirect estimation. The first is the traditional method based on implicit models, which provide a link relating small areas through supplementary data. The second consists of explicit small area models that make specific allowance for between-area variation (Rao 2003). This research used the second kind, which can be classified into two broad types of basic model.
1. Basic area level (type A) model
The basic area level model, or aggregate model, includes all models that relate small area parameters to area-specific auxiliary variables. These models are essential if unit (element) level data are not available. The parameter $\theta_i$ is assumed to be related to the area-specific auxiliary data, or covariates, $x_i = (x_{1i}, \ldots, x_{pi})^T$ by the linear model
$$\theta_i = x_i^T \beta + b_i v_i, \quad i = 1, \ldots, m,$$
where $v_i \sim N(0, \sigma_v^2)$ are area-specific random effects, $\beta = (\beta_1, \ldots, \beta_p)^T$ is the $p \times 1$ vector of regression coefficients, and the $b_i$ are known positive constants. For making inferences about $\theta_i$, direct estimators $y_i$ are assumed to be available, with
$$y_i = \theta_i + e_i, \quad i = 1, \ldots, m,$$
where the sampling errors $e_i \sim N(0, \sigma_{ei}^2)$ and the $\sigma_{ei}^2$ are known. Combining both models gives
$$y_i = x_i^T \beta + b_i v_i + e_i, \quad i = 1, \ldots, m$$
(Rao 2003).
2. Basic unit level (type B) model
The unit level model includes all models that relate unit values of the study variable to unit-specific auxiliary variables. With unit-specific auxiliary variables $x_{ij} = (x_{ij1}, \ldots, x_{ijp})^T$, the corresponding nested regression model is
$$y_{ij} = x_{ij}^T \beta + v_i + e_{ij}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n_i,$$
with $v_i \sim N(0, \sigma_v^2)$ and $e_{ij} \sim N(0, \sigma_{ei}^2)$.
Empirical Bayes Methods
The Bayesian approach is based on Bayes' Law, which was discovered by Thomas Bayes. The law was introduced by Richard Price in 1763, two years after Thomas Bayes passed away. In 1774 and 1781, Laplace gave the details and their relevance for modern Bayesian statistics (Gill 2002 in Kismiantini 2007).
Novick in Good (1980) mentioned that the Bayes method is difficult to adopt and is sometimes very sensitive, because it requires prior probability information, which is usually difficult to obtain. Robbins (1955) introduced Empirical Bayes methods, which assume a particular prior distribution and estimate it from the sample. Rao (2003) stated that the EB (Empirical Bayes) and HB (Hierarchical Bayes) methods are suitable for binary and count data in Small Area Estimation. Therefore, the EB method was used in this research.
Rao (2003) summarized EB methods in Small Area Estimation as follows:
1. Obtain the posterior probability density function of the small area parameter.
2. Estimate the parameters from the marginal density function.
3. Use the estimated posterior density for inferences regarding the parameters of interest.
Poisson-Gamma Models
The Poisson model is the standard model for count data. Count data, however, often suffer from over-dispersion, so extensions of the Poisson model have been developed to accommodate extra variance in the sample data. Two-stage models for count data, known as Poisson-Gamma mixed models, have been introduced. Wakefield (2006) introduced a Poisson-Gamma model that is easy to use, with the SMR (Standardized Mortality Ratio) as the direct estimator. This study used the Wakefield model with a different direct estimator.
Let $y_i$ be the number of individuals in small area $i$ that have the characteristic of interest, written as
$$y_i = \sum_j y_{ij},$$
where $y_{ij}$ is the $j$-th object in the $i$-th small area, $j = 1, \ldots, n$ and $i = 1, \ldots, m$.
In the first stage, $y_i \mid \theta_i \overset{ind}{\sim} \text{Poisson}(\mu_i \theta_i)$ is assumed, where $\mu_i = \mu(x_i, \beta)$ describes a regression model at area level, $x_i$ is a vector of covariates, and $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of regression coefficients.
In the second stage, $\theta_i \overset{iid}{\sim} \text{gamma}(\alpha, \alpha)$ is assumed as the prior distribution, with mean 1 and variance $1/\alpha$. The marginal distribution of $y_i \mid \beta, \alpha$ is then negative binomial.
Wakefield (2006) used Bayes' Theorem to obtain the posterior distribution
$$\theta_i \mid y_i, \beta, \alpha \sim \text{gamma}(y_i + \alpha, \mu_i + \alpha)$$
and the EB estimator
$$\hat{\theta}_i^{EB} = \hat{\theta}_i^{B}(\hat{\beta}, \hat{\alpha}) = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i) \hat{\mu}_i,$$
with $\hat{\gamma}_i = \hat{\mu}_i / (\hat{\alpha} + \hat{\mu}_i)$, where $\hat{\theta}_i$ is the direct estimate of $\theta_i$ and $y_i$ is the number of observations.
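The shrinkage form of the EB estimator can be sketched as follows. This is a minimal illustration, assuming the direct estimate is the observed count $y_i$ (as in the program of Appendix 6) and that $\hat{\mu}_i$ and $\hat{\alpha}$ have already been estimated; the function name `eb_estimate` and the example values are hypothetical.

```python
def eb_estimate(y: float, mu_hat: float, alpha_hat: float):
    """Poisson-gamma EB shrinkage for one area.

    Returns (gamma, theta_eb, post_var), where
      gamma    = mu_hat / (alpha_hat + mu_hat)            (shrinkage weight)
      theta_eb = gamma * y + (1 - gamma) * mu_hat         (EB estimate)
      post_var = (alpha_hat + y) / (alpha_hat + mu_hat)^2 (variance of the
                 gamma(y + alpha, mu + alpha) posterior of theta_i)
    """
    gamma = mu_hat / (alpha_hat + mu_hat)
    theta_eb = gamma * y + (1.0 - gamma) * mu_hat
    post_var = (alpha_hat + y) / (alpha_hat + mu_hat) ** 2
    return gamma, theta_eb, post_var

# Assumed illustrative values: y = 3, mu_hat = 2, alpha_hat = 1
gamma, theta_eb, post_var = eb_estimate(3.0, 2.0, 1.0)
print(gamma, theta_eb, post_var)
```

A large $\hat{\mu}_i$ relative to $\hat{\alpha}$ pushes the weight toward the direct estimate, while a large $\hat{\alpha}$ (little extra-Poisson variation) pulls the estimate toward the regression prediction.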
Negative Binomial Regression
The negative binomial regression model seems to have been first discussed by Anscombe (1972). Others have pointed out its success in dealing with over-dispersed count data. Lawless (1987) elaborated the mixture model parameterization of the negative binomial, providing formulas for its log likelihood, mean, variance, and moments. Later, Breslow (1990) cited Lawless' work, and from its inception to the late 1980s the negative binomial regression model was construed as a mixture model useful for accommodating otherwise over-dispersed Poisson data (Hardin & Hilbe 2007). The negative binomial distribution function is written as
$$g(y \mid x) = \frac{\Gamma(y + k^{-1})}{\Gamma(k^{-1})\,\Gamma(y + 1)} \left( \frac{k\mu}{1 + k\mu} \right)^{y} \left( \frac{1}{1 + k\mu} \right)^{1/k},$$
where $y = 0, 1, 2, \ldots$, and $k$ and $\mu$ are the negative binomial parameters, with $E(y) = \mu$ and $\text{var}(y) = \mu + k\mu^2$. The parameter $k$ is called the dispersion parameter; $k > 0$ indicates that the data are over-dispersed.
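The stated mean and variance can be checked numerically from the probability function. The sketch below assumes the mean-dispersion parameterization with $E(y) = \mu$ and $\text{var}(y) = \mu + k\mu^2$; the function name `nb_pmf` and the parameter values are illustrative only.

```python
import math

def nb_pmf(y: int, mu: float, k: float) -> float:
    """Negative binomial probability with mean mu and dispersion k (k > 0)."""
    r = 1.0 / k
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + y * math.log(k * mu / (1.0 + k * mu))
             - r * math.log(1.0 + k * mu))
    return math.exp(log_p)

mu, k = 2.0, 0.5  # assumed example values
# Sum over a range wide enough that the remaining tail mass is negligible
probs = [nb_pmf(y, mu, k) for y in range(400)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(total, mean, var)  # close to 1, mu = 2 and mu + k*mu^2 = 4
```

The variance exceeds the mean by $k\mu^2$, which is the extra-Poisson variation the dispersion parameter captures.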
Over-dispersion in Count Data
Count data modeled by Poisson regression are over-dispersed if the variance is larger than the mean, that is, if the observed variance exceeds the variance expected under the Poisson model. This phenomenon is written as
$$\text{Var}(y_i) > E(y_i)$$
(McCullagh & Nelder 1989).
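A small simulation can make the inequality concrete. The sketch below draws counts from a Poisson-gamma mixture and compares the sample variance with the sample mean. It uses only the Python standard library; the helper names, the seed, and the values $\mu = 4$ and $\alpha = 1$ are assumptions chosen for illustration.

```python
import math
import random

def poisson_draw(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) value with Knuth's method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_poisson_gamma(n: int, mu: float, alpha: float, seed: int = 1):
    """y | theta ~ Poisson(mu * theta), theta ~ gamma with mean 1, variance 1/alpha."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        theta = rng.gammavariate(alpha, 1.0 / alpha)  # shape alpha, scale 1/alpha
        ys.append(poisson_draw(rng, mu * theta))
    return ys

ys = simulate_poisson_gamma(5000, mu=4.0, alpha=1.0)
mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / (len(ys) - 1)
print(mean_y, var_y)  # the variance is clearly larger than the mean
```

Under these settings the theoretical variance is $\mu + \mu^2/\alpha = 20$ against a mean of 4, so the sample shows marked over-dispersion.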
Zero-Inflated Models
Zero-Inflated models consider two distinct sources of zero outcomes. One source consists of individuals who do not enter the counting process; the other consists of those who do enter the counting process but produce a zero outcome (Hardin & Hilbe 2007).
Lambert (1992) first described this type of mixture model in the context of process control in manufacturing. It has since been used in many applications and is now discussed in nearly every book or article dealing with count response models.
For the zero-inflated model, the probability of observing a zero outcome equals the probability that an individual is in the always-zero group plus the probability that the individual is not in that group times the probability that the counting process produces a zero. If $B(0)$ is the probability that the binary process results in a zero outcome and $\Pr(0)$ is the probability that the counting process produces a zero outcome, the probability of a zero outcome for the system is (Hardin & Hilbe 2007)
$$\Pr(y = 0) = B(0) + [1 - B(0)] \Pr(0).$$
The probability of a nonzero count is
$$\Pr(y = k;\ k > 0) = [1 - B(0)] \Pr(k).$$
This model produces two groups of parameters. The first consists of the zero-inflation parameters, which show whether the covariates contribute significantly to producing zero outcomes. The second consists of the negative binomial parameters, which model the response with the covariates.
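The two formulas above can be written as one small function. In this sketch the counting process is a Poisson distribution with mean 4, chosen only to keep the example short; the names `zero_inflated_pmf` and `poisson_pmf` and the value $B(0) = 0.3$ are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float = 4.0) -> float:
    """Counting-process probability Pr(k); Poisson is an assumed example."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zero_inflated_pmf(k: int, b0: float, count_pmf=poisson_pmf) -> float:
    """Pr(y = k) for a zero-inflated model with always-zero probability b0."""
    if k == 0:
        return b0 + (1.0 - b0) * count_pmf(0)
    return (1.0 - b0) * count_pmf(k)

p_zero = zero_inflated_pmf(0, b0=0.3)
total = sum(zero_inflated_pmf(k, b0=0.3) for k in range(60))
print(p_zero, total)  # about 0.313 and 1.0
```

With $B(0) = 0.3$ the zero probability rises from about 0.018 under the plain Poisson model to about 0.313, while the probabilities still sum to one.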
Zero-Inflated Negative Binomial
There are many kinds of zero-inflated models; each has advantages and disadvantages and suits a different type of data. The Zero-Inflated Negative Binomial model is one of them and is used for data with both over-dispersion and excess zeros. Consequently, its parameter estimates include a parameter $k$ that indicates over-dispersion in the data, just like the dispersion parameter in negative binomial regression.
The probability distribution of this model is
$$P(Y_i = y_i \mid x_i) = \begin{cases} \varphi_i + (1 - \varphi_i)\, g(0 \mid x_i), & y_i = 0 \\ (1 - \varphi_i)\, g(y_i \mid x_i), & y_i > 0 \end{cases}$$
where $\varphi_i$ is a function of $z_i^T \gamma$, $z_i$ is the vector of zero-inflation covariates, and $\gamma$ is the vector of zero-inflation coefficients to be estimated. Meanwhile, $g(y_i \mid x_i)$ is the negative binomial probability distribution, written as
$$g(y_i \mid x_i) = \frac{\Gamma(y_i + k^{-1})}{\Gamma(k^{-1})\,\Gamma(y_i + 1)} \left( \frac{k\mu_i}{1 + k\mu_i} \right)^{y_i} \left( \frac{1}{1 + k\mu_i} \right)^{1/k}.$$
The mean and variance of the ZINB are
$$E(y_i \mid x_i) = \mu_i (1 - \varphi_i),$$
$$V(y_i \mid x_i) = \mu_i (1 - \varphi_i)\,(1 + \mu_i(\varphi_i + k)).$$
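The ZINB mean and variance can be verified numerically by summing over the probability function. The sketch below assumes a constant zero-inflation probability $\varphi$ and the mean-dispersion form of the negative binomial; the function names and the values $\mu = 2$, $k = 0.5$, $\varphi = 0.3$ are illustrative assumptions.

```python
import math

def nb_pmf(y: int, mu: float, k: float) -> float:
    """Negative binomial probability with mean mu and dispersion k."""
    r = 1.0 / k
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + y * math.log(k * mu / (1.0 + k * mu))
             - r * math.log(1.0 + k * mu))
    return math.exp(log_p)

def zinb_pmf(y: int, mu: float, k: float, phi: float) -> float:
    """ZINB probability: extra mass phi at zero, NB elsewhere scaled by 1 - phi."""
    base = (1.0 - phi) * nb_pmf(y, mu, k)
    return phi + base if y == 0 else base

mu, k, phi = 2.0, 0.5, 0.3
probs = [zinb_pmf(y, mu, k, phi) for y in range(400)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(mean, mu * (1 - phi))                         # both about 1.4
print(var, mu * (1 - phi) * (1 + mu * (phi + k)))   # both about 3.64
```

The variance formula shows that both the dispersion parameter and the zero-inflation probability inflate the variance relative to the mean.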
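Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
The jackknife is one of the general methods used in surveys because of its simple concept (Jiang, Lahiri and Wan 2002). The method has been known since Tukey (1958) and has developed into a method that corrects the bias of an estimator by removing observation $l$, for $l = 1, \ldots, m$, and re-estimating the parameters.
According to Rao (2003), the jackknife steps for estimating MSE($\hat{\theta}_i^{EB}$) are:
1. Assume that $\hat{\theta}_i^{EB} = k_i(y_i, \hat{\beta}, \hat{\alpha})$ and $\hat{\theta}_{i,-l}^{EB} = k_i(y_i, \hat{\beta}_{-l}, \hat{\alpha}_{-l})$, then calculate
$$\hat{M}_{2i} = \frac{m - 1}{m} \sum_{l=1}^{m} \left( \hat{\theta}_{i,-l}^{EB} - \hat{\theta}_i^{EB} \right)^2.$$
2. Calculate the delete-$l$ estimators $\hat{\beta}_{-l}$ and $\hat{\alpha}_{-l}$, then calculate
$$\hat{M}_{1i} = g_{1i}(\hat{\beta}, \hat{\alpha}, y_i) - \frac{m - 1}{m} \sum_{l=1}^{m} \left[ g_{1i}(\hat{\beta}_{-l}, \hat{\alpha}_{-l}, y_i) - g_{1i}(\hat{\beta}, \hat{\alpha}, y_i) \right],$$
where $g_{1i}$ is the estimated variance of the posterior distribution, which measures the variability associated with $\theta_i$. Using $g_{1i}(\hat{\beta}, \hat{\alpha}, y_i)$ alone leads to severe underestimation of $\text{MSE}_J(\hat{\theta}_i^{EB})$ because it ignores the estimation of the prior parameters. The estimator $\hat{M}_{1i}$ therefore corrects the bias of $g_{1i}(\hat{\beta}, \hat{\alpha}, y_i)$.
3. Calculate the jackknife estimator of MSE($\hat{\theta}_i^{EB}$) as
$$\text{MSE}_J(\hat{\theta}_i^{EB}) = \hat{M}_{1i} + \hat{M}_{2i}.$$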
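The three jackknife steps can be sketched in a few lines. This is only an illustration under simplifying assumptions: the regression fit is replaced by a toy intercept-only method-of-moments fit (`fit_moments`, hypothetical), so that $\hat{\mu}$ is the sample mean and $\hat{\alpha}$ solves $s^2 = \hat{\mu} + \hat{\mu}^2/\hat{\alpha}$, and `g1` is taken to be the posterior variance of $\mu_i \theta_i$. It is not the SAS procedure used in this research.

```python
from statistics import mean, pvariance

def fit_moments(ys):
    """Toy prior-parameter fit (assumption): returns (mu_hat, alpha_hat)."""
    mu = mean(ys)
    excess = pvariance(ys) - mu           # extra-Poisson variance
    # No visible over-dispersion: use a very large alpha (nearly Poisson)
    alpha = mu * mu / excess if excess > 1e-8 else 1e8
    return mu, alpha

def eb(y, params):
    """EB estimate k_i(y_i, params) in shrinkage form."""
    mu, alpha = params
    g = mu / (alpha + mu)
    return g * y + (1 - g) * mu

def g1(y, params):
    """Posterior variance of mu * theta given y (assumed form of g_1i)."""
    mu, alpha = params
    return mu * mu * (alpha + y) / (alpha + mu) ** 2

def jackknife_mse(ys, i):
    """Jackknife MSE estimate M1 + M2 for area i."""
    m = len(ys)
    full = fit_moments(ys)
    # Delete-l parameter estimates, l = 1..m
    deleted = [fit_moments(ys[:l] + ys[l + 1:]) for l in range(m)]
    m2 = (m - 1) / m * sum((eb(ys[i], p) - eb(ys[i], full)) ** 2 for p in deleted)
    m1 = g1(ys[i], full) - (m - 1) / m * sum(
        g1(ys[i], p) - g1(ys[i], full) for p in deleted)
    return m1 + m2

counts = [0, 1, 3, 2, 7, 4, 0, 5, 2, 6]   # assumed example counts
print(jackknife_mse(counts, 4))
```

Because $\hat{M}_{1i}$ subtracts a bias correction from $g_{1i}$, nothing in the formula forces the result to be positive, so negative MSE estimates can occur when the delete-$l$ fits are unstable.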
METHODOLOGY
Data
This research assumed that the available auxiliary data are at area level, so the basic area level model was used. The data were simulated for 30 small areas with one covariate. Each batch was generated under a different excess-zero condition, with the probability of zero in a small area ranging from 0.1 to 0.9. The relation between the response and the covariate was assumed to be linear.
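Methods
The data were generated with SAS 9.1 in the following steps:
1. Fix the value of $X_i$ for the $i$-th area.
2. Define the expected probability of zero in each small area, $P(Y_i = 0)$, then calculate $\text{Lambda}_i = -\log(P(Y_i = 0))$.
3. Generate $\theta_i \sim \text{Gamma}(1, 1)$.
4. Calculate $\lambda_i^* = \log(\text{Lambda}_i / \theta_i)$.
5. Fit a linear regression of $\lambda_i^*$ on $X_i$ to obtain $\hat{\beta}_0$ and $\hat{\beta}_1$.
6. Calculate $\mu_i = \exp(X_i^T \hat{\beta})$.
7. Calculate $\text{parmlambda}_i = \mu_i \theta_i$.
8. Generate $y_i \sim \text{Poisson}(\text{parmlambda}_i)$.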
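The eight generation steps can be sketched outside SAS as follows. This is a minimal standard-library illustration, not the program used in the study (given in Appendix 5); the covariate values, the seed, and the function names are assumptions.

```python
import math
import random

def poisson_draw(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) value with Knuth's method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def generate_batch(x, p_zero: float, seed: int = 0):
    """Generate one batch of counts for the areas with covariate values x."""
    rng = random.Random(seed)
    lam0 = -math.log(p_zero)                             # step 2
    theta = [rng.gammavariate(1.0, 1.0) for _ in x]      # step 3
    star = [math.log(lam0 / t) for t in theta]           # step 4
    n = len(x)                                           # step 5: least squares
    x_bar, s_bar = sum(x) / n, sum(star) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (si - s_bar) for xi, si in zip(x, star)) / sxx
    b0 = s_bar - b1 * x_bar
    mu = [math.exp(b0 + b1 * xi) for xi in x]            # step 6
    parmlambda = [m * t for m, t in zip(mu, theta)]      # step 7
    y = [poisson_draw(rng, lam) for lam in parmlambda]   # step 8
    return y, parmlambda

x = [0.1 * (i + 1) for i in range(30)]   # hypothetical covariate, 30 areas
y, parmlambda = generate_batch(x, p_zero=0.6)
print(sum(1 for v in y if v == 0) / len(y))  # realized share of zeros
```

The realized share of zeros in a batch varies around the target probability, which is why the appendices report a range of actual zero percentages for each setting.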
The data were then analyzed in the following steps:
1. Fit the negative binomial regression with the GENMOD procedure in SAS 9.1 and the Zero-Inflated Negative Binomial regression with the COUNTREG procedure in SAS 9.2.
2. Estimate the prior parameters, which are $\beta$ and $\alpha$.
3. Estimate $\theta_i$ using the EB method.
4. Calculate the MSE of the indirect estimates.
5. Calculate the RRMSE (Root Relative Mean Square Error):
$$\text{RRMSE}(\hat{\theta}_i) = \frac{\sqrt{\text{MSE}(\hat{\theta}_i)}}{\hat{\theta}_i}.$$
RESULT AND DISCUSSION
Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
In the case of data without excess zeros, the estimator produced small and consistent MSE. However, when the proportion of excess zeros is approximately 30% or more, with an expected probability of zero of 0.6, the estimates tend to be unreliable, and the EB estimation produced negative values.
The RRMSE of the estimator increases with the number of zeros in the data. Furthermore, if the data contain at least 30% excess zeros, the estimator is unreliable.
Table 1 MSE and RRMSE of EB Estimator with NBR
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.33          0.16            0.18            0.13
0.2                   0.35          0.20            0.26            0.20
0.3                   0.40          0.23            0.36            0.30
0.4                   0.42          0.27            0.50            0.42
0.5                   0.45          0.31            0.72            0.59
0.6                   -12875        0.33            -0.38           0.81
0.7                   253671        0.40            -1216           1.35
0.8                   -584495       0.30            30946           2.11
0.9                   39135606      0.16            1.16E+10        6.64

Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.33          0.16            0.18            0.13
0.2                   0.35          0.20            0.26            0.20
0.3                   0.40          0.23            0.36            0.30
0.4                   0.42          0.27            0.50            0.42
0.5                   0.46          0.31            0.71            0.58
0.6                   26197         0.33            -0.35           0.75
0.7                   950007        0.40            -1002           0.99
0.8                   1444250       0.30            22054           1.10
0.9                   41595285      0.16            6.77E+09        0.56
Table 1 shows that the iterative process produced unexpected negative values of MSE. The simplest way to solve this problem is to change the negative values to zero. MSE (II) and RRMSE (II) in Table 2 are the MSE and RRMSE after the negative MSE values have been changed to zero.
When the data have an expected probability of zero of 0.6 to 0.9, the mean of MSE (II) increases drastically. Similarly, the mean of RRMSE (II) increases sharply when the data have an expected probability of zero of 0.8 to 0.9. However, when the data have an expected probability of zero of 0.6 to 0.7, the mean of RRMSE (II) is negative because of the negative EB estimates.
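The adjustment behind the (II) columns can be sketched as below. This is an illustrative helper only, assuming RRMSE is the square root of the (truncated) MSE divided by the EB estimate; the function names are hypothetical.

```python
import math

def truncate_mse(mse: float) -> float:
    """Replace a negative jackknife MSE estimate by zero."""
    return max(mse, 0.0)

def rrmse_ii(mse: float, theta_hat: float) -> float:
    """RRMSE (II): square root of the truncated MSE over the EB estimate."""
    return math.sqrt(truncate_mse(mse)) / theta_hat

print(rrmse_ii(0.25, 2.0))   # 0.25
print(rrmse_ii(-1.0, 2.0))   # 0.0, the negative MSE was set to zero
print(rrmse_ii(0.25, -2.0))  # -0.25, a negative EB estimate gives a negative RRMSE
```

Truncation removes negative MSE values but does not touch negative EB estimates, which is why the RRMSE (II) can still be negative.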
Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
The EB estimates are similar to those produced by the NBR method, although the NBR method performs slightly better when the data contain only a small number of zeros. In particular, as shown by Table 3, if the data have an expected probability of zero of 0.1 to 0.5, ZINB produces a larger MSE for the EB estimator than NBR does.
If the data have an expected probability of zero of 0.6 to 0.7, however, ZINB gives better estimates. The estimates were also unbiased, as they cover the parameter values adequately. However, ZINB begins to produce inconsistent estimates if the data have an expected probability of zero of 0.8 or more, because of the enormous MSE.
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
The mean of MSE (II) with ZINB is larger than the mean of MSE with ZINB. This is because, once the negative MSE values are changed to zero, they no longer reduce the mean.
Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
The EB estimates given by the NBR and ZINB methods are similar for data with small numbers of zeros. However, the ZINB method produces a larger MSE than NBR as long as the expected probability of zero in the data does not exceed the 0.6 threshold.
The ZINB method performs better if the data have an expected probability of zero of 0.6 to 0.7. In this case, the EB estimates given by the NBR method are unstable and inconsistent because of negative estimates and a huge MSE that can be thousands of times larger than acceptable values. The EB estimator with ZINB, on the other hand, works well: it gives unbiased estimates, and its MSE values are more stable than those of the EB estimates with NBR.
Both methods perform poorly if the data have an expected probability of zero of 0.8 or more. The EB estimators from both methods are inconsistent because of the very large MSE values they produce.
Table 3 MSE and RRMSE of EB Estimator with ZINB
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.45          0.17            0.24            0.14
0.2                   0.43          0.20            0.33            0.21
0.3                   0.71          0.28            0.52            0.32
0.4                   0.54          0.28            0.632           0.42
0.5                   0.86          0.33            7322807         0.66
0.6                   0.61          0.38            29817           1.03
0.7                   0.58          0.25            218119          1.94
0.8                   -1.28         -1.4E-07        162697          3.75
0.9                   2954790       -1E-06          3.5E+278        609508

Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.45          0.17            0.24            0.14
0.2                   0.436         0.20            0.324           0.21
0.3                   0.72          0.28            0.51            0.311
0.4                   0.55          0.28            0.61            0.41
0.5                   0.95          0.33            6561235         0.58
0.6                   0.75          0.38            23406           0.70
0.7                   1.50          0.25            134655          0.69
0.8                   1.75          0               733506          0
0.9                   2954908       0               1.2E+278        0
CONCLUSION
Excess zeros in the data strongly influenced the results of EB estimation. A conventional method such as negative binomial regression for prior estimation produced biased and unreliable EB estimators for data with an expected probability of zero of 0.6. This is shown by the large MSE and the negative values of the estimator.
Meanwhile, EB estimation with the ZINB method produced a more reliable estimator, even when the data have an expected probability of zero of 0.6 to 0.7.
ZINB also provided a reliable estimator for data with less than 53.33% zeros. The performance of ZINB declines when the data have an expected probability of zero of 0.8 or more, as shown by the large MSE and the inconsistent estimator.
RECOMMENDATION
This research is based on many assumptions and has several limitations. If the assumptions and restrictions can be relaxed, better results can be expected. Some recommendations for further research are:
1. The generating process in this research does not reflect the real sampling process. A generating process similar to the real sampling process might give better results because it would be closer to real applications.
2. It would be more interesting to run experiments with a larger number of areas, since the number of areas influences the data modeling.
3. Restricted Maximum Likelihood may be applied when estimating the prior parameters with ZINB and NBR in order to resolve the negative values of MSE.
4. Theoretical research on ZINB and the Empirical Bayes estimator is important for understanding the behavior of the ZINB parameter estimates in an Empirical Bayes setting.
REFERENCES
Erdman D, L Jackson, A Sinko. 2008. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. SAS Global Forum 2008, Paper 322-2008. httpwww2sascomproceedingsforum2008322-2008pdf [25 August 2008]
Famoye F, KP Singh. 2006. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Journal of Data Science 4: 117-130.
Hardin JW, JM Hilbe. 2007. Generalized Linear Models and Extensions. Texas: A Stata Press Publication.
Kurnia A, KA Notodiputro. 2006. Penerapan Metode Jackknife dalam pendugaan Area Kecil. Forum Statistika dan Komputasi, April 2006, p12-15.
Kismiantini. 2007. Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Thesis]. Bogor: Institut Pertanian Bogor, Fakultas Matematika dan Pengetahuan Alam.
McCullagh P, JA Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.
Ramsini B et al. 2001. Uninsured Estimates by County: A Review of Options and Issues. httpwwwodhohiogovDataOFHSurvofhsrfq7pdf [24 April 2008]
Rao JNK. 2003. Small Area Estimation. New York: John Wiley & Sons.
Wakefield J. 2006. Disease mapping and spatial regression with count data. httpwwwbepresscomuwbiostatpaper286pdf [24 April 2008]
Appendix 1 Result of EB estimation with NBR
P(Y=0) (%)  Actual (%)    r    Statistic     min        Q1         median     mean       Q3         max
10          10-33.33      100  EB estimator  0426011    1525665    3188832    4252666    5752756    205939
                               bias          0000446    05164      0878579    1315093    1721091    8704671
                               MSE           0040547    0109118    0159448    0333613    0335256    4167064
                               RRMSE         0041258    0100045    01356      018188     0220426    0576793
20          13.33-36.67   100  EB estimator  0342831    1013993    2218265    2984668    3953417    1815693
                               bias          0000587    0413611    079407     1100373    1454889    7906915
                               MSE           0055631    0131969    0196963    0353033    0386291    3778251
                               RRMSE         0070449    015421     0205182    0262006    0352726    0788718
30          20-53.33      100  EB estimator  0323311    0836545    1562163    2263684    2918741    1214482
                               bias          0000151    0372382    067041     0916482    122012     5950225
                               MSE           0074364    0163462    0231014    0400207    0432371    5250254
                               RRMSE         0102324    0214697    0299247    0361013    0474077    1192032
40          23.33-56.67   100  EB estimator  024882     064963     1219656    17107      2248716    930007
                               bias          0000564    0293602    0549809    0757937    1007851    486688
                               MSE           -100569    0194196    0271669    041875     045917     3239598
                               RRMSE         0123605    0300339    0422426    0503566    0642418    2202294
50          23.33-63.33   100  EB estimator  0122548    0570083    1028619    1291758    1728067    6750472
                               bias          000029     0250747    0453265    0622838    0803185    4009352
                               MSE           -237643    0235733    0306641    0452955    05091      3652167
                               RRMSE         0038956    0412708    0588924    0717336    0844735    3240156
60          30-70         100  EB estimator  -077338    044443     0699758    0944038    1131071    6323352
                               bias          0000452    020433     0398131    0534095    0679938    3848209
                               MSE           -749011    0254097    0330078    -12875     0539873    2354887
                               RRMSE         -663045    051763     0813734    -038057    1287528    1767434
70          43.33-73.33   100  EB estimator  -33274     0249515    0442513    0659375    0922519    9258959
                               bias          0000375    0155154    0316124    0476883    0588926    8475103
                               MSE           -7513075   0235378    0402092    2536714    0876569    6051162
                               RRMSE         -10741     0704796    1355566    -121606    3040291    3332419
80          53.33-90      100  EB estimator  -232889    017621     0305365    0569959    0576346    6303601
                               bias          0000395    0116669    0254473    1091172    0497898    6297454
                               MSE           -6E+09     -016583    0301527    -584495    5718409    185E+09
                               RRMSE         -212936    0927338    2115163    3094627    1359703    4151289
90          70-100        100  EB estimator  -108767    0111208    0230315    0212247    0353129    3625557
                               bias          000016     0086       0177169    0425532    0314714    1092655
                               MSE           -38E+09    -130817    0159682    39135606   3074073    12E+11
                               RRMSE         -909131    1647188    6639631    116E+10    1585472    706E+11
Appendix 2 Result of EB estimation with ZINB
P(Y=0) (%)  Actual (%)    r    Statistic     min        Q1         median     mean       Q3         max
10          10-33.33      100  EB estimator  0267752    1500256    3195861    4280907    5833922    2220705
                               bias          0000603    0485515    0882468    1315721    1750173    8704672
                               MSE           -053954    010933     0168797    0449506    0369775    360843
                               RRMSE         0022947    0096443    0136424    0238099    0241955    5518468
20          13.33-36.67   100  EB estimator  105E-08    0914898    221594     3017228    401361     1815694
                               bias          0000368    0383426    0780984    1105029    1496623    7906918
                               MSE           -07309     0126202    0201463    0425844    0414597    1734815
                               RRMSE         0021807    0144983    0210692    0326097    0401786    3177943
30          20-53.33      100  EB estimator  0132041    0719086    1523909    2308745    3012309    1228058
                               bias          0000508    0332427    0680187    0928947    1254604    6314973
                               MSE           -229891    0156942    0277017    0707983    0590466    7469014
                               RRMSE         0023998    0210095    0317195    0519524    0618802    3500387
40          23.33-56.67   100  EB estimator  105E-08    0574265    1209034    1742928    2368713    104953
                               bias          0000564    0268248    0544049    0771741    1067061    4889872
                               MSE           -125713    0181557    0284338    0540615    0498521    423089
                               RRMSE         0054916    028362     0420396    0630776    0778033    5394515
50          23.33-63.33   100  EB estimator  105E-08    0426701    1033816    133848     1906961    8018962
                               bias          0000453    0224726    0454522    0661709    0900005    4414442
                               MSE           -181856    0194818    0334706    0859252    0711939    7997074
                               RRMSE         0026206    0387294    0662251    7322807    1312302    13388294
60          30-70         100  EB estimator  105E-08    030085     0645848    0985327    1154975    728326
                               bias          62E-05     0190886    0406245    0567657    074167     3923952
                               MSE           -34589     0078006    0376514    060793     0804116    3426488
                               RRMSE         000461     0502807    1033578    2981671    2012552    3308816
70          43.33-73.33   100  EB estimator  105E-08    105E-08    0341315    0677841    1          5005491
                               bias          979E-05    0128017    0358257    0487174    0654423    3733981
                               MSE           -142213    -001433    0255331    0584152    1132152    264456
                               RRMSE         0064209    0847956    1942286    2181192    4589042    7899681
80          53.33-90      100  EB estimator  105E-08    105E-08    0142906    0445315    0859305    5
                               bias          0000161    0083397    0272773    0392826    0557213    3532556
                               MSE           -10651     -56E-05    -14E-07    -127819    1452962    1132741
                               RRMSE         0063244    1475413    3754705    162697     9221163    3786684
90          70-100        100  EB estimator  1E-277     105E-08    105E-08    0225165    0135512    3
                               bias          0000495    0054221    0153374    027819     0350213    2736904
                               MSE           -175652    -33E-05    -1E-06     2954790    152E-06    613E+08
                               RRMSE         0040681    4059441    6095076    35E+278    5569021    16E+281
Appendix 3 Result of EB estimation (II) with NBR
P(Y=0) (%)  Actual (%)    r    Statistic     min        Q1         median     mean       Q3         max
10          10-33.33      100  EB estimator  0426011    1525665    3188832    4252666    5752756    205939
                               bias          0000446    05164      0878579    1315093    1721091    8704671
                               MSE           0040547    0109118    0159448    0333613    0335256    4167064
                               RRMSE         0041258    0100045    01356      018188     0220426    0576793
20          13.33-36.67   100  EB estimator  0342831    1013993    2218265    2984668    3953417    1815693
                               bias          0000587    0413611    079407     1100373    1454889    7906915
                               MSE           0055631    0131969    0196963    0353033    0386291    3778251
                               RRMSE         0070449    015421     0205182    0262006    0352726    0788718
30          20-53.33      100  EB estimator  0323311    0836545    1562163    2263684    2918741    1214482
                               bias          0000151    0372382    067041     0916482    122012     5950225
                               MSE           0074364    0163462    0231014    0400207    0432371    5250254
                               RRMSE         0102324    0214697    0299247    0361013    0474077    1192032
40          23.33-56.67   100  EB estimator  024882     064963     1219656    17107      2248716    930007
                               bias          0000564    0293602    0549809    0757937    1007851    486688
                               MSE           0          0194196    0271669    0419181    045917     3239598
                               RRMSE         0          0300116    0422209    0502895    0641904    2202294
50          23.33-63.33   100  EB estimator  0122548    0570083    1028619    1291758    1728067    6750472
                               bias          000029     0250747    0453265    0622838    0803185    4009352
                               MSE           0          0235733    0306641    0456258    05091      3652167
                               RRMSE         0          0410357    0585765    0712314    0841838    3240156
60          30-70         100  EB estimator  -077338    044443     0699758    0944038    1131071    6323352
                               bias          0000452    020433     0398131    0534095    0679938    3848209
                               MSE           0          0254097    0330078    2619677    0539873    2354887
                               RRMSE         -663045    0448118    0750369    -034911    1209918    1767434
70          43.33-73.33   100  EB estimator  -33274     0249515    0442513    0659375    0922519    9258959
                               bias          0000375    0155154    0316124    0476883    0588926    8475103
                               MSE           0          0235378    0402092    9500073    0876569    6051162
                               RRMSE         -10741     0288999    0995659    -100163    2527784    3332419
80          53.33-90      100  EB estimator  -232889    017621     0305365    0569959    0576346    6303601
                               bias          0000395    0116669    0254473    1091172    0497898    6297454
                               MSE           0          0          0301527    1444250    5718409    185E+09
                               RRMSE         -212936    0          1104113    2205437    5656681    4151289
90          70-100        100  EB estimator  -108767    0111208    0230315    0212247    0353129    3625557
                               bias          000016     0086       0177169    0425532    0314714    1092655
                               MSE           0          0          0159682    41595285   3074073    12E+11
                               RRMSE         -909131    0          0557622    677E+09    9311925    706E+11
Appendix 4 Result of EB estimation (II) with ZINB
P(Y=0) (%)  Actual (%)    r    Statistic     min        Q1         median     mean       Q3         max
10          10-33.33      100  EB estimator  0267752    1500256    3195861    4280907    5833922    2220705
                               bias          0000603    0485515    0882468    1315721    1750173    8704672
                               MSE           0          010933     0168797    0450626    0369775    360843
                               RRMSE         0          0095932    0135647    023675     0239669    5518468
20          13.33-36.67   100  EB estimator  105E-08    0914898    221594     3017228    401361     1815694
                               bias          0000368    0383426    0780984    1105029    1496623    7906918
                               MSE           0          0126202    0201463    0428006    0414597    1734815
                               RRMSE         0          0142648    020709     0320663    0395479    3177943
30          20-53.33      100  EB estimator  0132041    0719086    1523909    2308745    3012309    1228058
                               bias          0000508    0332427    0680187    0928947    1254604    6314973
                               MSE           0          0156942    0277017    0716543    0590466    7469014
                               RRMSE         0          0203913    0311937    0506882    0615401    3500387
40          23.33-56.67   100  EB estimator  105E-08    0574265    1209034    1742928    2368713    104953
                               bias          0000564    0268248    0544049    0771741    1067061    4889872
                               MSE           0          0181557    0284338    0549835    0498521    423089
                               RRMSE         0          0270309    0405926    0606317    0766631    5394515
50          23.33-63.33   100  EB estimator  105E-08    0426701    1033816    133848     1906961    8018962
                               bias          0000453    0224726    0454522    0661709    0900005    4414442
                               MSE           0          0194818    0334706    094973     0711939    7997074
                               RRMSE         0          0316402    0576343    6561235    1240175    13388294
60          30-70         100  EB estimator  105E-08    030085     0645848    0985327    1154975    728326
                               bias          62E-05     0190886    0406245    0567657    074167     3923952
                               MSE           0          0078006    0376514    0749436    0804116    3426488
                               RRMSE         0          0258286    0698814    2340612    1714808    3308816
70          43.33-73.33   100  EB estimator  105E-08    105E-08    0341315    0677841    1          5005491
                               bias          979E-05    0128017    0358257    0487174    0654423    3733981
                               MSE           0          0          0255331    1501268    1132152    264456
                               RRMSE         0          0          0688797    1346552    2500825    7899681
80          53.33-90      100  EB estimator  105E-08    105E-08    0142906    0445315    0859305    5
                               bias          0000161    0083397    0272773    0392826    0557213    3532556
                               MSE           0          0          0          1755486    1452962    1132741
                               RRMSE         0          0          0          7335062    3311711    3786684
90          70-100        100  EB estimator  1E-277     105E-08    105E-08    0225165    0135512    3
                               bias          0000495    0054221    0153374    027819     0350213    2736904
                               MSE           0          0          0          2954908    152E-06    613E+08
                               RRMSE         0          0          0          12E+278    416189     16E+281
Appendix 5 Syntax program for generate data
data b; /* generate x1 (covariate) and ei */
  input x1;
  cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate Poisson-gamma data with excess zeros */
  do kk=1 to 30;
    set b;
    tetha = rangam(1,1);
    lambda = -log(0.1); /* desired probability of a zero outcome (0.1-0.9) */
    starlambda = log(lambda/tetha);
    output;
  end;
run;

proc reg;
  model starlambda = x1;
  ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;
data _null_;
  set work.betha_lr_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
run;

data d;
  do kk=1 to 30;
    set e;
    mu = exp(&Intercept + &x1*x1);
    parmlambda = mu*tetha;
    ypoi = rand('poisson', parmlambda);
    output;
  end;
run;

ods trace on;
/* take the percentage of zeros in the data */
proc freq data=d;
  tables ypoi;
  ods output OneWayFreqs=work.zero;
run;
data zero;
  set zero;
  keep percent;
run;
proc transpose data=zero out=zero1;
run;
data _null_;
  set work.zero1;
  call symput('pctz', col1);
run;
data d;
  set d;
  pzero=&pctz;
  r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
Appendix 6 Syntax program EB with NBR
%macro sae_nb;
%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
  set work.betha_nb_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
  bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
  set work.duga_nb1;
  if ^(u=&x) then delete;
run;
data work.jacknb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc genmod data=work.jacknb&h;
  /* output p out=sas.yi_est */
  model ypoi = x1 / dist=nb link=log;
  ods output parameterestimates=work.betha_est_nb&h (keep=parameter Estimate);
run;
proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;
data _null_;
  set work.betha_est_nbt&h;
  call symput('Intercept_', col1);
  call symput('x1_', col2);
  call symput('Dispersion_', col3);
run;

data work.duganb&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
  w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + &Dispersion_);
  teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
  beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
  merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5
        work.duganb6 work.duganb7 work.duganb8 work.duganb9 work.duganb10
        work.duganb11 work.duganb12 work.duganb13 work.duganb14 work.duganb15
        work.duganb16 work.duganb17 work.duganb18 work.duganb19 work.duganb20
        work.duganb21 work.duganb22 work.duganb23 work.duganb24 work.duganb25
        work.duganb26 work.duganb27 work.duganb28 work.duganb29 work.duganb30;
  by kk;
run;

data work.mse_nb_j;
  set work.mse_nb_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
  ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
  merge work.d work.mse_nb_j;
  keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes
       teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb BASE=work.hasilnb1 appendver=v6;
run;

%END;
%mend;

%sae_nb;
Appendix 7 Syntax program EB with ZINB
%macro sae_zinb;

%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

proc countreg data=a;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
  set work.pe_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Inf_Intercept', col3);
  call symput('Inf_x1', col4);
  call symput('_Alpha', col5);
run;

data work.duga;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
  bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga base=work.duga1;
run;

%do h=1 %to 30;

data work.d;
  set work.duga1;
  if ^(u=&x) then delete;
run;
data work.jackzinb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
  set work.betha_est_ZINBt&h;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Inf_Intercept', col3);
  call symput('Inf_x1', col4);
  call symput('_Alpha', col5);
run;

data work.dugaZINB&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept + &x1*x1);
  /* mu_hat_b_&h= &b_o- + &b_1- x1 */
  w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + (&_Alpha));
  teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  /* g1_&h =((mu_hat_b_&h2&alpha_)2)(&alpha_+y_i)((mu_hat_b_&h2&alpha_)+mu_hat_b_&h)2 */
  g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
  /* g1_&h =(A2)(&k- + y_i)( a +mu_hat_b)2 */
  beda_g_&h=g1_&h-g1;
run;

data work.mse_ZINB_j;
  merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5
        work.dugaZINB6 work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10
        work.dugaZINB11 work.dugaZINB12 work.dugaZINB13 work.dugaZINB14 work.dugaZINB15
        work.dugaZINB16 work.dugaZINB17 work.dugaZINB18 work.dugaZINB19 work.dugaZINB20
        work.dugaZINB21 work.dugaZINB22 work.dugaZINB23 work.dugaZINB24 work.dugaZINB25
        work.dugaZINB26 work.dugaZINB27 work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
  by kk;
run;

data work.mse_ZINB_j;
  set work.mse_ZINB_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
  merge work.d work.mse_ZINB_j;
  keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes
       teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;
ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB BASE=work.hasilZINB1;
run;

%END;
%mend;

%sae_zinb;
BIOGRAPHY
Irene Muflikh Nadhiroh was born in Padang on October 3rd, 1986, as the first daughter of Ir. Irianto Oetomo and Fine Analisa Maharani. She has two siblings.
In 1998 she graduated from SD Dukuh 09, East Jakarta, and then continued her studies at SLTP Negeri 1 Bogor, graduating in 2001. She finished her studies at SMU Labschool Rawamangun, Jakarta, in 2004 and then enrolled in Bogor Agricultural University through USMI. In 2004 she joined the Department of Statistics, Faculty of Mathematics and Natural Sciences.
During her studies she served as a teaching assistant for the Basic Statistics class and the Experimental Design class in 2006 and 2007, respectively. She was also a member of Gamma Sigma Beta (the Statistics Students Association, GSB) and headed the science division of GSB in 2006-2007. In February-March 2008 she completed her field practice at PT Field Dimension Indonesia.
ACKNOWLEDGEMENTS
First of all, the author acknowledges that the completion of this paper would not have been possible without invaluable help from many generous and extraordinary people. The author is deeply indebted to them for their help, ideas, criticism and advice during the writing process. However, they should not be held responsible for any mistakes and deficiencies in this paper, which are purely the author's. I would like to express my gratitude to:
1. Allah SWT, to whom all praise and gratitude belong. Alhamdulillah hirabbil alamin. With His blessing I was able to finish this paper. Thanks to Allah for giving me a wonderful life with extraordinary people around me.
2. Prof. Dr. Khairil Anwar Notodiputro and Ir. Indahwati, MSi, for the early motivation, discussion, advice, support and their great enthusiasm.
3. Mr. Bambang Sumantri, MSi, as examiner, for the encouragement, advice and criticism.
4. My beloved family, for their unlimited love.
5. Mr. Alfian Futuhul Hadi, MSi, for enlightening discussions when I was in trouble.
6. Mr. Bagus Sartono, MSi, thank you very much for running my data at your lab with your wonderful computer. Sorry if it disturbed you.
7. Mr. Anang, MSi and Mr. Rahman, MSi, for sharing their knowledge and technical support.
8. Dr. Ir. Hari Wijayanto, MS, and all lecturers and staff of the Statistics Department. Thank you for the knowledge of statistics and of life that you shared. It means a lot to me.
9. Rahmatullah Sigit Dodiet Sasongko, SSi, for the spirit, love, care, time and patience. Keep it real. Still love me forever and ever.
10. Mr. Dionisius Laksmana Bisara Putra, SSi, for editing my paper, for his criticism and for useful discussions.
11. Maulana Chistanto, SSi and Yhanuar Ismail, SSi, thank you for being my best brothers.
12. Nikhen Sevrien and the late Dini, thanks for lighting up my day.
13. Rere Yusri Agus Ika Cinong Toki Cheri Fisca Wiwik Neng Mala Lilis Dika Rangga Lele Dodi Kus Inal Bebek Koler and all of Statistics '41.
14. Everyone who helped me in this study and cannot be named individually.
This thesis is not perfect, so I welcome criticism, advice and recommendations from its readers. Thank you. God bless you all.
Bogor January 2009
Irene Muflikh Nadhiroh
TABLE OF CONTENTS
INTRODUCTION
  Background
  Objectives
LITERATURE REVIEW
  Direct Estimation
  Small Area Estimation
  Small Area Models
  Empirical Bayes Methods
  Poisson-Gamma Models
  Negative Binomial Regression
  Over-disperse at Count Data
  Zero-Inflated Models
  Zero-Inflated Negative Binomial
  Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
METHODOLOGY
  Data
  Methods
RESULT AND DISCUSSION
  Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
  Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
  Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
CONCLUSION
RECOMMENDATION
REFERENCES

LIST OF TABLES
Table 1 MSE and RRMSE of EB Estimator with NBR
Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
Table 3 MSE and RRMSE of EB Estimator with ZINB
Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB

LIST OF APPENDICES
Appendix 1 Result of EB estimation with NBR
Appendix 2 Result of EB estimation with ZINB
Appendix 3 Result of EB estimation (II) with NBR
Appendix 4 Result of EB estimation (II) with ZINB
Appendix 5 Syntax program for generate data
Appendix 6 Syntax program EB with NBR
Appendix 7 Syntax program EB with ZINB
INTRODUCTION
Background
Direct estimation is usually applied in large-scale surveys, but such estimators are difficult to use in a smaller region, especially when the sample size is too small. In this case indirect estimation, which adds covariates to estimate the parameter, is usually used. This type of estimation is broadly known as Small Area Estimation.
Kismiantini (2007) conducted research on Small Area Estimation based on Poisson-Gamma models. Maximum Likelihood Estimation was used with Negative Binomial Regression to estimate the respective prior parameters. Moreover, Negative Binomial Regression was used to resolve the over-dispersion problem in the data.
In reality count data are characterized not only by over-dispersion but sometimes also by excess zeros. Excess zero is a condition in which the data contain more zeros than the assumed distribution would lead us to expect. For example, given 100 observations from a Poisson model with a response mean of 4, we would expect about 2 zeros. If the data have 30 zeros, it should be obvious that the distributional assumptions have been violated. Therefore the estimated parameters and standard errors will be biased (Hardin & Hilbe 2007). In this paper Zero-Inflated models were adapted to solve this type of problem.
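The Poisson example in the paragraph above can be checked with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical and the calculation uses nothing beyond the Poisson zero probability exp(-mean).

```python
import math

def expected_poisson_zeros(n_obs, mean):
    """Expected number of zeros among n_obs independent Poisson(mean) counts."""
    # For a Poisson variable, P(Y = 0) = exp(-mean).
    return n_obs * math.exp(-mean)

expected = expected_poisson_zeros(100, 4.0)
observed = 30  # the excess-zero situation described in the text
print(f"expected zeros: {expected:.2f}, observed zeros: {observed}")
```

With a mean of 4 the expected count is a little under 2, so 30 observed zeros are far more than the Poisson assumption can explain.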
Objectives
The research objectives are:
1. To investigate the performance of Negative Binomial Regression in Small Area Estimation in the case of excess zeros.
2. To apply Zero-Inflated Count Models to Small Area Estimation in the case of excess zeros.
3. To evaluate the performance of Zero-Inflated Count Models in estimating the prior parameters for Small Area Estimation.
LITERATURE REVIEW
Direct Estimation
Direct estimates are generally "design based" in the sense that they make use of "survey weights", and the associated inferences are based on the probability distribution induced by the sample design with the population values held fixed (Rao 2003). In particular, direct estimates of a domain parameter are based only on the domain-specific sample data.
Data from sample surveys have been used to produce reliable estimates of parameters. Ramsini et al (2001) mentioned that direct estimates for small areas are unbiased, although they have large variance because of the small sample size.
Small Area Estimation
The term small area can refer to anything, depending on the object of interest. It can be a city, an age group, a sex group, a region or a rural district. In general, small area denotes any domain for which direct estimates of adequate precision cannot be produced (Rao 2003). This happens because the sample size in the small area is too small. Small area estimation was therefore developed as a statistical technique for estimating small area parameters with an adequate level of precision. It works as indirect estimation that borrows strength from the values of the variable of interest in related areas through the use of supplementary information related to that variable, such as recent census counts and current administrative records (Rao 2003). Indirect estimation is the process of estimating a domain's parameter by connecting the information in that domain with another domain using an appropriate model, so the estimator works by including other domains' data (Kurnia & Notodiputro 2006).
Small Area Models
There are two kinds of link model in indirect estimation. The first is the traditional approach based on implicit models, which provide a link between small areas through supplementary data. The second is explicit small area models, which make specific allowance for between-area variation (Rao 2003). This research used the second kind, which can be classified into two broad types of basic model:
1. Basic area level (type A) model
Basic area level models, or aggregate models, include all models that relate small area parameters to area-specific auxiliary variables. These models are essential if unit (element) level data are not available. The parameter of interest $\theta_i$ is assumed to be related to the area-specific auxiliary data, or covariates, $x_i = (x_{i1}, \ldots, x_{ip})^T$ by the linear model
$$\theta_i = x_i^T \beta + b_i v_i, \quad i = 1, \ldots, m,$$
where $v_i \sim N(0, \sigma_v^2)$ are area-specific random effects, $\beta = (\beta_1, \ldots, \beta_p)^T$ is the $p \times 1$ vector of regression coefficients and the $b_i$ are known positive constants. For making inferences about $\theta_i$, direct estimators $y_i$ are assumed to be available, with
$$y_i = \theta_i + e_i, \quad i = 1, \ldots, m,$$
where the sampling errors $e_i \sim N(0, \sigma_{ei}^2)$ and the $\sigma_{ei}^2$ are known. Combining both models gives the new model
$$y_i = x_i^T \beta + b_i v_i + e_i, \quad i = 1, \ldots, m$$
(Rao 2003).
2. Basic unit level (type B) model
Unit level models include all models that relate unit values of the study variable to unit-specific auxiliary variables. Given unit-specific auxiliary variables $x_{ij} = (x_{ij1}, \ldots, x_{ijp})^T$, the corresponding nested regression model is
$$y_{ij} = x_{ij}^T \beta + v_i + e_{ij}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n_i,$$
with $v_i \sim N(0, \sigma_v^2)$ and $e_{ij} \sim N(0, \sigma_{ei}^2)$.
Empirical Bayes Methods
The Bayesian approach is based on Bayes' Law, which was formulated by Thomas Bayes. The law was presented by Richard Price in 1763, two years after Thomas Bayes passed away. In 1774 and 1781 Laplace gave the details and their relevance for modern Bayesian statistics (Gill 2002 in Kismiantini 2007).
Novick in Good (1980) mentioned that the Bayes method is difficult to adopt and is sometimes very sensitive, because it requires prior probability information which is usually difficult to obtain. Robbins (1955) introduced Empirical Bayes methods, which assume a particular prior distribution and estimate it from the sample. Rao (2003) said that EB (Empirical Bayes) and HB (Hierarchical Bayes) methods are suitable for binary and count data in Small Area Estimation. Therefore the EB method was used in this research.
Rao (2003) summarized EB methods in Small Area Estimation as follows:
1. Obtain the posterior probability density function of the small area parameter.
2. Estimate the model parameters from the marginal density function.
3. Use the estimated posterior density for inferences regarding the parameters of interest.
Poisson-Gamma Models
The Poisson model is the standard model for count data. Count data often suffer from over-dispersion, so the Poisson formulation has been extended to accommodate the extra variance in sample data. Two-stage models, known as Poisson-Gamma mixed models, have been introduced for count data. Wakefield (2006) introduced a Poisson-Gamma model that is easy to use with the SMR (Standardized Mortality Ratio) as the direct estimator. This study used the Wakefield model with a different direct estimator.
Let $y_i$ be the number of individuals in small area $i$ that have the characteristic of interest, written as
$$y_i = \sum_j y_{ij},$$
where $y_{ij}$ is the $j$th object in the $i$th small area, $j = 1, \ldots, n$ and $i = 1, \ldots, m$.
In the first stage it is assumed that $y_i \overset{ind}{\sim} \text{Poisson}(\mu_i \theta_i)$, where $\mu_i = \mu(x_i, \beta)$ describes a regression model at area level, $x_i$ is a vector of covariates and $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of regression coefficients.
In the second stage, $\theta_i \overset{iid}{\sim} \text{gamma}(\alpha, 1/\alpha)$ (shape $\alpha$, scale $1/\alpha$) is assumed as the prior distribution, with mean 1 and variance $1/\alpha$. The marginal distribution of $y_i \mid \beta, \alpha$ is then negative binomial.
Moreover, Wakefield (2006) used Bayes' Theorem and obtained the posterior distribution
$$\theta_i \mid y_i, \beta, \alpha \sim \text{gamma}\left(y_i + \alpha, \frac{1}{\mu_i + \alpha}\right)$$
and the EB estimator
$$\hat{\theta}_i^{EB} = \hat{\theta}_i^{B}(\hat{\beta}, \hat{\alpha}) = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i) \hat{\mu}_i,$$
with $\hat{\gamma}_i = \hat{\mu}_i / (\hat{\alpha} + \hat{\mu}_i)$, where $\hat{\theta}_i = y_i$ is the direct estimate for area $i$ and $y_i$ is the number of observations.
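A minimal Python sketch of the EB estimator may help here. It assumes the shrinkage weight mu_hat / (alpha_hat + mu_hat) used in the Appendix 7 listing (the variable w_bayes there); the function names and the numeric inputs are hypothetical. The second function writes the same quantity as mu_hat times the mean of a gamma posterior with shape y + alpha and rate mu + alpha, which is an assumption about the parameterization rather than something taken from the listing.

```python
def eb_estimate(y, mu_hat, alpha_hat):
    """Shrink the direct estimate y toward the regression prediction mu_hat."""
    weight = mu_hat / (alpha_hat + mu_hat)
    return weight * y + (1.0 - weight) * mu_hat

def posterior_mean_form(y, mu_hat, alpha_hat):
    """Same quantity written as mu_hat times the assumed gamma posterior mean."""
    return mu_hat * (y + alpha_hat) / (mu_hat + alpha_hat)

# Hypothetical inputs: regression prediction 2.0, prior parameter 1.0.
for y in (0, 2, 10):
    print(y, round(eb_estimate(y, 2.0, 1.0), 4),
          round(posterior_mean_form(y, 2.0, 1.0), 4))
```

The two columns agree for every count, and an observed zero is pulled up toward the regression prediction instead of being estimated as exactly zero.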
Negative Binomial Regression
The negative binomial regression model seems to have been first discussed by Anscombe (1972). Others have pointed out its success in dealing with over-dispersed count data. Lawless (1987) elaborated the mixture model parameterization of the negative binomial, providing formulas for its log likelihood, mean, variance and moments. Later, Breslow (1990) cited Lawless' work, and from its inception to the late 1980s the negative binomial regression model has been construed as a mixture model that is useful for accommodating otherwise over-dispersed Poisson data (Hardin & Hilbe 2007). The negative binomial distribution function is written as
$$g(y \mid x) = \frac{\Gamma(y + k^{-1})}{\Gamma(y + 1)\,\Gamma(k^{-1})} \left(\frac{k\mu}{1 + k\mu}\right)^{y} \left(\frac{1}{1 + k\mu}\right)^{1/k},$$
where $y = 0, 1, 2, \ldots$; $k$ and $\mu$ are the negative binomial parameters, with $E(y) = \mu$ and $\text{var}(y) = \mu + k\mu^2$. $k$ is known as the dispersion parameter; a value of $k$ greater than zero indicates that the data are over-dispersed.
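As a numerical check of the mean and variance quoted above, the sketch below evaluates a negative binomial probability function with mean mu and dispersion k. The parameterization (variance mu + k*mu^2) is an assumption on my part, and the parameter values are hypothetical.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial probability of count y with mean mu and dispersion k > 0."""
    r = 1.0 / k
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + y * math.log(k * mu / (1.0 + k * mu))
             - r * math.log(1.0 + k * mu))
    return math.exp(log_p)

mu, k = 2.0, 0.5          # hypothetical values
support = range(0, 400)   # the tail beyond 400 is negligible for these values
total = sum(nb_pmf(y, mu, k) for y in support)
mean = sum(y * nb_pmf(y, mu, k) for y in support)
var = sum((y - mean) ** 2 * nb_pmf(y, mu, k) for y in support)
print(round(total, 6), round(mean, 6), round(var, 6))
```

For these values the probabilities sum to one, the mean is mu = 2 and the variance is mu + k*mu^2 = 4, which is twice the Poisson variance.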
Over-disperse at Count Data
Count data modeled by Poisson regression are said to be over-dispersed if the variance is bigger than the mean, that is, if the observed variance exceeds the variance expected under the Poisson model. This phenomenon is written as
$$\text{Var}(y_i) > E(y_i)$$
(McCullagh & Nelder 1989).
Zero-Inflated Models
Zero-Inflated models consider two distinct sources of zero outcomes. One source is individuals who do not enter the counting process; the other is individuals who do enter the counting process but produce a zero outcome (Hardin & Hilbe 2007).
Lambert (1992) first described this type of mixture model in the context of process control in manufacturing. It has since been used in many applications and is now discussed in nearly every book or article dealing with count response models.
For the zero-inflated model, the probability of observing a zero outcome equals the probability that an individual is in the always-zero group plus the probability that the individual is not in that group times the probability that the counting process produces a zero. If $B(0)$ is the probability that the binary process results in a zero outcome and $\Pr(0)$ is the probability that the counting process produces a zero outcome, the probability of a zero outcome for the system is given by (Hardin & Hilbe 2007)
$$\Pr(y = 0) = B(0) + [1 - B(0)] \Pr(0).$$
The probability of a nonzero count is
$$\Pr(y = k;\ k > 0) = [1 - B(0)] \Pr(k).$$
This model produces two groups of parameters. One is the set of zero-inflation parameters, which show whether the covariates contribute significantly to having a zero outcome. The other is the set of negative binomial parameters, which model the response with the covariates.
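Zero-Inflated Negative Binomial
There are many kinds of zero-inflated model; each has advantages and disadvantages and suits a different type of data. The Zero-Inflated Negative Binomial model is one of them. It is used for data that are both over-dispersed and have excess zeros. As a result, the parameter estimates include a parameter $k$ which indicates that over-dispersion occurs in the data, just like the dispersion parameter in negative binomial regression.
The probability distribution of this model is
$$P(Y_i = y_i \mid x_i) = \begin{cases} \varphi(x_i) + [1 - \varphi(x_i)]\, g(0 \mid x_i), & y_i = 0 \\ [1 - \varphi(x_i)]\, g(y_i \mid x_i), & y_i > 0 \end{cases}$$
where $\varphi$ is a function of $z_i$, $x_i$ is the vector of zero-inflation covariates and $\delta$ is the vector of zero-inflation coefficients to be estimated. Meanwhile $g(y_i \mid x_i)$ is the negative binomial probability distribution, written as
$$g(y_i \mid x_i) = \frac{\Gamma(y_i + k^{-1})}{\Gamma(y_i + 1)\,\Gamma(k^{-1})} \left(\frac{k\mu_i}{1 + k\mu_i}\right)^{y_i} \left(\frac{1}{1 + k\mu_i}\right)^{1/k}.$$
With $\varphi_i = \varphi(x_i)$, the mean and variance of ZINB are
$$E(y_i \mid x_i) = \mu_i (1 - \varphi_i)$$
$$V(y_i \mid x_i) = \mu_i (1 - \varphi_i)(1 + \mu_i(\varphi_i + k))$$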
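The zero-inflated mixture can be illustrated numerically. The sketch below is written under assumptions: the negative binomial part has mean mu and dispersion k (variance mu + k*mu^2), phi is a fixed zero-inflation probability rather than a fitted function of covariates, and all names and values are hypothetical.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial probability with mean mu and dispersion k > 0."""
    r = 1.0 / k
    return math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                    + y * math.log(k * mu / (1.0 + k * mu))
                    - r * math.log(1.0 + k * mu))

def zinb_pmf(y, mu, k, phi):
    """ZINB probability: always-zero group with probability phi, otherwise NB."""
    base = (1.0 - phi) * nb_pmf(y, mu, k)
    return phi + base if y == 0 else base

mu, k, phi = 2.0, 0.5, 0.3   # hypothetical values
support = range(0, 400)
p_zero = zinb_pmf(0, mu, k, phi)
mean = sum(y * zinb_pmf(y, mu, k, phi) for y in support)
var = sum((y - mean) ** 2 * zinb_pmf(y, mu, k, phi) for y in support)
print(round(p_zero, 4))
print(round(mean, 4), round(mu * (1 - phi), 4))
print(round(var, 4), round(mu * (1 - phi) * (1 + mu * (phi + k)), 4))
```

The numerically summed mean and variance match the closed forms mu(1 - phi) and mu(1 - phi)(1 + mu(phi + k)), and the probability of a zero rises from 0.25 under the plain negative binomial to 0.475 once the always-zero group is added.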
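Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
The jackknife is one of the general methods used in surveys because of its simple concept (Jiang, Lahiri and Wan 2002). The method has been known since Tukey (1958) and was developed into a method capable of correcting the bias of an estimator by removing observation $i$, for $i = 1, \ldots, m$, and re-estimating the parameters.
According to Rao (2003), the jackknife steps to estimate MSE($\hat{\theta}_i^{EB}$) are:
1. Let $\hat{\theta}_i^{EB} = k_i(y_i, \hat{\beta}, \hat{\alpha})$ and $\hat{\theta}_{i,-l}^{EB} = k_i(y_i, \hat{\beta}_{-l}, \hat{\alpha}_{-l})$, then calculate
$$\hat{M}_{2i} = \frac{m-1}{m} \sum_{l=1}^{m} \left(\hat{\theta}_{i,-l}^{EB} - \hat{\theta}_i^{EB}\right)^2$$
2. Calculate the delete-$l$ estimators $\hat{\beta}_{-l}$ and $\hat{\alpha}_{-l}$, then calculate
$$\hat{M}_{1i} = g_{1i}(\hat{\beta}, \hat{\alpha}, y_i) - \frac{m-1}{m} \sum_{l=1}^{m} \left[g_{1i}(\hat{\beta}_{-l}, \hat{\alpha}_{-l}, y_i) - g_{1i}(\hat{\beta}, \hat{\alpha}, y_i)\right]$$
Here $g_{1i}(\hat{\sigma}_v^2)$ is the variance estimator of the posterior distribution, which is used to measure the variability associated with $\theta_i$. Using $g_{1i}(\hat{\sigma}_v^2)$ alone leads to severe underestimation of $\text{MSE}_J(\hat{\theta}_i^{EB})$ because it ignores the estimation of the prior parameters. The estimator $\hat{M}_{1i}$ therefore corrects the bias of $g_{1i}(\hat{\sigma}_v^2)$.
3. Calculate the jackknife estimator of MSE($\hat{\theta}_i^{EB}$) as
$$\text{MSE}_J(\hat{\theta}_i^{EB}) = \hat{M}_{1i} + \hat{M}_{2i}$$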
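The three jackknife steps can be sketched in Python as below. The structure mirrors the quantities named m1i, m2i, delta and beda_g in the appendix listings, but the function name and the numeric inputs are hypothetical, and the refitting that produces the delete-one values is taken as given.

```python
def jackknife_mse(theta_full, g1_full, theta_deleted, g1_deleted):
    """Jackknife MSE for one area from full-sample and delete-one quantities.

    theta_deleted[l] and g1_deleted[l] are the EB estimate and posterior
    variance for this area after refitting with area l removed.
    """
    m = len(theta_deleted)
    factor = (m - 1) / m
    # M2: variability of the EB estimate caused by estimating the prior parameters
    m2 = factor * sum((t - theta_full) ** 2 for t in theta_deleted)
    # M1: posterior variance with a jackknife bias correction
    m1 = g1_full - factor * sum(g - g1_full for g in g1_deleted)
    return m1 + m2

# Hypothetical inputs with m = 3 areas.
stable = jackknife_mse(2.0, 0.5, [2.1, 1.9, 2.0], [0.55, 0.45, 0.50])
unstable = jackknife_mse(2.0, 0.1, [2.0, 2.0, 2.0], [0.5, 0.5, 0.5])
print(round(stable, 4), round(unstable, 4))
```

The second call shows that the bias correction can push the estimate below zero when the delete-one posterior variances are much larger than the full-sample one, which is one way a jackknife MSE estimate can come out negative.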
METHODOLOGY
Data
This research assumed that the available auxiliary data are at area level, so the basic area level model was used. The data were simulated with 30 small areas and one covariate. Each batch was generated under a different excess-zero condition, with the expected probability of zero in a small area ranging from 0.1 to 0.9. This research assumed that the relationship between the response and the covariate is linear.
Methods
The following steps were used to generate the data with SAS 9.1:
1. Fix the value of $X_i$ for the $i$th area.
2. Define the expected probability of zero in each small area, $P(Y_i = 0)$, then calculate $\text{Lambda}_i = -\log(P(Y_i = 0))$.
3. Generate $\theta_i \sim \text{Gamma}(1, 1)$.
4. Calculate $\log(\text{Lambda}_i / \theta_i)$.
5. Fit a linear regression of the quantity from step 4 on $X_i$ to obtain $\beta_0$ and $\beta_1$.
6. Calculate $\mu_i = \exp(X_i'\beta)$.
7. Calculate $\text{parmlambda} = \mu_i \theta_i$.
8. Generate $y_i \sim \text{Poisson}(\text{parmlambda})$.
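The generating steps can be sketched in Python as follows. Several details are assumptions rather than the author's code: the covariate values are hypothetical, Lambda is taken as -log(P(Y = 0)), step 4 is read as log(Lambda / theta), the regression is ordinary least squares, and the Poisson draws use Knuth's multiplication method instead of a SAS routine.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def simulate_batch(x, p_zero, rng):
    """Generate one batch of area counts following steps 1 to 8 (assumed reading)."""
    lam = -math.log(p_zero)                                # step 2
    theta = [rng.gammavariate(1.0, 1.0) for _ in x]        # step 3
    z = [math.log(lam / t) for t in theta]                 # step 4 (assumed form)
    n = len(x)                                             # step 5: least squares
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    b0 = z_bar - b1 * x_bar
    mu = [math.exp(b0 + b1 * xi) for xi in x]              # step 6
    parmlambda = [m * t for m, t in zip(mu, theta)]        # step 7
    y = [poisson_draw(p, rng) for p in parmlambda]         # step 8
    return y, parmlambda

x = [0.1 * i for i in range(1, 31)]   # step 1: hypothetical fixed covariate, 30 areas
y, parmlambda = simulate_batch(x, 0.6, random.Random(2009))
print(sum(1 for v in y if v == 0), "zeros out of", len(y))
```

Because the counts are drawn from area-specific Poisson means, the realised share of zeros in a batch varies around the target probability rather than matching it exactly.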
Moreover, the following steps were applied in analyzing the data:
1. Fit the negative binomial regression with the GENMOD procedure in SAS 9.1 and the Zero-Inflated Negative Binomial regression with the COUNTREG procedure in SAS 9.2.
2. Estimate the prior parameters, $\beta$ and $\alpha$.
3. Estimate $\theta_i$ using the EB method.
4. Calculate the MSE for indirect estimation.
5. Calculate the RRMSE (Root Relative Mean Square Error):
$$\text{RRMSE}(\hat{\theta}_i) = \frac{\sqrt{\text{MSE}(\hat{\theta}_i)}}{\hat{\theta}_i}$$
RESULT AND DISCUSSION
Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
For data without excess zeros, the estimator produced small and consistent MSE. Meanwhile, if the proportion of zeros is approximately 30% or more, with an expected probability of zero of 0.6, the estimates tend to be unreliable; in particular, the EB estimation produced negative values.
The RRMSE of the estimator increases along with the number of zeros in the data. Furthermore, if the data contain at least 30% zeros, the estimator is unreliable.
Table 1 MSE and RRMSE of EB Estimator with NBR

Probability of zero  Mean of MSE  Median of MSE  Mean of RRMSE  Median of RRMSE
0.1  0.33  0.16  0.18  0.13
0.2  0.35  0.20  0.26  0.20
0.3  0.40  0.23  0.36  0.30
0.4  0.42  0.27  0.50  0.42
0.5  0.45  0.31  0.72  0.59
0.6  -12875  0.33  -0.38  0.81
0.7  253671  0.40  -1216  135
0.8  -584495  0.30  30946  211
0.9  39135606  0.16  1.16E+10  664
Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR

Probability of zero  Mean of MSE  Median of MSE  Mean of RRMSE  Median of RRMSE
0.1  0.33  0.16  0.18  0.13
0.2  0.35  0.20  0.26  0.20
0.3  0.40  0.23  0.36  0.30
0.4  0.42  0.27  0.50  0.42
0.5  0.46  0.31  0.71  0.58
0.6  26197  0.33  -0.35  0.75
0.7  950007  0.40  -1002  0.99
0.8  1444250  0.30  22054  110
0.9  41595285  0.16  6.77E+09  0.56
Table 1 shows that the iterative process produced unexpected negative values of MSE. The simplest way to solve this problem is to change the negative values to zero. MSE (II) and RRMSE (II) in Table 2 are the MSE and RRMSE after the negative values of MSE have been changed to zero.
When the data have an expected probability of zero of 0.6 to 0.9, the mean of MSE (II) increases drastically. Similarly, the mean of RRMSE (II) increases sharply when the data have an expected probability of zero of 0.8 to 0.9. However, when the data have an expected probability of zero of 0.6 to 0.7, the mean of RRMSE (II) is negative because of the negative values of the EB estimates.
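The MSE (II) adjustment described above amounts to truncating negative MSE estimates at zero before computing the RRMSE. A small Python sketch with made-up numbers (none of them taken from the tables) shows the effect on the mean:

```python
import math

def truncate_mse(mse_values):
    """Replace negative MSE estimates by zero, as done for MSE (II)."""
    return [max(0.0, v) for v in mse_values]

def rrmse(mse, theta_hat):
    """Square root of the MSE relative to the EB estimate."""
    return math.sqrt(mse) / theta_hat

mse = [0.30, -1.20, 0.50]     # hypothetical jackknife MSE values
theta_hat = [1.5, 0.8, 2.0]   # hypothetical EB estimates
mse_ii = truncate_mse(mse)
rrmse_ii = [rrmse(m, t) for m, t in zip(mse_ii, theta_hat)]
mean_mse = sum(mse) / len(mse)
mean_mse_ii = sum(mse_ii) / len(mse_ii)
print(round(mean_mse, 4), round(mean_mse_ii, 4))
print([round(v, 4) for v in rrmse_ii])
```

Truncation removes the negative terms that were pulling the average down, so the mean of MSE (II) can only be equal to or larger than the mean of the raw MSE. An RRMSE still comes out negative whenever the EB estimate itself is negative, since the estimate sits in the denominator.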
Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
The EB estimates are similar to those produced by the NBR method, although they are slightly outperformed by the NBR method when the data contain only a small number of zeros. In particular, as shown in Table 3, if the data have an expected probability of zero of 0.1 to 0.5, ZINB produces a larger MSE for the EB estimator than NBR does.
If the data have an expected probability of zero of 0.6 to 0.7, however, ZINB gives better estimates. The estimates were also unbiased, as they cover the parameter values adequately. However, ZINB begins to produce inconsistent estimates if the data have an expected probability of zero of 0.8 or more, because of the enormous MSE.
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
The mean of MSE (II) with ZINB is bigger than the mean of MSE with ZINB. That is because, once the negative values of MSE are changed to zero, they no longer reduce the mean.
Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
The EB estimates given by the NBR and ZINB methods are similar for data with small numbers of zeros. However, the ZINB method produces a larger MSE than NBR as long as the expected probability of zero in the data stays below the 0.6 threshold.
The ZINB method performs better if the data have an expected probability of zero of 0.6 to 0.7. In this case the EB estimates given by the NBR method are unstable and inconsistent because of negative estimates and huge MSE values that can be thousands of times larger than their acceptable values. On the other hand, the EB estimator with ZINB works well: it gives unbiased estimates and its MSE values are more stable than those of the EB estimator with NBR.
Both methods perform poorly if the data have an expected probability of zero of 0.8 or more. The EB estimators from both methods were inconsistent as a result of the very large MSE values they produced.
Table 3 MSE and RRMSE of EB Estimator with ZINB

Probability of zero  Mean of MSE  Median of MSE  Mean of RRMSE  Median of RRMSE
0.1  0.45  0.17  0.24  0.14
0.2  0.43  0.20  0.33  0.21
0.3  0.71  0.28  0.52  0.32
0.4  0.54  0.28  0.632  0.42
0.5  0.86  0.33  7322807  0.66
0.6  0.61  0.38  29817  103
0.7  0.58  0.25  218119  194
0.8  -128  -1.4E-07  162697  375
0.9  2954790  -1E-06  3.5E+278  609508
Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB

Probability of zero  Mean of MSE  Median of MSE  Mean of RRMSE  Median of RRMSE
0.1  0.45  0.17  0.24  0.14
0.2  0.436  0.20  0.324  0.21
0.3  0.72  0.28  0.51  0.311
0.4  0.55  0.28  0.61  0.41
0.5  0.95  0.33  6561235  0.58
0.6  0.75  0.38  23406  0.70
0.7  150  0.25  134655  0.69
0.8  175  0  733506  0
0.9  2954908  0  1.2E+278  0
CONCLUSION
Excess zeros in the data strongly influenced the results of EB estimation. A conventional method such as negative binomial regression for prior estimation produced biased and unreliable EB estimators for data with an expected probability of zero of 0.6 or more. This is shown by the large MSE values and the negative values of the estimator.
Meanwhile, EB estimation with the ZINB method produced a more reliable estimator even when the data have an expected probability of zero of 0.6 to 0.7.
ZINB also provided a reliable estimator for data with less than 53.33% zeros. This means that the performance of ZINB declines when the data have an expected probability of zero of 0.8 or more, as shown by the large MSE and the inconsistent estimator.
RECOMMENDATION
This research is based on many assumptions and is subject to several limitations. If the assumptions and limitations can be relaxed, better results can be expected. Some recommendations for further research are:
1. The generating process in this research does not reflect a real sampling process. A generating process similar to real sampling might give better results because it would be closer to real applications.
2. It would be interesting to run experiments with a larger number of areas, since the number of areas influences the data modeling.
3. Restricted Maximum Likelihood may be applied when estimating the prior parameters with ZINB and NBR in order to avoid negative values of MSE.
4. Theoretical research on ZINB and the Empirical Bayes estimator is important for understanding the behavior of the ZINB parameter estimates in the Empirical Bayes setting.
REFERENCES
Erdman D, L Jackson, A Sinko. 2008. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. SAS Global Forum 2008, 322-2008. http://www2.sas.com/proceedings/forum2008/322-2008.pdf [25 August 2008]
Famoye F, KP Singh. 2006. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Journal of Data Science 4: 117-130.
Hardin JW, JM Hilbe. 2007. Generalized Linear Models and Extensions. Texas: A Stata Press Publication.
Kurnia A, KA Notodiputro. 2006. Penerapan Metode Jackknife dalam Pendugaan Area Kecil. Forum Statistika dan Komputasi, April 2006, p. 12-15.
Kismiantini. 2007. Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Thesis]. Bogor: Institut Pertanian Bogor, Fakultas Matematika dan Pengetahuan Alam.
McCullagh P, JA Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.
Ramsini B, et al. 2001. Uninsured Estimates by County: A Review of Options and Issues. http://www.odh.ohio.gov/Data/OFHSurv/ofhsrfq7.pdf [24 April 2008]
Rao JNK. 2003. Small Area Estimation. New York: John Wiley & Sons.
Wakefield J. 2006. Disease mapping and spatial regression with count data. http://www.bepress.com/uwbiostat/paper286.pdf [24 April 2008]
Appendix 1 Result of EB estimation with NBR
Statistic  min  Q1  median  mean  Q3  max

P(Y=0) 10, Actual 10-33.33, r 100
EB estimator  0.426011  1525665  3188832  4252666  5752756  205939
bias  0.000446  0.5164  0.878579  1315093  1721091  8704671
MSE  0.040547  0.109118  0.159448  0.333613  0.335256  4167064
RRMSE  0.041258  0.100045  0.1356  0.18188  0.220426  0.576793

P(Y=0) 20, Actual 13.33-36.67, r 100
EB estimator  0.342831  1013993  2218265  2984668  3953417  1815693
bias  0.000587  0.413611  0.79407  1100373  1454889  7906915
MSE  0.055631  0.131969  0.196963  0.353033  0.386291  3778251
RRMSE  0.070449  0.15421  0.205182  0.262006  0.352726  0.788718

P(Y=0) 30, Actual 20-53.33, r 100
EB estimator  0.323311  0.836545  1562163  2263684  2918741  1214482
bias  0.000151  0.372382  0.67041  0.916482  122012  5950225
MSE  0.074364  0.163462  0.231014  0.400207  0.432371  5250254
RRMSE  0.102324  0.214697  0.299247  0.361013  0.474077  1192032

P(Y=0) 40, Actual 23.33-56.67, r 100
EB estimator  0.24882  0.64963  1219656  17107  2248716  930007
bias  0.000564  0.293602  0.549809  0.757937  1007851  486688
MSE  -100569  0.194196  0.271669  0.41875  0.45917  3239598
RRMSE  0.123605  0.300339  0.422426  0.503566  0.642418  2202294

P(Y=0) 50, Actual 23.33-63.33, r 100
EB estimator  0.122548  0.570083  1028619  1291758  1728067  6750472
bias  0.00029  0.250747  0.453265  0.622838  0.803185  4009352
MSE  -237643  0.235733  0.306641  0.452955  0.5091  3652167
RRMSE  0.038956  0.412708  0.588924  0.717336  0.844735  3240156

P(Y=0) 60, Actual 30-70, r 100
EB estimator  -0.77338  0.44443  0.699758  0.944038  1131071  6323352
bias  0.000452  0.20433  0.398131  0.534095  0.679938  3848209
MSE  -749011  0.254097  0.330078  -12875  0.539873  2354887
RRMSE  -663045  0.51763  0.813734  -0.38057  1287528  1767434

P(Y=0) 70, Actual 43.33-73.33, r 100
EB estimator  -33274  0.249515  0.442513  0.659375  0.922519  9258959
bias  0.000375  0.155154  0.316124  0.476883  0.588926  8475103
MSE  -7513075  0.235378  0.402092  2536714  0.876569  6051162
RRMSE  -10741  0.704796  1355566  -121606  3040291  3332419

P(Y=0) 80, Actual 53.33-90, r 100
EB estimator  -232889  0.17621  0.305365  0.569959  0.576346  6303601
bias  0.000395  0.116669  0.254473  1091172  0.497898  6297454
MSE  -6E+09  -0.16583  0.301527  -584495  5718409  1.85E+09
RRMSE  -212936  0.927338  2115163  3094627  1359703  4151289

P(Y=0) 90, Actual 70-100, r 100
EB estimator  -108767  0.111208  0.230315  0.212247  0.353129  3625557
bias  0.00016  0.086  0.177169  0.425532  0.314714  1092655
MSE  -3.8E+09  -130817  0.159682  39135606  3074073  1.2E+11
RRMSE  -909131  1647188  6639631  1.16E+10  1585472  7.06E+11
Appendix 2 Result of EB estimation with ZINB
Statistic  min  Q1  median  mean  Q3  max

P(Y=0) 10, Actual 10-33.33, r 100
EB estimator  0.267752  1500256  3195861  4280907  5833922  2220705
bias  0.000603  0.485515  0.882468  1315721  1750173  8704672
MSE  -0.53954  0.10933  0.168797  0.449506  0.369775  360843
RRMSE  0.022947  0.096443  0.136424  0.238099  0.241955  5518468

P(Y=0) 20, Actual 13.33-36.67, r 100
EB estimator  1.05E-08  0.914898  221594  3017228  401361  1815694
bias  0.000368  0.383426  0.780984  1105029  1496623  7906918
MSE  -0.7309  0.126202  0.201463  0.425844  0.414597  1734815
RRMSE  0.021807  0.144983  0.210692  0.326097  0.401786  3177943

P(Y=0) 30, Actual 20-53.33, r 100
EB estimator  0.132041  0.719086  1523909  2308745  3012309  1228058
bias  0.000508  0.332427  0.680187  0.928947  1254604  6314973
MSE  -229891  0.156942  0.277017  0.707983  0.590466  7469014
RRMSE  0.023998  0.210095  0.317195  0.519524  0.618802  3500387

P(Y=0) 40, Actual 23.33-56.67, r 100
EB estimator  1.05E-08  0.574265  1209034  1742928  2368713  104953
bias  0.000564  0.268248  0.544049  0.771741  1067061  4889872
MSE  -125713  0.181557  0.284338  0.540615  0.498521  423089
RRMSE  0.054916  0.28362  0.420396  0.630776  0.778033  5394515

P(Y=0) 50, Actual 23.33-63.33, r 100
EB estimator  1.05E-08  0.426701  1033816  133848  1906961  8018962
bias  0.000453  0.224726  0.454522  0.661709  0.900005  4414442
MSE  -181856  0.194818  0.334706  0.859252  0.711939  7997074
RRMSE  0.026206  0.387294  0.662251  7322807  1312302  13388294

P(Y=0) 60, Actual 30-70, r 100
EB estimator  1.05E-08  0.30085  0.645848  0.985327  1154975  728326
bias  6.2E-05  0.190886  0.406245  0.567657  0.74167  3923952
MSE  -34589  0.078006  0.376514  0.60793  0.804116  3426488
RRMSE  0.00461  0.502807  1033578  2981671  2012552  3308816

P(Y=0) 70, Actual 43.33-73.33, r 100
EB estimator  1.05E-08  1.05E-08  0.341315  0.677841  1  5005491
bias  9.79E-05  0.128017  0.358257  0.487174  0.654423  3733981
MSE  -142213  -0.01433  0.255331  0.584152  1132152  264456
RRMSE  0.064209  0.847956  1942286  2181192  4589042  7899681

P(Y=0) 80, Actual 53.33-90, r 100
EB estimator  1.05E-08  1.05E-08  0.142906  0.445315  0.859305  5
bias  0.000161  0.083397  0.272773  0.392826  0.557213  3532556
MSE  -10651  -5.6E-05  -1.4E-07  -127819  1452962  1132741
RRMSE  0.063244  1475413  3754705  162697  9221163  3786684

P(Y=0) 90, Actual 70-100, r 100
EB estimator  1E-277  1.05E-08  1.05E-08  0.225165  0.135512  3
bias  0.000495  0.054221  0.153374  0.27819  0.350213  2736904
MSE  -175652  -3.3E-05  -1E-06  2954790  1.52E-06  6.13E+08
RRMSE  0.040681  4059441  6095076  3.5E+278  5569021  1.6E+281
Appendix 3 Result of EB estimation (II) with NBR
Statistic  min  Q1  median  mean  Q3  max

P(Y=0) 10, Actual 10-33.33, r 100
EB estimator  0.426011  1525665  3188832  4252666  5752756  205939
bias  0.000446  0.5164  0.878579  1315093  1721091  8704671
MSE  0.040547  0.109118  0.159448  0.333613  0.335256  4167064
RRMSE  0.041258  0.100045  0.1356  0.18188  0.220426  0.576793

P(Y=0) 20, Actual 13.33-36.67, r 100
EB estimator  0.342831  1013993  2218265  2984668  3953417  1815693
bias  0.000587  0.413611  0.79407  1100373  1454889  7906915
MSE  0.055631  0.131969  0.196963  0.353033  0.386291  3778251
RRMSE  0.070449  0.15421  0.205182  0.262006  0.352726  0.788718

P(Y=0) 30, Actual 20-53.33, r 100
EB estimator  0.323311  0.836545  1562163  2263684  2918741  1214482
bias  0.000151  0.372382  0.67041  0.916482  122012  5950225
MSE  0.074364  0.163462  0.231014  0.400207  0.432371  5250254
RRMSE  0.102324  0.214697  0.299247  0.361013  0.474077  1192032

P(Y=0) 40, Actual 23.33-56.67, r 100
EB estimator  0.24882  0.64963  1219656  17107  2248716  930007
bias  0.000564  0.293602  0.549809  0.757937  1007851  486688
MSE  0  0.194196  0.271669  0.419181  0.45917  3239598
RRMSE  0  0.300116  0.422209  0.502895  0.641904  2202294

P(Y=0) 50, Actual 23.33-63.33, r 100
EB estimator  0.122548  0.570083  1028619  1291758  1728067  6750472
bias  0.00029  0.250747  0.453265  0.622838  0.803185  4009352
MSE  0  0.235733  0.306641  0.456258  0.5091  3652167
RRMSE  0  0.410357  0.585765  0.712314  0.841838  3240156

P(Y=0) 60, Actual 30-70, r 100
EB estimator  -0.77338  0.44443  0.699758  0.944038  1131071  6323352
bias  0.000452  0.20433  0.398131  0.534095  0.679938  3848209
MSE  0  0.254097  0.330078  2619677  0.539873  2354887
RRMSE  -663045  0.448118  0.750369  -0.34911  1209918  1767434

P(Y=0) 70, Actual 43.33-73.33, r 100
EB estimator  -33274  0.249515  0.442513  0.659375  0.922519  9258959
bias  0.000375  0.155154  0.316124  0.476883  0.588926  8475103
MSE  0  0.235378  0.402092  9500073  0.876569  6051162
RRMSE  -10741  0.288999  0.995659  -100163  2527784  3332419

P(Y=0) 80, Actual 53.33-90, r 100
EB estimator  -232889  0.17621  0.305365  0.569959  0.576346  6303601
bias  0.000395  0.116669  0.254473  1091172  0.497898  6297454
MSE  0  0  0.301527  1444250  5718409  1.85E+09
RRMSE  -212936  0  1104113  2205437  5656681  4151289

P(Y=0) 90, Actual 70-100, r 100
EB estimator  -108767  0.111208  0.230315  0.212247  0.353129  3625557
bias  0.00016  0.086  0.177169  0.425532  0.314714  1092655
MSE  0  0  0.159682  41595285  3074073  1.2E+11
RRMSE  -909131  0  0.557622  6.77E+09  9311925  7.06E+11
Appendix 4 Result of EB estimation (II) with ZINB
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0267752 1500256 3195861 4280907 5833922 2220705
bias 0000603 0485515 0882468 1315721 1750173 8704672MSE 0 010933 0168797 0450626 0369775 360843RRMSE 0 0095932 0135647 023675 0239669 5518468
20 1333-3667 100 EB estimator 105E-08 0914898 221594 3017228 401361 1815694bias 0000368 0383426 0780984 1105029 1496623 7906918MSE 0 0126202 0201463 0428006 0414597 1734815RRMSE 0 0142648 020709 0320663 0395479 3177943
30 20-5333 100 EB estimator 0132041 0719086 1523909 2308745 3012309 1228058bias 0000508 0332427 0680187 0928947 1254604 6314973MSE 0 0156942 0277017 0716543 0590466 7469014RRMSE 0 0203913 0311937 0506882 0615401 3500387
40 2333-5667 100 EB estimator 105E-08 0574265 1209034 1742928 2368713 104953bias 0000564 0268248 0544049 0771741 1067061 4889872MSE 0 0181557 0284338 0549835 0498521 423089RRMSE 0 0270309 0405926 0606317 0766631 5394515
50 2333-6333 100 EB estimator 105E-08 0426701 1033816 133848 1906961 8018962bias 0000453 0224726 0454522 0661709 0900005 4414442MSE 0 0194818 0334706 094973 0711939 7997074RRMSE 0 0316402 0576343 6561235 1240175 13388294
60 30-70 100 EB estimator 105E-08 030085 0645848 0985327 1154975 728326bias 62E-05 0190886 0406245 0567657 074167 3923952MSE 0 0078006 0376514 0749436 0804116 3426488RRMSE 0 0258286 0698814 2340612 1714808 3308816
70 4333-7333 100 EB estimator 105E-08 105E-08 0341315 0677841 1 5005491bias 979E-05 0128017 0358257 0487174 0654423 3733981MSE 0 0 0255331 1501268 1132152 264456RRMSE 0 0 0688797 1346552 2500825 7899681
80 5333-90 100 EB estimator 105E-08 105E-08 0142906 0445315 0859305 5bias 0000161 0083397 0272773 0392826 0557213 3532556MSE 0 0 0 1755486 1452962 1132741RRMSE 0 0 0 7335062 3311711 3786684
90 70-100 100 EB estimator 1E-277 105E-08 105E-08 0225165 0135512 3bias 0000495 0054221 0153374 027819 0350213 2736904MSE 0 0 0 2954908 152E-06 613E+08
RRMSE 0 0 0 12E+278 416189 16E+281
Appendix 5 Syntax program for generate data
data b;  /* generate x1 (covariate) and ei */
  input x1;
  cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e;  /* generate poisson-gamma with excess zero */
  do kk=1 to 30;
    set b;
    tetha = rangam(1,1);
    lambda = -log(0.1);  /* desired probability of a zero value (0.1-0.9) */
    starlambda = log(lambda/tetha);
    output;
  end;
run;

proc reg;
  model starlambda = x1;
  ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;

data _null_;
  set work.betha_lr_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
run;

data d;
  do kk=1 to 30;
    set e;
    mu = exp(&Intercept + &x1*x1);
    parmlambda = mu*tetha;
    ypoi = rand('poisson', parmlambda);
    output;
  end;
run;

ods trace on;
/* to take percent zero on data */
proc freq data=d;
  tables ypoi;
  ods output OneWayFreqs=work.zero;
run;

data zero;
  set zero;
  keep percent;
run;

proc transpose data=zero out=zero1;
run;

data _null_;
  set work.zero1;
  call symput('pctz', col1);
run;

data d;
  set d;
  pzero=&pctz;
  r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
Appendix 6 Syntax program EB with NBR
%macro sae_nb;
%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
  set work.betha_nb_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
  bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
  set work.duga_nb1;
  if ^(u=&x) then delete;
run;

data work.jacknb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc genmod data=work.jacknb&h;
  /* output p out=sasyi_est */
  model ypoi = x1 / dist=nb link=log;
  ods output parameterestimates=work.betha_est_nb&h (keep=parameter Estimate);
run;

proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;

data _null_;
  set work.betha_est_nbt&h;
  call symput('Intercept_', col1);
  call symput('x1_', col2);
  call symput('Dispersion_', col3);
run;

data work.duganb&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
  w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + &Dispersion_);
  teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
  beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
  merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5
        work.duganb6 work.duganb7 work.duganb8 work.duganb9 work.duganb10
        work.duganb11 work.duganb12 work.duganb13 work.duganb14 work.duganb15
        work.duganb16 work.duganb17 work.duganb18 work.duganb19 work.duganb20
        work.duganb21 work.duganb22 work.duganb23 work.duganb24 work.duganb25
        work.duganb26 work.duganb27 work.duganb28 work.duganb29 work.duganb30;
  by kk;
run;

data work.mse_nb_j;
  set work.mse_nb_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
  ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
  merge work.d work.mse_nb_j;
  keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes
       teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb BASE=work.hasilnb1 appendver=v6;
run;

%END;
%mend;

%sae_nb;
Appendix 7 Syntax program EB with ZINB
macro sae_zinb
do x=1 to 900
data workaset work eif ^(u=ampx) then deleterun
proc countreg data=amodel ypoi=x1dist=zinb method=qnzeromodel ypoi ~ x1ods output ParameterEstimates=workpe(keep=Parameter Estimate)run
proc transpose data=workpe out=workpe_trun
data _null_set workpe_tcall symput (Intercept col1)call symput (x1 col2)call symput (Inf_Intercept col3)call symput (Inf_x1 col4)call symput (_Alpha col5)run
data workdugaset amu_hat_b=exp(ampIntercept + ampx1x1) w_bayes=mu_hat_b(mu_hat_b + amp_Alpha)teta_hat_bayes=w_bayesypoi+(1-w_bayes)mu_hat_bg1=(amp_Alpha+ypoi)((mu_hat_b+amp_Alpha)2)bias_b=abs(teta_hat_bayes-parmlamdha)
run
proc append data=workduga base=workduga1run
do h=1 to 30
data workdset workduga1if ^(u=ampx) then deleterundata workjackzinbamphset workdif u=ampxif kk=amph then deleterun
proc countreg data=jackzinbamphmodel ypoi=x1dist=zinb method=qnzeromodel ypoi ~ x1ods output ParameterEstimates=workbetha_est_ZINBamph
(keep=Parameter Estimate)run
proc transpose data=workbetha_est_ZINBamph out=workbetha_est_ZINBtamphrun
data _null_set workbetha_est_ZINBtamphcall symput (Intercept col1)call symput (x1 col2)call symput (Inf_Intercept col3)call symput (Inf_x1 col4)call symput (_Alpha col5)run
data workdugaZINBamphset workdmu_hat_b_amph=exp(ampIntercept + ampx1x1)mu_hat_b_amph= ampb_o- + ampb_1- x1w_b_amph=mu_hat_b_amph (mu_hat_b_amph + (amp_Alpha))teta_hat_amph=w_b_amph ypoi+(1-w_b_amph)mu_hat_b_amphdelta_amph=(teta_hat_amph - teta_hat_bayes)2
g1_amph =((mu_hat_b_amph2ampalpha_)2)(ampalpha_+y_i)((mu_hat_b_amph2ampalpha_)+mu_hat_b_amph)2
g1_amph=(amp_Alpha+ypoi)((mu_hat_b_amph+amp_Alpha)2)
g1_amph =(A2)(ampk- + y_i)( a +mu_hat_b)2
beda_g_amph=g1_amph-g1run
data workmse_ZINB_jmerge workdugaZINB1 workdugaZINB2 workdugaZINB3 workdugaZINB4 workdugaZINB5 workdugaZINB6 workdugaZINB7 workdugaZINB8 workdugaZINB9 workdugaZINB10 workdugaZINB11 workdugaZINB12workdugaZINB13 workdugaZINB14 workdugaZINB15 workdugaZINB16 workdugaZINB17workdugaZINB18 workdugaZINB19 workdugaZINB20 workdugaZINB21 workdugaZINB22workdugaZINB23 workdugaZINB24 workdugaZINB25 workdugaZINB26 workdugaZINB27workdugaZINB28 workdugaZINB29 workdugaZINB30by kkrun
data workmse_ZINB_jset workmse_ZINB_jt_sum=0g_sum=0do j=1 to 30g_sum=g_sum+beda_g_ampjt_sum=t_sum+delta_ampj
endm2i=((30-1)30)t_summ1i=g1-((30-1)30)g_summse_j_b=m1i+m2irrmse_j_b=sqrt(mse_j_b)teta_hat_bayesrun
data workhasilZINBmerge workd workmse_ZINB_j keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_brunods listingoption nodate ls=130 ps=130ods html
end
proc append data=workhasilZINB BASE=workhasilZINB1run
ENDmend
sae_zinb
ACKNOWLEDGEMENTS
First of all, the author modestly admits that the completion of this paper would not have been possible without invaluable help from many generous and extraordinary people. The author is deeply indebted to them for their help, ideas, criticism and advice during the writing process. However, they should not be held responsible for any mistakes and deficiencies in this paper, which are purely the author's. I would therefore like to express my gratitude to:
1. Allah SWT, to whom all praise and gratitude belong. Alhamdulillahirabbil'alamin. With His blessing I was able to finish this paper. Thank you, Allah, for giving me a wonderful life with extraordinary people around me.
2. Prof. Dr. Khairil Anwar Notodiputro and Ir. Indahwati, M.Si., for the early motivation, discussion, advice, support and their great enthusiasm.
3. Mr. Bambang Sumantri, M.Si., as examiner, for the spirit, advice and criticism.
4. My beloved family, for their unlimited love.
5. Mr. Alfian Futuhul Hadi, M.Si., for enlightening discussion when I was in trouble.
6. Mr. Bagus Sartono, M.Si. Thank you very much for running my data in your lab on your wonderful computer. Sorry if it disturbed you.
7. Mr. Anang, M.Si. and Mr. Rahman, M.Si., for sharing their knowledge and for technical support.
8. Dr. Ir. Hari Wijayanto, M.S., and all lecturers and staff at the Statistics Department. Thank you for the knowledge of statistics and of life that you shared. It means a lot to me.
9. Rahmatullah Sigit Dodiet Sasongko, S.Si., for the spirit, love, care, time and patience. Keep it real. Still love me forever and ever.
10. Mr. Dionisius Laksmana Bisara Putra, S.Si., for editing my paper, for his criticism and for useful discussion.
11. Maulana Chistanto, S.Si. and Yhanuar Ismail, S.Si. Thank you for being my best brothers.
12. Nikhen Sevrien and the late Dini. Thank you for lighting up my days.
13. Rere Yusri Agus Ika Cinong Toki Cheri Fisca Wiwik Neng Mala Lilis Dika Rangga Lele Dodi Kus Inal Bebek Koler and all of Statistics '41.
14. Everyone who helped me in this study and cannot be named individually.
This thesis is not perfect, so I welcome criticism, advice and recommendations from those who read it. Thank you. God bless you all.
Bogor, January 2009
Irene Muflikh Nadhiroh
TABLE OF CONTENTS
INTRODUCTION
  Background
  Objectives
LITERATURE REVIEW
  Direct Estimation
  Small Area Estimation
  Small Area Models
  Empirical Bayes Methods
  Poisson-Gamma Models
  Negative Binomial Regression
  Over-disperse at Count Data
  Zero-Inflated Models
  Zero-Inflated Negative Binomial
  Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
METHODOLOGY
  Data
  Methods
RESULT AND DISCUSSION
  Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
  Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
  Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
CONCLUSION
RECOMMENDATION
REFERENCES
LIST OF TABLES
  Table 1 MSE and RRMSE of EB Estimator with NBR
  Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
  Table 3 MSE and RRMSE of EB Estimator with ZINB
  Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB
LIST OF APPENDICES
  Appendix 1 Result of EB estimation with NBR
  Appendix 2 Result of EB estimation with ZINB
  Appendix 3 Result of EB estimation (II) with NBR
  Appendix 4 Result of EB estimation (II) with ZINB
  Appendix 5 Syntax program for generate data
  Appendix 6 Syntax program EB with NBR
  Appendix 7 Syntax program EB with ZINB
INTRODUCTION
Background
Direct estimation is usually applied in large-scale surveys, but such estimators are difficult to use in a smaller region, especially when the sample size is too small. In this case indirect estimation, which uses covariates to estimate the parameter, is usually applied. This type of estimation is broadly known as Small Area Estimation.
Kismiantini (2007) conducted research on Small Area Estimation based on Poisson-Gamma models. Maximum Likelihood Estimation was used with Negative Binomial Regression to estimate the prior parameters. Moreover, Negative Binomial Regression was used to resolve the over-dispersion problem in the data.
In reality, count data are characterized not only by over-dispersion but sometimes also by excess zeros. Excess zero is a condition in which the data contain more zeros than the assumed distribution predicts. For example, among 100 observations from a Poisson model with a response mean of 4, we would expect about 2 zeros. If the data have 30 zeros, it should be obvious that the distributional assumptions have been violated, and the estimated parameters and standard errors will therefore be biased (Hardin & Hilbe 2007). In this paper Zero-Inflated models were adapted to solve this type of problem.
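The figure of about 2 expected zeros follows from the Poisson probability of a zero, $e^{-\mu}$. A minimal sketch of that arithmetic, where the function name is only illustrative:

```python
import math

def expected_poisson_zeros(n_obs: int, mean: float) -> float:
    """Expected number of zeros among n_obs independent Poisson(mean) counts."""
    # P(Y = 0) for a Poisson variable is exp(-mean).
    return n_obs * math.exp(-mean)

expected = expected_poisson_zeros(100, 4.0)
print(round(expected, 2))  # roughly 2 zeros out of 100 observations
```

Observing 30 zeros where fewer than 2 are expected is the kind of discrepancy that signals excess zeros.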
Objectives
The research objectives are:
1. To investigate the performance of Negative Binomial Regression in Small Area Estimation in the case of excess zeros.
2. To apply Zero-Inflated Count Models in Small Area Estimation in the case of excess zeros.
3. To evaluate the performance of Zero-Inflated Count Models in estimating the prior parameters for Small Area Estimation.
LITERATURE REVIEW
Direct Estimation
Direct estimates are generally "design based" in the sense that they make use of survey weights, and the associated inferences are based on the probability distribution induced by the sample design with the population values held fixed (Rao 2003). In particular, direct estimates of a domain parameter are based only on the domain-specific sample data.
Data from sample surveys have been used to obtain reliable estimates of parameters. Ramsini et al. (2001) mentioned that direct estimates for small areas are unbiased, although they have large variance because of the small sample size.
Small Area Estimation
The term small area can refer to anything, depending on the object of interest: a city, an age group, a sex group, a region or a rural district. In general, small area denotes any domain for which direct estimates of adequate precision cannot be produced (Rao 2003). This happens because the sample size in the small area is too small. Small area estimation was therefore developed as a statistical technique for estimating small area parameters with an adequate level of precision. It works as indirect estimation that borrows strength from the values of the variable of interest in related areas, through supplementary information related to that variable such as recent census counts and current administrative records (Rao 2003). Indirect estimation is the process of estimating a domain's parameter by connecting the information in that domain with other domains through an appropriate model, so the estimator uses data from other domains (Kurnia & Notodiputro 2006).
Small Area Models
There are two types of link models in indirect estimation. The first is the traditional approach based on implicit models, which link related small areas through supplementary data. The second is explicit small area models, which make specific allowance for between-area variation (Rao 2003). This research used the second type, which can be classified into two broad types of basic model:
1. Basic area level (type A) model
Basic area level models, or aggregate models, relate small area parameters to area-specific auxiliary variables. These models are essential if unit (element) level data are not available. The parameter of interest $\theta_i$ is assumed to be related to the area-specific auxiliary data (covariates) $x_i = (x_{1i}, \ldots, x_{pi})^T$ by the linear model
$$\theta_i = x_i^T \beta + b_i v_i, \quad i = 1, \ldots, m,$$
where $v_i \sim N(0, \sigma_v^2)$ are area-specific random effects, $\beta = (\beta_1, \ldots, \beta_p)^T$ is the $p \times 1$ vector of regression coefficients, and $b_i$ are known positive constants. For making inferences about $\theta_i$, direct estimators $y_i$ are assumed to be available, with
$$y_i = \theta_i + e_i, \quad i = 1, \ldots, m,$$
where the sampling errors $e_i \sim N(0, \sigma_{ei}^2)$ and the $\sigma_{ei}^2$ are known. Combining the two models gives
$$y_i = x_i^T \beta + b_i v_i + e_i, \quad i = 1, \ldots, m$$
(Rao 2003).
2. Basic unit level (type B) model
Unit level models relate the unit values of the study variable to unit-specific auxiliary variables. With unit-specific auxiliary variables $x_{ij} = (x_{ij1}, \ldots, x_{ijp})^T$, the nested error regression model is
$$y_{ij} = x_{ij}^T \beta + v_i + e_{ij}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n_i,$$
with $v_i \sim N(0, \sigma_v^2)$ and $e_{ij} \sim N(0, \sigma_{ei}^2)$.
Empirical Bayes Methods
The Bayesian approach is based on Bayes' law, which was discovered by Thomas Bayes. The law was introduced by Richard Price in 1763, two years after Thomas Bayes passed away. In 1774 and 1781 Laplace gave the details and their relevance for modern Bayesian statistics (Gill 2002 in Kismiantini 2007).
Novick in Good (1980) mentioned that the Bayes method is difficult to adopt and is sometimes very sensitive, because it requires prior probability information which is usually difficult to obtain. Robbins (1955) introduced Empirical Bayes methods, which assume a particular prior distribution and estimate its parameters from the sample. Rao (2003) said that EB (Empirical Bayes) and HB (Hierarchical Bayes) methods are suitable for binary and count data in Small Area Estimation. Therefore the EB method was used in this research.
Rao (2003) summarized EB methods in Small Area Estimation as follows:
1. Obtain the posterior probability density function of the small area parameter.
2. Estimate the model parameters from the marginal density function.
3. Use the estimated posterior density for inferences regarding the parameters of interest.
Poisson-Gamma Models
The Poisson model is the standard model for count data. Count data often suffer from over-dispersion, so extensions of the Poisson model have been developed to accommodate the extra variance in sample data. Two-stage models for count data have been introduced, known as Poisson-Gamma mixed models. Wakefield (2006) introduced a Poisson-Gamma model that is easy to use with the SMR (Standardized Mortality Ratio) as the direct estimator. This study used the Wakefield model with a different direct estimator.
Let $y_i$ be the number of individuals in small area $i$ that have the characteristic of interest, written as
$$y_i = \sum_j y_{ij},$$
where $y_{ij}$ is the $j$-th object in the $i$-th small area, $j = 1, \ldots, n$ and $i = 1, \ldots, m$.
First stage: $y_i \mid \theta_i \overset{ind}{\sim} \text{Poisson}(\mu_i \theta_i)$ is assumed, where $\mu_i = \mu(x_i, \beta)$ describes an area level regression model, $x_i$ is a vector of covariates and $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of regression coefficients.
Second stage: $\theta_i \overset{iid}{\sim} \text{gamma}(\alpha, 1/\alpha)$ is assumed as the prior distribution, with mean 1 and variance $1/\alpha$.
The marginal distribution of $y_i \mid \beta, \alpha$ is then negative binomial.
Moreover, Wakefield (2006) used Bayes' theorem to obtain the posterior distribution
$$\theta_i \mid y_i, \beta, \alpha \sim \text{gamma}\left(y_i + \alpha, \frac{1}{\mu_i + \alpha}\right)$$
and the EB estimator
$$\hat{\theta}_i^{EB} = \hat{\theta}_i^{B}(\hat{\beta}, \hat{\alpha}) = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\hat{\mu}_i,$$
with $\hat{\gamma}_i = \hat{\mu}_i / (\hat{\alpha} + \hat{\mu}_i)$, where $\hat{\theta}_i = y_i$ is the direct estimate of $\theta_i$ and $y_i$ is the number of observations.
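The EB estimator is a weighted average of the observed count and the regression mean. The sketch below assumes the weight is $\hat{\gamma}_i = \hat{\mu}_i / (\hat{\alpha} + \hat{\mu}_i)$, which is how the SAS code in Appendix 6 appears to compute it (`w_bayes` and `teta_hat_bayes`); the function and argument names are hypothetical.

```python
def eb_estimate(y: float, mu_hat: float, alpha_hat: float) -> float:
    """Empirical Bayes estimate: shrink the count y toward the regression mean.

    Assumed form: gamma * y + (1 - gamma) * mu_hat, gamma = mu_hat / (alpha_hat + mu_hat).
    """
    if mu_hat <= 0 or alpha_hat <= 0:
        raise ValueError("mu_hat and alpha_hat must be positive")
    gamma_hat = mu_hat / (alpha_hat + mu_hat)  # shrinkage weight in (0, 1)
    return gamma_hat * y + (1.0 - gamma_hat) * mu_hat

# A count of 3 with regression mean 2 and prior parameter 1
estimate = eb_estimate(y=3, mu_hat=2.0, alpha_hat=1.0)
print(estimate)
```

A large $\hat{\alpha}$ (small prior variance) pulls the estimate toward $\hat{\mu}_i$, while a small $\hat{\alpha}$ leaves it close to the observed count.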
Negative Binomial Regression
The negative binomial regression model seems to have been first discussed by Anscombe (1972). Others have pointed out its success in dealing with over-dispersed count data. Lawless (1987) elaborated the mixture model parameterization of the negative binomial, providing formulas for its log likelihood, mean, variance and moments. Later, Breslow (1990) cited Lawless' work, and from its inception to the late 1980s the negative binomial regression model was construed as a mixture model that is useful for accommodating otherwise over-dispersed Poisson data (Hardin & Hilbe 2007). The negative binomial probability function is written as
$$g(y \mid x) = \frac{\Gamma(y + k)}{\Gamma(k)\,\Gamma(y + 1)} \left(\frac{k}{k + \mu}\right)^{k} \left(\frac{\mu}{k + \mu}\right)^{y},$$
where $y = 0, 1, 2, \ldots$, and $k$ and $\mu$ are the negative binomial parameters, with $E(y) = \mu$ and $\text{var}(y) = \mu + \mu^2/k$. The parameter $k$ is called the dispersion parameter; it indicates the degree of over-dispersion in the data.
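As a numerical check of this parameterization, the sketch below evaluates the negative binomial probability function with mean $\mu$ and dispersion parameter $k$ and confirms that the probabilities sum to one, with mean $\mu$ and variance $\mu + \mu^2/k$. The function name and the example values $\mu = 4$, $k = 2$ are illustrative only.

```python
import math

def nb_pmf(y: int, mu: float, k: float) -> float:
    """Negative binomial probability of count y with mean mu and dispersion k."""
    # Work on the log scale to avoid overflow in the gamma functions.
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu))
             + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

mu, k = 4.0, 2.0
probs = [nb_pmf(y, mu, k) for y in range(500)]  # the tail beyond 500 is negligible
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(total, mean, var)
```

With these values the variance is $4 + 16/2 = 12$, three times the mean, which is the over-dispersion a Poisson model cannot represent.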
Over-disperse at Count Data
Count data modeled by Poisson regression are over-dispersed if the variance is larger than the mean, that is, if the observed variance is larger than the variance expected under the Poisson model. This is written as
$$\text{Var}(y_i) > E(y_i)$$
(McCullagh & Nelder 1989).
Zero-Inflated Models
Zero-Inflated models consider two distinct sources of zero outcomes. One source is individuals who do not enter the counting process; the other is individuals who do enter the counting process but produce a zero outcome (Hardin & Hilbe 2007).
Lambert (1992) first described this type of mixture model in the context of process control in manufacturing. It has since been used in many applications and is now discussed in nearly every book or article dealing with count response models.
For the zero-inflated model, the probability of observing a zero outcome equals the probability that an individual is in the always-zero group plus the probability that the individual is not in that group times the probability that the counting process produces a zero. If $B(0)$ is the probability that the binary process results in a zero outcome and $\Pr(0)$ is the probability that the counting process produces a zero, the probability of a zero outcome for the system is (Hardin & Hilbe 2007)
$$\Pr(y = 0) = B(0) + (1 - B(0)) \Pr(0).$$
The probability of a nonzero count is
$$\Pr(y = k;\ k > 0) = [1 - B(0)] \Pr(k).$$
This model produces two groups of parameters. The first is the zero-inflation parameters, which show whether the covariates contribute significantly to producing a zero outcome. The second is the negative binomial parameters, which model the response with the covariates.
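Zero-Inflated Negative Binomial
There are many kinds of zero-inflated model. Each has advantages and disadvantages and suits a different type of data. The Zero-Inflated Negative Binomial (ZINB) model is one of them, and it is used for data that are both over-dispersed and have excess zeros. As a result, the parameter estimates include the parameter $k$, which indicates that over-dispersion occurs in the data, just as the dispersion parameter does in negative binomial regression.
The probability distribution of this model is
$$P(Y_i = y_i \mid x_i) = \begin{cases} \varphi(x_i) + (1 - \varphi(x_i))\, g(0 \mid x_i), & y_i = 0, \\ (1 - \varphi(x_i))\, g(y_i \mid x_i), & y_i > 0, \end{cases}$$
where $\varphi$ is a function of $z_i^T \delta$, $z_i$ is the vector of zero-inflation covariates and $\delta$ is the vector of zero-inflation coefficients to be estimated. Meanwhile, $g(y_i \mid x_i)$ is the negative binomial probability distribution, written as
$$g(y_i \mid x_i) = \frac{\Gamma(y_i + k)}{\Gamma(k)\,\Gamma(y_i + 1)} \left(\frac{k}{k + \mu_i}\right)^{k} \left(\frac{\mu_i}{k + \mu_i}\right)^{y_i}.$$
The mean and variance of ZINB are
$$E(y_i \mid x_i) = \mu_i (1 - \varphi_i),$$
$$V(y_i \mid x_i) = \mu_i (1 - \varphi_i)\left(1 + \mu_i (\varphi_i + 1/k)\right),$$
with $\varphi_i = \varphi(x_i)$.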
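The mixture structure can be checked numerically. The sketch below builds a ZINB probability function from a negative binomial component with mean `mu` and dispersion `k` and a zero-inflation probability `phi`, then compares the computed mean and variance with $\mu(1-\varphi)$ and $\mu(1-\varphi)(1+\mu(\varphi+1/k))$. All names and the example values are assumptions made for illustration.

```python
import math

def nb_pmf(y: int, mu: float, k: float) -> float:
    """Negative binomial probability of count y with mean mu and dispersion k."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu))
             + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def zinb_pmf(y: int, mu: float, k: float, phi: float) -> float:
    """Zero-inflated negative binomial probability of count y."""
    base = (1.0 - phi) * nb_pmf(y, mu, k)
    # Zeros come from the always-zero group (phi) or from the count process.
    return phi + base if y == 0 else base

mu, k, phi = 4.0, 2.0, 0.3
probs = [zinb_pmf(y, mu, k, phi) for y in range(500)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(probs[0], mean, var)
```

In this example the probability of a zero rises from about 0.11 under the plain negative binomial to about 0.38 under ZINB, while the mean drops from 4 to 2.8.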
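Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
The jackknife is one of the general methods used in surveys because of its simple concept (Jiang, Lahiri and Wan 2002). The method was introduced by Tukey (1958) and developed into a method that can correct the bias of an estimator by removing observation $l$, for $l = 1, \ldots, m$, and re-estimating the parameters each time.
Following Rao (2003), the jackknife steps for estimating MSE($\hat{\theta}_i^{EB}$) are:
1. Let $\hat{\theta}_i^{EB} = k_i(y_i, \hat{\beta}, \hat{\alpha})$ and $\hat{\theta}_{i,-l}^{EB} = k_i(y_i, \hat{\beta}_{-l}, \hat{\alpha}_{-l})$, then calculate
$$\hat{M}_{2i} = \frac{m-1}{m} \sum_{l=1}^{m} \left(\hat{\theta}_{i,-l}^{EB} - \hat{\theta}_i^{EB}\right)^2.$$
2. Calculate the delete-$l$ estimators $\hat{\beta}_{-l}$ and $\hat{\alpha}_{-l}$, then calculate
$$\hat{M}_{1i} = g_{1i}(\hat{\beta}, \hat{\alpha}, y_i) - \frac{m-1}{m} \sum_{l=1}^{m} \left[ g_{1i}(\hat{\beta}_{-l}, \hat{\alpha}_{-l}, y_i) - g_{1i}(\hat{\beta}, \hat{\alpha}, y_i) \right].$$
Here $g_{1i}$ is the estimated variance of the posterior distribution, which measures the variability associated with $\hat{\theta}_i$. Using $g_{1i}$ alone leads to severe underestimation of the MSE, because it ignores the variability due to estimating the prior parameters. The estimator $\hat{M}_{1i}$ therefore corrects the bias of $g_{1i}$.
3. Calculate the jackknife estimator of MSE($\hat{\theta}_i^{EB}$) as
$$MSE_J(\hat{\theta}_i^{EB}) = \hat{M}_{1i} + \hat{M}_{2i}.$$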
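The three jackknife steps reduce to a few sums once the full-sample and delete-one quantities are available. The sketch below assumes those quantities have already been computed elsewhere (for example by refitting the model with each observation removed); the function and argument names are hypothetical.

```python
from typing import Sequence

def jackknife_mse(theta_full: float, g1_full: float,
                  theta_deleted: Sequence[float],
                  g1_deleted: Sequence[float]) -> float:
    """Jackknife MSE estimate M1 + M2 for one small area.

    theta_full, g1_full : EB estimate and posterior variance from the full fit
    theta_deleted       : EB estimates from the m delete-one fits
    g1_deleted          : posterior variances from the m delete-one fits
    """
    m = len(theta_deleted)
    if m < 2 or m != len(g1_deleted):
        raise ValueError("need matching delete-one sequences of length >= 2")
    factor = (m - 1) / m
    # M2: variability of the EB estimate due to estimating the prior parameters
    m2 = factor * sum((t - theta_full) ** 2 for t in theta_deleted)
    # M1: posterior variance with a jackknife bias correction
    m1 = g1_full - factor * sum(g - g1_full for g in g1_deleted)
    return m1 + m2

mse = jackknife_mse(2.0, 0.5, [2.1, 1.9, 2.0], [0.6, 0.4, 0.5])
print(mse)
```

Because the bias correction in $\hat{M}_{1i}$ is subtracted, the result can be negative when the delete-one posterior variances are much larger than the full-sample one.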
METHODOLOGY
Data
This research assumed that the available auxiliary data are at area level, so the basic area level model was used. The data were simulated for 30 small areas with one covariate. Each batch was generated under a different excess-zero condition, with the probability of zero in a small area ranging from 0.1 to 0.9. The relation between the response and the covariate was assumed to be linear.
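Methods
The data were generated using SAS 9.1 in the following steps:
1. Fix the value of $X_i$ for the $i$-th area.
2. Define the expected probability of zero in each small area, $P(Y_i = 0)$, then calculate $\text{Lambda}_i = -\log(P(Y_i = 0))$.
3. Generate $\theta_i \sim \text{Gamma}(1, 1)$.
4. Calculate $\mu_i^* = \log(\text{Lambda}_i / \theta_i)$.
5. Fit a linear regression of $\mu_i^*$ on $X_i$ to obtain $\hat{\beta}_0$ and $\hat{\beta}_1$.
6. Calculate $\mu_i = \exp(X_i^T \hat{\beta})$.
7. Calculate $\text{parmlambda}_i = \mu_i \theta_i$.
8. Generate $y_i \sim \text{Poisson}(\text{parmlambda}_i)$.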
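The eight generation steps can be sketched in Python with the standard library alone. This is an illustration, not a translation of the SAS program in Appendix 5: the covariate values are hypothetical, step 4 is read as $\log(\text{Lambda}_i/\theta_i)$, and the Poisson draw uses Knuth's multiplication method.

```python
import math
import random

def poisson_draw(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def fit_line(x, y):
    """Least-squares intercept and slope for a single covariate."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def generate_batch(x, p_zero: float, rng: random.Random):
    """Steps 2 to 8: one batch of counts for the areas with covariates x."""
    lam = -math.log(p_zero)                              # step 2
    theta = [rng.gammavariate(1.0, 1.0) for _ in x]      # step 3
    mu_star = [math.log(lam / t) for t in theta]         # step 4 (assumed ratio)
    b0, b1 = fit_line(x, mu_star)                        # step 5
    mu = [math.exp(b0 + b1 * xi) for xi in x]            # step 6
    parm = [m * t for m, t in zip(mu, theta)]            # step 7
    counts = [poisson_draw(rng, p) for p in parm]        # step 8
    return counts, parm

x_demo = [i / 10 for i in range(1, 31)]   # hypothetical covariate, 30 areas
rng = random.Random(2009)
counts, parm = generate_batch(x_demo, p_zero=0.6, rng=rng)
print(sum(1 for c in counts if c == 0), "zeros out of", len(counts))
```

The observed share of zeros varies from batch to batch, which is consistent with the range of actual zero percentages reported next to each nominal probability in the appendix tables.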
Moreover, the following steps were applied in analyzing the data:
1. Fit the negative binomial regression with the GENMOD procedure in SAS 9.1 and the Zero-Inflated Negative Binomial regression with the COUNTREG procedure in SAS 9.2.
2. Estimate the prior parameters, $\beta$ and $\alpha$.
3. Estimate $\theta_i$ using the EB method.
4. Calculate the MSE of the indirect estimates.
5. Calculate the RRMSE (Relative Root Mean Square Error)
$$RRMSE(\hat{\theta}_i) = \frac{\sqrt{MSE(\hat{\theta}_i)}}{\hat{\theta}_i}.$$
RESULT AND DISCUSSION
Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
For data without excess zeros the estimator produced small and consistent MSE. However, when the proportion of zeros is approximately 30% or more, with an expected probability of zero of 0.6, the estimates tend to be unreliable and EB estimation produced negative values.
The RRMSE of the estimator increases with the number of zeros in the data. Furthermore, if the data contain at least 30% zeros, the estimator is unreliable.
Table 1 MSE and RRMSE of EB Estimator with NBR
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.33          0.16            0.18            0.13
0.2                   0.35          0.20            0.26            0.20
0.3                   0.40          0.23            0.36            0.30
0.4                   0.42          0.27            0.50            0.42
0.5                   0.45          0.31            0.72            0.59
0.6                   -12875        0.33            -0.38           0.81
0.7                   253671        0.40            -1216           135
0.8                   -584495       0.30            30946           211
0.9                   39135606      0.16            1.16E+10        664
Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.33          0.16            0.18            0.13
0.2                   0.35          0.20            0.26            0.20
0.3                   0.40          0.23            0.36            0.30
0.4                   0.42          0.27            0.50            0.42
0.5                   0.46          0.31            0.71            0.58
0.6                   26197         0.33            -0.35           0.75
0.7                   950007        0.40            -1002           0.99
0.8                   1444250       0.30            22054           110
0.9                   41595285      0.16            6.77E+09        0.56
Table 1 shows that the iterative process produced unexpected negative values of MSE. The simplest way to solve this problem is to change the negative values to zero. MSE (II) and RRMSE (II) in Table 2 are the MSE and RRMSE obtained after the negative values of MSE have been changed to zero.
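The truncation used for MSE (II) and RRMSE (II) can be expressed as a small helper. This sketch assumes RRMSE is the square root of the MSE divided by the EB estimate, as defined in the Methods section; the function name and the `truncate_negative` flag are hypothetical.

```python
import math

def rrmse(mse: float, theta_hat: float, truncate_negative: bool = True) -> float:
    """Relative root MSE, optionally replacing a negative MSE by zero first."""
    if truncate_negative and mse < 0:
        mse = 0.0  # the "(II)" variant: negative jackknife MSE set to zero
    # math.sqrt raises ValueError if a negative MSE is passed without truncation,
    # and the division fails if theta_hat is exactly zero.
    return math.sqrt(mse) / theta_hat

print(rrmse(0.04, 2.0))    # ordinary case
print(rrmse(-1.0, 2.0))    # negative MSE truncated to zero
print(rrmse(0.04, -2.0))   # a negative EB estimate gives a negative RRMSE
```

The last call illustrates why the mean of RRMSE (II) can still be negative after truncation: the sign then comes from the estimate, not from the MSE.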
When the data have an expected probability of zero of 0.6 to 0.9, the mean of MSE (II) increases drastically. Similarly, the mean of RRMSE (II) increases sharply when the data have an expected probability of zero of 0.8 to 0.9. However, when the data have an expected probability of zero of 0.6 to 0.7, the mean of RRMSE (II) is negative because some EB estimates are negative.
Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
The EB estimates are similar to the estimates produced by the NBR method, although they are slightly outperformed by the NBR method when the data contain only a small number of zeros. In particular, as shown in Table 3, if the data have an expected probability of zero of 0.1 to 0.5, ZINB produces a bigger MSE for the EB estimator than NBR does.
If the data have an expected probability of zero of 0.6 to 0.7, however, ZINB gives better estimates. The estimates were also unbiased, as they cover the parameter values adequately. However, ZINB begins to produce inconsistent estimates if the data have an expected probability of zero of 0.8 or more, due to the enormous MSE.
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
The mean of MSE (II) with ZINB is bigger than the mean of MSE with ZINB. This is because, once the negative values of MSE are changed to zero, they no longer reduce the mean.
Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
The EB estimates given by the NBR and ZINB methods are similar for data with a small number of zeros. However, the ZINB method produces a bigger MSE than NBR does as long as the expected probability of zero in the data stays below the 0.6 threshold.
The ZINB method performs better if the data have an expected probability of zero of 0.6 to 0.7. In this case the EB estimates given by the NBR method are unstable and inconsistent, due to negative estimates and a huge MSE that can be thousands of times larger than the acceptable value. On the other hand, the EB estimator with ZINB works well: it gives unbiased estimates and its MSE values are more stable than those of the EB estimates with NBR.
Both methods perform poorly if the data have an expected probability of zero of 0.8 or more. The EB estimators from both methods were inconsistent as a result of the very large MSE values they produced.
Table 3 MSE and RRMSE of EB Estimator with ZINB
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.45          0.17            0.24            0.14
0.2                   0.43          0.20            0.33            0.21
0.3                   0.71          0.28            0.52            0.32
0.4                   0.54          0.28            0.632           0.42
0.5                   0.86          0.33            7322807         0.66
0.6                   0.61          0.38            29817           103
0.7                   0.58          0.25            218119          194
0.8                   -128          -1.4E-07        162697          375
0.9                   2954790       -1E-06          3.5E+278        609508
Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB
Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.45          0.17            0.24            0.14
0.2                   0.436         0.20            0.324           0.21
0.3                   0.72          0.28            0.51            0.311
0.4                   0.55          0.28            0.61            0.41
0.5                   0.95          0.33            6561235         0.58
0.6                   0.75          0.38            23406           0.70
0.7                   150           0.25            134655          0.69
0.8                   175           0               733506          0
0.9                   2954908       0               1.2E+278        0
CONCLUSION
Excess zeros in the data highly influenced the result of EB estimation. A conventional method such as negative binomial regression for prior estimation produced an unstable and unreliable EB estimator for data with an expected probability of zero of 0.6. This is shown by the large MSE and the negative values of the estimator.
Meanwhile, EB estimation with the ZINB method produced a more reliable estimator even when the data have an expected probability of zero of 0.6 to 0.7.
ZINB also provided a reliable estimator for data with less than 53.33% zeros. This means that the performance of ZINB declines when the data have an expected probability of zero of 0.8 or more, as shown by the large MSE and the inconsistent estimator.
RECOMMENDATION
This research is based on many assumptions and suffers from several limitations. If the assumptions and limitations can be relaxed, better results can be expected. Some recommendations for further research are:
1. The generating process in this research does not reflect a real sampling process. A generating process similar to a real sampling process might give better results because it would be closer to real applications.
2. It would be interesting to run experiments with a larger number of areas, since the number of areas influences the data modeling.
3. Restricted Maximum Likelihood may be applied when estimating the prior parameters with ZINB and NBR in order to avoid negative values of MSE.
4. Theoretical research on ZINB and the Empirical Bayes estimator is important for understanding the behavior of ZINB parameter estimates in the Empirical Bayes setting.
REFERENCES
Erdman D, L Jackson, A Sinko. 2008. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. SAS Global Forum 2008: 322-2008. http://www2.sas.com/proceedings/forum2008/322-2008.pdf [25 August 2008]
Famoye F, KP Singh. 2006. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Journal of Data Science 4:117-130.
Hardin JW, JM Hilbe. 2007. Generalized Linear Models and Extensions. Texas: A Stata Press Publication.
Kurnia A, KA Notodiputro. 2006. Penerapan Metode Jackknife dalam Pendugaan Area Kecil. Forum Statistika dan Komputasi, April 2006, p. 12-15.
Kismiantini. 2007. Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Thesis]. Bogor: Institut Pertanian Bogor, Fakultas Matematika dan Pengetahuan Alam.
McCullagh P, JA Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.
Ramsini B, et al. 2001. Uninsured Estimates by County: A Review of Options and Issues. http://www.odh.ohio.gov/Data/OFHSurv/ofhsrfq7.pdf [24 April 2008]
Rao JNK. 2003. Small Area Estimation. New York: John Wiley & Sons.
Wakefield J. 2006. Disease mapping and spatial regression with count data. http://www.bepress.com/uwbiostat/paper286.pdf [24 April 2008]
Appendix 1 Result of EB estimation with NBR
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0426011 1525665 3188832 4252666 5752756 205939
bias 0000446 05164 0878579 1315093 1721091 8704671MSE 0040547 0109118 0159448 0333613 0335256 4167064RRMSE 0041258 0100045 01356 018188 0220426 0576793
20 1333-3667 100 EB estimator 0342831 1013993 2218265 2984668 3953417 1815693bias 0000587 0413611 079407 1100373 1454889 7906915MSE 0055631 0131969 0196963 0353033 0386291 3778251RRMSE 0070449 015421 0205182 0262006 0352726 0788718
30 20-5333 100 EB estimator 0323311 0836545 1562163 2263684 2918741 1214482bias 0000151 0372382 067041 0916482 122012 5950225MSE 0074364 0163462 0231014 0400207 0432371 5250254RRMSE 0102324 0214697 0299247 0361013 0474077 1192032
40 2333-5667 100 EB estimator 024882 064963 1219656 17107 2248716 930007bias 0000564 0293602 0549809 0757937 1007851 486688MSE -100569 0194196 0271669 041875 045917 3239598RRMSE 0123605 0300339 0422426 0503566 0642418 2202294
50 2333-6333 100 EB estimator 0122548 0570083 1028619 1291758 1728067 6750472bias 000029 0250747 0453265 0622838 0803185 4009352MSE -237643 0235733 0306641 0452955 05091 3652167RRMSE 0038956 0412708 0588924 0717336 0844735 3240156
60 30-70 100 EB estimator -077338 044443 0699758 0944038 1131071 6323352bias 0000452 020433 0398131 0534095 0679938 3848209MSE -749011 0254097 0330078 -12875 0539873 2354887RRMSE -663045 051763 0813734 -038057 1287528 1767434
70 4333-7333 100 EB estimator -33274 0249515 0442513 0659375 0922519 9258959bias 0000375 0155154 0316124 0476883 0588926 8475103MSE -7513075 0235378 0402092 2536714 0876569 6051162RRMSE -10741 0704796 1355566 -121606 3040291 3332419
80 5333-90 100 EB estimator -232889 017621 0305365 0569959 0576346 6303601
bias 0000395 0116669 0254473 1091172 0497898 6297454MSE -6E+09 -016583 0301527 -584495 5718409 185E+09RRMSE -212936 0927338 2115163 3094627 1359703 4151289
90 70-100 100 EB estimator -108767 0111208 0230315 0212247 0353129 3625557bias 000016 0086 0177169 0425532 0314714 1092655MSE -38E+09 -130817 0159682 39135606 3074073 12E+11
RRMSE -909131 1647188 6639631 116E+10 1585472 706E+11
Appendix 2 Result of EB estimation with ZINB
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0267752 1500256 3195861 4280907 5833922 2220705
bias 0000603 0485515 0882468 1315721 1750173 8704672MSE -053954 010933 0168797 0449506 0369775 360843RRMSE 0022947 0096443 0136424 0238099 0241955 5518468
20 1333-3667 100 EB estimator 105E-08 0914898 221594 3017228 401361 1815694bias 0000368 0383426 0780984 1105029 1496623 7906918MSE -07309 0126202 0201463 0425844 0414597 1734815RRMSE 0021807 0144983 0210692 0326097 0401786 3177943
30 20-5333 100 EB estimator 0132041 0719086 1523909 2308745 3012309 1228058bias 0000508 0332427 0680187 0928947 1254604 6314973MSE -229891 0156942 0277017 0707983 0590466 7469014RRMSE 0023998 0210095 0317195 0519524 0618802 3500387
40 2333-5667 100 EB estimator 105E-08 0574265 1209034 1742928 2368713 104953bias 0000564 0268248 0544049 0771741 1067061 4889872MSE -125713 0181557 0284338 0540615 0498521 423089RRMSE 0054916 028362 0420396 0630776 0778033 5394515
50 2333-6333 100 EB estimator 105E-08 0426701 1033816 133848 1906961 8018962bias 0000453 0224726 0454522 0661709 0900005 4414442
MSE -181856 0194818 0334706 0859252 0711939 7997074RRMSE 0026206 0387294 0662251 7322807 1312302 13388294
60 30-70 100 EB estimator 105E-08 030085 0645848 0985327 1154975 728326bias 62E-05 0190886 0406245 0567657 074167 3923952MSE -34589 0078006 0376514 060793 0804116 3426488RRMSE 000461 0502807 1033578 2981671 2012552 3308816
70 4333-7333 100 EB estimator 105E-08 105E-08 0341315 0677841 1 5005491bias 979E-05 0128017 0358257 0487174 0654423 3733981MSE -142213 -001433 0255331 0584152 1132152 264456RRMSE 0064209 0847956 1942286 2181192 4589042 7899681
80 5333-90 100 EB estimator 105E-08 105E-08 0142906 0445315 0859305 5bias 0000161 0083397 0272773 0392826 0557213 3532556MSE -10651 -56E-05 -14E-07 -127819 1452962 1132741RRMSE 0063244 1475413 3754705 162697 9221163 3786684
90 70-100 100 EB estimator 1E-277 105E-08 105E-08 0225165 0135512 3bias 0000495 0054221 0153374 027819 0350213 2736904MSE -175652 -33E-05 -1E-06 2954790 152E-06 613E+08
RRMSE 0040681 4059441 6095076 35E+278 5569021 16E+281
Appendix 3 Result of EB estimation (II) with NBR
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0426011 1525665 3188832 4252666 5752756 205939
bias 0000446 05164 0878579 1315093 1721091 8704671MSE 0040547 0109118 0159448 0333613 0335256 4167064RRMSE 0041258 0100045 01356 018188 0220426 0576793
20 1333-3667 100 EB estimator 0342831 1013993 2218265 2984668 3953417 1815693bias 0000587 0413611 079407 1100373 1454889 7906915MSE 0055631 0131969 0196963 0353033 0386291 3778251RRMSE 0070449 015421 0205182 0262006 0352726 0788718
30 20-5333 100 EB estimator 0323311 0836545 1562163 2263684 2918741 1214482bias 0000151 0372382 067041 0916482 122012 5950225MSE 0074364 0163462 0231014 0400207 0432371 5250254RRMSE 0102324 0214697 0299247 0361013 0474077 1192032
40 2333-5667 100 EB estimator 024882 064963 1219656 17107 2248716 930007bias 0000564 0293602 0549809 0757937 1007851 486688MSE 0 0194196 0271669 0419181 045917 3239598RRMSE 0 0300116 0422209 0502895 0641904 2202294
50 2333-6333 100 EB estimator 0122548 0570083 1028619 1291758 1728067 6750472bias 000029 0250747 0453265 0622838 0803185 4009352MSE 0 0235733 0306641 0456258 05091 3652167RRMSE 0 0410357 0585765 0712314 0841838 3240156
60 30-70 100 EB estimator -077338 044443 0699758 0944038 1131071 6323352bias 0000452 020433 0398131 0534095 0679938 3848209MSE 0 0254097 0330078 2619677 0539873 2354887RRMSE -663045 0448118 0750369 -034911 1209918 1767434
70 4333-7333 100 EB estimator -33274 0249515 0442513 0659375 0922519 9258959bias 0000375 0155154 0316124 0476883 0588926 8475103MSE 0 0235378 0402092 9500073 0876569 6051162RRMSE -10741 0288999 0995659 -100163 2527784 3332419
80 5333-90 100 EB estimator -232889 017621 0305365 0569959 0576346 6303601bias 0000395 0116669 0254473 1091172 0497898 6297454MSE 0 0 0301527 1444250 5718409 185E+09RRMSE -212936 0 1104113 2205437 5656681 4151289
90 70-100 100 EB estimator -108767 0111208 0230315 0212247 0353129 3625557bias 000016 0086 0177169 0425532 0314714 1092655
MSE 0 0 0159682 41595285 3074073 12E+11
RRMSE -909131 0 0557622 677E+09 9311925 706E+11
Appendix 4 Result of EB estimation (II) with ZINB
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0267752 1500256 3195861 4280907 5833922 2220705
bias 0000603 0485515 0882468 1315721 1750173 8704672MSE 0 010933 0168797 0450626 0369775 360843RRMSE 0 0095932 0135647 023675 0239669 5518468
20 1333-3667 100 EB estimator 105E-08 0914898 221594 3017228 401361 1815694bias 0000368 0383426 0780984 1105029 1496623 7906918MSE 0 0126202 0201463 0428006 0414597 1734815RRMSE 0 0142648 020709 0320663 0395479 3177943
30 20-5333 100 EB estimator 0132041 0719086 1523909 2308745 3012309 1228058bias 0000508 0332427 0680187 0928947 1254604 6314973MSE 0 0156942 0277017 0716543 0590466 7469014RRMSE 0 0203913 0311937 0506882 0615401 3500387
40 2333-5667 100 EB estimator 105E-08 0574265 1209034 1742928 2368713 104953bias 0000564 0268248 0544049 0771741 1067061 4889872MSE 0 0181557 0284338 0549835 0498521 423089RRMSE 0 0270309 0405926 0606317 0766631 5394515
50 2333-6333 100 EB estimator 105E-08 0426701 1033816 133848 1906961 8018962bias 0000453 0224726 0454522 0661709 0900005 4414442MSE 0 0194818 0334706 094973 0711939 7997074RRMSE 0 0316402 0576343 6561235 1240175 13388294
60 30-70 100 EB estimator 105E-08 030085 0645848 0985327 1154975 728326bias 62E-05 0190886 0406245 0567657 074167 3923952MSE 0 0078006 0376514 0749436 0804116 3426488RRMSE 0 0258286 0698814 2340612 1714808 3308816
70 4333-7333 100 EB estimator 105E-08 105E-08 0341315 0677841 1 5005491bias 979E-05 0128017 0358257 0487174 0654423 3733981MSE 0 0 0255331 1501268 1132152 264456RRMSE 0 0 0688797 1346552 2500825 7899681
80 5333-90 100 EB estimator 105E-08 105E-08 0142906 0445315 0859305 5bias 0000161 0083397 0272773 0392826 0557213 3532556MSE 0 0 0 1755486 1452962 1132741RRMSE 0 0 0 7335062 3311711 3786684
90 70-100 100 EB estimator 1E-277 105E-08 105E-08 0225165 0135512 3bias 0000495 0054221 0153374 027819 0350213 2736904MSE 0 0 0 2954908 152E-06 613E+08
RRMSE 0 0 0 12E+278 416189 16E+281
Appendix 5 Syntax program for generating data

data b; /* generate x1 (covariate) and ei */
input x1;
cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate poisson-gamma with excess zero */
do kk=1 to 30;
set b;
tetha = rangam(1,1);
lambda = -log(0.1); /* desired probability of a zero value (0.1-0.9) */
starlambda = log(lambda/tetha);
output;
end;
run;

proc reg;
model starlambda = x1;
ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;

data _null_;
set work.betha_lr_t;
call symput ('Intercept', col1);
call symput ('x1', col2);
run;

data d;
do kk=1 to 30;
set e;
mu = exp(&Intercept + &x1*x1);
parmlambda = mu*tetha;
ypoi = rand('poisson', parmlambda);
output;
end;
run;

ods trace on; /* to take percent zero on data */
proc freq data=d;
tables ypoi;
ods output OneWayFreqs=work.zero;
run;
data zero;
set zero;
keep percent;
run;
proc transpose data=zero out=zero1;
run;
data _null_;
set work.zero1;
call symput ('pctz', col1);
run;
data d;
set d;
pzero=&pctz;
r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
Appendix 6 Syntax program EB with NBR

%macro sae_nb;
%do x=1 %to 900;

data work.a;
set work.e;
if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
model ypoi = x1 / dist=nb link=log;
ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
set work.betha_nb_t;
call symput ('Intercept', col1);
call symput ('x1', col2);
call symput ('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
set a;
mu_hat_b=exp(&Intercept + &x1*x1);
w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
set work.duga_nb1;
if ^(u=&x) then delete;
run;
data work.jacknb&h;
set work.d;
if u=&x;
if kk=&h then delete;
run;

proc genmod data=work.jacknb&h;
/* output p out=sasyi_est */
model ypoi = x1 / dist=nb link=log;
ods output parameterestimates=work.betha_est_nb&h (keep=parameter Estimate);
run;
proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;
data _null_;
set work.betha_est_nbt&h;
call symput ('Intercept_', col1);
call symput ('x1_', col2);
call symput ('Dispersion_', col3);
run;

data work.duganb&h;
set work.d;
mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + &Dispersion_);
teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5 work.duganb6
work.duganb7 work.duganb8 work.duganb9 work.duganb10 work.duganb11 work.duganb12
work.duganb13 work.duganb14 work.duganb15 work.duganb16 work.duganb17 work.duganb18
work.duganb19 work.duganb20 work.duganb21 work.duganb22 work.duganb23 work.duganb24
work.duganb25 work.duganb26 work.duganb27 work.duganb28 work.duganb29 work.duganb30;
by kk;
run;

data work.mse_nb_j;
set work.mse_nb_j;
t_sum=0;
g_sum=0;
%do j=1 %to 30;
g_sum=g_sum+beda_g_&j;
t_sum=t_sum+delta_&j;
%end;
m2i=((30-1)/30)*t_sum;
m1i=g1-((30-1)/30)*g_sum;
mse_j_b=m1i+m2i;
rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
merge work.d work.mse_nb_j;
keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb BASE=work.hasilnb1 appendver=v6;
run;

%END;
%mend;

%sae_nb;
Appendix 7 Syntax program EB with ZINB

%macro sae_zinb;

%do x=1 %to 900;

data work.a;
set work.e;
if ^(u=&x) then delete;
run;

proc countreg data=a;
model ypoi=x1 / dist=zinb method=qn;
zeromodel ypoi ~ x1;
ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
set work.pe_t;
call symput ('Intercept', col1);
call symput ('x1', col2);
call symput ('Inf_Intercept', col3);
call symput ('Inf_x1', col4);
call symput ('_Alpha', col5);
run;

data work.duga;
set a;
mu_hat_b=exp(&Intercept + &x1*x1);
w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
bias_b=abs(teta_hat_bayes-parmlamdha);
run;

proc append data=work.duga base=work.duga1;
run;

%do h=1 %to 30;

data work.d;
set work.duga1;
if ^(u=&x) then delete;
run;
data work.jackzinb&h;
set work.d;
if u=&x;
if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
model ypoi=x1 / dist=zinb method=qn;
zeromodel ypoi ~ x1;
ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
set work.betha_est_ZINBt&h;
call symput ('Intercept', col1);
call symput ('x1', col2);
call symput ('Inf_Intercept', col3);
call symput ('Inf_x1', col4);
call symput ('_Alpha', col5);
run;

data work.dugaZINB&h;
set work.d;
mu_hat_b_&h=exp(&Intercept + &x1*x1);
/* mu_hat_b_&h= &b_o- + &b_1- x1 */
w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + (&_Alpha));
teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
/* g1_&h =((mu_hat_b_&h2&alpha_)2)(&alpha_+y_i)((mu_hat_b_&h2&alpha_)+mu_hat_b_&h)2 */
g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
/* g1_&h =(A2)(&k- + y_i)( a +mu_hat_b)2 */
beda_g_&h=g1_&h-g1;
run;

data work.mse_ZINB_j;
merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5 work.dugaZINB6
work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10 work.dugaZINB11 work.dugaZINB12
work.dugaZINB13 work.dugaZINB14 work.dugaZINB15 work.dugaZINB16 work.dugaZINB17 work.dugaZINB18
work.dugaZINB19 work.dugaZINB20 work.dugaZINB21 work.dugaZINB22 work.dugaZINB23 work.dugaZINB24
work.dugaZINB25 work.dugaZINB26 work.dugaZINB27 work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
by kk;
run;

data work.mse_ZINB_j;
set work.mse_ZINB_j;
t_sum=0;
g_sum=0;
%do j=1 %to 30;
g_sum=g_sum+beda_g_&j;
t_sum=t_sum+delta_&j;
%end;
m2i=((30-1)/30)*t_sum;
m1i=g1-((30-1)/30)*g_sum;
mse_j_b=m1i+m2i;
rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
merge work.d work.mse_ZINB_j;
keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;
ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB BASE=work.hasilZINB1;
run;

%END;
%mend;

%sae_zinb;
TABLE OF CONTENTS

INTRODUCTION
  Background
  Objectives
LITERATURE REVIEW
  Direct Estimation
  Small Area Estimation
  Small Area Models
  Empirical Bayes Methods
  Poisson-Gamma Models
  Negative Binomial Regression
  Over-dispersion in Count Data
  Zero-Inflated Models
  Zero-Inflated Negative Binomial
  Jackknife Method of Estimating MSE($\hat\theta_i^{EB}$)
METHODOLOGY
  Data
  Methods
RESULT AND DISCUSSION
  Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
  Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
  Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
CONCLUSION
RECOMMENDATION
REFERENCES
LIST OF TABLES

Table 1 MSE and RRMSE of EB Estimator with NBR
Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
Table 3 MSE and RRMSE of EB Estimator with ZINB
Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB

LIST OF APPENDICES

Appendix 1 Result of EB estimation with NBR
Appendix 2 Result of EB estimation with ZINB
Appendix 3 Result of EB estimation (II) with NBR
Appendix 4 Result of EB estimation (II) with ZINB
Appendix 5 Syntax program for generating data
Appendix 6 Syntax program EB with NBR
Appendix 7 Syntax program EB with ZINB
INTRODUCTION
Background
Direct estimation is usually applied in large-scale surveys, but such estimators are sometimes difficult to use in a smaller region, especially when the sample size is too small. In this case, indirect estimation, which adds covariates to estimate the parameter, is usually used. This type of estimation is broadly known as Small Area Estimation.

Kismiantini (2007) conducted research in Small Area Estimation based on Poisson-Gamma models. Maximum Likelihood Estimation was used with Negative Binomial Regression techniques to estimate the respective prior parameters. Moreover, Negative Binomial Regression was used to resolve the over-dispersion problem in the data.

In reality, count data are characterized not only by over-dispersion but sometimes also by excess zeros. Excess zero is a condition in which the data contain too many zeros, exceeding the distribution's expectation. For example, among 100 observations from a Poisson model with a response mean of 4, we would expect about 2 zeros. If the data have 30 zeros, it should be obvious that the distributional assumptions have been violated, so the estimated parameters and standard errors will be biased (Hardin & Hilbe 2007). In this paper, Zero-Inflated models were adapted to solve this type of problem.
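The figure of about 2 expected zeros quoted above follows from the Poisson probability of a zero, exp(-mean). A minimal sketch of that arithmetic is given here; the function name `expected_poisson_zeros` is only illustrative and does not come from the original study.

```python
import math


def expected_poisson_zeros(n_obs, mean):
    """Expected number of zeros among n_obs draws from Poisson(mean).

    P(Y = 0) = exp(-mean) for a Poisson variable, so the expected
    count of zeros is n_obs * exp(-mean).
    """
    return n_obs * math.exp(-mean)


expected = expected_poisson_zeros(100, 4.0)
print(round(expected, 2))  # 1.83, i.e. roughly 2 zeros
```

Observing 30 zeros where fewer than 2 are expected is what the text calls excess zero.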
Objectives
The research objectives are:
1. To investigate the performance of Negative Binomial Regression in Small Area Estimation in the case of excess zeros.
2. To apply Zero-Inflated Count Models in Small Area Estimation in the case of excess zeros.
3. To evaluate the performance of Zero-Inflated Count Models in estimating the prior parameters for Small Area Estimation.
LITERATURE REVIEW
Direct Estimation
Direct estimates are generally "design based" in the sense that they make use of "survey weights", and the associated inferences are based on the probability distribution induced by the sample design with the population values held fixed (Rao 2003). In particular, direct estimates of a domain parameter are based only on the domain-specific sample data.

Data from sample surveys have long been used to obtain reliable estimates of parameters. Ramsini et al. (2001) mentioned that direct estimates for small areas are unbiased, although they have large variance because of the small sample size.
Small Area Estimation
The term small area can refer to anything, depending on the object of interest: a city, an age group, a sex group, a region, or a rural district. In general, small area denotes any domain for which direct estimates of adequate precision cannot be produced (Rao 2003), because the sample size in the area is too small. Small area estimation was therefore developed as a statistical technique for estimating small area parameters with an adequate level of precision. It works as indirect estimation that borrows strength from the values of the variable of interest in related areas, through supplementary information related to that variable such as recent census counts and current administrative records (Rao 2003). Indirect estimation is the process of estimating a domain's parameter by connecting the information in that domain with another domain through an appropriate model, so the estimator works by including other domains' data (Kurnia & Notodiputro 2006).
Small Area Models
There are two kinds of linking models in indirect estimation. The first are traditional methods based on implicit models that provide a link between related small areas through supplementary data. The second are explicit small area models that make specific allowance for between-area variation (Rao 2003). This research used the second kind, which can be classified into two broad types of basic model.

1. Basic area level (type A) model
The basic area level model, or aggregate model, includes all models that relate small area parameters to area-specific auxiliary variables. These models are essential if unit (element) level data are not available. The parameter of interest $\theta_i$ is assumed to be related to the area-specific auxiliary data or covariates $x_i = (x_{1i}, \ldots, x_{pi})^T$ by the linear model
$$\theta_i = x_i^T \beta + b_i v_i, \quad i = 1, \ldots, m$$
where $v_i \sim N(0, \sigma_v^2)$ are area-specific random effects, $\beta = (\beta_1, \ldots, \beta_p)^T$ is the $p \times 1$ vector of regression coefficients, and the $b_i$ are known positive constants. For making inferences about $\theta_i$, direct estimators $y_i$ are assumed to be available, with
$$y_i = \theta_i + e_i, \quad i = 1, \ldots, m$$
where the sampling errors $e_i \sim N(0, \sigma_{ei}^2)$ and the $\sigma_{ei}^2$ are known. Combining both models gives the new model
$$y_i = x_i^T \beta + b_i v_i + e_i, \quad i = 1, \ldots, m$$
(Rao 2003).

2. Basic unit level (type B) model
The unit level model includes all models that relate unit values of the study variable to unit-specific auxiliary variables. Given unit-specific auxiliary variables $x_{ij} = (x_{ij1}, \ldots, x_{ijp})^T$, a nested regression model is assumed:
$$y_{ij} = x_{ij}^T \beta + v_i + e_{ij}, \quad i = 1, \ldots, m, \; j = 1, \ldots, n_i$$
with $v_i \sim N(0, \sigma_v^2)$ and $e_{ij} \sim N(0, \sigma_{ei}^2)$.
Empirical Bayes Methods
The Bayesian approach is based on Bayes' Law, which was discovered by Thomas Bayes. The law was introduced by Richard Price in 1763, two years after Thomas Bayes passed away. In 1774 and 1781 Laplace gave the details and relevance for modern Bayesian statistics (Gill 2002 in Kismiantini 2007).

Novick in Good (1980) mentioned that the Bayes method is difficult to adopt and is sometimes very sensitive, because it requires prior probability information which is usually difficult to obtain. Robbins (1955) introduced Empirical Bayes methods, which assume a particular prior distribution and estimate it from the sample. Rao (2003) said that EB (Empirical Bayes) and HB (Hierarchical Bayes) are suitable for binary and count data in Small Area Estimation. Therefore the EB method was used in this research.

Rao (2003) summarized EB methods in Small Area Estimation as follows:
1. Obtain the posterior probability density function of the small area parameter.
2. Estimate the model parameters from the marginal density function.
3. Use the estimated posterior density for inferences regarding the parameters of interest.
Poisson-Gamma Models
The Poisson model is the standard model for count data. Count data often suffer from over-dispersion, so the Poisson formulation has been extended to accommodate extra variance in sample data. Two-stage models for count data, known as Poisson-Gamma mixed models, have been introduced. Wakefield (2006) introduced a Poisson-Gamma model which is easy to use with the SMR (Standardized Mortality Ratio) as a direct estimator. This study used the Wakefield model with a different direct estimator.

Let $y_i$ be the number of individuals in small area $i$ that have the characteristic of interest, written as
$$y_i = \sum_j y_{ij}$$
where $y_{ij}$ is the $j$-th object in the $i$-th small area, $j = 1, \ldots, n$ and $i = 1, \ldots, m$.

First stage: $y_i \overset{ind}{\sim} Poisson(\mu_i \theta_i)$ is assumed, where $\mu_i = \mu(x_i, \beta)$ describes a regression model at the area level, $x_i$ is a vector of covariates and $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of regression coefficients.

Second stage: $\theta_i \overset{iid}{\sim} gamma(\alpha, 1/\alpha)$ (shape $\alpha$, scale $1/\alpha$) is assumed as the prior distribution, with mean 1 and variance $1/\alpha$. The marginal distribution of $y_i \mid \beta, \alpha$ is then negative binomial.

Moreover, Wakefield (2006) used Bayes' Theorem and obtained the posterior distribution
$$\theta_i \mid y_i, \beta, \alpha \sim gamma\left(y_i + \alpha, \frac{1}{\mu_i + \alpha}\right)$$
and the EB estimator
$$\hat\theta_i^{EB} = \hat\theta_i^{B}(\hat\beta, \hat\alpha) = \hat\gamma_i \hat\theta_i + (1 - \hat\gamma_i)\hat\mu_i$$
with $\hat\gamma_i = \hat\mu_i / (\hat\mu_i + \hat\alpha)$, where $\hat\theta_i = y_i$ is the direct estimate of $\theta_i$ and $y_i$ is the number of observations.
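The weighted form of the EB estimator, as it is computed in Appendix 6 (`w_bayes` and `teta_hat_bayes`), can be sketched as follows. This is only an illustration under the assumption that the direct estimate is the observed count; the function name `eb_estimate` and the numeric inputs are hypothetical.

```python
def eb_estimate(y, mu_hat, alpha_hat):
    """Shrinkage form mirroring the Appendix 6 code.

    gamma = mu_hat / (mu_hat + alpha_hat) is the weight given to the
    observed count y; the remaining weight goes to the regression
    prediction mu_hat.
    """
    gamma = mu_hat / (mu_hat + alpha_hat)
    return gamma * y + (1.0 - gamma) * mu_hat


# Hypothetical inputs: predicted mean 2, dispersion estimate 1.
print(eb_estimate(5, 2.0, 1.0))  # pulled from 5 toward 2
print(eb_estimate(0, 2.0, 1.0))  # pulled from 0 toward 2
```

Algebraically the same quantity equals mu_hat * (y + alpha_hat) / (mu_hat + alpha_hat), so an area with a zero count still receives a positive estimate as long as mu_hat and alpha_hat are positive.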
Negative Binomial Regression
The negative binomial regression model seems to have been first discussed by Anscombe (1972). Others have pointed out its success in dealing with over-dispersed count data. Lawless (1987) elaborated the mixture model parameterization of the negative binomial, providing formulas for its log likelihood, mean, variance, and moments. Later, Breslow (1990) cited Lawless' work, and from its inception to the late 1980s the negative binomial regression model was construed as a mixture model that is useful for accommodating otherwise over-dispersed Poisson data (Hardin & Hilbe 2007). The negative binomial probability function is written as
$$g(y \mid x) = \frac{\Gamma(y + k)}{\Gamma(k)\,\Gamma(y + 1)} \left(\frac{k}{k + \mu}\right)^{k} \left(\frac{\mu}{k + \mu}\right)^{y}$$
where $y = 0, 1, 2, \ldots$; $k$ and $\mu$ are the negative binomial parameters, with $E(y) = \mu$ and $\mathrm{var}(y) = \mu + \mu^2 / k$. $k$ is called the dispersion parameter, which indicates that the data are over-dispersed.
Over-dispersion in Count Data
Count data in a Poisson regression are said to be over-dispersed if the variance is larger than the mean, that is, larger than the Poisson model expects. This phenomenon is written as
$$Var(y_i) > E(y_i)$$
(McCullagh & Nelder 1989).
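A quick empirical check of this condition is to compare the sample variance with the sample mean. The sketch below uses made-up counts purely for illustration; `dispersion_ratio` is a hypothetical helper, not part of the original analysis.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean.

    A ratio well above 1 suggests over-dispersion relative to the
    Poisson model, for which variance and mean are equal.
    """
    return variance(counts) / mean(counts)


# Made-up counts with many zeros and a few large values.
counts = [0, 0, 0, 0, 1, 2, 3, 9, 12, 0]
ratio = dispersion_ratio(counts)
print(ratio > 1)  # True for these counts
```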
Zero-Inflated Models
Zero-inflated models consider two distinct sources of zero outcomes. One source is individuals who do not enter the counting process; the other is individuals who do enter the counting process but produce a zero outcome (Hardin & Hilbe 2007).

Lambert (1992) first described this type of mixture model in the context of process control in manufacturing. It has since been used in many applications and is now discussed in nearly every book or article dealing with count response models.

For the zero-inflated model, the probability of observing a zero outcome equals the probability that an individual is in the always-zero group plus the probability that the individual is not in that group times the probability that the counting process produces a zero. If $B(0)$ denotes the probability that the binary process results in a zero outcome and $\Pr(0)$ the probability that the counting process yields a zero outcome, the probability of a zero outcome for the system is (Hardin & Hilbe 2007)
$$\Pr(y = 0) = B(0) + [1 - B(0)]\,\Pr(0)$$
The probability of a nonzero count is
$$\Pr(y = k \mid k > 0) = [1 - B(0)]\,\Pr(k)$$
This model produces two groups of parameters. One is the zero-inflation parameters, which show whether the covariates contribute significantly to producing a zero outcome. The other is the negative binomial parameters, which model the response with the covariates.
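Zero-Inflated Negative Binomial
There are many kinds of zero-inflated models; each has advantages and disadvantages and suits a different type of data. The zero-inflated negative binomial (ZINB) model is one of them. It is used for data that are both over-dispersed and contain excess zeros. Among its parameter estimates there is therefore a dispersion parameter, which indicates that over-dispersion occurs in the data, just as the dispersion parameter $k$ does in negative binomial regression.

The probability distribution of this model is
$$P(Y_i = y_i \mid x_i) = \begin{cases} \varphi_i + (1 - \varphi_i)\, g(0 \mid x_i), & y_i = 0 \\ (1 - \varphi_i)\, g(y_i \mid x_i), & y_i > 0 \end{cases}$$
where $\varphi_i = \varphi(z_i^T \delta)$ is a function of $z_i$, the vector of zero-inflation covariates, and $\delta$ is the vector of zero-inflation coefficients to be estimated. Meanwhile $g(y_i \mid x_i)$ is the negative binomial probability distribution, written with the dispersion parameter $\alpha = 1/k$ as
$$g(y_i \mid x_i) = \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y_i + 1)} \left(\frac{1}{1 + \alpha\mu_i}\right)^{\alpha^{-1}} \left(\frac{\alpha\mu_i}{1 + \alpha\mu_i}\right)^{y_i}$$
The mean and variance of ZINB are
$$E(y_i \mid x_i) = \mu_i (1 - \varphi_i)$$
$$V(y_i \mid x_i) = \mu_i (1 - \varphi_i)\,(1 + \mu_i(\varphi_i + \alpha))$$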
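The ZINB mean and variance can be checked numerically by summing over the probability function. The sketch below assumes a negative binomial with variance mu + alpha * mu^2 (so alpha plays the role of 1/k) and a constant zero-inflation probability phi; the function names and parameter values are illustrative assumptions only.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial pmf with mean mu and variance mu + alpha * mu**2."""
    k = 1.0 / alpha
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def zinb_pmf(y, mu, alpha, phi):
    """Zero-inflated NB: extra mass phi at zero, NB scaled by (1 - phi)."""
    base = (1.0 - phi) * nb_pmf(y, mu, alpha)
    return phi + base if y == 0 else base


# Hypothetical parameter values.
mu, alpha, phi = 2.0, 0.5, 0.3
probs = [zinb_pmf(y, mu, alpha, phi) for y in range(400)]
total = sum(probs)
zinb_mean = sum(y * p for y, p in enumerate(probs))
zinb_var = sum((y - zinb_mean) ** 2 * p for y, p in enumerate(probs))

print(round(total, 6))      # 1.0
print(round(zinb_mean, 6))  # 1.4  = mu * (1 - phi)
print(round(zinb_var, 6))   # 3.64 = mu * (1 - phi) * (1 + mu * (phi + alpha))
```

The truncation at 400 terms is arbitrary; for these parameter values the omitted tail is negligible.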
Jackknife Method of Estimating MSE($\hat\theta_i^{EB}$)
The jackknife is one of the general methods used in surveys because of its simple concept (Jiang, Lahiri and Wan 2002). The method was introduced by Tukey (1958) and developed into a method capable of correcting the bias of an estimator by removing observation $i$, for $i = 1, \ldots, m$, and re-estimating the parameters.

According to Rao (2003), the jackknife steps to estimate MSE($\hat\theta_i^{EB}$) are:
1. Let $\hat\theta_i^{EB} = k_i(y_i, \hat\beta, \hat\alpha)$ and $\hat\theta_{i,-l}^{EB} = k_i(y_i, \hat\beta_{-l}, \hat\alpha_{-l})$, then calculate
$$\hat M_{2i} = \frac{m - 1}{m} \sum_{l=1}^{m} \left(\hat\theta_{i,-l}^{EB} - \hat\theta_i^{EB}\right)^2$$
2. Calculate the delete-$l$ estimators $\hat\beta_{-l}$ and $\hat\alpha_{-l}$, then calculate
$$\hat M_{1i} = g_{1i}(\hat\beta, \hat\alpha, y_i) - \frac{m - 1}{m} \sum_{l=1}^{m} \left[g_{1i}(\hat\beta_{-l}, \hat\alpha_{-l}, y_i) - g_{1i}(\hat\beta, \hat\alpha, y_i)\right]$$
Here $g_{1i}$ is the estimated variance of the posterior distribution, which measures the variability associated with $\hat\theta_i$. Using $g_{1i}$ alone leads to severe underestimation of $MSE_J(\hat\theta_i^{EB})$ because it ignores the estimation of the prior parameters. The estimator $\hat M_{1i}$ therefore corrects the bias of $g_{1i}$.
3. Calculate the jackknife estimator of MSE($\hat\theta_i^{EB}$) as
$$MSE_J(\hat\theta_i^{EB}) = \hat M_{1i} + \hat M_{2i}$$
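The three jackknife steps can be sketched for a single area as follows, in the same arrangement as the `m1i`, `m2i` and `mse_j_b` computations of Appendix 6. The function name `jackknife_mse` and all input values are hypothetical; in practice the delete-one quantities come from refitting the model with one area removed.

```python
def jackknife_mse(theta_full, g1_full, theta_del, g1_del):
    """Jackknife MSE for one area.

    theta_full, g1_full: EB estimate and posterior variance from the full fit.
    theta_del, g1_del:   the same quantities recomputed with each of the
                         m delete-one parameter estimates.
    Returns (mse, m1, m2).
    """
    m = len(theta_del)
    factor = (m - 1) / m
    m2 = factor * sum((t - theta_full) ** 2 for t in theta_del)
    m1 = g1_full - factor * sum(g - g1_full for g in g1_del)
    return m1 + m2, m1, m2


# Hypothetical values for m = 3 delete-one fits.
mse, m1, m2 = jackknife_mse(2.0, 0.5, [2.1, 1.9, 2.0], [0.6, 0.5, 0.55])
print(round(mse, 4), round(m1, 4), round(m2, 4))  # 0.4133 0.4 0.0133

# The bias correction can push the estimate below zero when the
# delete-one variances are much larger than the full-sample one.
neg_mse, _, _ = jackknife_mse(2.0, 0.1, [2.0, 2.0, 2.0], [0.5, 0.5, 0.5])
print(neg_mse < 0)  # True
```

The second call illustrates one way a negative MSE estimate can arise from the bias-correction term.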
METHODOLOGY
Data
This research assumed that the available auxiliary data are at the area level, so the basic area level model was used. The data were simulated for 30 small areas with one covariate. Each batch was generated under a different excess-zero condition, with the probability of zero in a small area ranging from 0.1 to 0.9. The relation between the response and the covariate was assumed to be linear.
Methods
The data were generated using SAS 9.1 in the following steps:
1. Fix the value of $X_i$ for the $i$-th area.
2. Define the expected probability of zero in each small area, $P(Y_i = 0)$, then calculate $Lambda_i = -\log(P(Y_i = 0))$.
3. Generate $\theta_i \sim Gamma(1, 1)$.
4. Calculate $\log(Lambda_i / \theta_i)$.
5. Fit a linear regression of $\log(Lambda_i / \theta_i)$ on $X_i$ to obtain $\hat\beta_0$ and $\hat\beta_1$.
6. Calculate $\mu_i = \exp(X_i^T \hat\beta)$.
7. Calculate $parmlambda_i = \mu_i \theta_i$.
8. Generate $y_i \sim Poisson(parmlambda_i)$.
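The eight generating steps can be sketched in Python as below. This is not the original SAS program of Appendix 5 but a rough equivalent under stated assumptions: step 4 is taken to be log(Lambda_i / theta_i), the covariate values are hypothetical draws (the original values are not reproduced here), and the helpers `poisson_draw`, `ols` and `generate_batch` are illustrative names.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def ols(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def generate_batch(x, p_zero, rng):
    lam = -math.log(p_zero)                           # step 2
    theta = [rng.gammavariate(1.0, 1.0) for _ in x]   # step 3
    star = [math.log(lam / t) for t in theta]         # step 4 (assumed ratio)
    b0, b1 = ols(x, star)                             # step 5
    mu = [math.exp(b0 + b1 * xi) for xi in x]         # step 6
    parm = [m * t for m, t in zip(mu, theta)]         # step 7
    y = [poisson_draw(rng, p) for p in parm]          # step 8
    return y, parm


rng = random.Random(2009)
x = [rng.uniform(0.0, 3.0) for _ in range(30)]  # step 1: hypothetical covariate
y, parm = generate_batch(x, 0.6, rng)
print(sum(1 for v in y if v == 0) / len(y))  # realised share of zeros
```

Because the Poisson means are rebuilt from the fitted regression, the realised share of zeros in a batch generally differs from the target probability, which is consistent with the range of actual zero percentages reported in the appendices.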
The data were then analyzed in the following steps:
1. Fit the negative binomial regression with the GENMOD procedure in SAS 9.1 and the zero-inflated negative binomial regression with the COUNTREG procedure in SAS 9.2.
2. Estimate the prior parameters, $\beta$ and $\alpha$.
3. Estimate $\theta_i$ using the EB method.
4. Calculate the MSE of the indirect estimates.
5. Calculate the RRMSE (Relative Root Mean Square Error):
$$RRMSE(\hat\theta_i) = \frac{\sqrt{MSE(\hat\theta_i)}}{\hat\theta_i}$$
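The RRMSE ratio, as coded in Appendix 6 (`rrmse_j_b`), is simple to compute, but two edge cases matter for reading the results: a negative estimate produces a negative RRMSE, and a negative MSE has no real square root. A small sketch with made-up numbers (the function name `rrmse` is illustrative):

```python
import math


def rrmse(mse, estimate):
    """Square root of the MSE divided by the estimate itself."""
    return math.sqrt(mse) / estimate


print(round(rrmse(0.16, 3.2), 3))   # 0.125
print(round(rrmse(0.16, -0.8), 3))  # -0.5, negative because the estimate is negative

try:
    rrmse(-0.2, 1.0)
except ValueError:
    print("negative MSE: square root undefined")
```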
RESULT AND DISCUSSION
Estimation of Prior Parameter is Based on EB Method with Negative Binomial Regression
For data without excess zeros, the estimator produced small and consistent MSE. However, when the proportion of zeros is approximately 30% or more, with an expected probability of zero of 0.6, the estimates tend to be unreliable, and EB estimation produced negative values.

The RRMSE of the estimator increases along with the number of zeros in the data. Furthermore, if the data contain at least 30% zeros, the estimator is unreliable.
Table 1 MSE and RRMSE of EB Estimator with NBR

Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.33          0.16            0.18            0.13
0.2                   0.35          0.20            0.26            0.20
0.3                   0.40          0.23            0.36            0.30
0.4                   0.42          0.27            0.50            0.42
0.5                   0.45          0.31            0.72            0.59
0.6                   -128.75       0.33            -0.38           0.81
0.7                   2536.71       0.40            -12.16          1.35
0.8                   -5844.95      0.30            309.46          2.11
0.9                   391356.06     0.16            1.16E+10        6.64
Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR

Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   0.33          0.16            0.18            0.13
0.2                   0.35          0.20            0.26            0.20
0.3                   0.40          0.23            0.36            0.30
0.4                   0.42          0.27            0.50            0.42
0.5                   0.46          0.31            0.71            0.58
0.6                   261.97        0.33            -0.35           0.75
0.7                   9500.07       0.40            -10.02          0.99
0.8                   14442.50      0.30            220.54          1.10
0.9                   415952.85     0.16            6.77E+09        0.56
Table 1 shows that the iterative process produced unexpected negative values of MSE. The simplest way to solve this problem is to change the negative values to zero. MSE (II) and RRMSE (II) in Table 2 are the MSE and RRMSE after the negative values of MSE have been changed to zero.

When the data have an expected probability of zero of 0.6 to 0.9, the mean of MSE (II) increases drastically. Similarly, the mean of RRMSE (II) increases sharply when the data have an expected probability of zero of 0.8 to 0.9. However, when the data have an expected probability of zero of 0.6 to 0.7, the mean of RRMSE (II) is negative because of negative EB estimates.
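Replacing negative MSE values by zero, as done to obtain MSE (II), necessarily raises the mean, because the negative values no longer offset the positive ones. A minimal sketch with made-up MSE values (the function name `truncate_negative` is illustrative):

```python
def truncate_negative(values):
    """Replace every negative value by zero, leaving the others unchanged."""
    return [max(v, 0.0) for v in values]


# Made-up MSE values for five areas.
mse = [-1.2, 0.3, 0.5, -0.4, 0.8]
mse_ii = truncate_negative(mse)

mean_before = sum(mse) / len(mse)
mean_after = sum(mse_ii) / len(mse_ii)
print(mse_ii)                    # [0.0, 0.3, 0.5, 0.0, 0.8]
print(mean_after > mean_before)  # True
```

The median is affected far less, since it only changes when the middle-ranked value itself is negative.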
Estimation of Prior Parameter is Based on EB Method with Zero-Inflated Negative Binomial Regression
The EB estimates are similar to those produced by the NBR method, although ZINB is slightly outperformed by NBR when the data contain only a small number of zeros. In particular, as shown in Table 3, if the data have an expected probability of zero of 0.1 to 0.5, ZINB produces a larger MSE for the EB estimator than NBR does.

If the data have an expected probability of zero of 0.6 to 0.7, however, ZINB gives better estimates. The estimates were also unbiased, as they cover the parameter values adequately. However, ZINB begins to produce inconsistent estimates if the data have an expected probability of zero of 0.8 or more, because of enormous MSE.
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
The mean of MSE (II) with ZINB is larger than the mean of MSE with ZINB. This is because once the negative values of MSE are changed to zero, they no longer reduce the mean.
Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
The EB estimates given by the NBR and ZINB methods are similar for data with a small number of zeros. However, the ZINB method produces a larger MSE than NBR does as long as the expected probability of zero in the data stays below the 0.6 threshold.
The ZINB method performs better if the data have an expected probability of zero of 0.6 to 0.7. In this case the EB estimates given by the NBR method are unstable and inconsistent because of negative estimates and a huge MSE that can be a thousand times larger than an acceptable value. The EB estimator with ZINB, on the other hand, works well: it gives unbiased estimates and its MSE values are more stable than those of the EB estimator with NBR.
Both methods perform poorly if the data have an expected probability of zero of 0.8 or more. The EB estimators from both methods are inconsistent as a result of the very large MSE values they produce.
Table 3 MSE and RRMSE of EB Estimator with ZINB

Probability   Mean of       Median      Mean of      Median of
of zero       MSE           of MSE      RRMSE        RRMSE
0.1                 0.45    0.17             0.24       0.14
0.2                 0.43    0.20             0.33       0.21
0.3                 0.71    0.28             0.52       0.32
0.4                 0.54    0.28             0.632      0.42
0.5                 0.86    0.33         73228.07       0.66
0.6                 0.61    0.38           298.17       1.03
0.7                 0.58    0.25          2181.19       1.94
0.8                -1.28    -1.4E-07      1626.97       3.75
0.9              2954790    -1E-06       3.5E+278    6095.08
Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB

Probability   Mean of       Median    Mean of      Median of
of zero       MSE           of MSE    RRMSE        RRMSE
0.1                 0.45    0.17           0.24    0.14
0.2                 0.436   0.20           0.324   0.21
0.3                 0.72    0.28           0.51    0.311
0.4                 0.55    0.28           0.61    0.41
0.5                 0.95    0.33       65612.35    0.58
0.6                 0.75    0.38         234.06    0.70
0.7                 1.50    0.25        1346.55    0.69
0.8                 1.75    0           7335.06    0
0.9              2954908    0          1.2E+278    0
CONCLUSION
Excess zeros in the data strongly influenced the results of EB estimation. A conventional method such as negative binomial regression for prior estimation produced a biased and unreliable EB estimator for data with an expected probability of zero of 0.6. This is shown by the large MSE and the negative values of the estimator.
Meanwhile, EB estimation with the ZINB method produced a more reliable estimator, even when the data have an expected probability of zero of 0.6 to 0.7.
ZINB also provided a reliable estimator for data with less than 53.33% zeros. The performance of ZINB declines when the data have an expected probability of zero of 0.8 or more, as shown by the large MSE and the inconsistent estimator.
RECOMMENDATION
This research rests on many assumptions and has several limitations. If the assumptions and restrictions can be relaxed, better results can be expected. Some recommendations for further research are:
1. The generating process in this research does not reflect a real sampling process. A generating process similar to real sampling might give better results because it would be closer to real applications.
2. It would be interesting to run experiments with a larger number of areas, since the number of areas influences the data modeling.
3. Restricted Maximum Likelihood may be applied when estimating the prior parameters with ZINB and NBR in order to resolve the negative values of MSE.
4. Theoretical research on ZINB and the Empirical Bayes estimator is important for understanding the behavior of ZINB parameter estimates in the Empirical Bayes setting.
REFERENCES
Erdman D, L Jackson, A Sinko. 2008. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. SAS Global Forum 2008, Paper 322-2008. http://www2.sas.com/proceedings/forum2008/322-2008.pdf [25 August 2008]
Famoye F, KP Singh. 2006. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Journal of Data Science 4:117-130.
Hardin JW, JM Hilbe. 2007. Generalized Linear Models and Extensions. Texas: A Stata Press Publication.
Kurnia A, KA Notodiputro. 2006. Penerapan Metode Jackknife dalam Pendugaan Area Kecil. Forum Statistika dan Komputasi, April 2006, p12-15.
Kismiantini. 2007. Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Thesis]. Bogor: Institut Pertanian Bogor, Fakultas Matematika dan Pengetahuan Alam.
McCullagh P, JA Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.
Ramsini B, et al. 2001. Uninsured Estimates by County: A Review of Options and Issues. http://www.odh.ohio.gov/Data/OFHSurv/ofhsrfq7.pdf [24 April 2008]
Rao JNK. 2003. Small Area Estimation. New York: John Wiley & Sons.
Wakefield J. 2006. Disease mapping and spatial regression with count data. http://www.bepress.com/uwbiostat/paper286.pdf [24 April 2008]
Appendix 1 Result of EB estimation with NBR
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0426011 1525665 3188832 4252666 5752756 205939
bias 0000446 05164 0878579 1315093 1721091 8704671MSE 0040547 0109118 0159448 0333613 0335256 4167064RRMSE 0041258 0100045 01356 018188 0220426 0576793
20 1333-3667 100 EB estimator 0342831 1013993 2218265 2984668 3953417 1815693bias 0000587 0413611 079407 1100373 1454889 7906915MSE 0055631 0131969 0196963 0353033 0386291 3778251RRMSE 0070449 015421 0205182 0262006 0352726 0788718
30 20-5333 100 EB estimator 0323311 0836545 1562163 2263684 2918741 1214482bias 0000151 0372382 067041 0916482 122012 5950225MSE 0074364 0163462 0231014 0400207 0432371 5250254RRMSE 0102324 0214697 0299247 0361013 0474077 1192032
40 2333-5667 100 EB estimator 024882 064963 1219656 17107 2248716 930007bias 0000564 0293602 0549809 0757937 1007851 486688MSE -100569 0194196 0271669 041875 045917 3239598RRMSE 0123605 0300339 0422426 0503566 0642418 2202294
50 2333-6333 100 EB estimator 0122548 0570083 1028619 1291758 1728067 6750472bias 000029 0250747 0453265 0622838 0803185 4009352MSE -237643 0235733 0306641 0452955 05091 3652167RRMSE 0038956 0412708 0588924 0717336 0844735 3240156
60 30-70 100 EB estimator -077338 044443 0699758 0944038 1131071 6323352bias 0000452 020433 0398131 0534095 0679938 3848209MSE -749011 0254097 0330078 -12875 0539873 2354887RRMSE -663045 051763 0813734 -038057 1287528 1767434
70 4333-7333 100 EB estimator -33274 0249515 0442513 0659375 0922519 9258959bias 0000375 0155154 0316124 0476883 0588926 8475103MSE -7513075 0235378 0402092 2536714 0876569 6051162RRMSE -10741 0704796 1355566 -121606 3040291 3332419
80 5333-90 100 EB estimator -232889 017621 0305365 0569959 0576346 6303601
bias 0000395 0116669 0254473 1091172 0497898 6297454MSE -6E+09 -016583 0301527 -584495 5718409 185E+09RRMSE -212936 0927338 2115163 3094627 1359703 4151289
90 70-100 100 EB estimator -108767 0111208 0230315 0212247 0353129 3625557bias 000016 0086 0177169 0425532 0314714 1092655MSE -38E+09 -130817 0159682 39135606 3074073 12E+11
RRMSE -909131 1647188 6639631 116E+10 1585472 706E+11
Appendix 2 Result of EB estimation with ZINB
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0267752 1500256 3195861 4280907 5833922 2220705
bias 0000603 0485515 0882468 1315721 1750173 8704672MSE -053954 010933 0168797 0449506 0369775 360843RRMSE 0022947 0096443 0136424 0238099 0241955 5518468
20 1333-3667 100 EB estimator 105E-08 0914898 221594 3017228 401361 1815694bias 0000368 0383426 0780984 1105029 1496623 7906918MSE -07309 0126202 0201463 0425844 0414597 1734815RRMSE 0021807 0144983 0210692 0326097 0401786 3177943
30 20-5333 100 EB estimator 0132041 0719086 1523909 2308745 3012309 1228058bias 0000508 0332427 0680187 0928947 1254604 6314973MSE -229891 0156942 0277017 0707983 0590466 7469014RRMSE 0023998 0210095 0317195 0519524 0618802 3500387
40 2333-5667 100 EB estimator 105E-08 0574265 1209034 1742928 2368713 104953bias 0000564 0268248 0544049 0771741 1067061 4889872MSE -125713 0181557 0284338 0540615 0498521 423089RRMSE 0054916 028362 0420396 0630776 0778033 5394515
50 2333-6333 100 EB estimator 105E-08 0426701 1033816 133848 1906961 8018962bias 0000453 0224726 0454522 0661709 0900005 4414442
MSE -181856 0194818 0334706 0859252 0711939 7997074RRMSE 0026206 0387294 0662251 7322807 1312302 13388294
60 30-70 100 EB estimator 105E-08 030085 0645848 0985327 1154975 728326bias 62E-05 0190886 0406245 0567657 074167 3923952MSE -34589 0078006 0376514 060793 0804116 3426488RRMSE 000461 0502807 1033578 2981671 2012552 3308816
70 4333-7333 100 EB estimator 105E-08 105E-08 0341315 0677841 1 5005491bias 979E-05 0128017 0358257 0487174 0654423 3733981MSE -142213 -001433 0255331 0584152 1132152 264456RRMSE 0064209 0847956 1942286 2181192 4589042 7899681
80 5333-90 100 EB estimator 105E-08 105E-08 0142906 0445315 0859305 5bias 0000161 0083397 0272773 0392826 0557213 3532556MSE -10651 -56E-05 -14E-07 -127819 1452962 1132741RRMSE 0063244 1475413 3754705 162697 9221163 3786684
90 70-100 100 EB estimator 1E-277 105E-08 105E-08 0225165 0135512 3bias 0000495 0054221 0153374 027819 0350213 2736904MSE -175652 -33E-05 -1E-06 2954790 152E-06 613E+08
RRMSE 0040681 4059441 6095076 35E+278 5569021 16E+281
Appendix 3 Result of EB estimation (II) with NBR
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0426011 1525665 3188832 4252666 5752756 205939
bias 0000446 05164 0878579 1315093 1721091 8704671MSE 0040547 0109118 0159448 0333613 0335256 4167064RRMSE 0041258 0100045 01356 018188 0220426 0576793
20 1333-3667 100 EB estimator 0342831 1013993 2218265 2984668 3953417 1815693bias 0000587 0413611 079407 1100373 1454889 7906915MSE 0055631 0131969 0196963 0353033 0386291 3778251RRMSE 0070449 015421 0205182 0262006 0352726 0788718
30 20-5333 100 EB estimator 0323311 0836545 1562163 2263684 2918741 1214482bias 0000151 0372382 067041 0916482 122012 5950225MSE 0074364 0163462 0231014 0400207 0432371 5250254RRMSE 0102324 0214697 0299247 0361013 0474077 1192032
40 2333-5667 100 EB estimator 024882 064963 1219656 17107 2248716 930007bias 0000564 0293602 0549809 0757937 1007851 486688MSE 0 0194196 0271669 0419181 045917 3239598RRMSE 0 0300116 0422209 0502895 0641904 2202294
50 2333-6333 100 EB estimator 0122548 0570083 1028619 1291758 1728067 6750472bias 000029 0250747 0453265 0622838 0803185 4009352MSE 0 0235733 0306641 0456258 05091 3652167RRMSE 0 0410357 0585765 0712314 0841838 3240156
60 30-70 100 EB estimator -077338 044443 0699758 0944038 1131071 6323352bias 0000452 020433 0398131 0534095 0679938 3848209MSE 0 0254097 0330078 2619677 0539873 2354887RRMSE -663045 0448118 0750369 -034911 1209918 1767434
70 4333-7333 100 EB estimator -33274 0249515 0442513 0659375 0922519 9258959bias 0000375 0155154 0316124 0476883 0588926 8475103MSE 0 0235378 0402092 9500073 0876569 6051162RRMSE -10741 0288999 0995659 -100163 2527784 3332419
80 5333-90 100 EB estimator -232889 017621 0305365 0569959 0576346 6303601bias 0000395 0116669 0254473 1091172 0497898 6297454MSE 0 0 0301527 1444250 5718409 185E+09RRMSE -212936 0 1104113 2205437 5656681 4151289
90 70-100 100 EB estimator -108767 0111208 0230315 0212247 0353129 3625557bias 000016 0086 0177169 0425532 0314714 1092655
MSE 0 0 0159682 41595285 3074073 12E+11
RRMSE -909131 0 0557622 677E+09 9311925 706E+11
Appendix 4 Result of EB estimation (II) with ZINB
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0267752 1500256 3195861 4280907 5833922 2220705
bias 0000603 0485515 0882468 1315721 1750173 8704672MSE 0 010933 0168797 0450626 0369775 360843RRMSE 0 0095932 0135647 023675 0239669 5518468
20 1333-3667 100 EB estimator 105E-08 0914898 221594 3017228 401361 1815694bias 0000368 0383426 0780984 1105029 1496623 7906918MSE 0 0126202 0201463 0428006 0414597 1734815RRMSE 0 0142648 020709 0320663 0395479 3177943
30 20-5333 100 EB estimator 0132041 0719086 1523909 2308745 3012309 1228058bias 0000508 0332427 0680187 0928947 1254604 6314973MSE 0 0156942 0277017 0716543 0590466 7469014RRMSE 0 0203913 0311937 0506882 0615401 3500387
40 2333-5667 100 EB estimator 105E-08 0574265 1209034 1742928 2368713 104953bias 0000564 0268248 0544049 0771741 1067061 4889872MSE 0 0181557 0284338 0549835 0498521 423089RRMSE 0 0270309 0405926 0606317 0766631 5394515
50 2333-6333 100 EB estimator 105E-08 0426701 1033816 133848 1906961 8018962bias 0000453 0224726 0454522 0661709 0900005 4414442MSE 0 0194818 0334706 094973 0711939 7997074RRMSE 0 0316402 0576343 6561235 1240175 13388294
60 30-70 100 EB estimator 105E-08 030085 0645848 0985327 1154975 728326bias 62E-05 0190886 0406245 0567657 074167 3923952MSE 0 0078006 0376514 0749436 0804116 3426488RRMSE 0 0258286 0698814 2340612 1714808 3308816
70 4333-7333 100 EB estimator 105E-08 105E-08 0341315 0677841 1 5005491bias 979E-05 0128017 0358257 0487174 0654423 3733981MSE 0 0 0255331 1501268 1132152 264456RRMSE 0 0 0688797 1346552 2500825 7899681
80 5333-90 100 EB estimator 105E-08 105E-08 0142906 0445315 0859305 5bias 0000161 0083397 0272773 0392826 0557213 3532556MSE 0 0 0 1755486 1452962 1132741RRMSE 0 0 0 7335062 3311711 3786684
90 70-100 100 EB estimator 1E-277 105E-08 105E-08 0225165 0135512 3bias 0000495 0054221 0153374 027819 0350213 2736904MSE 0 0 0 2954908 152E-06 613E+08
RRMSE 0 0 0 12E+278 416189 16E+281
Appendix 5 SAS program for generating data

data b; /* generate x1 (covariate) and ei */
  input x1;
  cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate poisson-gamma with excess zero */
  do kk=1 to 30;
    set b;
    tetha = rangam(1,1);
    lambda = -log(0.1); /* desired probability of a zero value (0.1-0.9) */
    starlambda = log(lambda/tetha);
    output;
  end;
run;

proc reg;
  model starlambda = x1;
  ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;
data _null_;
  set work.betha_lr_t;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
run;

data d;
  do kk=1 to 30;
    set e;
    mu = exp(&Intercept + &x1*x1);
    parmlambda = mu*tetha;
    ypoi = rand('poisson', parmlambda);
    output;
  end;
run;

ods trace on; /* to take percent zero on data */
proc freq data=d;
  tables ypoi;
  ods output OneWayFreqs=work.zero;
run;
data zero;
  set zero;
  keep percent;
run;
proc transpose data=zero out=zero1;
run;
data _null_;
  set work.zero1;
  call symput ('pctz', col1);
run;
data d;
  set d;
  pzero=&pctz;
  r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
Appendix 6 SAS program for EB with NBR

%macro sae_nb;
%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
  set work.betha_nb_t;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
  call symput ('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
  bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
  set work.duga_nb1;
  if ^(u=&x) then delete;
run;
data work.jacknb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc genmod data=work.jacknb&h; /* output p out=sas.yi_est */
  model ypoi = x1 / dist = nb link=log;
  ods output parameterestimates=work.betha_est_nb&h (keep=parameter Estimate);
run;
proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;
data _null_;
  set work.betha_est_nbt&h;
  call symput ('Intercept_', col1);
  call symput ('x1_', col2);
  call symput ('Dispersion_', col3);
run;

data work.duganb&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
  w_b_&h=mu_hat_b_&h / (mu_hat_b_&h + &Dispersion_);
  teta_hat_&h=w_b_&h * ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
  beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
  merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5
        work.duganb6 work.duganb7 work.duganb8 work.duganb9 work.duganb10
        work.duganb11 work.duganb12 work.duganb13 work.duganb14 work.duganb15
        work.duganb16 work.duganb17 work.duganb18 work.duganb19 work.duganb20
        work.duganb21 work.duganb22 work.duganb23 work.duganb24 work.duganb25
        work.duganb26 work.duganb27 work.duganb28 work.duganb29 work.duganb30;
  by kk;
run;

data work.mse_nb_j;
  set work.mse_nb_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
  ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
  merge work.d work.mse_nb_j;
  keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb BASE=work.hasilnb1 appendver=v6;
run;

%END;
%mend;

%sae_nb;
Appendix 7 SAS program for EB with ZINB

%macro sae_zinb;

%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

proc countreg data=a;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
  set work.pe_t;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
  call symput ('Inf_Intercept', col3);
  call symput ('Inf_x1', col4);
  call symput ('_Alpha', col5);
run;

data work.duga;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
  bias_b=abs(teta_hat_bayes-parmlamdha);
run;

proc append data=work.duga base=work.duga1;
run;

%do h=1 %to 30;

data work.d;
  set work.duga1;
  if ^(u=&x) then delete;
run;
data work.jackzinb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
  set work.betha_est_ZINBt&h;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
  call symput ('Inf_Intercept', col3);
  call symput ('Inf_x1', col4);
  call symput ('_Alpha', col5);
run;

data work.dugaZINB&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept + &x1*x1);
  /* mu_hat_b_&h= &b_o- + &b_1- x1 */
  w_b_&h=mu_hat_b_&h / (mu_hat_b_&h + (&_Alpha));
  teta_hat_&h=w_b_&h * ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  /* g1_&h =((mu_hat_b_&h2&alpha_)2)(&alpha_+y_i)((mu_hat_b_&h2&alpha_)+mu_hat_b_&h)2 */
  g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
  /* g1_&h =(A2)(&k- + y_i)( a +mu_hat_b)2 */
  beda_g_&h=g1_&h-g1;
run;

data work.mse_ZINB_j;
  merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5
        work.dugaZINB6 work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10
        work.dugaZINB11 work.dugaZINB12 work.dugaZINB13 work.dugaZINB14 work.dugaZINB15
        work.dugaZINB16 work.dugaZINB17 work.dugaZINB18 work.dugaZINB19 work.dugaZINB20
        work.dugaZINB21 work.dugaZINB22 work.dugaZINB23 work.dugaZINB24 work.dugaZINB25
        work.dugaZINB26 work.dugaZINB27 work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
  by kk;
run;

data work.mse_ZINB_j;
  set work.mse_ZINB_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
  merge work.d work.mse_ZINB_j;
  keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;
ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB BASE=work.hasilZINB1;
run;

%END;
%mend;

%sae_zinb;
INTRODUCTION
Background
Direct estimation is usually applied in large-scale surveys, but it is sometimes difficult to use such estimators for a smaller region, especially when the sample size is too small. In this case indirect estimation, which adds covariates to estimate the parameter, is usually used. This type of estimation is broadly known as Small Area Estimation.
Kismiantini (2007) conducted research on Small Area Estimation based on Poisson-Gamma models. Maximum Likelihood Estimation was used with Negative Binomial Regression to estimate the prior parameters. Negative Binomial Regression was also used to resolve the over-dispersion problem in the data.
In reality, count data are characterized not only by over-dispersion but sometimes also by excess zeros. Excess zeros is a condition in which the data contain more zeros than the distribution predicts. For example, among 100 observations from a Poisson model with a response mean of 4, we would expect about 2 zeros. If the data have 30 zeros, the distributional assumptions have clearly been violated, and the estimated parameters and standard errors will be biased (Hardin & Hilbe 2007). In this paper Zero-Inflated models were adapted to solve this type of problem.
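The figure of about 2 zeros follows from the Poisson probability of a zero count, exp(-mean). A minimal Python check, with hypothetical variable names:

```python
import math

n_obs = 100         # number of observations in the example
mean_count = 4.0    # Poisson response mean in the example

p_zero = math.exp(-mean_count)    # Poisson probability of observing a zero
expected_zeros = n_obs * p_zero   # expected number of zeros among n_obs

print(round(expected_zeros, 2))   # about 1.83, i.e. roughly 2 zeros
```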
Objectives
The research objectives are:
1. To investigate the performance of Negative Binomial Regression in Small Area Estimation in the case of excess zeros.
2. To apply Zero-Inflated Count Models in Small Area Estimation in the case of excess zeros.
3. To evaluate the performance of Zero-Inflated Count Models in estimating the prior parameters for Small Area Estimation.
LITERATURE REVIEW
Direct Estimation
Direct estimates are generally "design based" in the sense that they make use of "survey weights", and the associated inferences are based on the probability distribution induced by the sample design with the population values held fixed (Rao 2003). In particular, direct estimates of a domain parameter are based only on the domain-specific sample data.
Data from sample surveys have long been used to obtain reliable estimates of parameters. Ramsini et al. (2001) mentioned that direct estimates for a small area are unbiased, although they have a large variance because of the small sample size.
Small Area Estimation
The term small area can mean many things, depending on the object of interest. It can be a city, an age group, a sex group, a region or a rural district. In general, small area denotes any domain for which direct estimates with adequate precision cannot be produced (Rao 2003). This happens because the sample size in the small area is too small, so direct estimation based on the sampling design cannot produce estimates with adequate precision. Small area estimation was therefore developed as a statistical technique for estimating the parameters of small areas with an adequate level of precision. It works as indirect estimation that borrows strength from the values of the variable of interest in related areas through supplementary information related to that variable, such as recent census counts and current administrative records (Rao 2003). Indirect estimation is the process of estimating a domain's parameter by connecting the information in that domain with other domains through an appropriate model, so the estimator works by including other domains' data (Kurnia & Notodiputro 2006).
Small Area Models
There are two types of link model in indirect estimation. The first is the traditional approach based on implicit models, which link related small areas through supplementary data. The second is explicit small area models, which make specific allowance for between-area variation (Rao 2003). This research used the second type, which can be classified into two broad types of basic model:
1. Basic area level (type A) model
Basic area level, or aggregate, models relate small area parameters to area-specific auxiliary variables. These models are essential if unit (element) level data are not available. The parameter of interest $\theta_i$ is assumed to be related to the area-specific auxiliary data (covariates) $x_i = (x_{1i}, \ldots, x_{pi})^T$ by the linear model
\[ \theta_i = x_i^T \beta + b_i v_i, \quad i = 1, \ldots, m, \]
where $v_i \sim N(0, \sigma_v^2)$ are area-specific random effects, $\beta = (\beta_1, \ldots, \beta_p)^T$ is the $p \times 1$ vector of regression coefficients, and the $b_i$ are known positive constants. For making inferences about $\theta_i$, direct estimators $y_i$ are assumed to be available, with
\[ y_i = \theta_i + e_i, \quad i = 1, \ldots, m, \]
where the sampling errors $e_i \sim N(0, \sigma_{ei}^2)$ and the $\sigma_{ei}^2$ are known. Combining both models gives
\[ y_i = x_i^T \beta + b_i v_i + e_i, \quad i = 1, \ldots, m \]
(Rao 2003).
2. Basic unit level (type B) model
Unit level models relate the unit values of the study variable to unit-specific auxiliary variables. With unit-specific auxiliary variables $x_{ij} = (x_{ij1}, \ldots, x_{ijp})^T$, the nested regression model is
\[ y_{ij} = x_{ij}^T \beta + v_i + e_{ij}, \quad i = 1, \ldots, m, \; j = 1, \ldots, n_i, \]
with $v_i \sim N(0, \sigma_v^2)$ and $e_{ij} \sim N(0, \sigma_{ei}^2)$.
Empirical Bayes Methods
The Bayesian approach is based on Bayes' Law, which was discovered by Thomas Bayes. The law was introduced by Richard Price in 1763, two years after Thomas Bayes passed away. In 1774 and 1781 Laplace gave the details and relevance for modern Bayesian statistics (Gill 2002 in Kismiantini 2007).
Novick in Good (1980) mentioned that the Bayes method is difficult to adopt and is sometimes very sensitive because it requires prior probability information, which is usually difficult to obtain. Robbins (1955) introduced Empirical Bayes methods, which assume a particular prior distribution and estimate it from the sample. Rao (2003) stated that EB (Empirical Bayes) and HB (Hierarchical Bayes) methods are suitable for binary and count data in Small Area Estimation. Therefore the EB method was used in this research.
Rao (2003) summarized EB methods in Small Area Estimation as follows:
1. Obtain the posterior probability density function of the small area parameter.
2. Estimate the parameters from the marginal density function.
3. Use the estimated posterior density for inferences regarding the parameters of interest.
Poisson-Gamma Models
The Poisson model is the standard model for count data. Count data commonly suffer from over-dispersion, so extensions of the Poisson model have been developed to accommodate the extra variance in sample data. Two-stage models for count data, known as Poisson-Gamma mixed models, have been introduced for this purpose. Wakefield (2006) introduced a Poisson-Gamma model that is easy to use with the SMR (Standardized Mortality Ratio) as the direct estimator. This study used the Wakefield model with a different direct estimator.
Let $y_i$ be the number of individuals in small area $i$ that have the characteristic of interest, written as
\[ y_i = \sum_j y_{ij}, \]
where $y_{ij}$ is the $j$-th object in the $i$-th small area, $j = 1, \ldots, n$ and $i = 1, \ldots, m$.
First stage: $y_i \mid \theta_i \overset{ind}{\sim} \mathrm{Poisson}(\mu_i \theta_i)$ is assumed, where $\mu_i = \mu(x_i, \beta)$ describes a regression model at the area level, $x_i$ is a vector of covariates and $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of regression coefficients.
Second stage: $\theta_i \overset{iid}{\sim} \mathrm{gamma}(\alpha, 1/\alpha)$ (shape $\alpha$, scale $1/\alpha$) is assumed as the prior distribution, with mean 1 and variance $1/\alpha$.
The marginal distribution of $y_i \mid \beta, \alpha$ is then negative binomial.
Wakefield (2006) used Bayes' Theorem to obtain the posterior distribution
\[ \theta_i \mid y_i, \beta, \alpha \sim \mathrm{gamma}\left(y_i + \alpha, \frac{1}{\mu_i + \alpha}\right) \]
and the EB estimator
\[ \hat{\theta}_i^{EB} = \hat{\theta}_i^{B}(\hat{\beta}, \hat{\alpha}) = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\hat{\mu}_i, \]
with $\hat{\gamma}_i = \hat{\mu}_i / (\hat{\mu}_i + \hat{\alpha})$, where $\hat{\theta}_i$ is the direct estimate of $\theta_i$ and $y_i$ is the number of observations.
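The EB estimator is a shrinkage rule: the larger the regression mean is relative to the prior parameter, the more weight the direct estimate receives. The Python sketch below mirrors the weighting used in the SAS appendices, where the observed count plays the role of the direct estimate. The function names and input values are hypothetical, and the variance function assumes a gamma posterior with shape y + alpha and rate mu + alpha.

```python
def eb_estimate(y, mu_hat, alpha_hat):
    """Weighted average of the observed count and the regression mean."""
    gamma = mu_hat / (mu_hat + alpha_hat)   # shrinkage weight on the direct estimate
    return gamma * y + (1.0 - gamma) * mu_hat

def posterior_variance(y, mu_hat, alpha_hat):
    """Variance of a gamma posterior with shape y + alpha and rate mu + alpha."""
    return (y + alpha_hat) / (mu_hat + alpha_hat) ** 2

# Hypothetical inputs: observed count 3, fitted mean 2, prior parameter 2
print(eb_estimate(3, 2.0, 2.0))
print(posterior_variance(3, 2.0, 2.0))
```

With equal `mu_hat` and `alpha_hat` the weight is one half, so the estimate lies midway between the count and the fitted mean.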
Negative Binomial Regression
The negative binomial regression model seems to have been first discussed by Anscombe (1972). Others have pointed out its success in dealing with over-dispersed count data. Lawless (1987) elaborated the mixture model parameterization of the negative binomial, providing formulas for its log likelihood, mean, variance and moments. Later, Breslow (1990) cited Lawless' work, and from its inception to the late 1980s the negative binomial regression model was construed as a mixture model that is useful for accommodating otherwise over-dispersed Poisson data (Hardin & Hilbe 2007). The negative binomial distribution function is written as
\[ g(y \mid x) = \frac{\Gamma(y + k)}{\Gamma(y + 1)\,\Gamma(k)} \left(\frac{k}{k + \mu}\right)^{k} \left(\frac{\mu}{k + \mu}\right)^{y}, \]
where $y = 0, 1, 2, \ldots$, and $k$ and $\mu$ are the negative binomial parameters, with $E(y) = \mu$ and $\mathrm{var}(y) = \mu + \mu^2/k$. The parameter $k$ is called the dispersion parameter and reflects the over-dispersion in the data.
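A short numerical check of the negative binomial distribution can make the role of the dispersion parameter concrete. The Python sketch below assumes the parameterization with mean mu and variance mu + mu^2/k; the function name and the values of mu and k are hypothetical.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial probability with mean mu and variance mu + mu**2 / k."""
    log_p = (math.lgamma(y + k) - math.lgamma(y + 1) - math.lgamma(k)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

mu, k = 4.0, 2.0   # hypothetical mean and dispersion parameter
probs = [nb_pmf(y, mu, k) for y in range(2000)]   # tail beyond 2000 is negligible
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

print(mean, var, mu + mu ** 2 / k)        # numerical moments vs the formula
print(nb_pmf(0, mu, k), math.exp(-mu))    # P(y = 0): negative binomial vs Poisson
```

For the same mean, the negative binomial puts more probability on zero than the Poisson does, which is why it absorbs moderate numbers of zeros before a zero-inflated model becomes necessary.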
Over-dispersion in Count Data
Count data in Poisson regression are said to be over-dispersed if the variance is larger than the mean, that is, if the observed variance is larger than the Poisson model implies. This phenomenon is written as
\[ \mathrm{Var}(y_i) > E(y_i) \]
(McCullagh & Nelder 1989).
Zero-Inflated Models
Zero-inflated models consider two distinct sources of zero outcomes. One source is individuals who do not enter the counting process; the other is individuals who do enter the counting process but produce a zero outcome (Hardin & Hilbe 2007).
Lambert (1992) first described this type of mixture model in the context of process control in manufacturing. It has since been used in many applications and is now discussed in nearly every book or article dealing with count response models.
For the zero-inflated model, the probability of observing a zero outcome equals the probability that an individual is in the always-zero group plus the probability that the individual is not in that group times the probability that the counting process produces a zero. If $B(0)$ is the probability that the binary process results in a zero outcome and $\Pr(0)$ is the probability that the counting process produces a zero outcome, the probability of a zero outcome for the system is (Hardin & Hilbe 2007)
\[ \Pr(y = 0) = B(0) + [1 - B(0)]\Pr(0). \]
The probability of a nonzero count is
\[ \Pr(y = k) = [1 - B(0)]\Pr(k), \quad k > 0. \]
This model produces two groups of parameters. The first is the zero-inflation parameters, which show whether the covariates contribute significantly to producing a zero outcome. The second is the negative binomial parameters, which model the response with the covariates.
Zero-Inflated Negative Binomial
There are many kinds of zero-inflated model; each has advantages and disadvantages and suits a different type of data. The zero-inflated negative binomial (ZINB) is one of them. It is used for data that are both over-dispersed and have excess zeros. As a result, the parameter estimates include a parameter $k$ that indicates over-dispersion in the data, just like the dispersion parameter in negative binomial regression.
The probability distribution of this model is
\[ P(Y_i = y_i \mid x_i) = \begin{cases} \omega(x_i) + (1 - \omega(x_i))\, g(0 \mid x_i), & y_i = 0, \\ (1 - \omega(x_i))\, g(y_i \mid x_i), & y_i > 0, \end{cases} \]
where $\omega$ is a function of $z_i$, the vector of zero-inflation covariates, and $\gamma$ is the vector of zero-inflation coefficients to be estimated. Meanwhile, $g(y_i \mid x_i)$ is the negative binomial probability distribution
\[ g(y_i \mid x_i) = \frac{\Gamma(y_i + k)}{\Gamma(y_i + 1)\,\Gamma(k)} \left(\frac{k}{k + \mu_i}\right)^{k} \left(\frac{\mu_i}{k + \mu_i}\right)^{y_i}. \]
With $\omega_i = \omega(x_i)$, the mean and variance of ZINB are
\[ E(y_i \mid x_i) = (1 - \omega_i)\mu_i, \qquad V(y_i \mid x_i) = (1 - \omega_i)\mu_i\left(1 + \mu_i(\omega_i + 1/k)\right). \]
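The ZINB mean and variance can be checked numerically against the mixture definition. The Python sketch below assumes a constant zero-inflation probability `omega` and a negative binomial component with mean mu and variance mu + mu^2/k; the function names and parameter values are hypothetical.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial probability with mean mu and variance mu + mu**2 / k."""
    log_p = (math.lgamma(y + k) - math.lgamma(y + 1) - math.lgamma(k)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, k, omega):
    """Mixture of a point mass at zero (weight omega) and a negative binomial."""
    base = (1.0 - omega) * nb_pmf(y, mu, k)
    return omega + base if y == 0 else base

mu, k, omega = 4.0, 2.0, 0.3   # hypothetical parameter values
probs = [zinb_pmf(y, mu, k, omega) for y in range(2000)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

print(mean, (1 - omega) * mu)                               # numerical vs formula mean
print(var, (1 - omega) * mu * (1 + mu * (omega + 1 / k)))   # numerical vs formula variance
print(probs[0])                                             # total probability of a zero
```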
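Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
The jackknife is one of the general methods used in surveys because of its simple concept (Jiang, Lahiri and Wan 2002). The method has been known since Tukey (1958) and was developed into a method for correcting the bias of an estimator by removing the $l$-th observation, for $l = 1, \ldots, m$, and re-estimating the parameters.
Following Rao (2003), the jackknife steps for estimating MSE($\hat{\theta}_i^{EB}$) are:
1. Let $\hat{\theta}_i^{EB} = k_i(y_i, \hat{\beta}, \hat{\alpha})$ and $\hat{\theta}_{i,-l}^{EB} = k_i(y_i, \hat{\beta}_{-l}, \hat{\alpha}_{-l})$, then calculate
\[ \hat{M}_{2i} = \frac{m-1}{m} \sum_{l=1}^{m} \left(\hat{\theta}_{i,-l}^{EB} - \hat{\theta}_i^{EB}\right)^2. \]
2. Calculate the delete-$l$ estimators $\hat{\beta}_{-l}$ and $\hat{\alpha}_{-l}$, then calculate
\[ \hat{M}_{1i} = g_{1i}(\hat{\beta}, \hat{\alpha}, y_i) - \frac{m-1}{m} \sum_{l=1}^{m} \left[ g_{1i}(\hat{\beta}_{-l}, \hat{\alpha}_{-l}, y_i) - g_{1i}(\hat{\beta}, \hat{\alpha}, y_i) \right]. \]
Here $g_{1i}$ is the estimated variance of the posterior distribution, which measures the variability associated with $\hat{\theta}_i$. Using $g_{1i}$ alone leads to severe underestimation of $MSE_J(\hat{\theta}_i^{EB})$ because it ignores the estimation of the prior parameters. The estimator $\hat{M}_{1i}$ therefore corrects the bias of $g_{1i}$.
3. Calculate the jackknife estimator of MSE($\hat{\theta}_i^{EB}$) as
\[ MSE_J(\hat{\theta}_i^{EB}) = \hat{M}_{1i} + \hat{M}_{2i}. \]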
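The three jackknife steps can be sketched in Python for a single area once the full-sample and delete-one quantities are available. The function name and all input values below are hypothetical; refitting the regression for each deleted area is assumed to have been done elsewhere.

```python
def jackknife_mse(theta_full, g1_full, theta_del, g1_del):
    """Jackknife MSE for one area.

    theta_full, g1_full: EB estimate and posterior variance from the full sample.
    theta_del, g1_del:   the same quantities recomputed with each area deleted in turn.
    """
    m = len(theta_del)
    factor = (m - 1) / m
    m2 = factor * sum((t - theta_full) ** 2 for t in theta_del)   # step 1
    m1 = g1_full - factor * sum(g - g1_full for g in g1_del)      # step 2
    return m1 + m2                                                 # step 3

# Hypothetical delete-one results for m = 3 areas
print(jackknife_mse(2.0, 0.5, [1.9, 2.1, 2.0], [0.4, 0.6, 0.5]))

# If the delete-one posterior variances are much larger than the full-sample one,
# the bias correction can overshoot and the estimate becomes negative.
print(jackknife_mse(2.0, 0.5, [2.0, 2.0, 2.0], [1.5, 1.5, 1.5]))
```

The second call shows one way a negative MSE estimate can arise: nothing in the bias-corrected term constrains it to stay positive.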
METHODOLOGY

Data
This research assumed that the auxiliary data were available only at the area level, so the basic area level model was used. The data were simulated for 30 small areas with one covariate. Each batch was generated under a different excess-zero condition, with the probability of zero in a small area ranging from 0.1 to 0.9. The relation between the response and the covariate was assumed to be linear.

Methods
The data were generated using SAS 9.1 in the following steps:
1. Fix the value of $X_i$ for the $i$-th area.
2. Define the expected probability of zero in each small area, $P(Y_i = 0)$, then calculate $Lambda_i = -\log(P(Y_i = 0))$.
3. Generate $\theta_i \sim Gamma(1, 1)$.
4. Calculate $\lambda_i^{*} = \log(Lambda_i / \theta_i)$.
5. Fit a linear regression of $\lambda_i^{*}$ on $X_i$ to obtain $\hat\beta_0$ and $\hat\beta_1$.
6. Calculate $\mu_i = \exp(X_i^{T}\hat\beta)$.
7. Calculate $parmlambda_i = \mu_i \theta_i$.
8. Generate $y_i \sim Poisson(parmlambda_i)$.
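The generating steps can be mirrored in Python as a rough sketch. It is not the SAS program of Appendix 5. It assumes that step 4 takes the logarithm of the ratio $Lambda_i / \theta_i$ (the operator is not legible in the source), uses hypothetical covariate values, and draws Poisson variates with Knuth's multiplication method because the standard library has no Poisson generator.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def generate_counts(x, p_zero, seed=1):
    """Simulate one batch of area counts following steps 1-8 (ratio assumed in step 4)."""
    rng = random.Random(seed)
    big_lambda = -math.log(p_zero)                      # step 2
    theta = [rng.gammavariate(1.0, 1.0) for _ in x]     # step 3
    star = [math.log(big_lambda / t) for t in theta]    # step 4 (assumption)
    # step 5: least-squares fit of star on x
    n = len(x)
    x_bar, s_bar = sum(x) / n, sum(star) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (si - s_bar) for xi, si in zip(x, star))
    b1 = sxy / sxx
    b0 = s_bar - b1 * x_bar
    mu = [math.exp(b0 + b1 * xi) for xi in x]           # step 6
    parm = [m * t for m, t in zip(mu, theta)]           # step 7
    y = [poisson_draw(lam, rng) for lam in parm]        # step 8
    return y, parm

# Hypothetical covariate values for 30 areas.
x_values = [float(i % 10) for i in range(30)]
counts, rates = generate_counts(x_values, p_zero=0.6)
```

Fixing the seed makes a batch reproducible, which is convenient when the same data are later analyzed with two competing models.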
Moreover, the data were analyzed in the following steps:
1. Fit the negative binomial regression with the GENMOD procedure in SAS 9.1 and the zero-inflated negative binomial regression with the COUNTREG procedure in SAS 9.2.
2. Estimate the prior parameters, $\beta$ and $\alpha$.
3. Estimate $\theta_i$ using the EB method.
4. Calculate the MSE for the indirect estimation.
5. Calculate the RRMSE (Root Relative Mean Square Error):

$$RRMSE(\hat\theta_i) = \frac{\sqrt{MSE(\hat\theta_i)}}{\hat\theta_i}$$
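As a small illustration of the RRMSE formula, the Python sketch below computes it for single values. Returning NaN for a negative MSE estimate is a choice made for this sketch; the function name is hypothetical.

```python
import math

def rrmse(mse, theta_hat):
    """Root relative MSE: sqrt(MSE) divided by the estimate itself."""
    if mse < 0:
        return float("nan")  # square root undefined for a negative MSE estimate
    return math.sqrt(mse) / theta_hat

positive_case = rrmse(0.25, 2.0)    # ordinary case
negative_case = rrmse(0.25, -2.0)   # a negative estimate gives a negative RRMSE
undefined_case = rrmse(-1.0, 2.0)   # a negative MSE has no real square root
```

Because the estimate is in the denominator, the sign of the RRMSE follows the sign of the EB estimate, and estimates close to zero inflate the ratio.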
RESULT AND DISCUSSION

Estimation of Prior Parameter Based on EB Method with Negative Binomial Regression
For data without excess zeros, the estimator produced small and consistent MSE. However, when about 30% or more of the data are zeros, which corresponds to an expected probability of zero of 0.6, the estimates tend to be unreliable and the EB estimation produced negative values.
The RRMSE of the estimator increases along with the number of zeros in the data. Furthermore, if the data contain at least 30% zeros, the estimator is unreliable.
Table 1 MSE and RRMSE of EB Estimator with NBR

| Probability of zero | Mean of MSE | Median of MSE | Mean of RRMSE | Median of RRMSE |
|---|---|---|---|---|
| 01 | 033 | 016 | 018 | 013 |
| 02 | 035 | 020 | 026 | 020 |
| 03 | 040 | 023 | 036 | 030 |
| 04 | 042 | 027 | 050 | 042 |
| 05 | 045 | 031 | 072 | 059 |
| 06 | -12875 | 033 | -038 | 081 |
| 07 | 253671 | 040 | -1216 | 135 |
| 08 | -584495 | 030 | 30946 | 211 |
| 09 | 39135606 | 016 | 116E+10 | 664 |

Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR

| Probability of zero | Mean of MSE | Median of MSE | Mean of RRMSE | Median of RRMSE |
|---|---|---|---|---|
| 01 | 033 | 016 | 018 | 013 |
| 02 | 035 | 020 | 026 | 020 |
| 03 | 040 | 023 | 036 | 030 |
| 04 | 042 | 027 | 050 | 042 |
| 05 | 046 | 031 | 071 | 058 |
| 06 | 26197 | 033 | -035 | 075 |
| 07 | 950007 | 040 | -1002 | 099 |
| 08 | 1444250 | 030 | 22054 | 110 |
| 09 | 41595285 | 016 | 677E+09 | 056 |
Table 1 shows that the iterative process produced unexpected negative values of MSE. The simplest way to solve this problem is to change the negative values to zero. MSE (II) and RRMSE (II) in Table 2 are the MSE and RRMSE obtained after the negative values of MSE have been changed to zero.
When the data have an expected probability of zero of 0.6 to 0.9, the mean of MSE (II) increases drastically. Similarly, the mean of RRMSE (II) increases sharply when the data have an expected probability of zero of 0.8 to 0.9. However, when the expected probability of zero is 0.6 to 0.7, the mean of RRMSE (II) is negative because some EB estimates are negative.

Estimation of Prior Parameter Based on EB Method with Zero-Inflated Negative Binomial Regression
The EB estimates are similar to those produced by the NBR method, although ZINB is slightly outperformed by NBR when the data contain only a small number of zeros. In particular, as shown by Table 3, if the data have an expected probability of zero of 0.1 to 0.5, ZINB produces a bigger MSE for the EB estimator than NBR does.
If the data have an expected probability of zero of 0.6 to 0.7, however, ZINB gives better estimates. The estimates were also unbiased, as they cover the parameter values adequately. ZINB begins to produce inconsistent estimates when the expected probability of zero is 0.8 or more, due to enormous MSE.
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
The mean of MSE (II) with ZINB is bigger than the mean of MSE with ZINB. This is because, once the negative values of MSE are changed to zero, they no longer act as a reduction factor in the mean calculation.
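The effect of the truncation on the mean can be seen with a toy example. The Python sketch below uses hypothetical MSE values, not values from this research.

```python
def truncate_negative(values):
    """Replace negative MSE estimates by zero, as done for MSE (II)."""
    return [max(v, 0.0) for v in values]

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

mse = [0.30, -1.20, 0.50]          # hypothetical jackknife MSE values
mse_ii = truncate_negative(mse)    # negative entry becomes 0.0
print(mean(mse), mean(mse_ii))
```

Since every negative value is moved up to zero and nothing is moved down, the mean after truncation can never be smaller than the mean before it.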
Comparison of EB estimator with Negative Binomial Regression and EB estimator with ZINB
EB estimates given by the NBR and ZINB methods are similar for data with a small number of zeros. However, the ZINB method produces a bigger MSE than NBR does as long as the expected probability of zero in the data stays below the 0.6 threshold.
The ZINB method performs better if the data have an expected probability of zero of 0.6 to 0.7. In this case the EB estimates given by the NBR method are unstable and inconsistent because of negative estimates and a huge MSE that can be a thousand times larger than its acceptable value. On the other hand, the EB estimator with ZINB works well: it gives unbiased estimates and its MSE values are more stable than those of the EB estimator with NBR.
Both methods perform poorly if the data have an expected probability of zero of 0.8 or more. The EB estimators from both methods are inconsistent as a result of the very large MSE values they produce.
Table 3 MSE and RRMSE of EB Estimator with ZINB

| Probability of zero | Mean of MSE | Median of MSE | Mean of RRMSE | Median of RRMSE |
|---|---|---|---|---|
| 01 | 045 | 017 | 024 | 014 |
| 02 | 043 | 020 | 033 | 021 |
| 03 | 071 | 028 | 052 | 032 |
| 04 | 054 | 028 | 0632 | 042 |
| 05 | 086 | 033 | 7322807 | 066 |
| 06 | 061 | 038 | 29817 | 103 |
| 07 | 058 | 025 | 218119 | 194 |
| 08 | -128 | -14E-07 | 162697 | 375 |
| 09 | 2954790 | -1E-06 | 35E+278 | 609508 |

Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB

| Probability of zero | Mean of MSE | Median of MSE | Mean of RRMSE | Median of RRMSE |
|---|---|---|---|---|
| 01 | 045 | 017 | 024 | 014 |
| 02 | 0436 | 020 | 0324 | 021 |
| 03 | 072 | 028 | 051 | 0311 |
| 04 | 055 | 028 | 061 | 041 |
| 05 | 095 | 033 | 6561235 | 058 |
| 06 | 075 | 038 | 23406 | 070 |
| 07 | 150 | 025 | 134655 | 069 |
| 08 | 175 | 0 | 733506 | 0 |
| 09 | 2954908 | 0 | 12E+278 | 0 |
CONCLUSION

Excess zeros in the data strongly influenced the result of EB estimation. A conventional method such as negative binomial regression in prior estimation produced a biased and unreliable EB estimator for data with an expected probability of zero of 0.6. This is shown by the large MSE and the negative values of the estimator.
Meanwhile, EB estimation with the ZINB method produced a more reliable estimator even when the data have an expected probability of zero of 0.6 to 0.7.
The ZINB method also provided a reliable estimator for data with less than 53.33% zeros. This means that the performance of ZINB declines when the data have an expected probability of zero of 0.8 or more, as shown by the large MSE and the inconsistent estimator.

RECOMMENDATION

This research is based on many assumptions and has several limitations. If the assumptions and limitations can be relaxed, better results can be expected. Some recommendations for further research are:
1. The generating process in this research does not reflect the real sampling process. A generating process closer to the real sampling process might give better results because it would be closer to the real application.
2. It would be more interesting to run an experiment with a larger number of areas, since the number of areas influences the data modeling.
3. Restricted Maximum Likelihood may be applied when estimating the prior parameters with ZINB and NBR in order to solve the negative values of MSE.
4. Theoretical research on ZINB and the Empirical Bayes estimator is important for understanding the behavior of ZINB parameter estimates in the Empirical Bayes setting.
REFERENCES

Erdman D, L Jackson, A Sinko. 2008. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. SAS Global Forum 2008: 322-2008. http://www2.sas.com/proceedings/forum2008/322-2008.pdf [25 August 2008]
Famoye F, KP Singh. 2006. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Journal of Data Science 4:117-130.
Hardin JW, JM Hilbe. 2007. Generalized Linear Models and Extensions. Texas: A Stata Press Publication.
Kurnia A, KA Notodiputro. 2006. Penerapan Metode Jackknife dalam Pendugaan Area Kecil. Forum Statistika dan Komputasi, April 2006, p. 12-15.
Kismiantini. 2007. Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Thesis]. Bogor: Institut Pertanian Bogor, Fakultas Matematika dan Pengetahuan Alam.
McCullagh P, JA Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.
Ramsini B et al. 2001. Uninsured Estimates by County: A Review of Options and Issues. http://www.odh.ohio.gov/Data/OFHSurv/ofhsrfq7.pdf [24 April 2008]
Rao JNK. 2003. Small Area Estimation. New York: John Wiley & Sons.
Wakefield J. 2006. Disease mapping and spatial regression with count data. http://www.bepress.com/uwbiostat/paper286.pdf [24 April 2008]
Appendix 1 Result of EB estimation with NBR

| P(Y=0) | Actual | r | Statistic | min | Q1 | median | mean | Q3 | max |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 10-3333 | 100 | EB estimator | 0426011 | 1525665 | 3188832 | 4252666 | 5752756 | 205939 |
| | | | bias | 0000446 | 05164 | 0878579 | 1315093 | 1721091 | 8704671 |
| | | | MSE | 0040547 | 0109118 | 0159448 | 0333613 | 0335256 | 4167064 |
| | | | RRMSE | 0041258 | 0100045 | 01356 | 018188 | 0220426 | 0576793 |
| 20 | 1333-3667 | 100 | EB estimator | 0342831 | 1013993 | 2218265 | 2984668 | 3953417 | 1815693 |
| | | | bias | 0000587 | 0413611 | 079407 | 1100373 | 1454889 | 7906915 |
| | | | MSE | 0055631 | 0131969 | 0196963 | 0353033 | 0386291 | 3778251 |
| | | | RRMSE | 0070449 | 015421 | 0205182 | 0262006 | 0352726 | 0788718 |
| 30 | 20-5333 | 100 | EB estimator | 0323311 | 0836545 | 1562163 | 2263684 | 2918741 | 1214482 |
| | | | bias | 0000151 | 0372382 | 067041 | 0916482 | 122012 | 5950225 |
| | | | MSE | 0074364 | 0163462 | 0231014 | 0400207 | 0432371 | 5250254 |
| | | | RRMSE | 0102324 | 0214697 | 0299247 | 0361013 | 0474077 | 1192032 |
| 40 | 2333-5667 | 100 | EB estimator | 024882 | 064963 | 1219656 | 17107 | 2248716 | 930007 |
| | | | bias | 0000564 | 0293602 | 0549809 | 0757937 | 1007851 | 486688 |
| | | | MSE | -100569 | 0194196 | 0271669 | 041875 | 045917 | 3239598 |
| | | | RRMSE | 0123605 | 0300339 | 0422426 | 0503566 | 0642418 | 2202294 |
| 50 | 2333-6333 | 100 | EB estimator | 0122548 | 0570083 | 1028619 | 1291758 | 1728067 | 6750472 |
| | | | bias | 000029 | 0250747 | 0453265 | 0622838 | 0803185 | 4009352 |
| | | | MSE | -237643 | 0235733 | 0306641 | 0452955 | 05091 | 3652167 |
| | | | RRMSE | 0038956 | 0412708 | 0588924 | 0717336 | 0844735 | 3240156 |
| 60 | 30-70 | 100 | EB estimator | -077338 | 044443 | 0699758 | 0944038 | 1131071 | 6323352 |
| | | | bias | 0000452 | 020433 | 0398131 | 0534095 | 0679938 | 3848209 |
| | | | MSE | -749011 | 0254097 | 0330078 | -12875 | 0539873 | 2354887 |
| | | | RRMSE | -663045 | 051763 | 0813734 | -038057 | 1287528 | 1767434 |
| 70 | 4333-7333 | 100 | EB estimator | -33274 | 0249515 | 0442513 | 0659375 | 0922519 | 9258959 |
| | | | bias | 0000375 | 0155154 | 0316124 | 0476883 | 0588926 | 8475103 |
| | | | MSE | -7513075 | 0235378 | 0402092 | 2536714 | 0876569 | 6051162 |
| | | | RRMSE | -10741 | 0704796 | 1355566 | -121606 | 3040291 | 3332419 |
| 80 | 5333-90 | 100 | EB estimator | -232889 | 017621 | 0305365 | 0569959 | 0576346 | 6303601 |
| | | | bias | 0000395 | 0116669 | 0254473 | 1091172 | 0497898 | 6297454 |
| | | | MSE | -6E+09 | -016583 | 0301527 | -584495 | 5718409 | 185E+09 |
| | | | RRMSE | -212936 | 0927338 | 2115163 | 3094627 | 1359703 | 4151289 |
| 90 | 70-100 | 100 | EB estimator | -108767 | 0111208 | 0230315 | 0212247 | 0353129 | 3625557 |
| | | | bias | 000016 | 0086 | 0177169 | 0425532 | 0314714 | 1092655 |
| | | | MSE | -38E+09 | -130817 | 0159682 | 39135606 | 3074073 | 12E+11 |
| | | | RRMSE | -909131 | 1647188 | 6639631 | 116E+10 | 1585472 | 706E+11 |
Appendix 2 Result of EB estimation with ZINB

| P(Y=0) | Actual | r | Statistic | min | Q1 | median | mean | Q3 | max |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 10-3333 | 100 | EB estimator | 0267752 | 1500256 | 3195861 | 4280907 | 5833922 | 2220705 |
| | | | bias | 0000603 | 0485515 | 0882468 | 1315721 | 1750173 | 8704672 |
| | | | MSE | -053954 | 010933 | 0168797 | 0449506 | 0369775 | 360843 |
| | | | RRMSE | 0022947 | 0096443 | 0136424 | 0238099 | 0241955 | 5518468 |
| 20 | 1333-3667 | 100 | EB estimator | 105E-08 | 0914898 | 221594 | 3017228 | 401361 | 1815694 |
| | | | bias | 0000368 | 0383426 | 0780984 | 1105029 | 1496623 | 7906918 |
| | | | MSE | -07309 | 0126202 | 0201463 | 0425844 | 0414597 | 1734815 |
| | | | RRMSE | 0021807 | 0144983 | 0210692 | 0326097 | 0401786 | 3177943 |
| 30 | 20-5333 | 100 | EB estimator | 0132041 | 0719086 | 1523909 | 2308745 | 3012309 | 1228058 |
| | | | bias | 0000508 | 0332427 | 0680187 | 0928947 | 1254604 | 6314973 |
| | | | MSE | -229891 | 0156942 | 0277017 | 0707983 | 0590466 | 7469014 |
| | | | RRMSE | 0023998 | 0210095 | 0317195 | 0519524 | 0618802 | 3500387 |
| 40 | 2333-5667 | 100 | EB estimator | 105E-08 | 0574265 | 1209034 | 1742928 | 2368713 | 104953 |
| | | | bias | 0000564 | 0268248 | 0544049 | 0771741 | 1067061 | 4889872 |
| | | | MSE | -125713 | 0181557 | 0284338 | 0540615 | 0498521 | 423089 |
| | | | RRMSE | 0054916 | 028362 | 0420396 | 0630776 | 0778033 | 5394515 |
| 50 | 2333-6333 | 100 | EB estimator | 105E-08 | 0426701 | 1033816 | 133848 | 1906961 | 8018962 |
| | | | bias | 0000453 | 0224726 | 0454522 | 0661709 | 0900005 | 4414442 |
| | | | MSE | -181856 | 0194818 | 0334706 | 0859252 | 0711939 | 7997074 |
| | | | RRMSE | 0026206 | 0387294 | 0662251 | 7322807 | 1312302 | 13388294 |
| 60 | 30-70 | 100 | EB estimator | 105E-08 | 030085 | 0645848 | 0985327 | 1154975 | 728326 |
| | | | bias | 62E-05 | 0190886 | 0406245 | 0567657 | 074167 | 3923952 |
| | | | MSE | -34589 | 0078006 | 0376514 | 060793 | 0804116 | 3426488 |
| | | | RRMSE | 000461 | 0502807 | 1033578 | 2981671 | 2012552 | 3308816 |
| 70 | 4333-7333 | 100 | EB estimator | 105E-08 | 105E-08 | 0341315 | 0677841 | 1 | 5005491 |
| | | | bias | 979E-05 | 0128017 | 0358257 | 0487174 | 0654423 | 3733981 |
| | | | MSE | -142213 | -001433 | 0255331 | 0584152 | 1132152 | 264456 |
| | | | RRMSE | 0064209 | 0847956 | 1942286 | 2181192 | 4589042 | 7899681 |
| 80 | 5333-90 | 100 | EB estimator | 105E-08 | 105E-08 | 0142906 | 0445315 | 0859305 | 5 |
| | | | bias | 0000161 | 0083397 | 0272773 | 0392826 | 0557213 | 3532556 |
| | | | MSE | -10651 | -56E-05 | -14E-07 | -127819 | 1452962 | 1132741 |
| | | | RRMSE | 0063244 | 1475413 | 3754705 | 162697 | 9221163 | 3786684 |
| 90 | 70-100 | 100 | EB estimator | 1E-277 | 105E-08 | 105E-08 | 0225165 | 0135512 | 3 |
| | | | bias | 0000495 | 0054221 | 0153374 | 027819 | 0350213 | 2736904 |
| | | | MSE | -175652 | -33E-05 | -1E-06 | 2954790 | 152E-06 | 613E+08 |
| | | | RRMSE | 0040681 | 4059441 | 6095076 | 35E+278 | 5569021 | 16E+281 |
Appendix 3 Result of EB estimation (II) with NBR

| P(Y=0) | Actual | r | Statistic | min | Q1 | median | mean | Q3 | max |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 10-3333 | 100 | EB estimator | 0426011 | 1525665 | 3188832 | 4252666 | 5752756 | 205939 |
| | | | bias | 0000446 | 05164 | 0878579 | 1315093 | 1721091 | 8704671 |
| | | | MSE | 0040547 | 0109118 | 0159448 | 0333613 | 0335256 | 4167064 |
| | | | RRMSE | 0041258 | 0100045 | 01356 | 018188 | 0220426 | 0576793 |
| 20 | 1333-3667 | 100 | EB estimator | 0342831 | 1013993 | 2218265 | 2984668 | 3953417 | 1815693 |
| | | | bias | 0000587 | 0413611 | 079407 | 1100373 | 1454889 | 7906915 |
| | | | MSE | 0055631 | 0131969 | 0196963 | 0353033 | 0386291 | 3778251 |
| | | | RRMSE | 0070449 | 015421 | 0205182 | 0262006 | 0352726 | 0788718 |
| 30 | 20-5333 | 100 | EB estimator | 0323311 | 0836545 | 1562163 | 2263684 | 2918741 | 1214482 |
| | | | bias | 0000151 | 0372382 | 067041 | 0916482 | 122012 | 5950225 |
| | | | MSE | 0074364 | 0163462 | 0231014 | 0400207 | 0432371 | 5250254 |
| | | | RRMSE | 0102324 | 0214697 | 0299247 | 0361013 | 0474077 | 1192032 |
| 40 | 2333-5667 | 100 | EB estimator | 024882 | 064963 | 1219656 | 17107 | 2248716 | 930007 |
| | | | bias | 0000564 | 0293602 | 0549809 | 0757937 | 1007851 | 486688 |
| | | | MSE | 0 | 0194196 | 0271669 | 0419181 | 045917 | 3239598 |
| | | | RRMSE | 0 | 0300116 | 0422209 | 0502895 | 0641904 | 2202294 |
| 50 | 2333-6333 | 100 | EB estimator | 0122548 | 0570083 | 1028619 | 1291758 | 1728067 | 6750472 |
| | | | bias | 000029 | 0250747 | 0453265 | 0622838 | 0803185 | 4009352 |
| | | | MSE | 0 | 0235733 | 0306641 | 0456258 | 05091 | 3652167 |
| | | | RRMSE | 0 | 0410357 | 0585765 | 0712314 | 0841838 | 3240156 |
| 60 | 30-70 | 100 | EB estimator | -077338 | 044443 | 0699758 | 0944038 | 1131071 | 6323352 |
| | | | bias | 0000452 | 020433 | 0398131 | 0534095 | 0679938 | 3848209 |
| | | | MSE | 0 | 0254097 | 0330078 | 2619677 | 0539873 | 2354887 |
| | | | RRMSE | -663045 | 0448118 | 0750369 | -034911 | 1209918 | 1767434 |
| 70 | 4333-7333 | 100 | EB estimator | -33274 | 0249515 | 0442513 | 0659375 | 0922519 | 9258959 |
| | | | bias | 0000375 | 0155154 | 0316124 | 0476883 | 0588926 | 8475103 |
| | | | MSE | 0 | 0235378 | 0402092 | 9500073 | 0876569 | 6051162 |
| | | | RRMSE | -10741 | 0288999 | 0995659 | -100163 | 2527784 | 3332419 |
| 80 | 5333-90 | 100 | EB estimator | -232889 | 017621 | 0305365 | 0569959 | 0576346 | 6303601 |
| | | | bias | 0000395 | 0116669 | 0254473 | 1091172 | 0497898 | 6297454 |
| | | | MSE | 0 | 0 | 0301527 | 1444250 | 5718409 | 185E+09 |
| | | | RRMSE | -212936 | 0 | 1104113 | 2205437 | 5656681 | 4151289 |
| 90 | 70-100 | 100 | EB estimator | -108767 | 0111208 | 0230315 | 0212247 | 0353129 | 3625557 |
| | | | bias | 000016 | 0086 | 0177169 | 0425532 | 0314714 | 1092655 |
| | | | MSE | 0 | 0 | 0159682 | 41595285 | 3074073 | 12E+11 |
| | | | RRMSE | -909131 | 0 | 0557622 | 677E+09 | 9311925 | 706E+11 |
Appendix 4 Result of EB estimation (II) with ZINB

| P(Y=0) | Actual | r | Statistic | min | Q1 | median | mean | Q3 | max |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 10-3333 | 100 | EB estimator | 0267752 | 1500256 | 3195861 | 4280907 | 5833922 | 2220705 |
| | | | bias | 0000603 | 0485515 | 0882468 | 1315721 | 1750173 | 8704672 |
| | | | MSE | 0 | 010933 | 0168797 | 0450626 | 0369775 | 360843 |
| | | | RRMSE | 0 | 0095932 | 0135647 | 023675 | 0239669 | 5518468 |
| 20 | 1333-3667 | 100 | EB estimator | 105E-08 | 0914898 | 221594 | 3017228 | 401361 | 1815694 |
| | | | bias | 0000368 | 0383426 | 0780984 | 1105029 | 1496623 | 7906918 |
| | | | MSE | 0 | 0126202 | 0201463 | 0428006 | 0414597 | 1734815 |
| | | | RRMSE | 0 | 0142648 | 020709 | 0320663 | 0395479 | 3177943 |
| 30 | 20-5333 | 100 | EB estimator | 0132041 | 0719086 | 1523909 | 2308745 | 3012309 | 1228058 |
| | | | bias | 0000508 | 0332427 | 0680187 | 0928947 | 1254604 | 6314973 |
| | | | MSE | 0 | 0156942 | 0277017 | 0716543 | 0590466 | 7469014 |
| | | | RRMSE | 0 | 0203913 | 0311937 | 0506882 | 0615401 | 3500387 |
| 40 | 2333-5667 | 100 | EB estimator | 105E-08 | 0574265 | 1209034 | 1742928 | 2368713 | 104953 |
| | | | bias | 0000564 | 0268248 | 0544049 | 0771741 | 1067061 | 4889872 |
| | | | MSE | 0 | 0181557 | 0284338 | 0549835 | 0498521 | 423089 |
| | | | RRMSE | 0 | 0270309 | 0405926 | 0606317 | 0766631 | 5394515 |
| 50 | 2333-6333 | 100 | EB estimator | 105E-08 | 0426701 | 1033816 | 133848 | 1906961 | 8018962 |
| | | | bias | 0000453 | 0224726 | 0454522 | 0661709 | 0900005 | 4414442 |
| | | | MSE | 0 | 0194818 | 0334706 | 094973 | 0711939 | 7997074 |
| | | | RRMSE | 0 | 0316402 | 0576343 | 6561235 | 1240175 | 13388294 |
| 60 | 30-70 | 100 | EB estimator | 105E-08 | 030085 | 0645848 | 0985327 | 1154975 | 728326 |
| | | | bias | 62E-05 | 0190886 | 0406245 | 0567657 | 074167 | 3923952 |
| | | | MSE | 0 | 0078006 | 0376514 | 0749436 | 0804116 | 3426488 |
| | | | RRMSE | 0 | 0258286 | 0698814 | 2340612 | 1714808 | 3308816 |
| 70 | 4333-7333 | 100 | EB estimator | 105E-08 | 105E-08 | 0341315 | 0677841 | 1 | 5005491 |
| | | | bias | 979E-05 | 0128017 | 0358257 | 0487174 | 0654423 | 3733981 |
| | | | MSE | 0 | 0 | 0255331 | 1501268 | 1132152 | 264456 |
| | | | RRMSE | 0 | 0 | 0688797 | 1346552 | 2500825 | 7899681 |
| 80 | 5333-90 | 100 | EB estimator | 105E-08 | 105E-08 | 0142906 | 0445315 | 0859305 | 5 |
| | | | bias | 0000161 | 0083397 | 0272773 | 0392826 | 0557213 | 3532556 |
| | | | MSE | 0 | 0 | 0 | 1755486 | 1452962 | 1132741 |
| | | | RRMSE | 0 | 0 | 0 | 7335062 | 3311711 | 3786684 |
| 90 | 70-100 | 100 | EB estimator | 1E-277 | 105E-08 | 105E-08 | 0225165 | 0135512 | 3 |
| | | | bias | 0000495 | 0054221 | 0153374 | 027819 | 0350213 | 2736904 |
| | | | MSE | 0 | 0 | 0 | 2954908 | 152E-06 | 613E+08 |
| | | | RRMSE | 0 | 0 | 0 | 12E+278 | 416189 | 16E+281 |
Appendix 5 Syntax program for generating data

data b; /* generate x1 (covariate) and ei */
input x1;
cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate poisson-gamma with excess zero */
do kk=1 to 30;
set b;
tetha = rangam(1,1);
lambda = -log(0.1); /* desired probability of a zero value (0.1-0.9) */
starlambda = log(lambda/tetha);
output;
end;
run;

proc reg;
model starlambda = x1;
ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;
data _null_;
set work.betha_lr_t;
call symput ('Intercept', col1);
call symput ('x1', col2);
run;

data d;
do kk=1 to 30;
set e;
mu = exp(&Intercept + &x1*x1);
parmlambda = mu*tetha;
ypoi = rand('poisson', parmlambda);
output;
end;
run;

ods trace on; /* to take percent zero on data */
proc freq data=d;
tables ypoi;
ods output OneWayFreqs=work.zero;
run;
data zero;
set zero;
keep percent;
run;
proc transpose data=zero out=zero1;
run;
data _null_;
set work.zero1;
call symput ('pctz', col1);
run;
data d;
set d;
pzero=&pctz;
r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
Appendix 6 Syntax program EB with NBR

%macro sae_nb;
%do x=1 %to 900;

data work.a;
set work.e;
if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
model ypoi = x1 / dist=nb link=log;
ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
set work.betha_nb_t;
call symput ('Intercept', col1);
call symput ('x1', col2);
call symput ('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
set a;
mu_hat_b=exp(&Intercept + &x1*x1);
w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
set work.duga_nb1;
if ^(u=&x) then delete;
run;
data work.jacknb&h;
set work.d;
if u=&x;
if kk=&h then delete;
run;

proc genmod data=work.jacknb&h;
/* output p out=sasyi_est (output request as printed in the source; its punctuation is lost) */
model ypoi = x1 / dist=nb link=log;
ods output parameterestimates=work.betha_est_nb&h (keep=parameter Estimate);
run;
proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;
data _null_;
set work.betha_est_nbt&h;
call symput ('Intercept_', col1);
call symput ('x1_', col2);
call symput ('Dispersion_', col3);
run;

/* delete-one EB estimate and posterior variance for refit h */
data work.duganb&h;
set work.d;
mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + &Dispersion_);
teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
beda_g_&h=g1_&h-g1;
run;

/* the merge needs all 30 delete-one data sets, so it only succeeds once h=30 */
data work.mse_nb_j;
merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5 work.duganb6 work.duganb7 work.duganb8 work.duganb9 work.duganb10 work.duganb11 work.duganb12 work.duganb13 work.duganb14 work.duganb15 work.duganb16 work.duganb17 work.duganb18 work.duganb19 work.duganb20 work.duganb21 work.duganb22 work.duganb23 work.duganb24 work.duganb25 work.duganb26 work.duganb27 work.duganb28 work.duganb29 work.duganb30;
by kk;
run;

data work.mse_nb_j;
set work.mse_nb_j;
t_sum=0;
g_sum=0;
%do j=1 %to 30;
g_sum=g_sum+beda_g_&j;
t_sum=t_sum+delta_&j;
%end;
m2i=((30-1)/30)*t_sum;
m1i=g1-((30-1)/30)*g_sum;
mse_j_b=m1i+m2i;
rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
merge work.d work.mse_nb_j;
keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb BASE=work.hasilnb1 appendver=v6;
run;

%END;
%mend;

%sae_nb;
Appendix 7 Syntax program EB with ZINB

%macro sae_zinb;

%do x=1 %to 900;

data work.a;
set work.e;
if ^(u=&x) then delete;
run;

proc countreg data=a;
model ypoi=x1 / dist=zinb method=qn;
zeromodel ypoi ~ x1;
ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
set work.pe_t;
call symput ('Intercept', col1);
call symput ('x1', col2);
call symput ('Inf_Intercept', col3);
call symput ('Inf_x1', col4);
call symput ('_Alpha', col5);
run;

/* EB with ZINB */
data work.duga;
set a;
mu_hat_b=exp(&Intercept + &x1*x1);
w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
bias_b=abs(teta_hat_bayes-parmlamdha);
run;

proc append data=work.duga base=work.duga1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
set work.duga1;
if ^(u=&x) then delete;
run;
data work.jackzinb&h;
set work.d;
if u=&x;
if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
model ypoi=x1 / dist=zinb method=qn;
zeromodel ypoi ~ x1;
ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
set work.betha_est_ZINBt&h;
call symput ('Intercept', col1);
call symput ('x1', col2);
call symput ('Inf_Intercept', col3);
call symput ('Inf_x1', col4);
call symput ('_Alpha', col5);
run;

/* delete-one EB estimate and posterior variance for refit h */
data work.dugaZINB&h;
set work.d;
mu_hat_b_&h=exp(&Intercept + &x1*x1);
/* disabled in the source: mu_hat_b_&h= &b_o- + &b_1- x1 */
w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + (&_Alpha));
teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
/* disabled in the source (operators not legible): g1_&h =((mu_hat_b_&h 2 &alpha_)2)(&alpha_+y_i)((mu_hat_b_&h 2 &alpha_)+mu_hat_b_&h)2 */
g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
/* disabled in the source (operators not legible): g1_&h =(A2)(&k- + y_i)( a +mu_hat_b)2 */
beda_g_&h=g1_&h-g1;
run;

/* the merge needs all 30 delete-one data sets, so it only succeeds once h=30 */
data work.mse_ZINB_j;
merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5 work.dugaZINB6 work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10 work.dugaZINB11 work.dugaZINB12 work.dugaZINB13 work.dugaZINB14 work.dugaZINB15 work.dugaZINB16 work.dugaZINB17 work.dugaZINB18 work.dugaZINB19 work.dugaZINB20 work.dugaZINB21 work.dugaZINB22 work.dugaZINB23 work.dugaZINB24 work.dugaZINB25 work.dugaZINB26 work.dugaZINB27 work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
by kk;
run;

data work.mse_ZINB_j;
set work.mse_ZINB_j;
t_sum=0;
g_sum=0;
%do j=1 %to 30;
g_sum=g_sum+beda_g_&j;
t_sum=t_sum+delta_&j;
%end;
m2i=((30-1)/30)*t_sum;
m1i=g1-((30-1)/30)*g_sum;
mse_j_b=m1i+m2i;
rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
merge work.d work.mse_ZINB_j;
keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;
ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB BASE=work.hasilZINB1;
run;

%END;
%mend;

%sae_zinb;
$$\theta_i = x_i^{T}\beta + b_i v_i, \quad i = 1, \ldots, m$$

where $v_i \sim N(0, \sigma_v^2)$ are area-specific random effects, $\beta = (\beta_1, \ldots, \beta_p)^{T}$ is a $p \times 1$ vector of regression coefficients, and the $b_i$ are known positive constants. For making inferences about $\theta_i$, direct estimators $y_i$ are assumed to be available. Accordingly, it is assumed that

$$y_i = \theta_i + e_i, \quad i = 1, \ldots, m$$

with sampling errors $e_i \sim N(0, \sigma_{ei}^2)$ and known $\sigma_{ei}^2$. Finally, both models are combined, which results in the new model

$$y_i = x_i^{T}\beta + b_i v_i + e_i, \quad i = 1, \ldots, m$$

(Rao 2003).
2. Basic unit level (type B) model
Unit level models include all models that relate unit values of the study variable to unit-specific auxiliary variables. Assuming unit-specific auxiliary variables $x_{ij} = (x_{ij1}, \ldots, x_{ijp})^{T}$ are available, the corresponding nested regression model is

$$y_{ij} = x_{ij}^{T}\beta + v_i + e_{ij}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n_i$$

with $v_i \sim N(0, \sigma_v^2)$ and $e_{ij} \sim N(0, \sigma_e^2)$.
Empirical Bayes Methods
The Bayesian approach is based on Bayes' law, which was discovered by Thomas Bayes. The law was presented by Richard Price in 1763, two years after Thomas Bayes passed away. In 1774 and 1781, Laplace gave the details and relevance for modern Bayesian statistics (Gill 2002 in Kismiantini 2007).
Novick in Good (1980) mentioned that the Bayes method is difficult to adopt and is sometimes very sensitive because it requires prior probability information, which is usually difficult to obtain. Robbins (1955) introduced Empirical Bayes methods, which assume a particular prior distribution and estimate it from the sample. Rao (2003) said that EB (Empirical Bayes) and HB (Hierarchical Bayes) methods are suitable for binary and count data in Small Area Estimation. Therefore, the EB method was used in this research.
Rao (2003) summarized EB methods in Small Area Estimation as follows:
1. Obtain the posterior probability density function of the small area parameter.
2. Estimate the parameters from the marginal density function.
3. Use the estimated posterior density for inferences regarding the parameters of interest.
Poisson-Gamma Models
The Poisson model is the standard model for count data. Count data, however, often suffer from over-dispersion, so the Poisson formulation has been extended to accommodate the extra variance in the sample data. Two-stage models for count data have been introduced, known as the Poisson-Gamma mixed model. Wakefield (2006) introduced a Poisson-Gamma model that is easier to use, with the SMR (Standard Mortality Ratio) as a direct estimator. This study used the Wakefield model with an alteration in the direct estimator.
Let $y_i$ be the number of individuals in small area $i$ that have the specific characteristic of interest, written as

$$y_i = \sum_{j} y_{ij}$$

where $y_{ij}$ is the $j$-th object in the $i$-th small area, $j = 1, \ldots, n$ and $i = 1, \ldots, m$.
First stage: $y_i \mid \theta_i \overset{ind}{\sim} Poisson(\mu_i \theta_i)$ is assumed, where $\mu_i = \mu(x_i, \beta)$ describes a regression model at area level, $x_i$ is a vector of covariates and $\beta = (\beta_1, \ldots, \beta_p)^{T}$ is a vector of regression coefficients.
Second stage: $\theta_i \overset{iid}{\sim} gamma(\alpha, 1/\alpha)$ is assumed as the prior distribution, with mean 1 and variance $1/\alpha$.
Then the marginal distribution $y_i \mid \beta, \alpha$ is negative binomial.
Moreover, Wakefield (2006) used Bayes' theorem and obtained the posterior distribution

$$\theta_i \mid y_i, \beta, \alpha \sim gamma\left(y_i + \alpha, \frac{1}{\mu_i + \alpha}\right)$$

and the EB estimator

$$\hat\theta_i^{EB} = \hat\theta_i^{B}(\hat\beta, \hat\alpha) = \hat\gamma_i \hat\theta_i + (1 - \hat\gamma_i)\hat\mu_i$$

with $\hat\gamma_i = \hat\mu_i / (\hat\alpha + \hat\mu_i)$, where $\hat\theta_i$ is the direct estimate of $\theta_i$ and $y_i$ is the number of observations.
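The EB estimator is a weighted average of the direct estimate and the regression prediction. The Python sketch below follows the formulas used in the SAS program of Appendix 6, where the direct estimate is the observed count and the posterior variance is written as `g1`; it is an illustration with hypothetical inputs, and the function name is an assumption of this sketch.

```python
def eb_estimate(y, mu_hat, alpha_hat):
    """Shrinkage weight, EB estimate and posterior variance g1 for one area.

    y         : observed count (used as the direct estimate)
    mu_hat    : regression prediction exp(x' beta_hat)
    alpha_hat : estimated prior (dispersion) parameter
    """
    gamma = mu_hat / (mu_hat + alpha_hat)            # weight on the direct estimate
    theta_eb = gamma * y + (1.0 - gamma) * mu_hat    # EB estimate
    g1 = (alpha_hat + y) / (mu_hat + alpha_hat) ** 2 # posterior variance
    return gamma, theta_eb, g1

gamma_i, theta_i, g1_i = eb_estimate(y=4, mu_hat=2.0, alpha_hat=1.0)
```

A large `alpha_hat` pulls the estimate toward the regression prediction, while a small one leaves it close to the observed count.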
Negative Binomial Regression
The negative binomial regression model seems to have been first discussed by Anscombe (1972). Others have pointed out its success in dealing with over-dispersed count data. Lawless (1987) elaborated the mixture model parameterization of the negative binomial, providing formulas for its log likelihood, mean, variance and moments. Later, Breslow (1990) cited Lawless' work. From its inception to the late 1980s, the negative binomial regression model has been construed as a mixture model that is useful for accommodating otherwise over-dispersed Poisson data (Hardin & Hilbe 2007). The negative binomial distribution function is written as

$$g(y \mid x) = \frac{\Gamma(y + k)}{\Gamma(k)\,\Gamma(y + 1)} \left(\frac{k}{k + \mu}\right)^{k} \left(\frac{\mu}{k + \mu}\right)^{y}$$

where $y = 0, 1, 2, \ldots$; $k$ and $\mu$ are the negative binomial parameters, with $E(y) = \mu$ and $var(y) = \mu + \mu^2 / k$. $k$ is called the dispersion parameter, which indicates that the data are over-dispersed.
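The stated mean and variance can be verified by summing the distribution function directly. The Python sketch below is an illustration that assumes $k$ acts as the shape parameter, so that the variance is $\mu + \mu^2/k$; the function names and parameter values are hypothetical.

```python
import math

def nb_pmf_k(y, mu, k):
    """Negative binomial probability of count y with mean mu and shape k."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu))
             + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def nb_mean_variance(mu, k, upper=2000):
    """Mean and variance obtained by summing the pmf over 0..upper-1."""
    probs = [nb_pmf_k(y, mu, k) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return mean, second - mean ** 2

mean_y, var_y = nb_mean_variance(mu=3.0, k=2.0)
```

For any finite $k$ the variance exceeds the mean, and the excess $\mu^2/k$ shrinks as $k$ grows, which is how the model accommodates over-dispersion.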
Over-dispersion in Count Data
Count data in Poisson regression are said to be over-dispersed if the variance is bigger than the mean, that is, if the observed variance is larger than the variance expected under the Poisson model. This phenomenon is written as

$$Var(y_i) > E(y_i)$$

(McCullagh & Nelder 1989).
Zero-Inflated Models
Zero-inflated models consider two distinct sources of zero outcomes. One source is individuals who do not enter the counting process; the other is individuals who do enter the counting process but produce a zero outcome (Hardin & Hilbe 2007).
Lambert (1992) first described this type of mixture model in the context of process control in manufacturing. It has since been used in many applications and is now discussed in nearly every book or article dealing with count response models.
For the zero-inflated model, the probability of observing a zero outcome equals the probability that an individual is in the always-zero group plus the probability that the individual is not in that group times the probability that the counting process produces a zero. If $B(0)$ is the probability that the binary process results in a zero outcome and $\Pr(0)$ is the probability that the counting process results in a zero outcome, the probability of a zero outcome for the system is (Hardin & Hilbe 2007)

$$\Pr(y = 0) = B(0) + (1 - B(0)) \Pr(0)$$

The probability of a nonzero count is

$$\Pr(y = k; k > 0) = [1 - B(0)] \Pr(k)$$

This model produces two groups of parameters. One is the zero-inflation parameters, which show whether the covariates contribute significantly to having a zero outcome. The other is the negative binomial parameters, which model the response with the covariates.
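The two probabilities can be written as one small function. The Python sketch below is an illustration that uses a Poisson counting process for simplicity, although the model in this research uses the negative binomial; the names and the values 0.4 and 1.5 are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of count k with mean lam."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def zero_inflated_pmf(k, b0, count_pmf):
    """P(y = k) when a binary process yields an always-zero outcome with probability b0."""
    if k == 0:
        return b0 + (1.0 - b0) * count_pmf(0)
    return (1.0 - b0) * count_pmf(k)

def count(k):
    """Counting process used in this example: Poisson with mean 1.5."""
    return poisson_pmf(k, 1.5)

p_zero = zero_inflated_pmf(0, 0.4, count)
total = sum(zero_inflated_pmf(k, 0.4, count) for k in range(100))
```

The probability of zero is always at least `b0`, and the probabilities still sum to one because the nonzero counts are scaled down by the same factor `1 - b0`.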
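Zero-Inflated Negative Binomial
There are many kinds of zero-inflated models; each has advantages and disadvantages and suits a different type of data. The zero-inflated negative binomial is one of them. This model is used for over-dispersed data with excess zeros. As a result, the parameter estimates include a dispersion parameter, which indicates that over-dispersion occurs in the data, just like the dispersion parameter in negative binomial regression.
The probability distribution of this model is as follows:

$$P(Y_i = y_i \mid x_i) = \begin{cases} \varphi(\gamma^{T} x_i) + \bigl(1 - \varphi(\gamma^{T} x_i)\bigr)\, g(0 \mid x_i), & y_i = 0 \\ \bigl(1 - \varphi(\gamma^{T} x_i)\bigr)\, g(y_i \mid x_i), & y_i > 0 \end{cases}$$

where $\varphi$ is a function of $z_i = \gamma^{T} x_i$, $x_i$ is the vector of zero-inflated covariates, and $\gamma$ is a vector of zero-inflated coefficients to be estimated. Meanwhile, $g(y_i \mid x_i)$ is the probability distribution of the negative binomial, written as

$$g(y_i \mid x_i) = \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y_i + 1)} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu_i}\right)^{\alpha^{-1}} \left(\frac{\mu_i}{\alpha^{-1} + \mu_i}\right)^{y_i}$$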
Mean and variance of ZINB are
))(1)(1()|(
)1()|(
iiiiii
iiii
xyV
xyE
Jackknife Method of Estimating MSE( EBi )
Jackknife methods is one of general methods used in survey because itrsquos unpretentious concept (Jiang Lahiri and Wan 2002) This methods have been known by Tukey (1958) and developed to be a method that capable to be bias corrected of estimator by remove observation-i for i=1hellipm and performs parameter estimation
Rao (2003) the Jackknife step to estimate MSE( EB
i ) are
1 Assume that )ˆˆ(ˆ iiEBi yk
)ˆˆ(ˆ111 ii
EBi yk then calculate
m
l
EBi
EBii m
mM 2
12 )ˆˆ(1ˆ
2 Calculate the delete-i estimator 1
ˆ
and
1 then calculate
4
)]ˆˆ()ˆˆ([1
)ˆˆ(ˆ111111 ii
m
miiiiii ygyg
m
mygM
And )ˆ( 21 vig is the variance estimator of
posterior distribution which is used to measure the variability associated with i
The use of )ˆ( 21 vig is leads to severe of
underestimation of )ˆ( EBiJMSE related
with estimation in prior parameter Therefore the estimator
iM1ˆ correct the
bias of )ˆ( 21 vig
3 Calculate the jackknife estimator of MSE( EB
i ) as
iiEBiJ MMMSE 21
ˆˆ)ˆ(
METHODOLOGY
DataThis research assumed that the available
auxiliary data is on area level so this research used basic area level model The data were simulated with 30 small areas and one covariate Every batch generated different conditions of excess-zeros data start from 01 until 09 probability of zero in small area This research assumed structure of relation between respond and covariate was linear
MethodsThe following steps in generating data
using SAS 91 were used1 Fix the value of
iX for the- i th area
2 Define the expected probability of zero in each small area ))0(( iYP then
calculate ))0(log( ii YPLambda
3 Generate )11(~ Gammai4 Calculate )log(
iiLambda 5 Fit linear regression between and
iX to
obtain0 and
16 Calculate )`exp(X= ii 7 Calculate
iiparmlambda 8 Generate )(~ parmlambdaPoissonyi
Moreover in analyzing data the following steps were applied 1 Generate the negative binomial regression
with genmod procedure in SAS 91 and Zero-Inflated Negative binomial Regression with countreg procedure in SAS 92
2 Estimate the prior parameter which are and
3 Estimate using EB method4 Calculate MSE for indirect estimation5 Calculate RRMSE (Root Relative Mean
Square Error)
i
ii
MSERRMSE
ˆ)ˆ(
)ˆ(
RESULT AND DISCUSSION
Estimation of Prior Parameter is Based on EB Method with Negative Binomial
RegressionIn case of non-excess-zero data the
estimator produced small and consistent MSE Meanwhile if the number of excess-zero isapproximately 30 or more with expected probability of zero 06 the performance of estimates tends to be unreliable As a result EB estimation produced negative values
RRMSE of the estimator increasessimultaneously along with the increase of number of zero in the data Furthermore if thedata contain excess zero at least 30 theestimator is unreliable
Table 1 MSE and RRMSE of EB Estimator with NBR

Probability   Mean of      Median    Mean of     Median of
of zero       MSE          of MSE    RRMSE       RRMSE
0.1           0.33         0.16      0.18        0.13
0.2           0.35         0.20      0.26        0.20
0.3           0.40         0.23      0.36        0.30
0.4           0.42         0.27      0.50        0.42
0.5           0.45         0.31      0.72        0.59
0.6           -128.75      0.33      -0.38       0.81
0.7           2536.71      0.40      -12.16      1.35
0.8           -5844.95     0.30      309.46      2.11
0.9           391356.06    0.16      1.16E+10    6.64
Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR

Probability   Mean of      Median    Mean of     Median of
of zero       MSE          of MSE    RRMSE       RRMSE
0.1           0.33         0.16      0.18        0.13
0.2           0.35         0.20      0.26        0.20
0.3           0.40         0.23      0.36        0.30
0.4           0.42         0.27      0.50        0.42
0.5           0.46         0.31      0.71        0.58
0.6           261.97       0.33      -0.35       0.75
0.7           9500.07      0.40      -10.02      0.99
0.8           14442.50     0.30      220.54      1.10
0.9           415952.85    0.16      6.77E+09    0.56
Table 1 shows that the iterative process produced unexpected negative values of MSE. The simplest way to solve this problem is to change the negative values to zero. MSE (II) and RRMSE (II) in Table 2 are the MSE and RRMSE after the negative values of MSE have been changed to zero.
When the data have an expected probability of zero of 0.6 to 0.9, the mean of MSE (II) increases drastically. Similarly, the mean of RRMSE (II) increases sharply when the expected probability of zero is 0.8 to 0.9. However, when the expected probability of zero is 0.6 to 0.7, the mean of RRMSE (II) is negative because some EB estimates are negative.
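The effect of replacing negative MSE values by zero can be seen in a toy calculation. The four MSE values below are hypothetical and only illustrate the mechanism: the mean rises once the negative value no longer offsets the positive ones, while the median is unaffected as long as the negative values lie below it.

```python
from statistics import mean, median


def truncate_negative(values):
    """Replace negative MSE estimates by zero, as done for MSE (II)."""
    return [max(v, 0.0) for v in values]


mse = [-7.5, 0.25, 0.33, 0.54]       # hypothetical jackknife MSE estimates
mse_ii = truncate_negative(mse)

print(mean(mse), mean(mse_ii))       # the mean rises after truncation
print(median(mse), median(mse_ii))   # the median is unchanged here
```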
Estimation of Prior Parameter Based on EB Method with Zero-Inflated Negative Binomial Regression
The EB estimates are similar to those produced by the NBR method, although they are slightly outperformed by the NBR method when the data contain only a small number of zeros. In particular, as shown in Table 3, if the data have an expected probability of zero of 0.1 to 0.5, ZINB produces a bigger MSE for the EB estimator than NBR does.
If the data have an expected probability of zero of 0.6 to 0.7, however, ZINB gives better estimates. The estimates are also unbiased, as they cover the parameter values adequately. ZINB begins to produce inconsistent estimates when the expected probability of zero is 0.8 or more, because of the enormous MSE.
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
The mean of MSE (II) with ZINB is bigger than the mean of MSE with ZINB. This is because, once the negative values of MSE are changed to zero, they no longer reduce the mean.
Comparison of EB Estimator with Negative Binomial Regression and EB Estimator with ZINB
The EB estimates given by the NBR and ZINB methods are similar for data with a small number of zeros. However, the ZINB method produces a bigger MSE than NBR as long as the expected probability of zero in the data does not exceed the 0.6 threshold.
The ZINB method performs better if the data have an expected probability of zero of 0.6 to 0.7. In this case the EB estimates given by the NBR method are unstable and inconsistent because of the negative estimates and a huge MSE that can be thousands of times larger than an acceptable value. The EB estimator with ZINB, on the other hand, works well: it gives unbiased estimates and its MSE values are more stable than those of the EB estimates with NBR.
Both methods perform poorly if the data have an expected probability of zero of 0.8 or more. The EB estimators of both methods were inconsistent as a result of the very large MSE values they produced.
Table 3 MSE and RRMSE of EB Estimator with ZINB

Probability   Mean of      Median      Mean of     Median of
of zero       MSE          of MSE      RRMSE       RRMSE
0.1           0.45         0.17        0.24        0.14
0.2           0.43         0.20        0.33        0.21
0.3           0.71         0.28        0.52        0.32
0.4           0.54         0.28        0.632       0.42
0.5           0.86         0.33        73228.07    0.66
0.6           0.61         0.38        298.17      1.03
0.7           0.58         0.25        2181.19     1.94
0.8           -1.28        -1.4E-07    1626.97     3.75
0.9           29547.90     -1E-06      3.5E+278    6095.08
Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB

Probability   Mean of      Median      Mean of     Median of
of zero       MSE          of MSE      RRMSE       RRMSE
0.1           0.45         0.17        0.24        0.14
0.2           0.436        0.20        0.324       0.21
0.3           0.72         0.28        0.51        0.311
0.4           0.55         0.28        0.61        0.41
0.5           0.95         0.33        65612.35    0.58
0.6           0.75         0.38        234.06      0.70
0.7           1.50         0.25        1346.55     0.69
0.8           1.75         0           7335.06     0
0.9           29549.08     0           1.2E+278    0
CONCLUSION

Excess zeros in the data strongly influenced the results of EB estimation. A conventional method such as negative binomial regression in prior estimation produced unstable and unreliable EB estimators for data with an expected probability of zero of 0.6, as shown by the large MSE and the negative values of the estimator.
Meanwhile, EB estimation with the ZINB method produced a more reliable estimator even when the data have an expected probability of zero of 0.6 to 0.7.
The ZINB method also provided a reliable estimator for data with less than 53.33% zeros. This means that the performance of ZINB declines when the data have an expected probability of zero of 0.8 or more, as shown by the large MSE and the inconsistent estimator.
RECOMMENDATION

This research rests on many assumptions and has several limitations. If the assumptions and limitations can be relaxed, better results can be expected. Some recommendations for further research are:
1. The generating process in this research does not reflect a real sampling process. A generating process similar to real sampling might give better results because it would be closer to the real application.
2. It would be interesting to run an experiment with a larger number of areas, since the number of areas influences the data modeling.
3. Restricted Maximum Likelihood may be applied when estimating the prior parameters with ZINB and NBR in order to solve the negative values of MSE.
4. Theoretical research on ZINB and the Empirical Bayes estimator is important for understanding the behavior of the ZINB parameter estimates in the Empirical Bayes setting.
REFERENCES

Erdman D, L Jackson, A Sinko. 2008. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. SAS Global Forum 2008: 322-2008. http://www2.sas.com/proceedings/forum2008/322-2008.pdf [25 August 2008]
Famoye F, KP Singh. 2006. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Journal of Data Science 4:117-130.
Hardin JW, JM Hilbe. 2007. Generalized Linear Models and Extensions. Texas: A Stata Press Publication.
Kurnia A, KA Notodiputro. 2006. Penerapan Metode Jackknife dalam Pendugaan Area Kecil. Forum Statistika dan Komputasi, April 2006, p 12-15.
Kismiantini. 2007. Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Thesis]. Bogor: Institut Pertanian Bogor, Fakultas Matematika dan Pengetahuan Alam.
McCullagh P, JA Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.
Ramsini B, et al. 2001. Uninsured Estimates by County: A Review of Options and Issues. http://www.odh.ohio.gov/Data/OFHSurv/ofhsrfq7.pdf [24 April 2008]
Rao JNK. 2003. Small Area Estimation. New York: John Wiley & Sons.
Wakefield J. 2006. Disease mapping and spatial regression with count data. http://www.bepress.com/uwbiostat/paper286.pdf [24 April 2008]
Appendix 1 Result of EB estimation with NBR

Columns: statistic, min, Q1, median, mean, Q3, max

P(Y=0) = 10, Actual = 10-33.33, r = 100
  EB estimator 0426011 1525665 3188832 4252666 5752756 205939
  bias 0000446 05164 0878579 1315093 1721091 8704671
  MSE 0040547 0109118 0159448 0333613 0335256 4167064
  RRMSE 0041258 0100045 01356 018188 0220426 0576793
P(Y=0) = 20, Actual = 13.33-36.67, r = 100
  EB estimator 0342831 1013993 2218265 2984668 3953417 1815693
  bias 0000587 0413611 079407 1100373 1454889 7906915
  MSE 0055631 0131969 0196963 0353033 0386291 3778251
  RRMSE 0070449 015421 0205182 0262006 0352726 0788718
P(Y=0) = 30, Actual = 20-53.33, r = 100
  EB estimator 0323311 0836545 1562163 2263684 2918741 1214482
  bias 0000151 0372382 067041 0916482 122012 5950225
  MSE 0074364 0163462 0231014 0400207 0432371 5250254
  RRMSE 0102324 0214697 0299247 0361013 0474077 1192032
P(Y=0) = 40, Actual = 23.33-56.67, r = 100
  EB estimator 024882 064963 1219656 17107 2248716 930007
  bias 0000564 0293602 0549809 0757937 1007851 486688
  MSE -100569 0194196 0271669 041875 045917 3239598
  RRMSE 0123605 0300339 0422426 0503566 0642418 2202294
P(Y=0) = 50, Actual = 23.33-63.33, r = 100
  EB estimator 0122548 0570083 1028619 1291758 1728067 6750472
  bias 000029 0250747 0453265 0622838 0803185 4009352
  MSE -237643 0235733 0306641 0452955 05091 3652167
  RRMSE 0038956 0412708 0588924 0717336 0844735 3240156
P(Y=0) = 60, Actual = 30-70, r = 100
  EB estimator -077338 044443 0699758 0944038 1131071 6323352
  bias 0000452 020433 0398131 0534095 0679938 3848209
  MSE -749011 0254097 0330078 -12875 0539873 2354887
  RRMSE -663045 051763 0813734 -038057 1287528 1767434
P(Y=0) = 70, Actual = 43.33-73.33, r = 100
  EB estimator -33274 0249515 0442513 0659375 0922519 9258959
  bias 0000375 0155154 0316124 0476883 0588926 8475103
  MSE -7513075 0235378 0402092 2536714 0876569 6051162
  RRMSE -10741 0704796 1355566 -121606 3040291 3332419
P(Y=0) = 80, Actual = 53.33-90, r = 100
  EB estimator -232889 017621 0305365 0569959 0576346 6303601
  bias 0000395 0116669 0254473 1091172 0497898 6297454
  MSE -6E+09 -016583 0301527 -584495 5718409 185E+09
  RRMSE -212936 0927338 2115163 3094627 1359703 4151289
P(Y=0) = 90, Actual = 70-100, r = 100
  EB estimator -108767 0111208 0230315 0212247 0353129 3625557
  bias 000016 0086 0177169 0425532 0314714 1092655
  MSE -38E+09 -130817 0159682 39135606 3074073 12E+11
  RRMSE -909131 1647188 6639631 116E+10 1585472 706E+11
Appendix 2 Result of EB estimation with ZINB

Columns: statistic, min, Q1, median, mean, Q3, max

P(Y=0) = 10, Actual = 10-33.33, r = 100
  EB estimator 0267752 1500256 3195861 4280907 5833922 2220705
  bias 0000603 0485515 0882468 1315721 1750173 8704672
  MSE -053954 010933 0168797 0449506 0369775 360843
  RRMSE 0022947 0096443 0136424 0238099 0241955 5518468
P(Y=0) = 20, Actual = 13.33-36.67, r = 100
  EB estimator 105E-08 0914898 221594 3017228 401361 1815694
  bias 0000368 0383426 0780984 1105029 1496623 7906918
  MSE -07309 0126202 0201463 0425844 0414597 1734815
  RRMSE 0021807 0144983 0210692 0326097 0401786 3177943
P(Y=0) = 30, Actual = 20-53.33, r = 100
  EB estimator 0132041 0719086 1523909 2308745 3012309 1228058
  bias 0000508 0332427 0680187 0928947 1254604 6314973
  MSE -229891 0156942 0277017 0707983 0590466 7469014
  RRMSE 0023998 0210095 0317195 0519524 0618802 3500387
P(Y=0) = 40, Actual = 23.33-56.67, r = 100
  EB estimator 105E-08 0574265 1209034 1742928 2368713 104953
  bias 0000564 0268248 0544049 0771741 1067061 4889872
  MSE -125713 0181557 0284338 0540615 0498521 423089
  RRMSE 0054916 028362 0420396 0630776 0778033 5394515
P(Y=0) = 50, Actual = 23.33-63.33, r = 100
  EB estimator 105E-08 0426701 1033816 133848 1906961 8018962
  bias 0000453 0224726 0454522 0661709 0900005 4414442
  MSE -181856 0194818 0334706 0859252 0711939 7997074
  RRMSE 0026206 0387294 0662251 7322807 1312302 13388294
P(Y=0) = 60, Actual = 30-70, r = 100
  EB estimator 105E-08 030085 0645848 0985327 1154975 728326
  bias 62E-05 0190886 0406245 0567657 074167 3923952
  MSE -34589 0078006 0376514 060793 0804116 3426488
  RRMSE 000461 0502807 1033578 2981671 2012552 3308816
P(Y=0) = 70, Actual = 43.33-73.33, r = 100
  EB estimator 105E-08 105E-08 0341315 0677841 1 5005491
  bias 979E-05 0128017 0358257 0487174 0654423 3733981
  MSE -142213 -001433 0255331 0584152 1132152 264456
  RRMSE 0064209 0847956 1942286 2181192 4589042 7899681
P(Y=0) = 80, Actual = 53.33-90, r = 100
  EB estimator 105E-08 105E-08 0142906 0445315 0859305 5
  bias 0000161 0083397 0272773 0392826 0557213 3532556
  MSE -10651 -56E-05 -14E-07 -127819 1452962 1132741
  RRMSE 0063244 1475413 3754705 162697 9221163 3786684
P(Y=0) = 90, Actual = 70-100, r = 100
  EB estimator 1E-277 105E-08 105E-08 0225165 0135512 3
  bias 0000495 0054221 0153374 027819 0350213 2736904
  MSE -175652 -33E-05 -1E-06 2954790 152E-06 613E+08
  RRMSE 0040681 4059441 6095076 35E+278 5569021 16E+281
Appendix 3 Result of EB estimation (II) with NBR

Columns: statistic, min, Q1, median, mean, Q3, max

P(Y=0) = 10, Actual = 10-33.33, r = 100
  EB estimator 0426011 1525665 3188832 4252666 5752756 205939
  bias 0000446 05164 0878579 1315093 1721091 8704671
  MSE 0040547 0109118 0159448 0333613 0335256 4167064
  RRMSE 0041258 0100045 01356 018188 0220426 0576793
P(Y=0) = 20, Actual = 13.33-36.67, r = 100
  EB estimator 0342831 1013993 2218265 2984668 3953417 1815693
  bias 0000587 0413611 079407 1100373 1454889 7906915
  MSE 0055631 0131969 0196963 0353033 0386291 3778251
  RRMSE 0070449 015421 0205182 0262006 0352726 0788718
P(Y=0) = 30, Actual = 20-53.33, r = 100
  EB estimator 0323311 0836545 1562163 2263684 2918741 1214482
  bias 0000151 0372382 067041 0916482 122012 5950225
  MSE 0074364 0163462 0231014 0400207 0432371 5250254
  RRMSE 0102324 0214697 0299247 0361013 0474077 1192032
P(Y=0) = 40, Actual = 23.33-56.67, r = 100
  EB estimator 024882 064963 1219656 17107 2248716 930007
  bias 0000564 0293602 0549809 0757937 1007851 486688
  MSE 0 0194196 0271669 0419181 045917 3239598
  RRMSE 0 0300116 0422209 0502895 0641904 2202294
P(Y=0) = 50, Actual = 23.33-63.33, r = 100
  EB estimator 0122548 0570083 1028619 1291758 1728067 6750472
  bias 000029 0250747 0453265 0622838 0803185 4009352
  MSE 0 0235733 0306641 0456258 05091 3652167
  RRMSE 0 0410357 0585765 0712314 0841838 3240156
P(Y=0) = 60, Actual = 30-70, r = 100
  EB estimator -077338 044443 0699758 0944038 1131071 6323352
  bias 0000452 020433 0398131 0534095 0679938 3848209
  MSE 0 0254097 0330078 2619677 0539873 2354887
  RRMSE -663045 0448118 0750369 -034911 1209918 1767434
P(Y=0) = 70, Actual = 43.33-73.33, r = 100
  EB estimator -33274 0249515 0442513 0659375 0922519 9258959
  bias 0000375 0155154 0316124 0476883 0588926 8475103
  MSE 0 0235378 0402092 9500073 0876569 6051162
  RRMSE -10741 0288999 0995659 -100163 2527784 3332419
P(Y=0) = 80, Actual = 53.33-90, r = 100
  EB estimator -232889 017621 0305365 0569959 0576346 6303601
  bias 0000395 0116669 0254473 1091172 0497898 6297454
  MSE 0 0 0301527 1444250 5718409 185E+09
  RRMSE -212936 0 1104113 2205437 5656681 4151289
P(Y=0) = 90, Actual = 70-100, r = 100
  EB estimator -108767 0111208 0230315 0212247 0353129 3625557
  bias 000016 0086 0177169 0425532 0314714 1092655
  MSE 0 0 0159682 41595285 3074073 12E+11
  RRMSE -909131 0 0557622 677E+09 9311925 706E+11
Appendix 4 Result of EB estimation (II) with ZINB

Columns: statistic, min, Q1, median, mean, Q3, max

P(Y=0) = 10, Actual = 10-33.33, r = 100
  EB estimator 0267752 1500256 3195861 4280907 5833922 2220705
  bias 0000603 0485515 0882468 1315721 1750173 8704672
  MSE 0 010933 0168797 0450626 0369775 360843
  RRMSE 0 0095932 0135647 023675 0239669 5518468
P(Y=0) = 20, Actual = 13.33-36.67, r = 100
  EB estimator 105E-08 0914898 221594 3017228 401361 1815694
  bias 0000368 0383426 0780984 1105029 1496623 7906918
  MSE 0 0126202 0201463 0428006 0414597 1734815
  RRMSE 0 0142648 020709 0320663 0395479 3177943
P(Y=0) = 30, Actual = 20-53.33, r = 100
  EB estimator 0132041 0719086 1523909 2308745 3012309 1228058
  bias 0000508 0332427 0680187 0928947 1254604 6314973
  MSE 0 0156942 0277017 0716543 0590466 7469014
  RRMSE 0 0203913 0311937 0506882 0615401 3500387
P(Y=0) = 40, Actual = 23.33-56.67, r = 100
  EB estimator 105E-08 0574265 1209034 1742928 2368713 104953
  bias 0000564 0268248 0544049 0771741 1067061 4889872
  MSE 0 0181557 0284338 0549835 0498521 423089
  RRMSE 0 0270309 0405926 0606317 0766631 5394515
P(Y=0) = 50, Actual = 23.33-63.33, r = 100
  EB estimator 105E-08 0426701 1033816 133848 1906961 8018962
  bias 0000453 0224726 0454522 0661709 0900005 4414442
  MSE 0 0194818 0334706 094973 0711939 7997074
  RRMSE 0 0316402 0576343 6561235 1240175 13388294
P(Y=0) = 60, Actual = 30-70, r = 100
  EB estimator 105E-08 030085 0645848 0985327 1154975 728326
  bias 62E-05 0190886 0406245 0567657 074167 3923952
  MSE 0 0078006 0376514 0749436 0804116 3426488
  RRMSE 0 0258286 0698814 2340612 1714808 3308816
P(Y=0) = 70, Actual = 43.33-73.33, r = 100
  EB estimator 105E-08 105E-08 0341315 0677841 1 5005491
  bias 979E-05 0128017 0358257 0487174 0654423 3733981
  MSE 0 0 0255331 1501268 1132152 264456
  RRMSE 0 0 0688797 1346552 2500825 7899681
P(Y=0) = 80, Actual = 53.33-90, r = 100
  EB estimator 105E-08 105E-08 0142906 0445315 0859305 5
  bias 0000161 0083397 0272773 0392826 0557213 3532556
  MSE 0 0 0 1755486 1452962 1132741
  RRMSE 0 0 0 7335062 3311711 3786684
P(Y=0) = 90, Actual = 70-100, r = 100
  EB estimator 1E-277 105E-08 105E-08 0225165 0135512 3
  bias 0000495 0054221 0153374 027819 0350213 2736904
  MSE 0 0 0 2954908 152E-06 613E+08
  RRMSE 0 0 0 12E+278 416189 16E+281
Appendix 5 Syntax program for generating data

data b; /* generate x1 (covariate) and ei */
  input x1;
  cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate poisson-gamma with excess zero */
  do kk=1 to 30;
    set b;
    tetha = rangam(1,1);
    lambda = -log(0.1); /* desired probability of a zero value (0.1-0.9) */
    starlambda = log(lambda/tetha);
    output;
  end;
run;

proc reg;
  model starlambda = x1;
  ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;
data _null_;
  set work.betha_lr_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
run;

data d;
  do kk=1 to 30;
    set e;
    mu = exp(&Intercept + &x1*x1);
    parmlambda = mu*tetha;
    ypoi = rand('poisson', parmlambda);
    output;
  end;
run;

ods trace on; /* to take percent zero on data */
proc freq data=d;
  tables ypoi;
  ods output OneWayFreqs=work.zero;
run;
data zero;
  set zero;
  keep percent;
run;
proc transpose data=zero out=zero1;
run;
data _null_;
  set work.zero1;
  call symput('pctz', col1);
run;
data d;
  set d;
  pzero=&pctz;
  r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
Appendix 6 Syntax program EB with NBR

%macro sae_nb;
%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
  set work.betha_nb_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
  bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
  set work.duga_nb1;
  if ^(u=&x) then delete;
run;
data work.jacknb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc genmod data=work.jacknb&h;
  /* output p out=sasyi_est */
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_est_nb&h (keep=Parameter Estimate);
run;
proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;
data _null_;
  set work.betha_est_nbt&h;
  call symput('Intercept_', col1);
  call symput('x1_', col2);
  call symput('Dispersion_', col3);
run;

data work.duganb&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
  w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + &Dispersion_);
  teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
  beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
  merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5
        work.duganb6 work.duganb7 work.duganb8 work.duganb9 work.duganb10
        work.duganb11 work.duganb12 work.duganb13 work.duganb14 work.duganb15
        work.duganb16 work.duganb17 work.duganb18 work.duganb19 work.duganb20
        work.duganb21 work.duganb22 work.duganb23 work.duganb24 work.duganb25
        work.duganb26 work.duganb27 work.duganb28 work.duganb29 work.duganb30;
  by kk;
run;

data work.mse_nb_j;
  set work.mse_nb_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
  ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
  merge work.d work.mse_nb_j;
  keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb BASE=work.hasilnb1 appendver=v6;
run;

%END;
%mend;

%sae_nb;
Appendix 7 Syntax program EB with ZINB

%macro sae_zinb;

%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

proc countreg data=a;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
  set work.pe_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Inf_Intercept', col3);
  call symput('Inf_x1', col4);
  call symput('_Alpha', col5);
run;

data work.duga;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
  bias_b=abs(teta_hat_bayes-parmlamdha);
run;

proc append data=work.duga base=work.duga1;
run;

%do h=1 %to 30;

data work.d;
  set work.duga1;
  if ^(u=&x) then delete;
run;
data work.jackzinb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
  set work.betha_est_ZINBt&h;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Inf_Intercept', col3);
  call symput('Inf_x1', col4);
  call symput('_Alpha', col5);
run;

data work.dugaZINB&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept + &x1*x1);
  /* mu_hat_b_&h= &b_o- + &b_1- x1 */
  w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + (&_Alpha));
  teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  /* g1_&h =((mu_hat_b_&h2&alpha_)2)(&alpha_+y_i)((mu_hat_b_&h2&alpha_)+mu_hat_b_&h)2 */
  g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
  /* g1_&h =(A2)(&k- + y_i)( a +mu_hat_b)2 */
  beda_g_&h=g1_&h-g1;
run;

data work.mse_ZINB_j;
  merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5
        work.dugaZINB6 work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10
        work.dugaZINB11 work.dugaZINB12 work.dugaZINB13 work.dugaZINB14 work.dugaZINB15
        work.dugaZINB16 work.dugaZINB17 work.dugaZINB18 work.dugaZINB19 work.dugaZINB20
        work.dugaZINB21 work.dugaZINB22 work.dugaZINB23 work.dugaZINB24 work.dugaZINB25
        work.dugaZINB26 work.dugaZINB27 work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
  by kk;
run;

data work.mse_ZINB_j;
  set work.mse_ZINB_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
  merge work.d work.mse_ZINB_j;
  keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;
ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB BASE=work.hasilZINB1;
run;

%END;
%mend;

%sae_zinb;
Lawless (1987) elaborated the mixture model parameterization of the negative binomial, providing formulas for its log-likelihood, mean, variance, and moments. Later, Breslow (1990) cited Lawless' work, and from its inception to the late 1980s the negative binomial regression model has been construed as a mixture model that is useful for accommodating otherwise over-dispersed Poisson data (Hardin & Hilbe 2007). The negative binomial distribution function is written as
$$g(y \mid x) = \frac{\Gamma(y + k^{-1})}{\Gamma(k^{-1})\,\Gamma(y + 1)} \left(\frac{k\mu}{1 + k\mu}\right)^{y} \left(\frac{1}{1 + k\mu}\right)^{k^{-1}}$$
where $y = 0, 1, 2, \ldots$; $k$ and $\mu$ are the negative binomial parameters, with $E(y) = \mu$ and $var(y) = \mu + k\mu^2$. The parameter $k$ is called the dispersion parameter; it indicates that the data are over-dispersed.
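As a numerical check on this parameterization, the sketch below evaluates the negative binomial probability function with mean $\mu$ and dispersion $k$ and confirms that its mean and variance agree with $\mu$ and $\mu + k\mu^2$. The function names and the truncation point of the sums are choices made for the example.

```python
import math


def nb_pmf(y, mu, k):
    """Negative binomial probability with mean mu and dispersion k (k > 0)."""
    inv_k = 1.0 / k
    log_p = (math.lgamma(y + inv_k) - math.lgamma(inv_k) - math.lgamma(y + 1)
             + y * math.log(k * mu / (1.0 + k * mu))
             - inv_k * math.log(1.0 + k * mu))
    return math.exp(log_p)


def nb_moments(mu, k, upper=500):
    """Mean and variance obtained by summing the probabilities up to `upper`."""
    probs = [nb_pmf(y, mu, k) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


mean, var = nb_moments(mu=2.0, k=0.5)
print(mean, var)   # close to mu = 2 and mu + k * mu**2 = 4
```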
Over-dispersion in Count Data
Count data modeled by Poisson regression are over-dispersed if the variance is larger than the mean, that is, if the variance implied by the Poisson model is smaller than the variance observed. This phenomenon is written as
$$Var(y_i) > E(y_i)$$
(McCullagh & Nelder 1989).
Zero-Inflated Models
Zero-inflated models consider two distinct sources of zero outcomes. One source is generated from individuals who do not enter the counting process; the other from those who do enter the counting process but result in a zero outcome (Hardin & Hilbe 2007).
Lambert (1992) first described this type of mixture model in the context of process control in manufacturing. It has since been used in many applications and is now discussed in nearly every book or article dealing with count response models.
For the zero-inflated model, the probability of observing a zero outcome equals the probability that an individual is in the always-zero group plus the probability that the individual is not in that group times the probability that the counting process produces a zero. If $B(0)$ is the probability that the binary process results in a zero outcome and $\Pr(0)$ is the probability that the counting process produces a zero outcome, the probability of a zero outcome for the system is given by (Hardin & Hilbe 2007)
$$\Pr(y = 0) = B(0) + [1 - B(0)]\Pr(0)$$
The probability of a nonzero count is
$$\Pr(y = k;\ k > 0) = [1 - B(0)]\Pr(k)$$
This model produces two groups of parameters. One is the group of zero-inflation parameters, which show whether the covariates contribute significantly to having a zero outcome. The other is the group of negative binomial parameters, which model the response with the covariates.
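The two probabilities above translate directly into code. The sketch below wraps an arbitrary count distribution in a zero-inflation layer; a Poisson distribution is used as the counting process purely for illustration, and the values of `b0` and `lam` are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability, used here only as an example counting process."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def zero_inflated_pmf(y, b0, count_pmf):
    """Pr(y) when the binary process yields an always-zero with probability b0."""
    if y == 0:
        return b0 + (1.0 - b0) * count_pmf(0)
    return (1.0 - b0) * count_pmf(y)


b0, lam = 0.3, 1.0          # hypothetical values
p_zero = zero_inflated_pmf(0, b0, lambda y: poisson_pmf(y, lam))
print(p_zero, poisson_pmf(0, lam))   # the inflated zero probability exceeds the Poisson one
```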
Zero-Inflated Negative Binomial
There are many kinds of zero-inflated models; each has advantages and disadvantages and suits a different type of data. The zero-inflated negative binomial (ZINB) model is one of them. It is used for data that are both over-dispersed and have excess zeros. As a result, the parameter estimates include the parameter $k$, which indicates that over-dispersion occurs in the data, just like the dispersion parameter in negative binomial regression.
The probability distribution of this model is as follows:
$$P(Y_i = y_i \mid x_i) = \begin{cases} \varphi_i + (1 - \varphi_i)\, g(0 \mid x_i), & y_i = 0 \\ (1 - \varphi_i)\, g(y_i \mid x_i), & y_i > 0 \end{cases}$$
where $\varphi_i$ is a function of the zero-inflated covariates $z_i$ and of $\gamma$, the vector of zero-inflated coefficients to be estimated. Meanwhile, $g(y_i \mid x_i)$ is the negative binomial probability distribution, written as
$$g(y_i \mid x_i) = \frac{\Gamma(y_i + k^{-1})}{\Gamma(k^{-1})\,\Gamma(y_i + 1)} \left(\frac{k\mu_i}{1 + k\mu_i}\right)^{y_i} \left(\frac{1}{1 + k\mu_i}\right)^{k^{-1}}$$
The mean and variance of ZINB are
$$E(y_i \mid x_i) = \mu_i (1 - \varphi_i)$$
$$V(y_i \mid x_i) = \mu_i (1 - \varphi_i)\,(1 + \mu_i(\varphi_i + k))$$
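The ZINB mean and variance can be checked numerically by summing over the mixture probabilities. The sketch below assumes a negative binomial with mean `mu` and dispersion `k` (variance `mu + k * mu**2`) and a constant inflation probability `phi`; the parameter values are hypothetical.

```python
import math


def nb_pmf(y, mu, k):
    """Negative binomial probability with mean mu and dispersion k."""
    inv_k = 1.0 / k
    return math.exp(math.lgamma(y + inv_k) - math.lgamma(inv_k) - math.lgamma(y + 1)
                    + y * math.log(k * mu / (1.0 + k * mu))
                    - inv_k * math.log(1.0 + k * mu))


def zinb_pmf(y, mu, k, phi):
    """Zero-inflated negative binomial probability with inflation probability phi."""
    base = (1.0 - phi) * nb_pmf(y, mu, k)
    return phi + base if y == 0 else base


def zinb_moments(mu, k, phi, upper=500):
    """Mean and variance by direct summation up to `upper`."""
    probs = [zinb_pmf(y, mu, k, phi) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


mu, k, phi = 2.0, 0.5, 0.3          # hypothetical parameter values
mean, var = zinb_moments(mu, k, phi)
print(mean, mu * (1 - phi))                          # both close to 1.4
print(var, mu * (1 - phi) * (1 + mu * (phi + k)))    # both close to 3.64
```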
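Jackknife Method of Estimating MSE($\hat{\theta}_i^{EB}$)
The jackknife is one of the general methods used in surveys because of its simple concept (Jiang, Lahiri and Wan 2002). The method was introduced by Tukey (1958) and developed into a method that can correct the bias of an estimator by removing the $l$-th observation, for $l = 1, \ldots, m$, and re-estimating the parameters.
Following Rao (2003), the jackknife steps to estimate MSE($\hat{\theta}_i^{EB}$) are:
1. Assume that $\hat{\theta}_i^{EB} = k_i(y_i, \hat{\alpha}, \hat{\nu})$ and $\hat{\theta}_{i,-l}^{EB} = k_i(y_i, \hat{\alpha}_{-l}, \hat{\nu}_{-l})$, where $\hat{\alpha}$ and $\hat{\nu}$ denote the estimated prior parameters, then calculate
$$\hat{M}_{2i} = \frac{m-1}{m} \sum_{l=1}^{m} \left(\hat{\theta}_{i,-l}^{EB} - \hat{\theta}_i^{EB}\right)^2$$
2. Calculate the delete-$l$ estimators $\hat{\alpha}_{-l}$ and $\hat{\nu}_{-l}$, then calculate
$$\hat{M}_{1i} = g_{1i}(\hat{\alpha}, \hat{\nu}, y_i) - \frac{m-1}{m} \sum_{l=1}^{m} \left[g_{1i}(\hat{\alpha}_{-l}, \hat{\nu}_{-l}, y_i) - g_{1i}(\hat{\alpha}, \hat{\nu}, y_i)\right]$$
Here $g_{1i}(\hat{\sigma}_v^2)$ is the variance estimator of the posterior distribution, which is used to measure the variability associated with $\hat{\theta}_i$. Using $g_{1i}(\hat{\sigma}_v^2)$ alone leads to severe underestimation of $MSE_J(\hat{\theta}_i^{EB})$ because of the estimation of the prior parameters. Therefore the estimator $\hat{M}_{1i}$ corrects the bias of $g_{1i}(\hat{\sigma}_v^2)$.
3. Calculate the jackknife estimator of MSE($\hat{\theta}_i^{EB}$) as
$$MSE_J(\hat{\theta}_i^{EB}) = \hat{M}_{1i} + \hat{M}_{2i}$$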
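The three jackknife steps can be sketched for a deliberately simplified model. The sketch assumes an intercept-only Poisson-gamma setting in which the prior mean is estimated by the sample mean and the gamma shape is treated as known; the functions `fit_mean`, `eb`, and `g1` are illustrative stand-ins, not the regression-based estimators of the SAS programs in the appendices.

```python
def fit_mean(counts):
    """Toy estimate of the prior mean: the sample mean of the area counts."""
    return sum(counts) / len(counts)


def eb(y_i, mu, shape):
    """Posterior mean under a gamma prior with mean mu and the given shape."""
    w = mu / (mu + shape)
    return w * y_i + (1.0 - w) * mu


def g1(y_i, mu, shape):
    """Posterior variance under the same gamma prior."""
    return (y_i + shape) * mu ** 2 / (mu + shape) ** 2


def jackknife_mse(counts, shape):
    """Jackknife MSE estimate M1 + M2 for every area (steps 1 to 3)."""
    m = len(counts)
    mu_full = fit_mean(counts)
    # delete-l estimates of the prior mean
    mu_del = [fit_mean(counts[:l] + counts[l + 1:]) for l in range(m)]
    result = []
    for y_i in counts:
        theta = eb(y_i, mu_full, shape)
        m2 = (m - 1) / m * sum((eb(y_i, mu_l, shape) - theta) ** 2 for mu_l in mu_del)
        g_full = g1(y_i, mu_full, shape)
        m1 = g_full - (m - 1) / m * sum(g1(y_i, mu_l, shape) - g_full for mu_l in mu_del)
        result.append(m1 + m2)
    return result


print(jackknife_mse([0, 0, 1, 3, 5, 2], shape=1.0))
```

The bias-correction term in step 2 is subtracted from the plug-in posterior variance, so nothing in the construction forces the sum to stay positive.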
METHODOLOGY
DataThis research assumed that the available
auxiliary data is on area level so this research used basic area level model The data were simulated with 30 small areas and one covariate Every batch generated different conditions of excess-zeros data start from 01 until 09 probability of zero in small area This research assumed structure of relation between respond and covariate was linear
MethodsThe following steps in generating data
using SAS 91 were used1 Fix the value of
iX for the- i th area
2 Define the expected probability of zero in each small area ))0(( iYP then
calculate ))0(log( ii YPLambda
3 Generate )11(~ Gammai4 Calculate )log(
iiLambda 5 Fit linear regression between and
iX to
obtain0 and
16 Calculate )`exp(X= ii 7 Calculate
iiparmlambda 8 Generate )(~ parmlambdaPoissonyi
Moreover in analyzing data the following steps were applied 1 Generate the negative binomial regression
with genmod procedure in SAS 91 and Zero-Inflated Negative binomial Regression with countreg procedure in SAS 92
2 Estimate the prior parameter which are and
3 Estimate using EB method4 Calculate MSE for indirect estimation5 Calculate RRMSE (Root Relative Mean
Square Error)
i
ii
MSERRMSE
ˆ)ˆ(
)ˆ(
RESULT AND DISCUSSION
Estimation of Prior Parameter is Based on EB Method with Negative Binomial
RegressionIn case of non-excess-zero data the
estimator produced small and consistent MSE Meanwhile if the number of excess-zero isapproximately 30 or more with expected probability of zero 06 the performance of estimates tends to be unreliable As a result EB estimation produced negative values
RRMSE of the estimator increasessimultaneously along with the increase of number of zero in the data Furthermore if thedata contain excess zero at least 30 theestimator is unreliable
Table 1 MSE and RRMSE of EB Estimator with NBR
Probability of zero
Mean of MSE
Median of MSE
Mean of RRMSE
Median of RRMSE
01 033 016 018 01302 035 020 026 02003 040 023 036 03004 042 027 050 04205 045 031 072 05906 -12875 033 -038 08107 253671 040 -1216 13508 -584495 030 30946 21109 39135606 016 116E+10 664
Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
Probability of zero
Mean of MSE
Median of MSE
Mean of RRMSE
Median of RRMSE
01 033 016 018 01302 035 020 026 02003 040 023 036 03004 042 027 050 04205 046 031 071 05806 26197 033 -035 07507 950007 040 -1002 09908 1444250 030 22054 11009 41595285 016 677E+09 056
5
Table 1 show that the iterative process produced unexpected negative values of MSEThe simplest way to solve this problem is tochange the negative value to zero MSE (II) and RRMSE (II) in table 2 are the result of MSE and RRMSE after the negative value of MSE has been changed to zero
When data have expected probability of zero by 06 to 09 mean of MSE (II) increases drastically Similarly mean of RRMSE (II) increases sharply when data have 08 to 09 expected probability of zero However when data have 06 to 07 expected probability of zero the mean of RRMSE (II) is negative due to the negative value of EB estimates
Estimation of Prior Parameter is Based on EB Method with Zero-Inflated
Negative Binomial RegressionThe EB estimates are similar to the
estimates produced by NBR method although they are slightly outperformed NBR method when the data only contain small number of zeros In particular as shown by table 3 if data have expected probability of zero by 01 to 05 ZINB produces bigger MSE for EB estimator than which NBR produces
Whereas if data have expected probability of zero by 06 to 07 ZINB gives better estimates The estimates were also unbiased as it covers parameter values adequately However ZINB begins to produce inconsistent estimates if data have expected probability of zero by 08 or more due to enormous MSE
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
Mean of MSE (II) with ZINB is biggerthan the mean of MSE with ZINB That is because when negative value of MSE changed to zero it doesnrsquot have reduction factor in the mean calculation
Comparison of EB estimator withNegative Binomial Regression and EB
estimator with ZINBEB estimates given by both NBR and
ZINB methods are similar for data with small numbers of zero However ZINB method produces bigger MSE than NBR do as long as expected probability of zero in data does not exceed 06 thresholds
But ZINB method performs better if data have expected probability of zero by 06 to 07 In this case EB estimates given by NBR method are unstable and inconsistent due to estimatesrsquo negative value and huge MSE that
can be thousand times larger than theiracceptable value On the other hand EB estimator with ZINB works well it givesunbiased estimates and its MSE values are more stable than EB estimates with NBR
Both methods would have performed poorly if data had expected probability of zero by 08 or more EB estimators with both methods were inconsistent as a result of very huge MSE values they produced
Table 3 MSE and RRMSE of EB Estimator with ZINB
Probability of zero
Mean of MSE
Median ofMSE
Mean ofRRMSE
Median of RRMSE
01 045 017 024 01402 043 020 033 02103 071 028 052 03204 054 028 0632 04205 086 033 7322807 06606 061 038 29817 10307 058 025 218119 19408 -128 -14E-07 162697 37509 2954790 -1E-06 35E+278 609508
Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB
Probability of zero
Mean of MSE
Median of MSE
Mean of RRMSE
Median of RRMSE
01 045 017 024 01402 0436 020 0324 02103 072 028 051 031104 055 028 061 04105 095 033 6561235 05806 075 038 23406 07007 150 025 134655 06908 175 0 733506 009 2954908 0 12E+278 0
CONCLUSION
Excess-zero in data highly influenced the result of EB estimation Conventional method such as negative binomial regression in prior estimation has produced unbiased and unreliable EB estimator for data with expected probability of zero by 06 This is shown bybig number of MSE and negative value of estimator
Meanwhile EB estimation by ZINB method produced more reliable estimator even when the data have expected probability of zero by 06 to 07
The ZINB has also provided a reliable estimator for data with less than 5333 of zeros This means that performance of ZINB
6
declines when the data have expected probability of zero by 08 or more As shown by the big MSE and inconsistent estimator
RECOMMENDATION
This research is based on many assumptions and is subject to several limitations. Better results can be expected if these assumptions and limitations are relaxed. Some recommendations for further research are:
1. The data-generating process in this research does not reflect the real sampling process. A generating process similar to the real sampling process might give better results because it would be closer to real applications.
2. It would be more interesting to run an experiment with a larger number of areas, since the number of areas influences data modeling.
3. Restricted Maximum Likelihood may be applied when estimating the prior parameters with ZINB and NBR in order to solve the problem of negative MSE values.
4. Theoretical research on ZINB and the Empirical Bayes estimator is important for understanding the behavior of the ZINB parameter estimates in the Empirical Bayes setting.
REFERENCES
Erdman D, L Jackson, A Sinko. 2008. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. SAS Global Forum 2008: 322-2008. http://www2.sas.com/proceedings/forum2008/322-2008.pdf [25 August 2008]
Famoye F, KP Singh. 2006. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Journal of Data Science 4:117-130.
Hardin JW, JM Hilbe. 2007. Generalized Linear Models and Extensions. Texas: A Stata Press Publication.
Kurnia A, KA Notodiputro. 2006. Penerapan Metode Jackknife dalam Pendugaan Area Kecil. Forum Statistika dan Komputasi, April 2006, p. 12-15.
Kismiantini. 2007. Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Thesis]. Bogor: Institut Pertanian Bogor, Fakultas Matematika dan Pengetahuan Alam.
McCullagh P, JA Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.
Ramsini B, et al. 2001. Uninsured Estimates by County: A Review of Options and Issues. http://www.odh.ohio.gov/Data/OFHSurv/ofhsrfq7.pdf [24 April 2008]
Rao JNK. 2003. Small Area Estimation. New York: John Wiley & Sons.
Wakefield J. 2006. Disease mapping and spatial regression with count data. http://www.bepress.com/uwbiostat/paper286.pdf [24 April 2008]
Appendix 1 Result of EB estimation with NBR
P(Y=0)  Actual       r    Statistic     min        Q1         median     mean       Q3         max
10      10-33.33     100  EB estimator  0.426011   1525665    3188832    4252666    5752756    205939
                          bias          0.000446   0.5164     0.878579   1315093    1721091    8704671
                          MSE           0.040547   0.109118   0.159448   0.333613   0.335256   4167064
                          RRMSE         0.041258   0.100045   0.1356     0.18188    0.220426   0.576793
20      13.33-36.67  100  EB estimator  0.342831   1013993    2218265    2984668    3953417    1815693
                          bias          0.000587   0.413611   0.79407    1100373    1454889    7906915
                          MSE           0.055631   0.131969   0.196963   0.353033   0.386291   3778251
                          RRMSE         0.070449   0.15421    0.205182   0.262006   0.352726   0.788718
30      20-53.33     100  EB estimator  0.323311   0.836545   1562163    2263684    2918741    1214482
                          bias          0.000151   0.372382   0.67041    0.916482   122012     5950225
                          MSE           0.074364   0.163462   0.231014   0.400207   0.432371   5250254
                          RRMSE         0.102324   0.214697   0.299247   0.361013   0.474077   1192032
40      23.33-56.67  100  EB estimator  0.24882    0.64963    1219656    17107      2248716    930007
                          bias          0.000564   0.293602   0.549809   0.757937   1007851    486688
                          MSE           -100569    0.194196   0.271669   0.41875    0.45917    3239598
                          RRMSE         0.123605   0.300339   0.422426   0.503566   0.642418   2202294
50      23.33-63.33  100  EB estimator  0.122548   0.570083   1028619    1291758    1728067    6750472
                          bias          0.00029    0.250747   0.453265   0.622838   0.803185   4009352
                          MSE           -237643    0.235733   0.306641   0.452955   0.5091     3652167
                          RRMSE         0.038956   0.412708   0.588924   0.717336   0.844735   3240156
60      30-70        100  EB estimator  -0.77338   0.44443    0.699758   0.944038   1131071    6323352
                          bias          0.000452   0.20433    0.398131   0.534095   0.679938   3848209
                          MSE           -749011    0.254097   0.330078   -12875     0.539873   2354887
                          RRMSE         -663045    0.51763    0.813734   -0.38057   1287528    1767434
70      43.33-73.33  100  EB estimator  -33274     0.249515   0.442513   0.659375   0.922519   9258959
                          bias          0.000375   0.155154   0.316124   0.476883   0.588926   8475103
                          MSE           -7513075   0.235378   0.402092   2536714    0.876569   6051162
                          RRMSE         -10741     0.704796   1355566    -121606    3040291    3332419
80      53.33-90     100  EB estimator  -232889    0.17621    0.305365   0.569959   0.576346   6303601
                          bias          0.000395   0.116669   0.254473   1091172    0.497898   6297454
                          MSE           -6E+09     -0.16583   0.301527   -584495    5718409    1.85E+09
                          RRMSE         -212936    0.927338   2115163    3094627    1359703    4151289
90      70-100       100  EB estimator  -108767    0.111208   0.230315   0.212247   0.353129   3625557
                          bias          0.00016    0.086      0.177169   0.425532   0.314714   1092655
                          MSE           -3.8E+09   -130817    0.159682   39135606   3074073    1.2E+11
                          RRMSE         -909131    1647188    6639631    1.16E+10   1585472    7.06E+11
Appendix 2 Result of EB estimation with ZINB
P(Y=0)  Actual       r    Statistic     min        Q1         median     mean       Q3         max
10      10-33.33     100  EB estimator  0.267752   1500256    3195861    4280907    5833922    2220705
                          bias          0.000603   0.485515   0.882468   1315721    1750173    8704672
                          MSE           -0.53954   0.10933    0.168797   0.449506   0.369775   360843
                          RRMSE         0.022947   0.096443   0.136424   0.238099   0.241955   5518468
20      13.33-36.67  100  EB estimator  1.05E-08   0.914898   221594     3017228    401361     1815694
                          bias          0.000368   0.383426   0.780984   1105029    1496623    7906918
                          MSE           -0.7309    0.126202   0.201463   0.425844   0.414597   1734815
                          RRMSE         0.021807   0.144983   0.210692   0.326097   0.401786   3177943
30      20-53.33     100  EB estimator  0.132041   0.719086   1523909    2308745    3012309    1228058
                          bias          0.000508   0.332427   0.680187   0.928947   1254604    6314973
                          MSE           -229891    0.156942   0.277017   0.707983   0.590466   7469014
                          RRMSE         0.023998   0.210095   0.317195   0.519524   0.618802   3500387
40      23.33-56.67  100  EB estimator  1.05E-08   0.574265   1209034    1742928    2368713    104953
                          bias          0.000564   0.268248   0.544049   0.771741   1067061    4889872
                          MSE           -125713    0.181557   0.284338   0.540615   0.498521   423089
                          RRMSE         0.054916   0.28362    0.420396   0.630776   0.778033   5394515
50      23.33-63.33  100  EB estimator  1.05E-08   0.426701   1033816    133848     1906961    8018962
                          bias          0.000453   0.224726   0.454522   0.661709   0.900005   4414442
                          MSE           -181856    0.194818   0.334706   0.859252   0.711939   7997074
                          RRMSE         0.026206   0.387294   0.662251   7322807    1312302    13388294
60      30-70        100  EB estimator  1.05E-08   0.30085    0.645848   0.985327   1154975    728326
                          bias          6.2E-05    0.190886   0.406245   0.567657   0.74167    3923952
                          MSE           -34589     0.078006   0.376514   0.60793    0.804116   3426488
                          RRMSE         0.00461    0.502807   1033578    2981671    2012552    3308816
70      43.33-73.33  100  EB estimator  1.05E-08   1.05E-08   0.341315   0.677841   1          5005491
                          bias          9.79E-05   0.128017   0.358257   0.487174   0.654423   3733981
                          MSE           -142213    -0.01433   0.255331   0.584152   1132152    264456
                          RRMSE         0.064209   0.847956   1942286    2181192    4589042    7899681
80      53.33-90     100  EB estimator  1.05E-08   1.05E-08   0.142906   0.445315   0.859305   5
                          bias          0.000161   0.083397   0.272773   0.392826   0.557213   3532556
                          MSE           -10651     -5.6E-05   -1.4E-07   -127819    1452962    1132741
                          RRMSE         0.063244   1475413    3754705    162697     9221163    3786684
90      70-100       100  EB estimator  1E-277     1.05E-08   1.05E-08   0.225165   0.135512   3
                          bias          0.000495   0.054221   0.153374   0.27819    0.350213   2736904
                          MSE           -175652    -3.3E-05   -1E-06     2954790    1.52E-06   6.13E+08
                          RRMSE         0.040681   4059441    6095076    3.5E+278   5569021    1.6E+281
Appendix 3 Result of EB estimation (II) with NBR
P(Y=0)  Actual       r    Statistic     min        Q1         median     mean       Q3         max
10      10-33.33     100  EB estimator  0.426011   1525665    3188832    4252666    5752756    205939
                          bias          0.000446   0.5164     0.878579   1315093    1721091    8704671
                          MSE           0.040547   0.109118   0.159448   0.333613   0.335256   4167064
                          RRMSE         0.041258   0.100045   0.1356     0.18188    0.220426   0.576793
20      13.33-36.67  100  EB estimator  0.342831   1013993    2218265    2984668    3953417    1815693
                          bias          0.000587   0.413611   0.79407    1100373    1454889    7906915
                          MSE           0.055631   0.131969   0.196963   0.353033   0.386291   3778251
                          RRMSE         0.070449   0.15421    0.205182   0.262006   0.352726   0.788718
30      20-53.33     100  EB estimator  0.323311   0.836545   1562163    2263684    2918741    1214482
                          bias          0.000151   0.372382   0.67041    0.916482   122012     5950225
                          MSE           0.074364   0.163462   0.231014   0.400207   0.432371   5250254
                          RRMSE         0.102324   0.214697   0.299247   0.361013   0.474077   1192032
40      23.33-56.67  100  EB estimator  0.24882    0.64963    1219656    17107      2248716    930007
                          bias          0.000564   0.293602   0.549809   0.757937   1007851    486688
                          MSE           0          0.194196   0.271669   0.419181   0.45917    3239598
                          RRMSE         0          0.300116   0.422209   0.502895   0.641904   2202294
50      23.33-63.33  100  EB estimator  0.122548   0.570083   1028619    1291758    1728067    6750472
                          bias          0.00029    0.250747   0.453265   0.622838   0.803185   4009352
                          MSE           0          0.235733   0.306641   0.456258   0.5091     3652167
                          RRMSE         0          0.410357   0.585765   0.712314   0.841838   3240156
60      30-70        100  EB estimator  -0.77338   0.44443    0.699758   0.944038   1131071    6323352
                          bias          0.000452   0.20433    0.398131   0.534095   0.679938   3848209
                          MSE           0          0.254097   0.330078   2619677    0.539873   2354887
                          RRMSE         -663045    0.448118   0.750369   -0.34911   1209918    1767434
70      43.33-73.33  100  EB estimator  -33274     0.249515   0.442513   0.659375   0.922519   9258959
                          bias          0.000375   0.155154   0.316124   0.476883   0.588926   8475103
                          MSE           0          0.235378   0.402092   9500073    0.876569   6051162
                          RRMSE         -10741     0.288999   0.995659   -100163    2527784    3332419
80      53.33-90     100  EB estimator  -232889    0.17621    0.305365   0.569959   0.576346   6303601
                          bias          0.000395   0.116669   0.254473   1091172    0.497898   6297454
                          MSE           0          0          0.301527   1444250    5718409    1.85E+09
                          RRMSE         -212936    0          1104113    2205437    5656681    4151289
90      70-100       100  EB estimator  -108767    0.111208   0.230315   0.212247   0.353129   3625557
                          bias          0.00016    0.086      0.177169   0.425532   0.314714   1092655
                          MSE           0          0          0.159682   41595285   3074073    1.2E+11
                          RRMSE         -909131    0          0.557622   6.77E+09   9311925    7.06E+11
Appendix 4 Result of EB estimation (II) with ZINB
P(Y=0)  Actual       r    Statistic     min        Q1         median     mean       Q3         max
10      10-33.33     100  EB estimator  0.267752   1500256    3195861    4280907    5833922    2220705
                          bias          0.000603   0.485515   0.882468   1315721    1750173    8704672
                          MSE           0          0.10933    0.168797   0.450626   0.369775   360843
                          RRMSE         0          0.095932   0.135647   0.23675    0.239669   5518468
20      13.33-36.67  100  EB estimator  1.05E-08   0.914898   221594     3017228    401361     1815694
                          bias          0.000368   0.383426   0.780984   1105029    1496623    7906918
                          MSE           0          0.126202   0.201463   0.428006   0.414597   1734815
                          RRMSE         0          0.142648   0.20709    0.320663   0.395479   3177943
30      20-53.33     100  EB estimator  0.132041   0.719086   1523909    2308745    3012309    1228058
                          bias          0.000508   0.332427   0.680187   0.928947   1254604    6314973
                          MSE           0          0.156942   0.277017   0.716543   0.590466   7469014
                          RRMSE         0          0.203913   0.311937   0.506882   0.615401   3500387
40      23.33-56.67  100  EB estimator  1.05E-08   0.574265   1209034    1742928    2368713    104953
                          bias          0.000564   0.268248   0.544049   0.771741   1067061    4889872
                          MSE           0          0.181557   0.284338   0.549835   0.498521   423089
                          RRMSE         0          0.270309   0.405926   0.606317   0.766631   5394515
50      23.33-63.33  100  EB estimator  1.05E-08   0.426701   1033816    133848     1906961    8018962
                          bias          0.000453   0.224726   0.454522   0.661709   0.900005   4414442
                          MSE           0          0.194818   0.334706   0.94973    0.711939   7997074
                          RRMSE         0          0.316402   0.576343   6561235    1240175    13388294
60      30-70        100  EB estimator  1.05E-08   0.30085    0.645848   0.985327   1154975    728326
                          bias          6.2E-05    0.190886   0.406245   0.567657   0.74167    3923952
                          MSE           0          0.078006   0.376514   0.749436   0.804116   3426488
                          RRMSE         0          0.258286   0.698814   2340612    1714808    3308816
70      43.33-73.33  100  EB estimator  1.05E-08   1.05E-08   0.341315   0.677841   1          5005491
                          bias          9.79E-05   0.128017   0.358257   0.487174   0.654423   3733981
                          MSE           0          0          0.255331   1501268    1132152    264456
                          RRMSE         0          0          0.688797   1346552    2500825    7899681
80      53.33-90     100  EB estimator  1.05E-08   1.05E-08   0.142906   0.445315   0.859305   5
                          bias          0.000161   0.083397   0.272773   0.392826   0.557213   3532556
                          MSE           0          0          0          1755486    1452962    1132741
                          RRMSE         0          0          0          7335062    3311711    3786684
90      70-100       100  EB estimator  1E-277     1.05E-08   1.05E-08   0.225165   0.135512   3
                          bias          0.000495   0.054221   0.153374   0.27819    0.350213   2736904
                          MSE           0          0          0          2954908    1.52E-06   6.13E+08
                          RRMSE         0          0          0          1.2E+278   416189     1.6E+281
Appendix 5 Syntax program for generating data
data b; /* generate x1 (covariate) and ei */
  input x1;
  cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate poisson-gamma with excess zero */
  do kk=1 to 30;
    set b;
    tetha = rangam(1,1);
    lambda = -log(0.1); /* desired probability of zero (0.1-0.9) */
    starlambda = log(lambda/tetha);
    output;
  end;
run;

proc reg;
  model starlambda = x1;
  ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;

data _null_;
  set work.betha_lr_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
run;

data d;
  do kk=1 to 30;
    set e;
    mu = exp(&Intercept + &x1*x1);
    parmlambda = mu*tetha;
    ypoi = rand('poisson', parmlambda);
    output;
  end;
run;

ods trace on;
/* to take percent zero on data */
proc freq data=d;
  tables ypoi;
  ods output OneWayFreqs=work.zero;
run;
data zero;
  set zero;
  keep percent;
run;
proc transpose data=zero out=zero1;
run;
data _null_;
  set work.zero1;
  call symput('pctz', col1);
run;
data d;
  set d;
  pzero=&pctz;
  r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
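The percent-of-zeros step of the macro (the `proc freq` block) can be illustrated with a minimal Python sketch. The function name `percent_zero` is hypothetical and is not part of the SAS program; it only shows why, with 30 areas, the actual percentages of zeros reported in Appendices 1 to 4 are multiples of 100/30.

```python
def percent_zero(counts):
    """Return the percentage of zero counts in one simulated batch."""
    if not counts:
        raise ValueError("counts must not be empty")
    zeros = sum(1 for c in counts if c == 0)
    return 100.0 * zeros / len(counts)


# Hypothetical batch of 30 areas, 16 of which have a zero count.
batch = [0] * 16 + [1] * 14
print(round(percent_zero(batch), 2))
```

With 16 zeros out of 30 areas the result is 1600/30, which rounds to 53.33.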
Appendix 6 Syntax program EB with NBR
%macro sae_nb;
%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
  set work.betha_nb_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
  bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
  set work.duga_nb1;
  if ^(u=&x) then delete;
run;
data work.jacknb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc genmod data=work.jacknb&h;
  /* output p out=sas.yi_est; */
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_est_nb&h (keep=Parameter Estimate);
run;
proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;
data _null_;
  set work.betha_est_nbt&h;
  call symput('Intercept_', col1);
  call symput('x1_', col2);
  call symput('Dispersion_', col3);
run;

data work.duganb&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
  w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + &Dispersion_);
  teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
  beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
  merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5 work.duganb6
        work.duganb7 work.duganb8 work.duganb9 work.duganb10 work.duganb11 work.duganb12
        work.duganb13 work.duganb14 work.duganb15 work.duganb16 work.duganb17
        work.duganb18 work.duganb19 work.duganb20 work.duganb21 work.duganb22
        work.duganb23 work.duganb24 work.duganb25 work.duganb26 work.duganb27
        work.duganb28 work.duganb29 work.duganb30;
  by kk;
run;

data work.mse_nb_j;
  set work.mse_nb_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
  ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
  merge work.d work.mse_nb_j;
  keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb base=work.hasilnb1 appendver=v6;
run;

%end;
%mend;

%sae_nb;
Appendix 7 Syntax program EB with ZINB
%macro sae_zinb;

%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

proc countreg data=a;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
  set work.pe_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Inf_Intercept', col3);
  call symput('Inf_x1', col4);
  call symput('_Alpha', col5);
run;

data work.duga;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
  bias_b=abs(teta_hat_bayes-parmlamdha);
run;

proc append data=work.duga base=work.duga1;
run;

%do h=1 %to 30;

data work.d;
  set work.duga1;
  if ^(u=&x) then delete;
run;
data work.jackzinb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
  set work.betha_est_ZINBt&h;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Inf_Intercept', col3);
  call symput('Inf_x1', col4);
  call symput('_Alpha', col5);
run;

data work.dugaZINB&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept + &x1*x1);
  /* mu_hat_b_&h= &b_o- + &b_1- x1 */
  w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + (&_Alpha));
  teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  /* g1_&h =((mu_hat_b_&h 2 &alpha_)2)(&alpha_+y_i)((mu_hat_b_&h 2 &alpha_)+mu_hat_b_&h)2 */
  g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
  /* g1_&h =(A2)(&k- + y_i)( a +mu_hat_b)2 */
  beda_g_&h=g1_&h-g1;
run;

data work.mse_ZINB_j;
  merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5 work.dugaZINB6
        work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10 work.dugaZINB11 work.dugaZINB12
        work.dugaZINB13 work.dugaZINB14 work.dugaZINB15 work.dugaZINB16 work.dugaZINB17
        work.dugaZINB18 work.dugaZINB19 work.dugaZINB20 work.dugaZINB21 work.dugaZINB22
        work.dugaZINB23 work.dugaZINB24 work.dugaZINB25 work.dugaZINB26 work.dugaZINB27
        work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
  by kk;
run;

data work.mse_ZINB_j;
  set work.mse_ZINB_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
  merge work.d work.mse_ZINB_j;
  keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;
ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB base=work.hasilZINB1;
run;

%end;
%mend;

%sae_zinb;
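$$\hat{M}_{1i} = g_{1i}(\hat{\alpha}, \hat{\nu}, y_i) - \frac{m-1}{m} \sum_{l=1}^{m} \left[ g_{1i}(\hat{\alpha}_{-l}, \hat{\nu}_{-l}, y_i) - g_{1i}(\hat{\alpha}, \hat{\nu}, y_i) \right]$$

Here $g_{1i}(\hat{\sigma}_v^2)$ is the variance estimator of the posterior distribution, which is used to measure the variability associated with $\hat{\theta}_i$. Using $g_{1i}(\hat{\sigma}_v^2)$ alone leads to severe underestimation of $\mathrm{MSE}_J(\hat{\theta}_i^{EB})$ because it ignores the estimation of the prior parameters. The estimator $\hat{M}_{1i}$ therefore corrects the bias of $g_{1i}(\hat{\sigma}_v^2)$.

3. Calculate the jackknife estimator of $\mathrm{MSE}(\hat{\theta}_i^{EB})$ as

$$\mathrm{MSE}_J(\hat{\theta}_i^{EB}) = \hat{M}_{1i} + \hat{M}_{2i}$$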
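A minimal Python sketch of the jackknife steps is given here for illustration. The function name `jackknife_mse` and its argument names are hypothetical. The second term follows the `m2i` line of the SAS macros in Appendices 6 and 7, and the inputs are assumed to have been computed beforehand from the full sample and from each of the m delete-one refits.

```python
def jackknife_mse(g1_full, g1_deleted, theta_full, theta_deleted):
    """Jackknife MSE estimate (M1 + M2) for one small area.

    g1_full       : g_1i evaluated at the full-sample prior estimates
    g1_deleted    : list of g_1i values, one per delete-one refit
    theta_full    : EB estimate from the full sample
    theta_deleted : list of EB estimates, one per delete-one refit
    """
    m = len(g1_deleted)
    if m == 0 or m != len(theta_deleted):
        raise ValueError("need one g1 value and one EB estimate per refit")
    factor = (m - 1) / m
    # M1: posterior variance minus its jackknife bias correction.
    m1 = g1_full - factor * sum(g - g1_full for g in g1_deleted)
    # M2: spread of the delete-one EB estimates around the full estimate.
    m2 = factor * sum((t - theta_full) ** 2 for t in theta_deleted)
    return m1 + m2


# Hypothetical values for a case with m = 2 refits.
print(jackknife_mse(1.0, [1.5, 0.5], 2.0, [3.0, 1.0]))
```

Because the first term subtracts a correction from the posterior variance, nothing in this construction forces the result to be non-negative.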
METHODOLOGY
Data
This research assumed that the available auxiliary data are at the area level, so the basic area-level model was used. The data were simulated with 30 small areas and one covariate. Each batch was generated under a different excess-zero condition, with the probability of zero in a small area ranging from 0.1 to 0.9. The relationship between the response and the covariate was assumed to be linear.
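Methods
The following steps were used to generate the data using SAS 9.1:
1. Fix the value of $X_i$ for the $i$-th area.
2. Define the expected probability of zero in each small area, $P(Y_i = 0)$, then calculate $\lambda_i = -\log(P(Y_i = 0))$.
3. Generate $\theta_i \sim \mathrm{Gamma}(1, 1)$.
4. Calculate $\lambda_i^* = \log(\lambda_i / \theta_i)$.
5. Fit a linear regression of $\lambda_i^*$ on $X_i$ to obtain $\hat{\beta}_0$ and $\hat{\beta}_1$.
6. Calculate $\mu_i = \exp(X_i' \hat{\beta})$.
7. Calculate $\mathit{parmlambda}_i = \mu_i \theta_i$.
8. Generate $y_i \sim \mathrm{Poisson}(\mathit{parmlambda}_i)$.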
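The eight generating steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SAS program used in the study: the function names (`poisson_draw`, `fit_line`, `generate_batch`) and the covariate values 1 to 30 are hypothetical, and the Poisson draw uses Knuth's multiplication method because the Python standard library has no Poisson generator.

```python
import math
import random


def poisson_draw(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def fit_line(x, y):
    """Ordinary least squares intercept and slope for one covariate."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def generate_batch(x, p_zero, rng):
    """Generate one batch of Poisson-gamma counts for the areas in x."""
    lam = -math.log(p_zero)                               # step 2
    theta = [rng.gammavariate(1.0, 1.0) for _ in x]       # step 3
    star = [math.log(lam / t) for t in theta]             # step 4
    b0, b1 = fit_line(x, star)                            # step 5
    mu = [math.exp(b0 + b1 * xi) for xi in x]             # step 6
    rate = [m * t for m, t in zip(mu, theta)]             # step 7
    y = [poisson_draw(r, rng) for r in rate]              # step 8
    return theta, rate, y


rng = random.Random(2009)
x = [float(i) for i in range(1, 31)]   # hypothetical covariate values (step 1)
theta, rate, y = generate_batch(x, 0.6, rng)
print(sum(1 for v in y if v == 0), "zeros out of", len(y))
```

Passing a seeded `random.Random` instance keeps a batch reproducible, which makes it easier to compare the two estimation methods on the same data.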
Moreover, the following steps were applied in analyzing the data:
1. Fit the negative binomial regression with the GENMOD procedure in SAS 9.1 and the zero-inflated negative binomial regression with the COUNTREG procedure in SAS 9.2.
2. Estimate the prior parameters.
3. Estimate $\theta_i$ using the EB method.
4. Calculate the MSE for indirect estimation.
5. Calculate the RRMSE (Relative Root Mean Square Error):

$$\mathrm{RRMSE}(\hat{\theta}_i) = \frac{\sqrt{\mathrm{MSE}(\hat{\theta}_i)}}{\hat{\theta}_i}$$
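Steps 3 to 5 can be illustrated with a short Python sketch. It mirrors the shrinkage form that appears in the SAS macros of Appendices 6 and 7 (`w_bayes`, `teta_hat_bayes`, `g1`, `rrmse_j_b`), with the division and power operators assumed where the extracted listing lost them. The function names `eb_estimate` and `rrmse` are hypothetical.

```python
import math


def eb_estimate(y, mu_hat, dispersion):
    """EB estimate and posterior variance term for one area.

    y          : observed count
    mu_hat     : fitted mean from the regression for the prior
    dispersion : estimated dispersion parameter of the fitted model
    """
    w = mu_hat / (mu_hat + dispersion)            # shrinkage weight
    theta_hat = w * y + (1 - w) * mu_hat          # weighted average
    g1 = (dispersion + y) / (mu_hat + dispersion) ** 2
    return theta_hat, g1


def rrmse(mse, theta_hat):
    """Relative root MSE; math.sqrt raises ValueError if mse is negative."""
    return math.sqrt(mse) / theta_hat


# Hypothetical inputs for one area.
theta_hat, g1 = eb_estimate(y=4, mu_hat=2.0, dispersion=2.0)
print(theta_hat, g1, rrmse(0.25, theta_hat))
```

The sign of the RRMSE follows the sign of the EB estimate, so a negative estimate gives a negative RRMSE even when the MSE itself is positive.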
RESULTS AND DISCUSSION
Estimation of Prior Parameter Based on EB Method with Negative Binomial Regression
In the case of non-excess-zero data, the estimator produced small and consistent MSE. Meanwhile, if the proportion of excess zeros is approximately 30% or more, with an expected probability of zero of 0.6, the performance of the estimates tends to be unreliable. As a result, EB estimation produced negative values.
The RRMSE of the estimator increases along with the number of zeros in the data. Furthermore, if the data contain at least 30% excess zeros, the estimator is unreliable.
Table 1 MSE and RRMSE of EB Estimator with NBR
Probability of zero  Mean of MSE  Median of MSE  Mean of RRMSE  Median of RRMSE
0.1                  0.33         0.16           0.18           0.13
0.2                  0.35         0.20           0.26           0.20
0.3                  0.40         0.23           0.36           0.30
0.4                  0.42         0.27           0.50           0.42
0.5                  0.45         0.31           0.72           0.59
0.6                  -12875       0.33           -0.38          0.81
0.7                  253671       0.40           -1216          1.35
0.8                  -584495      0.30           30946          2.11
0.9                  39135606     0.16           1.16E+10       6.64
Table 2 MSE (II) and RRMSE (II) of EB Estimator with NBR
Probability of zero  Mean of MSE  Median of MSE  Mean of RRMSE  Median of RRMSE
0.1                  0.33         0.16           0.18           0.13
0.2                  0.35         0.20           0.26           0.20
0.3                  0.40         0.23           0.36           0.30
0.4                  0.42         0.27           0.50           0.42
0.5                  0.46         0.31           0.71           0.58
0.6                  26197        0.33           -0.35          0.75
0.7                  950007       0.40           -1002          0.99
0.8                  1444250      0.30           22054          1.10
0.9                  41595285     0.16           6.77E+09       0.56
Table 1 shows that the iterative process produced unexpected negative values of MSE. The simplest way to solve this problem is to change the negative values to zero. MSE (II) and RRMSE (II) in Table 2 are the MSE and RRMSE after the negative values of MSE have been changed to zero.
When the expected probability of zero is 0.6 to 0.9, the mean of MSE (II) increases drastically. Similarly, the mean of RRMSE (II) increases sharply when the expected probability of zero is 0.8 to 0.9. However, when the expected probability of zero is 0.6 to 0.7, the mean of RRMSE (II) is negative because some EB estimates are negative.
Estimation of Prior Parameter Based on EB Method with Zero-Inflated Negative Binomial Regression
The EB estimates are similar to the estimates produced by the NBR method, although they are slightly outperformed by the NBR method when the data contain only a small number of zeros. In particular, as shown in Table 3, if the expected probability of zero is 0.1 to 0.5, ZINB produces a bigger MSE for the EB estimator than NBR does.
If the expected probability of zero is 0.6 to 0.7, however, ZINB gives better estimates. The estimates are also unbiased, as they cover the parameter values adequately. ZINB begins to produce inconsistent estimates if the expected probability of zero is 0.8 or more, due to enormous MSE.
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
The mean of MSE (II) with ZINB is bigger than the mean of MSE with ZINB. This is because, once the negative MSE values are changed to zero, they no longer reduce the mean.
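A small Python sketch shows why setting negative MSE values to zero can only raise the mean. The MSE values used here are hypothetical and are not taken from the simulation results. The helper names `truncate_negative` and `mean` are also hypothetical.

```python
def truncate_negative(values):
    """Replace every negative value by zero, as done for MSE (II)."""
    return [max(v, 0.0) for v in values]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


mse = [-3.0, 0.5, 1.0, 1.5]        # hypothetical jackknife MSE values
mse_ii = truncate_negative(mse)    # [0.0, 0.5, 1.0, 1.5]

print(mean(mse))     # 0.0
print(mean(mse_ii))  # 0.75
```

The negative value offsets the positive ones in the first mean. After truncation that offset disappears, so the mean of MSE (II) is never smaller than the mean of MSE.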
Comparison of EB Estimator with Negative Binomial Regression and EB Estimator with ZINB
The EB estimates given by the NBR and ZINB methods are similar for data with a small number of zeros. However, the ZINB method produces a bigger MSE than NBR does as long as the expected probability of zero in the data does not exceed the 0.6 threshold.
Appendix 5 Syntax program for generate data
data b generate x1(covariate) and ei input x1cards0222831971100013131702314625252218171412202210run
macro bangkit_datado r=1 to 100
data egenerate poisson-gamma with excess zerodo kk=1 to 30set btetha = rangam(11)lambda = -log(01) peluang munculnya nilai nol yang diinginkan (01-09)starlambda = log(lambdatetha)output endrun
proc regmodel starlambda = x1 ods output ParameterEstimates=workbetha_lr (keep=Parameter Estimate)run
proc transpose data=workbetha_lr out=workbetha_lr_t
12
Appendix 5 Syntax program for generate data (continued)
rundata _null_set workbetha_lr_tcall symput (Intercept col1)call symput (x1 col2)run
data ddo kk=1 to 30set emu = exp(ampIntercept + ampx1x1)parmlambda = mutethaypoi = rand(poissonparmlambda)output endrun
ods trace onto take percent zero on dataproc freq data=dtables ypoi ods output OneWayFreqs=workzerorundata zeroset zerokeep percentrunproc transpose data=zero out=zero1 rundata _null_set workzero1call symput (pctz col1)rundata dset dpzero=amppctzr=amprrun
proc append data=d base=d1run
endmend
bangkit_data
13
Appendix 6 Syntax program EB with NBR
macro sae_nbdo x=1 to 900
data workaset workeif ^(u=ampx) then deleterun
this genmod procedure estimates the response without zero-inflation proc genmod data=amodel ypoi = x1 dist=nb link=logods output ParameterEstimates=workbetha_nb (keep=Parameter Estimate)run
proc transpose data=workbetha_nb out=workbetha_nb_trun
data _null_set workbetha_nb_tcall symput (Intercept col1)call symput (x1 col2)call symput (Dispersion col3)run
EB with negbin-regdata workduga_nbset amu_hat_b=exp(ampIntercept + ampx1x1) w_bayes=mu_hat_b(mu_hat_b + ampDispersion)teta_hat_bayes=w_bayesypoi+(1-w_bayes)mu_hat_bg1=(ampDispersion+ypoi)((mu_hat_b+ampDispersion)2)bias_b=abs(teta_hat_bayes-parmlambda)run
proc append data=workduga_nb base=workduga_nb1run
jacknifedo h=1 to 30
data workdset workduga_nb1if ^(u=ampx) then deleterundata workjacknbamphset workdif u=ampxif kk=amph then deleterun
proc genmod data=workjacknbamph output p out=sasyi_estmodel ypoi = x1 dist = nb link=logods output parameterestimates=workbetha_est_nbamph (keep=parameter Estimate)
14
Appendix 6 Syntax program EB with NBR (continued)
runproc transpose data=workbetha_est_nbamph out=workbetha_est_nbtamphrundata _null_set workbetha_est_nbtamphcall symput (Intercept_ col1)call symput (x1_ col2)call symput (Dispersion_ col3)run
data workduganbamphset workdmu_hat_b_amph=exp(ampIntercept_ + ampx1_x1)w_b_amph=mu_hat_b_amph (mu_hat_b_amph + ampDispersion_)teta_hat_amph=w_b_amph ypoi+(1-w_b_amph)mu_hat_b_amphdelta_amph=(teta_hat_amph - teta_hat_bayes)2g1_amph=(ampDispersion_+ypoi)((mu_hat_b_amph+ampDispersion_)2)beda_g_amph=g1_amph-g1run
data workmse_nb_jmerge workduganb1 workduganb2 workduganb3 workduganb4 workduganb5 workduganb6 workduganb7 workduganb8 workduganb9 workduganb10 workduganb11 workduganb12workduganb13 workduganb14 workduganb15 workduganb16 workduganb17workduganb18 workduganb19 workduganb20 workduganb21 workduganb22workduganb23 workduganb24 workduganb25 workduganb26 workduganb27workduganb28 workduganb29 workduganb30by kkrun
data workmse_nb_jset workmse_nb_jt_sum=0g_sum=0do j=1 to 30g_sum=g_sum+beda_g_ampjt_sum=t_sum+delta_ampjendm2i=((30-1)30)t_summ1i=g1-((30-1)30)g_summse_j_b=m1i+m2irrmse_j_b=sqrt(mse_j_b)teta_hat_bayesul = ampxrun
proc append data=workmse_nb_j base=workmse_nb_j1run
data workhasilnbmerge workd workmse_nb_j keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b
15
Appendix 6 Syntax program EB with NBR (continued)
run
ods listingoption nodate ls=130 ps=130ods html
end
proc append data=workhasilnb BASE=workhasilnb1 appendver=v6run
ENDmend
sae_nb
16
Appendix 7 Syntax program EB with ZINB
macro sae_zinb
do x=1 to 900
data workaset work eif ^(u=ampx) then deleterun
proc countreg data=amodel ypoi=x1dist=zinb method=qnzeromodel ypoi ~ x1ods output ParameterEstimates=workpe(keep=Parameter Estimate)run
proc transpose data=workpe out=workpe_trun
data _null_set workpe_tcall symput (Intercept col1)call symput (x1 col2)call symput (Inf_Intercept col3)call symput (Inf_x1 col4)call symput (_Alpha col5)run
data workdugaset amu_hat_b=exp(ampIntercept + ampx1x1) w_bayes=mu_hat_b(mu_hat_b + amp_Alpha)teta_hat_bayes=w_bayesypoi+(1-w_bayes)mu_hat_bg1=(amp_Alpha+ypoi)((mu_hat_b+amp_Alpha)2)bias_b=abs(teta_hat_bayes-parmlamdha)
run
proc append data=workduga base=workduga1run
do h=1 to 30
data workdset workduga1if ^(u=ampx) then deleterundata workjackzinbamphset workdif u=ampxif kk=amph then deleterun
proc countreg data=jackzinbamphmodel ypoi=x1dist=zinb method=qnzeromodel ypoi ~ x1ods output ParameterEstimates=workbetha_est_ZINBamph
17
Appendix 7 Syntax program EB with ZINB (continued)
(keep=Parameter Estimate)run
proc transpose data=workbetha_est_ZINBamph out=workbetha_est_ZINBtamphrun
data _null_set workbetha_est_ZINBtamphcall symput (Intercept col1)call symput (x1 col2)call symput (Inf_Intercept col3)call symput (Inf_x1 col4)call symput (_Alpha col5)run
data workdugaZINBamphset workdmu_hat_b_amph=exp(ampIntercept + ampx1x1)mu_hat_b_amph= ampb_o- + ampb_1- x1w_b_amph=mu_hat_b_amph (mu_hat_b_amph + (amp_Alpha))teta_hat_amph=w_b_amph ypoi+(1-w_b_amph)mu_hat_b_amphdelta_amph=(teta_hat_amph - teta_hat_bayes)2
g1_amph =((mu_hat_b_amph2ampalpha_)2)(ampalpha_+y_i)((mu_hat_b_amph2ampalpha_)+mu_hat_b_amph)2
g1_amph=(amp_Alpha+ypoi)((mu_hat_b_amph+amp_Alpha)2)
g1_amph =(A2)(ampk- + y_i)( a +mu_hat_b)2
beda_g_amph=g1_amph-g1run
data workmse_ZINB_jmerge workdugaZINB1 workdugaZINB2 workdugaZINB3 workdugaZINB4 workdugaZINB5 workdugaZINB6 workdugaZINB7 workdugaZINB8 workdugaZINB9 workdugaZINB10 workdugaZINB11 workdugaZINB12workdugaZINB13 workdugaZINB14 workdugaZINB15 workdugaZINB16 workdugaZINB17workdugaZINB18 workdugaZINB19 workdugaZINB20 workdugaZINB21 workdugaZINB22workdugaZINB23 workdugaZINB24 workdugaZINB25 workdugaZINB26 workdugaZINB27workdugaZINB28 workdugaZINB29 workdugaZINB30by kkrun
data workmse_ZINB_jset workmse_ZINB_jt_sum=0g_sum=0do j=1 to 30g_sum=g_sum+beda_g_ampjt_sum=t_sum+delta_ampj
18
Appendix 7 Syntax program EB with ZINB (continued)
endm2i=((30-1)30)t_summ1i=g1-((30-1)30)g_summse_j_b=m1i+m2irrmse_j_b=sqrt(mse_j_b)teta_hat_bayesrun
data workhasilZINBmerge workd workmse_ZINB_j keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_brunods listingoption nodate ls=130 ps=130ods html
end
proc append data=workhasilZINB BASE=workhasilZINB1run
ENDmend
sae_zinb
Table 1 shows that the iterative process produced unexpected negative values of MSE. The simplest way to solve this problem is to change the negative values to zero. MSE (II) and RRMSE (II) in Table 2 are the MSE and RRMSE obtained after the negative values of MSE have been changed to zero.

When the data have an expected probability of zero of 0.6 to 0.9, the mean of MSE (II) increases drastically. Similarly, the mean of RRMSE (II) increases sharply when the data have an expected probability of zero of 0.8 to 0.9. However, when the expected probability of zero is 0.6 to 0.7, the mean of RRMSE (II) is negative because some of the EB estimates are negative.
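The truncation rule and its effect on RRMSE can be sketched in a few lines. This is only an illustration, not the SAS code used in the study: the function names and the sample values are hypothetical, and RRMSE is assumed to be the square root of the MSE divided by the EB estimate.

```python
import math

def truncate_mse(mse):
    """Return MSE (II): a negative jackknife MSE is replaced by zero."""
    return max(mse, 0.0)

def rrmse(mse, estimate):
    """Relative root MSE: sqrt(MSE) divided by the EB estimate."""
    return math.sqrt(mse) / estimate

# Hypothetical (MSE, EB estimate) pairs, not taken from the result tables.
pairs = [(0.25, 2.0), (-0.10, 1.5), (0.04, -0.5)]

for mse, est in pairs:
    mse2 = truncate_mse(mse)          # sqrt() is only defined after truncation
    print(mse, est, mse2, rrmse(mse2, est))
```

The last pair shows why a negative EB estimate yields a negative RRMSE even after the MSE itself has been truncated.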
Estimation of Prior Parameter Based on EB Method with Zero-Inflated Negative Binomial Regression

The EB estimates are similar to those produced by the NBR method, although they are slightly outperformed by the NBR method when the data contain only a small number of zeros. In particular, as shown in Table 3, if the data have an expected probability of zero of 0.1 to 0.5, ZINB produces a bigger MSE for the EB estimator than NBR does.

If the data have an expected probability of zero of 0.6 to 0.7, on the other hand, ZINB gives better estimates. The estimates are also unbiased, as they cover the parameter values adequately. However, ZINB begins to produce inconsistent estimates when the expected probability of zero is 0.8 or more, owing to enormous MSE values.
Besides when data have expected is because ZINB generates small estimates which is close to the parameter values
The mean of MSE (II) with ZINB is bigger than the mean of MSE with ZINB. This is because, once the negative values of MSE are changed to zero, they no longer act as a reduction factor in the calculation of the mean.
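The size of this shift can be written down directly. As a sketch, let $m_1, \dots, m_n$ denote the individual MSE values (notation introduced here for illustration only). Replacing every negative value by zero gives

```latex
\frac{1}{n}\sum_{i=1}^{n}\max(m_i,0)\;-\;\frac{1}{n}\sum_{i=1}^{n} m_i
  \;=\;\frac{1}{n}\sum_{i:\,m_i<0}\lvert m_i\rvert \;\ge\; 0 .
```

For the hypothetical values 0.5, -0.2 and 0.3, the mean is 0.2 before truncation and (0.5 + 0 + 0.3)/3, about 0.27, after it.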
Comparison of EB Estimator with Negative Binomial Regression and EB Estimator with ZINB

EB estimates given by the NBR and ZINB methods are similar for data with small numbers of zeros. However, the ZINB method produces a bigger MSE than NBR does as long as the expected probability of zero in the data does not exceed the 0.6 threshold.

The ZINB method performs better if the data have an expected probability of zero of 0.6 to 0.7. In this case, EB estimates given by the NBR method are unstable and inconsistent, owing to negative estimates and huge MSE values that can be thousands of times larger than their acceptable value. On the other hand, the EB estimator with ZINB works well: it gives unbiased estimates and its MSE values are more stable than those of the EB estimator with NBR.

Both methods perform poorly if the data have an expected probability of zero of 0.8 or more. The EB estimators from both methods are inconsistent as a result of the very large MSE values they produce.
Table 3 MSE and RRMSE of EB Estimator with ZINB

Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   045           017             024             014
0.2                   043           020             033             021
0.3                   071           028             052             032
0.4                   054           028             0632            042
0.5                   086           033             7322807         066
0.6                   061           038             29817           103
0.7                   058           025             218119          194
0.8                   -128          -14E-07         162697          375
0.9                   2954790       -1E-06          35E+278         609508

Table 4 MSE (II) and RRMSE (II) of EB Estimator with ZINB

Probability of zero   Mean of MSE   Median of MSE   Mean of RRMSE   Median of RRMSE
0.1                   045           017             024             014
0.2                   0436          020             0324            021
0.3                   072           028             051             0311
0.4                   055           028             061             041
0.5                   095           033             6561235         058
0.6                   075           038             23406           070
0.7                   150           025             134655          069
0.8                   175           0               733506          0
0.9                   2954908       0               12E+278         0
CONCLUSION

Excess zeros in the data highly influenced the results of EB estimation. A conventional method such as negative binomial regression in prior estimation produced an unbiased but unreliable EB estimator for data with an expected probability of zero of 0.6. This is shown by the large MSE values and the negative values of the estimator.

Meanwhile, EB estimation by the ZINB method produced a more reliable estimator even when the data have an expected probability of zero of 0.6 to 0.7.

ZINB has also provided a reliable estimator for data with less than 53.33% of zeros. This means that the performance of ZINB declines when the data have an expected probability of zero of 0.8 or more, as shown by the large MSE and the inconsistent estimator.
RECOMMENDATION

This research is based on many assumptions and is subject to several limitations. If the assumptions and boundaries can be relaxed, better results can be expected. Some recommendations for further research are:
1. The generating process in this research does not reflect the real sampling process. If the generating process were similar to the real sampling process, it might give better results because it would be closer to real applications.
2. It would be more interesting to run an experiment that takes account of a larger number of areas, since the number of areas influences data modeling.
3. Restricted Maximum Likelihood may be applied when estimating the prior parameters with ZINB and NBR in order to solve the problem of negative MSE values.
4. Theoretical research on ZINB and the Empirical Bayes estimator is important for understanding the behavior of ZINB parameter estimates in the Empirical Bayes setting.
REFERENCES

Erdman D, Jackson L, Sinko A. 2008. Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure. SAS Global Forum 2008, Paper 322-2008. http://www2.sas.com/proceedings/forum2008/322-2008.pdf [25 August 2008]
Famoye F, Singh KP. 2006. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Journal of Data Science 4:117-130.
Hardin JW, Hilbe JM. 2007. Generalized Linear Models and Extensions. Texas: A Stata Press Publication.
Kurnia A, Notodiputro KA. 2006. Penerapan Metode Jackknife dalam Pendugaan Area Kecil. Forum Statistika dan Komputasi, April 2006, p. 12-15.
Kismiantini. 2007. Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Thesis]. Bogor: Institut Pertanian Bogor, Fakultas Matematika dan Pengetahuan Alam.
McCullagh P, Nelder JA. 1983. Generalized Linear Models. London: Chapman and Hall.
Ramsini B, et al. 2001. Uninsured Estimates by County: A Review of Options and Issues. http://www.odh.ohio.gov/Data/OFHSurv/ofhsrfq7.pdf [24 April 2008]
Rao JNK. 2003. Small Area Estimation. New York: John Wiley & Sons.
Wakefield J. 2006. Disease mapping and spatial regression with count data. http://www.bepress.com/uwbiostat/paper286.pdf [24 April 2008]
Appendix 1 Result of EB estimation with NBR

P(Y=0)  Actual     r    Statistic     min       Q1        median    mean      Q3        max
10      10-3333    100  EB estimator  0426011   1525665   3188832   4252666   5752756   205939
                        bias          0000446   05164     0878579   1315093   1721091   8704671
                        MSE           0040547   0109118   0159448   0333613   0335256   4167064
                        RRMSE         0041258   0100045   01356     018188    0220426   0576793
20      1333-3667  100  EB estimator  0342831   1013993   2218265   2984668   3953417   1815693
                        bias          0000587   0413611   079407    1100373   1454889   7906915
                        MSE           0055631   0131969   0196963   0353033   0386291   3778251
                        RRMSE         0070449   015421    0205182   0262006   0352726   0788718
30      20-5333    100  EB estimator  0323311   0836545   1562163   2263684   2918741   1214482
                        bias          0000151   0372382   067041    0916482   122012    5950225
                        MSE           0074364   0163462   0231014   0400207   0432371   5250254
                        RRMSE         0102324   0214697   0299247   0361013   0474077   1192032
40      2333-5667  100  EB estimator  024882    064963    1219656   17107     2248716   930007
                        bias          0000564   0293602   0549809   0757937   1007851   486688
                        MSE           -100569   0194196   0271669   041875    045917    3239598
                        RRMSE         0123605   0300339   0422426   0503566   0642418   2202294
50      2333-6333  100  EB estimator  0122548   0570083   1028619   1291758   1728067   6750472
                        bias          000029    0250747   0453265   0622838   0803185   4009352
                        MSE           -237643   0235733   0306641   0452955   05091     3652167
                        RRMSE         0038956   0412708   0588924   0717336   0844735   3240156
60      30-70      100  EB estimator  -077338   044443    0699758   0944038   1131071   6323352
                        bias          0000452   020433    0398131   0534095   0679938   3848209
                        MSE           -749011   0254097   0330078   -12875    0539873   2354887
                        RRMSE         -663045   051763    0813734   -038057   1287528   1767434
70      4333-7333  100  EB estimator  -33274    0249515   0442513   0659375   0922519   9258959
                        bias          0000375   0155154   0316124   0476883   0588926   8475103
                        MSE           -7513075  0235378   0402092   2536714   0876569   6051162
                        RRMSE         -10741    0704796   1355566   -121606   3040291   3332419
80      5333-90    100  EB estimator  -232889   017621    0305365   0569959   0576346   6303601
                        bias          0000395   0116669   0254473   1091172   0497898   6297454
                        MSE           -6E+09    -016583   0301527   -584495   5718409   185E+09
                        RRMSE         -212936   0927338   2115163   3094627   1359703   4151289
90      70-100     100  EB estimator  -108767   0111208   0230315   0212247   0353129   3625557
                        bias          000016    0086      0177169   0425532   0314714   1092655
                        MSE           -38E+09   -130817   0159682   39135606  3074073   12E+11
                        RRMSE         -909131   1647188   6639631   116E+10   1585472   706E+11
Appendix 2 Result of EB estimation with ZINB

P(Y=0)  Actual     r    Statistic     min       Q1        median    mean      Q3        max
10      10-3333    100  EB estimator  0267752   1500256   3195861   4280907   5833922   2220705
                        bias          0000603   0485515   0882468   1315721   1750173   8704672
                        MSE           -053954   010933    0168797   0449506   0369775   360843
                        RRMSE         0022947   0096443   0136424   0238099   0241955   5518468
20      1333-3667  100  EB estimator  105E-08   0914898   221594    3017228   401361    1815694
                        bias          0000368   0383426   0780984   1105029   1496623   7906918
                        MSE           -07309    0126202   0201463   0425844   0414597   1734815
                        RRMSE         0021807   0144983   0210692   0326097   0401786   3177943
30      20-5333    100  EB estimator  0132041   0719086   1523909   2308745   3012309   1228058
                        bias          0000508   0332427   0680187   0928947   1254604   6314973
                        MSE           -229891   0156942   0277017   0707983   0590466   7469014
                        RRMSE         0023998   0210095   0317195   0519524   0618802   3500387
40      2333-5667  100  EB estimator  105E-08   0574265   1209034   1742928   2368713   104953
                        bias          0000564   0268248   0544049   0771741   1067061   4889872
                        MSE           -125713   0181557   0284338   0540615   0498521   423089
                        RRMSE         0054916   028362    0420396   0630776   0778033   5394515
50      2333-6333  100  EB estimator  105E-08   0426701   1033816   133848    1906961   8018962
                        bias          0000453   0224726   0454522   0661709   0900005   4414442
                        MSE           -181856   0194818   0334706   0859252   0711939   7997074
                        RRMSE         0026206   0387294   0662251   7322807   1312302   13388294
60      30-70      100  EB estimator  105E-08   030085    0645848   0985327   1154975   728326
                        bias          62E-05    0190886   0406245   0567657   074167    3923952
                        MSE           -34589    0078006   0376514   060793    0804116   3426488
                        RRMSE         000461    0502807   1033578   2981671   2012552   3308816
70      4333-7333  100  EB estimator  105E-08   105E-08   0341315   0677841   1         5005491
                        bias          979E-05   0128017   0358257   0487174   0654423   3733981
                        MSE           -142213   -001433   0255331   0584152   1132152   264456
                        RRMSE         0064209   0847956   1942286   2181192   4589042   7899681
80      5333-90    100  EB estimator  105E-08   105E-08   0142906   0445315   0859305   5
                        bias          0000161   0083397   0272773   0392826   0557213   3532556
                        MSE           -10651    -56E-05   -14E-07   -127819   1452962   1132741
                        RRMSE         0063244   1475413   3754705   162697    9221163   3786684
90      70-100     100  EB estimator  1E-277    105E-08   105E-08   0225165   0135512   3
                        bias          0000495   0054221   0153374   027819    0350213   2736904
                        MSE           -175652   -33E-05   -1E-06    2954790   152E-06   613E+08
                        RRMSE         0040681   4059441   6095076   35E+278   5569021   16E+281
Appendix 3 Result of EB estimation (II) with NBR

P(Y=0)  Actual     r    Statistic     min       Q1        median    mean      Q3        max
10      10-3333    100  EB estimator  0426011   1525665   3188832   4252666   5752756   205939
                        bias          0000446   05164     0878579   1315093   1721091   8704671
                        MSE           0040547   0109118   0159448   0333613   0335256   4167064
                        RRMSE         0041258   0100045   01356     018188    0220426   0576793
20      1333-3667  100  EB estimator  0342831   1013993   2218265   2984668   3953417   1815693
                        bias          0000587   0413611   079407    1100373   1454889   7906915
                        MSE           0055631   0131969   0196963   0353033   0386291   3778251
                        RRMSE         0070449   015421    0205182   0262006   0352726   0788718
30      20-5333    100  EB estimator  0323311   0836545   1562163   2263684   2918741   1214482
                        bias          0000151   0372382   067041    0916482   122012    5950225
                        MSE           0074364   0163462   0231014   0400207   0432371   5250254
                        RRMSE         0102324   0214697   0299247   0361013   0474077   1192032
40      2333-5667  100  EB estimator  024882    064963    1219656   17107     2248716   930007
                        bias          0000564   0293602   0549809   0757937   1007851   486688
                        MSE           0         0194196   0271669   0419181   045917    3239598
                        RRMSE         0         0300116   0422209   0502895   0641904   2202294
50      2333-6333  100  EB estimator  0122548   0570083   1028619   1291758   1728067   6750472
                        bias          000029    0250747   0453265   0622838   0803185   4009352
                        MSE           0         0235733   0306641   0456258   05091     3652167
                        RRMSE         0         0410357   0585765   0712314   0841838   3240156
60      30-70      100  EB estimator  -077338   044443    0699758   0944038   1131071   6323352
                        bias          0000452   020433    0398131   0534095   0679938   3848209
                        MSE           0         0254097   0330078   2619677   0539873   2354887
                        RRMSE         -663045   0448118   0750369   -034911   1209918   1767434
70      4333-7333  100  EB estimator  -33274    0249515   0442513   0659375   0922519   9258959
                        bias          0000375   0155154   0316124   0476883   0588926   8475103
                        MSE           0         0235378   0402092   9500073   0876569   6051162
                        RRMSE         -10741    0288999   0995659   -100163   2527784   3332419
80      5333-90    100  EB estimator  -232889   017621    0305365   0569959   0576346   6303601
                        bias          0000395   0116669   0254473   1091172   0497898   6297454
                        MSE           0         0         0301527   1444250   5718409   185E+09
                        RRMSE         -212936   0         1104113   2205437   5656681   4151289
90      70-100     100  EB estimator  -108767   0111208   0230315   0212247   0353129   3625557
                        bias          000016    0086      0177169   0425532   0314714   1092655
                        MSE           0         0         0159682   41595285  3074073   12E+11
                        RRMSE         -909131   0         0557622   677E+09   9311925   706E+11
Appendix 4 Result of EB estimation (II) with ZINB

P(Y=0)  Actual     r    Statistic     min       Q1        median    mean      Q3        max
10      10-3333    100  EB estimator  0267752   1500256   3195861   4280907   5833922   2220705
                        bias          0000603   0485515   0882468   1315721   1750173   8704672
                        MSE           0         010933    0168797   0450626   0369775   360843
                        RRMSE         0         0095932   0135647   023675    0239669   5518468
20      1333-3667  100  EB estimator  105E-08   0914898   221594    3017228   401361    1815694
                        bias          0000368   0383426   0780984   1105029   1496623   7906918
                        MSE           0         0126202   0201463   0428006   0414597   1734815
                        RRMSE         0         0142648   020709    0320663   0395479   3177943
30      20-5333    100  EB estimator  0132041   0719086   1523909   2308745   3012309   1228058
                        bias          0000508   0332427   0680187   0928947   1254604   6314973
                        MSE           0         0156942   0277017   0716543   0590466   7469014
                        RRMSE         0         0203913   0311937   0506882   0615401   3500387
40      2333-5667  100  EB estimator  105E-08   0574265   1209034   1742928   2368713   104953
                        bias          0000564   0268248   0544049   0771741   1067061   4889872
                        MSE           0         0181557   0284338   0549835   0498521   423089
                        RRMSE         0         0270309   0405926   0606317   0766631   5394515
50      2333-6333  100  EB estimator  105E-08   0426701   1033816   133848    1906961   8018962
                        bias          0000453   0224726   0454522   0661709   0900005   4414442
                        MSE           0         0194818   0334706   094973    0711939   7997074
                        RRMSE         0         0316402   0576343   6561235   1240175   13388294
60      30-70      100  EB estimator  105E-08   030085    0645848   0985327   1154975   728326
                        bias          62E-05    0190886   0406245   0567657   074167    3923952
                        MSE           0         0078006   0376514   0749436   0804116   3426488
                        RRMSE         0         0258286   0698814   2340612   1714808   3308816
70      4333-7333  100  EB estimator  105E-08   105E-08   0341315   0677841   1         5005491
                        bias          979E-05   0128017   0358257   0487174   0654423   3733981
                        MSE           0         0         0255331   1501268   1132152   264456
                        RRMSE         0         0         0688797   1346552   2500825   7899681
80      5333-90    100  EB estimator  105E-08   105E-08   0142906   0445315   0859305   5
                        bias          0000161   0083397   0272773   0392826   0557213   3532556
                        MSE           0         0         0         1755486   1452962   1132741
                        RRMSE         0         0         0         7335062   3311711   3786684
90      70-100     100  EB estimator  1E-277    105E-08   105E-08   0225165   0135512   3
                        bias          0000495   0054221   0153374   027819    0350213   2736904
                        MSE           0         0         0         2954908   152E-06   613E+08
                        RRMSE         0         0         0         12E+278   416189    16E+281
Appendix 5 Syntax program for generate data

data b; /* generate x1 (covariate) and ei */
   input x1;
   cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate poisson-gamma with excess zero */
   do kk=1 to 30;
      set b;
      tetha = rangam(1,1);
      lambda = -log(0.1); /* desired probability of a zero value (0.1-0.9) */
      starlambda = log(lambda/tetha);
      output;
   end;
run;

proc reg;
   model starlambda = x1;
   ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;

data _null_;
   set work.betha_lr_t;
   call symput('Intercept', col1);
   call symput('x1', col2);
run;

data d;
   do kk=1 to 30;
      set e;
      mu = exp(&Intercept + &x1*x1);
      parmlambda = mu*tetha;
      ypoi = rand('poisson', parmlambda);
      output;
   end;
run;

ods trace on;
/* to take percent zero on data */
proc freq data=d;
   tables ypoi;
   ods output OneWayFreqs=work.zero;
run;

data zero;
   set zero;
   keep percent;
run;

proc transpose data=zero out=zero1;
run;

data _null_;
   set work.zero1;
   call symput('pctz', col1);
run;

data d;
   set d;
   pzero=&pctz;
   r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
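The data step in Appendix 5 sets lambda = -log(p0) because a Poisson variable with mean lambda is zero with probability exp(-lambda). The following is a minimal sketch of that link only. It assumes a plain Poisson draw, leaves out the gamma multiplier tetha and the regression step of the SAS program, and uses hypothetical function names.

```python
import math
import random

def target_lambda(p_zero):
    """Poisson mean whose zero probability exp(-lambda) equals p_zero."""
    return -math.log(p_zero)

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_zero_share(p_zero, n=30, reps=100, seed=1):
    """Share of zeros among n * reps Poisson counts with mean target_lambda(p_zero)."""
    rng = random.Random(seed)
    lam = target_lambda(p_zero)
    draws = [poisson_draw(rng, lam) for _ in range(n * reps)]
    return draws.count(0) / len(draws)

for p in (0.1, 0.5, 0.9):
    print(p, round(target_lambda(p), 4), round(simulate_zero_share(p), 3))
```

The defaults of 30 areas and 100 replications mirror the loop bounds of the macro; the observed share of zeros should lie close to the requested probability.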
Appendix 6 Syntax program EB with NBR

%macro sae_nb;
%do x=1 %to 900;

data work.a;
   set work.e;
   if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
   model ypoi = x1 / dist=nb link=log;
   ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
   set work.betha_nb_t;
   call symput('Intercept', col1);
   call symput('x1', col2);
   call symput('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
   set a;
   mu_hat_b=exp(&Intercept + &x1*x1);
   w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
   teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
   g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
   bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
   set work.duga_nb1;
   if ^(u=&x) then delete;
run;

data work.jacknb&h;
   set work.d;
   if u=&x;
   if kk=&h then delete;
run;

proc genmod data=work.jacknb&h;
   /* output p out=sas.yi_est */
   model ypoi = x1 / dist=nb link=log;
   ods output parameterestimates=work.betha_est_nb&h (keep=parameter Estimate);
run;

proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;

data _null_;
   set work.betha_est_nbt&h;
   call symput('Intercept_', col1);
   call symput('x1_', col2);
   call symput('Dispersion_', col3);
run;

data work.duganb&h;
   set work.d;
   mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
   w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + &Dispersion_);
   teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
   delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
   g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
   beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
   merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5 work.duganb6
         work.duganb7 work.duganb8 work.duganb9 work.duganb10 work.duganb11 work.duganb12
         work.duganb13 work.duganb14 work.duganb15 work.duganb16 work.duganb17
         work.duganb18 work.duganb19 work.duganb20 work.duganb21 work.duganb22
         work.duganb23 work.duganb24 work.duganb25 work.duganb26 work.duganb27
         work.duganb28 work.duganb29 work.duganb30;
   by kk;
run;

data work.mse_nb_j;
   set work.mse_nb_j;
   t_sum=0;
   g_sum=0;
   %do j=1 %to 30;
      g_sum=g_sum+beda_g_&j;
      t_sum=t_sum+delta_&j;
   %end;
   m2i=((30-1)/30)*t_sum;
   m1i=g1-((30-1)/30)*g_sum;
   mse_j_b=m1i+m2i;
   rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
   ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
   merge work.d work.mse_nb_j;
   keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb BASE=work.hasilnb1 appendver=v6;
run;

%END;
%mend;

%sae_nb;
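The arithmetic of the data steps in Appendix 6 can be restated compactly. The sketch below is a transcription under assumptions: the operators lost from the listing are taken to be division and squaring, the function names are hypothetical, and the inputs in the example are invented rather than fitted values from the study.

```python
def eb_estimate(y, mu_hat, dispersion):
    """EB estimate and g1 term, following the formulas of the appendix data step."""
    w = mu_hat / (mu_hat + dispersion)           # shrinkage weight (w_bayes)
    theta_hat = w * y + (1.0 - w) * mu_hat       # weighted mix of count and model mean
    g1 = (dispersion + y) / (mu_hat + dispersion) ** 2
    return theta_hat, g1

def jackknife_mse(theta_hat, g1, theta_del, g1_del):
    """Combine delete-one results into the jackknife MSE, m1i + m2i."""
    m = len(theta_del)
    scale = (m - 1) / m
    m1 = g1 - scale * sum(g - g1 for g in g1_del)              # bias-corrected g1
    m2 = scale * sum((t - theta_hat) ** 2 for t in theta_del)  # spread of delete-one estimates
    return m1 + m2

# Invented inputs: one observed count and 30 delete-one refits.
theta, g1 = eb_estimate(y=3, mu_hat=2.0, dispersion=1.0)
refits = [eb_estimate(3, 2.0 + 0.01 * h, 1.0) for h in range(30)]
mse = jackknife_mse(theta, g1, [t for t, _ in refits], [g for _, g in refits])
print(theta, g1, mse)
```

Because m1 subtracts a sum of differences, it falls below zero whenever the delete-one g1 values are sufficiently larger than the full-sample g1, which is one way the negative MSE values reported in the results can arise.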
Appendix 7 Syntax program EB with ZINB

%macro sae_zinb;

%do x=1 %to 900;

data work.a;
   set work.e;
   if ^(u=&x) then delete;
run;

proc countreg data=a;
   model ypoi=x1 / dist=zinb method=qn;
   zeromodel ypoi ~ x1;
   ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
   set work.pe_t;
   call symput('Intercept', col1);
   call symput('x1', col2);
   call symput('Inf_Intercept', col3);
   call symput('Inf_x1', col4);
   call symput('_Alpha', col5);
run;

data work.duga;
   set a;
   mu_hat_b=exp(&Intercept + &x1*x1);
   w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
   teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
   g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
   bias_b=abs(teta_hat_bayes-parmlamdha);
run;

proc append data=work.duga base=work.duga1;
run;

%do h=1 %to 30;

data work.d;
   set work.duga1;
   if ^(u=&x) then delete;
run;

data work.jackzinb&h;
   set work.d;
   if u=&x;
   if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
   model ypoi=x1 / dist=zinb method=qn;
   zeromodel ypoi ~ x1;
   ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
   set work.betha_est_ZINBt&h;
   call symput('Intercept', col1);
   call symput('x1', col2);
   call symput('Inf_Intercept', col3);
   call symput('Inf_x1', col4);
   call symput('_Alpha', col5);
run;

data work.dugaZINB&h;
   set work.d;
   mu_hat_b_&h=exp(&Intercept + &x1*x1);
   /* mu_hat_b_&h= &b_o- + &b_1- x1 */
   w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + (&_Alpha));
   teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
   delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
   /* g1_&h =((mu_hat_b_&h**2*&alpha_)**2)*(&alpha_+y_i)/((mu_hat_b_&h**2*&alpha_)+mu_hat_b_&h)**2 */
   g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
   /* g1_&h =(A**2)*(&k- + y_i)/( a +mu_hat_b)**2 */
   beda_g_&h=g1_&h-g1;
run;

data work.mse_ZINB_j;
   merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5 work.dugaZINB6
         work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10 work.dugaZINB11 work.dugaZINB12
         work.dugaZINB13 work.dugaZINB14 work.dugaZINB15 work.dugaZINB16 work.dugaZINB17
         work.dugaZINB18 work.dugaZINB19 work.dugaZINB20 work.dugaZINB21 work.dugaZINB22
         work.dugaZINB23 work.dugaZINB24 work.dugaZINB25 work.dugaZINB26 work.dugaZINB27
         work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
   by kk;
run;

data work.mse_ZINB_j;
   set work.mse_ZINB_j;
   t_sum=0;
   g_sum=0;
   %do j=1 %to 30;
      g_sum=g_sum+beda_g_&j;
      t_sum=t_sum+delta_&j;
   %end;
   m2i=((30-1)/30)*t_sum;
   m1i=g1-((30-1)/30)*g_sum;
   mse_j_b=m1i+m2i;
   rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
   merge work.d work.mse_ZINB_j;
   keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB BASE=work.hasilZINB1;
run;

%END;
%mend;

%sae_zinb;
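The COUNTREG call in Appendix 7 fits a count part and a zero-inflation part. For orientation, the standard ZINB probability function can be sketched as follows. This is an illustration under assumptions, not output of the SAS procedure: it assumes the NB2 form (variance mu + alpha * mu**2), treats the zero-inflation probability pi_zero as a constant although the appendix models it with x1, and uses hypothetical function names.

```python
import math

def nb2_pmf(y, mu, alpha):
    """Negative binomial pmf with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, alpha, pi_zero):
    """Zero-inflated NB: a point mass at zero mixed with an NB2 count."""
    base = (1.0 - pi_zero) * nb2_pmf(y, mu, alpha)
    return pi_zero + base if y == 0 else base

# Invented parameter values for illustration.
mu, alpha, pi_zero = 2.0, 0.5, 0.3
print(nb2_pmf(0, mu, alpha), zinb_pmf(0, mu, alpha, pi_zero))
```

The zero probability of the mixture exceeds that of the plain negative binomial by pi_zero times the probability of a nonzero count, which is the extra mass that the zero-inflation part is meant to absorb.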
6
declines when the data have expected probability of zero by 08 or more As shown by the big MSE and inconsistent estimator
RECOMMENDATION
This research is based on many assumptions and suffered by several limitations If the assumptions and boundaries can be relaxed can be expected better result There are some recommendations for the next research1 The generating process in this research
does not reflect the real sampling processIf the generating process similar to the real sampling process it might give better result because it will be closer with the real application
2 It will be more interesting to runexperiment which takes account of larger number of areas since the number of areas will influence data modeling
3 The Restricted Maximum Likelihood maybe applied when estimating prior parameter with ZINB and NBR in other to solve the negative value of MSE
4 Theoretical research of ZINB and Empirical Bayes estimator is important to understand the behavior of parameter estimates of ZINB in Empirical Bayes setting
REFERENCES
Erdman D L Jackson A Sinko 2008 Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure SAS Global Forum 2008322-2008httpwww2sascomproceedingsforum2008322-2008pdf [25 Agustus 2008]
Famoye F KP Singh 2006 Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data Journal of Data Science 4117-130
Hardin JW JM Hilbe 2007 Generalized Linear Models and Extensions Texas A Stata Press Publication
Kurnia A KA Notodiputro 2006 Penerapan Metode Jackknife dalam pendugaan Area Kecil Forum Statistika dan Komputasi April 2006 p12-15
Kismiantini 2007 Pendugaan Statistik Area Kecil Berbasis Model Poisson-Gamma [Tesis] Bogor Institut Pertanian Bogor Fakultas Matematika dan Pengetahuan Alam
McCullagh P J A Nelder 1983 Generalized Linear Models London Chapmann and Hall
Ramsini B et all 2001 Uninsured Estimates by County A Review of Options and IssueshttpwwwodhohiogovDataOFHSurvofhsrfq7pdf [24 April 2008]
Rao JNK 2003 Small Area Estimation New York John Wiley amp Sons
Wakefield J 2006 Disease mapping and spatial regression with count data httpwwwbepresscomuwbiostatpaper286pdf [24 April 2008]
7
Appendix 1. Result of EB estimation with NBR

P(Y=0)  Actual     r    Statistic     min       Q1        median    mean      Q3        max
10      10-3333    100  EB estimator  0426011   1525665   3188832   4252666   5752756   205939
                        bias          0000446   05164     0878579   1315093   1721091   8704671
                        MSE           0040547   0109118   0159448   0333613   0335256   4167064
                        RRMSE         0041258   0100045   01356     018188    0220426   0576793
20      1333-3667  100  EB estimator  0342831   1013993   2218265   2984668   3953417   1815693
                        bias          0000587   0413611   079407    1100373   1454889   7906915
                        MSE           0055631   0131969   0196963   0353033   0386291   3778251
                        RRMSE         0070449   015421    0205182   0262006   0352726   0788718
30      20-5333    100  EB estimator  0323311   0836545   1562163   2263684   2918741   1214482
                        bias          0000151   0372382   067041    0916482   122012    5950225
                        MSE           0074364   0163462   0231014   0400207   0432371   5250254
                        RRMSE         0102324   0214697   0299247   0361013   0474077   1192032
40      2333-5667  100  EB estimator  024882    064963    1219656   17107     2248716   930007
                        bias          0000564   0293602   0549809   0757937   1007851   486688
                        MSE           -100569   0194196   0271669   041875    045917    3239598
                        RRMSE         0123605   0300339   0422426   0503566   0642418   2202294
50      2333-6333  100  EB estimator  0122548   0570083   1028619   1291758   1728067   6750472
                        bias          000029    0250747   0453265   0622838   0803185   4009352
                        MSE           -237643   0235733   0306641   0452955   05091     3652167
                        RRMSE         0038956   0412708   0588924   0717336   0844735   3240156
60      30-70      100  EB estimator  -077338   044443    0699758   0944038   1131071   6323352
                        bias          0000452   020433    0398131   0534095   0679938   3848209
                        MSE           -749011   0254097   0330078   -12875    0539873   2354887
                        RRMSE         -663045   051763    0813734   -038057   1287528   1767434
70      4333-7333  100  EB estimator  -33274    0249515   0442513   0659375   0922519   9258959
                        bias          0000375   0155154   0316124   0476883   0588926   8475103
                        MSE           -7513075  0235378   0402092   2536714   0876569   6051162
                        RRMSE         -10741    0704796   1355566   -121606   3040291   3332419
80      5333-90    100  EB estimator  -232889   017621    0305365   0569959   0576346   6303601
                        bias          0000395   0116669   0254473   1091172   0497898   6297454
                        MSE           -6E+09    -016583   0301527   -584495   5718409   185E+09
                        RRMSE         -212936   0927338   2115163   3094627   1359703   4151289
90      70-100     100  EB estimator  -108767   0111208   0230315   0212247   0353129   3625557
                        bias          000016    0086      0177169   0425532   0314714   1092655
                        MSE           -38E+09   -130817   0159682   39135606  3074073   12E+11
                        RRMSE         -909131   1647188   6639631   116E+10   1585472   706E+11
Appendix 2. Result of EB estimation with ZINB

P(Y=0)  Actual     r    Statistic     min       Q1        median    mean      Q3        max
10      10-3333    100  EB estimator  0267752   1500256   3195861   4280907   5833922   2220705
                        bias          0000603   0485515   0882468   1315721   1750173   8704672
                        MSE           -053954   010933    0168797   0449506   0369775   360843
                        RRMSE         0022947   0096443   0136424   0238099   0241955   5518468
20      1333-3667  100  EB estimator  105E-08   0914898   221594    3017228   401361    1815694
                        bias          0000368   0383426   0780984   1105029   1496623   7906918
                        MSE           -07309    0126202   0201463   0425844   0414597   1734815
                        RRMSE         0021807   0144983   0210692   0326097   0401786   3177943
30      20-5333    100  EB estimator  0132041   0719086   1523909   2308745   3012309   1228058
                        bias          0000508   0332427   0680187   0928947   1254604   6314973
                        MSE           -229891   0156942   0277017   0707983   0590466   7469014
                        RRMSE         0023998   0210095   0317195   0519524   0618802   3500387
40      2333-5667  100  EB estimator  105E-08   0574265   1209034   1742928   2368713   104953
                        bias          0000564   0268248   0544049   0771741   1067061   4889872
                        MSE           -125713   0181557   0284338   0540615   0498521   423089
                        RRMSE         0054916   028362    0420396   0630776   0778033   5394515
50      2333-6333  100  EB estimator  105E-08   0426701   1033816   133848    1906961   8018962
                        bias          0000453   0224726   0454522   0661709   0900005   4414442
                        MSE           -181856   0194818   0334706   0859252   0711939   7997074
                        RRMSE         0026206   0387294   0662251   7322807   1312302   13388294
60      30-70      100  EB estimator  105E-08   030085    0645848   0985327   1154975   728326
                        bias          62E-05    0190886   0406245   0567657   074167    3923952
                        MSE           -34589    0078006   0376514   060793    0804116   3426488
                        RRMSE         000461    0502807   1033578   2981671   2012552   3308816
70      4333-7333  100  EB estimator  105E-08   105E-08   0341315   0677841   1         5005491
                        bias          979E-05   0128017   0358257   0487174   0654423   3733981
                        MSE           -142213   -001433   0255331   0584152   1132152   264456
                        RRMSE         0064209   0847956   1942286   2181192   4589042   7899681
80      5333-90    100  EB estimator  105E-08   105E-08   0142906   0445315   0859305   5
                        bias          0000161   0083397   0272773   0392826   0557213   3532556
                        MSE           -10651    -56E-05   -14E-07   -127819   1452962   1132741
                        RRMSE         0063244   1475413   3754705   162697    9221163   3786684
90      70-100     100  EB estimator  1E-277    105E-08   105E-08   0225165   0135512   3
                        bias          0000495   0054221   0153374   027819    0350213   2736904
                        MSE           -175652   -33E-05   -1E-06    2954790   152E-06   613E+08
                        RRMSE         0040681   4059441   6095076   35E+278   5569021   16E+281
Appendix 3. Result of EB estimation (II) with NBR

P(Y=0)  Actual     r    Statistic     min       Q1        median    mean      Q3        max
10      10-3333    100  EB estimator  0426011   1525665   3188832   4252666   5752756   205939
                        bias          0000446   05164     0878579   1315093   1721091   8704671
                        MSE           0040547   0109118   0159448   0333613   0335256   4167064
                        RRMSE         0041258   0100045   01356     018188    0220426   0576793
20      1333-3667  100  EB estimator  0342831   1013993   2218265   2984668   3953417   1815693
                        bias          0000587   0413611   079407    1100373   1454889   7906915
                        MSE           0055631   0131969   0196963   0353033   0386291   3778251
                        RRMSE         0070449   015421    0205182   0262006   0352726   0788718
30      20-5333    100  EB estimator  0323311   0836545   1562163   2263684   2918741   1214482
                        bias          0000151   0372382   067041    0916482   122012    5950225
                        MSE           0074364   0163462   0231014   0400207   0432371   5250254
                        RRMSE         0102324   0214697   0299247   0361013   0474077   1192032
40      2333-5667  100  EB estimator  024882    064963    1219656   17107     2248716   930007
                        bias          0000564   0293602   0549809   0757937   1007851   486688
                        MSE           0         0194196   0271669   0419181   045917    3239598
                        RRMSE         0         0300116   0422209   0502895   0641904   2202294
50      2333-6333  100  EB estimator  0122548   0570083   1028619   1291758   1728067   6750472
                        bias          000029    0250747   0453265   0622838   0803185   4009352
                        MSE           0         0235733   0306641   0456258   05091     3652167
                        RRMSE         0         0410357   0585765   0712314   0841838   3240156
60      30-70      100  EB estimator  -077338   044443    0699758   0944038   1131071   6323352
                        bias          0000452   020433    0398131   0534095   0679938   3848209
                        MSE           0         0254097   0330078   2619677   0539873   2354887
                        RRMSE         -663045   0448118   0750369   -034911   1209918   1767434
70      4333-7333  100  EB estimator  -33274    0249515   0442513   0659375   0922519   9258959
                        bias          0000375   0155154   0316124   0476883   0588926   8475103
                        MSE           0         0235378   0402092   9500073   0876569   6051162
                        RRMSE         -10741    0288999   0995659   -100163   2527784   3332419
80      5333-90    100  EB estimator  -232889   017621    0305365   0569959   0576346   6303601
                        bias          0000395   0116669   0254473   1091172   0497898   6297454
                        MSE           0         0         0301527   1444250   5718409   185E+09
                        RRMSE         -212936   0         1104113   2205437   5656681   4151289
90      70-100     100  EB estimator  -108767   0111208   0230315   0212247   0353129   3625557
                        bias          000016    0086      0177169   0425532   0314714   1092655
                        MSE           0         0         0159682   41595285  3074073   12E+11
                        RRMSE         -909131   0         0557622   677E+09   9311925   706E+11
Appendix 4. Result of EB estimation (II) with ZINB

P(Y=0)  Actual     r    Statistic     min       Q1        median    mean      Q3        max
10      10-3333    100  EB estimator  0267752   1500256   3195861   4280907   5833922   2220705
                        bias          0000603   0485515   0882468   1315721   1750173   8704672
                        MSE           0         010933    0168797   0450626   0369775   360843
                        RRMSE         0         0095932   0135647   023675    0239669   5518468
20      1333-3667  100  EB estimator  105E-08   0914898   221594    3017228   401361    1815694
                        bias          0000368   0383426   0780984   1105029   1496623   7906918
                        MSE           0         0126202   0201463   0428006   0414597   1734815
                        RRMSE         0         0142648   020709    0320663   0395479   3177943
30      20-5333    100  EB estimator  0132041   0719086   1523909   2308745   3012309   1228058
                        bias          0000508   0332427   0680187   0928947   1254604   6314973
                        MSE           0         0156942   0277017   0716543   0590466   7469014
                        RRMSE         0         0203913   0311937   0506882   0615401   3500387
40      2333-5667  100  EB estimator  105E-08   0574265   1209034   1742928   2368713   104953
                        bias          0000564   0268248   0544049   0771741   1067061   4889872
                        MSE           0         0181557   0284338   0549835   0498521   423089
                        RRMSE         0         0270309   0405926   0606317   0766631   5394515
50      2333-6333  100  EB estimator  105E-08   0426701   1033816   133848    1906961   8018962
                        bias          0000453   0224726   0454522   0661709   0900005   4414442
                        MSE           0         0194818   0334706   094973    0711939   7997074
                        RRMSE         0         0316402   0576343   6561235   1240175   13388294
60      30-70      100  EB estimator  105E-08   030085    0645848   0985327   1154975   728326
                        bias          62E-05    0190886   0406245   0567657   074167    3923952
                        MSE           0         0078006   0376514   0749436   0804116   3426488
                        RRMSE         0         0258286   0698814   2340612   1714808   3308816
70      4333-7333  100  EB estimator  105E-08   105E-08   0341315   0677841   1         5005491
                        bias          979E-05   0128017   0358257   0487174   0654423   3733981
                        MSE           0         0         0255331   1501268   1132152   264456
                        RRMSE         0         0         0688797   1346552   2500825   7899681
80      5333-90    100  EB estimator  105E-08   105E-08   0142906   0445315   0859305   5
                        bias          0000161   0083397   0272773   0392826   0557213   3532556
                        MSE           0         0         0         1755486   1452962   1132741
                        RRMSE         0         0         0         7335062   3311711   3786684
90      70-100     100  EB estimator  1E-277    105E-08   105E-08   0225165   0135512   3
                        bias          0000495   0054221   0153374   027819    0350213   2736904
                        MSE           0         0         0         2954908   152E-06   613E+08
                        RRMSE         0         0         0         12E+278   416189    16E+281
Appendix 5. Program syntax for generating data

data b; /* generate x1 (covariate) and ei */
  input x1;
  cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate poisson-gamma with excess zero */
  do kk=1 to 30;
    set b;
    tetha = rangam(1,1);
    lambda = -log(0.1); /* desired probability of a zero value (0.1-0.9) */
    starlambda = log(lambda*tetha);
    output;
  end;
run;

proc reg;
  model starlambda = x1;
  ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;

data _null_;
  set work.betha_lr_t;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
run;

data d;
  do kk=1 to 30;
    set e;
    mu = exp(&Intercept + &x1*x1);
    parmlambda = mu*tetha;
    ypoi = rand('poisson', parmlambda);
    output;
  end;
run;

ods trace on;
/* to take percent zero on data */
proc freq data=d;
  tables ypoi;
  ods output OneWayFreqs=work.zero;
run;

data zero;
  set zero;
  keep percent;
run;

proc transpose data=zero out=zero1;
run;

data _null_;
  set work.zero1;
  call symput ('pctz', col1);
run;

data d;
  set d;
  pzero=&pctz;
  r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
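The value assigned to lambda in the listing above can be motivated by a short worked equation. This is a sketch of the reasoning rather than a statement from the original text: the symbol p_0 is notation introduced here for the desired probability of a zero count, which the listing sets by hand to a value between 0.1 and 0.9.

```latex
% p_0 is notation assumed here for the target probability of a zero count.
\begin{align*}
Y \sim \mathrm{Poisson}(\lambda) &\;\Rightarrow\; P(Y = 0) = e^{-\lambda}, \\
P(Y = 0) = p_0 &\;\Rightarrow\; \lambda = -\log p_0, \\
p_0 = 0.1 &\;\Rightarrow\; \lambda = -\log(0.1) \approx 2.303 .
\end{align*}
```

Because the Poisson mean is then multiplied by the gamma draw tetha, the share of zeros actually realised in each generated data set need not equal p_0, which is presumably why the listing measures it separately and stores it as pzero.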
Appendix 6. Program syntax for EB with NBR

%macro sae_nb;
%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
  set work.betha_nb_t;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
  call symput ('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
  bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
  set work.duga_nb1;
  if ^(u=&x) then delete;
run;

data work.jacknb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc genmod data=work.jacknb&h;
  /* output p out=sas.yi_est */
  model ypoi = x1 / dist = nb link=log;
  ods output parameterestimates=work.betha_est_nb&h (keep=parameter Estimate);
run;

proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;

data _null_;
  set work.betha_est_nbt&h;
  call symput ('Intercept_', col1);
  call symput ('x1_', col2);
  call symput ('Dispersion_', col3);
run;

data work.duganb&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
  w_b_&h=mu_hat_b_&h / (mu_hat_b_&h + &Dispersion_);
  teta_hat_&h=w_b_&h * ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
  beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
  merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5 work.duganb6
        work.duganb7 work.duganb8 work.duganb9 work.duganb10 work.duganb11 work.duganb12
        work.duganb13 work.duganb14 work.duganb15 work.duganb16 work.duganb17
        work.duganb18 work.duganb19 work.duganb20 work.duganb21 work.duganb22
        work.duganb23 work.duganb24 work.duganb25 work.duganb26 work.duganb27
        work.duganb28 work.duganb29 work.duganb30;
  by kk;
run;

data work.mse_nb_j;
  set work.mse_nb_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
  ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
  merge work.d work.mse_nb_j;
  keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb BASE=work.hasilnb1 appendver=v6;
run;

%END;
%mend;

%sae_nb;
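The quantities computed in the listing above can be summarised as equations. This is a sketch under stated assumptions, not a formula quoted from the text: the symbols are notation introduced here, and the divisions and powers are read from the usual shrinkage and jackknife forms, since the arithmetic operators in the listing are not fully legible.

```latex
% Notation assumed here (mapped to the listing's variable names):
%   y_i = ypoi, \hat\mu_i = mu_hat_b, \hat\alpha = Dispersion,
%   \hat\theta_i^{EB} = teta_hat_bayes, g_{1i} = g1,
%   m = 30 units indexed by kk, subscript (-h) = model refitted with unit h deleted.
\begin{align*}
\hat w_i &= \frac{\hat\mu_i}{\hat\mu_i + \hat\alpha}, \\
\hat\theta_i^{EB} &= \hat w_i\, y_i + (1 - \hat w_i)\,\hat\mu_i, \\
g_{1i} &= \frac{\hat\alpha + y_i}{(\hat\mu_i + \hat\alpha)^2}, \\
\hat M_{2i} &= \frac{m-1}{m}\sum_{h=1}^{m}\bigl(\hat\theta_{i(-h)}^{EB} - \hat\theta_i^{EB}\bigr)^2, \\
\hat M_{1i} &= g_{1i} - \frac{m-1}{m}\sum_{h=1}^{m}\bigl(g_{1i(-h)} - g_{1i}\bigr), \\
\mathrm{MSE}_J\bigl(\hat\theta_i^{EB}\bigr) &= \hat M_{1i} + \hat M_{2i}, \\
\mathrm{RRMSE}_i &= \frac{\sqrt{\mathrm{MSE}_J\bigl(\hat\theta_i^{EB}\bigr)}}{\hat\theta_i^{EB}} .
\end{align*}
```

Under this reading, the first term subtracts a jackknife bias correction from g1 and is therefore not guaranteed to be positive, which is one way negative MSE entries can arise in the result tables.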
Appendix 7. Program syntax for EB with ZINB

%macro sae_zinb;

%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

proc countreg data=a;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
  set work.pe_t;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
  call symput ('Inf_Intercept', col3);
  call symput ('Inf_x1', col4);
  call symput ('_Alpha', col5);
run;

data work.duga;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
  bias_b=abs(teta_hat_bayes-parmlamdha);
run;

proc append data=work.duga base=work.duga1;
run;

%do h=1 %to 30;

data work.d;
  set work.duga1;
  if ^(u=&x) then delete;
run;

data work.jackzinb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
  set work.betha_est_ZINBt&h;
  call symput ('Intercept', col1);
  call symput ('x1', col2);
  call symput ('Inf_Intercept', col3);
  call symput ('Inf_x1', col4);
  call symput ('_Alpha', col5);
run;

data work.dugaZINB&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept + &x1*x1);
  /* mu_hat_b_&h= &b_o- + &b_1- * x1 */
  w_b_&h=mu_hat_b_&h / (mu_hat_b_&h + (&_Alpha));
  teta_hat_&h=w_b_&h * ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  /* g1_&h =((mu_hat_b_&h**2*&alpha_)**2)*(&alpha_+y_i)/((mu_hat_b_&h**2*&alpha_)+mu_hat_b_&h)**2 */
  g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
  /* g1_&h =(A**2)*(&k- + y_i)/( a +mu_hat_b)**2 */
  beda_g_&h=g1_&h-g1;
run;

data work.mse_ZINB_j;
  merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5 work.dugaZINB6
        work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10 work.dugaZINB11 work.dugaZINB12
        work.dugaZINB13 work.dugaZINB14 work.dugaZINB15 work.dugaZINB16 work.dugaZINB17
        work.dugaZINB18 work.dugaZINB19 work.dugaZINB20 work.dugaZINB21 work.dugaZINB22
        work.dugaZINB23 work.dugaZINB24 work.dugaZINB25 work.dugaZINB26 work.dugaZINB27
        work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
  by kk;
run;

data work.mse_ZINB_j;
  set work.mse_ZINB_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
  merge work.d work.mse_ZINB_j;
  keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB BASE=work.hasilZINB1;
run;

%END;
%mend;

%sae_zinb;
7
Appendix 1 Result of EB estimation with NBR
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0426011 1525665 3188832 4252666 5752756 205939
bias 0000446 05164 0878579 1315093 1721091 8704671MSE 0040547 0109118 0159448 0333613 0335256 4167064RRMSE 0041258 0100045 01356 018188 0220426 0576793
20 1333-3667 100 EB estimator 0342831 1013993 2218265 2984668 3953417 1815693bias 0000587 0413611 079407 1100373 1454889 7906915MSE 0055631 0131969 0196963 0353033 0386291 3778251RRMSE 0070449 015421 0205182 0262006 0352726 0788718
30 20-5333 100 EB estimator 0323311 0836545 1562163 2263684 2918741 1214482bias 0000151 0372382 067041 0916482 122012 5950225MSE 0074364 0163462 0231014 0400207 0432371 5250254RRMSE 0102324 0214697 0299247 0361013 0474077 1192032
40 2333-5667 100 EB estimator 024882 064963 1219656 17107 2248716 930007bias 0000564 0293602 0549809 0757937 1007851 486688MSE -100569 0194196 0271669 041875 045917 3239598RRMSE 0123605 0300339 0422426 0503566 0642418 2202294
50 2333-6333 100 EB estimator 0122548 0570083 1028619 1291758 1728067 6750472bias 000029 0250747 0453265 0622838 0803185 4009352MSE -237643 0235733 0306641 0452955 05091 3652167RRMSE 0038956 0412708 0588924 0717336 0844735 3240156
60 30-70 100 EB estimator -077338 044443 0699758 0944038 1131071 6323352bias 0000452 020433 0398131 0534095 0679938 3848209MSE -749011 0254097 0330078 -12875 0539873 2354887RRMSE -663045 051763 0813734 -038057 1287528 1767434
70 4333-7333 100 EB estimator -33274 0249515 0442513 0659375 0922519 9258959bias 0000375 0155154 0316124 0476883 0588926 8475103MSE -7513075 0235378 0402092 2536714 0876569 6051162RRMSE -10741 0704796 1355566 -121606 3040291 3332419
80 5333-90 100 EB estimator -232889 017621 0305365 0569959 0576346 6303601
bias 0000395 0116669 0254473 1091172 0497898 6297454MSE -6E+09 -016583 0301527 -584495 5718409 185E+09RRMSE -212936 0927338 2115163 3094627 1359703 4151289
90 70-100 100 EB estimator -108767 0111208 0230315 0212247 0353129 3625557bias 000016 0086 0177169 0425532 0314714 1092655MSE -38E+09 -130817 0159682 39135606 3074073 12E+11
RRMSE -909131 1647188 6639631 116E+10 1585472 706E+11
8
Appendix 2 Result of EB estimation with ZINB
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0267752 1500256 3195861 4280907 5833922 2220705
bias 0000603 0485515 0882468 1315721 1750173 8704672MSE -053954 010933 0168797 0449506 0369775 360843RRMSE 0022947 0096443 0136424 0238099 0241955 5518468
20 1333-3667 100 EB estimator 105E-08 0914898 221594 3017228 401361 1815694bias 0000368 0383426 0780984 1105029 1496623 7906918MSE -07309 0126202 0201463 0425844 0414597 1734815RRMSE 0021807 0144983 0210692 0326097 0401786 3177943
30 20-5333 100 EB estimator 0132041 0719086 1523909 2308745 3012309 1228058bias 0000508 0332427 0680187 0928947 1254604 6314973MSE -229891 0156942 0277017 0707983 0590466 7469014RRMSE 0023998 0210095 0317195 0519524 0618802 3500387
40 2333-5667 100 EB estimator 105E-08 0574265 1209034 1742928 2368713 104953bias 0000564 0268248 0544049 0771741 1067061 4889872MSE -125713 0181557 0284338 0540615 0498521 423089RRMSE 0054916 028362 0420396 0630776 0778033 5394515
50 2333-6333 100 EB estimator 105E-08 0426701 1033816 133848 1906961 8018962bias 0000453 0224726 0454522 0661709 0900005 4414442
MSE -181856 0194818 0334706 0859252 0711939 7997074RRMSE 0026206 0387294 0662251 7322807 1312302 13388294
60 30-70 100 EB estimator 105E-08 030085 0645848 0985327 1154975 728326bias 62E-05 0190886 0406245 0567657 074167 3923952MSE -34589 0078006 0376514 060793 0804116 3426488RRMSE 000461 0502807 1033578 2981671 2012552 3308816
70 4333-7333 100 EB estimator 105E-08 105E-08 0341315 0677841 1 5005491bias 979E-05 0128017 0358257 0487174 0654423 3733981MSE -142213 -001433 0255331 0584152 1132152 264456RRMSE 0064209 0847956 1942286 2181192 4589042 7899681
80 5333-90 100 EB estimator 105E-08 105E-08 0142906 0445315 0859305 5bias 0000161 0083397 0272773 0392826 0557213 3532556MSE -10651 -56E-05 -14E-07 -127819 1452962 1132741RRMSE 0063244 1475413 3754705 162697 9221163 3786684
90 70-100 100 EB estimator 1E-277 105E-08 105E-08 0225165 0135512 3bias 0000495 0054221 0153374 027819 0350213 2736904MSE -175652 -33E-05 -1E-06 2954790 152E-06 613E+08
RRMSE 0040681 4059441 6095076 35E+278 5569021 16E+281
9
Appendix 3 Result of EB estimation (II) with NBR
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0426011 1525665 3188832 4252666 5752756 205939
bias 0000446 05164 0878579 1315093 1721091 8704671MSE 0040547 0109118 0159448 0333613 0335256 4167064RRMSE 0041258 0100045 01356 018188 0220426 0576793
20 1333-3667 100 EB estimator 0342831 1013993 2218265 2984668 3953417 1815693bias 0000587 0413611 079407 1100373 1454889 7906915MSE 0055631 0131969 0196963 0353033 0386291 3778251RRMSE 0070449 015421 0205182 0262006 0352726 0788718
30 20-5333 100 EB estimator 0323311 0836545 1562163 2263684 2918741 1214482bias 0000151 0372382 067041 0916482 122012 5950225MSE 0074364 0163462 0231014 0400207 0432371 5250254RRMSE 0102324 0214697 0299247 0361013 0474077 1192032
40 2333-5667 100 EB estimator 024882 064963 1219656 17107 2248716 930007bias 0000564 0293602 0549809 0757937 1007851 486688MSE 0 0194196 0271669 0419181 045917 3239598RRMSE 0 0300116 0422209 0502895 0641904 2202294
50 2333-6333 100 EB estimator 0122548 0570083 1028619 1291758 1728067 6750472bias 000029 0250747 0453265 0622838 0803185 4009352MSE 0 0235733 0306641 0456258 05091 3652167RRMSE 0 0410357 0585765 0712314 0841838 3240156
60 30-70 100 EB estimator -077338 044443 0699758 0944038 1131071 6323352bias 0000452 020433 0398131 0534095 0679938 3848209MSE 0 0254097 0330078 2619677 0539873 2354887RRMSE -663045 0448118 0750369 -034911 1209918 1767434
70 4333-7333 100 EB estimator -33274 0249515 0442513 0659375 0922519 9258959bias 0000375 0155154 0316124 0476883 0588926 8475103MSE 0 0235378 0402092 9500073 0876569 6051162RRMSE -10741 0288999 0995659 -100163 2527784 3332419
80 5333-90 100 EB estimator -232889 017621 0305365 0569959 0576346 6303601bias 0000395 0116669 0254473 1091172 0497898 6297454MSE 0 0 0301527 1444250 5718409 185E+09RRMSE -212936 0 1104113 2205437 5656681 4151289
90 70-100 100 EB estimator -108767 0111208 0230315 0212247 0353129 3625557bias 000016 0086 0177169 0425532 0314714 1092655
MSE 0 0 0159682 41595285 3074073 12E+11
RRMSE -909131 0 0557622 677E+09 9311925 706E+11
10
Appendix 4 Result of EB estimation (II) with ZINB
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0267752 1500256 3195861 4280907 5833922 2220705
bias 0000603 0485515 0882468 1315721 1750173 8704672MSE 0 010933 0168797 0450626 0369775 360843RRMSE 0 0095932 0135647 023675 0239669 5518468
20 1333-3667 100 EB estimator 105E-08 0914898 221594 3017228 401361 1815694bias 0000368 0383426 0780984 1105029 1496623 7906918MSE 0 0126202 0201463 0428006 0414597 1734815RRMSE 0 0142648 020709 0320663 0395479 3177943
30 20-5333 100 EB estimator 0132041 0719086 1523909 2308745 3012309 1228058bias 0000508 0332427 0680187 0928947 1254604 6314973MSE 0 0156942 0277017 0716543 0590466 7469014RRMSE 0 0203913 0311937 0506882 0615401 3500387
40 2333-5667 100 EB estimator 105E-08 0574265 1209034 1742928 2368713 104953bias 0000564 0268248 0544049 0771741 1067061 4889872MSE 0 0181557 0284338 0549835 0498521 423089RRMSE 0 0270309 0405926 0606317 0766631 5394515
50 2333-6333 100 EB estimator 105E-08 0426701 1033816 133848 1906961 8018962bias 0000453 0224726 0454522 0661709 0900005 4414442MSE 0 0194818 0334706 094973 0711939 7997074RRMSE 0 0316402 0576343 6561235 1240175 13388294
60 30-70 100 EB estimator 105E-08 030085 0645848 0985327 1154975 728326bias 62E-05 0190886 0406245 0567657 074167 3923952MSE 0 0078006 0376514 0749436 0804116 3426488RRMSE 0 0258286 0698814 2340612 1714808 3308816
70 4333-7333 100 EB estimator 105E-08 105E-08 0341315 0677841 1 5005491bias 979E-05 0128017 0358257 0487174 0654423 3733981MSE 0 0 0255331 1501268 1132152 264456RRMSE 0 0 0688797 1346552 2500825 7899681
80 5333-90 100 EB estimator 105E-08 105E-08 0142906 0445315 0859305 5bias 0000161 0083397 0272773 0392826 0557213 3532556MSE 0 0 0 1755486 1452962 1132741RRMSE 0 0 0 7335062 3311711 3786684
90 70-100 100 EB estimator 1E-277 105E-08 105E-08 0225165 0135512 3bias 0000495 0054221 0153374 027819 0350213 2736904MSE 0 0 0 2954908 152E-06 613E+08
RRMSE 0 0 0 12E+278 416189 16E+281
11
Appendix 5 Syntax program for generate data
data b generate x1(covariate) and ei input x1cards0222831971100013131702314625252218171412202210run
macro bangkit_datado r=1 to 100
data egenerate poisson-gamma with excess zerodo kk=1 to 30set btetha = rangam(11)lambda = -log(01) peluang munculnya nilai nol yang diinginkan (01-09)starlambda = log(lambdatetha)output endrun
proc regmodel starlambda = x1 ods output ParameterEstimates=workbetha_lr (keep=Parameter Estimate)run
proc transpose data=workbetha_lr out=workbetha_lr_t
12
Appendix 5 Syntax program for generate data (continued)
rundata _null_set workbetha_lr_tcall symput (Intercept col1)call symput (x1 col2)run
data ddo kk=1 to 30set emu = exp(ampIntercept + ampx1x1)parmlambda = mutethaypoi = rand(poissonparmlambda)output endrun
ods trace onto take percent zero on dataproc freq data=dtables ypoi ods output OneWayFreqs=workzerorundata zeroset zerokeep percentrunproc transpose data=zero out=zero1 rundata _null_set workzero1call symput (pctz col1)rundata dset dpzero=amppctzr=amprrun
proc append data=d base=d1run
endmend
bangkit_data
13
Appendix 6 Syntax program EB with NBR
macro sae_nbdo x=1 to 900
data workaset workeif ^(u=ampx) then deleterun
this genmod procedure estimates the response without zero-inflation proc genmod data=amodel ypoi = x1 dist=nb link=logods output ParameterEstimates=workbetha_nb (keep=Parameter Estimate)run
proc transpose data=workbetha_nb out=workbetha_nb_trun
data _null_set workbetha_nb_tcall symput (Intercept col1)call symput (x1 col2)call symput (Dispersion col3)run
EB with negbin-regdata workduga_nbset amu_hat_b=exp(ampIntercept + ampx1x1) w_bayes=mu_hat_b(mu_hat_b + ampDispersion)teta_hat_bayes=w_bayesypoi+(1-w_bayes)mu_hat_bg1=(ampDispersion+ypoi)((mu_hat_b+ampDispersion)2)bias_b=abs(teta_hat_bayes-parmlambda)run
proc append data=workduga_nb base=workduga_nb1run
jacknifedo h=1 to 30
data workdset workduga_nb1if ^(u=ampx) then deleterundata workjacknbamphset workdif u=ampxif kk=amph then deleterun
proc genmod data=workjacknbamph output p out=sasyi_estmodel ypoi = x1 dist = nb link=logods output parameterestimates=workbetha_est_nbamph (keep=parameter Estimate)
14
Appendix 6 Syntax program EB with NBR (continued)
runproc transpose data=workbetha_est_nbamph out=workbetha_est_nbtamphrundata _null_set workbetha_est_nbtamphcall symput (Intercept_ col1)call symput (x1_ col2)call symput (Dispersion_ col3)run
data workduganbamphset workdmu_hat_b_amph=exp(ampIntercept_ + ampx1_x1)w_b_amph=mu_hat_b_amph (mu_hat_b_amph + ampDispersion_)teta_hat_amph=w_b_amph ypoi+(1-w_b_amph)mu_hat_b_amphdelta_amph=(teta_hat_amph - teta_hat_bayes)2g1_amph=(ampDispersion_+ypoi)((mu_hat_b_amph+ampDispersion_)2)beda_g_amph=g1_amph-g1run
data workmse_nb_jmerge workduganb1 workduganb2 workduganb3 workduganb4 workduganb5 workduganb6 workduganb7 workduganb8 workduganb9 workduganb10 workduganb11 workduganb12workduganb13 workduganb14 workduganb15 workduganb16 workduganb17workduganb18 workduganb19 workduganb20 workduganb21 workduganb22workduganb23 workduganb24 workduganb25 workduganb26 workduganb27workduganb28 workduganb29 workduganb30by kkrun
data workmse_nb_jset workmse_nb_jt_sum=0g_sum=0do j=1 to 30g_sum=g_sum+beda_g_ampjt_sum=t_sum+delta_ampjendm2i=((30-1)30)t_summ1i=g1-((30-1)30)g_summse_j_b=m1i+m2irrmse_j_b=sqrt(mse_j_b)teta_hat_bayesul = ampxrun
proc append data=workmse_nb_j base=workmse_nb_j1run
data workhasilnbmerge workd workmse_nb_j keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_b
15
Appendix 6 Syntax program EB with NBR (continued)
run
ods listingoption nodate ls=130 ps=130ods html
end
proc append data=workhasilnb BASE=workhasilnb1 appendver=v6run
ENDmend
sae_nb
16
Appendix 7 Syntax program EB with ZINB
macro sae_zinb
do x=1 to 900
data workaset work eif ^(u=ampx) then deleterun
proc countreg data=amodel ypoi=x1dist=zinb method=qnzeromodel ypoi ~ x1ods output ParameterEstimates=workpe(keep=Parameter Estimate)run
proc transpose data=workpe out=workpe_trun
data _null_set workpe_tcall symput (Intercept col1)call symput (x1 col2)call symput (Inf_Intercept col3)call symput (Inf_x1 col4)call symput (_Alpha col5)run
data workdugaset amu_hat_b=exp(ampIntercept + ampx1x1) w_bayes=mu_hat_b(mu_hat_b + amp_Alpha)teta_hat_bayes=w_bayesypoi+(1-w_bayes)mu_hat_bg1=(amp_Alpha+ypoi)((mu_hat_b+amp_Alpha)2)bias_b=abs(teta_hat_bayes-parmlamdha)
run
proc append data=workduga base=workduga1run
do h=1 to 30
data workdset workduga1if ^(u=ampx) then deleterundata workjackzinbamphset workdif u=ampxif kk=amph then deleterun
proc countreg data=jackzinbamphmodel ypoi=x1dist=zinb method=qnzeromodel ypoi ~ x1ods output ParameterEstimates=workbetha_est_ZINBamph
17
Appendix 7 Syntax program EB with ZINB (continued)
(keep=Parameter Estimate)run
proc transpose data=workbetha_est_ZINBamph out=workbetha_est_ZINBtamphrun
data _null_set workbetha_est_ZINBtamphcall symput (Intercept col1)call symput (x1 col2)call symput (Inf_Intercept col3)call symput (Inf_x1 col4)call symput (_Alpha col5)run
data workdugaZINBamphset workdmu_hat_b_amph=exp(ampIntercept + ampx1x1)mu_hat_b_amph= ampb_o- + ampb_1- x1w_b_amph=mu_hat_b_amph (mu_hat_b_amph + (amp_Alpha))teta_hat_amph=w_b_amph ypoi+(1-w_b_amph)mu_hat_b_amphdelta_amph=(teta_hat_amph - teta_hat_bayes)2
g1_amph =((mu_hat_b_amph2ampalpha_)2)(ampalpha_+y_i)((mu_hat_b_amph2ampalpha_)+mu_hat_b_amph)2
g1_amph=(amp_Alpha+ypoi)((mu_hat_b_amph+amp_Alpha)2)
g1_amph =(A2)(ampk- + y_i)( a +mu_hat_b)2
beda_g_amph=g1_amph-g1run
data workmse_ZINB_jmerge workdugaZINB1 workdugaZINB2 workdugaZINB3 workdugaZINB4 workdugaZINB5 workdugaZINB6 workdugaZINB7 workdugaZINB8 workdugaZINB9 workdugaZINB10 workdugaZINB11 workdugaZINB12workdugaZINB13 workdugaZINB14 workdugaZINB15 workdugaZINB16 workdugaZINB17workdugaZINB18 workdugaZINB19 workdugaZINB20 workdugaZINB21 workdugaZINB22workdugaZINB23 workdugaZINB24 workdugaZINB25 workdugaZINB26 workdugaZINB27workdugaZINB28 workdugaZINB29 workdugaZINB30by kkrun
data workmse_ZINB_jset workmse_ZINB_jt_sum=0g_sum=0do j=1 to 30g_sum=g_sum+beda_g_ampjt_sum=t_sum+delta_ampj
18
Appendix 7 Syntax program EB with ZINB (continued)
endm2i=((30-1)30)t_summ1i=g1-((30-1)30)g_summse_j_b=m1i+m2irrmse_j_b=sqrt(mse_j_b)teta_hat_bayesrun
data workhasilZINBmerge workd workmse_ZINB_j keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes teta_hat_bayes bias_b mse_j_b rrmse_j_brunods listingoption nodate ls=130 ps=130ods html
end
proc append data=workhasilZINB BASE=workhasilZINB1run
ENDmend
sae_zinb
8
Appendix 2 Result of EB estimation with ZINB
P(Y=0) Aktual r min Q1 median mean Q3 max10 10-3333 100 EB estimator 0267752 1500256 3195861 4280907 5833922 2220705
bias 0000603 0485515 0882468 1315721 1750173 8704672MSE -053954 010933 0168797 0449506 0369775 360843RRMSE 0022947 0096443 0136424 0238099 0241955 5518468
20 1333-3667 100 EB estimator 105E-08 0914898 221594 3017228 401361 1815694bias 0000368 0383426 0780984 1105029 1496623 7906918MSE -07309 0126202 0201463 0425844 0414597 1734815RRMSE 0021807 0144983 0210692 0326097 0401786 3177943
30 20-5333 100 EB estimator 0132041 0719086 1523909 2308745 3012309 1228058bias 0000508 0332427 0680187 0928947 1254604 6314973MSE -229891 0156942 0277017 0707983 0590466 7469014RRMSE 0023998 0210095 0317195 0519524 0618802 3500387
40 2333-5667 100 EB estimator 105E-08 0574265 1209034 1742928 2368713 104953bias 0000564 0268248 0544049 0771741 1067061 4889872MSE -125713 0181557 0284338 0540615 0498521 423089RRMSE 0054916 028362 0420396 0630776 0778033 5394515
50 2333-6333 100 EB estimator 105E-08 0426701 1033816 133848 1906961 8018962bias 0000453 0224726 0454522 0661709 0900005 4414442
MSE -181856 0194818 0334706 0859252 0711939 7997074RRMSE 0026206 0387294 0662251 7322807 1312302 13388294
60 30-70 100 EB estimator 105E-08 030085 0645848 0985327 1154975 728326bias 62E-05 0190886 0406245 0567657 074167 3923952MSE -34589 0078006 0376514 060793 0804116 3426488RRMSE 000461 0502807 1033578 2981671 2012552 3308816
70 4333-7333 100 EB estimator 105E-08 105E-08 0341315 0677841 1 5005491bias 979E-05 0128017 0358257 0487174 0654423 3733981MSE -142213 -001433 0255331 0584152 1132152 264456RRMSE 0064209 0847956 1942286 2181192 4589042 7899681
80 5333-90 100 EB estimator 105E-08 105E-08 0142906 0445315 0859305 5bias 0000161 0083397 0272773 0392826 0557213 3532556MSE -10651 -56E-05 -14E-07 -127819 1452962 1132741RRMSE 0063244 1475413 3754705 162697 9221163 3786684
90 70-100 100 EB estimator 1E-277 105E-08 105E-08 0225165 0135512 3bias 0000495 0054221 0153374 027819 0350213 2736904MSE -175652 -33E-05 -1E-06 2954790 152E-06 613E+08
RRMSE 0040681 4059441 6095076 35E+278 5569021 16E+281
9
Appendix 3 Results of EB estimation (II) with NBR

P(Y=0)  Actual     r    Statistic     min       Q1        median    mean      Q3        max
10      10-3333    100  EB estimator  0426011   1525665   3188832   4252666   5752756   205939
                        bias          0000446   05164     0878579   1315093   1721091   8704671
                        MSE           0040547   0109118   0159448   0333613   0335256   4167064
                        RRMSE         0041258   0100045   01356     018188    0220426   0576793
20      1333-3667  100  EB estimator  0342831   1013993   2218265   2984668   3953417   1815693
                        bias          0000587   0413611   079407    1100373   1454889   7906915
                        MSE           0055631   0131969   0196963   0353033   0386291   3778251
                        RRMSE         0070449   015421    0205182   0262006   0352726   0788718
30      20-5333    100  EB estimator  0323311   0836545   1562163   2263684   2918741   1214482
                        bias          0000151   0372382   067041    0916482   122012    5950225
                        MSE           0074364   0163462   0231014   0400207   0432371   5250254
                        RRMSE         0102324   0214697   0299247   0361013   0474077   1192032
40      2333-5667  100  EB estimator  024882    064963    1219656   17107     2248716   930007
                        bias          0000564   0293602   0549809   0757937   1007851   486688
                        MSE           0         0194196   0271669   0419181   045917    3239598
                        RRMSE         0         0300116   0422209   0502895   0641904   2202294
50      2333-6333  100  EB estimator  0122548   0570083   1028619   1291758   1728067   6750472
                        bias          000029    0250747   0453265   0622838   0803185   4009352
                        MSE           0         0235733   0306641   0456258   05091     3652167
                        RRMSE         0         0410357   0585765   0712314   0841838   3240156
60      30-70      100  EB estimator  -077338   044443    0699758   0944038   1131071   6323352
                        bias          0000452   020433    0398131   0534095   0679938   3848209
                        MSE           0         0254097   0330078   2619677   0539873   2354887
                        RRMSE         -663045   0448118   0750369   -034911   1209918   1767434
70      4333-7333  100  EB estimator  -33274    0249515   0442513   0659375   0922519   9258959
                        bias          0000375   0155154   0316124   0476883   0588926   8475103
                        MSE           0         0235378   0402092   9500073   0876569   6051162
                        RRMSE         -10741    0288999   0995659   -100163   2527784   3332419
80      5333-90    100  EB estimator  -232889   017621    0305365   0569959   0576346   6303601
                        bias          0000395   0116669   0254473   1091172   0497898   6297454
                        MSE           0         0         0301527   1444250   5718409   185E+09
                        RRMSE         -212936   0         1104113   2205437   5656681   4151289
90      70-100     100  EB estimator  -108767   0111208   0230315   0212247   0353129   3625557
                        bias          000016    0086      0177169   0425532   0314714   1092655
                        MSE           0         0         0159682   41595285  3074073   12E+11
                        RRMSE         -909131   0         0557622   677E+09   9311925   706E+11
Appendix 4 Results of EB estimation (II) with ZINB

P(Y=0)  Actual     r    Statistic     min       Q1        median    mean      Q3        max
10      10-3333    100  EB estimator  0267752   1500256   3195861   4280907   5833922   2220705
                        bias          0000603   0485515   0882468   1315721   1750173   8704672
                        MSE           0         010933    0168797   0450626   0369775   360843
                        RRMSE         0         0095932   0135647   023675    0239669   5518468
20      1333-3667  100  EB estimator  105E-08   0914898   221594    3017228   401361    1815694
                        bias          0000368   0383426   0780984   1105029   1496623   7906918
                        MSE           0         0126202   0201463   0428006   0414597   1734815
                        RRMSE         0         0142648   020709    0320663   0395479   3177943
30      20-5333    100  EB estimator  0132041   0719086   1523909   2308745   3012309   1228058
                        bias          0000508   0332427   0680187   0928947   1254604   6314973
                        MSE           0         0156942   0277017   0716543   0590466   7469014
                        RRMSE         0         0203913   0311937   0506882   0615401   3500387
40      2333-5667  100  EB estimator  105E-08   0574265   1209034   1742928   2368713   104953
                        bias          0000564   0268248   0544049   0771741   1067061   4889872
                        MSE           0         0181557   0284338   0549835   0498521   423089
                        RRMSE         0         0270309   0405926   0606317   0766631   5394515
50      2333-6333  100  EB estimator  105E-08   0426701   1033816   133848    1906961   8018962
                        bias          0000453   0224726   0454522   0661709   0900005   4414442
                        MSE           0         0194818   0334706   094973    0711939   7997074
                        RRMSE         0         0316402   0576343   6561235   1240175   13388294
60      30-70      100  EB estimator  105E-08   030085    0645848   0985327   1154975   728326
                        bias          62E-05    0190886   0406245   0567657   074167    3923952
                        MSE           0         0078006   0376514   0749436   0804116   3426488
                        RRMSE         0         0258286   0698814   2340612   1714808   3308816
70      4333-7333  100  EB estimator  105E-08   105E-08   0341315   0677841   1         5005491
                        bias          979E-05   0128017   0358257   0487174   0654423   3733981
                        MSE           0         0         0255331   1501268   1132152   264456
                        RRMSE         0         0         0688797   1346552   2500825   7899681
80      5333-90    100  EB estimator  105E-08   105E-08   0142906   0445315   0859305   5
                        bias          0000161   0083397   0272773   0392826   0557213   3532556
                        MSE           0         0         0         1755486   1452962   1132741
                        RRMSE         0         0         0         7335062   3311711   3786684
90      70-100     100  EB estimator  1E-277    105E-08   105E-08   0225165   0135512   3
                        bias          0000495   0054221   0153374   027819    0350213   2736904
                        MSE           0         0         0         2954908   152E-06   613E+08
                        RRMSE         0         0         0         12E+278   416189    16E+281
Appendix 5 Syntax program for generating data

data b; /* generate x1 (covariate) and ei */
  input x1;
  cards;
0222831971100013131702314625252218171412202210
;
run;

%macro bangkit_data;
%do r=1 %to 100;

data e; /* generate poisson-gamma with excess zero */
  do kk=1 to 30;
    set b;
    tetha = rangam(1,1);
    lambda = -log(0.1); /* desired probability of a zero value (0.1-0.9) */
    starlambda = log(lambda/tetha);
    output;
  end;
run;

proc reg;
  model starlambda = x1;
  ods output ParameterEstimates=work.betha_lr (keep=Parameter Estimate);
run;

proc transpose data=work.betha_lr out=work.betha_lr_t;
run;

data _null_;
  set work.betha_lr_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
run;

data d;
  do kk=1 to 30;
    set e;
    mu = exp(&Intercept + &x1*x1);
    parmlambda = mu*tetha;
    ypoi = rand('poisson', parmlambda);
    output;
  end;
run;

ods trace on; /* to take percent zero on data */
proc freq data=d;
  tables ypoi;
  ods output OneWayFreqs=work.zero;
run;

data zero;
  set zero;
  keep percent;
run;

proc transpose data=zero out=zero1;
run;

data _null_;
  set work.zero1;
  call symput('pctz', col1);
run;

data d;
  set d;
  pzero=&pctz;
  r=&r;
run;

proc append data=d base=d1;
run;

%end;
%mend;

%bangkit_data;
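The generating scheme in the listing above can be summarised as the following model. This is a reading of the code rather than a statement taken from the thesis text: the symbols $p_0$, $\theta_i$, $\lambda^{*}_i$, $\hat\beta_0$ and $\hat\beta_1$ are introduced here for illustration, and the quotient $\lambda/\theta_i$ and the product $\mu_i\theta_i$ are assumed readings of `log(lambda tetha)` and `mu tetha`.

```latex
\begin{align*}
\theta_i &\sim \mathrm{Gamma}(1,1), \qquad i = 1,\dots,30,\\
\lambda &= -\log p_0, \qquad p_0 \in \{0.1,\dots,0.9\}
          \quad\text{so that } e^{-\lambda} = p_0,\\
\lambda^{*}_i &= \log(\lambda/\theta_i), \qquad
          \lambda^{*}_i = \beta_0 + \beta_1 x_{1i} + e_i
          \;\Rightarrow\; (\hat\beta_0,\hat\beta_1),\\
\mu_i &= \exp(\hat\beta_0 + \hat\beta_1 x_{1i}),\\
y_i \mid \theta_i &\sim \mathrm{Poisson}(\mu_i\theta_i).
\end{align*}
```

Under this reading, `parmlambda` $=\mu_i\theta_i$ is the true area parameter against which `bias_b` is later computed, and `pzero` records the percentage of zeros actually realised in each of the 100 replications.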
Appendix 6 Syntax program EB with NBR

%macro sae_nb;
%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

/* this genmod procedure estimates the response without zero-inflation */
proc genmod data=a;
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_nb (keep=Parameter Estimate);
run;

proc transpose data=work.betha_nb out=work.betha_nb_t;
run;

data _null_;
  set work.betha_nb_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Dispersion', col3);
run;

/* EB with negbin-reg */
data work.duga_nb;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &Dispersion);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&Dispersion+ypoi)/((mu_hat_b+&Dispersion)**2);
  bias_b=abs(teta_hat_bayes-parmlambda);
run;

proc append data=work.duga_nb base=work.duga_nb1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
  set work.duga_nb1;
  if ^(u=&x) then delete;
run;

data work.jacknb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc genmod data=work.jacknb&h;
  /* output p out=sasyi_est */
  model ypoi = x1 / dist=nb link=log;
  ods output ParameterEstimates=work.betha_est_nb&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_nb&h out=work.betha_est_nbt&h;
run;

data _null_;
  set work.betha_est_nbt&h;
  call symput('Intercept_', col1);
  call symput('x1_', col2);
  call symput('Dispersion_', col3);
run;

data work.duganb&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept_ + &x1_*x1);
  w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + &Dispersion_);
  teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  g1_&h=(&Dispersion_+ypoi)/((mu_hat_b_&h+&Dispersion_)**2);
  beda_g_&h=g1_&h-g1;
run;

data work.mse_nb_j;
  merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5 work.duganb6
        work.duganb7 work.duganb8 work.duganb9 work.duganb10 work.duganb11 work.duganb12
        work.duganb13 work.duganb14 work.duganb15 work.duganb16 work.duganb17
        work.duganb18 work.duganb19 work.duganb20 work.duganb21 work.duganb22
        work.duganb23 work.duganb24 work.duganb25 work.duganb26 work.duganb27
        work.duganb28 work.duganb29 work.duganb30;
  by kk;
run;

data work.mse_nb_j;
  set work.mse_nb_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
  ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;

data work.hasilnb;
  merge work.d work.mse_nb_j;
  keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes
       teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb base=work.hasilnb1 appendver=v6;
run;

%end;
%mend;

%sae_nb;
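For reference, the quantities computed in the data steps of the listing above can be written out as follows. This is a sketch read off the variable names in the code (`w_bayes`, `teta_hat_bayes`, `g1`, `m1i`, `m2i`, `mse_j_b`, `rrmse_j_b`); the symbols $\hat\alpha$ for the estimated dispersion, $m = 30$ for the number of areas, and the subscript $-l$ for an estimate obtained with area $l$ deleted are notation assumed here, and the division and power operators are assumed readings of the listing.

```latex
\begin{align*}
\hat\mu_i &= \exp(\hat\beta_0 + \hat\beta_1 x_{1i}), \qquad
\hat w_i = \frac{\hat\mu_i}{\hat\mu_i + \hat\alpha},\\
\hat\theta_i^{EB} &= \hat w_i\, y_i + (1-\hat w_i)\,\hat\mu_i, \qquad
g_{1i} = \frac{\hat\alpha + y_i}{(\hat\mu_i + \hat\alpha)^2},\\
\hat M_{1i} &= g_{1i} - \frac{m-1}{m}\sum_{l=1}^{m}\bigl(g_{1i,-l} - g_{1i}\bigr),\\
\hat M_{2i} &= \frac{m-1}{m}\sum_{l=1}^{m}\bigl(\hat\theta_{i,-l}^{EB} - \hat\theta_i^{EB}\bigr)^2,\\
\mathrm{mse}_J(\hat\theta_i^{EB}) &= \hat M_{1i} + \hat M_{2i}, \qquad
\mathrm{RRMSE}_i = \frac{\sqrt{\mathrm{mse}_J(\hat\theta_i^{EB})}}{\hat\theta_i^{EB}}.
\end{align*}
```

Because $\hat M_{1i}$ subtracts a bias correction from $g_{1i}$, the jackknife MSE is not guaranteed to be positive, and the relative measure becomes unstable when $\hat\theta_i^{EB}$ is close to zero.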
Appendix 7 Syntax program EB with ZINB

%macro sae_zinb;
%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

proc countreg data=a;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
  set work.pe_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Inf_Intercept', col3);
  call symput('Inf_x1', col4);
  call symput('_Alpha', col5);
run;

data work.duga;
  set a;
  mu_hat_b=exp(&Intercept + &x1*x1);
  w_bayes=mu_hat_b/(mu_hat_b + &_Alpha);
  teta_hat_bayes=w_bayes*ypoi+(1-w_bayes)*mu_hat_b;
  g1=(&_Alpha+ypoi)/((mu_hat_b+&_Alpha)**2);
  bias_b=abs(teta_hat_bayes-parmlamdha);
run;

proc append data=work.duga base=work.duga1;
run;

/* jackknife */
%do h=1 %to 30;

data work.d;
  set work.duga1;
  if ^(u=&x) then delete;
run;

data work.jackzinb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
  model ypoi=x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
  set work.betha_est_ZINBt&h;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Inf_Intercept', col3);
  call symput('Inf_x1', col4);
  call symput('_Alpha', col5);
run;

data work.dugaZINB&h;
  set work.d;
  mu_hat_b_&h=exp(&Intercept + &x1*x1);
  /* mu_hat_b_&h = &b_o- + &b_1- * x1; */
  w_b_&h=mu_hat_b_&h/(mu_hat_b_&h + (&_Alpha));
  teta_hat_&h=w_b_&h*ypoi+(1-w_b_&h)*mu_hat_b_&h;
  delta_&h=(teta_hat_&h - teta_hat_bayes)**2;
  /* g1_&h =((mu_hat_b_&h**2*&alpha_)**2)*(&alpha_+y_i)/((mu_hat_b_&h**2*&alpha_)+mu_hat_b_&h)**2; */
  g1_&h=(&_Alpha+ypoi)/((mu_hat_b_&h+&_Alpha)**2);
  /* g1_&h =(A**2)*(&k- + y_i)/( a +mu_hat_b)**2; */
  beda_g_&h=g1_&h-g1;
run;

data work.mse_ZINB_j;
  merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5 work.dugaZINB6
        work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10 work.dugaZINB11 work.dugaZINB12
        work.dugaZINB13 work.dugaZINB14 work.dugaZINB15 work.dugaZINB16 work.dugaZINB17
        work.dugaZINB18 work.dugaZINB19 work.dugaZINB20 work.dugaZINB21 work.dugaZINB22
        work.dugaZINB23 work.dugaZINB24 work.dugaZINB25 work.dugaZINB26 work.dugaZINB27
        work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
  by kk;
run;

data work.mse_ZINB_j;
  set work.mse_ZINB_j;
  t_sum=0;
  g_sum=0;
  %do j=1 %to 30;
    g_sum=g_sum+beda_g_&j;
    t_sum=t_sum+delta_&j;
  %end;
  m2i=((30-1)/30)*t_sum;
  m1i=g1-((30-1)/30)*g_sum;
  mse_j_b=m1i+m2i;
  rrmse_j_b=sqrt(mse_j_b)/teta_hat_bayes;
run;

data work.hasilZINB;
  merge work.d work.mse_ZINB_j;
  keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes
       teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
option nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB base=work.hasilZINB1;
run;

%end;
%mend;

%sae_zinb;
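As a reminder of what the `dist=zinb` fit with a `zeromodel` statement estimates, the zero-inflated negative binomial probability function in its usual NB2 parameterisation is sketched below. The notation is assumed here rather than taken from the listing: $\pi_i$ is the zero-inflation probability, $\alpha$ the dispersion (the macro variable `_Alpha`), $(\beta_0,\beta_1)$ correspond to `Intercept` and `x1`, and $(\gamma_0,\gamma_1)$ to `Inf_Intercept` and `Inf_x1`. The logistic link for $\pi_i$ is an assumption about the procedure's default.

```latex
\begin{align*}
P(Y_i = 0) &= \pi_i + (1-\pi_i)\,(1+\alpha\mu_i)^{-1/\alpha},\\
P(Y_i = y) &= (1-\pi_i)\,
   \frac{\Gamma(y + 1/\alpha)}{\Gamma(1/\alpha)\, y!}\,
   (1+\alpha\mu_i)^{-1/\alpha}
   \left(\frac{\alpha\mu_i}{1+\alpha\mu_i}\right)^{y},
   \qquad y = 1, 2, \dots,\\
\log \mu_i &= \beta_0 + \beta_1 x_{1i}, \qquad
\operatorname{logit}\pi_i = \gamma_0 + \gamma_1 x_{1i}.
\end{align*}
```

In the listing, only $\hat\mu_i$ and $\hat\alpha$ from this fit enter the EB weight and the jackknife terms; the inflation coefficients are stored as macro variables but are not used in the later data steps.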
11
Appendix 5 Syntax program for generate data
data b generate x1(covariate) and ei input x1cards0222831971100013131702314625252218171412202210run
macro bangkit_datado r=1 to 100
data egenerate poisson-gamma with excess zerodo kk=1 to 30set btetha = rangam(11)lambda = -log(01) peluang munculnya nilai nol yang diinginkan (01-09)starlambda = log(lambdatetha)output endrun
proc regmodel starlambda = x1 ods output ParameterEstimates=workbetha_lr (keep=Parameter Estimate)run
proc transpose data=workbetha_lr out=workbetha_lr_t
12
Appendix 5 Syntax program for generate data (continued)
rundata _null_set workbetha_lr_tcall symput (Intercept col1)call symput (x1 col2)run
data ddo kk=1 to 30set emu = exp(ampIntercept + ampx1x1)parmlambda = mutethaypoi = rand(poissonparmlambda)output endrun
ods trace onto take percent zero on dataproc freq data=dtables ypoi ods output OneWayFreqs=workzerorundata zeroset zerokeep percentrunproc transpose data=zero out=zero1 rundata _null_set workzero1call symput (pctz col1)rundata dset dpzero=amppctzr=amprrun
proc append data=d base=d1run
endmend
bangkit_data
13
Appendix 6 Syntax program EB with NBR
macro sae_nbdo x=1 to 900
data workaset workeif ^(u=ampx) then deleterun
this genmod procedure estimates the response without zero-inflation proc genmod data=amodel ypoi = x1 dist=nb link=logods output ParameterEstimates=workbetha_nb (keep=Parameter Estimate)run
proc transpose data=workbetha_nb out=workbetha_nb_trun
data _null_set workbetha_nb_tcall symput (Intercept col1)call symput (x1 col2)call symput (Dispersion col3)run
EB with negbin-regdata workduga_nbset amu_hat_b=exp(ampIntercept + ampx1x1) w_bayes=mu_hat_b(mu_hat_b + ampDispersion)teta_hat_bayes=w_bayesypoi+(1-w_bayes)mu_hat_bg1=(ampDispersion+ypoi)((mu_hat_b+ampDispersion)2)bias_b=abs(teta_hat_bayes-parmlambda)run
proc append data=workduga_nb base=workduga_nb1run
jacknifedo h=1 to 30
data workdset workduga_nb1if ^(u=ampx) then deleterundata workjacknbamphset workdif u=ampxif kk=amph then deleterun
proc genmod data=workjacknbamph output p out=sasyi_estmodel ypoi = x1 dist = nb link=logods output parameterestimates=workbetha_est_nbamph (keep=parameter Estimate)
14
Appendix 6 Syntax program EB with NBR (continued)
runproc transpose data=workbetha_est_nbamph out=workbetha_est_nbtamphrundata _null_set workbetha_est_nbtamphcall symput (Intercept_ col1)call symput (x1_ col2)call symput (Dispersion_ col3)run
data workduganbamphset workdmu_hat_b_amph=exp(ampIntercept_ + ampx1_x1)w_b_amph=mu_hat_b_amph (mu_hat_b_amph + ampDispersion_)teta_hat_amph=w_b_amph ypoi+(1-w_b_amph)mu_hat_b_amphdelta_amph=(teta_hat_amph - teta_hat_bayes)2g1_amph=(ampDispersion_+ypoi)((mu_hat_b_amph+ampDispersion_)2)beda_g_amph=g1_amph-g1run
data work.mse_nb_j;
  merge work.duganb1 work.duganb2 work.duganb3 work.duganb4 work.duganb5
        work.duganb6 work.duganb7 work.duganb8 work.duganb9 work.duganb10
        work.duganb11 work.duganb12 work.duganb13 work.duganb14 work.duganb15
        work.duganb16 work.duganb17 work.duganb18 work.duganb19 work.duganb20
        work.duganb21 work.duganb22 work.duganb23 work.duganb24 work.duganb25
        work.duganb26 work.duganb27 work.duganb28 work.duganb29 work.duganb30;
  by kk;
run;

data work.mse_nb_j;
  set work.mse_nb_j;
  t_sum = 0;
  g_sum = 0;
  %do j=1 %to 30;
    g_sum = g_sum + beda_g_&j;
    t_sum = t_sum + delta_&j;
  %end;
  m2i = ((30-1)/30)*t_sum;
  m1i = g1 - ((30-1)/30)*g_sum;
  mse_j_b = m1i + m2i;
  rrmse_j_b = sqrt(mse_j_b)/teta_hat_bayes;
  ul = &x;
run;

proc append data=work.mse_nb_j base=work.mse_nb_j1;
run;
data work.hasilnb;
  merge work.d work.mse_nb_j;
  keep kk x1 tetha mu parmlambda ypoi pzero r peluangnol u mu_hat_b w_bayes
       teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
options nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilnb base=work.hasilnb1 appendver=v6;
run;

%end;
%mend;

%sae_nb;
Appendix 7 Syntax program EB with ZINB
%macro sae_zinb;

%do x=1 %to 900;

data work.a;
  set work.e;
  if ^(u=&x) then delete;
run;

proc countreg data=a;
  model ypoi = x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.pe (keep=Parameter Estimate);
run;

proc transpose data=work.pe out=work.pe_t;
run;

data _null_;
  set work.pe_t;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Inf_Intercept', col3);
  call symput('Inf_x1', col4);
  call symput('_Alpha', col5);
run;
data work.duga;
  set a;
  mu_hat_b = exp(&Intercept + &x1*x1);
  w_bayes = mu_hat_b/(mu_hat_b + &_Alpha);
  teta_hat_bayes = w_bayes*ypoi + (1-w_bayes)*mu_hat_b;
  g1 = (&_Alpha + ypoi)/((mu_hat_b + &_Alpha)**2);
  bias_b = abs(teta_hat_bayes - parmlamdha);
run;

proc append data=work.duga base=work.duga1;
run;
%do h=1 %to 30;

data work.d;
  set work.duga1;
  if ^(u=&x) then delete;
run;

data work.jackzinb&h;
  set work.d;
  if u=&x;
  if kk=&h then delete;
run;

proc countreg data=jackzinb&h;
  model ypoi = x1 / dist=zinb method=qn;
  zeromodel ypoi ~ x1;
  ods output ParameterEstimates=work.betha_est_ZINB&h (keep=Parameter Estimate);
run;

proc transpose data=work.betha_est_ZINB&h out=work.betha_est_ZINBt&h;
run;

data _null_;
  set work.betha_est_ZINBt&h;
  call symput('Intercept', col1);
  call symput('x1', col2);
  call symput('Inf_Intercept', col3);
  call symput('Inf_x1', col4);
  call symput('_Alpha', col5);
run;
data work.dugaZINB&h;
  set work.d;
  mu_hat_b_&h = exp(&Intercept + &x1*x1);
  /* mu_hat_b_&h = &b_o- + &b_1- * x1; */
  w_b_&h = mu_hat_b_&h/(mu_hat_b_&h + (&_Alpha));
  teta_hat_&h = w_b_&h*ypoi + (1-w_b_&h)*mu_hat_b_&h;
  delta_&h = (teta_hat_&h - teta_hat_bayes)**2;
  /* g1_&h = ((mu_hat_b_&h**2*&alpha_)**2)*(&alpha_+y_i)/((mu_hat_b_&h**2*&alpha_)+mu_hat_b_&h)**2; */
  g1_&h = (&_Alpha + ypoi)/((mu_hat_b_&h + &_Alpha)**2);
  /* g1_&h = (A**2)*(&k- + y_i)/(a + mu_hat_b)**2; */
  beda_g_&h = g1_&h - g1;
run;
data work.mse_ZINB_j;
  merge work.dugaZINB1 work.dugaZINB2 work.dugaZINB3 work.dugaZINB4 work.dugaZINB5
        work.dugaZINB6 work.dugaZINB7 work.dugaZINB8 work.dugaZINB9 work.dugaZINB10
        work.dugaZINB11 work.dugaZINB12 work.dugaZINB13 work.dugaZINB14 work.dugaZINB15
        work.dugaZINB16 work.dugaZINB17 work.dugaZINB18 work.dugaZINB19 work.dugaZINB20
        work.dugaZINB21 work.dugaZINB22 work.dugaZINB23 work.dugaZINB24 work.dugaZINB25
        work.dugaZINB26 work.dugaZINB27 work.dugaZINB28 work.dugaZINB29 work.dugaZINB30;
  by kk;
run;

data work.mse_ZINB_j;
  set work.mse_ZINB_j;
  t_sum = 0;
  g_sum = 0;
  %do j=1 %to 30;
    g_sum = g_sum + beda_g_&j;
    t_sum = t_sum + delta_&j;
  %end;
  m2i = ((30-1)/30)*t_sum;
  m1i = g1 - ((30-1)/30)*g_sum;
  mse_j_b = m1i + m2i;
  rrmse_j_b = sqrt(mse_j_b)/teta_hat_bayes;
run;
data work.hasilZINB;
  merge work.d work.mse_ZINB_j;
  keep kk x1 tetha mu lamdha ypoi pzero r peluangnol u mu_hat_b w_bayes
       teta_hat_bayes bias_b mse_j_b rrmse_j_b;
run;

ods listing;
options nodate ls=130 ps=130;
ods html;

%end;

proc append data=work.hasilZINB base=work.hasilZINB1;
run;

%end;
%mend;

%sae_zinb;