Annals of Data Science (2021) 8(1):57–90
https://doi.org/10.1007/s40745-020-00282-0
A Six Parameters Beta Distribution with Application for Modeling Waiting Time of Muslim Early Morning Prayer
Rafid S. A. Alshkaki1
Received: 21 February 2020 / Revised: 19 April 2020 / Accepted: 24 April 2020 / Published online: 18 May 2020 © The Author(s) 2020
Abstract The beta distribution is a well-known and widely used distribution for modeling and analyzing lifetime data, owing to its interesting characteristics. In this paper, a six parameters beta distribution is introduced as a generalization of the two (standard) and the four parameters beta distributions. This distribution is closed under scaling and exponentiation, has a reflection symmetry property, and has several well-known distributions as special cases, such as the two and four parameters beta, generalized modification of the Kumaraswamy, generalized beta of the first kind, power function, Kumaraswamy power function, Minimax, exponentiated Pareto, and generalized uniform distributions. Its moments about the origin, moment generating function, incomplete moments, and mean deviations are derived. The maximum likelihood estimation method is used for estimating its parameters and is applied to six different simulated data sets from this distribution, in order to check the performance of the estimation method through the mean squared errors of the estimated parameters computed for different simulated sample sizes. Finally, two real-life data sets, representing the waiting period of Muslim worshipers from the time of entering the mosque until the actual start of the Alfajir prayer in two different mosques, are used to illustrate the usefulness and flexibility of this distribution; it provides a better fit than the gamma, exponential, four parameters beta, and generalized beta of the first kind distributions.
Keywords Beta distribution · Maximum likelihood estimator · Moments · Simulation study · Applications
Mathematics Subject Classification 60E05 · 62E15 · 65C05
* Rafid S. A. Alshkaki
1 Ahmed Bin Mohammed Military College, Doha, Qatar
1 Introduction
Owing to its interesting characteristics, the beta distribution is one of the best-known continuous distributions, with a wide range of applications in various fields, such as reliability and production quality control. Its flexible shape allows it to model a wide range of natural and empirical phenomena. Its domain, the interval from zero to one, adds another interesting characteristic: the distribution can be regarded as a probability distribution of probabilities, and it suits quantities such as fractions of time, measurements whose values (or relative values) all lie between zero and one, or the random behavior of percentages and fractions, especially when nothing is known about the probability in question, so that it can be used to represent all probabilities. Another area that uses the beta distribution to represent possible values of probabilities, or a distribution of probabilities, is Bayesian analysis, where it is widely used as a prior distribution. In fact, it is one of the three common distributions, together with the rectangular/uniform and normal distributions, that are employed within the framework of Bayesian analysis of continuous variables, Sheskin [1, p. 397]. Data mining methods and techniques need information about prior probability knowledge, hence the beta distribution is a candidate for such situations; see Shi [2], and Olson and Shi [3] for further details. For an extensive reference on the beta distribution see Johnson et al. [4, p. 210–275].
The probability density function (pdf) of the four parameters beta distribution, Johnson et al. [5, p. 210], is given by

$$f(t) = \frac{(t-a)^{\alpha-1}(b-t)^{\beta-1}}{B(\alpha,\beta)\,(b-a)^{\alpha+\beta-1}}, \quad a < t < b, \tag{1}$$

where the parameters $\alpha$, $\beta$, $a$ and $b$ satisfy $\alpha > 0$, $\beta > 0$, and $a$ and $b$ are real numbers such that $a < b$; $B(\alpha,\beta)$ is the beta function, Abramowitz and Stegun [6, p. 258], defined by

$$B(\alpha,\beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}, \tag{2}$$

and $\Gamma(\alpha)$ is the gamma function, Abramowitz and Stegun [6, p. 255], defined by

$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1}e^{-t}\,dt. \tag{3}$$

The most widely used form of the beta distribution in the literature is the pdf given by

$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}, \quad 0 < x < 1. \tag{4}$$
This two parameters form is sometimes called the standard beta distribution, and it is obtained from (1) by the transformation $x = \frac{t-a}{b-a}$.
One direction of research on the beta distribution is the generalization of the form given by (4), in order to make it even more flexible and able to cover more shapes.
Armero and Bayarri [7] introduced the Gauss hypergeometric distribution, with parameters $p$, $q$, $r$ and $\delta$, as a generalization of the beta distribution when they studied a Bayesian queuing theory problem, with the following pdf:

$$f(x) = \frac{x^{p-1}(1-x)^{q-1}(1+\delta x)^{-r}}{B(p,q)\,F_{(2,1)}(r,p;p+q;-\delta)}, \quad 0 < x < 1, \tag{5}$$

where $p > 0$, $q > 0$, $-\infty < r < \infty$, $\delta > -1$, and $F_{(2,1)}$ is the generalized hypergeometric function, defined for non-negative integers $n$ and $m$ by

$$F_{(n,m)}(a_1,\ldots,a_n;b_1,\ldots,b_m;z) = \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_n)_k}{(b_1)_k \cdots (b_m)_k}\,\frac{z^k}{k!}, \tag{6}$$

and $(a)_k$ is defined by

$$(a)_k = \begin{cases} 1, & k = 0, \\ a(a+1)\cdots(a+k-1), & k = 1, 2, 3, \ldots \end{cases} \tag{7}$$

Gordy [8] introduced the confluent hypergeometric distribution, with parameters $p$, $q$ and $s$, with pdf given by

$$f(x) = \frac{x^{p-1}(1-x)^{q-1}\exp(-sx)}{B(p,q)\,F_{(1,1)}(p,p+q,-s)}, \quad 0 < x < 1,$$

where $p > 0$, $q > 0$ and $-\infty < s < \infty$.
Pathan et al. [9] introduced a five parameters distribution as a generalization of the beta distribution, which they called the generalized beta distribution, with pdf given by

$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}(1-\sigma x)^{\rho-1}\exp(-\gamma x)}{B(\alpha,\beta)\,\Phi_1(\alpha,\rho;\alpha+\beta;\sigma,-\gamma)}, \quad 0 < x < 1,$$

where the parameters $\alpha$, $\beta$, $\sigma$, $\rho$ and $\gamma$ satisfy $\alpha > 0$, $\beta > 0$, $0 \le \sigma < 1$, $\rho$ and $\gamma$ are real numbers, and $\Phi_1(\cdot)$ is Humbert's confluent hypergeometric function given in Srivastava and Manocha [10, p. 58, Eq. (36)]. They also derived expressions for its distribution function and moments.
Ng et al. [11] studied the properties and evaluated the prediction level of a six parameters generalized beta distribution model with pdf given by

$$f(x) = \frac{\dfrac{\Gamma(\gamma+\rho-\alpha)\Gamma(\gamma+\rho-\beta)}{\Gamma(\gamma+\rho)\Gamma(\gamma+\rho-\alpha-\beta)}\,(1-z)^{\sigma}\,x^{\gamma-1}(1-x)^{\rho-1}(1-\sigma x)^{\rho-1}(1-zx)^{-\sigma}\,F_{(2,1)}(\alpha,\beta;\gamma;x)}{F_{(3,2)}\!\left(\rho,\sigma,\gamma+\rho-\alpha-\beta;\gamma+\rho-\alpha,\gamma+\rho-\beta;\frac{z}{z-1}\right)B(\gamma,\rho)}, \quad 0 < x < 1, \tag{8}$$

where the parameters $\alpha$, $\beta$, $\gamma$, $\sigma$, $z$ and $\rho$ satisfy $\alpha > 0$, $\beta > 0$, $\gamma > 0$, $\sigma > 0$, $z < 0.5$ and $\rho > \alpha + \beta - \gamma$.
Ng et al. [11], who provided a nice literature review of the beta family, showed interesting advances of the distribution with pdf (8) for fitting many different types of data, as did Armero and Bayarri [7], Gordy [8] and Pathan et al. [9] for their distributions; however, these distributions are not easy to work with empirically. Finally, Gómez-Déniz and Sarabia [12] introduced a generalization of the standard beta distribution with bounded support, studied some of its basic properties and the behavior of its maximum likelihood estimators through simulation, and derived its multivariate version.
The rest of the paper is organized as follows. Section 2 defines the six parameters beta distribution (SPBD). Section 3 gives some properties of this distribution: the boundaries and some limits of the pdf of the SPBD, a series expansion of its pdf, its mode, quantile function, reliability function, hazard function, special cases, some transformations, its scaling, exponentiation and reflection symmetry properties, generation of its random variates, the distribution of its order statistics, moments about the origin, mean and variance, moment generating function, harmonic mean, incomplete moments, mean deviations, probability weighted moments, Renyi entropy, and Lorenz and Bonferroni curves. Section 4 introduces the estimation of its parameters using the method of maximum likelihood estimation (MLE). Section 5 gives six miscellaneous simulation studies of the SPBD to check the performance of the MLE. Section 6 uses the SPBD and other nested and related distributions to fit two different real-life data sets. Finally, Sect. 7 ends with conclusions.
2 The Six Parameters Beta Distribution
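Let $0 < a, b, \alpha, \beta, A, B < \infty$, such that $A < B$, and define the function $f$ by

$$f(x;a,b,\alpha,\beta,A,B) = \begin{cases} \dfrac{b\,x^{b-1}}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)} \left[\left(\dfrac{x}{a}\right)^{b} - A\right]^{\alpha-1} \left[B - \left(\dfrac{x}{a}\right)^{b}\right]^{\beta-1}, & aA^{1/b} < x < aB^{1/b}, \\ 0, & \text{otherwise}, \end{cases} \tag{9}$$

where $B(\alpha,\beta)$ is the beta function defined by (2). We will write $f(x)$ instead of $f(x;a,b,\alpha,\beta,A,B)$ for simplicity. We have the following proposition.
Proposition 1 The function $f$ defined by (9) is a pdf, with cumulative distribution function (CDF) $F_X$ given by

$$F_X(x;a,b,\alpha,\beta,A,B) = \begin{cases} 0, & x \le aA^{1/b}, \\ \dfrac{1}{B(\alpha,\beta)}\,B\!\left(\dfrac{\left(\frac{x}{a}\right)^{b} - A}{B-A};\alpha,\beta\right), & aA^{1/b} < x < aB^{1/b}, \\ 1, & x \ge aB^{1/b}, \end{cases} \tag{10}$$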
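where $B(z;\alpha,\beta)$ is the incomplete beta function, Abramowitz and Stegun [6, p. 263], defined by

$$B(z;\alpha,\beta) = \int_0^z t^{\alpha-1}(1-t)^{\beta-1}\,dt. \tag{11}$$

Proof Since $0 < a, b, \alpha, \beta, A, B < \infty$ and $aA^{1/b} < x < aB^{1/b}$, we have $A < \left(\frac{x}{a}\right)^{b} < B$, hence $\left(\frac{x}{a}\right)^{b} - A > 0$ and $B - \left(\frac{x}{a}\right)^{b} > 0$, implying that $f$ given in (9) is non-negative. Now,

$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_{aA^{1/b}}^{aB^{1/b}} \frac{b\,x^{b-1}}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)} \left[\left(\frac{x}{a}\right)^{b} - A\right]^{\alpha-1}\left[B - \left(\frac{x}{a}\right)^{b}\right]^{\beta-1} dx$$

$$= \frac{b}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)} \int_{aA^{1/b}}^{aB^{1/b}} x^{b-1}\left[\left(\frac{x}{a}\right)^{b} - A\right]^{\alpha-1}\left[B - \left(\frac{x}{a}\right)^{b}\right]^{\beta-1} dx.$$

Let $\left(\frac{x}{a}\right)^{b} - A = (B-A)t$; then $x = a[(B-A)t + A]^{1/b}$ and $dx = \frac{a(B-A)}{b}[(B-A)t + A]^{\frac{1}{b}-1}\,dt$, so that

$$\int_{aA^{1/b}}^{aB^{1/b}} x^{b-1}\left[\left(\frac{x}{a}\right)^{b} - A\right]^{\alpha-1}\left[B - \left(\frac{x}{a}\right)^{b}\right]^{\beta-1} dx = \frac{a^{b}}{b}(B-A)^{\alpha+\beta-1}\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{a^{b}}{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta).$$

Hence $\int_{-\infty}^{+\infty} f(x)\,dx = 1$. It follows that, for any $x$ such that $aA^{1/b} < x < aB^{1/b}$,

$$F_X(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{aA^{1/b}}^{x} \frac{b\,t^{b-1}}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)} \left[\left(\frac{t}{a}\right)^{b} - A\right]^{\alpha-1}\left[B - \left(\frac{t}{a}\right)^{b}\right]^{\beta-1} dt. \tag{12}$$

Now, by using the transformation $z = \frac{\left(\frac{t}{a}\right)^{b} - A}{B-A}$, (12) reduces to

$$F_X(x) = \frac{1}{B(\alpha,\beta)} \int_0^{\frac{(x/a)^{b} - A}{B-A}} z^{\alpha-1}(1-z)^{\beta-1}\,dz,$$

from which we get (10).
We note that $F_X$ can be written, for $aA^{1/b} < x < aB^{1/b}$, in the form

$$F_X(x) = I\!\left(\frac{\left(\frac{x}{a}\right)^{b} - A}{B-A};\alpha,\beta\right), \tag{13}$$

where $I(z;\alpha,\beta)$ is the regularized incomplete beta function, Abramowitz and Stegun [6, p. 263], defined by

$$I(z;\alpha,\beta) = \frac{1}{B(\alpha,\beta)} \int_0^z t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{B(z;\alpha,\beta)}{B(\alpha,\beta)}. \tag{14}$$

□
Definition of the SPBD The rv $X$ is said to have a SPBD with parameters $a$, $b$, $\alpha$, $\beta$, $A$ and $B$, written as $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$, if its pdf is given by (9) or, equivalently, its CDF is given by (10) or (13).
Figure 1 shows some plots of the pdf of the SPBD for some values of its parameters, indicating that this distribution can take many different shapes.
3 Some Characteristics of the SPBD
3.1 Boundaries and Some Limits of the pdf
Let us study the behavior of the pdf of the $\mathrm{SPBD}(a,b,\alpha,\beta,A,B)$ at certain points. At the boundary points, we have from (9), for $0 < \beta < \infty$, that

$$f\!\left(aA^{1/b}\right) = \begin{cases} \infty, & 0 < \alpha < 1, \\ \dfrac{\beta b A^{1-\frac{1}{b}}}{a(B-A)}, & \alpha = 1, \\ 0, & \alpha > 1. \end{cases}$$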
[Figure 1 omitted: twelve panels plotting the pdf f(x) against x, each showing five curves obtained by varying one shape parameter while a, b, A and B are held fixed.]
Fig. 1 Different pdf plots of the SPBD models
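Curves such as those in Fig. 1 can be reproduced by evaluating (9) and (13) directly. The following is a minimal sketch rather than the author's code: it assumes NumPy and SciPy are available, the function names `spbd_support`, `spbd_pdf` and `spbd_cdf` are hypothetical, and the parameter values are those listed for data set 4 in Table 1. It checks numerically that the density integrates to one and that the area under the density up to a point agrees with the CDF.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import betainc, betaln


def spbd_support(a, b, A, B):
    """Endpoints a*A**(1/b) and a*B**(1/b) of the support in (9)."""
    return a * A ** (1.0 / b), a * B ** (1.0 / b)


def spbd_pdf(x, a, b, alpha, beta, A, B):
    """Density (9), computed on the log scale; returns a 1-d array."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    lo, hi = spbd_support(a, b, A, B)
    inside = (x > lo) & (x < hi)
    out = np.zeros_like(x)
    xi = x[inside]
    y = (xi / a) ** b  # y = (x/a)^b lies strictly between A and B
    log_f = (np.log(b) + (b - 1.0) * np.log(xi) - b * np.log(a)
             - (alpha + beta - 1.0) * np.log(B - A) - betaln(alpha, beta)
             + (alpha - 1.0) * np.log(y - A) + (beta - 1.0) * np.log(B - y))
    out[inside] = np.exp(log_f)
    return out


def spbd_cdf(x, a, b, alpha, beta, A, B):
    """CDF (13): regularized incomplete beta at ((x/a)^b - A)/(B - A)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = ((np.clip(x, 0.0, None) / a) ** b - A) / (B - A)
    return betainc(alpha, beta, np.clip(z, 0.0, 1.0))


# Data set 4 of Table 1: (a, b, alpha, beta, A, B)
ds4 = (2.0, 1.3, 1.6, 3.8, 0.5, 1.8)
lo, hi = spbd_support(ds4[0], ds4[1], ds4[4], ds4[5])

# Total probability mass and the mass to the left of the support midpoint
total, _ = quad(lambda t: spbd_pdf(t, *ds4)[0], lo, hi)
x0 = 0.5 * (lo + hi)
area_to_x0, _ = quad(lambda t: spbd_pdf(t, *ds4)[0], lo, x0)

print(total, area_to_x0, spbd_cdf(x0, *ds4)[0])
```

Working on the log scale is a design choice of this sketch; it avoids overflow in the normalizing constant when the shape parameters are large.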
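At the lower boundary point $x = aA^{1/b}$, it follows that

$$\lim_{a\to 0^{+}} f\!\left(aA^{1/b}\right) = \lim_{b\to\infty} f\!\left(aA^{1/b}\right) = \lim_{\beta\to\infty} f\!\left(aA^{1/b}\right) = \infty,$$

$$\lim_{a\to\infty} f\!\left(aA^{1/b}\right) = \lim_{b\to 0^{+}} f\!\left(aA^{1/b}\right) = \lim_{\beta\to 0^{+}} f\!\left(aA^{1/b}\right) = \lim_{B\to\infty} f\!\left(aA^{1/b}\right) = 0,$$

$$\lim_{A\to 0^{+}} f\!\left(aA^{1/b}\right) = 0 \ \text{ if } b \ne 1, \quad \text{and} \quad \lim_{A\to 0^{+}} f\!\left(aA^{1/b}\right) = \frac{\beta b}{aB} \ \text{ if } b = 1.$$

Similarly, at the upper boundary point,

$$f\!\left(aB^{1/b}\right) = \begin{cases} \infty, & 0 < \beta < 1, \\ \dfrac{\alpha b B^{1-\frac{1}{b}}}{a(B-A)}, & \beta = 1, \\ 0, & \beta > 1. \end{cases}$$

Therefore,

$$\lim_{a\to 0^{+}} f\!\left(aB^{1/b}\right) = \lim_{b\to\infty} f\!\left(aB^{1/b}\right) = \lim_{\alpha\to\infty} f\!\left(aB^{1/b}\right) = \lim_{\beta\to 0^{+}} f\!\left(aB^{1/b}\right) = \infty,$$

$$\lim_{a\to\infty} f\!\left(aB^{1/b}\right) = \lim_{b\to 0^{+}} f\!\left(aB^{1/b}\right) = \lim_{\alpha\to 0^{+}} f\!\left(aB^{1/b}\right) = \lim_{\beta\to\infty} f\!\left(aB^{1/b}\right) = 0,$$

$$\text{and} \quad \lim_{A\to 0^{+}} f\!\left(aB^{1/b}\right) = \frac{\alpha b}{a}\,B^{-\frac{1}{b}}.$$

3.2 Series Expansion
Proposition 2 The function $f$ given by (9) can be written as the following series expansion:

$$f(x;a,b,\alpha,\beta,A,B) = \frac{b\,B^{\beta-1}}{(B-A)^{\alpha+\beta-1}B(\alpha,\beta)} \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} C(i,j;a,b,\alpha,\beta,A,B)\,x^{b(\alpha+j-i)-1}, \tag{15}$$

where

$$C(i,j;a,b,\alpha,\beta,A,B) = (-1)^{i+j}\binom{\alpha-1}{i}\binom{\beta-1}{j}\frac{A^{i}}{a^{b(\alpha+j-i)}B^{j}}.$$

Proof Since $0 < a$ and $aA^{1/b} < x$, we have $A < \left(\frac{x}{a}\right)^{b}$, that is $\frac{A}{(x/a)^{b}} < 1$, and we can write

$$\left[\left(\frac{x}{a}\right)^{b} - A\right]^{\alpha-1} = \left(\frac{x}{a}\right)^{b(\alpha-1)}\left[1 - \frac{A}{\left(\frac{x}{a}\right)^{b}}\right]^{\alpha-1}.$$

Therefore, using the binomial series expansion, Abramowitz and Stegun [6, p. 14], we can write

$$\left[\left(\frac{x}{a}\right)^{b} - A\right]^{\alpha-1} = \left(\frac{x}{a}\right)^{b(\alpha-1)}\sum_{i=0}^{\infty}(-1)^{i}\binom{\alpha-1}{i}\left[\frac{A}{\left(\frac{x}{a}\right)^{b}}\right]^{i}. \tag{16}$$

Similarly, we have that

$$\left[B - \left(\frac{x}{a}\right)^{b}\right]^{\beta-1} = B^{\beta-1}\sum_{j=0}^{\infty}(-1)^{j}\binom{\beta-1}{j}\left[\frac{\left(\frac{x}{a}\right)^{b}}{B}\right]^{j}. \tag{17}$$

Hence, substituting (16) and (17) into the function $f$ given by (9), we get (15). □
3.3 The Mode
For $aA^{1/b} < x < aB^{1/b}$, the pdf of the SPBD satisfies

$$\frac{\partial}{\partial x} f(x) = \left\{ \frac{\frac{b}{a}(\alpha-1)\left(\frac{x}{a}\right)^{b-1}}{\left(\frac{x}{a}\right)^{b} - A} - \frac{b(\beta-1)\left(\frac{x}{a}\right)^{b-1}}{a\left[B - \left(\frac{x}{a}\right)^{b}\right]} + \frac{b-1}{x} \right\} f(x). \tag{18}$$

Therefore, $\frac{\partial}{\partial x} f(x) = 0$ is equivalent to either $f(x) = 0$, which is discussed in Sect. 3.1 above, or

$$\frac{\frac{b}{a}(\alpha-1)\left(\frac{x}{a}\right)^{b-1}}{\left(\frac{x}{a}\right)^{b} - A} - \frac{b(\beta-1)\left(\frac{x}{a}\right)^{b-1}}{a\left[B - \left(\frac{x}{a}\right)^{b}\right]} + \frac{b-1}{x} = 0. \tag{19}$$

Multiplying (19) by $x\left[\left(\frac{x}{a}\right)^{b} - A\right]\left[B - \left(\frac{x}{a}\right)^{b}\right]$, setting $y = \left(\frac{x}{a}\right)^{b}$ and changing sign, it reduces to

$$c_1 y^{2} + c_2 y + c_3 = 0, \tag{20}$$

where

$$c_1 = b(\alpha+\beta-1) - 1, \qquad c_2 = (1 - b\alpha)B + (1 - b\beta)A, \qquad c_3 = (b-1)AB.$$

Let us discuss the real roots of (20) according to the following cases.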
Case 1 If $\alpha + \beta \ne 1$, $b = \frac{1}{\alpha+\beta-1}$, and $(1 - b\alpha)B + (1 - b\beta)A \ne 0$, that is when $c_1 = 0$ and $c_2 \ne 0$, then (20) has a single root given by

$$y = \frac{-(b-1)AB}{(1 - b\alpha)B + (1 - b\beta)A}.$$

Hence, the root in terms of $x$ is given by

$$x_1 = a\left[\frac{(1-b)AB}{(1 - b\alpha)B + (1 - b\beta)A}\right]^{\frac{1}{b}}.$$

Case 2 If $b \ne \frac{1}{\alpha+\beta-1}$, that is $c_1 \ne 0$, then the real roots of (20) in terms of $x$, which exist when $c_2^{2} - 4c_1c_3 \ge 0$, are given by

$$x_2 = a\left[\frac{(b\alpha - 1)B + (b\beta - 1)A - \sqrt{\left((1 - b\alpha)B + (1 - b\beta)A\right)^{2} - 4(b-1)AB\,[b(\alpha+\beta-1) - 1]}}{2b(\alpha+\beta-1) - 2}\right]^{\frac{1}{b}},$$

$$x_3 = a\left[\frac{(b\alpha - 1)B + (b\beta - 1)A + \sqrt{\left((1 - b\alpha)B + (1 - b\beta)A\right)^{2} - 4(b-1)AB\,[b(\alpha+\beta-1) - 1]}}{2b(\alpha+\beta-1) - 2}\right]^{\frac{1}{b}}.$$

Since $\frac{\partial^{2}}{\partial x^{2}} f(x_i)$, for $i = 1, 2$ and $3$, is not easy to evaluate, an empirical evaluation is needed to see at which point $x_i$ a local maximum occurs, in order to determine the mode of the SPBD.
3.4 Quantile Function
Let $0 < p < 1$; then the quantile function $Q$ of the rv $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$, defined by

$$Q(p) = \inf\{x \in \mathbb{R};\ p \le F(x)\},$$

can be found, using (13), to be

$$Q(p) = a\left[A + (B-A)\,I^{-1}(p;\alpha,\beta)\right]^{\frac{1}{b}}, \tag{21}$$

where $I^{-1}$ is the inverse of the regularized incomplete beta function.
In particular, the median of $X$, $\mathrm{Med}(X)$, is given by

$$\mathrm{Med}(X) = a\left[A + (B-A)\,I^{-1}(0.5;\alpha,\beta)\right]^{\frac{1}{b}}.$$

Table 1 gives the parameter values and domain ranges of some selected SPBD data sets, which have different shapes and domain ranges. These data sets will be used for the simulation study in Sect. 5, and for computing certain statistics of the SPBD later in this section, while Fig. 2 shows the plots of the quantile functions of these SPBD data sets.
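The quantile function (21) and the stationary-point equation (20) lend themselves to a quick numerical check. The sketch below is illustrative only: it assumes SciPy's `betaincinv` as the inverse regularized incomplete beta function $I^{-1}$, the names `spbd_quantile` and `spbd_stationary_points` are hypothetical, and the coefficients of (20) are those obtained by clearing the denominators in (19). The parameter sets are data sets 1, 2 and 4 of Table 1.

```python
import numpy as np
from scipy.special import betainc, betaincinv


def spbd_quantile(p, a, b, alpha, beta, A, B):
    """Quantile function (21) via the inverse regularized incomplete beta."""
    return a * (A + (B - A) * betaincinv(alpha, beta, p)) ** (1.0 / b)


def spbd_stationary_points(a, b, alpha, beta, A, B):
    """Interior stationary points of the pdf: real roots y of (20) in (A, B)."""
    c1 = b * (alpha + beta - 1.0) - 1.0
    c2 = (1.0 - b * alpha) * B + (1.0 - b * beta) * A
    c3 = (b - 1.0) * A * B
    roots = np.roots([c1, c2, c3])       # handles c1 = 0 (Case 1) as well
    y = roots[np.isreal(roots)].real     # keep real roots only
    y = y[(y > A) & (y < B)]             # y = (x/a)^b must lie inside (A, B)
    return a * y ** (1.0 / b)            # map back to the x scale


# (a, b, alpha, beta, A, B) for data sets 1, 2 and 4 of Table 1
ds1 = (1.8, 2.3, 1.4, 3.9, 0.0, 1.0)
ds2 = (1.5, 3.1, 0.93, 2.65, 0.015, 1.1)
ds4 = (2.0, 1.3, 1.6, 3.8, 0.5, 1.8)

# Q(0) and Q(1) give the domain range reported in Table 1
ds2_min = spbd_quantile(0.0, *ds2)
ds2_max = spbd_quantile(1.0, *ds2)

# The median should map back to probability 0.5 under the CDF (13)
med4 = spbd_quantile(0.5, *ds4)
a, b, alpha, beta, A, B = ds4
p_back = betainc(alpha, beta, ((med4 / a) ** b - A) / (B - A))

mode1 = spbd_stationary_points(*ds1)
mode4 = spbd_stationary_points(*ds4)
print(ds2_min, ds2_max, med4, p_back, mode1, mode4)
```

When more than one stationary point falls inside the support, the one with the largest density value would be taken as the mode; for the two data sets used here only one interior root remains.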
3.5 Reliability Function
The reliability (survival) function of $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$, using (13), is given by

$$R(x) = 1 - F(x) = 1 - I\!\left(\frac{\left(\frac{x}{a}\right)^{b} - A}{B-A};\alpha,\beta\right), \quad aA^{1/b} < x < aB^{1/b}. \tag{22}$$

3.6 Hazard Function
The hazard function, $h(x)$, of the rv $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$, using (9) and (13), is given for $aA^{1/b} < x < aB^{1/b}$ by

$$h(x) = \frac{f(x)}{1 - F(x)} = \frac{\dfrac{b\,x^{b-1}}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\left[\left(\frac{x}{a}\right)^{b} - A\right]^{\alpha-1}\left[B - \left(\frac{x}{a}\right)^{b}\right]^{\beta-1}}{1 - I\!\left(\dfrac{\left(\frac{x}{a}\right)^{b} - A}{B-A};\alpha,\beta\right)}.$$

Table 1 Parameter values of some selected SPBD data sets
Data set   a     b      α      β      A       B     Minimum        Maximum
1          1.8   2.3    1.4    3.9    0       1     0              1.8
2          1.5   3.1    0.93   2.65   0.015   1.1   0.387020171    1.546834102
3          1.5   5.75   0.43   5.65   0.01    2     0.673387689    1.69217121
4          2     1.3    1.6    3.8    0.5     1.8   1.17346046     3.143355214
5          2     1.2    2.3    1.8    0.5     1.8   1.122462048    3.264052108
6          2     0.45   2.15   0.65   0.4     1.2   0.261047095    2.999081861

[Figure 2 omitted: two panels plotting the quantile function Q(p) against p for data sets DS 1 to DS 6.]
Fig. 2 Plots of the quantile function of the six selected SPBD data sets
3.7 Special Cases of SPBD
1. The $\mathrm{SPBD}(1,1,p,q,a,b)$ is the four parameters beta distribution, Johnson et al. [4, p. 210], with pdf

$$f(x) = \frac{1}{B(p,q)}\,\frac{(x-a)^{p-1}(b-x)^{q-1}}{(b-a)^{p+q-1}}, \quad a < x < b.$$

2. The $\mathrm{SPBD}(a,b,1,c,\alpha,\beta)$ is the generalized modification of the Kumaraswamy distribution, Alshkaki [13], with pdf

$$f(x) = \frac{bc}{a}(\beta-\alpha)^{-c}\left(\frac{x}{a}\right)^{b-1}\left[\beta - \left(\frac{x}{a}\right)^{b}\right]^{c-1}, \quad a\alpha^{1/b} < x < a\beta^{1/b}.$$

3. The $\mathrm{SPBD}(1,a,1,b,0,1)$ is the Kumaraswamy distribution, Kumaraswamy [14], with pdf

$$f(x) = ab\,x^{a-1}\left[1 - x^{a}\right]^{b-1}, \quad 0 < x < 1.$$

4. The $\mathrm{SPBD}(a,b,p,q,0,1)$ is the generalized beta of the first kind distribution, McDonald [15], with pdf, obtained from (9),

$$f(x) = \frac{b}{a^{bp}B(p,q)}\,x^{bp-1}\left[1 - \left(\frac{x}{a}\right)^{b}\right]^{q-1}, \quad 0 < x < a.$$

5. The $\mathrm{SPBD}(1,1,1,1,0,1)$ is the standard uniform distribution with pdf

$$f(x) = 1, \quad 0 < x < 1.$$

6. The $\mathrm{SPBD}(1,2,1,1,0,1)$ is the triangular distribution with pdf

$$f(x) = 2x, \quad 0 < x < 1.$$

7. The $\mathrm{SPBD}(1,1,1,\delta,\alpha,\beta)$ is the power function distribution with pdf

$$f(x) = \frac{\delta}{\beta-\alpha}\left[\frac{\beta - x}{\beta-\alpha}\right]^{\delta-1}, \quad \alpha < x < \beta.$$

8. The $\mathrm{SPBD}(\lambda,a\theta,1,b,0,1)$ is the Kumaraswamy power function distribution, Abdul-Moniem [16], with pdf

$$f(x) = \frac{ab\theta}{\lambda}\left(\frac{x}{\lambda}\right)^{a\theta-1}\left[1 - \left(\frac{x}{\lambda}\right)^{a\theta}\right]^{b-1}, \quad 0 < x < \lambda.$$

9. The $\mathrm{SPBD}(1,\beta,1,\gamma,0,1)$ is the Minimax distribution, McDonald [15], with pdf

$$f(x) = \beta\gamma\,x^{\beta-1}\left[1 - x^{\beta}\right]^{\gamma-1}, \quad 0 < x < 1.$$

10. The $\mathrm{SPBD}(c,-\alpha,1,\theta,0,1)$ is the exponentiated Pareto distribution, Gupta et al. [17], with pdf

$$f(x) = \theta\alpha c^{\alpha}x^{-(\alpha+1)}\left[1 - \left(\frac{x}{c}\right)^{-\alpha}\right]^{\theta-1}, \quad 0 < c < x.$$

11. The $\mathrm{SPBD}(\alpha,\beta,1,1,0,1)$ is the generalized uniform distribution, Tiwari et al. [18], with pdf

$$f(x) = \frac{\beta}{\alpha}\left(\frac{x}{\alpha}\right)^{\beta-1}, \quad 0 < x < \alpha.$$

We may note that the Kumaraswamy (Case 3), standard uniform (Case 5), triangular (Case 6), Kumaraswamy power function (Case 8), Minimax (Case 9), exponentiated Pareto (Case 10), and generalized uniform (Case 11) distributions are all special cases of the generalized beta of the first kind distribution (Case 4).
3.8 Transformations
Lemma 2
1. Let the rv $U$ have the standard uniform distribution, $U(0,1)$, and let the rv $X$ be defined by $X = a\left[A + (B-A)I^{-1}(U;\alpha,\beta)\right]^{1/b}$; then $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$.
2. Let the rv $X \sim \mathrm{SPBD}(1,1,1,1,0,1)$; then the rv $Y$ defined by $Y = \left(\frac{1-X}{X}\right)^{1/\delta} e^{-\gamma/\delta}$ has a log-logistic distribution with parameters $\delta$ and $\gamma$, Johnson et al. [4, p. 151], with CDF given by

$$F_Y(y) = 1 - \left[1 + y^{\delta}e^{\gamma}\right]^{-1}, \quad y \ge 0.$$

3. Let the rv $X \sim \mathrm{SPBD}(a,b,1,c,A,B)$; then the rv $Y$ defined by $Y = B - \left(\frac{X}{a}\right)^{b}$ has the generalized uniform distribution, Tiwari et al. [18], with CDF given by

$$F_Y(y) = \left[\frac{y}{B-A}\right]^{c}, \quad 0 \le y \le B-A.$$

4. Let the rv $X \sim \mathrm{SPBD}(a,b,1,c,A,B)$; then the rv $Y$ defined by $Y = (B-A)^{-1}\left[B - \left(\frac{X}{a}\right)^{b}\right]$ has the beta distribution with parameters 1 and $c$, with CDF given by

$$F_Y(y) = y^{c}, \quad 0 \le y \le 1.$$

5. Let the rv $X \sim \mathrm{SPBD}(1,b,1,1,0,1)$; then the rv $Y$ defined by $Y = \theta - b^{2}\log(X)$ has an exponential distribution with parameters $\theta$ and $b$, Johnson et al. [5, p. 494], with CDF given by

$$F_Y(y) = 1 - e^{-\left(\frac{y-\theta}{b}\right)}, \quad y > \theta.$$

6. Let the rv $X \sim \mathrm{SPBD}(1,b,1,1,0,1)$; then the rv $Y$ defined by $Y = \mu - \sigma\log\left[-\log\left(X^{b}\right)\right]$ has a Gumbel (generalized extreme value type-I) distribution with parameters $\mu$ and $\sigma$, Forbes et al. [19, p. 98], with CDF given by

$$F_Y(y) = e^{-e^{-\left(\frac{y-\mu}{\sigma}\right)}}.$$

7. Let the rv $X \sim \mathrm{SPBD}(1,b,1,1,0,1)$; then the rv $Y$ defined by $Y = \mu + \sigma\log\left(\frac{X^{b}}{1 - X^{b}}\right)$ has a logistic distribution with parameters $\mu$ and $\sigma$, Johnson et al. [4, p. 115], with CDF given by

$$F_Y(y) = \left[1 + e^{-\left(\frac{y-\mu}{\sigma}\right)}\right]^{-1}.$$

8. Let the rv $X \sim \mathrm{SPBD}(1,b,1,1,0,1)$; then the rv $Y$ defined by $Y = \frac{k}{X}$, where $k$ is a positive constant, has a Pareto distribution with parameters $k$ and $b$, Johnson et al. [4, p. 574], with CDF given by

$$F_Y(y) = 1 - \left(\frac{k}{y}\right)^{b}.$$

9. Let the rv $X \sim \mathrm{SPBD}(1,b,1,1,0,1)$; then the rv $Y$ defined by $Y = \mu + \sigma\left[-\log\left(X^{b}\right)\right]^{1/\lambda}$ has a Weibull distribution with parameters $\mu$, $\sigma$ and $\lambda$, Johnson et al. [4, p. 629], with CDF given by

$$F_Y(y) = 1 - e^{-\left(\frac{y-\mu}{\sigma}\right)^{\lambda}}.$$

Proof For case (1), we have

$$F_X(x) = P(X \le x) = P\!\left(a\left[A + (B-A)I^{-1}(U;\alpha,\beta)\right]^{\frac{1}{b}} \le x\right) = P\!\left(I^{-1}(U;\alpha,\beta) \le \frac{\left(\frac{x}{a}\right)^{b} - A}{B-A}\right)$$

$$= P\!\left(U \le I\!\left(\frac{\left(\frac{x}{a}\right)^{b} - A}{B-A};\alpha,\beta\right)\right) = I\!\left(\frac{\left(\frac{x}{a}\right)^{b} - A}{B-A};\alpha,\beta\right).$$

Therefore, the rv $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$.
The proofs of cases (2) through (9) follow the same lines as the proof of (1).
We may note that the SPBDs stated in Cases 2, 5, 6, 8, and 9 are all special cases of the generalized beta of the first kind distribution (see Case 4 of Sect. 3.7). □
3.9 Scaling Property
Proposition 3 (The SPBD is closed under scaling) Let the rv $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$ and let the rv $Y = cX$, where $0 < c < \infty$; then $Y \sim \mathrm{SPBD}(ca,b,\alpha,\beta,A,B)$.
Proof We have

$$F_Y(y) = P(Y \le y) = P(cX \le y) = P\!\left(X \le \frac{y}{c}\right) = F_X\!\left(\frac{y}{c};a,b,\alpha,\beta,A,B\right).$$

Therefore, using (13), we have that

$$F_Y(y) = I\!\left(\frac{\left(\frac{y/c}{a}\right)^{b} - A}{B-A};\alpha,\beta\right) = I\!\left(\frac{\left(\frac{y}{ca}\right)^{b} - A}{B-A};\alpha,\beta\right) = F_X(y;ca,b,\alpha,\beta,A,B).$$

□
Propositions 4 and 5 below can be proved along the same lines as Proposition 3.
3.10 Exponentiation Property
Proposition 4 (The SPBD is closed under exponentiation) Let the rv $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$ and let the rv $Y = X^{c}$, where $0 < c < \infty$; then $Y \sim \mathrm{SPBD}\!\left(a^{c},\frac{b}{c},\alpha,\beta,A,B\right)$.
3.11 Reflection Symmetry Property
Proposition 5 Let the rv $X \sim \mathrm{SPBD}(a,1,\alpha,\beta,A,B)$ and let the rv $Y = a(B+A) - X$; then $Y \sim \mathrm{SPBD}(a,1,\beta,\alpha,A,B)$.
3.12 Generate SPBD Random Variates
Using Lemma 2(1), we can generate $\mathrm{SPBD}(a,b,\alpha,\beta,A,B)$ random variates as follows:
1. Generate $u \sim U(0,1)$.
2. Compute $y = I^{-1}(u;\alpha,\beta)$.
3. Set $x = a\left[A + (B-A)y\right]^{1/b}$.
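The three generation steps above translate almost line by line into code. The following sketch assumes NumPy's `default_rng` for the uniform draws and SciPy's `betaincinv` for $I^{-1}$; the function name `spbd_rvs`, the seed and the sample size are arbitrary choices made for illustration, and the parameters are those of data set 4 in Table 1.

```python
import numpy as np
from scipy.special import betaincinv


def spbd_rvs(n, a, b, alpha, beta, A, B, rng=None):
    """Draw n SPBD variates by the inverse-transform steps of Sect. 3.12."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=n)                    # step 1: u ~ U(0, 1)
    y = betaincinv(alpha, beta, u)             # step 2: y = I^{-1}(u; alpha, beta)
    return a * (A + (B - A) * y) ** (1.0 / b)  # step 3: x = a[A + (B - A)y]^(1/b)


# Data set 4 of Table 1; the seed is fixed only for reproducibility
params = dict(a=2.0, b=1.3, alpha=1.6, beta=3.8, A=0.5, B=1.8)
rng = np.random.default_rng(12345)
sample = spbd_rvs(20_000, rng=rng, **params)

lower = params["a"] * params["A"] ** (1.0 / params["b"])
upper = params["a"] * params["B"] ** (1.0 / params["b"])
print(sample.min(), sample.max(), sample.mean(), lower, upper)
```

With a sample of this size, the sample mean should sit close to the mean reported for data set 4 in Table 2, and every draw falls inside the support given in Table 1.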
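3.13 Order Statistics
Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from $\mathrm{SPBD}(a,b,\alpha,\beta,A,B)$, with pdf $f$ and CDF $F$, and let $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$ be their order statistics. Then, for $i = 1, 2, 3, \ldots, n$, the pdf of the $i$-th order statistic $X_{i:n}$, $f_{i:n}(x)$, is given by

$$f_{i:n}(x) = \begin{cases} \dfrac{n!}{(i-1)!(n-i)!}\,f(x)[F(x)]^{i-1}[1 - F(x)]^{n-i}, & aA^{1/b} < x < aB^{1/b}, \\ 0, & \text{otherwise}. \end{cases}$$

Hence, for $aA^{1/b} < x < aB^{1/b}$, and using the fact that

$$[1 - F(x)]^{n-i} = \sum_{j=0}^{n-i}\binom{n-i}{j}(-1)^{j}[F(x)]^{j},$$

we have that

$$f_{i:n}(x) = \frac{n!}{(i-1)!(n-i)!}\,f(x)[F(x)]^{i-1}\sum_{j=0}^{n-i}\binom{n-i}{j}(-1)^{j}[F(x)]^{j} = \frac{n!}{(i-1)!(n-i)!}\,f(x)\sum_{j=0}^{n-i}\binom{n-i}{j}(-1)^{j}\left[I\!\left(\frac{\left(\frac{x}{a}\right)^{b} - A}{B-A};\alpha,\beta\right)\right]^{i+j-1}.$$

Then the pdf of the rv $X_{i:n}$, $f_{i:n}$, can be written as

$$f_{i:n}(x) = f(x)\sum_{j=0}^{n-i} A(i,j;n)\left[I\!\left(\frac{\left(\frac{x}{a}\right)^{b} - A}{B-A};\alpha,\beta\right)\right]^{i+j-1}, \tag{23}$$

where

$$A(i,j;n) = (-1)^{j}\frac{n!}{(i-1)!\,(n-i-j)!\,j!}.$$

3.14 Moments about the Origin
Let $k = 1, 2, 3, \ldots$; then the moment of order $k$ about zero of the rv $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$ is given by

$$E\!\left(X^{k}\right) = \int_{-\infty}^{+\infty} x^{k}f(x)\,dx = \int_{aA^{1/b}}^{aB^{1/b}} \frac{b\,x^{k+b-1}}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\left[\left(\frac{x}{a}\right)^{b} - A\right]^{\alpha-1}\left[B - \left(\frac{x}{a}\right)^{b}\right]^{\beta-1} dx,$$

and by using the transformation $y = \left(\frac{x}{a}\right)^{b}$, it reduces to

$$E\!\left(X^{k}\right) = \frac{a^{k}}{(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\int_{A}^{B} y^{\frac{k}{b}}(y-A)^{\alpha-1}(B-y)^{\beta-1}\,dy,$$

from which we have that

$$E\!\left(X^{k}\right) = a^{k}A^{\frac{k}{b}}\,\Gamma(\alpha+\beta)\,F_{(2,1)}\!\left(-\frac{k}{b},\alpha;\alpha+\beta;1 - \frac{B}{A}\right), \tag{24}$$

where $F_{(2,1)}$ here denotes the regularized hypergeometric function, Virchenko et al. [20], defined by

$$F_{(2,1)}(a,b;c;z) = \sum_{m=0}^{\infty}\frac{(a)_m(b)_m}{\Gamma(c+m)\,m!}\,z^{m},$$

and $(a)_m$ is as defined by (7).
Note that, in the case $A = 0$,

$$E\!\left(X^{k}\right) = a^{k}B^{\frac{k}{b}}\,\frac{\Gamma\!\left(\alpha + \frac{k}{b}\right)\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma\!\left(\alpha+\beta+\frac{k}{b}\right)}.$$

3.15 Mean and Variance
Using (24), the mean of $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$ is given by

$$E(X) = aA^{\frac{1}{b}}\,\Gamma(\alpha+\beta)\,F_{(2,1)}\!\left(-\frac{1}{b},\alpha,\alpha+\beta;1 - \frac{B}{A}\right), \tag{25}$$

and the variance by

$$\mathrm{Var}(X) = a^{2}A^{\frac{2}{b}}\,\Gamma(\alpha+\beta)\left\{F_{(2,1)}\!\left(-\frac{2}{b},\alpha,\alpha+\beta;1 - \frac{B}{A}\right) - \Gamma(\alpha+\beta)\left[F_{(2,1)}\!\left(-\frac{1}{b},\alpha,\alpha+\beta;1 - \frac{B}{A}\right)\right]^{2}\right\}.$$

Table 2 gives the mean, median, mode and variance of the selected SPBD data sets that are listed in Table 1.
3.16 The Moment Generating Function
Similarly, the moment generating function of the rv $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$, $M_X(t)$, can be found to be

$$M_X(t) = E\!\left(e^{tX}\right) = \Gamma(\alpha+\beta)\sum_{i=0}^{\infty}\frac{a^{i}A^{\frac{i}{b}}}{i!}\,F_{(2,1)}\!\left(-\frac{i}{b},\alpha,\alpha+\beta;1 - \frac{B}{A}\right)t^{i}.$$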
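The moment formulas of Sects. 3.14 and 3.15 can be cross-checked numerically without evaluating the hypergeometric series. The sketch below is an illustration under stated assumptions, not the computation used in the paper: it relies on the representation $X = a[A + (B-A)T]^{1/b}$ with $T$ beta distributed (Lemma 2(1)), integrates with SciPy's `quad`, and compares the result with the closed form for $A = 0$. The function names are hypothetical and the parameters are those of data set 1 in Table 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln
from scipy.stats import beta as beta_dist


def spbd_raw_moment(k, a, b, alpha, beta, A, B):
    """E(X^k) by quadrature, with X = a*(A + (B - A)*T)**(1/b), T ~ Beta(alpha, beta)."""
    def integrand(t):
        x = a * (A + (B - A) * t) ** (1.0 / b)
        return x ** k * beta_dist.pdf(t, alpha, beta)
    value, _ = quad(integrand, 0.0, 1.0)
    return value


def spbd_raw_moment_A0(k, a, b, alpha, beta, B):
    """Closed form of E(X^k) for the special case A = 0 (end of Sect. 3.14)."""
    s = k / b
    log_m = (k * np.log(a) + s * np.log(B)
             + gammaln(alpha + s) + gammaln(alpha + beta)
             - gammaln(alpha) - gammaln(alpha + beta + s))
    return float(np.exp(log_m))


# Data set 1 of Table 1: a, b, alpha, beta, A, B
a, b, alpha, beta, A, B = 1.8, 2.3, 1.4, 3.9, 0.0, 1.0

m1_quad = spbd_raw_moment(1, a, b, alpha, beta, A, B)
m2_quad = spbd_raw_moment(2, a, b, alpha, beta, A, B)
m1_closed = spbd_raw_moment_A0(1, a, b, alpha, beta, B)
m2_closed = spbd_raw_moment_A0(2, a, b, alpha, beta, B)

mean = m1_closed
variance = m2_closed - m1_closed ** 2
print(mean, variance, m1_quad, m2_quad)
```

The mean and variance obtained this way can be compared with the first row of Table 2; for $A > 0$ the same quadrature applies unchanged, while the closed form no longer does.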
3.17 Harmonic Mean
The harmonic mean of $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$ is the reciprocal of $E\!\left(\frac{1}{X}\right)$, which, along the same lines as the moments of $X$, is given by

$$E\!\left(\frac{1}{X}\right) = \frac{\Gamma(\alpha+\beta)}{aA^{\frac{1}{b}}}\,F_{(2,1)}\!\left(\frac{1}{b},\alpha,\alpha+\beta;1 - \frac{B}{A}\right).$$

3.18 Incomplete Moments
The $k$-th incomplete moment of $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$, $I(z,k)$, is defined by

$$I(z,k) = \int_{-\infty}^{z} x^{k}f(x)\,dx.$$

Using the form of the pdf of $X$ given in (15), we have

$$I(z,k) = \int_{aA^{1/b}}^{z} x^{k}\,\frac{b\,B^{\beta-1}}{(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} C(i,j;a,b,\alpha,\beta,A,B)\,x^{b(\alpha+j-i)-1}\,dx,$$

from which we have that

$$I(z,k) = \frac{b\,B^{\beta-1}}{(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{C(i,j;a,b,\alpha,\beta,A,B)}{k + b(\alpha+j-i)}\left[z^{k+b(\alpha+j-i)} - \left(aA^{\frac{1}{b}}\right)^{k+b(\alpha+j-i)}\right]. \tag{26}$$

Table 2 Mean, median, mode and variance of the selected SPBD data sets
Data set   Minimum       Maximum    Mean       Median     Mode       Variance
1          0             1.8        0.94632    0.955627   0.984679   0.096145
2          0.38702       1.546834   0.942554   0.953689   1.01134    0.069825
3          0.673388      1.692171   0.969019   0.948343   –          0.048677
4          1.17346046    3.143355   1.80987    1.76616    1.60839    0.135081
5          1.122462      3.264052   2.36608    2.39793    2.53353    0.215075
6          0.261047      2.999082   2.14465    2.31082    –          0.501776
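3.19 Mean Deviations
The mean deviation of $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$ about its mean $\mu = E(X)$, $MD(\mu)$, is given by

$$MD(\mu) = E|X - \mu|,$$

which can be found, Cordeiro et al. [21], to be

$$MD(\mu) = 2[\mu F(\mu) - I(\mu,1)].$$

Hence, using (13) and (26) with $k = 1$, for the rv $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$ we have that

$$MD(\mu) = 2\left[\mu\,I\!\left(\frac{\left(\frac{\mu}{a}\right)^{b} - A}{B-A};\alpha,\beta\right) - \frac{b\,B^{\beta-1}}{(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{C(i,j;a,b,\alpha,\beta,A,B)}{1 + b(\alpha+j-i)}\left(\mu^{1+b(\alpha+j-i)} - \left(aA^{\frac{1}{b}}\right)^{1+b(\alpha+j-i)}\right)\right],$$

where $\mu$ is given by (25).
Similarly, the mean deviation of $X$ about its median $m$, $MD(m)$, is given by

$$MD(m) = E|X - m| = \mu - 2I(m,1) = aA^{\frac{1}{b}}\,\Gamma(\alpha+\beta)\,F_{(2,1)}\!\left(-\frac{1}{b},\alpha,\alpha+\beta;1 - \frac{B}{A}\right) - 2I(m,1),$$

where $I(m,1)$ is the first incomplete moment (26) evaluated at $z = m$.
3.20 Probability Weighted Moments
The probability weighted moment of order $s$ and $r$ of $X \sim \mathrm{SPBD}(a,b,\alpha,\beta,A,B)$, $\rho_{s,r}$, is given by

$$\rho_{s,r} = E\!\left(X^{s}[f(X)]^{r}\right).$$

Using the fact that

$$[f(x;a,b,\alpha,\beta,A,B)]^{r} = \frac{x^{(b-1)(r-1)}\,B((\alpha-1)r+1,(\beta-1)r+1)}{a^{b(r-1)}\,b^{(1-r)}\,[B(\alpha,\beta)]^{r}\,(B-A)^{r-1}}\,f(x;a,b,(\alpha-1)r+1,(\beta-1)r+1,A,B),$$

applied with $r+1$ in place of $r$, since $E\!\left(X^{s}[f(X)]^{r}\right) = \int x^{s}[f(x)]^{r+1}\,dx$, and with the use of (24), we have that

$$\rho_{s,r} = \frac{b^{r}A^{\frac{s+r(b-1)}{b}}\,\Gamma[(\alpha-1)(r+1)+1]\,\Gamma[(\beta-1)(r+1)+1]}{a^{r-s}\,[B(\alpha,\beta)]^{r+1}\,(B-A)^{r}}\,F_{(2,1)}\!\left(-\frac{s+r(b-1)}{b},(\alpha-1)(r+1)+1,(r+1)(\alpha+\beta-2)+2;1 - \frac{B}{A}\right).$$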
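The mean deviation identity $MD(\mu) = 2[\mu F(\mu) - I(\mu,1)]$ of Sect. 3.19 can be verified numerically against the direct definition $E|X - \mu|$. The sketch below is illustrative only: instead of the double series (26), it evaluates the incomplete moment by quadrature on the beta scale, using $X = a[A + (B-A)T]^{1/b}$ with $T$ beta distributed. The helper names `to_x` and `to_t` are hypothetical, and the parameters are those of data set 4 in Table 1.

```python
from scipy.integrate import quad
from scipy.special import betainc
from scipy.stats import beta as beta_dist

# Data set 4 of Table 1: a, b, alpha, beta, A, B
a, b, alpha, beta, A, B = 2.0, 1.3, 1.6, 3.8, 0.5, 1.8


def to_x(t):
    """Map a Beta(alpha, beta) value t in (0, 1) to the SPBD scale."""
    return a * (A + (B - A) * t) ** (1.0 / b)


def to_t(x):
    """Inverse map: the argument of the incomplete beta function in (13)."""
    return ((x / a) ** b - A) / (B - A)


def weight(t):
    """Beta(alpha, beta) density, the distribution of T."""
    return beta_dist.pdf(t, alpha, beta)


# Mean mu = E(X), computed on the beta scale
mu, _ = quad(lambda t: to_x(t) * weight(t), 0.0, 1.0)
t_mu = to_t(mu)

# F(mu) from (13) and the first incomplete moment I(mu, 1)
F_mu = betainc(alpha, beta, t_mu)
I_mu, _ = quad(lambda t: to_x(t) * weight(t), 0.0, t_mu)

md_identity = 2.0 * (mu * F_mu - I_mu)

# Direct definition E|X - mu|; the kink at t_mu is passed to quad as a break point
md_direct, _ = quad(lambda t: abs(to_x(t) - mu) * weight(t), 0.0, 1.0, points=[t_mu])

print(mu, md_identity, md_direct)
```

The same construction with the median $m$ in place of $\mu$ gives $E|X - m| = \mu - 2I(m,1)$, since $F(m) = 1/2$.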
3.21 Renyi Entropy
Let us compute the Renyi entropy as a measure of variation of the uncertainty of the rv X ∼ SPBD(a, b, �, �,A,B ). For 𝜃 > 0 such that � ≠ 1 , we have for the rv X ∼ SPBD(a, b, �, �,A,B ) that;
First, we note that;
Therefore,
where Y ∼ SPBD(a, b, (� − 1)� + 1, (� − 1)� + 1,A,B ). It follows, using (24) that;
3.22 Lorenz and Bonferroni Curves
For 0 < 𝜋 < 1 , the Lorenz curve, L(�) , and Bonferroni curves, B(�) , for the rv X ∼ SPBD(a, b, �, �,A,B ), are given by, respectively;
and
where Q(�) is the quantile function of the rv X at � , and I(z, k) is the incomplete moment of the rv X. Therefore, using (25) and (26), we have that;
(27)IX(�) =1
1 − �log
+∞
∫−∞
[f (x)
]�dx
[f (x;a, b, �, �,A,B)
]�=
{bxb−1
ab(B − A)�+�−1B(�, �)
[(x
a
)b
− A
]�−1[B −
(x
a
)b]�−1}�
=x(b−1)(�−1)B((� − 1)� + 1, (� − 1)� + 1)
ab(�−1)b(1−�)[B(�, �)]�(B − A)�−1f (x;a, b, (� − 1)� + 1, (� − 1)� + 1,A,B)
IX(�) =1
1 − �log
{[B((� − 1)� + 1, (� − 1)� + 1)
ab(�−1)b(1−�)[B(�, �)]�(B − A)�−1
]E[Y (b−1)(�−1)
]}
IX(𝜃) =1
1 − 𝜃log
{B((𝛼 − 1)𝜃 + 1, (𝛽 − 1)𝜃 + 1)
a𝜃−1[B(𝛼, 𝛽)]𝜃(B − A)𝜃−1𝛤 [(𝛼 + 𝛽 − 1)𝜃 + 2]
F(2,1)
(−(b − 1)(𝜃 − 1)
b, (𝛼 − 1)𝜃 + 1, (𝛼 − 1)𝜃 + (𝛽 − 1)𝜃 + 2;1 −
B
A
)}.
L(�) =I(Q(�), 1)
�
B(�) =I(Q(�), 1)
��
77
1 3
Annals of Data Science (2021) 8(1):57–90
And similarly, that;
4 Parameters Estimation of the SPBD
The maximum likelihood estimation (MLE) method will be used for estimat-ing the parameters of the SPBD. Let x1, x2,… , xn be a random sample from SPBD(a, b, �, �,A,B ), as given by (9), then we want to estimates the parameters a, b, �, �,A, andB by maximizing the log-likelihood function, where the likelihood function L = L ( a, b, �, �,A,B;x1, x2,… , xn) can be written as;
Let us inspect the normal equations ��alogL = 0,
�
�blogL = 0,… ,
�
�Blog L = 0, to
see if they admit an explicit solution. We have that;
where � is the digamma function, Abramowitz and Stegun [6, p. 258],
L(𝜋) =bB𝛽−1
∑∞
i=0
∑∞
j=0
C(i,j;a,b,𝛼,𝛽,A,B)
b(𝛼+j−i)+1
�(Q(𝜋))b(𝛼+j−i)+1 − (aA
1
b )b(𝛼+j−i)+1�
aA1
b (B − A)𝛼+𝛽−1𝛤 (𝛼)𝛤 (𝛽)F(2,1)
�−
1
b, 𝛼, 𝛼 + 𝛽;1 − B
A
�
B(𝜋) =
bB𝛽−1∑∞
i=0
∑∞
j=0
C(i,j;a,b,𝛼,𝛽,A,B)
b(𝛼+j−i)+1
�(Q(𝜋))b(𝛼+j−i)+1 −
�aA
1
b
�b(𝛼+j−i)+1�
a𝜋A1
b (B − A)𝛼+𝛽−1𝛤 (𝛼)𝛤 (𝛽)F(2,1)
�−
1
b, 𝛼, 𝛼 + 𝛽;1 − B
A
� .
L =
n∏i=1
f(xi)=
[b
ab(B − A)�+�−1B(�, �)
]n n∏i=1
{xb−1i
[(xia
)b
− A
]�−1[B −
(xia
)b]�−1}
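$$\frac{\partial}{\partial a}\log L = -(\alpha-1)\frac{b}{a}\sum_{i=1}^{n}\frac{\left(\frac{x_i}{a}\right)^{b}}{\left(\frac{x_i}{a}\right)^{b}-A} + (\beta-1)\frac{b}{a}\sum_{i=1}^{n}\frac{\left(\frac{x_i}{a}\right)^{b}}{B-\left(\frac{x_i}{a}\right)^{b}} - \frac{nb}{a} = 0, \tag{28}$$

$$\frac{\partial}{\partial b}\log L = \frac{n}{b} + \sum_{i=1}^{n}\log(x_i) + (\alpha-1)\sum_{i=1}^{n}\frac{\left(\frac{x_i}{a}\right)^{b}\log\left(\frac{x_i}{a}\right)}{\left(\frac{x_i}{a}\right)^{b}-A} - (\beta-1)\sum_{i=1}^{n}\frac{\left(\frac{x_i}{a}\right)^{b}\log\left(\frac{x_i}{a}\right)}{B-\left(\frac{x_i}{a}\right)^{b}} - n\log(a) = 0, \tag{29}$$

$$\frac{\partial}{\partial \alpha}\log L = \sum_{i=1}^{n}\log\left[\left(\frac{x_i}{a}\right)^{b}-A\right] - n\log(B-A) - n[\psi(\alpha)-\psi(\alpha+\beta)] = 0, \tag{30}$$

$$\frac{\partial}{\partial \beta}\log L = \sum_{i=1}^{n}\log\left[B-\left(\frac{x_i}{a}\right)^{b}\right] - n\log(B-A) - n[\psi(\beta)-\psi(\alpha+\beta)] = 0, \tag{31}$$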
and, since $aA^{1/b} < x < aB^{1/b}$, the MLEs of $aA^{1/b}$ and $aB^{1/b}$ are, respectively, $x_{1:n}$ and $x_{n:n}$; that is, $aA^{1/b} = x_{1:n}$ and $aB^{1/b} = x_{n:n}$, and hence;
and
Since Eqs. (28)–(33) are not easy to solve explicitly, a numerical technique, such as the Newton-Raphson method or any other well-known optimization algorithm, see Shi et al. [22], may be employed, or a well-known software package, such as maxLik, Henningsen and Toomet [23], or GAMLSS, Stasinopoulos and Rigby [24], may be used to find the MLE of the parameters of the SPBD.
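As an illustration of the objective that such a numerical routine would maximize, the following is a minimal Python sketch of the SPBD log-likelihood, written directly from the likelihood function of this section. It is not the maxLik or GAMLSS implementation; the function names `spbd_logpdf`, `spbd_loglik`, and `integrate_pdf` are hypothetical, and only the Python standard library is assumed. The midpoint-rule integral is included solely as a sanity check that the density integrates to approximately one over $aA^{1/b} < x < aB^{1/b}$.

```python
import math

def spbd_logpdf(x, a, b, alpha, beta, A, B):
    """Log-density of SPBD(a, b, alpha, beta, A, B) at x; -inf outside the support."""
    if x <= 0:
        return -math.inf
    u = (x / a) ** b
    if not (A < u < B):
        return -math.inf
    # log of the beta function B(alpha, beta) via log-gamma
    log_beta_fn = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (math.log(b) + (b - 1) * math.log(x) - b * math.log(a)
            - (alpha + beta - 1) * math.log(B - A) - log_beta_fn
            + (alpha - 1) * math.log(u - A) + (beta - 1) * math.log(B - u))

def spbd_loglik(params, xs):
    """Log-likelihood of a sample xs; params = (a, b, alpha, beta, A, B)."""
    return sum(spbd_logpdf(x, *params) for x in xs)

def integrate_pdf(params, steps=20000):
    """Midpoint-rule integral of the pdf over its support (sanity check only)."""
    a, b, alpha, beta, A, B = params
    lo, hi = a * A ** (1 / b), a * B ** (1 / b)
    h = (hi - lo) / steps
    return h * sum(math.exp(spbd_logpdf(lo + (i + 0.5) * h, *params))
                   for i in range(steps))

# Parameter values of the first simulated model (a, b, alpha, beta, A, B)
model1 = (1.8, 2.3, 1.4, 3.9, 0.0, 1.0)
print(round(integrate_pdf(model1), 4))
print(spbd_loglik(model1, [0.5, 1.0, 1.5]))
```

A general-purpose optimizer could be applied to the negative of `spbd_loglik`; any observation outside the support implied by the current parameter values yields a log-likelihood of minus infinity, so the search has to respect the constraint $aA^{1/b} < x_{1:n}$ and $x_{n:n} < aB^{1/b}$.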
5 A Simulation Study
In order to examine the performance of the MLE method given in Sect. 4, we perform a simulation study. The bias and the mean square errors (MSE) of the estimates are the principal measures of performance.
The statistical software R and the Absoft Pro Fortran compiler are employed for computing. The maxLik package of the statistical software R is used mainly for computing the MLEs, see Henningsen and Toomet [23] for details of this package, while the Absoft Pro Fortran is used for other needed computations.
The six miscellaneous SPBD models given in Table 1, which have different pdf shapes and variable ranges, are used to simulate data sets for each model, and for each data set the bias and the MSE of the MLE of the model parameters are computed for different simulated sample sizes. The sample sizes taken are 25, 50, 100, 300, 500, and 1000. In each situation, each parameter, $\theta$ say, of a given SPBD model of Table 1 is estimated in each of 5000 replications of random variates generated from that model, and the sample mean, bias, variance, and MSE of the estimates are computed as $\text{Mean}(\hat{\theta}) = \frac{1}{5000}\sum_{i=1}^{5000}\hat{\theta}_i = \bar{\theta}$, say, $\text{Bias}(\hat{\theta}) = \bar{\theta} - \theta$, $\text{Var}(\hat{\theta}) = \frac{1}{5000}\sum_{i=1}^{5000}\left(\hat{\theta}_i - \bar{\theta}\right)^2$, and hence $\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$. This procedure is repeated for each sample size, and then repeated for each SPBD model.
Table 3 shows the bias of the estimated parameters of the different simulated SPBD data sets for each sample size, while Table 4 presents the MSE of the estimated parameters of the different simulated SPBD data sets for each sample size. Both Tables 3 and 4 show, for each of the SPBD model parameters, that the bias and MSE decrease as the sample size increases. Figure 3 shows the behaviour of the MSE plots of the estimated parameters for the six SPBD simulated data sets, which shows graphically, for each of the SPBD model parameters, that the
$$A = \left(\frac{x_{1:n}}{a}\right)^{b}, \tag{32}$$

$$B = \left(\frac{x_{n:n}}{a}\right)^{b} \tag{33}$$
Table 3 The bias of the estimated parameters of the simulated SPBD data sets for each sample size n

| n | Actual a | Actual b | Actual α | Actual β | Actual A | Actual B | Bias a | Bias b | Bias α | Bias β | Bias A | Bias B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | −0.34878 | 0.329715 | 0.013104 | 0.399618 | 0.420087 | 0.601777 |
| 25 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.140342 | −0.0969 | 0.056847 | −0.05489 | 0.464818 | −0.48701 |
| 25 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | −0.37808 | −0.27911 | 0.315561 | −0.30242 | −0.00807 | −0.19319 |
| 25 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.039897 | −0.41322 | 0.612345 | 0.366937 | 0.149 | −0.14444 |
| 25 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | −0.18231 | −0.25249 | 0.334888 | 0.198619 | −0.11285 | 0.232156 |
| 25 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.396452 | 0.403079 | −0.34372 | 0.335451 | −0.1812 | 0.263205 |
| 50 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.512223 | −0.29606 | 0.58745 | −0.58925 | 0.451566 | −0.34638 |
| 50 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.414662 | −0.15015 | 0.687373 | −0.73152 | 0.294003 | −0.49454 |
| 50 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.070608 | 0.459072 | 0.278546 | −0.12252 | 0.102261 | 0.156337 |
| 50 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.153985 | −0.4023 | −0.08337 | 0.381255 | −0.2772 | −0.32743 |
| 50 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.111967 | 0.199792 | 0.105013 | −0.2507 | 0.080938 | 0.051812 |
| 50 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.275 | −0.28298 | −0.36704 | −0.26281 | −0.14315 | −0.26767 |
| 100 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | −0.10525 | 0.227391 | −0.09405 | −0.67146 | 0.4522 | 0.262165 |
| 100 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.019408 | −0.24609 | −0.13464 | 0.094321 | 0.033752 | −0.31785 |
| 100 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.153569 | −0.36599 | −0.40473 | −0.06738 | 0.191963 | −0.16552 |
| 100 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | −0.07852 | −0.21469 | −0.08842 | 0.090949 | 0.160529 | −0.20546 |
| 100 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.431635 | 0.23417 | 0.037447 | 0.241197 | −0.10608 | 0.343295 |
| 100 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | −0.33975 | 0.166262 | −0.02429 | −0.4521 | 0.334471 | 0.360404 |
| 300 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.274458 | 0.049549 | 0.474342 | 0.345484 | 0.368289 | −0.24068 |
| 300 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.424818 | 0.026234 | −0.15182 | 0.237266 | −0.49699 | 0.312069 |
| 300 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.091822 | 0.012653 | −0.37405 | −0.31214 | −0.41615 | 0.322652 |
| 300 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | −0.34078 | −0.25521 | 0.336573 | 0.489306 | 0.169098 | 0.536246 |
| 300 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | −0.21183 | 0.072752 | −0.12231 | −0.05324 | −0.47379 | 0.106958 |
| 300 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | −0.15788 | 0.177951 | 0.208611 | 0.352764 | 0.153525 | −0.16458 |
| 500 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | −0.37304 | 0.278684 | −0.31915 | −0.19441 | 0.432184 | 0.377181 |
| 500 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.373571 | −0.40604 | −0.38178 | −0.47487 | 0.248089 | 0.219282 |
| 500 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | −0.02872 | −0.32082 | 0.380835 | −0.06513 | −0.05995 | 0.143485 |
| 500 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.507451 | −0.11285 | 0.159177 | 0.147718 | 0.223398 | 0.356879 |
| 500 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | −0.36108 | 0.142093 | 0.102362 | −0.36187 | −0.50241 | 0.268284 |
| 500 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.094234 | 0.134663 | −0.22689 | −0.2296 | 0.042163 | −0.00386 |
| 1000 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.204721 | −0.32209 | 0.257236 | 0.110389 | 0.104925 | 0.215005 |
| 1000 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.098276 | 0.118352 | 0.104907 | 0.002979 | 0.410095 | −0.38891 |
| 1000 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | −0.14047 | 0.252312 | −0.46269 | 0.190593 | −0.14857 | 0.29962 |
| 1000 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.177988 | 0.011111 | −0.31224 | −0.04958 | 0.227256 | −0.11569 |
| 1000 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.258915 | 0.015733 | −0.3335 | −0.37711 | −0.35998 | −0.04232 |
| 1000 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.060013 | 0.171233 | 0.137993 | 0.116084 | 0.389542 | −0.13421 |
Table 4 The MSE of the estimated parameters of the simulated SPBD data sets for each sample size n

| n | Actual a | Actual b | Actual α | Actual β | Actual A | Actual B | MSE a | MSE b | MSE α | MSE β | MSE A | MSE B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 1.342013 | 1.954471 | 0.932849 | 1.895358 | 2.084947 | 1.762417 |
| 25 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 1.402194 | 2.396725 | 0.793264 | 1.629038 | 1.911926 | 2.325153 |
| 25 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 1.477281 | 1.833293 | 1.406634 | 2.277118 | 1.563066 | 1.61373 |
| 25 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 1.640697 | 2.313115 | 0.900516 | 1.306511 | 1.6181 | 1.904378 |
| 25 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.944398 | 2.117512 | 1.005152 | 2.534064 | 1.15113 | 1.954144 |
| 25 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 1.342938 | 2.332513 | 1.467529 | 1.019239 | 1.239475 | 1.407573 |
| 50 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 1.170272 | 1.657704 | 0.799091 | 1.401564 | 1.755979 | 1.654412 |
| 50 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 1.132947 | 1.743292 | 0.598261 | 1.309235 | 1.681494 | 1.776936 |
| 50 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 1.324648 | 1.781486 | 1.32801 | 2.060954 | 1.019974 | 1.361083 |
| 50 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 1.31652 | 1.99784 | 0.701776 | 1.157632 | 1.311115 | 1.187063 |
| 50 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.785298 | 1.910105 | 0.862128 | 2.282315 | 1.021342 | 1.489123 |
| 50 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.993972 | 2.011772 | 1.162289 | 0.873739 | 1.157483 | 1.251442 |
| 100 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.756539 | 1.210055 | 0.607076 | 0.992914 | 1.460646 | 1.345788 |
| 100 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.88934 | 1.199509 | 0.404822 | 1.201444 | 1.357685 | 1.206554 |
| 100 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 1.117483 | 1.529878 | 1.037574 | 1.595888 | 0.829085 | 1.121212 |
| 100 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.998634 | 1.752594 | 0.537625 | 0.828906 | 1.06102 | 0.939682 |
| 100 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.565906 | 1.546855 | 0.705282 | 2.017984 | 0.717713 | 1.153898 |
| 100 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.805249 | 1.722256 | 0.777865 | 0.696402 | 0.851498 | 1.018501 |
| 300 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.547782 | 0.852198 | 0.401467 | 0.644066 | 0.773053 | 1.007341 |
| 300 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.711853 | 0.781424 | 0.269156 | 0.788936 | 0.916783 | 0.831953 |
| 300 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.910827 | 1.280392 | 0.930746 | 1.260824 | 0.410875 | 0.789677 |
| 300 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.609068 | 1.387463 | 0.38542 | 0.5643 | 0.865376 | 0.664865 |
| 300 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.412669 | 1.172685 | 0.475687 | 1.818785 | 0.500591 | 0.891926 |
| 300 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.483891 | 1.390424 | 0.570112 | 0.455555 | 0.530735 | 0.691226 |
| 500 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.427417 | 0.64704 | 0.332296 | 0.435204 | 0.200255 | 0.854437 |
| 500 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.449441 | 0.434471 | 0.155422 | 0.575885 | 0.650976 | 0.650911 |
| 500 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.7656 | 1.07478 | 0.704428 | 0.970532 | 0.325475 | 0.618804 |
| 500 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.272669 | 1.19933 | 0.241967 | 0.271555 | 0.418878 | 0.422975 |
| 500 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.278361 | 0.995957 | 0.321993 | 1.531532 | 0.367942 | 0.542543 |
| 500 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.178667 | 0.82391 | 0.340492 | 0.202494 | 0.388572 | 0.487987 |
| 1000 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.354872 | 0.41543 | 0.213608 | 0.304466 | 0.101616 | 0.472349 |
| 1000 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.301129 | 0.205473 | 0.081764 | 0.27106 | 0.33213 | 0.376744 |
| 1000 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.645736 | 0.648055 | 0.658258 | 0.585563 | 0.148738 | 0.425584 |
| 1000 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.149529 | 0.628928 | 0.113052 | 0.243147 | 0.300697 | 0.311164 |
| 1000 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.178081 | 0.714394 | 0.16387 | 1.281251 | 0.198019 | 0.215385 |
| 1000 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.052826 | 0.547684 | 0.102232 | 0.057344 | 0.159444 | 0.255718 |
MSE decreases as the sample size increases. Hence, since the MSE plots decrease as the sample size increases, we may conclude that the MLE method seems to become more efficient as the sample size becomes large.
Table 5 shows the actual values and the MLE parameter values (as the average values over the 5000 replications) of the different simulated SPBD data sets, and Fig. 4 shows their corresponding pdf plots.
In conclusion, the simulation indicates that the MLE method is appropriate and can be used to estimate the parameters of the SPBD models.
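The mean, bias, variance, and MSE computations used in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not the R and Fortran code used for the study: SPBD variates are drawn by transforming a standard beta variate $U$ into $a[A + (B-A)U]^{1/b}$, which follows from the form of the pdf; the estimator examined is the sample maximum $x_{n:n}$ of Sect. 4 (the MLE of $aB^{1/b}$) rather than the full six-parameter MLE; and 200 replications are used instead of 5000 to keep the run short. The names `spbd_variate`, `summarize`, and `simulate_upper_bound` are hypothetical.

```python
import random

def spbd_variate(rng, a, b, alpha, beta, A, B):
    """Draw one SPBD variate by transforming a standard beta variate."""
    u = rng.betavariate(alpha, beta)
    return a * (A + (B - A) * u) ** (1.0 / b)

def summarize(estimates, true_value):
    """Mean, bias, variance and MSE of replicated estimates, as defined in this section."""
    r = len(estimates)
    mean = sum(estimates) / r
    bias = mean - true_value
    var = sum((e - mean) ** 2 for e in estimates) / r
    return {"mean": mean, "bias": bias, "var": var, "mse": var + bias ** 2}

def simulate_upper_bound(n, replications, params, seed=0):
    """Replicate the estimator x_{n:n} of the upper support bound a * B**(1/b)."""
    rng = random.Random(seed)
    a, b, alpha, beta, A, B = params
    true_upper = a * B ** (1.0 / b)
    estimates = [max(spbd_variate(rng, *params) for _ in range(n))
                 for _ in range(replications)]
    return summarize(estimates, true_upper)

# Parameter values of the first simulated model (a, b, alpha, beta, A, B)
model1 = (1.8, 2.3, 1.4, 3.9, 0.0, 1.0)
results = {n: simulate_upper_bound(n, 200, model1) for n in (25, 100, 1000)}
for n, stats in results.items():
    print(n, stats)
```

Because the sample maximum can never exceed the upper bound of the support, its bias is non-positive, and both the bias and the MSE shrink as the sample size grows.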
Fig. 3 Behaviour of the MSE plots of the estimated parameters for the SPBD simulated data sets (six panels; horizontal axis: Sample Size, from 0 to 1000)
6 Application of Fitting SPBD Model to Real‑Life Data
We consider two real-life data sets in order to show the usefulness of the proposed estimation procedure to estimate and fit the SPBD model to these real-life data sets. The data sets are;
Data Set 1 Represents the waiting period of Muslim worshipers from the time of entering the mosque until the actual time of starting the Alfajir prayer (the early morning and first prayer of the day) in Al-Mani Jamieh Mosque (Masjid no. 942), where Friday prayers are held and which accommodates more than two thousand worshipers, in Al-Waab town in Doha-Qatar. The data consist of 4539 observations recorded in this masjid for the period from 30th October 2017 to 15th January 2020. We will refer to this data set as the main street mosque data.
Data Set 2 Represents the waiting period of Muslim worshipers from the time of entering the mosque until the actual time of starting the Alfajir prayer in Saeed bin Fahad Al-Dosari Mosque (Masjid no. 1031), where Friday prayers are not held and which accommodates no more than two hundred fifty worshipers, in Al-Waab town in Doha-Qatar. The data consist of 3360 observations recorded in this mosque for the period from 25th January 2015 to 20th October 2017. We will refer to this data set as the within streets mosque data.
Table 6 presents some statistics of the observed mosque data sets.
Using both mosque data sets, the MLE method was employed to estimate the parameters of the SPBD model for each, and Table 7 shows the actual and the predicted frequencies, the model parameter estimates, and the Chi square goodness of fit test for the SPB, the gamma, the exponential, the four parameters beta, and the generalized beta of the first kind distributions, as well as the likelihood ratio test (LRT) for
Table 5 Actual and MLE parameters values of the simulated SPBD data sets

| Data set | Value | a | b | α | β | A | B | Variable range: Minimum | Variable range: Maximum |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Actual | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0 | 1.8 |
| 1 | MLE | 1.777778 | 2.339845 | 1.373519 | 3.877778 | 0.000111 | 1.01218 | 0.036285462 | 1.787000108 |
| 2 | Actual | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.38702 | 1.546834 |
| 2 | MLE | 1.435333 | 3.117889 | 0.923133 | 2.587889 | 0.019188 | 1.211111 | 0.403895566 | 1.526273077 |
| 3 | Actual | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.673387689 | 1.69217121 |
| 3 | MLE | 1.487889 | 5.855124 | 0.411889 | 5.444889 | 0.010134 | 2.227889 | 0.679167201 | 1.706033177 |
| 4 | Actual | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 1.17346046 | 3.143355214 |
| 4 | MLE | 1.993333 | 1.443825 | 1.473245 | 3.895556 | 0.46999 | 1.933889 | 1.18159022 | 3.147485177 |
| 5 | Actual | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 1.122462048 | 3.264052108 |
| 5 | MLE | 1.879889 | 1.213113 | 2.351245 | 1.778986 | 0.512333 | 1.933333 | 1.083200324 | 3.236995497 |
| 6 | Actual | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.261047095 | 2.999081861 |
| 6 | MLE | 2.054789 | 0.467478 | 2.035899 | 0.598789 | 0.397999 | 1.198758 | 0.286325174 | 3.028201867 |
the nested models of the SPB distribution, namely, the four parameters beta and the generalized beta of the first kind distributions. Figure 5 illustrates the histograms and the fitted pdfs for both main and within street mosque data sets. Now, for the main street data set, since the p values of the Chi square goodness of fit test for the gamma, the exponential, the four parameters beta, and the generalized beta of the first kind distributions are smaller than 0.05, and the p value of the SPBD model equals 0.9488, the SPBD performs better than all these distributions. For the within street mosque data set, although the Chi square goodness of fit test p value of the generalized beta of the first kind distribution equals 0.23087, indicating that this distribution can fit this data, the SPBD model performs better in this case since its p value equals 0.96088; and since the p values of the Chi square goodness of fit test for the gamma, the exponential, and the four parameters beta are smaller than 0.05, the SPBD performs better than these distributions also. Next, the p values of the likelihood ratio test (LRT) for the nested models of the SPB distribution,
Fig. 4 Plots of the actual and simulated SPBD pdf’s (six panels, Data Set 1 to Data Set 6; curves: Actual and Predicted; axes: x and f(x))
namely, the four parameters beta and the generalized beta of the first kind distributions, are less than 0.05, indicating statistically that the SPBD performs better in both main and within street data sets. These findings indicate that the SPBD outperforms the gamma, the exponential, the four parameters beta, and the generalized beta of the first kind distributions and provides the best fit for both main and within mosque data sets.
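The LRT p values can be checked with a few lines of Python. The sketch below is an illustration only: it assumes the usual chi-square reference distribution for the LRT statistic and uses the fact that, for 2 degrees of freedom (the df reported in Table 7 for both nested models), the upper-tail chi-square probability has the closed form $e^{-x/2}$. The function name `lrt_p_value_df2` and the dictionary layout are hypothetical; the statistics are the LRT values of Table 7.

```python
import math

def lrt_p_value_df2(lrt_statistic):
    """Upper-tail chi-square probability with 2 degrees of freedom: exp(-x / 2)."""
    return math.exp(-lrt_statistic / 2.0)

# LRT statistics of Table 7, keyed by (data set, nested model)
lrt_values = {
    ("main", "four parameters beta"): 10.3345,
    ("main", "generalized beta of the first kind"): 8.76259,
    ("within", "four parameters beta"): 20.5047,
    ("within", "generalized beta of the first kind"): 7.61264,
}

p_values = {key: lrt_p_value_df2(value) for key, value in lrt_values.items()}
for key, p in p_values.items():
    decision = "reject nested model" if p < 0.05 else "keep nested model"
    print(key, round(p, 5), decision)
```

All four p values fall below 0.05, which agrees with the conclusion that the two nested models are rejected in favour of the SPBD for both data sets.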
7 Summary
A new six parameters beta distribution is introduced, which has a more flexible shape and a wider bounded domain than the two (standard) and the four parameters beta distributions, and some of its various shapes are given to show its flexibility. Its boundaries, limits, mode, quantiles, reliability and hazard functions, Renyi entropy, and Lorenz and Bonferroni curves are studied. This distribution is closed under scaling and exponentiation, has the reflection symmetry property, and has some well-known distributions as special cases, such as the two and four parameters beta, generalized modification of the Kumaraswamy, generalized beta of the first kind, the power function, Kumaraswamy power function, Minimax, exponentiated Pareto, and the generalized uniform distributions. Its order statistics, its moment generating function, and its moments, consisting of the mean, variance, moments about the origin, harmonic, incomplete, and probability weighted moments, and mean deviations, are derived. The maximum likelihood estimation method is used
Table 6 Some statistics of the observed mosque data sets

| Statistics | Observed: Main | Observed: Within |
|---|---|---|
| No. of observation | 4539 | 3360 |
| Mean | 7.0986 | 5.2372 |
| Standard error of mean | 0.08194 | 0.07554 |
| Median | 5.65258 | 4.4779 |
| Mode | 0.685519 | 0.6196016 |
| SD | 5.52032 | 4.37859 |
| Variance | 19.172 | 30.474 |
| Skewness | 0.706 | 1.162 |
| Standard error of skewness | 0.036 | 0.042 |
| Kurtosis | −0.378 | 1.161 |
| Standard error of kurtosis | 0.073 | 0.084 |
| Minimum | 0.07 | 0.08 |
| Maximum | 24.9 | 24.9 |
| Percentile 25 | 2.44012 | 1.46321 |
| Percentile 50 | 5.5 | 4.4779 |
| Percentile 75 | 10.64123 | 7.60603 |
Table 7 Observed and predicted frequencies, model parameters estimates and goodness of fit for mosque data sets

Observed and predicted frequencies (predicted columns: proposed 6 parameters beta, gamma, exponential, 4 parameters beta, generalized beta of the first kind)

| Data range | Main: Observed | Main: 6 par. beta | Main: Gamma | Main: Exponential | Main: 4 par. beta | Main: Gen. beta 1st kind | Within: Observed | Within: 6 par. beta | Within: Gamma | Within: Exponential | Within: 4 par. beta | Within: Gen. beta 1st kind |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0–1.0 | 599 | 589 | 259 | 596 | 699 | 533 | 523 | 517 | 380 | 583 | 611 | 536 |
| 1.1–2.0 | 405 | 412 | 417 | 518 | 451 | 426 | 430 | 428 | 465 | 482 | 416 | 430 |
| 2.1–3.0 | 373 | 362 | 460 | 450 | 381 | 375 | 372 | 374 | 441 | 398 | 346 | 367 |
| 3.1–4.0 | 323 | 330 | 453 | 391 | 335 | 339 | 330 | 329 | 388 | 329 | 297 | 317 |
| 4.1–5.0 | 310 | 306 | 423 | 339 | 300 | 310 | 285 | 288 | 329 | 272 | 257 | 276 |
| 5.1–6.0 | 291 | 284 | 382 | 295 | 272 | 285 | 250 | 251 | 273 | 224 | 224 | 240 |
| 6.1–7.0 | 260 | 265 | 337 | 256 | 247 | 263 | 215 | 217 | 223 | 185 | 196 | 208 |
| 7.1–8.0 | 243 | 247 | 294 | 222 | 226 | 242 | 182 | 186 | 181 | 153 | 171 | 179 |
| 8.1–9.0 | 219 | 229 | 253 | 193 | 207 | 223 | 160 | 158 | 145 | 127 | 148 | 154 |
| 9.1–10.0 | 221 | 212 | 215 | 168 | 189 | 205 | 129 | 133 | 116 | 105 | 128 | 131 |
| 10.1–11.0 | 188 | 195 | 182 | 146 | 172 | 188 | 110 | 111 | 92 | 86 | 110 | 111 |
| 11.1–12.0 | 185 | 178 | 153 | 126 | 157 | 172 | 86 | 91 | 73 | 71 | 94 | 93 |
| 12.1–13.0 | 151 | 162 | 128 | 110 | 142 | 156 | 75 | 73 | 58 | 59 | 80 | 77 |
| 13.1–14.0 | 143 | 145 | 107 | 95 | 128 | 141 | 51 | 58 | 45 | 49 | 67 | 62 |
| 14.1–15.0 | 135 | 128 | 89 | 83 | 115 | 126 | 49 | 45 | 35 | 40 | 55 | 50 |
| 15.1–16.0 | 108 | 112 | 74 | 72 | 102 | 112 | 36 | 34 | 28 | 33 | 44 | 39 |
| 16.1–17.0 | 99 | 96 | 61 | 62 | 90 | 99 | 26 | 25 | 22 | 27 | 35 | 30 |
| 17.1–18.0 | 71 | 81 | 50 | 54 | 78 | 85 | 18 | 17 | 17 | 23 | 27 | 22 |
| 18.1–19.0 | 68 | 66 | 41 | 47 | 67 | 72 | 13 | 11 | 13 | 19 | 20 | 16 |
| 19.1–20.0 | 49 | 52 | 34 | 41 | 56 | 60 | 8 | 7 | 10 | 15 | 14 | 10 |
| 20.1–21.0 | 44 | 38 | 28 | 36 | 45 | 48 | 5 | 4 | 8 | 13 | 10 | 6 |
| 21.1–22.0 | 24 | 26 | 23 | 31 | 35 | 36 | 3 | 2 | 6 | 11 | 6 | 3 |
| 22.1–23.0 | 18 | 16 | 19 | 27 | 25 | 25 | 2 | 1 | 5 | 9 | 3 | 2 |
| 23.1–24.0 | 8 | 7 | 15 | 23 | 15 | 14 | 1 | 0 | 4 | 7 | 1 | 1 |
| 24.1–25.0 | 4 | 1 | 42 | 158 | 5 | 4 | 1 | 0 | 3 | 40 | 0 | 0 |
| Total | 4539 | 4539 | 4539 | 4539 | 4539 | 4539 | 3360 | 3360 | 3360 | 3360 | 3360 | 3360 |

Model parameters

| Parameter | Main: 6 par. beta | Main: Gamma | Main: Exponential | Main: 4 par. beta | Main: Gen. beta 1st kind | Within: 6 par. beta | Within: Gamma | Within: Exponential | Within: 4 par. beta | Within: Gen. beta 1st kind |
|---|---|---|---|---|---|---|---|---|---|---|
| a | 25.01238 | [illegible] = 0.213 | [illegible] = 0.141 | 1 | 24.97111 | 24.20246 | [illegible] = 0.273 | [illegible] = 0.191 | 1 | 24.511778 |
| b | 2.023441 | β = 1.643 | | 1 | 0.999999 | 1.159141 | β = 1.431 | | 1 | 0.999999 |
| α | 0.381451 | | | 0.7533 | 0.89295 | 0.799451 | | | 0.80945 | 0.915556 |
| β | 2.612456 | | | 2.00145 | 2.239999 | 3.901245 | | | 2.97825 | 3.31778 |
| A | 0.000012 | | | 0 | 0 | 0.000132 | | | 0 | 0 |
| B | 0.998912 | | | 25 | 1 | 1.038113 | | | 25 | 1 |

Goodness of fit

| | Main: 6 par. beta | Main: Gamma | Main: Exponential | Main: 4 par. beta | Main: Gen. beta 1st kind | Within: 6 par. beta | Within: Gamma | Within: Exponential | Within: 4 par. beta | Within: Gen. beta 1st kind |
|---|---|---|---|---|---|---|---|---|---|---|
| χ² | 8.7141 | 736.85 | 416.134 | 53.1875 | 48.22419 | 6.2113 | 101.441 | 112.324 | 52.4733 | 19.7677 |
| df | 17* | 21* | 23 | 20* | 18* | 14* | 21* | 23 | 17* | 16* |
| p value | 0.9488 | 0.0 | 0.0 | 0.00008 | 0.000141 | 0.96088 | 0.0 | 0.0 | 0.000017 | 0.23087 |

Likelihood ratio test (nested)**

| | Main: 4 par. beta | Main: Gen. beta 1st kind | Within: 4 par. beta | Within: Gen. beta 1st kind |
|---|---|---|---|---|
| LRT | 10.3345 | 8.76259 | 20.5047 | 7.61264 |
| df | 2 | 2 | 2 | 2 |
| p value | 0.0056 | 0.01251 | 0.00004 | 0.02222 |

*The number of intervals was adjusted in order to make the expected number of observations in each interval equal to or greater than 5, which in turn affected the number of degrees of freedom
**The 4 parameters beta distribution and the generalized beta of the first kind distribution are special cases of the SPBD, see Sect. 3.7 cases 1 and 4
for estimating its parameters and applied to estimate the parameters of six different simulated data sets of this distribution having different pdf shapes, in order to check the performance of the estimation method through the mean square errors of the estimated parameters computed from different simulated sample sizes. These are shown to decrease as the sample size increases, indicating that the MLE method is appropriate and can be used to estimate the parameters of the SPBD models. Finally, two real-life data sets, representing the waiting period of Muslim worshipers from the time of entering the mosque until the actual time of starting the Alfajir prayer in two different mosques, are used in order to show the usefulness and the flexibility of this distribution in application to real-life data sets. The MLE method was employed using these data sets to estimate the parameters of the SPBD, the gamma, the exponential, the four parameters beta, and the generalized beta of the first kind distributions. The Chi square goodness of fit test for these distributions, as well as the LRT for the nested models of the SPB distribution, namely, the four parameters beta and the generalized beta of the first kind distributions, were employed, and the p values of these tests indicate that the SPBD statistically outperforms the other stated distributions.
Acknowledgements Open Access funding provided by the Qatar National Library. The publication of this article was funded by the Qatar National Library.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
1. Sheskin DJ (2011) Handbook of parametric and nonparametric statistical procedures, 5th edn. Chapman and Hall, New York
Fig. 5 Histograms and the fitted pdfs for the Mosque data sets (two panels: Main and Within; fitted curves: 6 Para Beta, Gamma, Exponential, 4 Para Beta, G Beta Type I; axes: x and f(x))
2. Shi Y (2014) Big data: history, current status, and challenges going forward. Bridge US Natl Acad Eng 44(4):6–11
3. Olson D, Shi Y (2007) Introduction to business data mining. McGraw-Hill, New York
4. Johnson NL, Kemp AW, Balakrishnan N (1995) Continuous univariate distributions, vol 2, 2nd edn. Wiley, New York
5. Johnson NL, Kemp AW, Balakrishnan N (1995) Continuous univariate distributions, vol 1, 2nd edn. Wiley, New York
6. Abramowitz M, Stegun IA (2013) Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover, New York
7. Armero C, Bayarri MJ (1994) Prior assessments for prediction in queues. Statistician 43(1):139–153
8. Gordy MB (1998) Computationally convenient distributional assumptions for common-value auctions. Comput Econ 12:61–78. https://doi.org/10.1023/A:1008645531911
9. Pathan MA, Garg M, Agrawal J (2008) On a new generalized beta distribution. East West J Math 10(1):45–55
10. Srivastava HM, Manocha HL (1984) A treatise on generating functions. Ellis Horwood Ltd/Wiley, Chichester/New York
11. Ng DWW, Koh SK, Sim SZ, Lee MC (2018) The study of properties on generalized Beta distribution. J Phys Conf Ser. https://doi.org/10.1088/1742-6596/1132/1/012080
12. Gómez-Déniz E, Sarabia JM (2018) A family of generalised beta distributions: properties and applications. Ann Data Sci 5:401–420
13. Alshkaki RSA (2020) A generalized modification of the Kumaraswamy distribution for modeling and analyzing real-life data. Stat Optim Inf Comput J
14. Kumaraswamy P (1980) A generalized probability density function for double-bounded random processes. J Hydrol 46(1–2):79–88. https://doi.org/10.1016/0022-1694(80)90036-0
15. McDonald JB (1984) Some generalized functions for the size distribution of income. Econometrica 52:7–664
16. Abdul-Moniem IB (2017) The Kumaraswamy power function distribution. J Stat Appl Probab Lett 6(1):81–90
17. Gupta RC, Gupta PI, Gupta RD (1998) Modeling failure time data by Lehmann alternatives. Commun Stat Theory Methods 27:887–904
18. Tiwari RC, Yang Y, Zalkikar JN (1996) Bayes estimation for the Pareto failure model using Gibbs sampling. IEEE Trans Reliab 45(3):471–476
19. Forbes C, Evans M, Hastings N, Peacock B (2011) Statistical distributions, 4th edn. Wiley, New York
20. Virchenko N, Kalla S, Al-Zamel A (2001) Some results on a generalized hypergeometric function. Integral Transforms Spec Funct 12(1):89–100. https://doi.org/10.1080/10652460108819336
21. Cordeiro GM, Nadarajah S, Ortega EMM (2012) The Kumaraswamy Gumbel distribution. Stat Methods Appl 21(2):139–168. https://doi.org/10.1007/s10260-011-0183-y
22. Shi Y, Tian YJ, Kou G, Peng Y, Li JP (2011) Optimization based data mining: theory and applications. Springer, Berlin
23. Henningsen A, Toomet O (2011) maxLik: a package for maximum likelihood estimation in R. Comput Stat 26(3):443–458. https://doi.org/10.1007/s00180-010-0217-1
24. Stasinopoulos DM, Rigby RA (2008) Generalized additive models for location scale and shape (GAMLSS) in R. J Stat Softw 23(2008):1–46. https://doi.org/10.18637/jss.v023.i07
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.