

Annals of Data Science (2021) 8(1):57–90 https://doi.org/10.1007/s40745-020-00282-0

A Six Parameters Beta Distribution with Application for Modeling Waiting Time of Muslim Early Morning Prayer

Rafid S. A. Alshkaki1

Received: 21 February 2020 / Revised: 19 April 2020 / Accepted: 24 April 2020 / Published online: 18 May 2020 © The Author(s) 2020

Abstract The beta distribution is a well-known and widely used distribution for modeling and analyzing lifetime data, owing to its interesting characteristics. In this paper, a six-parameter beta distribution is introduced as a generalization of the two-parameter (standard) and four-parameter beta distributions. This distribution is closed under scaling and exponentiation, has a reflection symmetry property, and includes several well-known distributions as special cases, such as the two- and four-parameter beta, the generalized modification of the Kumaraswamy, the generalized beta of the first kind, the power function, the Kumaraswamy power function, the Minimax, the exponentiated Pareto, and the generalized uniform distributions. Its moments about the origin, moment generating function, incomplete moments, and mean deviations are derived. The maximum likelihood method is used to estimate its parameters and is applied to six different simulated data sets from this distribution, in order to check the performance of the estimation method through the mean squared errors of the estimated parameters computed for different simulated sample sizes. Finally, two real-life data sets, representing the waiting period of Muslim worshipers from the time of entering the mosque until the actual start of the Alfajir prayer in two different mosques, are used to illustrate the usefulness and flexibility of this distribution, which also provides a better fit than the gamma, exponential, four-parameter beta, and generalized beta of the first kind distributions.

Keywords Beta distribution · Maximum likelihood estimator · Moments · Simulation study · Applications

Mathematics Subject Classification 60E05 · 62E15 · 65C05

* Rafid S. A. Alshkaki [email protected]

1 Ahmed Bin Mohammed Military College, Doha, Qatar


1 Introduction

Due to its interesting characteristics, the beta distribution is one of the best-known continuous distributions, with a wide range of applications in various fields, such as reliability and production quality control. Its flexible shape allows it to model a wide range of natural and empirical phenomena. Its domain, the interval from zero to one, adds another interesting characteristic by allowing it to be considered as a probability distribution of probabilities, for example for fractions of time, measurements whose values (or relative values) all lie between zero and one, or the random behavior of percentages and fractions, especially in cases where nothing is known about the probability, so that it can be used to represent all probabilities. Another area that uses the beta distribution to represent possible values of probabilities, or a distribution of probabilities, is Bayesian analysis, where it is widely used as a prior distribution. In fact, it is one of the three common distributions, together with the rectangular/uniform and normal distributions, that are employed within the framework of Bayesian analysis of continuous variables, Sheskin [1, p. 397]. Data mining methods and techniques need information about prior probability knowledge, hence the beta distribution is a candidate for such situations; see Shi [2], and Olson and Shi [3] for further details. For an extensive reference on the beta distribution see Johnson et al. [4, p. 210–275].

The probability density function (pdf) of the four parameters beta distribution, Johnson et al. [5, p. 210], is given by

$$f(t) = \frac{(t-a)^{\alpha-1}(b-t)^{\beta-1}}{B(\alpha,\beta)\,(b-a)^{\alpha+\beta-1}}, \quad a < t < b, \tag{1}$$

where the parameters $\alpha$, $\beta$, $a$ and $b$ satisfy $\alpha > 0$, $\beta > 0$, $a$ and $b$ are real numbers such that $a < b$, $B(\alpha,\beta)$ is the beta function, Abramowitz and Stegun [6, p. 258], defined by

$$B(\alpha,\beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} \tag{2}$$

and $\Gamma(\alpha)$ is the gamma function, Abramowitz and Stegun [6, p. 255], defined by

$$\Gamma(\alpha) = \int_0^\infty t^{\alpha-1}e^{-t}\,dt. \tag{3}$$

The most widely used form of the beta distribution in the literature is the pdf given by

$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}, \quad 0 < x < 1. \tag{4}$$
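As a quick numerical illustration of (1) and (4), the following Python sketch evaluates the four-parameter pdf, checks that it integrates to one, and compares it with the standard form evaluated at $x = (t-a)/(b-a)$ and divided by the Jacobian $b-a$. The function name `beta4_pdf` and the parameter values are assumptions chosen only for illustration; they do not come from the paper.

```python
import numpy as np
from scipy import integrate, special, stats

def beta4_pdf(t, alpha, beta, a, b):
    """Four-parameter beta pdf of Eq. (1); zero outside a < t < b."""
    t = np.asarray(t, dtype=float)
    inside = (t > a) & (t < b)
    # Evaluate at a harmless interior point wherever t is outside the support.
    safe_t = np.where(inside, t, 0.5 * (a + b))
    density = ((safe_t - a) ** (alpha - 1) * (b - safe_t) ** (beta - 1)
               / (special.beta(alpha, beta) * (b - a) ** (alpha + beta - 1)))
    return np.where(inside, density, 0.0)

# Illustrative (assumed) parameter values.
alpha, beta, a, b = 2.0, 3.5, 1.0, 4.0

# The density should integrate to one over (a, b).
total, _ = integrate.quad(lambda s: float(beta4_pdf(s, alpha, beta, a, b)), a, b)

# Eq. (1) at t equals Eq. (4) at x = (t - a)/(b - a), divided by (b - a).
t = np.linspace(1.2, 3.8, 5)
x = (t - a) / (b - a)
via_standard = stats.beta.pdf(x, alpha, beta) / (b - a)
direct = beta4_pdf(t, alpha, beta, a, b)
```

Here `scipy.stats.beta.pdf` plays the role of (4); agreement of `direct` and `via_standard` reflects only the change of variable, not any property specific to the chosen values.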

This two-parameter form is sometimes called the standard beta distribution; it is obtained from (1) by the transformation $x = \frac{t-a}{b-a}$.

One direction of the research employing the beta distribution is the generalization of the form given by (4), in order to be even more flexible and cover a lot of shapes.

Armero and Bayarri [7] introduced the Gauss hypergeometric distribution, with parameters $p$, $q$, $r$ and $\delta$, as a generalization of the beta distribution when they studied a Bayesian queuing theory problem, with the following pdf:

$$f(x) = \frac{x^{p-1}(1-x)^{q-1}(1+\delta x)^{-r}}{B(p,q)\,F_{(2,1)}(r,p;p+q;-\delta)}, \quad 0 < x < 1, \tag{5}$$

where $p > 0$, $q > 0$, $-\infty < r < \infty$, $\delta > -1$, and $F_{(2,1)}$ is the generalized hypergeometric function defined for non-negative integers $n$ and $m$ by

$$F_{(n,m)}(a_1,\ldots,a_n;b_1,\ldots,b_m;z) = \sum_{k=0}^{\infty}\frac{(a_1)_k\cdots(a_n)_k}{(b_1)_k\cdots(b_m)_k}\,\frac{z^k}{k!} \tag{6}$$

and $(a)_k$ is defined by

$$(a)_k = \begin{cases} 1, & k = 0 \\ a(a+1)\cdots(a+k-1), & k = 1, 2, 3, \ldots \end{cases} \tag{7}$$

Gordy [8] introduced the confluent hypergeometric distribution, with parameters $p$, $q$ and $s$, with pdf given by

$$f(x) = \frac{x^{p-1}(1-x)^{q-1}\exp(-sx)}{B(p,q)\,F_{(1,1)}(p,p+q,-s)}, \quad 0 < x < 1,$$

where $p > 0$, $q > 0$ and $-\infty < s < \infty$.

Pathan et al. [9] introduced a five-parameter distribution as a generalization of the beta distribution, which they called the generalized beta distribution, with pdf given by

$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}(1-\sigma x)^{\rho-1}\exp(-\gamma x)}{B(\alpha,\beta)\,\Phi_1(\alpha,\rho;\alpha+\beta;\sigma,-\gamma)}, \quad 0 < x < 1,$$

where the parameters $\alpha$, $\beta$, $\sigma$, $\rho$ and $\gamma$ satisfy $\alpha > 0$, $\beta > 0$, $0 \le \sigma < 1$, $\rho$ and $\gamma$ are real numbers, and $\Phi_1(.)$ is Humbert's confluent hypergeometric function given in Srivastava and Manocha [10, p. 58, Eq. (36)]; they also derived expressions for its distribution function and moments.

Ng et al. [11] studied the properties and evaluated the prediction level of a six-parameter generalized beta distribution model with pdf given by

$$f(x) = \frac{\dfrac{\Gamma(\gamma+\rho-\alpha)\,\Gamma(\gamma+\rho-\beta)}{\Gamma(\gamma+\rho)\,\Gamma(\gamma+\rho-\alpha-\beta)}\,(1-z)^{\sigma}x^{\gamma-1}(1-x)^{\rho-1}(1-\sigma x)^{\rho-1}(1-zx)^{-\sigma}F_{(2,1)}(\alpha,\beta;\gamma;x)}{F_{(3,2)}\!\left(\rho,\sigma,\gamma+\rho-\alpha-\beta;\gamma+\rho-\alpha,\gamma+\rho-\beta;\frac{z}{z-1}\right)B(\gamma,\rho)}, \quad 0 < x < 1, \tag{8}$$

where the parameters $\alpha$, $\beta$, $\gamma$, $\sigma$, $z$ and $\rho$ satisfy $\alpha > 0$, $\beta > 0$, $\gamma > 0$, $\sigma > 0$, $z < 0.5$ and $\rho > \alpha+\beta-\gamma$.

Ng et al. [11], who provided a nice literature review of the beta family, showed interesting advances of the distribution with pdf given by (8) for fitting many different types of data, as did Armero and Bayarri [7], Gordy [8] and Pathan et al. [9] for their distributions; however, these distributions are not easy to work with empirically. Finally, Gómez-Déniz and Sarabia [12] introduced a generalization of the standard beta distribution with bounded support, studied some of its basic properties and the behavior of its maximum likelihood estimators through simulation, and derived its multivariate version.

The rest of the paper is organized as follows. Section 2 defines the six parameters beta distribution (SPBD). Section 3 gives some properties of this distribution: boundaries and some limits of the pdf of the SPBD, series expansion of its pdf, its mode, quantile function, reliability function, hazard function, special cases of the SPBD, some transformations of the SPBD, its scaling, exponentiation and reflection symmetry properties, generation of its random variates, its order statistics distribution, moments about the origin, mean and variance, moment generating function, harmonic mean, incomplete moments, mean deviations, probability weighted moments, Renyi entropy, and Lorenz and Bonferroni curves. Section 4 introduces estimation of its parameters using the method of maximum likelihood estimation (MLE). Section 5 gives six simulation studies of the SPBD to check the performance of the MLE. Section 6 uses the SPBD and other nested and related distributions to fit two different real-life data sets. Finally, Sect. 7 ends with conclusions.

2 The Six Parameters Beta Distribution

Let $0 < a, b, \alpha, \beta, A, B < \infty$, such that $A < B$, and define the function $f$ by

$$f(x;a,b,\alpha,\beta,A,B) = \begin{cases} \dfrac{b\,x^{b-1}}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\left[\left(\dfrac{x}{a}\right)^{b}-A\right]^{\alpha-1}\left[B-\left(\dfrac{x}{a}\right)^{b}\right]^{\beta-1}, & aA^{1/b} < x < aB^{1/b} \\[2mm] 0, & \text{otherwise,} \end{cases} \tag{9}$$

where $B(\alpha,\beta)$ is the beta function defined by (2). We will write $f(x)$ instead of $f(x;a,b,\alpha,\beta,A,B)$ for simplicity. We have the following proposition.

Proposition 1 The function $f$ defined by (9) is a pdf with its cumulative distribution function (CDF) $F$ given by

$$F_X(x;a,b,\alpha,\beta,A,B) = \begin{cases} 0, & x \le aA^{1/b} \\[1mm] \dfrac{1}{B(\alpha,\beta)}\,B\!\left(\dfrac{\left(\frac{x}{a}\right)^{b}-A}{B-A};\alpha,\beta\right), & aA^{1/b} < x < aB^{1/b} \\[2mm] 1, & x \ge aB^{1/b} \end{cases} \tag{10}$$

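where $B(z;\alpha,\beta)$ is the incomplete beta function, Abramowitz and Stegun [6, p. 263], defined by

$$B(z;\alpha,\beta) = \int_0^z t^{\alpha-1}(1-t)^{\beta-1}\,dt. \tag{11}$$

Proof Since $0 < a, b, \alpha, \beta, A, B < \infty$ and $aA^{1/b} < x < aB^{1/b}$, then $A < \left(\frac{x}{a}\right)^{b} < B$, hence $\left(\frac{x}{a}\right)^{b} - A > 0$ and also $B - \left(\frac{x}{a}\right)^{b} > 0$, implying that $f$ given in (9) is non-negative. Now,

$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_{aA^{1/b}}^{aB^{1/b}} \frac{b\,x^{b-1}}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\left[\left(\frac{x}{a}\right)^{b}-A\right]^{\alpha-1}\left[B-\left(\frac{x}{a}\right)^{b}\right]^{\beta-1}dx$$

$$= \frac{b}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)} \int_{aA^{1/b}}^{aB^{1/b}} x^{b-1}\left[\left(\frac{x}{a}\right)^{b}-A\right]^{\alpha-1}\left[B-\left(\frac{x}{a}\right)^{b}\right]^{\beta-1}dx.$$

Let $\left(\frac{x}{a}\right)^{b} - A = (B-A)t$, then $x = a[(B-A)t + A]^{1/b}$ and $dx = \frac{a(B-A)}{b}[(B-A)t + A]^{\frac{1}{b}-1}dt$, then

$$\int_{aA^{1/b}}^{aB^{1/b}} x^{b-1}\left[\left(\frac{x}{a}\right)^{b}-A\right]^{\alpha-1}\left[B-\left(\frac{x}{a}\right)^{b}\right]^{\beta-1}dx = \frac{a^{b}}{b}(B-A)^{\alpha+\beta-1}\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{a^{b}}{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta).$$

Hence, $\int_{-\infty}^{+\infty} f(x)\,dx = 1$. It follows that, for any $x$ such that $aA^{1/b} < x < aB^{1/b}$,

$$F_X(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{aA^{1/b}}^{x} \frac{b\,t^{b-1}}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\left[\left(\frac{t}{a}\right)^{b}-A\right]^{\alpha-1}\left[B-\left(\frac{t}{a}\right)^{b}\right]^{\beta-1}dt. \tag{12}$$

Now by using the transformation $z = \frac{\left(\frac{t}{a}\right)^{b}-A}{B-A}$, (12) reduces to;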

$$F_X(x) = \frac{1}{B(\alpha,\beta)}\int_0^{\frac{(x/a)^{b}-A}{B-A}} z^{\alpha-1}(1-z)^{\beta-1}\,dz,$$

from which we get (10). □

We note that $F_X$ can be written, for $aA^{1/b} < x < aB^{1/b}$, in the form

$$F_X(x) = I\!\left(\frac{\left(\frac{x}{a}\right)^{b}-A}{B-A};\alpha,\beta\right), \tag{13}$$

where $I(z;\alpha,\beta)$ is the regularized incomplete beta function, Abramowitz and Stegun [6, p. 263], defined by

$$I(z;\alpha,\beta) = \frac{1}{B(\alpha,\beta)}\int_0^z t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{B(z;\alpha,\beta)}{B(\alpha,\beta)}. \tag{14}$$

Definition of the SPBD The rv $X$ is said to have a SPBD with parameters $a$, $b$, $\alpha$, $\beta$, $A$ and $B$, written as $X \sim SPBD(a,b,\alpha,\beta,A,B)$, if its pdf is given by (9), or equivalently, its CDF is given by (10) or (13).

Figure 1 shows some plots of the pdf of the SPBD for some of its parameter values, indicating that this distribution has many different flexible shapes.

3 Some Characteristics of the SPBD

3.1 Boundaries and Some Limits of the pdf

Let us study the behavior of the pdf of the $SPBD(a,b,\alpha,\beta,A,B)$ at certain points. At the boundary points, we have from (9), for $0 < \beta < \infty$, that

$$f\!\left(aA^{1/b}\right) = \begin{cases} \infty, & 0 < \alpha < 1 \\[1mm] \dfrac{\beta b A^{1-\frac{1}{b}}}{a(B-A)}, & \alpha = 1 \\[2mm] 0, & \alpha > 1. \end{cases}$$

[Figure 1: twelve panels plotting the pdf f(x) against x; each panel shows five curves for a fixed choice of the other parameters, differing in one shape parameter.]

Fig. 1 Different pdf plots of the SPBD models

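Therefore, for the lower boundary point,

$$\lim_{a\to 0^+} f\!\left(aA^{1/b}\right) = \lim_{b\to\infty} f\!\left(aA^{1/b}\right) = \lim_{\beta\to\infty} f\!\left(aA^{1/b}\right) = \infty,$$

$$\lim_{a\to\infty} f\!\left(aA^{1/b}\right) = \lim_{b\to 0^+} f\!\left(aA^{1/b}\right) = \lim_{\beta\to 0^+} f\!\left(aA^{1/b}\right) = \lim_{B\to\infty} f\!\left(aA^{1/b}\right) = 0,$$

$$\lim_{A\to 0^+} f\!\left(aA^{1/b}\right) = 0 \ \text{ if } b \ne 1, \quad\text{and}\quad \lim_{A\to 0^+} f\!\left(aA^{1/b}\right) = \frac{\beta b}{aB} \ \text{ if } b = 1.$$

Similarly,

$$f\!\left(aB^{1/b}\right) = \begin{cases} \infty, & 0 < \beta < 1 \\[1mm] \dfrac{\alpha b B^{1-\frac{1}{b}}}{a(B-A)}, & \beta = 1 \\[2mm] 0, & \beta > 1. \end{cases}$$

Therefore,

$$\lim_{a\to 0^+} f\!\left(aB^{1/b}\right) = \lim_{b\to\infty} f\!\left(aB^{1/b}\right) = \lim_{\alpha\to\infty} f\!\left(aB^{1/b}\right) = \lim_{\beta\to 0^+} f\!\left(aB^{1/b}\right) = \infty,$$

$$\lim_{a\to\infty} f\!\left(aB^{1/b}\right) = \lim_{b\to 0^+} f\!\left(aB^{1/b}\right) = \lim_{\alpha\to 0^+} f\!\left(aB^{1/b}\right) = \lim_{\beta\to\infty} f\!\left(aB^{1/b}\right) = 0,$$

$$\text{and}\quad \lim_{A\to 0^+} f\!\left(aB^{1/b}\right) = \frac{\alpha b}{a}B^{-\frac{1}{b}}.$$

3.2 Series Expansion

Proposition 2 The function $f$ given by (9) can be written as the following series expansion:

$$f(x;a,b,\alpha,\beta,A,B) = \frac{b\,B^{\beta-1}}{(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} C(i,j;a,b,\alpha,\beta,A,B)\,x^{b(\alpha+j-i)-1}, \tag{15}$$

where

$$C(i,j;a,b,\alpha,\beta,A,B) = (-1)^{i+j}\binom{\alpha-1}{i}\binom{\beta-1}{j}\frac{A^{i}}{a^{b(\alpha+j-i)}B^{j}}.$$

Proof Since $0 < a$ and $aA^{1/b} < x$, then $A < \left(\frac{x}{a}\right)^{b}$, that is, $\frac{A}{\left(\frac{x}{a}\right)^{b}} < 1$, and we can write

$$\left[\left(\frac{x}{a}\right)^{b}-A\right]^{\alpha-1} = \left(\frac{x}{a}\right)^{b(\alpha-1)}\left[1-\frac{A}{\left(\frac{x}{a}\right)^{b}}\right]^{\alpha-1}.$$

Therefore, using the binomial series expansion, Abramowitz and Stegun [6, p. 14], we can write;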

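$$\left[\left(\frac{x}{a}\right)^{b}-A\right]^{\alpha-1} = \left(\frac{x}{a}\right)^{b(\alpha-1)}\sum_{i=0}^{\infty}(-1)^{i}\binom{\alpha-1}{i}\left[\frac{A}{\left(\frac{x}{a}\right)^{b}}\right]^{i}. \tag{16}$$

Similarly, we have that

$$\left[B-\left(\frac{x}{a}\right)^{b}\right]^{\beta-1} = B^{\beta-1}\sum_{j=0}^{\infty}(-1)^{j}\binom{\beta-1}{j}\left[\frac{\left(\frac{x}{a}\right)^{b}}{B}\right]^{j}. \tag{17}$$

Hence, substituting (16) and (17) into the function $f$ given by (9), we get (15). □

3.3 The Mode

For $aA^{1/b} < x < aB^{1/b}$, we can see that the pdf of the SPBD satisfies the following:

$$\frac{\partial}{\partial x}f(x) = \left\{\frac{\frac{b}{a}(\alpha-1)\left(\frac{x}{a}\right)^{b-1}}{\left(\frac{x}{a}\right)^{b}-A} - \frac{b(\beta-1)\left(\frac{x}{a}\right)^{b-1}}{a\left[B-\left(\frac{x}{a}\right)^{b}\right]} + \frac{b-1}{x}\right\}f(x). \tag{18}$$

Therefore, $\frac{\partial}{\partial x}f(x) = 0$ is equivalent to either $f(x) = 0$, which is discussed in Sect. 3.1 above, or

$$\frac{\frac{b}{a}(\alpha-1)\left(\frac{x}{a}\right)^{b-1}}{\left(\frac{x}{a}\right)^{b}-A} - \frac{b(\beta-1)\left(\frac{x}{a}\right)^{b-1}}{a\left[B-\left(\frac{x}{a}\right)^{b}\right]} + \frac{b-1}{x} = 0. \tag{19}$$

Multiplying (19) by $ax\left[\left(\frac{x}{a}\right)^{b}-A\right]\left[B-\left(\frac{x}{a}\right)^{b}\right]$, and setting $y = \left(\frac{x}{a}\right)^{b}$, it reduces to

$$c_1y^2 + c_2y + c_3 = 0, \tag{20}$$

where

$$c_1 = b(\alpha+\beta-1) - 1,$$

$$c_2 = (1-b\alpha)B + (1-b\beta)A,$$

$$c_3 = (b-1)AB.$$

Let us discuss the real roots of (20) according to the following cases.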

Case 1 If $\alpha+\beta \ne 1$, $b = \frac{1}{\alpha+\beta-1}$ and $c_2 \ne 0$, that is, when $c_1 = 0$ and $c_2 \ne 0$, then (20) has a single root given by

$$y = -\frac{c_3}{c_2} = \frac{(1-b)AB}{c_2}.$$

Hence, the root in terms of $x$ is given by

$$x_1 = a\left[\frac{(1-b)AB}{c_2}\right]^{\frac{1}{b}}.$$

Case 2 If $b \ne \frac{1}{\alpha+\beta-1}$, that is, $c_1 \ne 0$, then the real roots of (20) in terms of $x$, which exist when $c_2^2 - 4c_1c_3 \ge 0$, are given by

$$x_2 = a\left[\frac{-c_2 - \sqrt{c_2^2 - 4c_1c_3}}{2c_1}\right]^{\frac{1}{b}}, \qquad x_3 = a\left[\frac{-c_2 + \sqrt{c_2^2 - 4c_1c_3}}{2c_1}\right]^{\frac{1}{b}},$$

with $c_1$, $c_2$ and $c_3$ as defined for (20).

Since $\frac{\partial^2}{\partial x^2}f(x_i)$, for $i = 1, 2$ and $3$, is not easy to evaluate, an empirical evaluation is needed to see at which point $x_i$ we have a local maximum in order to determine the mode of the SPBD.

3.4 Quantile Function

Let $0 < p < 1$, then the quantile function of the rv $X \sim SPBD(a,b,\alpha,\beta,A,B)$, $Q$, defined by

$$Q(p) = \inf\{x \in \mathbb{R};\ p \le F(x)\},$$

can be found using (13) to be

$$Q(p) = a\left[A + (B-A)\,I^{-1}(p;\alpha,\beta)\right]^{\frac{1}{b}}, \tag{21}$$

where $I^{-1}$ is the inverse of the regularized incomplete beta function. In particular, the median of $X$, $Med(X)$, is given by

$$Med(X) = a\left[A + (B-A)\,I^{-1}(0.5;\alpha,\beta)\right]^{\frac{1}{b}}.$$

Table 1 gives the parameter values and domain ranges of six selected SPBD data sets, which have different shapes and domain ranges; they are used for the simulation study in Sect. 5 and for computing certain statistics of the SPBD later in this section, while Fig. 2 shows the plots of the quantile functions of these SPBD data sets.

Table 1 Parameter values of the selected SPBD data sets

| Data set | a | b | α | β | A | B | Domain minimum | Domain maximum |
|---|---|---|---|---|---|---|---|---|
| 1 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0 | 1.8 |
| 2 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.387020171 | 1.546834102 |
| 3 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.673387689 | 1.69217121 |
| 4 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 1.17346046 | 3.143355214 |
| 5 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 1.122462048 | 3.264052108 |
| 6 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.261047095 | 2.999081861 |

[Figure 2: two panels plotting the quantile function Q(p) against p for the data sets DS 1 to DS 6.]

Fig. 2 Plots of the quantile function of the six selected SPBD data sets

3.5 Reliability Function

The reliability (survival) function of $X \sim SPBD(a,b,\alpha,\beta,A,B)$, using (13), is given by

$$R(x) = 1 - F(x) = 1 - I\!\left(\frac{\left(\frac{x}{a}\right)^{b}-A}{B-A};\alpha,\beta\right), \quad aA^{1/b} < x < aB^{1/b}. \tag{22}$$

3.6 Hazard Function

The hazard function, $h(x)$, of the rv $X \sim SPBD(a,b,\alpha,\beta,A,B)$, using (9) and (13), is given for $aA^{1/b} < x < aB^{1/b}$ by;

$$h(x) = \frac{f(x)}{1-F(x)} = \frac{\dfrac{b\,x^{b-1}}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\left[\left(\dfrac{x}{a}\right)^{b}-A\right]^{\alpha-1}\left[B-\left(\dfrac{x}{a}\right)^{b}\right]^{\beta-1}}{1 - I\!\left(\dfrac{\left(\frac{x}{a}\right)^{b}-A}{B-A};\alpha,\beta\right)}.$$

3.7 Special Cases of SPBD

1. The $SPBD(1,1,p,q,a,b)$ is the 4 parameters beta distribution, Johnson et al. [4, p. 210], with pdf

$$f(x) = \frac{1}{B(p,q)}\,\frac{(x-a)^{p-1}(b-x)^{q-1}}{(b-a)^{p+q-1}}, \quad a < x < b.$$

2. The $SPBD(a,b,1,c,\alpha,\beta)$ is the generalized modification of the Kumaraswamy distribution, Alshkaki [13], with pdf

$$f(x) = \frac{bc}{a}(\beta-\alpha)^{-c}\left(\frac{x}{a}\right)^{b-1}\left[\beta-\left(\frac{x}{a}\right)^{b}\right]^{c-1}, \quad a\alpha^{1/b} < x < a\beta^{1/b}.$$

3. The $SPBD(1,a,1,b,0,1)$ is the Kumaraswamy distribution, Kumaraswamy [14], with pdf

$$f(x) = ab\,x^{a-1}\left[1-x^{a}\right]^{b-1}, \quad 0 < x < 1.$$

4. The $SPBD(a,b,p,q,0,1)$ is the generalized beta of the first kind distribution, McDonald [15], with pdf, obtained from (9),

$$f(x) = \frac{|b|}{a^{bp}B(p,q)}\,x^{bp-1}\left[1-\left(\frac{x}{a}\right)^{b}\right]^{q-1}, \quad 0 < x < a.$$

5. The $SPBD(1,1,1,1,0,1)$ is the standard uniform distribution with pdf

$$f(x) = 1, \quad 0 < x < 1.$$

6. The $SPBD(1,2,1,1,0,1)$ is the triangular distribution with pdf

$$f(x) = 2x, \quad 0 < x < 1.$$

7. The $SPBD(1,1,1,\delta,\alpha,\beta)$ is the power function distribution with pdf

$$f(x) = \frac{\delta}{\beta-\alpha}\left[\frac{\beta-x}{\beta-\alpha}\right]^{\delta-1}, \quad \alpha < x < \beta.$$

8. The $SPBD(\lambda,a\theta,1,b,0,1)$ is the Kumaraswamy power function distribution, Abdul-Moniem [16], with pdf;

$$f(x) = \frac{ab\theta}{\lambda}\left(\frac{x}{\lambda}\right)^{a\theta-1}\left[1-\left(\frac{x}{\lambda}\right)^{a\theta}\right]^{b-1}, \quad 0 < x < \lambda.$$

9. The $SPBD(1,\beta,1,\gamma,0,1)$ is the Minimax distribution, McDonald [15], with pdf

$$f(x) = \beta\gamma\,x^{\beta-1}\left[1-x^{\beta}\right]^{\gamma-1}, \quad 0 < x < 1.$$

10. The $SPBD(c,-\alpha,1,\theta,0,1)$ is the exponentiated Pareto distribution, Gupta et al. [17], with pdf

$$f(x) = \theta\alpha c^{\alpha}x^{-(\alpha+1)}\left[1-\left(\frac{x}{c}\right)^{-\alpha}\right]^{\theta-1}, \quad 0 < c < x.$$

11. The $SPBD(\alpha,\beta,1,1,0,1)$ is the generalized uniform distribution, Tiwari et al. [18], with pdf

$$f(x) = \frac{\beta}{\alpha}\left(\frac{x}{\alpha}\right)^{\beta-1}, \quad 0 < x < \alpha.$$

We may note that the Kumaraswamy (Case 3), standard uniform (Case 5), triangular (Case 6), Kumaraswamy power function (Case 8), Minimax (Case 9), exponentiated Pareto (Case 10), and generalized uniform (Case 11) distributions are all special cases of the generalized beta of the first kind distribution (Case 4).

3.8 Transformations

Lemma 2

1. Let the rv $U$ have the standard uniform distribution, $U(0,1)$, and let the rv $X$ be defined by $X = a\left[A + (B-A)\,I^{-1}(U;\alpha,\beta)\right]^{1/b}$, then $X \sim SPBD(a,b,\alpha,\beta,A,B)$.

2. Let the rv $X \sim SPBD(1,1,1,1,0,1)$, then the rv $Y$ defined by $Y = \left(\frac{1-X}{X}\right)^{1/\delta}e^{-\gamma/\delta}$ has a log-logistic distribution with parameters $\delta$ and $\gamma$, Johnson et al. [4, p. 151], with CDF given by

$$F_Y(y) = 1 - \left[1 + y^{\delta}e^{\gamma}\right]^{-1}, \quad y \ge 0.$$

3. Let the rv $X \sim SPBD(a,b,1,c,A,B)$, then the rv $Y$ defined by $Y = B - \left(\frac{X}{a}\right)^{b}$ has the generalized uniform distribution, Tiwari et al. [18], with CDF given by

$$F_Y(y) = \left[\frac{y}{B-A}\right]^{c}, \quad 0 \le y \le B-A.$$

4. Let the rv $X \sim SPBD(a,b,1,c,A,B)$, then the rv $Y$ defined by $Y = (B-A)^{-1}\left[B - \left(\frac{X}{a}\right)^{b}\right]$ has the beta distribution with parameters 1 and $c$, with CDF given by;

$$F_Y(y) = y^{c}, \quad 0 \le y \le 1.$$

5. Let the rv $X \sim SPBD(1,b,1,1,0,1)$, then the rv $Y$ defined by $Y = \theta - b^{2}\log(X)$ has an exponential distribution with parameters $\theta$ and $b$, Johnson et al. [5, p. 494], with CDF given by

$$F_Y(y) = 1 - e^{-\left(\frac{y-\theta}{b}\right)}, \quad y > \theta.$$

6. Let the rv $X \sim SPBD(1,b,1,1,0,1)$, then the rv $Y$ defined by $Y = \mu - \sigma\log\left[-\log\left(X^{b}\right)\right]$ has a Gumbel (generalized extreme value type-I) distribution with parameters $\mu$ and $\sigma$, Forbes et al. [19, p. 98], with CDF given by

$$F_Y(y) = e^{-e^{-\left(\frac{y-\mu}{\sigma}\right)}}.$$

7. Let the rv $X \sim SPBD(1,b,1,1,0,1)$, then the rv $Y$ defined by $Y = \mu + \sigma\log\left(\frac{X^{b}}{1-X^{b}}\right)$ has a logistic distribution with parameters $\mu$ and $\sigma$, Johnson et al. [4, p. 115], with CDF given by

$$F_Y(y) = \left[1 + e^{-\left(\frac{y-\mu}{\sigma}\right)}\right]^{-1}.$$

8. Let the rv $X \sim SPBD(1,b,1,1,0,1)$, then the rv $Y$ defined by $Y = \frac{k}{X}$, where $k$ is a positive constant, has a Pareto distribution with parameters $k$ and $b$, Johnson et al. [4, p. 574], with CDF given by

$$F_Y(y) = 1 - \left(\frac{k}{y}\right)^{b}.$$

9. Let the rv $X \sim SPBD(1,b,1,1,0,1)$, then the rv $Y$ defined by $Y = \mu + \sigma\left[-\log\left(X^{b}\right)\right]^{1/\lambda}$ has a Weibull distribution with parameters $\mu$, $\sigma$ and $\lambda$, Johnson et al. [4, p. 629], with CDF given by

$$F_Y(y) = 1 - e^{-\left(\frac{y-\mu}{\sigma}\right)^{\lambda}}.$$

Proof For case (1), we have

$$F_X(x) = P(X \le x) = P\!\left(a\left[A + (B-A)\,I^{-1}(U;\alpha,\beta)\right]^{\frac{1}{b}} \le x\right) = P\!\left(I^{-1}(U;\alpha,\beta) \le \frac{\left(\frac{x}{a}\right)^{b}-A}{B-A}\right)$$

$$= P\!\left(U \le I\!\left(\frac{\left(\frac{x}{a}\right)^{b}-A}{B-A};\alpha,\beta\right)\right) = I\!\left(\frac{\left(\frac{x}{a}\right)^{b}-A}{B-A};\alpha,\beta\right).$$

Therefore, the rv $X \sim SPBD(a,b,\alpha,\beta,A,B)$. The proofs of cases (2) through (9) follow along the same lines as the proof of (1).
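To illustrate how the remaining cases follow "on the same lines", the Python sketch below checks cases (5) and (8) of Lemma 2 numerically: it composes the SPBD CDF (13) with each transformation and compares the result with the stated exponential and Pareto CDFs. The helper name `spbd_cdf` and the values of $b$, $\theta$ and $k$ are assumptions made for this illustration only, and the sketch assumes $b > 0$.

```python
import numpy as np
from scipy import special

def spbd_cdf(x, a, b, alpha, beta, A, B):
    """CDF (13): regularized incomplete beta at ((x/a)^b - A)/(B - A), for b > 0."""
    x = np.asarray(x, dtype=float)
    z = ((x / a) ** b - A) / (B - A)
    # Clipping gives 0 below the support and 1 above it, as in (10).
    return special.betainc(alpha, beta, np.clip(z, 0.0, 1.0))

b, theta, k = 2.5, 1.0, 3.0   # illustrative (assumed) values

# Case (5): Y = theta - b^2 log(X), so P(Y <= y) = P(X >= exp((theta - y)/b^2)).
y_exp = np.linspace(theta + 0.01, theta + 10.0, 50)
F_exp = 1.0 - spbd_cdf(np.exp((theta - y_exp) / b**2), 1, b, 1, 1, 0, 1)
F_exp_stated = 1.0 - np.exp(-(y_exp - theta) / b)

# Case (8): Y = k / X, so P(Y <= y) = P(X >= k/y) for y >= k.
y_par = np.linspace(k, 10.0 * k, 50)
F_par = 1.0 - spbd_cdf(k / y_par, 1, b, 1, 1, 0, 1)
F_par_stated = 1.0 - (k / y_par) ** b
```

Both comparisons rest on the fact that $F_X(x) = x^{b}$ on $(0,1)$ for $X \sim SPBD(1,b,1,1,0,1)$; the same composition argument applies to the other cases.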

We may note that the SPBDs stated in Cases 2, 5, 6, 8, and 9 are all special cases of the generalized beta of the first kind distribution (see Case 4 of Sect. 3.7). □

3.9 Scaling Property

Proposition 3 (The SPBD is closed under scaling) Let the rv $X \sim SPBD(a,b,\alpha,\beta,A,B)$ and let the rv $Y = cX$, where $0 < c < \infty$, then $Y \sim SPBD(ca,b,\alpha,\beta,A,B)$.

Proof We have

$$F_Y(y) = P(Y \le y) = P(cX \le y) = P\!\left(X \le \frac{y}{c}\right) = F_X\!\left(\frac{y}{c};a,b,\alpha,\beta,A,B\right).$$

Therefore, using (13), we have that

$$F_Y(y) = I\!\left(\frac{\left(\frac{y/c}{a}\right)^{b}-A}{B-A};\alpha,\beta\right) = I\!\left(\frac{\left(\frac{y}{ca}\right)^{b}-A}{B-A};\alpha,\beta\right) = F_X(y;ca,b,\alpha,\beta,A,B).$$

Along the same lines as the proof of Proposition 3, we can prove the following Propositions 4 and 5. □

3.10 Exponentiation Property

Proposition 4 (The SPBD is closed under exponentiation) Let the rv $X \sim SPBD(a,b,\alpha,\beta,A,B)$ and let the rv $Y = X^{c}$, where $0 < c < \infty$, then $Y \sim SPBD\!\left(a^{c},\frac{b}{c},\alpha,\beta,A,B\right)$.

3.11 Reflection Symmetry Property

Proposition 5 Let the rv $X \sim SPBD(a,1,\alpha,\beta,A,B)$ and let the rv $Y = a(B+A) - X$, then $Y \sim SPBD(a,1,\beta,\alpha,A,B)$.

3.12 Generate SPBD Random Variates

Using Lemma 2(1), we can generate $SPBD(a,b,\alpha,\beta,A,B)$ random variates as follows:

1. Generate $u \sim U(0,1)$.

2. Compute $y = I^{-1}(u;\alpha,\beta)$.

3. Set $x = a\left[A + (B-A)y\right]^{1/b}$.
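The three generation steps of Sect. 3.12 can be sketched in Python as follows, with SciPy's `betaincinv` standing in for $I^{-1}$. The function name `spbd_rvs`, the seed and the sample size are assumptions for illustration; the parameter values are those of data set 1 in Table 1.

```python
import numpy as np
from scipy import special

def spbd_rvs(n, a, b, alpha, beta, A, B, rng=None):
    """Draw n SPBD(a, b, alpha, beta, A, B) variates by inverse transform."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=n)                     # step 1: u ~ U(0, 1)
    y = special.betaincinv(alpha, beta, u)      # step 2: y = I^{-1}(u; alpha, beta)
    return a * (A + (B - A) * y) ** (1.0 / b)   # step 3: x = a [A + (B - A) y]^(1/b)

# Data set 1 of Table 1: a = 1.8, b = 2.3, alpha = 1.4, beta = 3.9, A = 0, B = 1.
rng = np.random.default_rng(12345)
sample = spbd_rvs(200_000, 1.8, 2.3, 1.4, 3.9, 0.0, 1.0, rng=rng)
sample_mean, sample_var = sample.mean(), sample.var()
```

With a sample this large, `sample_mean` and `sample_var` should land close to the mean and variance that the paper reports for data set 1, and every draw lies inside the support $(aA^{1/b}, aB^{1/b})$.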

3.13 Order Statistics

Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from $SPBD(a,b,\alpha,\beta,A,B)$, with pdf $f$ and CDF $F$, and let $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$ be their order statistics. Then, for $i = 1, 2, 3, \ldots, n$, the pdf of the $i$-th order statistic $X_{i:n}$, $f_{i:n}(x)$, is given by

$$f_{i:n}(x) = \begin{cases} \dfrac{n!}{(i-1)!(n-i)!}\,f(x)[F(x)]^{i-1}[1-F(x)]^{n-i}, & aA^{1/b} < x < aB^{1/b} \\[2mm] 0, & \text{otherwise.} \end{cases}$$

Hence, for $aA^{1/b} < x < aB^{1/b}$, and using the fact that

$$[1-F(x)]^{n-i} = \sum_{j=0}^{n-i}\binom{n-i}{j}(-1)^{j}[F(x)]^{j},$$

we have that

$$f_{i:n}(x) = \frac{n!}{(i-1)!(n-i)!}\,f(x)[F(x)]^{i-1}\sum_{j=0}^{n-i}\binom{n-i}{j}(-1)^{j}[F(x)]^{j}$$

$$= \frac{n!}{(i-1)!(n-i)!}\,f(x)\sum_{j=0}^{n-i}\binom{n-i}{j}(-1)^{j}\left[I\!\left(\frac{\left(\frac{x}{a}\right)^{b}-A}{B-A};\alpha,\beta\right)\right]^{i+j-1}.$$

Then the pdf of the rv $X_{i:n}$, $f_{i:n}$, can be written as

$$f_{i:n}(x) = f(x)\sum_{j=0}^{n-i}A(i,j;n)\left[I\!\left(\frac{\left(\frac{x}{a}\right)^{b}-A}{B-A};\alpha,\beta\right)\right]^{i+j-1}, \tag{23}$$

where

$$A(i,j;n) = \frac{(-1)^{j}\,n!}{(i-1)!\,(n-i-j)!\,j!}.$$

3.14 Moments about the Origin

Let $k = 1, 2, 3, \ldots$, then the moment of the rv $X \sim SPBD(a,b,\alpha,\beta,A,B)$ of order $k$ about zero is given by

$$E\!\left(X^{k}\right) = \int_{-\infty}^{+\infty} x^{k}f(x)\,dx = \int_{aA^{1/b}}^{aB^{1/b}} \frac{b\,x^{k+b-1}}{a^{b}(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\left[\left(\frac{x}{a}\right)^{b}-A\right]^{\alpha-1}\left[B-\left(\frac{x}{a}\right)^{b}\right]^{\beta-1}dx,$$

and by using the transformation $y = \left(\frac{x}{a}\right)^{b}$, it reduces to

$$E\!\left(X^{k}\right) = \frac{a^{k}}{(B-A)^{\alpha+\beta-1}B(\alpha,\beta)}\int_{A}^{B} y^{\frac{k}{b}}(y-A)^{\alpha-1}(B-y)^{\beta-1}\,dy,$$

from which we have that

$$E\!\left(X^{k}\right) = a^{k}A^{\frac{k}{b}}\,\Gamma(\alpha+\beta)\,F_{(2,1)}\!\left(-\frac{k}{b},\alpha;\alpha+\beta;1-\frac{B}{A}\right), \tag{24}$$

where $F_{(2,1)}$ is the regularized hypergeometric function, Virchenko et al. [20], defined by

$$F_{(2,1)}(a,b;c;z) = \sum_{m=0}^{\infty}\frac{(a)_m(b)_m}{\Gamma(c+m)\,m!}\,z^{m}$$

and $(a)_m$ is as defined by (7). Note that, in the case $A = 0$,

$$E\!\left(X^{k}\right) = a^{k}B^{\frac{k}{b}}\,\frac{\Gamma\!\left(\alpha+\frac{k}{b}\right)\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma\!\left(\alpha+\beta+\frac{k}{b}\right)}.$$

3.15 Mean and Variance

Using (24), the mean of $X \sim SPBD(a,b,\alpha,\beta,A,B)$ is given by

$$E(X) = aA^{\frac{1}{b}}\,\Gamma(\alpha+\beta)\,F_{(2,1)}\!\left(-\frac{1}{b},\alpha;\alpha+\beta;1-\frac{B}{A}\right) \tag{25}$$

and the variance by

$$Var(X) = a^{2}A^{\frac{2}{b}}\,\Gamma(\alpha+\beta)\left\{F_{(2,1)}\!\left(-\frac{2}{b},\alpha;\alpha+\beta;1-\frac{B}{A}\right) - \Gamma(\alpha+\beta)\left[F_{(2,1)}\!\left(-\frac{1}{b},\alpha;\alpha+\beta;1-\frac{B}{A}\right)\right]^{2}\right\}.$$

Table 2 gives the mean, median, mode and variance of the selected SPBD data sets that are given in Table 1.

3.16 The Moment Generating Function

Similarly, the moment generating function of the rv $X \sim SPBD(a,b,\alpha,\beta,A,B)$, $M_X(t)$, can be found to be

$$M_X(t) = E\!\left(e^{tX}\right) = \Gamma(\alpha+\beta)\sum_{i=0}^{\infty}\frac{a^{i}A^{\frac{i}{b}}}{i!}\,F_{(2,1)}\!\left(-\frac{i}{b},\alpha;\alpha+\beta;1-\frac{B}{A}\right)t^{i}.$$
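The moment formulas of Sects. 3.14 and 3.15 can be cross-checked numerically. The Python sketch below does not evaluate (24) through the hypergeometric function; as an assumed alternative it integrates $E(X^{k}) = a^{k}E\left[(A + (B-A)T)^{k/b}\right]$ with $T$ a standard beta variable, which follows from (13), and compares the result with the closed form for $A = 0$. The function names are hypothetical; the parameters are those of data set 1 in Table 1.

```python
import numpy as np
from scipy import integrate, special, stats

def spbd_moment(k, a, b, alpha, beta, A, B):
    """E(X^k) by quadrature, writing X = a (A + (B - A) T)^(1/b), T ~ Beta(alpha, beta)."""
    def integrand(t):
        return (a * (A + (B - A) * t) ** (1.0 / b)) ** k * stats.beta.pdf(t, alpha, beta)
    value, _ = integrate.quad(integrand, 0.0, 1.0)
    return value

def spbd_moment_A0(k, a, b, alpha, beta, B):
    """Closed form of E(X^k) for A = 0, evaluated on the log scale for stability."""
    log_ratio = (special.gammaln(alpha + k / b) + special.gammaln(alpha + beta)
                 - special.gammaln(alpha) - special.gammaln(alpha + beta + k / b))
    return a**k * B ** (k / b) * np.exp(log_ratio)

# Data set 1 of Table 1 (A = 0, so both routes apply).
a, b, alpha, beta, A, B = 1.8, 2.3, 1.4, 3.9, 0.0, 1.0
mean_quad = spbd_moment(1, a, b, alpha, beta, A, B)
mean_closed = spbd_moment_A0(1, a, b, alpha, beta, B)
var_closed = spbd_moment_A0(2, a, b, alpha, beta, B) - mean_closed**2
```

The quadrature route works for any admissible $A$, whereas the closed form is limited to $A = 0$; for data set 1 both should agree with each other and with the mean and variance the paper reports for that data set.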

Page 18: A Six Parameters Beta Distribution with Application for

74 Annals of Data Science (2021) 8(1):57–90

1 3

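3.17 Harmonic Mean

The harmonic mean of $X \sim SPBD(a, b, \alpha, \beta, A, B)$ is obtained, along the same lines as the moments of X, from;

$$E\left(\frac{1}{X}\right) = \frac{\Gamma(\alpha+\beta)}{a A^{1/b}}\, F_{(2,1)}\left(\frac{1}{b}, \alpha; \alpha+\beta; 1-\frac{B}{A}\right).$$

3.18 Incomplete Moments

The k-th incomplete moment of $X \sim SPBD(a, b, \alpha, \beta, A, B)$, $I(z, k)$, is defined by;

$$I(z, k) = \int_{-\infty}^{z} x^k f(x)\,dx$$

Using the form of the pdf of X given in (15), then;

$$I(z, k) = \int_{aA^{1/b}}^{z} x^k\, \frac{b B^{\beta-1}}{(B-A)^{\alpha+\beta-1} B(\alpha,\beta)} \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} C(i, j; a, b, \alpha, \beta, A, B)\, x^{b(\alpha+j-i)-1}\,dx$$

from which we have that;

$$I(z, k) = \frac{b B^{\beta-1}}{(B-A)^{\alpha+\beta-1} B(\alpha,\beta)} \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \frac{C(i, j; a, b, \alpha, \beta, A, B)}{k + b(\alpha+j-i)} \left[z^{k+b(\alpha+j-i)} - \left(aA^{1/b}\right)^{k+b(\alpha+j-i)}\right]. \tag{26}$$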

Table 2 Mean, median, mode and variance of the selected SPBD data sets

| Data set | Minimum | Maximum | Mean | Median | Mode | Variance |
|---|---|---|---|---|---|---|
| 1 | 0 | 1.8 | 0.94632 | 0.955627 | 0.984679 | 0.096145 |
| 2 | 0.38702 | 1.546834 | 0.942554 | 0.953689 | 1.01134 | 0.069825 |
| 3 | 0.673388 | 1.692171 | 0.969019 | 0.948343 | – | 0.048677 |
| 4 | 1.17346046 | 3.143355 | 1.80987 | 1.76616 | 1.60839 | 0.135081 |
| 5 | 1.122462 | 3.264052 | 2.36608 | 2.39793 | 2.53353 | 0.215075 |
| 6 | 0.261047 | 2.999082 | 2.14465 | 2.31082 | – | 0.501776 |

Minimum and Maximum give the variable range of each data set.


3.19 Mean Deviations

The mean deviation of $X \sim SPBD(a, b, \alpha, \beta, A, B)$ about its mean $\mu = E(X)$, $MD(\mu)$, is given by;

$$MD(\mu) = E|X - \mu|$$

which can be found, Cordeiro et al. [21], to be;

$$MD(\mu) = 2\left[\mu F(\mu) - I(\mu, 1)\right]$$

Hence, using (13) and (26), for the rv $X \sim SPBD(a, b, \alpha, \beta, A, B)$, we have that;

$$MD(\mu) = 2\left[\mu\, I\left(\frac{\left(\frac{\mu}{a}\right)^b - A}{B-A}; \alpha, \beta\right) - \frac{b B^{\beta-1}}{(B-A)^{\alpha+\beta-1} B(\alpha,\beta)} \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \frac{C(i, j; a, b, \alpha, \beta, A, B)}{1 + b(\alpha+j-i)} \left\{\mu^{1+b(\alpha+j-i)} - \left(aA^{1/b}\right)^{1+b(\alpha+j-i)}\right\}\right]$$

where $\mu$ is given from (25). Similarly, the mean deviation of X about its median $m$, $MD(m)$, is given by;

$$MD(m) = a A^{1/b}\,\Gamma(\alpha+\beta)\,F_{(2,1)}\left(-\frac{1}{b}, \alpha; \alpha+\beta; 1-\frac{B}{A}\right) - 2 I\left(\frac{\left(\frac{m}{a}\right)^b - A}{B-A}; \alpha, \beta\right).$$

3.20 Probability Weighted Moments

The probability weighted moments of order s and r of $X \sim SPBD(a, b, \alpha, \beta, A, B)$, $\rho_{s,r}$, are given by;

$$\rho_{s,r} = E\left(X^s [f(X)]^r\right)$$

Using the fact that;

$$[f(x; a, b, \alpha, \beta, A, B)]^r = \frac{x^{(b-1)(r-1)}\, B((\alpha-1)r+1, (\beta-1)r+1)}{a^{b(r-1)}\, b^{(1-r)}\, [B(\alpha,\beta)]^r (B-A)^{r-1}}\, f(x; a, b, (\alpha-1)r+1, (\beta-1)r+1, A, B)$$

with the use of (24), we have that;

$$\rho_{s,r} = \frac{b^r A^{\frac{s+r(b-1)}{b}}\, \Gamma[(\alpha-1)(r+1)+1]\, \Gamma[(\beta-1)(r+1)+1]}{a^{r-s}\, [B(\alpha,\beta)]^{r+1} (B-A)^r}\, F_{(2,1)}\left(-\frac{s+r(b-1)}{b}, (\alpha-1)(r+1)+1; (r+1)(\alpha+\beta-2)+2; 1-\frac{B}{A}\right).$$
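The identity $MD(\mu) = 2[\mu F(\mu) - I(\mu, 1)]$ of Sect. 3.19 can be illustrated without the series form (26). The sketch below is a hedged numerical check only: it discretizes the distribution on a midpoint grid, assuming the representation $X = a\,[A + (B-A)T]^{1/b}$ with $T \sim Beta(\alpha, \beta)$, and uses the parameters of simulated data set 4. The helper name `spbd_grid` is hypothetical.

```python
def spbd_grid(a, b, alpha, beta, A, B, n=100_000):
    """Midpoint grid on T ~ Beta(alpha, beta), mapped to X = a (A + (B - A) T)^(1/b).
    Returns the x values and probability weights normalised to sum to one."""
    xs, ws = [], []
    for i in range(n):
        t = (i + 0.5) / n
        xs.append(a * (A + (B - A) * t) ** (1.0 / b))
        ws.append(t ** (alpha - 1) * (1 - t) ** (beta - 1))
    total = sum(ws)
    return xs, [w / total for w in ws]

# Parameters of simulated data set 4.
xs, ws = spbd_grid(a=2.0, b=1.3, alpha=1.6, beta=3.8, A=0.5, B=1.8)

mu = sum(w * x for x, w in zip(xs, ws))                   # mean
F_mu = sum(w for x, w in zip(xs, ws) if x <= mu)          # cdf evaluated at the mean
I_mu_1 = sum(w * x for x, w in zip(xs, ws) if x <= mu)    # first incomplete moment I(mu, 1)

md_direct = sum(w * abs(x - mu) for x, w in zip(xs, ws))  # E|X - mu| computed directly
md_identity = 2 * (mu * F_mu - I_mu_1)                    # 2[mu F(mu) - I(mu, 1)]
print(mu, md_direct, md_identity)
```

On the discretized distribution the two mean-deviation values agree up to floating-point rounding, and the grid mean should be close to the Table 2 mean of data set 4.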


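3.21 Renyi Entropy

Let us compute the Renyi entropy as a measure of variation of the uncertainty of the rv $X \sim SPBD(a, b, \alpha, \beta, A, B)$. For $\theta > 0$ such that $\theta \neq 1$, we have for the rv $X \sim SPBD(a, b, \alpha, \beta, A, B)$ that;

$$I_X(\theta) = \frac{1}{1-\theta} \log \int_{-\infty}^{+\infty} [f(x)]^{\theta}\,dx \tag{27}$$

First, we note that;

$$[f(x; a, b, \alpha, \beta, A, B)]^{\theta} = \left\{\frac{b\,x^{b-1}}{a^b (B-A)^{\alpha+\beta-1} B(\alpha,\beta)} \left[\left(\frac{x}{a}\right)^b - A\right]^{\alpha-1}\left[B - \left(\frac{x}{a}\right)^b\right]^{\beta-1}\right\}^{\theta}$$

$$= \frac{x^{(b-1)(\theta-1)}\, B((\alpha-1)\theta+1, (\beta-1)\theta+1)}{a^{b(\theta-1)}\, b^{(1-\theta)}\, [B(\alpha,\beta)]^{\theta} (B-A)^{\theta-1}}\, f(x; a, b, (\alpha-1)\theta+1, (\beta-1)\theta+1, A, B)$$

Therefore,

$$I_X(\theta) = \frac{1}{1-\theta} \log\left\{\left[\frac{B((\alpha-1)\theta+1, (\beta-1)\theta+1)}{a^{b(\theta-1)}\, b^{(1-\theta)}\, [B(\alpha,\beta)]^{\theta} (B-A)^{\theta-1}}\right] E\left[Y^{(b-1)(\theta-1)}\right]\right\}$$

where $Y \sim SPBD(a, b, (\alpha-1)\theta+1, (\beta-1)\theta+1, A, B)$. It follows, using (24), that;

$$I_X(\theta) = \frac{1}{1-\theta} \log\left\{\frac{B((\alpha-1)\theta+1, (\beta-1)\theta+1)}{a^{\theta-1} [B(\alpha,\beta)]^{\theta} (B-A)^{\theta-1}\, \Gamma[(\alpha+\beta-1)\theta+2]}\, F_{(2,1)}\left(-\frac{(b-1)(\theta-1)}{b}, (\alpha-1)\theta+1; (\alpha-1)\theta+(\beta-1)\theta+2; 1-\frac{B}{A}\right)\right\}.$$

3.22 Lorenz and Bonferroni Curves

For $0 < \pi < 1$, the Lorenz curve, $L(\pi)$, and the Bonferroni curve, $B(\pi)$, for the rv $X \sim SPBD(a, b, \alpha, \beta, A, B)$, are given by, respectively;

$$L(\pi) = \frac{I(Q(\pi), 1)}{\mu}$$

and

$$B(\pi) = \frac{I(Q(\pi), 1)}{\pi\mu}$$

where $Q(\pi)$ is the quantile function of the rv X at $\pi$, and $I(z, k)$ is the incomplete moment of the rv X. Therefore, using (25) and (26), we have that;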


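$$L(\pi) = \frac{b B^{\beta-1} \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \frac{C(i, j; a, b, \alpha, \beta, A, B)}{b(\alpha+j-i)+1} \left[(Q(\pi))^{b(\alpha+j-i)+1} - \left(aA^{1/b}\right)^{b(\alpha+j-i)+1}\right]}{a A^{1/b} (B-A)^{\alpha+\beta-1}\, \Gamma(\alpha)\Gamma(\beta)\, F_{(2,1)}\left(-\frac{1}{b}, \alpha; \alpha+\beta; 1-\frac{B}{A}\right)}$$

And similarly, that;

$$B(\pi) = \frac{b B^{\beta-1} \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \frac{C(i, j; a, b, \alpha, \beta, A, B)}{b(\alpha+j-i)+1} \left[(Q(\pi))^{b(\alpha+j-i)+1} - \left(aA^{1/b}\right)^{b(\alpha+j-i)+1}\right]}{a \pi A^{1/b} (B-A)^{\alpha+\beta-1}\, \Gamma(\alpha)\Gamma(\beta)\, F_{(2,1)}\left(-\frac{1}{b}, \alpha; \alpha+\beta; 1-\frac{B}{A}\right)}.$$

4 Parameters Estimation of the SPBD

The maximum likelihood estimation (MLE) method will be used for estimating the parameters of the SPBD. Let $x_1, x_2, \ldots, x_n$ be a random sample from $SPBD(a, b, \alpha, \beta, A, B)$, as given by (9); we want to estimate the parameters $a, b, \alpha, \beta, A$, and $B$ by maximizing the log-likelihood function, where the likelihood function $L = L(a, b, \alpha, \beta, A, B; x_1, x_2, \ldots, x_n)$ can be written as;

$$L = \prod_{i=1}^{n} f(x_i) = \left[\frac{b}{a^b (B-A)^{\alpha+\beta-1} B(\alpha,\beta)}\right]^n \prod_{i=1}^{n} \left\{x_i^{b-1} \left[\left(\frac{x_i}{a}\right)^b - A\right]^{\alpha-1} \left[B - \left(\frac{x_i}{a}\right)^b\right]^{\beta-1}\right\}$$

Let us inspect the normal equations $\frac{\partial}{\partial a}\log L = 0, \frac{\partial}{\partial b}\log L = 0, \ldots, \frac{\partial}{\partial B}\log L = 0$, to see if they admit an explicit solution. We have that;

$$\frac{\partial}{\partial a}\log L = -(\alpha-1)\frac{b}{a}\sum_{i=1}^{n} \frac{\left(\frac{x_i}{a}\right)^b}{\left(\frac{x_i}{a}\right)^b - A} + (\beta-1)\frac{b}{a}\sum_{i=1}^{n} \frac{\left(\frac{x_i}{a}\right)^b}{B - \left(\frac{x_i}{a}\right)^b} - \frac{nb}{a} = 0, \tag{28}$$

$$\frac{\partial}{\partial b}\log L = \frac{n}{b} + \sum_{i=1}^{n}\log(x_i) + (\alpha-1)\sum_{i=1}^{n} \frac{\left(\frac{x_i}{a}\right)^b \log\left(\frac{x_i}{a}\right)}{\left(\frac{x_i}{a}\right)^b - A} - (\beta-1)\sum_{i=1}^{n} \frac{\left(\frac{x_i}{a}\right)^b \log\left(\frac{x_i}{a}\right)}{B - \left(\frac{x_i}{a}\right)^b} - n\log(a) = 0, \tag{29}$$

$$\frac{\partial}{\partial \alpha}\log L = \sum_{i=1}^{n}\log\left[\left(\frac{x_i}{a}\right)^b - A\right] - n\log(B-A) - n[\psi(\alpha) - \psi(\alpha+\beta)] = 0, \tag{30}$$

$$\frac{\partial}{\partial \beta}\log L = \sum_{i=1}^{n}\log\left[B - \left(\frac{x_i}{a}\right)^b\right] - n\log(B-A) - n[\psi(\beta) - \psi(\alpha+\beta)] = 0, \tag{31}$$

where $\psi$ is the digamma function, Abramowitz and Stegun [6, p. 258],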


and, since $aA^{1/b} < x < aB^{1/b}$, the MLEs of $aA^{1/b}$ and $aB^{1/b}$ are, respectively, $x_{1:n}$ and $x_{n:n}$; that is, $aA^{1/b} = x_{1:n}$ and $aB^{1/b} = x_{n:n}$, and hence;

$$A = \left(\frac{x_{1:n}}{a}\right)^b, \tag{32}$$

and

$$B = \left(\frac{x_{n:n}}{a}\right)^b. \tag{33}$$

Since Eqs. (28)–(33) are not easy to solve explicitly, a numerical technique such as the Newton–Raphson method or any other well-known optimization algorithm, see Shi et al. [22], may be employed, or a well-known software package, such as maxLik, Henningsen and Toomet [23], or GAMLSS, Stasinopoulos and Rigby [24], may be used to find the MLE of the parameters of the SPBD.

5 A Simulation Study

In order to examine the performance of the MLE method given in Sect. 4, we perform a simulation study. The bias and the mean square error (MSE) of the estimates are the principal measures of performance.

The statistical software R and the Absoft Pro Fortran compiler are employed for computing. The maxLik package of R is used mainly for computing the MLEs, see Henningsen and Toomet [23] for details of this package, while Absoft Pro Fortran is used for the other computations needed.

The six miscellaneous SPBD models given in Table 1, which have different pdf shapes and variable ranges, are used to simulate data sets for each model, and for each data set the bias and the MSE of the MLEs of the model parameters are computed for different simulated sample sizes. The sample sizes taken are 25, 50, 100, 300, 500, and 1000. In each situation, each parameter, $\theta$ say, of the first of the six SPBD models given in Table 1 is estimated over 5000 replications of random variates generated from the given SPBD model, and the sample mean, bias, variance, and MSE of the estimates are computed as;

$$Mean(\hat{\theta}) = \frac{1}{5000}\sum_{i=1}^{5000}\hat{\theta}_i = \bar{\theta} \text{ say}, \quad Bias(\hat{\theta}) = \bar{\theta} - \theta, \quad Var(\hat{\theta}) = \frac{1}{5000}\sum_{i=1}^{5000}\left(\hat{\theta}_i - \bar{\theta}\right)^2,$$

and hence $MSE(\hat{\theta}) = Var(\hat{\theta}) + [Bias(\hat{\theta})]^2$. This procedure is repeated for each sample size, then repeated for each SPBD model.

Table 3 shows the bias of the estimated parameters of the different simulated SPBD data sets for each sample size, while Table 4 presents the MSE of the estimated parameters of the different simulated SPBD data sets for each sample size. Both Tables 3 and 4 show, for each of the SPBD model parameters, that the bias and MSE decrease as the sample size increases. Figure 3 shows the behaviour of the MSE plots of the estimated parameters for the six simulated SPBD data sets, which shows graphically, for each of the SPBD model parameters, that the


Table 3 The bias of the estimated parameters of the simulated SPBD data sets for each sample size n

| n | Actual a | Actual b | Actual α | Actual β | Actual A | Actual B | Bias a | Bias b | Bias α | Bias β | Bias A | Bias B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | −0.34878 | 0.329715 | 0.013104 | 0.399618 | 0.420087 | 0.601777 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.140342 | −0.0969 | 0.056847 | −0.05489 | 0.464818 | −0.48701 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | −0.37808 | −0.27911 | 0.315561 | −0.30242 | −0.00807 | −0.19319 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.039897 | −0.41322 | 0.612345 | 0.366937 | 0.149 | −0.14444 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | −0.18231 | −0.25249 | 0.334888 | 0.198619 | −0.11285 | 0.232156 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.396452 | 0.403079 | −0.34372 | 0.335451 | −0.1812 | 0.263205 |
| 50 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.512223 | −0.29606 | 0.58745 | −0.58925 | 0.451566 | −0.34638 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.414662 | −0.15015 | 0.687373 | −0.73152 | 0.294003 | −0.49454 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.070608 | 0.459072 | 0.278546 | −0.12252 | 0.102261 | 0.156337 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.153985 | −0.4023 | −0.08337 | 0.381255 | −0.2772 | −0.32743 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.111967 | 0.199792 | 0.105013 | −0.2507 | 0.080938 | 0.051812 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.275 | −0.28298 | −0.36704 | −0.26281 | −0.14315 | −0.26767 |
| 100 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | −0.10525 | 0.227391 | −0.09405 | −0.67146 | 0.4522 | 0.262165 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.019408 | −0.24609 | −0.13464 | 0.094321 | 0.033752 | −0.31785 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.153569 | −0.36599 | −0.40473 | −0.06738 | 0.191963 | −0.16552 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | −0.07852 | −0.21469 | −0.08842 | 0.090949 | 0.160529 | −0.20546 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.431635 | 0.23417 | 0.037447 | 0.241197 | −0.10608 | 0.343295 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | −0.33975 | 0.166262 | −0.02429 | −0.4521 | 0.334471 | 0.360404 |
| 300 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.274458 | 0.049549 | 0.474342 | 0.345484 | 0.368289 | −0.24068 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.424818 | 0.026234 | −0.15182 | 0.237266 | −0.49699 | 0.312069 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.091822 | 0.012653 | −0.37405 | −0.31214 | −0.41615 | 0.322652 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | −0.34078 | −0.25521 | 0.336573 | 0.489306 | 0.169098 | 0.536246 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | −0.21183 | 0.072752 | −0.12231 | −0.05324 | −0.47379 | 0.106958 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | −0.15788 | 0.177951 | 0.208611 | 0.352764 | 0.153525 | −0.16458 |
| 500 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | −0.37304 | 0.278684 | −0.31915 | −0.19441 | 0.432184 | 0.377181 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.373571 | −0.40604 | −0.38178 | −0.47487 | 0.248089 | 0.219282 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | −0.02872 | −0.32082 | 0.380835 | −0.06513 | −0.05995 | 0.143485 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.507451 | −0.11285 | 0.159177 | 0.147718 | 0.223398 | 0.356879 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | −0.36108 | 0.142093 | 0.102362 | −0.36187 | −0.50241 | 0.268284 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.094234 | 0.134663 | −0.22689 | −0.2296 | 0.042163 | −0.00386 |
| 1000 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.204721 | −0.32209 | 0.257236 | 0.110389 | 0.104925 | 0.215005 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.098276 | 0.118352 | 0.104907 | 0.002979 | 0.410095 | −0.38891 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | −0.14047 | 0.252312 | −0.46269 | 0.190593 | −0.14857 | 0.29962 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.177988 | 0.011111 | −0.31224 | −0.04958 | 0.227256 | −0.11569 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.258915 | 0.015733 | −0.3335 | −0.37711 | −0.35998 | −0.04232 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.060013 | 0.171233 | 0.137993 | 0.116084 | 0.389542 | −0.13421 |


Table 4 The MSE of the estimated parameters of the simulated SPBD data sets for each sample size n

| n | Actual a | Actual b | Actual α | Actual β | Actual A | Actual B | MSE a | MSE b | MSE α | MSE β | MSE A | MSE B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 1.342013 | 1.954471 | 0.932849 | 1.895358 | 2.084947 | 1.762417 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 1.402194 | 2.396725 | 0.793264 | 1.629038 | 1.911926 | 2.325153 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 1.477281 | 1.833293 | 1.406634 | 2.277118 | 1.563066 | 1.61373 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 1.640697 | 2.313115 | 0.900516 | 1.306511 | 1.61811 | 1.904378 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.944398 | 2.117512 | 1.005152 | 2.534064 | 1.15113 | 1.954144 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 1.342938 | 2.332513 | 1.467529 | 1.019239 | 1.239475 | 1.407573 |
| 50 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 1.170272 | 1.657704 | 0.799091 | 1.401564 | 1.755979 | 1.654412 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 1.132947 | 1.743292 | 0.598261 | 1.309235 | 1.681494 | 1.776936 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 1.324648 | 1.781486 | 1.32801 | 2.060954 | 1.019974 | 1.361083 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 1.31652 | 1.99784 | 0.701776 | 1.157632 | 1.311115 | 1.187063 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.785298 | 1.910105 | 0.862128 | 2.282315 | 1.021342 | 1.489123 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.993972 | 2.011772 | 1.162289 | 0.873739 | 1.157483 | 1.251442 |
| 100 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.756539 | 1.210055 | 0.607076 | 0.992914 | 1.460646 | 1.345788 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.88934 | 1.199509 | 0.404822 | 1.201444 | 1.357685 | 1.206554 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 1.117483 | 1.529878 | 1.037574 | 1.595888 | 0.829085 | 1.121212 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.998634 | 1.752594 | 0.537625 | 0.828906 | 1.06102 | 0.939682 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.565906 | 1.546855 | 0.705282 | 2.017984 | 0.717713 | 1.153898 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.805249 | 1.722256 | 0.777865 | 0.696402 | 0.851498 | 1.018501 |
| 300 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.547782 | 0.852198 | 0.401467 | 0.644066 | 0.773053 | 1.007341 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.711853 | 0.781424 | 0.269156 | 0.788936 | 0.916783 | 0.831953 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.910827 | 1.280392 | 0.930746 | 1.260824 | 0.410875 | 0.789677 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.609068 | 1.387463 | 0.38542 | 0.5643 | 0.865376 | 0.664865 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.412669 | 1.172685 | 0.475687 | 1.818785 | 0.500591 | 0.891926 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.483891 | 1.390424 | 0.570112 | 0.455555 | 0.530735 | 0.691226 |
| 500 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.427417 | 0.64704 | 0.332296 | 0.435204 | 0.200255 | 0.854437 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.449441 | 0.434471 | 0.155422 | 0.575885 | 0.650976 | 0.650911 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.7656 | 1.07478 | 0.704428 | 0.970532 | 0.325475 | 0.618804 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.272669 | 1.19933 | 0.241967 | 0.271555 | 0.418878 | 0.422975 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.278361 | 0.995957 | 0.321993 | 1.531532 | 0.367942 | 0.542543 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.178667 | 0.82391 | 0.340492 | 0.202494 | 0.388572 | 0.487987 |
| 1000 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.354872 | 0.41543 | 0.213608 | 0.304466 | 0.101616 | 0.472349 |
| | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.301129 | 0.205473 | 0.081764 | 0.27106 | 0.33213 | 0.376744 |
| | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.645736 | 0.648055 | 0.658258 | 0.585563 | 0.148738 | 0.425584 |
| | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.149529 | 0.628928 | 0.113052 | 0.243147 | 0.300697 | 0.311164 |
| | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.178081 | 0.714394 | 0.16387 | 1.281251 | 0.198019 | 0.215385 |
| | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.052826 | 0.547684 | 0.102232 | 0.057344 | 0.159444 | 0.255718 |


MSE decreases as the sample size increases. Hence, since the MSE plots decrease as the sample size increases, we may conclude that the MLE method seems to have high efficiency as the sample size becomes large.

Table 5 shows the actual values and the MLE parameter values (as the average values over the 5000 replications) of the different simulated SPBD data sets, and Fig. 4 shows visually their corresponding pdf plots.
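The variable ranges listed in Table 5 follow from the support $aA^{1/b} < x < aB^{1/b}$, and variates can be drawn by transforming a standard beta variate. The sketch below is an illustration under stated assumptions rather than the authors' R or Fortran code: the function names are hypothetical, and the sampler assumes the representation $X = a\,[A + (B-A)T]^{1/b}$ with $T \sim Beta(\alpha, \beta)$, which corresponds to the transformation $y = (x/a)^b$ used in Sect. 3.14.

```python
import random

def spbd_support(a, b, A, B):
    """End points a*A^(1/b) and a*B^(1/b) of the SPBD support."""
    return a * A ** (1.0 / b), a * B ** (1.0 / b)

def spbd_sample(n, a, b, alpha, beta, A, B, rng):
    """Draw n variates via X = a * (A + (B - A) * T)^(1/b), T ~ Beta(alpha, beta)."""
    return [a * (A + (B - A) * rng.betavariate(alpha, beta)) ** (1.0 / b)
            for _ in range(n)]

rng = random.Random(2020)  # fixed seed, chosen arbitrarily for reproducibility
params = dict(a=2.0, b=1.3, alpha=1.6, beta=3.8, A=0.5, B=1.8)  # data set 4

lo, hi = spbd_support(params["a"], params["b"], params["A"], params["B"])
xs = spbd_sample(5000, rng=rng, **params)
print(lo, hi, min(xs), max(xs))
```

For data set 4 the computed end points should agree with the "Actual" minimum and maximum in Table 5 (1.17346046 and 3.143355214), and every simulated value lies between them.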

In conclusion, the simulation indicates that the MLE method is appropriate and can be used to estimate the parameters of the SPBD models.
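The bias and MSE definitions of this section can be sketched in a few lines. The example below is a deliberately reduced illustration, not the study itself: instead of the six-parameter fit with maxLik, it uses only the boundary estimator $x_{n:n}$ of the upper end point $aB^{1/b}$ described before (32) and (33), it runs 300 replications rather than 5000, and the function names and the seed are hypothetical choices.

```python
import random

def spbd_sample(n, a, b, alpha, beta, A, B, rng):
    """Draw n variates via X = a * (A + (B - A) * T)^(1/b), T ~ Beta(alpha, beta)."""
    return [a * (A + (B - A) * rng.betavariate(alpha, beta)) ** (1.0 / b)
            for _ in range(n)]

def bias_and_mse(estimates, true_value):
    """Bias = mean(estimates) - true value; MSE = Var + Bias^2, with a 1/m variance."""
    m = len(estimates)
    mean_est = sum(estimates) / m
    bias = mean_est - true_value
    var = sum((e - mean_est) ** 2 for e in estimates) / m
    return bias, var + bias ** 2

rng = random.Random(1)
p = dict(a=1.8, b=2.3, alpha=1.4, beta=3.9, A=0.0, B=1.0)  # data set 1
upper = p["a"] * p["B"] ** (1.0 / p["b"])                   # true upper end point

results = {}
for n in (25, 100, 500):
    # Sample maximum as the estimate of the upper end point in each replication.
    estimates = [max(spbd_sample(n, rng=rng, **p)) for _ in range(300)]
    results[n] = bias_and_mse(estimates, upper)

for n, (bias, mse) in results.items():
    print(n, round(bias, 4), round(mse, 4))
```

Because the sample maximum can never exceed the upper end point, its bias is negative, and its MSE shrinks as n grows, which mirrors the qualitative behaviour the section reports for the full MLE.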

Fig. 3 Behaviour of the MSE plots of the estimated parameters for the SPBD simulated data sets (six panels; horizontal axis: Sample Size, 0 to 1000; vertical axis: MSE)


Table 5 Actual and MLE parameters values of the simulated SPBD data sets

| Data set | Value | a | b | α | β | A | B | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Actual | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0 | 1.8 |
| | MLE | 1.777778 | 2.339845 | 1.373519 | 3.877778 | 0.000111 | 1.01218 | 0.036285462 | 1.787000108 |
| 2 | Actual | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.38702 | 1.546834 |
| | MLE | 1.435333 | 3.117889 | 0.923133 | 2.587889 | 0.019188 | 1.211111 | 0.403895566 | 1.526273077 |
| 3 | Actual | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.673387689 | 1.69217121 |
| | MLE | 1.487889 | 5.855124 | 0.411889 | 5.444889 | 0.010134 | 2.227889 | 0.679167201 | 1.706033177 |
| 4 | Actual | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 1.17346046 | 3.143355214 |
| | MLE | 1.993333 | 1.443825 | 1.473245 | 3.895556 | 0.46999 | 1.933889 | 1.18159022 | 3.147485177 |
| 5 | Actual | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 1.122462048 | 3.264052108 |
| | MLE | 1.879889 | 1.213113 | 2.351245 | 1.778986 | 0.512333 | 1.933333 | 1.083200324 | 3.236995497 |
| 6 | Actual | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.261047095 | 2.999081861 |
| | MLE | 2.054789 | 0.467478 | 2.035899 | 0.598789 | 0.397999 | 1.198758 | 0.286325174 | 3.028201867 |

Minimum and Maximum give the variable range.

Fig. 4 Plots of the actual and simulated SPBD pdf's (six panels, Data Set 1 to Data Set 6; curves: Actual and Predicted; axes: x and f(x))

6 Application of Fitting SPBD Model to Real-Life Data

We consider two real-life data sets in order to show the usefulness of the proposed estimation procedure for fitting the SPBD model to real-life data. The data sets are;

Data Set 1 Represents the waiting period of Muslim worshipers from the time of entering the mosque until the actual start of the Alfajir prayer (the early morning and first prayer of the day) in Al-Mani Jamieh Mosque (Masjid no. 942), where Friday prayers are held and which accommodates more than two thousand worshipers, in Al-Waab town in Doha, Qatar. The data consist of 4539 observations recorded in this masjid for the period from 30th October 2017 to 15th January 2020. We will refer to this data set as the main street mosque data.

Data Set 2 Represents the waiting period of Muslim worshipers from the time of entering the mosque until the actual start of the Alfajir prayer in Saeed bin Fahad Al-Dosari Mosque (Masjid no. 1031), where Friday prayers are not held and which accommodates no more than two hundred and fifty worshipers, in Al-Waab town in Doha, Qatar. The data consist of 3360 observations recorded in this mosque for the period from 25th January 2015 to 20th October 2017. We will refer to this data set as the within streets mosque data.

Table 6 presents some statistics of the observed mosque data sets.

Using both mosque data sets, the MLE method was employed to estimate the parameters of the SPBD model for each, and Table 7 shows the observed and the predicted frequencies, the model parameter estimates, and the chi-square goodness of fit test for the SPBD, the gamma, the exponential, the four parameters beta, and the generalized beta of the first kind distributions, as well as the likelihood ratio test (LRT) for the nested models of the SPBD, namely the four parameters beta and the generalized beta of the first kind distributions. Figure 5 illustrates the histograms and the fitted pdfs for both the main and within street mosque data sets. For the main street data set, since the p values of the chi-square goodness of fit test for the gamma, the exponential, the four parameters beta, and the generalized beta of the first kind distributions are smaller than 0.05, and the p value of the SPBD model equals 0.9488, the SPBD performs better than all these distributions. For the within street mosque data set, although the chi-square goodness of fit test p value of the generalized beta of the first kind distribution equals 0.23087, indicating that this distribution can fit the data, the SPBD model performs better in this case since its p value equals 0.96088; and since the p values of the chi-square goodness of fit test for the gamma, the exponential, and the four parameters beta distributions are smaller than 0.05, the SPBD performs better than these distributions also. Next, the p values of the LRT for the nested models of the SPBD, namely the four parameters beta and the generalized beta of the first kind distributions, are less than 0.05, indicating statistically that the SPBD performs better for both the main and within street data sets. These findings indicate that the SPBD outperforms the gamma, exponential, four parameters beta, and generalized beta of the first kind distributions and provides the best fit for both the main and within street mosque data sets.
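The LRT p values can be reproduced by hand, because each nested comparison in Table 7 has 2 degrees of freedom and the upper tail of a chi-square variable with 2 degrees of freedom is exactly $e^{-x/2}$. The sketch below only illustrates that arithmetic; the dictionary layout and the function name are hypothetical, and the statistics are the LRT values reported in Table 7.

```python
import math

def lrt_p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)

# LRT statistics reported in Table 7 (SPBD against each nested model).
lrt_statistics = {
    ("main", "4 parameters beta"): 10.3345,
    ("main", "generalized beta of the first kind"): 8.76259,
    ("within", "4 parameters beta"): 20.5047,
    ("within", "generalized beta of the first kind"): 7.61264,
}

p_values = {key: lrt_p_value_df2(stat) for key, stat in lrt_statistics.items()}
for key, p in p_values.items():
    print(key, round(p, 5))
```

The computed values agree with the p values printed in Table 7 to within about 0.0001, and all four fall below 0.05.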

7 Summary

A new six parameters beta distribution is introduced, which has a more flexible shape and a wider bounded domain than the two (standard) and the four parameters beta distributions; its properties and some of its various shapes are given to show its flexibility. Its boundaries, limits, mode, quantiles, reliability and hazard functions, Renyi entropy, Lorenz and Bonferroni curves are studied. This distribution is closed under scaling and exponentiation, has the reflection symmetry property, and has some well-known distributions as special cases, such as the two and four parameters beta, generalized modification of the Kumaraswamy, generalized beta of the first kind, the power function, Kumaraswamy power function, Minimax, exponentiated Pareto, and the generalized uniform distributions. Its order statistics, moment generating function, and its moments, consisting of the mean, variance, moments about the origin, harmonic, incomplete, and probability weighted moments, and mean deviations are derived. The maximum likelihood estimation method is used

Table 6 Some statistics of the observed mosque data sets

| Statistics | Main | Within |
|---|---|---|
| No. of observations | 4539 | 3360 |
| Mean | 7.0986 | 5.2372 |
| Standard error of mean | 0.08194 | 0.07554 |
| Median | 5.65258 | 4.4779 |
| Mode | 0.685519 | 0.6196016 |
| SD | 5.52032 | 4.37859 |
| Variance | 30.474 | 19.172 |
| Skewness | 0.706 | 1.162 |
| Standard error of skewness | 0.036 | 0.042 |
| Kurtosis | −0.378 | 1.161 |
| Standard error of kurtosis | 0.073 | 0.084 |
| Minimum | 0.07 | 0.08 |
| Maximum | 24.9 | 24.9 |
| Percentile 25 | 2.44012 | 1.46321 |
| Percentile 50 | 5.5 | 4.4779 |
| Percentile 75 | 10.64123 | 7.60603 |


Table 7 Observed and predicted frequencies, model parameters estimates and goodness of fit for mosque data sets

Main streets mosque

| Data range | Observed | Proposed 6 parameters beta | Gamma | Exponential | 4 parameters beta | Generalized beta of the first kind |
|---|---|---|---|---|---|---|
| 0.0–1.0 | 599 | 589 | 259 | 596 | 699 | 533 |
| 1.1–2.0 | 405 | 412 | 417 | 518 | 451 | 426 |
| 2.1–3.0 | 373 | 362 | 460 | 450 | 381 | 375 |
| 3.1–4.0 | 323 | 330 | 453 | 391 | 335 | 339 |
| 4.1–5.0 | 310 | 306 | 423 | 339 | 300 | 310 |
| 5.1–6.0 | 291 | 284 | 382 | 295 | 272 | 285 |
| 6.1–7.0 | 260 | 265 | 337 | 256 | 247 | 263 |
| 7.1–8.0 | 243 | 247 | 294 | 222 | 226 | 242 |
| 8.1–9.0 | 219 | 229 | 253 | 193 | 207 | 223 |
| 9.1–10.0 | 221 | 212 | 215 | 168 | 189 | 205 |
| 10.1–11.0 | 188 | 195 | 182 | 146 | 172 | 188 |
| 11.1–12.0 | 185 | 178 | 153 | 126 | 157 | 172 |
| 12.1–13.0 | 151 | 162 | 128 | 110 | 142 | 156 |
| 13.1–14.0 | 143 | 145 | 107 | 95 | 128 | 141 |
| 14.1–15.0 | 135 | 128 | 89 | 83 | 115 | 126 |
| 15.1–16.0 | 108 | 112 | 74 | 72 | 102 | 112 |
| 16.1–17.0 | 99 | 96 | 61 | 62 | 90 | 99 |
| 17.1–18.0 | 71 | 81 | 50 | 54 | 78 | 85 |
| 18.1–19.0 | 68 | 66 | 41 | 47 | 67 | 72 |
| 19.1–20.0 | 49 | 52 | 34 | 41 | 56 | 60 |
| 20.1–21.0 | 44 | 38 | 28 | 36 | 45 | 48 |
| 21.1–22.0 | 24 | 26 | 23 | 31 | 35 | 36 |
| 22.1–23.0 | 18 | 16 | 19 | 27 | 25 | 25 |
| 23.1–24.0 | 8 | 7 | 15 | 23 | 15 | 14 |
| 24.1–25.0 | 4 | 1 | 42 | 158 | 5 | 4 |
| Total | 4539 | 4539 | 4539 | 4539 | 4539 | 4539 |
| Model parameters | | | | | | |
| a | | 25.01238 | [symbol illegible] = 0.213 | [symbol illegible] = 0.141 | 1 | 24.97111 |
| b | | 2.023441 | β = 1.643 | | 1 | 0.999999 |
| α | | 0.381451 | | | 0.7533 | 0.89295 |
| β | | 2.612456 | | | 2.00145 | 2.239999 |
| A | | 0.000012 | | | 0 | 0 |
| B | | 0.998912 | | | 25 | 1 |
| Goodness of fit | | | | | | |
| χ² | | 8.7141 | 736.85 | 416.134 | 53.1875 | 48.22419 |
| df | | 17* | 21* | 23 | 20* | 18* |
| p value | | 0.9488 | 0.0 | 0.0 | 0.00008 | 0.000141 |
| Likelihood ratio test (nested)** | | | | | | |
| LRT | | | | | 10.3345 | 8.76259 |
| df | | | | | 2 | 2 |
| p value | | | | | 0.0056 | 0.01251 |

Within streets mosque

| Data range | Observed | Proposed 6 parameters beta | Gamma | Exponential | 4 parameters beta | Generalized beta of the first kind |
|---|---|---|---|---|---|---|
| 0.0–1.0 | 523 | 517 | 380 | 583 | 611 | 536 |
| 1.1–2.0 | 430 | 428 | 465 | 482 | 416 | 430 |
| 2.1–3.0 | 372 | 374 | 441 | 398 | 346 | 367 |
| 3.1–4.0 | 330 | 329 | 388 | 329 | 297 | 317 |
| 4.1–5.0 | 285 | 288 | 329 | 272 | 257 | 276 |
| 5.1–6.0 | 250 | 251 | 273 | 224 | 224 | 240 |
| 6.1–7.0 | 215 | 217 | 223 | 185 | 196 | 208 |
| 7.1–8.0 | 182 | 186 | 181 | 153 | 171 | 179 |
| 8.1–9.0 | 160 | 158 | 145 | 127 | 148 | 154 |
| 9.1–10.0 | 129 | 133 | 116 | 105 | 128 | 131 |
| 10.1–11.0 | 110 | 111 | 92 | 86 | 110 | 111 |
| 11.1–12.0 | 86 | 91 | 73 | 71 | 94 | 93 |
| 12.1–13.0 | 75 | 73 | 58 | 59 | 80 | 77 |
| 13.1–14.0 | 51 | 58 | 45 | 49 | 67 | 62 |
| 14.1–15.0 | 49 | 45 | 35 | 40 | 55 | 50 |
| 15.1–16.0 | 36 | 34 | 28 | 33 | 44 | 39 |
| 16.1–17.0 | 26 | 25 | 22 | 27 | 35 | 30 |
| 17.1–18.0 | 18 | 17 | 17 | 23 | 27 | 22 |
| 18.1–19.0 | 13 | 11 | 13 | 19 | 20 | 16 |
| 19.1–20.0 | 8 | 7 | 10 | 15 | 14 | 10 |
| 20.1–21.0 | 5 | 4 | 8 | 13 | 10 | 6 |
| 21.1–22.0 | 3 | 2 | 6 | 11 | 6 | 3 |
| 22.1–23.0 | 2 | 1 | 5 | 9 | 3 | 2 |
| 23.1–24.0 | 1 | 0 | 4 | 7 | 1 | 1 |
| 24.1–25.0 | 1 | 0 | 3 | 40 | 0 | 0 |
| Total | 3360 | 3360 | 3360 | 3360 | 3360 | 3360 |
| Model parameters | | | | | | |
| a | | 24.20246 | [symbol illegible] = 0.273 | [symbol illegible] = 0.191 | 1 | 24.511778 |
| b | | 1.159141 | β = 1.431 | | 1 | 0.999999 |
| α | | 0.799451 | | | 0.80945 | 0.915556 |
| β | | 3.901245 | | | 2.97825 | 3.31778 |
| A | | 0.000132 | | | 0 | 0 |
| B | | 1.038113 | | | 25 | 1 |
| Goodness of fit | | | | | | |
| χ² | | 6.2113 | 101.441 | 112.324 | 52.4733 | 19.7677 |
| df | | 14* | 21* | 23 | 17* | 16* |
| p value | | 0.96088 | 0.0 | 0.0 | 0.000017 | 0.23087 |
| Likelihood ratio test (nested)** | | | | | | |
| LRT | | | | | 20.5047 | 7.61264 |
| df | | | | | 2 | 2 |
| p value | | | | | 0.00004 | 0.02222 |

*The number of intervals was adjusted in order to make the expected number of observations in each interval equal to or greater than 5, which in turn affected the number of degrees of freedom

**The 4 parameters beta distribution and the generalized beta of the first kind distribution are special cases of the SPBD, see Sect. 3.7 cases 1 and 4
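The chi-square statistics and degrees of freedom for the SPBD columns of Table 7 can be recomputed from the rounded frequencies. The sketch below is an illustration under an assumption: it reads the footnote marked * as pooling intervals from the upper tail until the pooled expected frequency is at least 5, and it takes the degrees of freedom as the number of pooled intervals minus one minus the six estimated parameters. The function names are hypothetical.

```python
def pool_tail(observed, expected, min_expected=5.0):
    """Merge intervals from the upper tail until the last expected count is >= min_expected."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < min_expected:
        e_last, o_last = exp.pop(), obs.pop()
        exp[-1] += e_last
        obs[-1] += o_last
    return obs, exp

def chi_square(observed, expected, n_params):
    """Pearson chi-square statistic and degrees of freedom after tail pooling."""
    obs, exp = pool_tail(observed, expected)
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return stat, len(exp) - 1 - n_params

# Observed and SPBD-predicted frequencies from Table 7.
main_obs = [599, 405, 373, 323, 310, 291, 260, 243, 219, 221, 188, 185, 151,
            143, 135, 108, 99, 71, 68, 49, 44, 24, 18, 8, 4]
main_spbd = [589, 412, 362, 330, 306, 284, 265, 247, 229, 212, 195, 178, 162,
             145, 128, 112, 96, 81, 66, 52, 38, 26, 16, 7, 1]
within_obs = [523, 430, 372, 330, 285, 250, 215, 182, 160, 129, 110, 86, 75,
              51, 49, 36, 26, 18, 13, 8, 5, 3, 2, 1, 1]
within_spbd = [517, 428, 374, 329, 288, 251, 217, 186, 158, 133, 111, 91, 73,
               58, 45, 34, 25, 17, 11, 7, 4, 2, 1, 0, 0]

stat_main, df_main = chi_square(main_obs, main_spbd, n_params=6)
stat_within, df_within = chi_square(within_obs, within_spbd, n_params=6)
print(round(stat_main, 4), df_main, round(stat_within, 4), df_within)
```

Under this pooling rule the recomputed values match the SPBD entries of Table 7, namely 8.7141 with 17 degrees of freedom for the main streets mosque and 6.2113 with 14 degrees of freedom for the within streets mosque.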


for estimating its parameters and applied to estimate the parameters of six different simulated data sets of this distribution having different pdf shapes, in order to check the performance of the estimation method through the mean square errors of the estimated parameters computed from different simulated sample sizes, which are shown to decrease as the sample size increases, indicating that the MLE method is appropriate and can be used to estimate the parameters of the SPBD models. Finally, two real-life data sets, representing the waiting period of Muslim worshipers from the time of entering the mosque until the actual start of the Alfajir prayer in two different mosques, are used in order to show the usefulness and the flexibility of this distribution in application to real-life data. The MLE method was employed on these data sets to estimate the parameters of the SPBD, the gamma, the exponential, the four parameters beta, and the generalized beta of the first kind distributions. The chi-square goodness of fit test for these distributions, as well as the LRT for the nested models of the SPBD, namely the four parameters beta and the generalized beta of the first kind distributions, were employed, and all the results, through the p values of these tests, indicate statistically that the SPBD outperforms the other stated distributions.

Acknowledgements Open Access funding provided by the Qatar National Library. The publication of this article was funded by the Qatar National Library.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Sheskin DJ (2011) Handbook of parametric and nonparametric statistical procedures, 5th edn. Chapman and Hall, New York

Main Within

6 Para Beta Gamma Exponential 4 Para Beta G Beta Type I

5 10 15 20 25x0.00

0.05

0.10

0.15

f x

5 10 15 20 25x0.00

0.05

0.10

0.15

0.20

f x

Fig. 5 Histograms and the fitted pdfs for the Mosque data sets

Page 34: A Six Parameters Beta Distribution with Application for

90 Annals of Data Science (2021) 8(1):57–90

1 3

2. Shi Y (2014) Big data: history, current status, and challenges going forward. Bridge US Natl Acad Eng 44(4):6–11

3. Olson D, Shi Y (2007) Introduction to business data mining. McGraw-Hill, New York

4. Johnson NL, Kemp AW, Balakrishnan N (1995) Continuous univariate distributions, vol 2, 2nd edn. Wiley, New York

5. Johnson NL, Kemp AW, Balakrishnan N (1995) Continuous univariate distributions, vol 1, 2nd edn. Wiley, New York

6. Abramowitz M, Stegun IA (2013) Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover, New York

7. Armero C, Bayarri MJ (1994) Prior assessments for prediction in queues. Statistician 43(1):139–153

8. Gordy MB (1998) Computationally convenient distributional assumptions for common-value auctions. Comput Econ 12:61–78. https://doi.org/10.1023/A:1008645531911

9. Pathan MA, Garg M, Agrawal J (2008) On a new generalized beta distribution. East West J Math 10(1):45–55

10. Srivastava HM, Manocha HL (1984) A treatise on generating functions. Ellis Horwood Ltd/Wiley, Chichester/New York

11. Ng DWW, Koh SK, Sim SZ, Lee MC (2018) The study of properties on generalized Beta distribution. J Phys Conf Ser. https://doi.org/10.1088/1742-6596/1132/1/012080

12. Gómez-Déniz E, Sarabia JM (2018) A family of generalised beta distributions: properties and applications. Ann Data Sci 5:401–420

13. Alshkaki RSA (2020) A generalized modification of the Kumaraswamy distribution for modeling and analyzing real-life data. Stat Optim Inf Comput J

14. Kumaraswamy P (1980) A generalized probability density function for double-bounded random processes. J Hydrol 46(1–2):79–88. https://doi.org/10.1016/0022-1694(80)90036-0

15. McDonald JB (1984) Some generalized functions for the size distribution of income. Econometrica 52:7–664

16. Abdul-Moniem IB (2017) The Kumaraswamy power function distribution. J Stat Appl Probab Lett 6(1):81–90

17. Gupta RC, Gupta PI, Gupta RD (1998) Modeling failure time data by Lehmann alternatives. Commun Stat Theory Methods 27:887–904

18. Tiwari RC, Yang Y, Zalkikar JN (1996) Bayes estimation for the Pareto failure model using Gibbs sampling. IEEE Trans Reliab 45(3):471–476

19. Forbes C, Evans M, Hastings N, Peacock B (2011) Statistical distributions, 4th edn. Wiley, New York

20. Virchenko N, Kalla S, Al-Zamel A (2001) Some results on a generalized hypergeometric function. Integral Transforms Spec Funct 12(1):89–100. https://doi.org/10.1080/10652460108819336

21. Cordeiro GM, Nadarajah S, Ortega EMM (2012) The Kumaraswamy Gumbel distribution. Stat Methods Appl 21(2):139–168. https://doi.org/10.1007/s10260-011-0183-y

22. Shi Y, Tian YJ, Kou G, Peng Y, Li JP (2011) Optimization based data mining: theory and applications. Springer, Berlin

23. Henningsen A, Toomet O (2011) maxLik: a package for maximum likelihood estimation in R. Comput Stat 26(3):443–458. https://doi.org/10.1007/s00180-010-0217-1

24. Stasinopoulos DM, Rigby RA (2008) Generalized additive models for location scale and shape (GAMLSS) in R. J Stat Softw 23(2008):1–46. https://doi.org/10.18637/jss.v023.i07

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.